Evaluating Recommendations

Putting into words some thoughts from trying to wrap my head around how to measure whether a recommendation is good or bad, and how to formulate the problem so that models can be trained on it.

As food for thought, I’m reading through Common pitfalls in training and evaluating recommender systems, which is where the block quotes are taken from. It kinda leads me off in a different direction and makes the post something of a stream of consciousness.

Downloading Kaggle Datasets to SageMaker

  • Generate an API key (kaggle.json) in the Kaggle UI and upload it to the notebook's root folder (/home/ec2-user/SageMaker) via the web interface.
  • Accept the competition rules in the Kaggle UI if you haven’t already.
  • Start a notebook. Execute the following commands in a notebook cell, replacing the competition name with the competition you are interested in:
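%%bash
pip install kaggle

# The Kaggle CLI expects the API key at ~/.kaggle/kaggle.json.
# Create the folder first, otherwise mv just renames the file to ".kaggle".
mkdir -p /home/ec2-user/.kaggle
mv /home/ec2-user/SageMaker/kaggle.json /home/ec2-user/.kaggle/
# Restrict permissions; the CLI warns if the key is readable by other users
chmod 600 /home/ec2-user/.kaggle/kaggle.json

# Download datasets, optionally specify destination folder using --path
kaggle competitions download -c planet-understanding-the-amazon-from-space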
  • Realise that the instance you started doesn’t have enough space for the datasets. Oh, crap. Anyhow, if you were more forward-thinking than me, you should be good to go.
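Since running out of disk space is the step that bites, a quick check before downloading can save a restart. This is a minimal sketch, assuming a Linux instance with a POSIX `df`; the function name `has_free_space` and the 10 GB threshold are hypothetical, so pick a threshold based on the dataset sizes listed on the competition page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the filesystem holding DIR
# has at least REQUIRED_GB gibibytes available, and fails otherwise.
has_free_space() {
  local dir="$1"
  local required_gb="$2"
  local available_kb
  # -P forces POSIX output (one line per filesystem), -k reports 1024-byte blocks;
  # column 4 of the second line is the available space in KiB.
  available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$available_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Example usage in a %%bash cell (assumes ~10 GB is enough for the dataset):
# if has_free_space /home/ec2-user/SageMaker 10; then
#   kaggle competitions download -c planet-understanding-the-amazon-from-space \
#     --path /home/ec2-user/SageMaker
# else
#   echo "Not enough space; consider an instance with a larger volume." >&2
# fi
```

Checking the directory you pass to `--path` matters, since the notebook folder and the root filesystem may sit on different volumes with very different amounts of free space.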

Food Recommendations at Delivery Hero

Notes from an excellent talk by Gugulethu Ncube from Delivery Hero at the Berlin RecSys Meetup tonight on how they approach food recommendations. Rephrased in my words and with my thoughts interspersed, so anything crazy-sounding should in all likelihood be attributed to me.

Goals for recommendations:

  • Provide users with new restaurants to order from, as ordering from multiple restaurants makes customers more loyal.

Collaborative vs. Content-based:

“Collaborative filtering produces very strange results”

  • It tends to end up giving rather uninteresting recommendations, such as very popular chains.
  • The reason is the extreme sparsity of the data, combined with a data distribution that is tightly tied to the geographical location of restaurants.
  • This also explains why chains end up at the top: they have so much more data across their many locations.

FoodRank - Content-based

In a nutshell: ranking how good restaurants are at your favourite food.

FastAI - Notes - Deep Learning week 1-2

Some interesting tricks, based on recent papers, that aren't covered in the deeplearning.ai course and that, at least at the time of recording, were apparently only implemented in the fastai library, which sits on top of PyTorch.