Matrix Factorization and Advanced Techniques - Week 6

Interview with Francesco Ricci

Context is in general any kind of information or condition that might influence the users perception of an item.

Examples of factors;

  • Distance
  • Temperature
  • Weather
  • Season
  • Weekday
  • Time of day
  • Crowdedness
  • Companion
  • Mood
  • Sleepiness

…und so weiter…

Adding context can be for example adding additional dimensions to the classic 2-dimensional user-item model. Ratings can be grouped by context for examples, meaning less data for each condition, but hopefully more relevant data.

Following on this, you can look at having a similarity measurement between contexts to determine which data can be shared.

Paradigms for incorporating context

  • Contextual pre-filtering
    • Limit the data for the model based on context (as mentioned above), train one 2D model per context
  • Contextual post-filtering
    • Train the model on all data, filter out items post-recommendation based on context
  • Contextual modelling
    • True multidimensional modeling using tensor factorization instead of matrix factorization.

A lot of applications for contextual recommendations comes from mobile applications, as the device has many sensors and follows the user throughout their day.

Context is used automatically by our minds in order to decode ambiguous messages (example by Kahneman on reading hand writing either as B or 13 depending on surrounding text).

The music you’ve listened to recently will influence the music you want to hear next.

Interview with Bamshad Mobasher

  • Context can be represented as an explicit enumerated set of static attributes (i.e. “extensional”)
    • E.g. time, date, location etc…
    • Implications:
      • Relevant contextual variables need to be identified at the design stage
      • Must identify the contextual information before making recommendations
  • Context can alternatively be seen as interactional, and that the user moves through a contextual journey towards some type of goal
    • In that sense, it might not be observed outside of the activity, but can be a latent variable inferred by the users current interactions
    • Explicit representation here becomes less important as
      • recognizing behavior arising from the context
      • Adapting to the needs of the user within the context
      • Recognizing and adapting to context transitions
  • Context adaption can also happen in interactive recommendations, for example in a dialogue. In this scenario, reinforcement learning can be relevant.

Interview with Anmol Bhasin (LinkedIn)

Recommends people, jobs, news articles, groups etc… Thus, they run a recommendation platform that can be used for multiple use-cases. A lot of the infrastructure and features are shared, or built for reuse. It’s consumed by the rest of the platform via APIs. Built into this is also A/B testing functionality.

Hybrid recommenders are the norm. Multiple recommender systems of different families (content-filtering, collaborative, popularity/trending, social) generate candidates, which then are ranked by an optimization algorithm that predicts a conversion event. Baked into the final selection are also other constraints and considerations, such as keeping people on the site vs links leading to external sites.

The magic is in the features

The algorithms are pretty much of the shelf. What makes the results great are having good data and features as input.

Oftentimes, separate models might be created for various segments, for example based on geographical areas. For jobs for example, for students the current location might not be very important, for a doctor on the other hand, they are unlikely to move to a different country for a job (especially if their degree is not valid there).

This can be done with decision trees, with linear/logistic regression models fitted for each of the leaves.

Models are evaluated both offline and online. A/B testing helps ensure that results are real. For multiple tests running in parallel, using different hash seeds for test/control group assignment between each test they try to ensure the tests are orthogonal to each other, so they can isolate the effect of each test. “Lazy orthogonal multivariate testing”.

Similarly, they account for the novelty effect. Thus, they ensure each test runs long enough for the novelty to wear off.

For social recommendations, i.e. where users are connected in graphs and usually heavily clustered, randomized testing becomes more complex, as something that might be highly relevant in one cluster won’t be relevant to a randomly drawn user from another cluster.

Power analysis is used to help decide how big and long tests must be. It’s determined by the effect size (delta), variance of metric and the sample size. In order to have enough power to detect an uplift of 1.5% in a day, you will need X thousand members for pageviews/members.

Interview with Paul Lamere (Spotify)

Recommending music is special in that people generally listen to the same music over and over, which is not the same for many other types of recommendations. Spotify works with helping people discovering new music as well as making consuming their favorite music conveniently.

Users fall in two types. Hardcore music fans looking for new music and that are picky vs. the casual listener who mostly treat Spotify as a radio station. For the former, the discover weekly feature tries to present a playlist with new songs they might like. For the latter, a lot of radio features tries to play the music the person already likes, but every now and then also mix in something new to introduce some novelty.

Terrestrial radio has a lot of experience playing music to listeners. Their experience is that repeating music is highly effective in keeping users.

Other parts of music recommendations are audio analysis in order to ensure smooth transitions between songs (gradual change in tempo or genre for example).

For Spotify, the great amount of playlists created by users gives them a great data set to use to understand how to generate new playlists.

Using word2vec like embedding algorithms works great on playlists to generate song embeddings, that then can be utilized in many different ways, such as clustering of songs or generating smooth transitions.

Music listening is also highly contextual, so identifying context and recommending suitable music is one big focus.

Our phones are becoming these perfect context-identification machines. They know when you’re in the car, they know when you’re running, they know where you are, they know who your friends are, they know your calendar. It’s actually kinda’ scary when you think about how much your phone knows about you.

Comments