Evaluating Recommender Systems - Week 4

Design Evaluation

  • Match evaluation to the question, problem or challenge
  • Which metric matches to what you actually care about?

Example: News Recommendation

  • Are we recommending useful articles?
  • Are there ways to improve the recommender?
    • Diversity, serendipity, familiarity, perceived personalisation, balkanization?
      • Balkanization: Are we dividing people into groups that lose common ground?
  • You can’t design a good recommender without understanding the domain

Example: Medical Recommender

  • What can we do without a live system?
  • You might find that you recommend the best doctors to everyone - but the doctors have a limited number of patients they can meet.
    • “Common problem in the dating space where some users are consistently highly ranked. We can’t tell everyone to go date Tom Hanks because he doesn’t really have the time.”
    • How to match people where they are, not just whom they’d like.
    • How does this change the relevant metrics for evaluation?

Example: Music Recommendation

  • Not the same problem - everyone can listen to the same song
  • With music - people oftentimes want to listen to things they’ve listened to previously, which is different from many other domains.
    • At the same time, you probably don’t want one song on repeat.
  • What is the relative cost of a bad recommendation? A bad song recommendation matters much less than a bad doctor recommendation.
  • It starts with a deep understanding of the domain in order to find the right metric to answer the question you have.

Comments