Evaluating Recommender Systems - Week 4

Design Evaluation

Are we recommending useful articles?
Are there ways to improve the recommender?
- Diversity, serendipity, familiarity, perceived personalisation, balkanization?
  - Balkanization: Are we dividing people into groups that lose common ground?
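Of the qualities listed above, diversity is the easiest to measure offline. A common way to operationalise it is intra-list diversity: the average pairwise distance between the items in one recommendation list. The sketch below assumes each item has a numeric feature vector and uses cosine distance. Both choices, and the function names, are illustrative and not taken from the course material.

```python
import math
from itertools import combinations


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # simplifying assumption: zero vectors count as identical
    return 1.0 - dot / (norm_a * norm_b)


def intra_list_diversity(items: list[str], features: dict[str, list[float]]) -> float:
    """Average pairwise cosine distance between the items in one list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    return sum(cosine_distance(features[a], features[b]) for a, b in pairs) / len(pairs)
```

A list of two identical articles scores 0.0, and two orthogonal articles score 1.0. A higher score means a more varied list, but the right target level depends on the domain.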
You can’t design a good recommender without understanding the domain.

What can we do without a live system?
You might find that you recommend the best doctors to everyone, but each doctor can only see a limited number of patients.
- “Common problem in the dating space where some users are consistently highly ranked. We can’t tell everyone to go date Tom Hanks because he doesn’t really have the time.”
- How do we match people with realistic, available partners, not just the people they’d like?
- How does this change the relevant metrics for evaluation?
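When capacity is limited, a ranking metric can look excellent even though most users cannot actually get the item they were recommended. The sketch below is one hypothetical way to surface this offline. It assumes each user acts only on their top recommendation and that each provider (a doctor, a dating profile) has a known, fixed capacity. The function and parameter names are illustrative.

```python
from collections import Counter


def unserved_fraction(top_choice: dict[str, str], capacity: dict[str, int]) -> float:
    """Fraction of users whose top recommendation cannot actually take them.

    top_choice maps each user to the provider ranked first for them.
    capacity maps each provider to how many users it can accept;
    providers missing from capacity are assumed to accept no one.
    """
    if not top_choice:
        return 0.0
    demand = Counter(top_choice.values())
    # Every user beyond a provider's capacity is left without a match.
    overflow = sum(max(0, count - capacity.get(provider, 0)) for provider, count in demand.items())
    return overflow / len(top_choice)
```

If four patients all have `dr_a` ranked first and `dr_a` can take one patient, three of the four go unserved and the function returns 0.75. Spreading the same patients across providers with spare capacity brings the value towards 0.0, even if per-user relevance drops slightly.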

Music is not the same problem: everyone can listen to the same song.
In music, people often want to hear tracks they’ve listened to before, which is unusual compared with many other domains.
- At the same time, you probably don’t want one song on repeat.
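This tension between familiarity and repetition can be tracked with two simple per-playlist numbers. The first is the share of recommended slots filled with tracks the user has already heard. The second is the share taken by the single most frequent track, where 1.0 means one song on repeat. The sketch below is illustrative: the metric names are hypothetical, and acceptable ranges for each would have to come from domain knowledge.

```python
from collections import Counter


def familiarity_stats(recommended: list[str], history: set[str]) -> dict[str, float]:
    """Summarise how familiar and how repetitive one recommended playlist is."""
    if not recommended:
        return {"repeat_share": 0.0, "top_track_share": 0.0}
    # Share of slots filled with tracks already in the user's listening history.
    repeat_share = sum(track in history for track in recommended) / len(recommended)
    # Share of slots taken by the most frequent track (1.0 = one song on repeat).
    top_track_share = Counter(recommended).most_common(1)[0][1] / len(recommended)
    return {"repeat_share": repeat_share, "top_track_share": top_track_share}
```

For the playlist `["s1", "s2", "s1", "s3"]` and a history of `{"s1", "s2"}`, three of the four slots are familiar (`repeat_share` 0.75) and the most repeated track fills half the list (`top_track_share` 0.5).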
What is the relative cost of a bad recommendation? A bad song recommendation matters much less than a bad doctor recommendation.
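One way to make this concrete is to weight each recommendation's chance of being bad by a domain-specific cost, then compare expected costs instead of raw error rates. The sketch below assumes these probabilities and costs can be estimated. The cost values used to illustrate it are hypothetical and not taken from any real system.

```python
def expected_cost(recommendations: list[tuple[float, float]]) -> float:
    """Expected cost of a list of recommendations.

    Each entry is (p_bad, cost_if_bad): the estimated probability that the
    recommendation is bad and the domain-specific cost when it is.
    """
    return sum(p_bad * cost_if_bad for p_bad, cost_if_bad in recommendations)
```

Suppose both systems are wrong 10% of the time. With a hypothetical cost of 1 per bad song and 100 per bad doctor, five song recommendations carry an expected cost of about 0.5, while a single doctor recommendation carries about 10. The same error rate can therefore be acceptable in one domain and unacceptable in another.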
Choosing the right metric for the question you have starts with a deep understanding of the domain.