Evaluating Recommender Systems - Week 4
28 Mar 2020
Design Evaluation
- Match evaluation to the question, problem or challenge
- Which metric matches to what you actually care about?
Example: News Recommendation
- Are we recommending useful articles?
- Are there ways to improve the recommender?
- Diversity, serendipity, familiarity, perceived personalisation, balkanization?
- Balkanization: Are we dividing people into groups that lose common ground?
- You can’t design a good recommender without understanding the domain
Example: Medical Recommender
- What can we do without a live system?
- You might find that you recommend the best doctors to everyone - but the doctors have a limited number of patients they can meet.
- “Common problem in the dating space where some users are consistently highly ranked. We can’t tell everyone to go date Tom Hanks because he doesn’t really have the time.”
- How to match people where they are, not just whom they’d like.
- How does this change the relevant metrics for evaluation?
Example: Music Recommendation
- Not the same problem - everyone can listen to the same song
- With music - people oftentimes want to listen to things they’ve listened to previously, which is different from many other domains.
- At the same time, you probably don’t want one song on repeat.
- What is the relative cost of a bad recommendation? A bad song recommendation matters much less than a bad doctor recommendation.
- It starts with a deep understanding of the domain in order to find the right metric to answer the question you have.
Comments