Since week 3 was only a very short assignment, there are no notes for it. Week 4 deals with hybrid recommender systems.
In practice, the strengths of different algorithms are often combined.
Example: Popularity
- Collaborative filters trained on ratings predict whether a user would like a movie if they saw it.
- Overall popularity, though, may be a better predictor of whether they will watch it at all.
- Blending overall popularity with collaborative filtering might produce a good list of movies that the user will both watch and like (see the sketch below).
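A minimal sketch of such a blend, assuming both scores have already been normalized to comparable ranges (names and the 0.7 weight are my own illustration):

```python
def blend_scores(cf_scores, popularity, alpha=0.7):
    """Blend collaborative-filtering predictions with overall popularity.

    cf_scores:  dict of item -> CF prediction for this user (normalized)
    popularity: dict of item -> popularity score (normalized)
    alpha:      weight on the collaborative-filtering score
    """
    return {
        item: alpha * cf_scores[item] + (1 - alpha) * popularity.get(item, 0.0)
        for item in cf_scores
    }

# Two movies the CF model likes equally; the more popular one wins the blend.
cf = {"movie_a": 0.84, "movie_b": 0.84}
pop = {"movie_a": 0.9, "movie_b": 0.1}
print(blend_scores(cf, pop))
```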
Means of Hybridization
Most Common:
- Combine item scores
- Combine item ranks
- Integrated models
Many others possible:
- Conditionally switch algorithms
- Deep integration (e.g. putting content-based computations inside a collaborative filter)
Combining Item Scores
Simple and common: linear blends.
The blend weights don't have to be static constants either; they can be functions of user or item features.
This is called feature-weighted linear stacking. A common example is a Bayesian-style approach where general popularity gets a high weight for users with little rating data, but a lower weight for users we know a lot about.
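A minimal sketch of that idea, where the weight on the collaborative-filtering score grows with the number of ratings we have for the user (the shrinkage-style weighting and the constant k are my own illustrative choices):

```python
def blended_score(cf_score, popularity_score, n_user_ratings, k=20):
    """Feature-weighted blend: the weight is a function of user metadata,
    here the number of ratings we have for the user. With few ratings the
    blend leans on popularity; with many it leans on the CF prediction."""
    w_cf = n_user_ratings / (n_user_ratings + k)
    return w_cf * cf_score + (1 - w_cf) * popularity_score

print(blended_score(4.5, 3.8, n_user_ratings=2))    # close to popularity
print(blended_score(4.5, 3.8, n_user_ratings=200))  # close to the CF score
```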
Combining Item Ranks
Combine based on order instead of score.
- Mix ranked output
- Convert ranks to positions/percentiles and treat them as scores (see the sketch below)
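A minimal sketch of the second option, turning each recommender's ranking into percentile-like scores and averaging them (function names are mine):

```python
def ranks_to_scores(ranked_items):
    """Map a ranked list (best first) to scores in (0, 1], best item highest."""
    n = len(ranked_items)
    return {item: 1.0 - pos / n for pos, item in enumerate(ranked_items)}

def blend_rankings(ranking_a, ranking_b, weight_a=0.5):
    scores_a, scores_b = ranks_to_scores(ranking_a), ranks_to_scores(ranking_b)
    items = set(scores_a) | set(scores_b)
    blended = {item: weight_a * scores_a.get(item, 0.0)
                     + (1 - weight_a) * scores_b.get(item, 0.0)
               for item in items}
    return sorted(blended, key=blended.get, reverse=True)

print(blend_rankings(["a", "b", "c"], ["c", "a", "d"]))  # ['a', 'c', 'b', 'd']
```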
Integrating Models
Several matrix factorization techniques use both rating and content data.
- SVD-Feature
- Factorization Machines
- GPMF
Common pattern: assign latent feature vectors to item characteristics (e.g. genre, author).
Hybrids with Robin Burke
Weighted Hybrid
The final output is a weighted combination of the output of multiple systems.
Switching Hybrids
Similar to the feature-weighted stacking mentioned above: instead of blending, one system is selected (or allowed to dominate) depending on the context.
Cascaded Recommender Combinations
A first system might just generate broad buckets of better and worse items, with no internal ranking within the buckets. These buckets can then be handed to a second recommender, which generates the finer-grained ranking of the items.
This resembles the common pattern of narrowing the problem space: first generate a much smaller list of candidate items from the global catalogue, then do the heavy score computations only on that smaller list (see the sketch below).
Another possibility is that for one item, one system finds which other items might be related to it, and those items are then scored by a second system focused on whether you are likely to be interested in them or not.
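A minimal sketch of the candidate-generation-then-rerank pattern (the popularity cut-off and the dot-product scorer are placeholder choices of mine):

```python
import random

def candidate_generation(all_items, n_candidates=50):
    """Cheap first stage: a rough popularity bucket, no fine-grained ranking."""
    return sorted(all_items, key=lambda item: item["popularity"],
                  reverse=True)[:n_candidates]

def fine_ranking(user_vector, candidates):
    """Expensive second stage: score only the candidates, here with a
    dot product against latent item vectors."""
    def score(item):
        return sum(u * v for u, v in zip(user_vector, item["latent"]))
    return sorted(candidates, key=score, reverse=True)

items = [{"id": i, "popularity": random.random(),
          "latent": [random.random() for _ in range(4)]} for i in range(1000)]
user = [random.random() for _ in range(4)]
top10 = fine_ranking(user, candidate_generation(items))[:10]
print([item["id"] for item in top10])
```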
Feature-Combination Hybrids
You can generate pseudo-users or pseudo-items for use in collaborative filtering. For example, for an author you can generate a pseudo-user that likes all of their books. When the author releases a new book, the pseudo-user automatically has that book marked as 'liked', so users who have ended up similar to the pseudo-user can get the new book recommended. More generally, this can be used to insert genres or tags into a collaborative system.
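A minimal sketch of the pseudo-user trick on a small ratings matrix (all data and the 'liked' value of 5.0 are illustrative):

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
ratings = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 2],
    [1, 1, 5, 4],
], dtype=float)

# Items 0 and 1 are books by the same author: add a pseudo-user who
# "likes" every book by that author.
author_items = [0, 1]
pseudo_user = np.zeros(ratings.shape[1])
pseudo_user[author_items] = 5.0
ratings = np.vstack([ratings, pseudo_user])

# When the author releases a new book, add its column with a 'like' from
# the pseudo-user; plain collaborative filtering can then recommend it to
# users who are similar to the pseudo-user.
new_item = np.zeros((ratings.shape[0], 1))
new_item[-1, 0] = 5.0
ratings = np.hstack([ratings, new_item])
print(ratings)
```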
Feature-Augmentation
Treats the output from one system as an input to another. It is not entirely clear how this differs from the above, but it sounds like the pseudo-users produced by the collaborative system would then be used as input for a content-based system.
Meta-Level Hybrid
Build up a content-based user profile, and then compare that profile to other users' profiles in a collaborative-filtering way. This differs from comparing the individual item ratings, which is the default collaborative filtering approach.
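A minimal sketch, assuming item content vectors (e.g. genre indicators) are available: the profile is a rating-weighted average of the content vectors, and neighborhoods are then formed over profiles rather than rating vectors (all helper names and data are mine):

```python
import numpy as np

def content_profile(user_ratings, item_content):
    """Rating-weighted average of the content vectors of rated items."""
    vectors = np.array([item_content[i] for i in user_ratings])
    weights = np.array(list(user_ratings.values()), dtype=float)
    return weights @ vectors / weights.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

item_content = {"m1": np.array([1, 0, 1]),   # e.g. [action, comedy, drama]
                "m2": np.array([0, 1, 1]),
                "m3": np.array([1, 1, 0])}
alice = content_profile({"m1": 5, "m3": 4}, item_content)
bob   = content_profile({"m2": 5}, item_content)
print(cosine(alice, bob))  # similarity of profiles, not of rating vectors
```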
Matrix Factorization on Hybrid Data
Another approach is to perform dimensionality reduction on heterogeneous data, i.e. hybridize the data rather than the algorithm. The extra data is sometimes called side information. Besides the core user-item ratings matrix, we can add additional information such as movie genres or review scores.
This works because matrix factorization machine learning algorithms are very general.
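A minimal sketch of hybridizing the data: append one-hot genre rows to the ratings matrix so the factorization sees ratings and content together (the data and the unweighted stacking are my own simplification; in practice the blocks are usually scaled or weighted):

```python
import numpy as np

ratings = np.array([        # users x movies, 0 = unrated
    [5, 0, 3],
    [0, 4, 0],
])
genres = np.array([         # movies x genres, one-hot side information
    [1, 0],                 # movie 0: action
    [0, 1],                 # movie 1: comedy
    [1, 1],                 # movie 2: action + comedy
])

# Stack the genre indicators as extra "users" so every movie column now
# carries both rating data and content data into the factorization.
hybrid = np.vstack([ratings, genres.T])
print(hybrid.shape)         # (n_users + n_genres, n_movies)
```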
Bias Models
The simplest personalization uses user and item biases: a global mean plus a user bias plus an item bias.
This gives personalized scores, but not personalized rankings, since the user bias shifts all of a user's scores by the same amount. These parameters can be learned via gradient descent.
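A minimal sketch of learning the biases with stochastic gradient descent; the prediction is mu + b_u + b_i and the hyperparameters are illustrative:

```python
import numpy as np

def fit_bias_model(ratings, n_users, n_items, lr=0.01, reg=0.02, epochs=50):
    """ratings: list of (user, item, rating) triples.
    Learns r_hat(u, i) = mu + b_u[u] + b_i[i] by SGD on squared error."""
    mu = np.mean([r for _, _, r in ratings])
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            b_u[u] += lr * (err - reg * b_u[u])   # regularized gradient step
            b_i[i] += lr * (err - reg * b_i[i])
    return mu, b_u, b_i

data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 2.0)]
mu, b_u, b_i = fit_bias_model(data, n_users=2, n_items=3)
print(mu + b_u[0] + b_i[2])   # predicted rating of user 0 for item 2
```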
Linear Models
The bias model is a linear model and can be extended with various additional features.
SVD++
SVD++ integrates the implicit signal of which items a user has rated ('is it rated?') with the rating values themselves.
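For reference, the usual SVD++ prediction, where N(u) is the set of items rated by user u (notation mine, following the standard formulation):

```latex
\hat{r}_{ui} = \mu + b_u + b_i
  + q_i^\top \Big( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big)
```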
SVDFeature
SVDFeature takes this further by adding latent features for each observable feature of a user or item.
Bias is derived from the features too, and the features can come from any available data.
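For reference, the SVDFeature prediction takes roughly this form, where alpha, beta and gamma are the user, item and global feature values, the b terms are their bias weights, and p_j, q_j are their latent vectors (notation mine):

```latex
\hat{r} = \mu
  + \sum_j \gamma_j\, b^{(g)}_j
  + \sum_j \alpha_j\, b^{(u)}_j
  + \sum_j \beta_j\, b^{(i)}_j
  + \Big( \sum_j \alpha_j\, p_j \Big)^{\top} \Big( \sum_j \beta_j\, q_j \Big)
```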
More Techniques
- Generalized Probabilistic Matrix Factorization (GPMF) integrates extra data into probabilistic matrix factorization.
- Matrix co-factorization (or collective matrix factorization) factorizes multiple matrices that share a dimension (see the sketch below).
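For instance, a ratings matrix R (users x items) and a side-information matrix S (items x genres) can be factorized jointly so that the item factors Q are shared; the objective looks roughly like this (notation and the weighting lambda are mine):

```latex
\min_{P,\, Q,\, Z}\;
  \big\lVert R - P Q^\top \big\rVert^2_{\text{observed}}
  \;+\; \lambda\, \big\lVert S - Q Z^\top \big\rVert^2
  \;+\; \text{regularization}
```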
Matrix Factorization Hybrids with George Karypis
An assumption of matrix factorization is that the observed entries are randomly distributed (i.e. what you've rated so far should have been picked at random). This generally doesn't hold, which means accurate recovery of the original matrix is not fully possible.
The lower-dimensional matrix will be good at recovering the dimensions that lie close to what the user has rated previously. Dimensions with much sparser data will not be as reliable. Makes intuitive sense…
One consequence of this is that serendipity in matrix factorization approaches is relatively low.
Interview with Arindam Banerjee
You have a set of entities, and also a set of relations between entities. The relation between users and movies, for example, is the ratings matrix.
Similarly, you might have entities for, and relations between, movies and actors, reviewers and movies, and so on. In the end, the challenge is modeling all this data in a way that fits into a recommender system, which is very open-ended.
The goal is to embed all entities in a common vector space using, for example, probabilistic matrix factorization (PMF) or generalized probabilistic matrix factorization (GPMF). The embeddings can then be used to reconstruct the original relations.
Interview with Yehuda Koren
What made a difference in winning the Netflix Prize, in hindsight?
- Using different models together to balance out biases of each individual model.
- Finding unique effects in the data and using them for feature engineering. Anchoring effect: "all ratings given on the same day tend to be inter-related". I assume this is per person (so depending on mood, for example). Without spotting this during data exploration, you would not be able to model it. How many items did users rate on the same day? Were movies rated individually or in bulk together with others? Will a user rate a movie at all?
- Having a framework for blending multiple models also allowed splitting the work, so that each person could work independently on their own models.
Other insights:
- Getting the top-5 list of recommendations right is what matters, not the total RMSE.
- Defining good metrics and measuring the user impact are still open challenges for recommender systems. “The offline metrics are misleading us every time.”