Item-Item Collaborative Filtering (IICF).
UUCF often runs into sparsity problems on larger datasets. There are strategies that try to address this, such as dimensionality reduction, but it remains a challenge; this also led to the idea of looking at item-to-item filtering.
This can also help with computational performance, as UUCF computations have high complexity and need to be recomputed frequently (especially for new users, in order to give them useful recommendations).
Item-Item Collaborative Filtering
The similarity between different items is usually fairly stable when there are more users than items, in which case the average item has many more ratings than the average user. As a rule of thumb, if the system has far more items than users, IICF will likely not work as well.
Similarly, items themselves generally don't change rapidly in ways that would change their ratings, unless they are somehow time-bound or some special event causes people to re-evaluate the item.
Two Step Process
- Compute similarity between pairs of items
  - Correlation between rating vectors
    - Only for co-rated cases; only useful for multi-level ratings
  - Cosine of item rating vectors
    - Works with both multi-level and unary ratings
    - Or adjusted ratings (normalized before computing the cosine)
- Predict user-item rating
  - Weighted sum of rated “item-neighbors”
  - Linear regression to estimate the rating
  - …or others
Item-Item Top-N
- Simplify model by limiting items to small neighbourhoods of k most-similar items (e.g. 20)
- For a user’s profile (a set of items), compute, merge, and sort the k most-similar items for each profile item
Benefits of Item-Item
- Works well
  - Good prediction accuracy
  - Good performance on top-N predictions
- Efficient implementation
  - At least in cases where there are many more users than items
  - Benefits of precomputability
- Broad applicability and flexibility
  - As easy to apply to a shopping cart as to a user profile
Core Assumptions and Limitations
- Item-item relationships need to be stable
- Based on stable user preferences
- Often these are temporal issues:
- Christmas trees are only of interest during Christmas
- A calendar for 2020 will not be of interest in 2021
- Main limitation/complaint: low serendipity
- Recommends items very similar to what you’ve already expressed interest in
Algorithm
- Precompute item similarities
- Look for items similar to items user already expressed interest in
Basic formula:
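As a sketch of what this refers to: the standard weighted-sum form of the item-item score, using $w_{ij}$ for the similarity between items $i$ and $j$, $r_{uj}$ for user $u$'s rating of item $j$, and $N(i, u)$ for the neighbourhood defined below, is:

$$s(i, u) = \frac{\sum_{j \in N(i, u)} w_{ij}\, r_{uj}}{\sum_{j \in N(i, u)} |w_{ij}|}$$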
Computing Similarity
We want the cosine similarity over normalized vectors. This means the dot product of the item ratings over the product of the vector lengths.
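In symbols, with $\vec{r}_i$ denoting the vector of ratings for item $i$:

$$w_{ij} = \frac{\vec{r}_i \cdot \vec{r}_j}{\|\vec{r}_i\|_2\, \|\vec{r}_j\|_2} = \frac{\sum_u r_{ui}\, r_{uj}}{\sqrt{\sum_u r_{ui}^2}\, \sqrt{\sum_u r_{uj}^2}}$$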
Expanded and centered (removing the item mean from each rating term), we get the Pearson correlation formula.
…with the standard deviations of the two items’ ratings in the denominator, and so on.
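Written out over the users who have rated both items, with $\bar{r}_i$ the mean rating of item $i$, this is:

$$w_{ij} = \frac{\sum_u (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_u (r_{ui} - \bar{r}_i)^2}\, \sqrt{\sum_u (r_{uj} - \bar{r}_j)^2}}$$

The two square-root terms are the standard deviations mentioned above (up to a constant factor).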
Neighbourhood
- For N(i, u), find k most similar items to i that u has rated
So with the mean ratings subtracted, and using neighbourhood, the scoring formula becomes:
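One consistent way to write it, subtracting item means inside the sum and adding the target item's mean back at the end (see the note below about item vs. user means), is:

$$s(i, u) = \frac{\sum_{j \in N(i, u)} w_{ij}\,(r_{uj} - \bar{r}_j)}{\sum_{j \in N(i, u)} |w_{ij}|} + \bar{r}_i$$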
Note: In the actual assignment, it’s the user’s mean rating ($\bar{r}_u$) that is used in both places in this formula. Looking back at the original paper, it doesn’t cover normalizing the output score to the user’s scale, which leaves us in the dark as to which one is actually preferred.
Components of algorithm
- Item similarity function (interpolation weight $w_{ij}$)
- Model builder
- Neighbourhood selection strategy
- Item score aggregation function
Item Similarities
- Usually cosine similarity between item rating vectors
- Normalize user ratings
- Subtract item mean (historically, user mean)
- Treat missing values as 0
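A minimal sketch of this similarity step, assuming a dense numpy ratings matrix with `np.nan` for missing values (function and variable names are mine, for illustration only):

```python
import numpy as np

def item_similarities(R):
    """R: |U| x |I| rating matrix, np.nan where a user hasn't rated an item."""
    rated = ~np.isnan(R)
    item_means = np.nanmean(R, axis=0)
    # Subtract each item's mean rating; treat missing values as 0.
    X = np.where(rated, R - item_means, 0.0)
    # Cosine similarity between item (column) vectors.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # guard against items with no ratings
    X = X / norms
    return X.T @ X                   # |I| x |I| similarity matrix W
```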
Score Items
- Score items S(i, u)
- Average normalized ratings, denormalize when done
- Can consider other methods like linear regression
Picking Neighbors
- Too small k gives inaccurate scores
- Too large k gives too much noise
- k = 20 often works well
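Continuing the sketch above, scoring S(i, u) with a k-item neighbourhood and denormalizing at the end might look like this (again, the names and the dense-matrix representation are assumptions for illustration):

```python
def score(R, W, u, i, k=20):
    """Score item i for user u from the k most similar items u has rated."""
    item_means = np.nanmean(R, axis=0)
    rated = np.where(~np.isnan(R[u]))[0]      # items user u has rated
    rated = rated[rated != i]
    # k most similar rated items, keeping only positive similarities.
    neighbours = rated[np.argsort(W[i, rated])[::-1][:k]]
    neighbours = neighbours[W[i, neighbours] > 0]
    if neighbours.size == 0:
        return item_means[i]                  # fall back to the item mean
    num = np.sum(W[i, neighbours] * (R[u, neighbours] - item_means[neighbours]))
    den = np.sum(np.abs(W[i, neighbours]))
    return item_means[i] + num / den
```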
Building the model
- Pre-compute similarities for all pairs of items
- Since similarities are slow to change, this is feasible
- Naively, compute the similarity for every pair of items
- If the similarity function is symmetric (sim(i, j) = sim(j, i)), you only need to compute one direction for each pair
- Can skip a pair of items if no user has rated both of them
- Truncating the Model
- Don’t need to keep the whole I^2 model
- Need enough neighbors to find neighbors at score time
- Since user hasn’t rated everything, need neighbors per item in model.
- Can drop nonpositive neighbors
- Balance memory use with accuracy and coverage
- Mild runtime improvements as well
Pseudo Algorithm
    Initialize W to empty weight matrix
    for i in I:
        for j in I such that i != j:
            if sim(i,j) > 0:
                w_ij = sim(i,j)
        clear all but M largest values of row w_i
The symmetric-pair and sparse-pair-skipping optimizations mentioned above are not included here.
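A runnable version of the pseudocode above, keeping a sparse dict-of-dicts model truncated to the M largest positive weights per row (`sim` is whatever similarity function you plug in; `M` is the retained neighbourhood size):

```python
import heapq

def build_model(items, sim, M=20):
    W = {}                                    # sparse weight "matrix"
    for i in items:
        row = {}
        for j in items:
            if i == j:
                continue
            s = sim(i, j)
            if s > 0:                         # drop nonpositive neighbours
                row[j] = s
        # clear all but the M largest values of row w_i
        W[i] = dict(heapq.nlargest(M, row.items(), key=lambda kv: kv[1]))
    return W
```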
Item-item with Unary Data
A few tweaks are needed for unary data (data that describes whether a user has consumed an item or not), for example implicit signals.
Rating Interpretation
- Rating values: user-item rating matrix
- Need some matrix to represent data
- Logical (1/0) user-item matrix
- Purchase count matrix
- Problem: what does a 0 mean for unary data?
- It is unclear whether null and 0 differ
- We just ignore that for item-item
Normalization
- Standard mean-centering is not meaningful
- Can normalize vectors to unit vectors instead
- Intuition: users who like many items provide less information about any particular item pair
- Could also consider taking the log of counts to decrease the impact of high counts.
Similarity
- Cosine Similarity still works
- Conditional probability is an alternative, which makes it similar to association rules. This breaks the symmetry of the function, since in general P(i|j) ≠ P(j|i)
Aggregating scores
- Weighted average works for non-binary counts
- For binary, sum neighbour similarities
- Neighbourhood selection unchanged (most similar)
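A sketch of these tweaks for unary data, assuming a 0/1 numpy matrix `B` (1 = user consumed the item); user rows are normalized to unit length before the item-item cosine, and the binary scoring rule just sums neighbour similarities:

```python
import numpy as np

def unary_item_similarities(B):
    """B: |U| x |I| 0/1 matrix of implicit signals."""
    # Normalize user rows: heavy users provide less information per item pair.
    row_norms = np.linalg.norm(B, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1.0
    X = B / row_norms
    # Cosine similarity between item (column) vectors.
    col_norms = np.linalg.norm(X, axis=0)
    col_norms[col_norms == 0] = 1.0
    Xn = X / col_norms
    return Xn.T @ Xn

def unary_score(B, W, u, i, k=20):
    consumed = np.where(B[u] > 0)[0]
    consumed = consumed[consumed != i]
    neighbours = consumed[np.argsort(W[i, consumed])[::-1][:k]]
    return float(np.sum(W[i, neighbours]))    # binary case: sum of similarities
```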
Hybrids and Extensions
Item-item is a good base for extensions.
- Simple parts with well-defined interfaces provide flexibility
- Easy to understand what extensions do
Trustworthiness or Authority/Importance
- Goal: incorporate trustworthiness into item relatedness computation
- User’s global reputation, not per-user trust
- Solution: weight users by trust before computing item similarities
- High-trust users have more impact
- Massa and Avesani 2004, Trust-aware collaborative filtering for recommender systems.
Given a trust factor for user u:
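One way to realize this weighting (a sketch, not necessarily the paper's exact formulation) is to scale each user's contribution to the mean-centered similarity by a trust factor $t_u$:

$$w_{ij} = \frac{\sum_u t_u\,(r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_u t_u\,(r_{ui} - \bar{r}_i)^2}\, \sqrt{\sum_u t_u\,(r_{uj} - \bar{r}_j)^2}}$$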
Same formula can also be used for ranking user importance:
- Goal: Incorporate ‘importance’ into recommender
- Solution: weight user vectors by the user’s importance score
- Ekstrand et al., 2010. Automatically Building Research Reading Lists.
The importance score can follow algorithms such as PageRank or others.
Restructuring Item-Item CBF
- Basic item-item algorithm structure doesn’t care how similarity is computed
- So why not use content-based similarity?
- The resulting algorithm really isn’t a collaborative filter, but it can work pretty well!
- Example: using Lucene to compare documents as the neighbourhood & similarity function
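A sketch of swapping in content-based similarity, using scikit-learn TF-IDF in place of Lucene (my substitution for illustration; the resulting matrix can drop straight into the scoring step above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_similarities(item_texts):
    """item_texts: one textual description per item, in item order."""
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(item_texts)       # |I| x vocab, sparse
    return cosine_similarity(X)               # |I| x |I|, replaces W
```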
Deriving Weights
- Item-Item compares individual item pairs
- Alternative approach: infer coefficients from data
- Find coefficients that minimize squared error
- Learn coefficients with standard machine learning / optimization algorithm (gradient descent)
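A minimal sketch of that idea with plain numpy gradient descent: learn an item-item weight matrix that reconstructs each observed rating from the user's other ratings, with an L2 penalty (the learning rate and iteration count are placeholders and data-dependent):

```python
import numpy as np

def learn_weights(R, n_iters=200, lr=1e-3, reg=0.1):
    """R: |U| x |I| matrix with 0 for missing ratings."""
    n_items = R.shape[1]
    W = np.zeros((n_items, n_items))
    mask = (R != 0).astype(float)          # only observed ratings contribute
    for _ in range(n_iters):
        err = (R @ W - R) * mask           # prediction error on observed cells
        grad = R.T @ err + reg * W         # gradient of squared error + L2
        W -= lr * grad
        np.fill_diagonal(W, 0.0)           # an item may not predict itself
    return W
```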
Real-World Experiences
- Look at which entity we have more data on - if the set of users is much larger than the set of items, then the user data will be much more sparse than the item data, and vice versa. Choose algorithm to avoid sparse data.
- Extends simple association rules - instead of just looking at one product and seeing what else is associated with it, this allows looking at a collection of products and suggesting complements.
- Conservative recommendations might be too obvious and not provide serendipity
- It’s fast and stable
- Predictions can be re-scaled. OrdRec is an approach to map a ranked list onto an arbitrary prediction distribution.
- Matrix factorization techniques present an interesting (and different) point in this design space