Collaborative Filtering - IICF - Week 3

Item-Item Collaborative Filtering (IICF).

UUCF often suffers from sparsity on larger datasets. There are strategies that try to address this, like dimensionality reduction, but it remains a challenge, and it also led to the idea of looking at item-to-item filtering.

This can also help with computing performance, as UUCF computations have high complexity and need to be recomputed frequently (especially for new users in order to give them useful recommendations).

Item-Item Collaborative Filtering

The similarity between different items is usually fairly stable when you have more users than items, in which case the average item will have many more ratings than the average user. As a rule of thumb, though, if the system has far more items than users, then IICF will likely not work as well.

Similarly, items themselves generally don’t change rapidly in ways that would cause their ratings to shift, unless they are somehow time-bound or some special event causes people to reevaluate them.

Two Step Process

  • Compute similarity between pairs of items
    • Correlation between rating vectors
      • Only for co-rated cases, only useful for multi-level ratings.
    • Cosine of item rating vectors
      • Works both with multi-level and unary ratings
      • Or adjusted ratings (normalized before computing cosine)
  • Predict user-item rating
    • Weighted sum of rated “item-neighbors”
    • Linear regression to estimate rating
    • …or others

Item-Item Top-N

  • Simplify model by limiting items to small neighbourhoods of k most-similar items (e.g. 20)
  • For a profile set of items, compute, merge, and sort the k most similar items for each profile item (sketched below)
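
A small sketch of that compute/merge/sort step, assuming a precomputed item-item similarity matrix W as a numpy array (the function and parameter names here are illustrative, not from the course):

import numpy as np

def top_n(W, profile_items, n=10, k=20):
    # Recommend n items for a profile (a set of item indices).
    scores = {}
    for i in profile_items:
        neighbors = np.argsort(-W[i])[:k]                 # k most similar items to i (e.g. 20)
        for j in neighbors:
            if j not in profile_items and W[i, j] > 0:
                scores[j] = scores.get(j, 0.0) + W[i, j]  # merge by accumulating similarity
    return sorted(scores, key=scores.get, reverse=True)[:n]  # sort, keep the top n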

Benefits of Item-Item

  • Works well
    • Good prediction accuracy
    • Good performance on top-N predictions
  • Efficient implementation
    • At least in cases where there are more users than items
    • Benefits of precomputability
  • Broad applicability and flexibility
    • As easy to apply to a shopping cart as to a user profile

Core Assumptions and Limitations

  • Item-item relationships need to be stable
    • Based on stable user preferences
    • Often these are temporal issues
      • Christmas trees are only of interest during Christmas
      • A calendar for 2020 will not be of interest in 2021
  • Main limitation/complaint: low serendipity
    • Recommends items very similar to what you’ve already expressed interest in

Algorithm

  • Precompute item similarities
  • Look for items similar to items user already expressed interest in

Basic formula:
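
A sketch of its likely form, given the weighted sum of rated item-neighbors described above, with $I_u$ the set of items the user has rated, $r_{uj}$ the user's rating of item $j$, and $w_{ij}$ the item-item similarity:

$$s(i, u) = \frac{\sum_{j \in I_u} w_{ij}\, r_{uj}}{\sum_{j \in I_u} \lvert w_{ij} \rvert}$$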

Computing Similarity

We want the cosine similarity over normalized vectors. This means the dot product of the item ratings over the product of the vector lengths.
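
In symbols, with $\mathbf{r}_i$ and $\mathbf{r}_j$ the rating vectors of items $i$ and $j$:

$$w_{ij} = \cos(\mathbf{r}_i, \mathbf{r}_j) = \frac{\mathbf{r}_i \cdot \mathbf{r}_j}{\lVert \mathbf{r}_i \rVert\, \lVert \mathbf{r}_j \rVert}$$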

Expanded and centered (removing the item mean from each rating term), we get the Pearson correlation formula.

…with the product of the standard deviations of the item rating vectors in the denominator, and so on.
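
A sketch of that centered form, with $\bar{r}_i$ the mean rating of item $i$; restricted to the users who rated both items, this is the Pearson correlation:

$$w_{ij} = \frac{\sum_{u} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u} (r_{ui} - \bar{r}_i)^2}\ \sqrt{\sum_{u} (r_{uj} - \bar{r}_j)^2}}$$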

Neighbourhood

  • For N(i, u), find k most similar items to i that u has rated

So with the mean ratings subtracted, and using the neighbourhood, the scoring formula becomes:
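
A sketch, assuming the item-mean-centered weighted average over the neighbourhood:

$$s(i, u) = \bar{r}_i + \frac{\sum_{j \in N(i,u)} w_{ij}\,(r_{uj} - \bar{r}_j)}{\sum_{j \in N(i,u)} \lvert w_{ij} \rvert}$$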

Note: In the actual assignment, it’s the user’s mean rating ($\bar{r}_u$) that is used in both places in this formula. Looking back at the original paper, it doesn’t cover normalizing the output score to the user’s scale, which leaves us in the dark as to which one is actually preferred.

Components of algorithm

  • Item similarity function (interpolation weight w_ij)
  • Model builder
  • Neighbourhood selection strategy
  • Item score aggregation function

Item Similarities

  • Usually cosine similarity between item rating vectors
  • Normalize user ratings
    • Subtract item mean (historically, user mean)
    • Treat missing values as 0
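
A minimal numpy sketch of this step, assuming a dense user-by-item matrix R with NaN for missing ratings (all names are illustrative):

import numpy as np

def item_similarities(R):
    # Cosine similarity between item rating vectors, after subtracting each
    # item's mean rating and treating missing values as 0.
    rated = ~np.isnan(R)
    item_means = np.nanmean(R, axis=0)                  # mean rating per item
    centered = np.where(rated, R - item_means, 0.0)     # normalize; missing -> 0
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0] = 1.0                             # guard against unrated items
    unit = centered / norms
    return unit.T @ unit                                # (num_items x num_items) matrix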

Score Items

  • Score items S(i, u)
  • Average normalized ratings, denormalize when done
  • Can consider other methods like linear regression
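
Continuing the same sketch, S(i, u) as an average of normalized ratings over the k most similar rated items, denormalized at the end:

def score_item(R, W, item_means, u, i, k=20):
    rated = ~np.isnan(R[u])
    rated[i] = False                                    # exclude the target item itself
    candidates = np.where(rated)[0]
    neighbors = candidates[np.argsort(-W[i, candidates])][:k]
    neighbors = neighbors[W[i, neighbors] > 0]          # keep positive similarities only
    if neighbors.size == 0:
        return item_means[i]                            # fall back to the item mean
    sims = W[i, neighbors]
    centered = R[u, neighbors] - item_means[neighbors]
    return item_means[i] + sims @ centered / np.abs(sims).sum()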

Picking Neighbors

  • Too small k gives inaccurate scores
  • Too large k gives too much noise
  • k = 20 often works well

Building the model

  • Pre-compute similarities for all pairs of items
    • Since similarities are slow to change, this is feasible
  • Naively:
    • If the similarity function is symmetric (sim(i,j) = sim(j,i)), you only need to compute one direction for each pair
    • Can skip pairs of items if no user has rated both of them
  • Truncating the Model
    • Don’t need to keep the whole |I|² model
    • Need enough neighbors to find neighbors at score time
      • Since the user hasn’t rated everything, keep more than k neighbors per item in the model
    • Can drop nonpositive neighbors
    • Balance memory use with accuracy and coverage
      • Mild runtime improvements as well

Pseudo Algorithm

Initialize W to empty weight matrix
For i in I
  for j in I so that i != j
    if sim(i,j) > 0
      w_ij = sim(i,j)
  clear all but M largest values of row w_i

The optimizations of skipping sparse pairs and exploiting symmetry are not included above.
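
A runnable version of the same loop in numpy, assuming a full similarity matrix such as the one returned by item_similarities above (M is illustrative):

def build_model(W_full, M=50):
    # Keep only the M largest positive similarities in each row.
    W = np.zeros_like(W_full)
    for i in range(W_full.shape[0]):
        row = W_full[i].copy()
        row[i] = 0.0                      # an item is not its own neighbor
        row[row < 0] = 0.0                # drop nonpositive neighbors
        keep = np.argsort(-row)[:M]       # M largest values of row w_i
        W[i, keep] = row[keep]
    return W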

Item-item with Unary Data

A few tweaks are needed for unary data (data that only describes whether a user has consumed an item or not), such as implicit signals.

Rating Interpretation

  • Rating values: user-item rating matrix
  • Need some matrix to represent data
    • Logical (1/0) user-item matrix
    • Purchase count matrix
  • Problem: what does a 0 mean for unary data?
    • It is unclear whether null and 0 differ
    • We just ignore that for item-item

Normalization

  • Standard mean-centering is not meaningful
    • Can normalize vectors to unit vectors instead
      • Intuition: users who like many items provide less information about any particular item pair
    • Could also consider taking the log of counts to decrease the impact of high counts.
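
A small sketch of both options on a user-by-item purchase-count matrix C (illustrative):

import numpy as np

def normalize_counts(C, use_log=True):
    # Optionally dampen high counts, then scale each user vector to unit length.
    X = np.log1p(C) if use_log else C.astype(float)
    lengths = np.linalg.norm(X, axis=1, keepdims=True)
    lengths[lengths == 0] = 1.0           # users with no history
    return X / lengths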

Similarity

  • Cosine Similarity still works
    • Conditional probability is an alternative, which makes it similar to association rules. This breaks the symmetry of the function, since in general P(i | j) ≠ P(j | i).

Aggregating scores

  • Weighted average works for non-binary counts
  • For binary, sum neighbour similarities
  • Neighbourhood selection unchanged (most similar)
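
For the binary case, a sketch of the aggregation, assuming the truncated model W from above and a 0/1 vector of what the user has consumed:

def score_unary(W, consumed, i, k=20):
    # Sum the similarities of the k most similar consumed items.
    candidates = np.where(consumed > 0)[0]
    candidates = candidates[candidates != i]
    neighbors = candidates[np.argsort(-W[i, candidates])][:k]
    return W[i, neighbors].sum()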

Hybrids and Extensions

Item-item lends itself well to extension.

  • Simple parts with well-defined interfaces provide flexibility
  • Easy to understand what extensions do

Trustworthiness or Authority/Importance

  • Goal: incorporate trustworthiness into item relatedness computation
    • User’s global reputation, not per-user trust
  • Solution: weight users by trust before computing item similarities
  • High-trust users have more impact
  • Massa and Avesani 2004, Trust-aware collaborative filtering for recommender systems.

Given a trust factor for user u:
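
One way this could look, sketched as an assumption rather than the exact formula from the slides or the paper: weight each user's contribution by a trust factor $t_u$ inside the centered cosine, so that high-trust users move the similarity more:

$$w_{ij} = \frac{\sum_{u} t_u\,(r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u} t_u\,(r_{ui} - \bar{r}_i)^2}\ \sqrt{\sum_{u} t_u\,(r_{uj} - \bar{r}_j)^2}}$$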

The same formula can also be used to incorporate user importance:

  • Goal: Incorporate ‘importance’ into recommender
  • Solution: weight user vectors by the user’s importance score
  • Ekstrand et al., 2010. Automatically Building Research Reading Lists.

The importance score can follow algorithms such as PageRank or others.

Restructuring Item-Item CBF

  • Basic item-item algorithm structure doesn’t care how similarity is computed
  • So why not use content-based similarity?
  • Resulting algorithm really isn’t a collaborative filter, but it can work pretty well!
  • Example: using Lucene to compare documents as the neighbourhood/similarity function

Deriving Weights

  • Item-Item compares individual item pairs
  • Alternative approach: infer coefficients from data (sketched below)
    • Find coefficients that minimize squared error
    • Learn coefficients with standard machine learning / optimization algorithm (gradient descent)
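
A toy sketch of that idea with plain gradient descent on the squared error over observed ratings (illustrative only; a practical version would add regularization and keep the weights sparse):

import numpy as np

def learn_weights(R, epochs=200, lr=0.01):
    # Learn item-item coefficients W so that X @ W reproduces the observed ratings.
    mask = ~np.isnan(R)
    X = np.where(mask, R, 0.0)
    W = np.zeros((R.shape[1], R.shape[1]))
    for _ in range(epochs):
        err = np.where(mask, X @ W - X, 0.0)   # error only on observed entries
        W -= lr * (X.T @ err) / mask.sum()     # gradient step on the squared error
        np.fill_diagonal(W, 0.0)               # an item may not predict itself
    return W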

Real-World Experiences

  • Look at which entity we have more data on - if the set of users is much larger than the set of items, then the user data will be much more sparse than the item data, and vice versa. Choose algorithm to avoid sparse data.
  • Extends simple association rules - instead of just looking at one product and seeing what else is associated with it, this allows looking at a collection of products and suggesting complements.
  • Conservative recommendations might be too obvious and not provide serendipity
  • It’s fast and stable
  • Predictions can be re-scaled. OrdRec is an approach to map a ranked list onto an arbitrary prediction distribution.
  • Matrix factorization techniques present an interesting (and different) point in this design space
