The Wisdom of the Few: A Collaborative Filtering Approach Based on Expert Opinions from the Web

Xavier Amatriain, Josep M. Pujol, Nuria Oliver (Telefonica Research, Spain),

Neal Lathia (University College London, UK), Haewoon Kwak (KAIST, Korea)

Session: Recommenders, SIGIR’09, July 19-23, 2009

Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea

2010-02-10, SungEun Park

Copyright 2008 by CEBT

Brief Summary

Collaborative Filtering Approach

The wisdom of the crowd?

Use expert opinions as an external data source for CF.

Neighbor CF vs. Expert CF

Excellent Point

Experiment and analysis on the dataset and the Expert-CF method.

[Figure: diagram of users and three experts (Expert1, Expert2, Expert3)]


Contents

Goal

Dataset Analysis

Expert Nearest-neighbors Approach

Experiment

Mean Average Error

Top-N Recommendation Precision

User Study

Discussion


Goal

The goal is not to increase CF accuracy,

but rather to ask:

How can the preferences of a large population be predicted using a very small set of users?

Understand the potential of an independent and uncorrelated data set to generate recommendations.

Analyze whether professional raters are good predictors for general users.

Discuss how this approach addresses some of the traditional pitfalls in CF.


Dataset

Expert data set: Rottentomatoes.com

Aggregates the opinions of movie critics from various media sources.

169 of 1,750 experts are kept (threshold: number of ratings > 250)

User data set: Netflix.com movie reviews

8,000 out of 17,770 movies

– Movies matched to those in Rotten Tomatoes

– 20% of the movies have one rating.


Dataset Analysis (1)

Number of Ratings and Data Sparsity

The expert matrix is less sparse than the user matrix and more evenly distributed.

– Sparsity Coefficient : Users 0.01 vs. Expert 0.07

– A single expert has more ratings than a single user.
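As a rough illustration of the sparsity coefficient quoted above, the sketch below assumes it is the fraction of cells in the rating matrix that actually hold a rating (so a larger value means a less sparse matrix, consistent with experts at 0.07 versus users at 0.01). The function name and the toy ratings are hypothetical.

```python
def sparsity_coefficient(ratings, n_raters, n_items):
    """Fraction of (rater, item) cells that hold a rating (assumed definition)."""
    filled = sum(len(items) for items in ratings.values())
    return filled / (n_raters * n_items)


# Toy data (made up): 3 raters, 4 items, 3 ratings in total.
ratings = {"u1": {"m1": 4, "m2": 3}, "u2": {"m1": 5}, "u3": {}}
coef = sparsity_coefficient(ratings, n_raters=3, n_items=4)
print(coef)  # 3 filled cells out of 12 -> 0.25
```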

[Figure: axis labels "Ratings Count" and "User"]

CDF (cumulative distribution function): defined as the cumulative integral of the probability density function (p.d.f.) P[u] of a random variable u
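For discrete counts such as the number of ratings per user, the integral in the CDF definition becomes a running fraction. A minimal sketch of an empirical CDF, with made-up counts and a hypothetical helper name:

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function x -> fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Made-up ratings counts for five users.
ratings_per_user = [1, 3, 3, 10, 250]
cdf = empirical_cdf(ratings_per_user)
print(cdf(3))  # 3 of 5 users have at most 3 ratings -> 0.6
```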


Dataset Analysis (2)

Average Rating Distribution

Experts tend to act similarly.

Their overall opinions on the movies are more varied.

Average rating (panel label: User)

                  Users dataset   Experts dataset
Range < 10%       0.45            0.4
Average Rating    0.55            0.6
Range > 10%       0.7             0.8

Average rating (panel labels: Ratings, Movie)

                  Users dataset   Experts dataset
Range < 10%       Under 0.55      0.4
Average Rating    0.7             0.6
Range > 10%       Above 0.7       0.8


Dataset Analysis (3)

Rating Standard Deviation

Experts tend to agree more than regular users

– Lower standard deviation

Experts tend to deviate less from their personal average rating

Wider curve
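The two standard deviations discussed above (spread of ratings per movie, and spread of each rater around their own average) can be computed the same way by grouping on a different key. The sketch below uses made-up ratings on a 0 to 1 scale; the tuple layout and the helper name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import pstdev


def std_by(ratings, key_index):
    """Group (rater, movie, rating) tuples by one field and return the
    population standard deviation of the ratings in each group."""
    groups = defaultdict(list)
    for row in ratings:
        groups[row[key_index]].append(row[2])
    return {key: pstdev(values) for key, values in groups.items()}


# Made-up ratings: (rater, movie, rating).
ratings = [
    ("a", "m1", 0.8),
    ("b", "m1", 0.6),
    ("a", "m2", 0.4),
    ("b", "m2", 0.4),
]
per_movie = std_by(ratings, 1)  # agreement between raters on each movie
per_rater = std_by(ratings, 0)  # deviation of each rater from own average
print(per_movie["m2"])  # both raters gave 0.4 -> 0.0
```

A lower per-movie value means the raters agree more on that movie; a lower per-rater value means that rater stays closer to their personal average.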


Expert Nearest-neighbors Approach

Similarity between each user and the experts set.

Similarity

– a,b : users

– Na, Nb : the number of items rated by each user

– Na∪b : the number of co-rated items

Look only for the experts whose similarity to the given user is greater than or equal to δ.

– Risk: finding very few neighbors

Confidence threshold τ: the minimum number of expert neighbors who must have rated the item in order to trust their prediction.

Adjusting factor: takes into account the number of items co-rated by both users
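The similarity formula itself did not survive on this slide. The sketch below is one plausible reading of the symbols listed above, not a confirmed transcription: it assumes a cosine similarity over the co-rated items, multiplied by an adjusting factor 2·Na∪b / (Na + Nb). The function name and the toy ratings are hypothetical.

```python
from math import sqrt


def similarity(ratings_a, ratings_b):
    """Cosine similarity over co-rated items, scaled by an overlap factor.

    ASSUMPTION: both the cosine form and the factor 2*N_ab / (N_a + N_b)
    are reconstructed from the symbol list, not copied from the slide.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = dot / (norm_a * norm_b)
    # Penalize pairs that share only a small part of their rated items.
    adjust = 2 * len(common) / (len(ratings_a) + len(ratings_b))
    return cosine * adjust


user = {"m1": 4, "m2": 2}
expert = {"m1": 4, "m2": 2, "m3": 5, "m4": 1}
sim = similarity(user, expert)  # cosine is 1, overlap factor is 4 / 6
```

With this factor, two raters who agree perfectly on a handful of shared items still get a reduced score when most of their ratings do not overlap.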


Expert Nearest-neighbors Approach

Given a similarity measure sim: V × V → R, we define a set of experts E = {e1, ..., ek} ⊆ V and a set of users U = {u1, ..., uN} ⊆ V. Given a particular user u ∈ U and a value δ, we find the set of experts E′ ⊆ E such that: ∀e ∈ E′ ⇒ sim(u, e) ≥ δ.

For an item i, we take E′′ ⊆ E′ such that ∀e ∈ E′′ ⇒ rei ≠ ◦, where rei is the rating of item i by expert e ∈ E′, and ◦ is the value of the unrated item.

E′′ = {e1, ..., en}

– No prediction when n < τ

– Predicted rating computed when n ≥ τ

Predicted rating
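The prediction formula is missing from the slide, so the following is only a sketch of how the δ and τ steps could fit together. It assumes the predicted rating is the user's mean plus a similarity-weighted average of each expert's deviation from their own mean; that weighting scheme, the function name, and the data layout are assumptions, not the slide's formula.

```python
def predict(user_mean, experts, item, delta, tau):
    """Predict a rating for `item`, or return None when fewer than tau
    sufficiently similar experts have rated it.

    experts: list of (similarity_to_user, expert_mean, expert_ratings).
    ASSUMPTION: mean-centered, similarity-weighted average.
    """
    # E'': experts with sim >= delta that have rated the item.
    neighbors = [
        (sim, mean, ratings[item])
        for sim, mean, ratings in experts
        if sim >= delta and item in ratings
    ]
    if len(neighbors) < tau:
        return None
    total_sim = sum(sim for sim, _, _ in neighbors)
    if total_sim == 0:
        return None
    offset = sum(sim * (r - mean) for sim, mean, r in neighbors) / total_sim
    return user_mean + offset


# Made-up experts; the third falls below delta and is ignored.
experts = [
    (0.5, 3.0, {"m1": 4.0}),
    (0.5, 2.0, {"m1": 4.0}),
    (0.001, 3.0, {"m1": 1.0}),
]
print(predict(3.0, experts, "m1", delta=0.01, tau=2))  # 4.5
print(predict(3.0, experts, "m1", delta=0.01, tau=3))  # None
```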


Experiment

Mean Average Error and Coverage

Top-N Recommendation Precision

User Study


Experiment (1): Mean Average Error and Coverage

Interplay between τ and δ

τ = 10 and δ = 0.01



5-fold cross-validation: the user data set is split (by random sampling) into 80% training and 20% testing sets

τ = 10 and δ = 0.01

Worst-case baseline: “critics’ choice” recommendation

Expert-CF gives a significant accuracy improvement over using the experts’ average.

Compared with NN-CF, Expert-CF shows 0.08 higher MAE but 6% higher coverage
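MAE and coverage trade off because a method may decline to predict (for example when fewer than τ experts rated the item). A minimal sketch of how both can be computed on a test set, assuming a missing prediction is represented as None; the function name and the data are made up.

```python
def mae_and_coverage(predictions, actual):
    """predictions: {(user, item): value or None}
    actual: {(user, item): true rating} for the test set.
    Returns (MAE over predicted pairs, fraction of pairs predicted)."""
    errors = [
        abs(predictions[key] - actual[key])
        for key in actual
        if predictions.get(key) is not None
    ]
    coverage = len(errors) / len(actual)
    mae = sum(errors) / len(errors) if errors else None
    return mae, coverage


actual = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5, ("u2", "m3"): 3}
predictions = {
    ("u1", "m1"): 3.5,
    ("u1", "m2"): 2.5,
    ("u2", "m1"): 4.5,
    ("u2", "m3"): None,  # no prediction: counts against coverage only
}
mae, coverage = mae_and_coverage(predictions, actual)
print(mae, coverage)  # 0.5 0.75
```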


Expert-CF is better for users with MAE > 1.0

NN-CF is better for users with small MAE (less than 0.5)

Minority population around 10%

Expert-CF and NN-CF are similar in the rest of the range.


[Figure: regions labeled "Expert-CF is better" and "NN-CF is better"]


Experiment (2): Top-N Recommendation Precision

Classify items as recommendable or not recommendable given a threshold σ.

1. For a given user, compute all predictions and present those greater than or equal to σ to the user.

2. For each predicted item that is present in the user’s test set, check whether it is a true positive (actual user rating greater than or equal to σ) or a false positive (actual user rating less than σ).

3. Compute the precision of the classifications using the classical definition of this measure.

If no item in the test set is worth recommending to a given target user, simply return an empty list.
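The three steps above can be sketched as follows. Precision is taken in its classical form, true positives divided by true positives plus false positives; the function name and the ratings are hypothetical.

```python
def precision_at_threshold(predicted, test_ratings, sigma):
    """predicted: {item: predicted rating}; test_ratings: {item: actual rating}.
    Returns precision over recommended items found in the test set,
    or None when no recommended item can be judged."""
    # Step 1: recommend every item whose prediction reaches sigma.
    recommended = [item for item, p in predicted.items() if p >= sigma]
    # Step 2: only items present in the user's test set can be judged.
    judged = [item for item in recommended if item in test_ratings]
    if not judged:
        return None
    true_positives = sum(1 for item in judged if test_ratings[item] >= sigma)
    # Step 3: precision = TP / (TP + FP).
    return true_positives / len(judged)


predicted = {"m1": 4.5, "m2": 4.1, "m3": 2.0, "m4": 4.8}
test_ratings = {"m1": 5, "m2": 3, "m3": 4}
print(precision_at_threshold(predicted, test_ratings, sigma=4))  # 0.5
```

Here m1 is a true positive, m2 a false positive, m3 is never recommended, and m4 cannot be judged because it is not in the test set.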


Experiment (2): Top-N Recommendation Precision

For σ = 4, NN-CF clearly outperforms expert-CF.

For σ = 3, the precision in both methods is similar.

For users willing to accept recommendations for any above-average (3) item, the expert-based method appears to behave as well as a standard NN-CF.

[Figure: plot with axis label σ]


Experiment (3): User Study

57 participants were asked to rate 100 preselected movies.

1. Random List: A random sequence of movie titles

2. Critics’ choice: The movies with the highest mean rating given by the experts.

3. Neighbor-CF: Based on the Netflix users most similar to each survey respondent.

4. Expert-CF: Similar to (3), but using the expert dataset instead of the Netflix ratings.

The recommendation lists are generated from limited user feedback: the average number of ratings was 14.5 per participant, i.e. a cold-start condition.



The expert-CF approach is the only method that obtains an average rating higher than 3.

The only approaches qualified as very good, critics’ choice and expert-CF, are both expert based.

[Figure: Overall quality: average response on the overall quality of the recommendation lists]



The ratings for the statements “the list contains movies I think I would like” and “the list contains movies I think I would not like”.

An important aspect of the evaluation of a recommender system is how often the user is disappointed by the results.

Recommending wrong items undermines the user’s assessment of the system and compromises its usability.

The expert-CF approach generates the least negative response when compared to the other methods.



An analysis of variance tests whether the differences between the four recommendation lists are statistically significant.

The null hypothesis: the average user evaluation for the four different lists is the same.

– P-values smaller than 0.01: rejection of the null hypothesis

– The differences in user satisfaction among the three baseline methods are not statistically significant

– The differences in user satisfaction between Expert-CF and the baselines are statistically significant

P-values (lists: Random, kNN CF, Critics’ Choice, Expert-CF): 0.000051, 0.42

P-values, Expert-CF against each baseline:

             Random      kNN CF   Critics’ Choice
Expert-CF    0.0000214   0.0024   0.008
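For readers unfamiliar with the test, a one-way analysis of variance compares the variation between group means with the variation inside each group. The sketch below computes only the F statistic with the standard library, on made-up scores that are not the study's data; turning F into a p-value requires the F distribution (for instance, `scipy.stats.f_oneway` returns both).

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up satisfaction scores for three hypothetical recommendation lists.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(scores)  # between: 26 / 2, within: 6 / 6
```

A large F means the lists differ more from each other than respondents differ within a list, which is what a small p-value reflects.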


Discussion

Using a limited set of external experts to generate the predictions addresses:

1. Data Sparsity

– domain experts are more likely to have rated a large percentage of the items

2. Noise and Malicious Ratings

– reducing noise : Experts are expected to be more consistent and conscious with their ratings

3. Cold Start Problem

– Motivated expert users typically rate a new item entering the collection as soon as they know of its existence and therefore minimize item cold-start.


Discussion

4. Scalability

– Computing the similarity matrix: O(N²M)

for N users in an M-item collection

– The expert set is much smaller than the user set (169 experts vs. 500,000 potential neighbors)

5. Privacy

– Only needs the target users’ profile and the current expert ratings.
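To make the scalability point concrete, the arithmetic below counts similarity computations only, using the two sizes quoted on the slide. It assumes every target user is compared against every candidate neighbor and leaves out the per-pair cost in M, which is the same factor in both cases; the helper name is hypothetical.

```python
def pairwise_comparisons(n_targets, n_candidates):
    """Number of similarity computations when each target is compared
    against every candidate neighbor."""
    return n_targets * n_candidates


n_users = 500_000
n_experts = 169

user_user = pairwise_comparisons(n_users, n_users)
user_expert = pairwise_comparisons(n_users, n_experts)
print(user_user // user_expert)  # roughly 2958 times fewer comparisons
```

The privacy point follows from the same structure: each user's similarities depend only on that user's profile and the expert ratings, not on other users' profiles.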


Conclusion

A reduced set of expert ratings can predict the ratings of a large population.

The method’s performance is comparable to traditional CF algorithms, even when using an extremely small expert set.

The approach addresses some of the shortcomings of traditional CF: data sparsity, scalability, noise in user feedback, privacy and the cold-start problem.

Q&A

Thank you…
