Recommender Systems and Collaborative Filtering
Jon Herlocker, Assistant Professor, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR (also President, MusicStrands, Inc.)


Page 1: Recommender Systems and Collaborative Filtering

1

Recommender Systems and Collaborative Filtering

Jon Herlocker
Assistant Professor
School of Electrical Engineering and Computer Science
Oregon State University, Corvallis, OR
(also President, MusicStrands, Inc.)

Page 2: Recommender Systems and Collaborative Filtering

2

Personalized Recommender Systems and Collaborative Filtering (CF)

Page 3: Recommender Systems and Collaborative Filtering

3

Outline

• The recommender system space
• Pure collaborative filtering (CF)
• CF algorithms for prediction
• Evaluation of CF algorithms
• CF in web search (if time)

Page 4: Recommender Systems and Collaborative Filtering

4

Recommender Systems

• Help people make decisions
  – Examples:
    • Where to spend attention
    • Where to spend money
• Help maintain awareness
  – Examples:
    • New products
    • New information
• In both cases: many options, limited resources

Page 5: Recommender Systems and Collaborative Filtering

5

Stereotypical Integrator of RS Has:

• Large product (item) catalog
  – With product attributes
• Large user base
  – With user attributes (age, gender, city, country, …)
• Evidence of customer preferences
  – Explicit ratings (powerful, but harder to elicit)
  – Observations of user activity (purchases, page views, emails, prints, …)

Page 6: Recommender Systems and Collaborative Filtering

6

The RS Space

(Diagram: users and items connected by observed preferences: ratings, purchases, page views, laundry lists, play lists. Item–item links are derived from similar attributes, similar content, or explicit cross-references; user–user links are derived from similar attributes or explicit connections.)

Page 7: Recommender Systems and Collaborative Filtering

7

Individual Personalization

(Same diagram: users and items with observed preferences, item–item links, and user–user links, here applied to individual personalization.)

Page 8: Recommender Systems and Collaborative Filtering

8

Classic CF

(Same diagram: users, items, and observed preferences, with item–item and user–user links.)

In the end, most models will be hybrid.

Page 9: Recommender Systems and Collaborative Filtering

9

Collaborative Filtering

(Diagram of the collaborative filtering process: your opinions on items you've experienced, combined with community opinions, produce predictions for items you haven't seen.)

Page 10: Recommender Systems and Collaborative Filtering

10

Find a Restaurant!

         Pizza     Local   El      Adinky          Cha-da
         Pipeline  Boyz    Tapio   Deli     Izzys  Thai
Jon      D         A       B       D        ?      ?
Tami     A         F       D       F
Mickey   A         A       A       A        A      A
Goofy    D         A       C
John     A         C       A       C        A
Ben      F         A       F
Nathan   D         A       A


Page 15: Recommender Systems and Collaborative Filtering

15

Find a Restaurant!

(Same table, with the first prediction filled in: Jon's predicted grade for Izzys is A.)

Page 16: Recommender Systems and Collaborative Filtering

16

Find a Restaurant!

(Same table, with both predictions filled in: Jon's row is now D A B D A F, i.e., Izzys = A and Cha-da Thai = F.)
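The intuition in this example, finding the community member whose tastes best match Jon's, can be sketched as a tiny nearest-neighbor computation. Grades are mapped to numbers; note that the column assignment for Goofy's three grades is a hypothetical reading of the slide, not something the table pins down.

```python
# A toy sketch of the nearest-neighbor intuition behind the restaurant
# example. GRADES maps letter grades to numbers; distance() compares two
# users only on the restaurants both have rated.
GRADES = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

ratings = {  # user -> {restaurant: letter grade} (Goofy's columns assumed)
    "Jon":    {"Pizza Pipeline": "D", "Local Boyz": "A",
               "El Tapio": "B", "Adinky Deli": "D"},
    "Mickey": {"Pizza Pipeline": "A", "Local Boyz": "A", "El Tapio": "A",
               "Adinky Deli": "A", "Izzys": "A", "Cha-da Thai": "A"},
    "Goofy":  {"Pizza Pipeline": "D", "Local Boyz": "A", "El Tapio": "C"},
}

def distance(u, v):
    """Mean absolute grade difference over the restaurants both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return float("inf")
    return sum(abs(GRADES[ratings[u][r]] - GRADES[ratings[v][r]])
               for r in common) / len(common)

# Goofy agrees with Jon far more closely than Mickey does, so Goofy's
# opinions would drive Jon's predictions.
neighbor = min((u for u in ratings if u != "Jon"),
               key=lambda u: distance("Jon", u))
```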

Page 17: Recommender Systems and Collaborative Filtering

17

Advantages of Pure CF

• No expensive and error-prone user attributes or item attributes
• Incorporates quality and taste
• Works on any rate-able item
• One data model => many content domains
• Serendipity
• Users understand it!

Page 18: Recommender Systems and Collaborative Filtering

18

Predictive Algorithms for Collaborative Filtering

Page 19: Recommender Systems and Collaborative Filtering

19

Predictive Algorithms for Collaborative Filtering

• Frequently proposed taxonomy for collaborative filtering systems
  – Model-based methods
    • Build a model offline
    • Use model to generate recommendations
    • Original data not needed at predict-time
  – Instance-based methods
    • Use the ratings directly to generate recommendations

Page 20: Recommender Systems and Collaborative Filtering

20

Model-Based Algorithms

• Probabilistic Bayesian approaches, clustering, PCA, SVD, etc.
• Key ideas
  – Reduced-dimension representations (aggregations) of original data
  – Ability to reconstruct an approximation of the original data

Page 21: Recommender Systems and Collaborative Filtering

21

Stereotypical Model-Based Approaches

• Lower dimensionality => faster performance
• Can explain recommendations
• Can over-generalize
• Not using the latest data
• Force a choice of aggregation dimensions ahead of time

Page 22: Recommender Systems and Collaborative Filtering

22

Instance-Based Methods

• Primarily nearest neighbor approaches
• Key ideas
  – Predict over raw ratings data (sometimes called memory-based methods)
  – Highly personalized recommendations

Page 23: Recommender Systems and Collaborative Filtering

23

Stereotypical Instance-Based Approaches

• Use most up-to-date ratings
• Are simple and easy to explain to users
• Are unstable when there are few ratings
• Have linear (w.r.t. users and items) run-times
• Allow a different aggregation method for each user, possibly chosen at runtime
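The nearest-neighbor prediction these slides describe is often implemented as a correlation-weighted average in the style popularized by the early GroupLens work: weight each neighbor by Pearson correlation over co-rated items, then average the neighbors' mean-offset ratings. A minimal sketch with hypothetical data (this is one common variant, not necessarily the exact algorithm in the talk):

```python
# Sketch of a classic correlation-weighted nearest-neighbor CF predictor.

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def predict(active, item, ratings):
    """Predict `active` user's rating for `item` from correlated neighbors."""
    r_a = ratings[active]
    mean_a = sum(r_a.values()) / len(r_a)
    num = den = 0.0
    for user, r_u in ratings.items():
        if user == active or item not in r_u:
            continue
        common = [i for i in r_a if i in r_u]
        if len(common) < 2:
            continue                              # too little overlap
        w = pearson([r_a[i] for i in common], [r_u[i] for i in common])
        mean_u = sum(r_u.values()) / len(r_u)
        num += w * (r_u[item] - mean_u)           # weighted mean-offset rating
        den += abs(w)
    return (mean_a + num / den) if den else mean_a

ratings = {
    "a": {"i1": 4, "i2": 5, "i3": 2},
    "b": {"i1": 4, "i2": 5, "i3": 2, "i4": 5},    # agrees with "a"
    "c": {"i1": 2, "i2": 1, "i3": 5, "i4": 1},    # disagrees with "a"
}
p = predict("a", "i4", ratings)   # pulled above a's mean of ~3.67
```

Because this predicts directly from the raw ratings at request time, it has exactly the instance-based trade-offs listed above: always current, but linear-time per prediction and unstable with few co-rated items.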

Page 24: Recommender Systems and Collaborative Filtering

24

Evaluating CF Recommender Systems

Page 25: Recommender Systems and Collaborative Filtering

25

Evaluation – User Tasks

• Evaluation depends on the user task
• Most common tasks
  – Annotation in context
    • Predict ratings for individual items
  – Find good items
    • Produce top-N recommendations
• Other possible tasks
  – Find all good items
  – Recommend sequence
  – Many others…

Page 26: Recommender Systems and Collaborative Filtering

26

Novelty and Trust - Confidence

• Tradeoff
  – High-confidence recommendations
    • Recommendations are obvious
    • Low utility for user
    • However, they build trust
  – Recommendations with high prediction yet lower confidence
    • Higher variability of error
    • Higher novelty => higher utility for user

Page 27: Recommender Systems and Collaborative Filtering

27

(Diagram: the community's ratings data is split into a training set and a test set, with the test set drawn from designated test users.)
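The split in the diagram can be sketched as holding out a fraction of each user's ratings for testing and training on the rest. The 20% holdout fraction and the synthetic ratings below are illustrative assumptions:

```python
# Sketch of a per-user train/test split for CF evaluation.
import random

def split_ratings(ratings, holdout_frac=0.2, seed=0):
    rng = random.Random(seed)          # fixed seed => reproducible split
    train, test = {}, {}
    for user, items in ratings.items():
        keys = list(items)
        rng.shuffle(keys)
        k = max(1, int(len(keys) * holdout_frac))
        test[user] = {i: items[i] for i in keys[:k]}     # held out
        train[user] = {i: items[i] for i in keys[k:]}    # visible to algorithm
    return train, test

# Synthetic data: 3 users, 10 rated items each, ratings on a 1-5 scale.
ratings = {f"u{n}": {f"i{m}": (n + m) % 5 + 1 for m in range(10)}
           for n in range(3)}
train, test = split_ratings(ratings)   # 8 training / 2 test ratings per user
```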

Page 28: Recommender Systems and Collaborative Filtering

28

Predictive Accuracy Metrics

• Mean absolute error (MAE)

  E = (1/N) Σ_{i=1..N} |p_i − r_i|

• Most common metric
• Characteristics
  – Assumes errors at all levels in the ranking have equal weight
  – Sensitive to small changes
  – Good for "Annotate in Context" task
  – May not be appropriate for "Find Good Items" task
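As a function, MAE is just the mean absolute difference between predicted and held-out ratings (the values here are illustrative):

```python
# Mean absolute error over N (prediction, actual rating) pairs.
def mae(predicted, actual):
    assert len(predicted) == len(actual) > 0
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

err = mae([4, 3, 5], [4, 1, 4])   # errors of 0, 2, and 1 over 3 items
```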

Page 29: Recommender Systems and Collaborative Filtering

29

Classification Accuracy Metrics

• Precision/Recall
  – Precision: ratio of "good" items recommended to number of items recommended
  – Recall: ratio of "good" items recommended to the total number of "good" items
• Characteristics
  – Simple, easy to understand
  – Binary classification of "goodness"
  – Appropriate for "Find Good Items"
  – Can be dangerous due to lack of ratings for recommended items
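A minimal sketch of the two ratios for a top-N list; the item names and the "good" set are illustrative:

```python
# Precision and recall under a binary notion of "good" items.
def precision_recall(recommended, good):
    hits = len(set(recommended) & set(good))
    return hits / len(recommended), hits / len(good)

prec, rec = precision_recall(["a", "b", "c", "d"], {"b", "d", "e"})
# 2 of the 4 recommendations are good; 2 of the 3 good items were found
```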

Page 30: Recommender Systems and Collaborative Filtering

30

ROC Curves

• "Relative Operating Characteristic" or "Receiver Operating Characteristic"
• Characteristics
  – Binary classification
  – Not a single-number metric
  – Covers performance of system at all points in the recommendation list
  – More complex

Page 31: Recommender Systems and Collaborative Filtering

31

Figure 1. A possible representation of the density functions for relevant and irrelevant items. (Plot: probability vs. predicted level of relevance, showing overlapping non-relevant and relevant distributions and a filter cutoff.)
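One common way to trace an ROC curve for a ranked recommendation list is to sweep the filter cutoff from Figure 1 down the ranking, recording the false-positive and true-positive rates at each step. A sketch with illustrative scores and relevance labels:

```python
# Trace ROC points for a list of (predicted_score, is_relevant) pairs.
def roc_points(scored):
    ranked = sorted(scored, key=lambda t: -t[0])   # best predictions first
    pos = sum(1 for _, rel in ranked if rel)
    neg = len(ranked) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, rel in ranked:
        if rel:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))        # (FPR, TPR) at this cutoff
    return points

pts = roc_points([(0.9, True), (0.8, False), (0.7, True), (0.3, False)])
```

This is why ROC is not a single-number metric: it describes system behavior at every cutoff, not just one.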

Page 32: Recommender Systems and Collaborative Filtering

32

Page 33: Recommender Systems and Collaborative Filtering

33

Prediction-to-Rating Correlation Metrics

• Pearson, Spearman, Kendall
• Characteristics
  – Compare non-binary ranking to non-binary ranking
  – Rank correlation metrics suffer from "weak orderings"
  – Can only be computed on rated items
  – Provide a single score
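Spearman's rank correlation, one of the metrics named above, can be sketched as Pearson computed over ranks, with tied items sharing an average rank. Ties are exactly the "weak ordering" case: many tied predictions blur the ranking the metric sees. Data here is illustrative:

```python
# Spearman rank correlation: rank both lists (average rank for ties),
# then take the Pearson correlation of the ranks.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                       # extend over the tie group
        avg = (i + j) / 2 + 1            # average 1-based rank for the group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0
```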

Page 34: Recommender Systems and Collaborative Filtering

34

Half-life Utility Metric

  R_a = Σ_j max(r_{a,j} − d, 0) / 2^{(j−1)/(α−1)}

  (j indexes positions in the ranked list, r_{a,j} is user a's rating of the item at position j, d is a "default" neutral rating, and α is the half-life: the rank at which the probability of viewing drops by half)

• Characteristics
  – Explicitly incorporates idea of decreasing user utility
  – Tuning parameters reduce comparability
  – Weak orderings can result in different utilities for the same system ranking
  – All items rated less than the max contribute equally
  – Only metric to really consider non-uniform utility
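Read off the half-life utility formula, the metric can be sketched directly; the ratings and the choices d=3 and α=5 below are illustrative:

```python
# Half-life utility: each hit's value decays by half every (alpha - 1)
# positions down the ranked list; ratings at or below d contribute nothing.
def half_life_utility(ranked_ratings, d=3, alpha=5):
    """ranked_ratings[j-1]: user's rating of the item at rank j (None if unrated)."""
    total = 0.0
    for j, r in enumerate(ranked_ratings, start=1):
        if r is None:
            continue                      # unrated items contribute nothing
        total += max(r - d, 0) / 2 ** ((j - 1) / (alpha - 1))
    return total

u = half_life_utility([5, None, 4, 3])    # only ratings above d contribute
```

Note how the max(…, 0) term is what makes "all items rated less than the max contribute equally": everything at or below the default d is clipped to zero.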

Page 35: Recommender Systems and Collaborative Filtering

35

Does it Matter What Metric You Use?

• An empirical study to gain some insight…

Page 36: Recommender Systems and Collaborative Filtering

36

Analysis of 432 variations of an algorithm on a 100,000-rating movie dataset

Page 37: Recommender Systems and Collaborative Filtering

37

Comparison among results provided by all the per-user correlation metrics and the mean-average-precision-per-user metric. These metrics have strong linear relationships with each other.

Page 38: Recommender Systems and Collaborative Filtering

38

Page 39: Recommender Systems and Collaborative Filtering

39

Comparison between metrics that are averaged overall rather than per-user. Note the linear relationship between the different metrics.

Page 40: Recommender Systems and Collaborative Filtering

40

A comparison of representative metrics from the three subsets that were depicted in the previous slides. Within each of the subsets, the metrics strongly agree, but this figure shows that metrics from different subsets do not correlate well.

Page 41: Recommender Systems and Collaborative Filtering

41

Does it Matter What Metric You Use?

• Yes.

Page 42: Recommender Systems and Collaborative Filtering

42

Want to try CF?

• CoFE "Collaborative Filtering Engine"
  – Open source Java
  – Easy to add new algorithms
  – Includes testing infrastructure (this month)
  – Reference implementations of many popular CF algorithms
  – One high-performance algorithm
    • Production ready (see Furl.net)

http://eecs.oregonstate.edu/iis/CoFE

Page 43: Recommender Systems and Collaborative Filtering

43

Improving Web Search Using CF

With Janet Webster, OSU Libraries

Page 44: Recommender Systems and Collaborative Filtering

44

Controversial Claim

• Improvements in text analysis will substantially improve the search experience
• Focus on improving results on the Mean Average Precision (MAP) metric

Page 45: Recommender Systems and Collaborative Filtering

45

Evidence of the Claim

• Human subjects study by Turpin and Hersh (SIGIR 2001)
  – Compared human performance of
    • 1970s search model (basic TF/IDF)
    • Recent OKAPI search model with greatly improved MAP
  – Task: locating medical information
  – No statistical difference

Page 46: Recommender Systems and Collaborative Filtering

46

Bypass the Hard Problem!

• The hard problem
  – Automatic analysis of text
  – Software "understanding" language
• We propose: let humans assist with the analysis of text!
  – Enter collaborative filtering

Page 47: Recommender Systems and Collaborative Filtering

47

Page 48: Recommender Systems and Collaborative Filtering
Page 49: Recommender Systems and Collaborative Filtering
Page 50: Recommender Systems and Collaborative Filtering

50

The Human Element

• Capture and leverage the experience of every user
  – Recommendations are based on human evaluation
    • Explicit votes
    • Inferred votes (implicit)
• Recommend (question, document) pairs
  – Not just documents
  – A human can determine if questions have similarity
• System gets smarter with each use
  – Not just with each new document

Page 51: Recommender Systems and Collaborative Filtering

51

Research Issues

• Basic issues
  – Is the concept sound?
  – What are the roadblocks?
• More mathematical issues
  – Algorithms for ranking recommendations (question, document, votes)
  – Robustness with unreliable data
• Text/content analysis
  – Improved NLP for matching questions
  – Incorporating more information into information context
• More social issues
  – Training users for the new paradigm
  – Privacy
  – Integrating with existing reference library practices and systems
  – Offensive material in questions
  – Most effective user interface metaphors

Page 52: Recommender Systems and Collaborative Filtering

52

Initial Results

Page 53: Recommender Systems and Collaborative Filtering

53

Initial Results

Page 54: Recommender Systems and Collaborative Filtering

Three months of SERF usage: 1194 search transactions

• Only Google results: 706 transactions (59.13%)
  – Clicked: 172 (24.4%); no clicks: 534 (75.6%)
  – Average visited documents: 1.598
• Google results + recommendations: 488 transactions (40.87%)
  – Clicked: 197 (40.4%); no clicks: 291 (59.6%)
  – Average visited documents: 2.196
  – First click on a recommendation: 141 (71.6%); first click on a Google result: 56 (28.4%)
• Average ratings: 14.727 and 20.715


Page 57: Recommender Systems and Collaborative Filtering

(Final build of the same results diagram, adding usefulness votes)

Three months of SERF usage: 1194 search transactions

• Only Google results: 706 transactions (59.13%)
  – Clicked: 172 (24.4%); no clicks: 534 (75.6%)
  – Average visited documents: 1.598
• Google results + recommendations: 488 transactions (40.87%)
  – Clicked: 197 (40.4%); no clicks: 291 (59.6%)
  – Average visited documents: 2.196
  – First click on a recommendation: 141 (71.6%); first click on a Google result: 56 (28.4%)
• Average ratings: 14.727 (49% voted as useful) and 20.715 (69% voted as useful)
• Vote scale: a vote of yes = 30, a vote of no = 0

Page 58: Recommender Systems and Collaborative Filtering

58

SERF Project Summary

• No large leaps in language understanding expected
  – Understanding the meaning of language is *very* hard
• Collaborative filtering (CF) bypasses this problem
  – Humans do the analysis
• Technology is widely applicable

Page 59: Recommender Systems and Collaborative Filtering

59

Talk Messages

• Model for learning options in recommender systems
• Survey of popular predictive algorithms for CF
• Survey of evaluation metrics
• Empirical data showing metrics can matter
• Evidence showing CF could significantly improve web search

Page 60: Recommender Systems and Collaborative Filtering

60

Links & Contacts

• CoFE
  – http://eecs.oregonstate.edu/iis/CoFE
• SERF
  – http://osulibrary.oregonstate.edu/
• Jon Herlocker
  – [email protected]
  – [email protected]
  – +1 (541) 737-8894