![Page 2: Introduction to Collaborative Filtering](https://reader030.vdocuments.net/reader030/viewer/2022041109/5f0ced1d7e708231d437d085/html5/thumbnails/2.jpg)
Agenda
● Introduction
● Mahout / Taste
● Taste Architecture
● Algorithms
● Evaluating algorithms
● Questions?
Recommendation Engines
● Amazon
● StumbleUpon
● YouTube
● Last.fm
● Netflix
● Digg
● Google News
Collaborative Filtering
Clustering
Classification
Is this SPAM?
Users & Items
Preferences
● Explicit: “I rate” (3 stars)
● Implicit: “I am buying”
Item-based recommendation
Which books are read by people who also read this book?
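The question on this slide can be sketched as a simple co-occurrence count. This is a minimal illustration in plain Java, not Taste code: the class name `AlsoRead`, the method `alsoRead`, and the sample users and titles are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class AlsoRead {
    // For a given book, count how often every other book appears in the
    // reading lists of the users who also read that book.
    static Map<String, Integer> alsoRead(Map<String, Set<String>> booksByUser, String book) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> books : booksByUser.values()) {
            if (!books.contains(book)) {
                continue; // this user did not read the book we start from
            }
            for (String other : books) {
                if (!other.equals(book)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: user -> books read
        Map<String, Set<String>> booksByUser = new HashMap<>();
        booksByUser.put("ann", Set.of("Dune", "Emma", "Ulysses"));
        booksByUser.put("bob", Set.of("Dune", "Emma"));
        booksByUser.put("cid", Set.of("Emma", "Walden"));
        System.out.println(alsoRead(booksByUser, "Dune"));
    }
}
```

A real item-based recommender would normalise these raw counts with a similarity measure, since popular books co-occur with almost everything.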
User-based recommendation
We've got similar tastes, read any good books?
User neighborhood
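One common way to form a user neighborhood is to keep the N users most similar to the target user. The sketch below assumes the similarities to the target user are already computed; `NeighborhoodSketch`, `nearestN`, and the sample values are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NeighborhoodSketch {
    // Returns the ids of the n users most similar to the target user,
    // given each candidate's similarity to that user (higher = more similar).
    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarityToTarget.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            neighbors.add(e.getKey());
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of three users to the target user
        Map<Long, Double> sims = new HashMap<>();
        sims.put(234L, 0.9);
        sims.put(235L, 0.2);
        sims.put(236L, 0.7);
        System.out.println(nearestN(sims, 2)); // [234, 236]
    }
}
```

An alternative design keeps every user above a similarity threshold instead of a fixed count, which trades a predictable neighborhood size for a quality guarantee.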
Taste Architecture
● DataModel
● Recommender
● ItemSimilarity or UserSimilarity

Preferences CSV file, one preference per line (e.g. 3 stars):
234, 854, 4.0
234, 598, 3.0
234, 458, 5.0
235, 289, 4.0
… , … , ...
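To make the CSV layout concrete, here is a minimal plain-Java sketch that reads such lines into a per-user map. It assumes the three columns are user id, item id, and preference value, in that order. It is a stand-in for illustration, not how Taste's DataModel classes are implemented; `PreferenceCsv` and `parse` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PreferenceCsv {
    // Parses "userId, itemId, value" lines into userId -> (itemId -> value).
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> prefs = new TreeMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // tolerate empty lines
            }
            String[] parts = line.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            float value = Float.parseFloat(parts[2].trim());
            prefs.computeIfAbsent(userId, id -> new TreeMap<>()).put(itemId, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "234, 854, 4.0",
                "234, 598, 3.0",
                "234, 458, 5.0",
                "235, 289, 4.0");
        System.out.println(parse(lines));
    }
}
```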
Preferences
● Preference
  ● long userId;
  ● long itemId;
  ● float value;
● PreferenceArray
● Implicit
  ● BooleanUserPreferenceArray & BooleanItemPreferenceArray
DataModels
● FileDataModel
● GenericJDBCDataModel
● MySQLDataModel
Similarity Algorithms
Class Explicit Implicit
TanimotoCoefficientSimilarity
LogLikelihoodSimilarity
EuclideanDistanceSimilarity
PearsonCorrelationSimilarity
SpearmanCorrelationSimilarity
UncenteredCosineSimilarity
Slope One
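Value-based measures such as Pearson correlation need explicit preference values, because they compare how two users' ratings vary around their means. The sketch below shows the textbook formula over the items both users rated. It is an illustration in plain Java, not the Taste class itself; `PearsonSketch` and `pearson` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PearsonSketch {
    // Pearson correlation over the items both users have rated.
    // Returns NaN when fewer than two items overlap or a variance is zero.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) {
                common.add(item);
            }
        }
        int n = common.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0, meanB = 0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) {
            return Double.NaN; // a user who rates everything the same has no direction
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Hypothetical ratings: item id -> value
        Map<Long, Double> u1 = Map.of(1L, 1.0, 2L, 2.0, 3L, 3.0);
        Map<Long, Double> u2 = Map.of(1L, 2.0, 2L, 4.0, 3L, 6.0);
        System.out.println(pearson(u1, u2));
    }
}
```

With purely implicit (boolean) data every value is the same, the variance is zero, and the result is undefined, which is why set-based measures are the usual choice there.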
TanimotoCoefficientSimilarity
T(A,B) = (#Users preferring A AND B) / (#Users preferring A OR B)
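The Tanimoto (Jaccard) coefficient is the size of the intersection divided by the size of the union of the two user sets. A minimal plain-Java sketch follows; `TanimotoSketch` and `tanimoto` are hypothetical names, and returning 0.0 for two empty sets is a convention chosen for this sketch only.

```java
import java.util.HashSet;
import java.util.Set;

final class TanimotoSketch {
    // usersA / usersB: ids of the users who expressed a preference for item A / item B.
    static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        Set<Long> both = new HashSet<>(usersA);
        both.retainAll(usersB); // users preferring A AND B
        int union = usersA.size() + usersB.size() - both.size(); // users preferring A OR B
        return union == 0 ? 0.0 : (double) both.size() / union;
    }

    public static void main(String[] args) {
        Set<Long> usersA = Set.of(1L, 2L, 3L);
        Set<Long> usersB = Set.of(2L, 3L, 4L);
        System.out.println(tanimoto(usersA, usersB)); // 0.5
    }
}
```

Because only set membership matters, the measure works on implicit data without any rating values.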
LogLikelihoodSimilarity
● Hypothesis A = “Items are similar”
● Hypothesis B = “Items are not similar”
● L(A,B) = log (max likelihood A) – log (max likelihood B)
● See “Accurate Methods for the Statistics of Surprise and Coincidence” by Ted Dunning
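The log-likelihood ratio from Dunning's paper can be computed from a 2x2 table of counts using the entropy form of the statistic. The sketch below assumes k11 = users preferring both items, k12 = users preferring only A, k21 = users preferring only B, and k22 = users preferring neither. The class and method names are hypothetical, and how a library turns this score into a similarity value is not shown.

```java
final class LogLikelihoodSketch {
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalised Shannon entropy of a list of counts.
    private static double entropy(long... counts) {
        long total = 0;
        double sum = 0.0;
        for (long c : counts) {
            total += c;
            sum += xLogX(c);
        }
        return xLogX(total) - sum;
    }

    // Log-likelihood ratio (G^2) for a 2x2 contingency table.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Rounding can push a true zero slightly negative, so clamp at 0.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Hypothetical counts: 10 users like both items, 5 only A, 5 only B, 980 neither
        System.out.println(logLikelihoodRatio(10, 5, 5, 980));
    }
}
```

A score near zero means the co-occurrence is what chance alone would produce; larger scores mean the two items co-occur (or avoid each other) more than chance explains.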
Precomputed Similarities
● MySQLJDBCItemSimilarity
● Generic*Similarity
  ● GenericItemSimilarity.ItemItemSimilarity
  ● GenericUserSimilarity.UserUserSimilarity
Recommenders
long itemId = 345;
GenericItemBasedRecommender itemRec = …
itemRec.mostSimilarItems(itemId, 5);
long userId = 103;
GenericUserBasedRecommender userRec = …
userRec.recommend(userId, 5);
Recommenders
● User/Item-based recommendation
● Refresh logic
● Access to DataModel
● Recommended because
Evaluating algorithms
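[Diagram: Eval % selects the part of the original dataset used for evaluation; Train % splits it into a training dataset and a test dataset. The Recommender is built from the training dataset, and its estimated preference is compared with the actual preference in the test dataset (e.g. 3.0).]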
Evaluating algorithms
● AverageAbsoluteDifferenceRecommenderEvaluator or RMSRecommenderEvaluator
● Evaluation %
● Training %
● RecommenderBuilder
● DataModelBuilder
● DataModel
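The two error measures named above can be sketched in a few lines of plain Java. Everything here is illustrative: the class `EvaluationSketch`, the sample values, the deterministic split, and the stand-in "recommender" that always predicts the training mean are assumptions made for this sketch, not part of Taste.

```java
import java.util.Arrays;

final class EvaluationSketch {
    // Average absolute difference between actual and estimated preferences.
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    // Root-mean-square difference; large errors weigh more heavily.
    static double rms(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - estimated[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] values = {4.0, 3.0, 5.0, 4.0, 2.0}; // hypothetical preference values
        double trainingPercentage = 0.6;             // assumed split, first 60% is training data
        int trainSize = (int) Math.round(values.length * trainingPercentage);

        // Stand-in "recommender": estimate every held-out value as the training mean.
        double mean = Arrays.stream(values, 0, trainSize).average().orElse(Double.NaN);
        double[] actual = Arrays.copyOfRange(values, trainSize, values.length);
        double[] estimated = new double[actual.length];
        Arrays.fill(estimated, mean);

        System.out.println("Average absolute difference: " + averageAbsoluteDifference(actual, estimated));
        System.out.println("RMS: " + rms(actual, estimated));
    }
}
```

For both measures a lower score is better, with 0.0 meaning every estimate matched the actual preference. A real evaluation would split at random and repeat the run, since a single small test set gives a noisy score.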
Evaluation Demo
● Helper classes for doing evaluation
● TODO - Evaluation of implicit data
● Suggestions welcome
References
Mahout in Action EAP
http://blog.jteam.nl
Mailing list