Measuring Surprise in
Recommender Systems
Marius Kaminskas, Derek Bridge
Workshop on ‘Recommender Systems Evaluation: Dimensions and Design’
October 10, 2014
Introduction
• Beyond-accuracy objectives:
• novelty, diversity, serendipity
• How to measure them?
• user studies: expensive to conduct, small-scale
• offline studies: cheap to conduct, datasets available, but need evaluation metrics
• Our focus: metrics for offline evaluation of serendipity
Serendipity
• “The faculty of making happy and unexpected discoveries by accident” [Oxford English Dictionary]
• Serendipitous item = surprising + relevant
Measuring Recommendation Surprise
• Comparing recommended items to a baseline recommender
• motivation: serendipitous items are difficult to predict
• Measuring the recommended item’s distance from a set of expected items
• motivation: an item is surprising if it is different from
what the user expects
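The second approach can be sketched as follows. This is a minimal illustration, not the paper's exact definition: `dist` stands for any item-item distance in [0, 1], and the choice of aggregation over the expected set is exactly the design question the later slides explore:

```python
from statistics import mean

def surprise(item, expected_items, dist, aggregate=min):
    """Surprise of a recommended item relative to the user's expected set.

    dist:      any item-item distance function returning values in [0, 1]
    aggregate: min  -> lower-bound variant (closest expected item),
               mean -> average distance variant
    """
    return aggregate(dist(item, e) for e in expected_items)
```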
Our Goals
• Investigate alternative surprise metric definitions
• existing approaches exploit average distance between the target item and the set of expected items
• we hypothesize that averaging the distance results in information loss
• Measure surprise of recommendations produced by state-of-the-art recommendation algorithms
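The information-loss hypothesis can be illustrated with a toy example (the numbers are invented for illustration, not taken from the slides): when one expected item is nearly identical to the candidate, averaging over the whole expected set can still report high surprise, while the lower bound exposes the near-duplicate.

```python
from statistics import mean

# Hypothetical distances from one candidate item to the user's expected items.
# One expected item is nearly identical to the candidate (distance 0.05),
# so the candidate should not be considered surprising.
distances = [0.05, 0.9, 0.95, 0.85]

average = mean(distances)      # ~0.69: averaging masks the near-duplicate
lower_bound = min(distances)   # 0.05: the lower bound exposes it
```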
Proposed Surprise Metrics
• Co-occurrence-based surprise
• lower-bound distance variant
• average distance variant
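A sketch of the co-occurrence idea, under the assumption that normalized pointwise mutual information (NPMI, in [-1, 1]) is the co-occurrence measure; the slides do not give the exact formula, and the function and parameter names (`prob`, `joint`) are hypothetical:

```python
import math
from statistics import mean

def npmi(p_i, p_j, p_ij):
    """Normalized pointwise mutual information, in [-1, 1]."""
    if p_ij == 0:
        return -1.0
    return math.log2(p_ij / (p_i * p_j)) / -math.log2(p_ij)

def cooccurrence_surprise(item, profile, prob, joint, aggregate=min):
    """prob: marginal rating probability per item; joint: co-rating probability
    per item pair. Distance = negated NPMI: frequently co-rated items are
    close (distance -> -1), never co-rated items are far (distance -> 1)."""
    distances = [-npmi(prob[item], prob[e], joint.get((item, e), 0.0))
                 for e in profile]
    return aggregate(distances)
```

Because NPMI hits its extremes for items that are rarely (or never) co-rated, rare items push the metric toward -1 or 1, which is the sensitivity the results slide reports.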
Proposed Surprise Metrics
• Content-based surprise
• lower-bound distance variant
• average distance variant
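The content-based variant can be sketched with Jaccard distance over item feature sets (e.g. genres or tags); this is an assumed distance for illustration, since the slides do not state the exact one:

```python
from statistics import mean

def jaccard_distance(features_a, features_b):
    """1 - Jaccard similarity over two feature sets (e.g. movie genres)."""
    union = features_a | features_b
    if not union:
        return 0.0
    return 1.0 - len(features_a & features_b) / len(union)

def content_surprise(item_features, profile_features, aggregate=min):
    # aggregate=min  -> lower-bound distance variant
    # aggregate=mean -> average distance variant
    return aggregate(jaccard_distance(item_features, f)
                     for f in profile_features)
```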
Experiments
• Datasets: MovieLens 1M and LastFM 1K
• Recommendation algorithms
• matrix factorization, user-based k-NN (k=50), item-based k-NN (k=50)
• Evaluation methodology:
• ‘one plus random’: one 5-star item + 1000 random items
• recommend top-10 items
• measure recall and surprise
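The 'one plus random' protocol can be sketched as follows; `score` stands in for whichever trained recommender is being evaluated, and the function name is hypothetical:

```python
import random

def one_plus_random_hit(held_out_item, unrated_items, score,
                        n_random=1000, k=10):
    """'One plus random': rank one held-out 5-star item among n_random
    randomly drawn items the user has not rated; count a hit if the
    held-out item reaches the top-k."""
    candidates = random.sample(unrated_items, n_random)
    candidates.append(held_out_item)
    ranked = sorted(candidates, key=score, reverse=True)
    return held_out_item in ranked[:k]

# Recall@10 is the fraction of held-out items for which this returns True;
# surprise is measured on the same top-10 lists.
```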
Results: surprise value comparison
• MF (matrix factorization) recommendations are the most accurate, but the least surprising
• For S_cont, the average and lower-bound distance results are consistent:
• UB (user-based k-NN) recommendations are the most surprising
• For S_co-occ, the results are inconsistent:
• sensitivity to rare items produces extreme metric values, close to 1 or -1
• this leads to different outcomes for the lower-bound and average distance metric variants
Results: the impact of user’s profile size
Conclusions
• Results demonstrate the trade-off between
recommendation accuracy and serendipity
• Matrix factorization produces the most accurate but least surprising recommendations
• User-based k-NN produces the least accurate but most
surprising recommendations
• As the user’s profile size increases, information may be lost when using the average distance metric
• The co-occurrence-based metric is sensitive to rare items and needs to be modified
Future Work
• Comparing the proposed metrics against existing
serendipity metrics
• Measuring other beyond-accuracy objectives – diversity,
novelty, coverage – and their relation to serendipity
• Conducting a user study to confirm effectiveness of the
proposed metrics
Thank you
• Questions?