Setting Goals and Choosing Metrics for Recommender System Evaluations
Gunnar Schröder, Maik Thiele, Wolfgang Lehner

Gunnar Schröder
T-Systems Multimedia Solutions
Dresden University of Technology

UCERSTI 2 Workshop at the 5th ACM Conference on Recommender Systems
Chicago, October 23rd, 2011


DESCRIPTION

Recommender systems have become an important personalization technique on the web and are widely used, especially in e-commerce applications. However, operators of web shops and other platforms are challenged by the large variety of available algorithms and the multitude of their possible parameterizations. Since the quality of the recommendations can have a significant business impact, the selection of a recommender system should be based on well-founded evaluation data. The literature on recommender system evaluation offers a large variety of evaluation metrics but provides little guidance on how to choose among them. The paper presented in these slides focuses on the often neglected aspect of clearly defining the goal of an evaluation and on how this goal relates to the selection of an appropriate metric. We discuss several well-known accuracy metrics and analyze how they reflect different evaluation goals. Furthermore, we present some less well-known metrics as well as a variation of the area under the curve measure that are particularly suitable for the evaluation of recommender systems in e-commerce applications.

TRANSCRIPT

Page 1: Setting Goals and Choosing Metrics for Recommender System Evaluations

Setting Goals and Choosing Metrics for Recommender System Evaluations

Gunnar Schröder, Maik Thiele, Wolfgang Lehner

Gunnar Schröder
T-Systems Multimedia Solutions
Dresden University of Technology

UCERSTI 2 Workshop at the 5th ACM Conference on Recommender Systems
Chicago, October 23rd, 2011

Page 2: Setting Goals and Choosing Metrics for Recommender System Evaluations


How Do You Evaluate Recommender Systems?

Qualitative Techniques / Quantitative Techniques

RMSE

MAE

Precision

Recall

Area under the Curve

ROC Curves

Mean Average Precision

F1-Measure

Accuracy Metrics / Non-Accuracy Metrics

User-Centric Evaluation

But why do you do it exactly this way?
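
To make the prediction-accuracy metrics in this list concrete, here is a minimal Python sketch (with made-up ratings, not data from the paper) of how RMSE and MAE are typically computed from predicted versus observed ratings.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: squares each rating error, so large errors weigh more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error: the average absolute deviation from the observed rating."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical observed ratings and a recommender's predictions for five items.
observed  = [5.0, 3.0, 4.0, 1.0, 2.0]
predicted = [4.5, 3.5, 4.0, 2.0, 1.0]
print(rmse(predicted, observed))  # ~0.71
print(mae(predicted, observed))   # 0.6
```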

Page 3: Setting Goals and Choosing Metrics for Recommender System Evaluations


Some of the Issues This Paper Touches On

A large variety of metrics has been published
Some metrics are highly correlated [Herlocker 2004]
There is little guidance for evaluating recommenders and choosing metrics

Which aspects of the usage scenario and the data influence the choice?

Which metrics are applicable?
What do these metrics express?
What are the differences among them?
Which metric represents our use case best?
How much do the metrics suffer from biases?

Page 4: Setting Goals and Choosing Metrics for Recommender System Evaluations


Factors That Influence the Choice of Evaluation Metrics

Choice of metrics

Preference data: explicit vs. implicit; unary, binary, or numerical

Recommender task and interaction: prediction, classification, ranking, similarity, presentation

Objectives for recommender usage: business goals, user interests

Page 5: Setting Goals and Choosing Metrics for Recommender System Evaluations


Major Classes of Evaluation Metrics

Prediction Accuracy Metrics
Ranking Accuracy Metrics
Classification Accuracy Metrics
Non-Accuracy Metrics

(Slide illustration: a ranked list of example predicted ratings, 5.0, 4.8, 4.7, 4.3, 3.8, 3.2, 2.4, 2.1, 1.6, 1.2)
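
A quick, hypothetical illustration of why these classes are kept apart: if a recommender's predicted scores are the true ratings shifted upward by a constant, every prediction-accuracy metric reports an error, yet the induced ranking, and therefore any ranking-accuracy metric, is unaffected.

```python
# Hypothetical data: predicted scores are the true ratings shifted by +1.0.
true_ratings = [5.0, 4.8, 4.7, 4.3, 3.8, 3.2, 2.4, 2.1, 1.6, 1.2]
predicted    = [r + 1.0 for r in true_ratings]

# Prediction accuracy: every prediction is off by exactly 1.0.
mae = sum(abs(p - t) for p, t in zip(predicted, true_ratings)) / len(true_ratings)

# Ranking accuracy: both score lists induce the same item ordering.
order_by_true = sorted(range(len(true_ratings)), key=lambda i: -true_ratings[i])
order_by_pred = sorted(range(len(predicted)), key=lambda i: -predicted[i])

print(mae)                             # 1.0, i.e. poor prediction accuracy
print(order_by_true == order_by_pred)  # True, i.e. perfect ranking accuracy
```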

Page 6: Setting Goals and Choosing Metrics for Recommender System Evaluations


Why Precision, Recall and F1-Measure May Fool You

Ideal recommender (examples a–f) vs. worst-case recommender (examples g–l)

Four recommendations (R1–R4), e.g. Precision@4
Ten items with a varying ratio of relevant items (1–9 relevant items)

Precision, recall and F1-measure are very sensitive to the ratio of relevant items

They fail to distinguish between an ideal recommender and a worst-case recommender if the ratio of relevant items is varied

Figure 3
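
A small Python sketch in the spirit of this comparison, assuming the setup stated above (ten items, a top-4 list, 1 to 9 relevant items; the exact rankings of Figure 3 are not reproduced): an ideal recommender ranks all relevant items first, a worst-case recommender ranks them last, and yet their Precision@4 values overlap once the number of relevant items varies.

```python
def metrics_at_k(ranking, relevant, k):
    """Precision, recall and F1 over the top-k items of a ranking."""
    hits = sum(1 for item in ranking[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

items, k = list(range(10)), 4
for n_relevant in range(1, 10):
    relevant = set(range(n_relevant))
    ideal = sorted(items, key=lambda i: i not in relevant)  # relevant items ranked first
    worst = sorted(items, key=lambda i: i in relevant)      # relevant items ranked last
    print(n_relevant, metrics_at_k(ideal, relevant, k), metrics_at_k(worst, relevant, k))

# E.g. the ideal recommender with 3 relevant items and the worst-case
# recommender with 9 relevant items both reach Precision@4 = 0.75.
```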

Page 7: Setting Goals and Choosing Metrics for Recommender System Evaluations


What is the Ideal Length for a Top-k Recommendation List?

A typical ranking produced by a recommender on a set of ten items with four items being relevant

The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)

Figure 1
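
A brief sketch of the trade-off behind this question, assuming the slide's setup of ten items with four relevant ones (the ranking below is hypothetical, not the exact ranking of Figure 1): as k grows, precision tends to fall while recall rises, so neither metric on its own singles out an ideal list length.

```python
# Hypothetical ranking of ten items; True marks the four relevant ones.
relevance_by_rank = [True, False, True, True, False, False, True, False, False, False]

for k in range(1, 11):
    hits = sum(relevance_by_rank[:k])
    precision = hits / k   # relevant share of the recommended top-k
    recall = hits / 4      # share of all relevant items that were recommended
    print(f"k={k:2d}  precision={precision:.2f}  recall={recall:.2f}")
```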

Page 8: Setting Goals and Choosing Metrics for Recommender System Evaluations


What is the Ideal Length for a Top-k Recommendation List?

A typical ranking produced by a recommender on a set of ten items with four items being relevant

The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)


part of Figure 1

Page 9: Setting Goals and Choosing Metrics for Recommender System Evaluations


What is the Ideal Length for a Top-k Recommendation List?

A typical ranking produced by a recommender on a set of ten items with four items being relevant

The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)

Markedness = Precision + InvPrecision − 1
Informedness = Recall + InvRecall − 1
Matthews Correlation = √(Markedness × Informedness)

[Powers 2007]

part of Figure 1

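
As a concrete reference for these definitions, here is a minimal Python sketch that derives markedness, informedness and the Matthews correlation from the confusion matrix of a top-k recommendation list; the counts used below are hypothetical.

```python
import math

def classification_measures(tp, fp, fn, tn):
    """Confusion-matrix measures discussed on this slide (cf. [Powers 2007])."""
    precision     = tp / (tp + fp)  # share of recommended items that are relevant
    inv_precision = tn / (tn + fn)  # share of non-recommended items that are irrelevant
    recall        = tp / (tp + fn)  # share of relevant items that were recommended
    inv_recall    = tn / (tn + fp)  # share of irrelevant items that were not recommended

    markedness   = precision + inv_precision - 1
    informedness = recall + inv_recall - 1
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return markedness, informedness, mcc

# Hypothetical top-4 list over ten items, four of which are relevant:
# 3 hits (TP), 1 false recommendation (FP), 1 missed item (FN), 5 true omissions (TN).
m, i, c = classification_measures(tp=3, fp=1, fn=1, tn=5)
print(m, i, c)
print(math.isclose(c, math.sqrt(m * i)))  # True: here MCC is the geometric mean of both
```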

Page 10: Setting Goals and Choosing Metrics for Recommender System Evaluations


From Simple Classification Measures to Partial Ranking Measures

Moving a single relevant item through the recommender's ranking (examples a–j)

Idea: Consider both classification and ranking for the top-k recommendations

Area under the Curve => Limited Area under the Curve

Boolean Kendall’s Tau => Limited Boolean Kendall’s Tau

Figure 2
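
For orientation, a hedged sketch of the unrestricted measures these two metrics start from: the area under the ROC curve computed as the share of correctly ordered (relevant, irrelevant) pairs, and a boolean Kendall's tau over the same pairs. The limited variants proposed in the paper restrict the evaluation to the top-k list; their exact definitions are given in the paper and are not reproduced here.

```python
def auc_from_ranking(relevance_by_rank):
    """AUC as the share of (relevant, irrelevant) pairs that the ranking orders correctly."""
    concordant, mixed_pairs = 0, 0
    for i, rel_i in enumerate(relevance_by_rank):
        for rel_j in relevance_by_rank[i + 1:]:
            if rel_i != rel_j:
                mixed_pairs += 1
                if rel_i and not rel_j:  # relevant item ranked above an irrelevant one
                    concordant += 1
    return concordant / mixed_pairs

def boolean_kendalls_tau(relevance_by_rank):
    """(concordant - discordant) / mixed pairs for binary relevance; equals 2*AUC - 1."""
    return 2 * auc_from_ranking(relevance_by_rank) - 1

# Hypothetical ranking of ten items, four relevant (True), position = rank.
ranking = [True, True, False, True, False, True, False, False, False, False]
print(auc_from_ranking(ranking))      # 0.875
print(boolean_kendalls_tau(ranking))  # 0.75
```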

Page 11: Setting Goals and Choosing Metrics for Recommender System Evaluations


A Further, More Complex Example to Study at Home

Conclusions:
For classification, use markedness, informedness and Matthews correlation instead of precision, recall and F1-measure

Limited area under the curve and limited boolean Kendall’s tau are useful metrics for top-k recommender evaluations

Figure 4

Page 12: Setting Goals and Choosing Metrics for Recommender System Evaluations


Conclusion and Contributions

Important aspects that influence the metric choice:
Objectives for recommender usage
Recommender task and interaction
Aspects of preference data

Some problems of precision, recall and F1-measure
The advantages of markedness, informedness and Matthews correlation

Two new metrics that measure the ranking of a limited top-k list: limited area under the curve and limited boolean Kendall's tau

Guidelines for choosing a metric (See paper)

Page 13: Setting Goals and Choosing Metrics for Recommender System Evaluations


Thank You Very Much!

Do not hesitate to contact me if you have any questions, comments or answers!

Slides are available via e-mail or SlideShare.