The Effectiveness of Collaborative Filtering Based Recommendation Systems Across Different Domains and Search Modes
Does a One-Size Recommendation System Fit All?
Il Im, Alexander Hars
Yonsei University, Inventivio Gmbh
ACM Transactions on Information Systems, Vol. 26, No. 1, Article 4, Nov 2007
2008. 03. 28.
Summarized by Jaehui Park, IDS Lab., Seoul National University
Presented by Jaehui Park, IDS Lab., Seoul National University
Copyright 2008 by CEBT
Outline
Introduction
Past studies on CF
Research issues and Hypothesis
Empirical study
Result and Discussion
Conclusion and Implication
Introduction
Collaborative Filtering (CF)
One of the major technologies for personalization; it generates recommendations for users based on others’ evaluations or preferences.
Major limitations
– CF has been used mostly for consumer products
The differences in CF across various domains need to be understood
– Lack of studies about user-side factors
Differences in users’ evaluations would affect the accuracy of recommendations, e.g. different intentions
This article
– compares CF recommendations across different domains: research papers and consumer products
– examines user-side factors and their effects on CF systems
Past studies on CF
Goldberg et al. [1992] applied the technology to IR
Miller et al. [1997] generated recommendations for users based on the evaluations of others with similar profiles, using the ratings of an appropriate reference group rather than the average rating of all users
Mainstream research
– Focused on algorithms for generating recommendations
– Focused on the applications and use of CF
Shortcomings of past CF studies
– There has been little research about how the effectiveness of CF might vary across different domains
CF is mainly used for consumer products, such as CDs and movies, which
Do not have much text information
Have few attributes
– Assumption that users’ evaluations remain constant
E.g. if Tom liked “Star Wars”, he should like it forever, on any occasion [Miller et al. 1997]
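The reference-group idea described above (predicting from users with similar profiles instead of the average of all users) can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the rating data, the function names `pearson` and `predict`, and the mean-centered weighted average used for the prediction are all assumptions made for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two users over the items both have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, item, n=2):
    """Predict `user`'s rating of `item` from the n most similar users who rated it."""
    target = ratings[user]
    target_mean = sum(target.values()) / len(target)
    candidates = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(target, other_ratings)
        if sim > 0:  # keep only positively correlated users
            candidates.append((sim, other_ratings))
    # Reference group: the n best neighbours by similarity.
    neighbours = sorted(candidates, key=lambda c: c[0], reverse=True)[:n]
    if not neighbours:
        return target_mean  # fall back to the user's own average
    weighted = 0.0
    for sim, r in neighbours:
        neighbour_mean = sum(r.values()) / len(r)
        weighted += sim * (r[item] - neighbour_mean)
    return target_mean + weighted / sum(sim for sim, _ in neighbours)


# Hypothetical ratings on a 1-5 scale (made up for illustration).
ratings = {
    "tom": {"A": 5, "B": 4, "C": 1},
    "ann": {"A": 5, "B": 5, "C": 1, "D": 5},
    "bob": {"A": 1, "B": 2, "C": 5, "D": 1},
    "eve": {"A": 4, "B": 4, "C": 2, "D": 4},
}

print(round(predict(ratings, "tom", "D"), 2))
```

In this toy data, "bob" has the opposite taste to "tom" and is left out of the reference group, so the prediction for item "D" comes out higher than the plain average of everyone's ratings for "D".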
Research Issues and Hypothesis Development
Many factors may affect the accuracy of CF
Hypothesis 1
The accuracy of a CF system increases as the total number of users increases.
– The probability of finding people with similar preferences increases with the number of users
– Critical mass: a certain number of people is needed for a certain level of recommendation accuracy
– The accuracy may increase in different patterns depending on the product domain and other factors
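The first argument behind Hypothesis 1 can be illustrated with a small simulation. The sketch below uses purely synthetic random ratings (not the study's data), and the names `best_similarity`, `N_ITEMS`, and the pool sizes are assumptions for this example. It only shows that the best available neighbour cannot get worse as the user pool grows; it does not model critical mass or recommendation accuracy.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def best_similarity(target, pool):
    """Similarity of the closest user in the pool to the target user."""
    return max(pearson(target, other) for other in pool)


rng = random.Random(0)  # fixed seed so the run is repeatable
N_ITEMS = 10            # hypothetical number of items everyone has rated

target = [rng.randint(1, 5) for _ in range(N_ITEMS)]
users = [[rng.randint(1, 5) for _ in range(N_ITEMS)] for _ in range(1000)]

sizes = [10, 100, 1000]
# Each pool is a prefix of the next, so the pools are nested.
best = [best_similarity(target, users[:k]) for k in sizes]

for k, s in zip(sizes, best):
    print(f"{k:>5} users: best neighbour similarity = {s:.2f}")
```

Because the pools are nested, the best similarity is non-decreasing in the number of users. How quickly it rises in practice would depend on how preferences are distributed in the domain, which is what the last bullet above refers to.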
Research Issues and Hypothesis Development
Hypothesis 2
The accuracy of CF as a function of the number of users will be greater for knowledge domains, such as research papers, than for consumer product domains, such as movies.
– Preference heterogeneity: the pattern of consumers’ preferences
– Different levels of heterogeneity may result in different patterns in H1’s figure
– People’s preferences in the movie domain are more homogeneous than those in the research paper domain
Loosely coupled clusters will result in less accurate recommendations than tightly coupled clusters
Research Issues and Hypothesis Development
Hypothesis 3
After some threshold, the accuracy of CF as a function of the number of users will be greater for the problemistic search mode than for the scanning mode.
– “What types of motivation do people have when conducting an information search?”
Categorization of [Vandenbosch and Huff 1997] and [El Sawy 1985]
Scanning: browsing through data in order to understand trends or sharpen one’s general understanding of the business (without specific questions)
Problemistic search: stimulated by a problem and directed towards a particular problem (with specific questions)
In the scanning mode, users’ evaluations would be more homogeneous: more overlap in users’ interests
In the problemistic mode, users’ evaluations would be more heterogeneous
– Performance arguments
Higher performance in scanning mode than in problemistic mode: similar criteria -> higher correlation -> higher performance [Miller et al. 1997]
Higher performance in problemistic mode than in scanning mode: in heterogeneous domains (e.g. problemistic search), each cluster will have high correlations
– Critical mass may resolve this
Research Issues and Hypothesis Development
Hypothesis 4
The accuracy of a CF system is better for users in the same search mode than for users in mixed search modes.
– If users in different search modes are mixed together, the recommendations will not be as accurate as for users in the same search mode, because their evaluations are based on different criteria
Empirical Study
Setting
Data from two domains: movies and research papers
– 492 movies and 2000 abstracts of academic articles (IS Journal)
Similarity index: correlation coefficient
Reference group selection: best-n-neighbor
Users’ evaluation criteria
– Movies
Scanning mode: ‘in general’
Problemistic search: ‘for the specific occasion chosen’
– Papers
Scanning mode
Overall usefulness
Relevance of the paper for general IS research
Problemistic search
Usefulness
Relevance of the paper for the subject’s specific research project
Accuracy calculation: simulation method
Accuracy measures: MAE (mean absolute error), NMAE (normalized mean absolute error)
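The two accuracy measures named above can be computed as in the sketch below. The slides do not give the formulas, so this assumes the common definitions: MAE as the average absolute difference between predicted and actual ratings, and NMAE as MAE divided by the width of the rating scale, which makes results comparable across data sets with different scales. The function names and the sample numbers are made up for illustration.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def nmae(predicted, actual, r_min, r_max):
    """MAE normalized by the rating range (assumed definition of NMAE)."""
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    return mae(predicted, actual) / (r_max - r_min)


# Hypothetical predictions and true ratings on a 1-5 scale.
predicted = [4.0, 2.5, 5.0, 1.0]
actual = [5, 2, 4, 1]

print(mae(predicted, actual))          # absolute errors are 1, 0.5, 1, 0
print(nmae(predicted, actual, 1, 5))   # same error relative to a range of 4
```

Lower values mean more accurate recommendations for both measures.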
Results and Discussions
People evaluate items with broader (higher average) but similar (smaller standard deviation) criteria in the scanning mode, and with narrower and more diverse criteria in the problemistic search mode
Avg evaluation: Scanning mode > Problemistic search mode
Std Dev: Scanning mode < Problemistic search mode
The research papers received lower ratings than the movies
The research paper domain is probably more heterogeneous
Results and Discussions
[Slide: Number of Users and the Accuracy of CF Systems]
[Slide: Number of Users and the Accuracy of CF Systems (EachMovie)]
[Slide: Number of Users and the Accuracy of CF Systems (Book-Crossing)]
[Slide: The Accuracy of CF Systems]
[Slide: Mode of Search]
[Slide: Summary]
Conclusion and Implication
Identifying key factors that would influence the accuracy of CF systems
Investigating the impact of those factors on accuracy
Limitations
Domain selection, data-set size, Book-Crossing data-set
Subject selection, evaluation scale
Implications
The performance of CF systems is not domain-independent.
– A pilot test is needed to estimate the suitability for the intended domain
The search mode of the users strongly influences the accuracy of the results.
– Collecting information about users’ search modes is not easy
Future research directions
More research on other product domains
How the patterns of evaluations affect the accuracy of CF systems
How search modes can be identified with minimal intrusion to users