![Page 1: Comparing Collaborative and Content- based Filtering for ...toinebogers.com/content/slides/200910-comparing-CF-and-CBF.pdf · Comparing Collaborative and Content-based Filtering for](https://reader034.vdocuments.net/reader034/viewer/2022050523/5fa662bf0dfcc335b439d0de/html5/thumbnails/1.jpg)
Comparing Collaborative and Content-based Filtering for Recommendation on Social Bookmarking Websites
Toine Bogers and Antal van den Bosch
ILK / TiCC Tilburg University
Overview
• Recommendation task + data sets
• What information sources do we have?
  – Usage patterns
  – Tags
  – Metadata
• For each information source
  – What is it?
  – What did we do with it?
  – What did we find?
• Recommendations for recommendation
Recommendation task & data sets
• Focused on Top-N item recommendation for social bookmarking websites
• Four data sets
  – Delicious (bookmarks)
  – BibSonomy (bookmarks)
  – CiteULike (scientific articles)
  – BibSonomy (scientific articles)
• Evaluated using Mean Average Precision (MAP)
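As a rough illustration of the evaluation metric, the sketch below computes average precision per user and averages it over users. The function names and the toy data are assumptions made for this example; the slides do not say whether a rank cutoff was applied, so none is used here, and normalizing by the number of relevant items is one common convention rather than a confirmed detail of the experiments.

```python
def average_precision(ranked_items, relevant_items):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            # Precision at the rank of each relevant item that was retrieved.
            precision_sum += hits / rank
    # Assumption: normalize by the total number of relevant items.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevant_by_user):
    """Mean of the per-user average precision values."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranking, relevant_by_user.get(user, ()))
        for user, ranking in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data: two users with ranked recommendations.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
relevant = {"u1": {"a", "c"}, "u2": {"y"}}
map_score = mean_average_precision(rankings, relevant)
```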
Usage patterns: What is it?
• Represent the items that users have added to their profiles
• Profile vectors
  – User profiles
  – Item profiles
• No explicit ratings available
  – Only binary information (1 or 0)
  – Or rather: unary!
[Figure: matrix UI (users × items)]
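A minimal sketch of the unary UI matrix, assuming the raw data is a list of (user, item) pairs. The function names and the sample posts are hypothetical. A row of the matrix is a user profile and a column is an item profile.

```python
def build_user_item_matrix(posts):
    """Build a unary users x items matrix from (user, item) pairs."""
    users = sorted({user for user, _ in posts})
    items = sorted({item for _, item in posts})
    column = {item: index for index, item in enumerate(items)}
    matrix = {user: [0] * len(items) for user in users}
    for user, item in posts:
        # 1 means "the user added this item"; 0 only means "not observed".
        matrix[user][column[item]] = 1
    return users, items, matrix


def item_profile(matrix, users, col):
    """Column of the matrix: which users added the item at index col."""
    return [matrix[user][col] for user in users]


# Hypothetical sample data.
posts = [("alice", "a"), ("alice", "b"), ("bob", "b")]
users, items, ui = build_user_item_matrix(posts)
```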
Usage patterns: What did we do with it?
• Baseline: standard k-NN algorithm
  – User-based CF vs. item-based CF
  – Cosine similarity
  – Unweighted vs. IDF-weighted profile vectors
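The baseline can be sketched roughly as follows for the user-based case. This is an illustrative reading of "standard k-NN with cosine similarity", not the authors' implementation: the scoring rule (summing neighbor similarities), the IDF formula log(N / df), and all names are assumptions.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict: key -> weight)."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def idf_weights(profiles):
    """Assumed IDF variant: log(number of users / users who added the item)."""
    df = defaultdict(int)
    for items in profiles.values():
        for item in items:
            df[item] += 1
    return {item: math.log(len(profiles) / count) for item, count in df.items()}


def user_based_knn(profiles, active, k=2, idf=None):
    """Rank unseen items by the summed similarity of the k nearest users."""
    def vector(user):
        return {item: (idf[item] if idf else 1.0) for item in profiles[user]}

    target = vector(active)
    neighbors = sorted(
        ((cosine(target, vector(user)), user) for user in profiles if user != active),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, user in neighbors:
        if similarity <= 0:
            continue
        for item in profiles[user]:
            if item not in profiles[active]:
                scores[item] += similarity
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical unary profiles: user -> set of items.
profiles = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"b", "d"},
    "dave": {"e"},
}
unweighted = user_based_knn(profiles, "alice", k=2)
weighted = user_based_knn(profiles, "alice", k=2, idf=idf_weights(profiles))
```

The item-based variant follows the same pattern with the roles of users and items swapped.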
Usage patterns: What did we find?
• User-based vs. item-based
  – User-based CF slightly better on three data sets
    • Not statistically significant
  – Item-based CF significantly better on CiteULike
• Bookmarks vs. scientific articles
  – Recommending bookmarks is more difficult
  – More open domain and greater topical diversity
• IDF-weighting had no effect
Tags: What is it?
• Tags are keywords assigned to an item by a user
• Profile vectors
  – User tag profiles
  – Item tag profiles
• Values are tag occurrence counts
[Figure: matrices UT (users × tags) and IT (items × tags)]
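A small sketch of how the UT and IT count profiles could be built, assuming each post is a (user, item, tags) triple. The function name and the sample posts are made up for illustration.

```python
from collections import Counter, defaultdict


def build_tag_profiles(posts):
    """Return (user tag profiles, item tag profiles) as tag-count Counters."""
    user_tags = defaultdict(Counter)
    item_tags = defaultdict(Counter)
    for user, item, tags in posts:
        # Each tag assignment adds one occurrence to both profiles.
        user_tags[user].update(tags)
        item_tags[item].update(tags)
    return user_tags, item_tags


# Hypothetical posts: (user, item, tags assigned in that post).
posts = [
    ("alice", "paper1", ["recsys", "cf"]),
    ("alice", "paper2", ["recsys"]),
    ("bob", "paper1", ["cf", "knn"]),
]
ut, it = build_tag_profiles(posts)
```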
Tags: What did we do with it?
• Tag overlap between users/items as similarity
  – User-based vs. item-based filtering
  – Similarity metrics
    • Jaccard overlap
    • Dice’s coefficient
    • Cosine similarity
  – Unweighted vs. IDF-weighted profiles (for cosine)
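For reference, the three overlap measures can be written as below. This sketch assumes Jaccard and Dice operate on the sets of distinct tags, while cosine uses the tag counts; the slides do not spell out that detail, and the example profiles are invented.

```python
import math


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over the distinct tags of two profiles."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|) over the distinct tags of two profiles."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity between two tag-count profiles (dict: tag -> count)."""
    dot = sum(count * b[tag] for tag, count in a.items() if tag in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tag profiles for two items.
item_x = {"python": 2, "ml": 1}
item_y = {"python": 1, "web": 1}
```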
Tags: What did we find?
• CF with tag overlap
  – User-based CF performs significantly worse
  – Item-based CF performs much better
    • Often statistically significant improvements
  – Except on CiteULike: CF without tags better
• Similarity metric relatively unimportant
  – Cosine similarity slightly better
• IDF-weighting again had no effect
Metadata: What is it?
• Textual description of different aspects of an item
• Examples
  – Bookmarks: <TITLE>, <URL>, <DESCRIPTION>, ...
  – Scientific articles: <JOURNAL>, <YEAR>, <ABSTRACT>, ...
• Two types of metadata
  – Intrinsic, i.e., directly relating to the content
    • E.g., <TITLE>, <DESCRIPTION>, <JOURNAL>, <AUTHOR>, ...
  – Extrinsic, i.e., administrative information
    • E.g., <PAGES>, <MONTH>, <EDITION>, ...
Metadata: What did we do with it?
• Content-based filtering
  – Profile-centric matching
    • Collate all of user’s metadata into a user profile
    • All metadata assigned to an item → item profile
    • Match and rank item profiles to user profiles
  – Post-centric matching
    • Construct metadata representations of each post
    • Match each of the user’s posts against all other posts
    • Match, rank, and aggregate all retrieved posts
[Figure: profile-centric matching. Training item profiles are similarity-matched against active user profiles; the diagram distinguishes training pairs from test pairs.]
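Profile-centric matching can be sketched as follows. Plain term-frequency cosine stands in here for whatever retrieval model was actually used, which the slides do not specify; the tokenizer, function names, and sample metadata are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_centric(user_metadata, candidate_items):
    """Collate the user's metadata into one profile and rank item profiles."""
    user_profile = Counter(t for text in user_metadata for t in tokenize(text))
    scored = []
    for item, texts in candidate_items.items():
        # All metadata assigned to an item is collated into its item profile.
        item_profile = Counter(t for text in texts for t in tokenize(text))
        scored.append((cosine(user_profile, item_profile), item))
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [item for score, item in ranked if score > 0]


# Hypothetical metadata for the active user and for unseen items.
user_metadata = ["collaborative filtering survey", "tag recommendation"]
candidates = {
    "i1": ["filtering with tags", "collaborative tagging"],
    "i2": ["cooking pasta"],
    "i3": ["recommendation survey"],
}
ranking = profile_centric(user_metadata, candidates)
```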
Metadata: What did we do with it? (continued)
[Figure: post-centric matching. Training posts are similarity-matched against the active user’s posts; the diagram distinguishes training pairs from test pairs.]
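A comparable sketch for post-centric matching: each of the active user's posts acts as a query against all other posts, and the scores of the retrieved posts are aggregated per item. Summing the scores is an assumed aggregation rule, term-frequency cosine is again only a stand-in for the real matching function, and all names and data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def post_centric(user_posts, other_posts):
    """Match every user post against all other posts, then aggregate by item."""
    scores = defaultdict(float)
    for text in user_posts:
        query = Counter(tokenize(text))
        for item, other_text in other_posts:
            # Assumption: aggregate retrieved posts by summing their scores.
            scores[item] += cosine(query, Counter(tokenize(other_text)))
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0]


# Hypothetical posts: the active user's own, and (item, metadata) for the rest.
user_posts = ["collaborative filtering survey", "tag recommendation"]
other_posts = [
    ("i1", "collaborative filtering"),
    ("i1", "tag clouds"),
    ("i2", "cooking pasta"),
    ("i3", "recommendation engines"),
]
ranking = post_centric(user_posts, other_posts)
```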
Metadata: What did we do with it?
• Hybrid filtering
  – Combine CF with metadata-based approach
  – User-based CF with metadata-based similarities
    • Textual similarity between user profiles
  – Item-based CF with metadata-based similarities
    • Textual similarity between item profiles
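The hybrid idea for the item-based case can be illustrated as below: the CF scoring loop is kept, but item-item similarity comes from the items' metadata text instead of usage overlap. This is a simplified variant written for illustration (each candidate is scored by its k most similar items in the user's profile); the function names, the similarity function, and the data are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def item_based_cf_with_metadata(user_items, item_metadata, k=2):
    """Score unseen items by textual similarity to the user's own items."""
    profiles = {item: Counter(tokenize(text)) for item, text in item_metadata.items()}
    scores = {}
    for candidate, profile in profiles.items():
        if candidate in user_items:
            continue
        # Similarities to the user's items, keeping only the k largest.
        sims = sorted(
            (cosine(profile, profiles[owned]) for owned in user_items if owned in profiles),
            reverse=True,
        )[:k]
        scores[candidate] = sum(sims)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0]


# Hypothetical item metadata and active user profile.
item_metadata = {
    "a": "collaborative filtering recommender",
    "b": "tag recommendation folksonomy",
    "c": "collaborative tag recommender",
    "d": "pasta recipes",
}
ranking = item_based_cf_with_metadata({"a", "b"}, item_metadata, k=2)
```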
Metadata: What did we find?
• Content-based filtering
  – Profile-level matching better than post-level
• Hybrid filtering
  – Item-based CF with metadata similarities works best
• No clear winner over all data sets
• Metadata
  – All intrinsic metadata combined works best
  – Best fields: <TAGS>, <TITLE>, <AUTHOR>, <URL>, <ABSTRACT>
  – Extrinsic metadata contributes little
Recommendations for recommendation
• Using tag overlap in item-based CF works well
  – Easy to implement/adapt
• Metadata-based recommendation often better than CF
  – Not significantly
  – No clear winning algorithm
  – Easiest to implement using existing search engine
• Recommender fusion is promising
  – Investigate different combination techniques
Questions? Comments? Recommendations?
Recommendation task
[Figure: matrix of recommendation tasks over the USER, ITEM, and TAG dimensions. Cell labels: People like me; Item recommendation; People profiling; Item experts; More like this; Tag suggestion; Domain experts; Personalized search; Depth browsing.]
Data sets
• Evaluated using Mean Average Precision (MAP)
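| | Delicious | BibSonomy | CiteULike | BibSonomy |
|---|---|---|---|---|
| Content | Bookmarks | Bookmarks | Scientific articles | Scientific articles |
| # users | 1,243 | 192 | 1,322 | 167 |
| # items | 152,698 | 11,165 | 38,419 | 12,982 |
| # tags | 42,820 | 13,233 | 28,312 | 5,165 |
| # posts | 238,070 | 29,096 | 84,637 | 29,720 |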