karen h. l. tso-sutter, leandro balby marinho* and lars schmidt-thieme information systems and...

46
aren H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thiem Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz 1, 31141 Hildesheim, Germany Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms (Cited by 5) in Proceedings of 23rd Annual ACM Symposium on Applied Computing (SAC'08), Fortaleza, Brazil. Presented by Jun-Ming Chen 2/27/200

Upload: blake-moore

Post on 11-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme

Information Systems and Machine Learning Lab (ISMLL)University of Hildesheim

Samelsonplatz 1, 31141 Hildesheim, Germany

Tag-aware Recommender Systems by Fusion of Collaborative Filtering

Algorithms (Cited by 5)

in Proceedings of 23rd Annual ACM Symposium on Applied Computing (SAC'08), Fortaleza, Brazil.

Presented by Jun-Ming Chen 2/27/2009

Page 2: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

2

Outline

Introduction Collaborative Filtering

Extension with Tags Fusing User-based and Item-based

Experiments Conclusions

Page 3: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

3

Introduction

Tag (Action : tagging) enabling people to easily add metadata to content Improve search mechanisms

Collaborative tagging system folksonomies Web services

flickr, del.icio.us

Advantage better structure the data for browsing provide personalized recommendations fitting the users’ interests

Page 4: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

4

Introduction

Recommender Systems (RS) aim at predicting items or ratings (preference score, 1-5) of items that the user are interested in.

To improve recommendation quality, metadata such as content information of items has typically been used as additional knowledge.

(attribute aware) RS algorithms is typically attached to the items and is usually provided by domain experts

an item always has the same attributes among all users

Page 5: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

5

Introduction

tags are provided by various users not only associated to the items but also to the users act as additional background knowledge to improve RS algorithms

With the increasing popularity of the collaborative tagging systems, tags could be interesting and useful information to enhance RS algorithms.

Page 6: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

6

Introduction

In this paper, we propose To integrate tags in recommender systems

by first extending the user-item matrix and An adapted fusion mechanism to capture the 3-dimensional correlations

between users, items and tags

Empirical evaluations on three CF algorithms with real-life data set demonstrate that incorporating tags to our proposed approach provides promising and significant results

Page 7: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

7

Collaborative Filtering Recommender systems (RS) predict ratings of items or suggest a

list of items that is unknown to the user Two di erent recommendation tasksff

predicting the ratings, i.e. how much a given user will like a particular item predicting the items, i.e. which N items a user will rate, buy or visit next (topN)

Most recommender systems derive recommendations to a user by using opinions from people who have alike tastes, called neighborhood, while concealing the real identity of the users neighborhood.

The prevalent method in practice is Collaborative Filtering (CF)

Page 8: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

8

Collaborative Filtering Its idea is basically the nearest neighbor method

Given some user profiles, it predicts whether a user might be interested in a certain item,

based on a section of other users or items in the database.

There are in general two types of collaborative filtering: user-based item-based

Page 9: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

9

User-Based

Item-Based

Cosine

Probability

clustercustomers

weightitems

Recommondationweighted ordered

Page 10: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

User-Based CF - Existing issues Low Performance when with millions of users and items.(user-

based filtering--KNN)

Low Prediction accuracy when with sparsity of data

2008-9-25 10

Page 11: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

User-Based CF - Solution

Explore item-based collaborative filtering

Method: Build user-item matrix identify relationships between different

items use relationships to make recommendations

In contrast, item-based collaborative filtering analyzes historical transaction information to identify relations between items that

are co-purchased, uses these relations to compute the similarities between items, and make

recommendations based on item similarity.

2008-9-25 11

Page 12: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

User-Based CF - Solution

It constructs a primitive model with significant amount of time, but makes recommendations in a very short time.

It then alleviates the problems of scalability and real- time performance that user-based CF faces.

2008-9-25 12

Page 13: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

13

Collaborative Filtering Notations

Page 14: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

14

Collaborative Filtering In user-based CF [19], recommendations are generated by

considering solely the ratings of users on items, by computing the pairwise similarities between users,

e.g., by means of vector similarity:

[19] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Analysis of recommendation algorithms fore-commerce. In Proceedings of the Second ACM Conference on Electronic Commerce (EC’00), pages 285–295, 2000. (Cited by 634)

w(u,v)

Useru

Userv

Page 15: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

15

Collaborative Filtering - recommendations

In user-based CF, to derive the recommendations for a target user u, usually only similarities of the k most-similar users are selected

(neighborhood – Nu).

When predicting a rating of a given user u for an item i, the weighted sum of the other users are computed by:

Page 16: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

16

Collaborative Filtering A dualistic form of user-based CF is item-based CF [9], where

similarities are computed between each pair of items .

[9] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Transactions onInformation Systems. Springer-Verlag, 22/1, 2004. (Cited by 168)

w(i,j)

Page 17: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

17

In the case of item-based CF, the prediction would be the average of the ratings of k most-similar items Ni

rated by the given user u.

In similar notation as user-based CF, the prediction for a rating of a given user u for an item i

Collaborative Filtering - recommendations

u

Page 18: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

18

Extension with Tags < item, attribute > < user, item, tag >

We cope with this three dimensionalities by projecting it as three two-dimensional problem, < user, tag > < item, tag > < user, item >

Page 19: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

19

Extension with Tags User tags, are tags that user u,

uses to tag items and are viewed as items in the user-item matrix.

Item tags, are tags that describedescribe an item, i, by users and play the role of users in the user-item matrix

Page 20: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

20

Extension with Tags Let:

Page 21: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

21

Extension with Tags To apply user- and item-based CF after the extension with tags, both

CF algorithms have to be recomputed with the newly extended user-item matrix.

For user-based CF, the new user-item matrix , Ruextend := R + RTu, is represented in a U × Iextend matrix

For item-based CF, the new user-item matrix, Riextend := R + RTi , is represented in a Uextend × I matrix

Ruextend := R + RTu

U × Iextend matrix

Set : Users x (Items + user Tags)

Set : (Users + Item tags) x Items

Page 22: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

22

Fusing User-based and Item-based Tags of an item, are descriptions of the item by one or more than

one users. Thus, tags are not only attached to the item itself but also are depended on the user’s preference.

This suggests that a RS algorithm that is able to capture both user’s and item’s aspect of tags would eventually be a suitable

choice.

We have selected an existing algorithm developed by Wang et al. [21], which fuses the predictions of user- and item-based CF.

[21] J. Wang, A. P. de Vries, and M. J. T. Reinders. Unifying user-based and item-based collaborativefiltering approaches by similarity fusion. In SIGIR ’06: Proceedings of the 29th annual international ACMSIGIR conference on Research and development in information retrieval, pages 501–508, New York, NY,USA, 2006. ACM Press. (Cited by 49)

Page 23: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

23

Fusing User-based and Item-based The fusion of the user- and item-based predictions was done by

computing the sum of the two conditional probabilities that are based on user- and item-based similarities,

which are computed using standard user- and item-based CF. A parameter, λ, is introduced to adjust the significance of the two predictions.

j

λ = 0.4

Page 24: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

24

Fusion for Predicting Item Problem Most systems that use collaborative tagging do not contain rating

information, i.e. only the occurrence, Ou,i {0,1}∈

Yet, the fusion method by Wang et al. [21] does not consider tags. not the predicting item problem

Thus, we propose a fusion algorithm that tackles the predicting item problem and also takes tags into account.

[21] J. Wang, A. P. de Vries, and M. J. T. Reinders. Unifying user-based and item-based collaborativefiltering approaches by similarity fusion. In SIGIR ’06: Proceedings of the 29th annual international ACMSIGIR conference on Research and development in information retrieval, pages 501–508, New York, NY,USA, 2006. ACM Press.

Page 25: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

25

Fusion for Predicting Item Problem For the predicting item problem in user-based CF,

recommendations are a list of items that is ranked by decreasing frequency of occurrence in the ratings of his/her neighbors.

For item-based CF, the topN recommendation suggested by [9] is to compute a list of items that is ranked by decreasing sum of the similarities of neighboring items, Ni, which have been rated by user u.

user-based being the frequency of items, and item-based the similarity of items

[9] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Transactions onInformation Systems. Springer-Verlag, 22/1, 2004. (Cited by 168)

Page 26: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

26

Fusion for Predicting Item Problem To address the di erent meanings of the values, we normalized the ff

prediction lists to unity

The topN combined prediction list is thus:

Page 27: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

27

Fusion for Predicting Item Problem We deem that this tag-fusion algorithm is a suitable tag-aware RS

algorithm because user tags and item tags provide extra indications of the user’s and item’s

preferences, our adapted fusion approach then brings about both user and item

aspects of the tags concurrently.

In fact, our empirical analysis has shown that our tag-fusion RS approach provides promising results.

Page 28: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

28

Experiments July 2006 by crawling the Last.fm site

1853 items (artists), 2917 users and 2045 tags users preference information

songs the users listen to tags information

user-end tagging of artists, albums, and tracks of the music

Appear at least 10 times in the tag assignments, i.e. (user, resource, tag) triples

Last.fm is a popular internet radio and music community website which allows user to tag the music.

Page 29: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

29

Experiments 10-fold cross validation

split the training data to validation data to optimize the parameters λ(0.4) and k (20), the neighborhood size

focuses on the item prediction problem, which is to predict a fixed number of topN recommendations

(N = 10)

Suitable evaluation metrics are Precision, Recall and F1

Last.fm is a popular internet radio and music community website which allows user to tag the music.

Page 30: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

30

Experiments

It can be seen that the fusion method, both with and without tags, significantly outperform the standard CF models.

Page 31: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

31

Experiments

It is interesting to see that incorporating tags to the baseline models does not improve the recommendation quality at all, in contrast to the promising results of including tags in the fusion method.

Page 32: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

32

Experiments

Applying user/item tags alone does not exploit the characteristic of tags correctly. Hence, attaching tags to standard CF algorithms, does not improve the

performance at all, the tags are then only seen as noise.

By simply extending the standard CF algorithms with tags, it fails to denote the 3-dimensional correlations between user, item and tag, whereas the proposed fusion method has shown to be able to capture

this relationship.

Page 33: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

33

Conclusions We have presented a generic method to include tags to standard

CF algorithms such as user- and item-based CF.

We have found an approach that deals with the 3-dimensional correlation between the users, items and tags by first applying our tag extension mechanism and then an fusion method which we have adapted from a predicting rating

problem to predicting item problem.

Our empirical analysis has shown that the proposed adapted fusion method outperforms standard baseline models, especially with the incorporation of tags.

Moreover, our findings have suggested that our adapted fusion method has successfully captured the relationships between users, items and tags.

Page 34: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

passion darkness obsession

Catherine orphan lovely cover art

Presented by Jun-Ming Chen (Jacky)

IWiLL - 增進學生閱讀學習的 Keywords 活動

Page 35: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

ox 小說

Knowledge

你曾在閱讀中,繞了半天,卻摸不著任何頭緒嗎?

Page 36: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

Keyword

Keyword

KeywordKeyword

Keyword

分享

討論

Knowledge

收穫

ox 小說

Page 37: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz
Page 38: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

When George was about six years old, he was made the wealthy master of a hatchet of which, like most little boys, he was extremely fond. He went about chopping everything that came his way.

One day, as he wandered about the garden amusing himself by hacking his mother's pea sticks, he found a beautiful, young English cherry tree, of which his father was most proud. He tried the edge of his hatchet on the trunk of the tree and barked it so that it died.

Some time after this, his father discovered what had happened to his favorite tree. He came into the house in great anger, and demanded to know who the mischievous person was who had cut away the bark. Nobody could tell him anything about it.

Just then George, with his little hatchet, came into the room.

"George,'' said his father, "do you know who has killed my beautiful little cherry tree yonder in the garden? I would not have taken five guineas for it!''

This was a hard question to answer, and for a moment George was staggered by it, but quickly recovering himself he cried:

"I cannot tell a lie, father, you know I cannot tell a lie! I did cut it with my little hatchet.''

The anger died out of his father's face, and taking the boy tenderly in his arms, he said:

"My son, that you should not be afraid to tell the truth is more to me than a thousand trees! Yes - though they were blossomed with silver and had leaves of the purest gold!''

Page 39: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

When George was about six years old, he was made the wealthy master of a hatchet of which, like most little boys, he was extremely fond. He went about chopping everything that came his way.

One day, as he wandered about the garden amusing himself by hacking his mother's pea sticks, he found a beautiful, young English cherry tree, of which his father was most proud. He tried the edge of his hatchet on the trunk of the tree and barked it so that it died.

Some time after this, his father discovered what had happened to his favorite tree. He came into the house in great anger, and demanded to know who the mischievous person was who had cut away the bark. Nobody could tell him anything about it.

Just then George, with his little hatchet, came into the room.

"George,'' said his father, "do you know who has killed my beautiful little cherry tree yonder in the garden? I would not have taken five guineas for it!''

This was a hard question to answer, and for a moment George was staggered by it, but quickly recovering himself he cried:

"I cannot tell a lie, father, you know I cannot tell a lie! I did cut it with my little hatchet.''

The anger died out of his father's face, and taking the boy tenderly in his arms, he said:

"My son, that you should not be afraid to tell the truth is more to me than a thousand trees! Yes - though they were blossomed with silver and had leaves of the purest gold!''

George Washington

legend

honesty

Augustine Washington

Mary Ball Washington

Page 40: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz
Page 41: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz
Page 42: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz
Page 43: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

透過來自不同的 Keywords 刺激,幫助你思考、再閱讀並發掘出新的想法

加入你的 Keywords

拿起放大鏡,找出並記下

重要的資訊

重要的人、事、時、地、物

文章想傳達的意念

建構出你所認同最快且最正確的解讀

提出問題 & 挑戰自己

發表你的見解 表達你的意見並參與討論

票選最有價值的關鍵字

從互動、分享、交換、評價中,提升對閱讀的理解與故事的掌握

1. 2. 3.

IWiLL - 增進學生閱讀學習的 Keywords 活動

Page 44: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

透過來自不同的 Keywords 刺激,幫助你思考、再閱讀並發掘出新的想法

加入你的 Keywords

1. 2.

Page 45: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

從互動、分享、交換、評價中,提升對閱讀的理解與故事的掌握

3.

Page 46: Karen H. L. Tso-Sutter, Leandro Balby Marinho* and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Samelsonplatz

提出問題 & 挑戰自己

發表你的見解 表達你的意見並參與討論

票選最有價值的關鍵字

從互動、分享、交換、評價中,提升對閱讀的理解與故事的掌握

3.