Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
Charalampos Chelmis, Viktor K Prasanna [email protected]
MSM 2013, Paris, France

Upload: charalampos-chelmis

Post on 25-May-2015


DESCRIPTION

Workshop paper from the Modeling Social Media (MSM) 2013 workshop at the Hypertext 2013 conference, presented in Paris, France, on May 1, 2013.

TRANSCRIPT

Page 1: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Charalampos Chelmis, Viktor K Prasanna [email protected]

MSM 2013, Paris, France

Page 2: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Overview

• Introduction
• Structure of Tripartite Graphs
• Generative Models of Tripartite Graphs
• Social Link Classification Schemes
• Evaluation
• Conclusion

Page 3: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Introduction

• Social networking is used for
  − Content organization
  − Content sharing
• Multiple media types
• Users' activities
  − Reveal interests and tastes
  − Hidden structure
• Description of resources
  − Text
  − Tags / hashtags
• Social annotation
  − Collective characterization of resources
  − Use of synonyms for similar resources
  − Same keywords for different resources

Page 4: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Research Questions

• How to address issues of synonymy and polysemy?
  − Deal with space size explosion
• How to discover emergent structure in online tagging systems?
  − Hidden topics
• How to capture users’ latent interests?
  − Which subjects is a user mostly interested in?
  − Which users have similar interests?
• How to model the process of social generation of annotations?
  − How to capture the semantics of collaboration?
• Why is this useful?
  − Recommend people
  − Recommend tags / resources
  − Clustering
  − …

Page 5: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Structure of Tripartite Graphs

• Set of actors (e.g. users): $A = \{a_1, \ldots, a_k\}$
• Set of concepts (e.g. tags): $C = \{c_1, \ldots, c_l\}$
• Set of resources (e.g. photos): $R = \{r_1, \ldots, r_m\}$
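The three vertex sets can be illustrated with a minimal sketch that collects them from annotation triples. The toy triples, the function name `build_tripartite`, and the choice to index hyperedges by actor are assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy annotations: (actor, concept, resource) triples.
annotations = [
    ("alice", "rock", "artist_1"),
    ("alice", "indie", "artist_2"),
    ("bob", "rock", "artist_1"),
]

def build_tripartite(triples):
    """Collect the vertex sets A, C, R and index the hyperedges by actor."""
    actors, concepts, resources = set(), set(), set()
    by_actor = defaultdict(list)
    for actor, concept, resource in triples:
        actors.add(actor)
        concepts.add(concept)
        resources.add(resource)
        by_actor[actor].append((concept, resource))
    return actors, concepts, resources, dict(by_actor)

A, C, R, by_actor = build_tripartite(annotations)
```

Each triple is one hyperedge connecting an actor, a concept, and a resource, so the three sets are simply the distinct values seen in each position.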

Page 6: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Reducing the Tripartite Graph to Bipartite Structures

• The User-Concept Model
  − Users are modeled based on their tag usage
  − φ denotes the matrix of topic distributions
    − multinomial distribution over N concepts
    − T topics being drawn independently
  − θ: the matrix of user-specific mixture weights for these T topics
• Captures users’ latent interests
• Ignores resources
• Ignores the social aspect of tagging
• The User-Resource Model
  − Resources become vocabulary terms
• Tags are ignored
• Ignores the social aspect of tagging
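As a rough sketch of the two reductions, each user can be turned into a bag of tags (User-Concept) or a bag of resources (User-Resource), which is the input a topic model would then consume. The helper name `project` and the toy triples are hypothetical, and the topic model fitting itself is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (user, tag, resource) triples.
triples = [
    ("alice", "rock", "artist_1"),
    ("alice", "rock", "artist_2"),
    ("alice", "indie", "artist_2"),
    ("bob", "jazz", "artist_3"),
]

def project(triples, keep):
    """Reduce triples to per-user bags of tags ('concept') or resources ('resource')."""
    index = {"concept": 1, "resource": 2}[keep]
    docs = defaultdict(Counter)
    for triple in triples:
        user = triple[0]
        docs[user][triple[index]] += 1  # the dropped dimension is ignored
    return dict(docs)

user_concept = project(triples, "concept")    # vocabulary = tags
user_resource = project(triples, "resource")  # vocabulary = resources
```

Either projection discards one vertex set, which is why the slide notes that the User-Concept model ignores resources and the User-Resource model ignores tags.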

Page 7: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

The User-Resource-Concept Model

• Topic-based representation
• Model both resources & users’ interests
• Multiple users may annotate resource r
• For each tag a user is chosen uniformly at random
• Each user is associated with a distribution over latent topics θ
• A topic is chosen from a distribution over topics specific to that user
• The tag is generated from the chosen topic
  − φ_t: probability distribution of tags for topic t
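The generative steps listed above can be simulated with a short sketch. It assumes θ and φ are already known and stored as plain Python structures; the toy values, the function name `generate_tags`, and the fixed random seed are illustrative assumptions, and no inference procedure is shown.

```python
import random

# Hypothetical parameters: theta[user] is a distribution over T = 2 topics,
# phi[t] is a distribution over tags for topic t.
theta = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
phi = [{"rock": 0.7, "indie": 0.3}, {"jazz": 1.0}]

def generate_tags(annotators, theta, phi, n_tags, rng):
    """Sample (user, topic, tag) triples for one resource."""
    samples = []
    for _ in range(n_tags):
        user = rng.choice(annotators)  # user chosen uniformly at random
        topic = rng.choices(range(len(theta[user])), weights=theta[user])[0]
        vocab = list(phi[topic])
        tag = rng.choices(vocab, weights=[phi[topic][w] for w in vocab])[0]
        samples.append((user, topic, tag))
    return samples

rng = random.Random(0)
samples = generate_tags(["alice", "bob"], theta, phi, 20, rng)
```

With these toy parameters every tag attributed to "alice" comes from topic 0 and every tag attributed to "bob" from topic 1, which makes the user-to-topic-to-tag chain easy to inspect.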

Page 8: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Recommendation

• Tag recommendation
  − Automatic annotation enhancement
  − Search improvement
• Clustering
  − Community detection
  − Organization of resources/tags in categories
• Navigation and visualization
  − Social browsing
• Next we focus on recommending people

Page 9: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Social Link Recommendation Using Latent Semantics & Network Structure

• Classification based on latent interests
  − Measure “tastes” distance with respect to latent topics distribution
  − Pointwise squared distance between feature vectors of users u and v:
    $F(u,v) = [(\Theta(1,u) - \Theta(1,v))^2, \ldots, (\Theta(k,u) - \Theta(k,v))^2]$
    − F(u,v) = 0 ⇒ u, v have identical distributions
    − F(u,v) > 0 ⇒ distributions diverge
  − Other measures to consider
    − Kullback-Leibler (KL) divergence
    − Cosine similarity
• Objective: minimize the distance between linked users
• Focus on topical homophily
  − Ignore network effects
• Prior work uses network proximity as an indicator of link formation
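A minimal sketch of the pointwise squared-distance feature vector, together with the two alternative measures the slide mentions, might look as follows. It assumes each user's topic distribution is available as a plain list of probabilities; the function names and the small smoothing constant `eps` in the KL divergence are assumptions for illustration.

```python
import math

def squared_diff_features(theta_u, theta_v):
    """F(u, v): one squared difference per latent topic."""
    return [(pu - pv) ** 2 for pu, pv in zip(theta_u, theta_v)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps is an assumed smoothing term that avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm if norm else 0.0

identical = squared_diff_features([0.5, 0.5], [0.5, 0.5])
disjoint = squared_diff_features([1.0, 0.0], [0.0, 1.0])
```

Here `identical` is all zeros, matching the "identical distributions" case, while `disjoint` has positive entries because the two users place their mass on different topics.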

Page 10: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Social Link Recommendation Using Latent Semantics & Network Structure

• Latent Topics & Local Structure
  − CN(u,v) = common neighbors between users u and v
    − Simplicity and computational efficiency
  − $F(u,v) = [\sigma(u,v), CN(u,v)]$
  − Latent topics similarity:
    $\sigma(u,v) = \frac{\sum_t \Theta(t,u)\,\Theta(t,v)}{\sqrt{\sum_t \Theta(t,u)^2}\,\sqrt{\sum_t \Theta(t,v)^2}}$
• Latent Topics & Global Structure
  − SD(u,v) = shortest distance between users u and v
  − $F(u,v) = [\sigma(u,v), SD(u,v)]$
• Non-separable training set ⇒ inefficient classifiers
• Aggregation strategy
  − Reduce the number of training samples
  − Produce more efficient classifiers
  − Average latent similarity of user pairs with k common neighbors:
    $\sigma_{avg}(k) = \frac{1}{|\{p : k_p = k\}|} \sum_{p : k_p = k} \sigma(p)$
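The structural features and the aggregation strategy can be sketched as below. The sketch assumes an undirected friendship graph stored as a dict of neighbor sets and treats each training sample as a (similarity, common-neighbor count) pair; the function names and toy graph are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical undirected friendship graph as adjacency sets.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

def common_neighbors(adj, u, v):
    """CN(u, v): number of users adjacent to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))

def shortest_distance(adj, u, v):
    """SD(u, v): hop count found by breadth-first search, or None if unreachable."""
    if u == v:
        return 0
    seen, queue = {u}, deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == v:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def aggregate_by_common_neighbors(samples):
    """Map each common-neighbor count k to the average similarity of pairs with that k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for sigma, k in samples:
        sums[k] += sigma
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

averaged = aggregate_by_common_neighbors([(0.2, 1), (0.4, 1), (0.9, 3)])
```

Aggregation replaces all pairs sharing the same k with a single averaged training point, which is how the number of training samples shrinks.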

Page 11: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Experimental Analysis

• Objectives
  − Ability to uncover subliminal collective knowledge
  − Evaluate performance of “people” recommendation
• Setting
  − 2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
• Real-world dataset
  − Last.fm online music system
    − social relationships
    − tagging information
    − music artist listening information
  − Statistics
    − 1,892 users
    − 25,434 directed user friend relations
    − 17,632 artists (UR model vocabulary size)
    − 92,834 user-listened-artist relations
    − 11,946 unique tags (UC and URC vocabulary size)
    − 186,479 annotations (tuples <user, tag, artist>)

Page 12: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Sample Topics


Page 13: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Predictive Power

• Evaluate ability to predict tags/resources on new users
  − Perplexity
• Split dataset into two disjoint sets
  − 90% for training
• Lower perplexity indicates better generalization
• URC better overall
  − Exploits more information
• UC
  − Organizes tags in “clusters”
• UR
  − Inferior quality due to noise
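Perplexity on held-out annotations can be computed with a sketch like the one below, using the standard definition (the exponential of the negative average log-likelihood). It assumes θ has already been estimated for the held-out users and that φ assigns non-zero probability to every held-out tag; how the slides' models handle unseen users is not specified here, and all names are illustrative.

```python
import math

def tag_probability(theta_u, phi, tag):
    """p(tag | user) = sum over topics of theta_u[t] * phi[t][tag]."""
    return sum(theta_u[t] * phi[t].get(tag, 0.0) for t in range(len(theta_u)))

def perplexity(held_out, theta, phi):
    """exp of the negative mean log-likelihood over held-out (user, tag) pairs."""
    log_likelihood = 0.0
    for user, tag in held_out:
        # Raises ValueError if a tag has zero probability (phi is assumed smoothed).
        log_likelihood += math.log(tag_probability(theta[user], phi, tag))
    return math.exp(-log_likelihood / len(held_out))

# Toy check: a uniform model over 4 tags should have perplexity close to 4.
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
toy_theta = {"u": [0.5, 0.5]}
toy_phi = [uniform, uniform]
toy_perplexity = perplexity([("u", "a"), ("u", "c"), ("u", "d")], toy_theta, toy_phi)
```

A model that concentrates probability on the tags users actually apply yields a perplexity below the vocabulary size, which is why lower values indicate better generalization.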

Page 14: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Recommendation of Social Ties

• Split dataset into two disjoint sets
  − 10%, 25%, 50%, 75% for training, rest for testing
• Evaluation process
  − Randomly sample 12,716 pairs of users
  − 50% true links, 50% negative samples
  − Compute similarity of user pairs
  − Sort users in decreasing order of similarity
  − Add links between users with highest similarity
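The evaluation process above can be approximated with the following sketch: draw a balanced sample of linked and unlinked pairs, rank pairs by similarity, and score the top of the ranking. Function names, the toy data, and the use of precision and recall at a cutoff k are assumptions for illustration; the slides do not state the exact cutoff.

```python
import random

def sample_balanced_pairs(users, links, n_pairs, rng):
    """Draw n_pairs/2 true links and n_pairs/2 unlinked pairs.

    `links` is a set of (u, v) tuples with u < v; the loop assumes enough
    unlinked pairs exist to fill the negative half.
    """
    half = n_pairs // 2
    positives = rng.sample(sorted(links), half)
    negatives = set()
    while len(negatives) < half:
        u, v = rng.sample(users, 2)
        pair = (u, v) if u < v else (v, u)
        if pair not in links:
            negatives.add(pair)
    return [(p, True) for p in positives] + [(p, False) for p in sorted(negatives)]

def precision_recall_at_k(scored_pairs, k):
    """scored_pairs: list of (similarity, is_true_link); predict links for the top k."""
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    predicted = ranked[:k]
    true_positives = sum(1 for _, label in predicted if label)
    total_positives = sum(1 for _, label in scored_pairs if label)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall

toy_users = ["a", "b", "c", "d"]
toy_links = {("a", "b"), ("c", "d")}
toy_pairs = sample_balanced_pairs(toy_users, toy_links, 4, random.Random(0))
toy_scores = [(0.9, True), (0.8, False), (0.7, True), (0.1, False)]
```

In the setting described on the slide, `n_pairs` would be 12,716, giving 6,358 true links and 6,358 negative samples.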

Page 15: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Recommendation of Social Ties

• Latent Topics & Shortest Distance
  − Aggregates the training similarity values of all true links in a single point
  − Least effective
• Ensemble achieves best precision
• Overfitting for training size > 50%
• Recall drops as dataset size increases

[Figure panels: Latent Topics & Local Structure; Latent Topics; Ensemble]

Page 16: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

How about High Class Imbalance?

• In social media, number of true links << absent links
• High performance for both classes
  − True negatives easier to classify correctly
  − Degradation in performance for true positives
• Reasonable results for practical purposes

[Figure panels: Latent Topics & Local Structure; Latent Topics; Ensemble]

Page 17: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Comparison to Tag-based similarity metrics

• Baselines
  − Cosine Similarity (CS)
  − Maximal Information Path (MIP)
• Evaluation criterion
  − Area under the receiver operating characteristic curve (AUC)
• Baselines' AUC
  − Computed over the complete dataset
  − Biases the evaluation in favor of the baselines
  − CS AUC = 0.6087
  − MIP AUC = 0.6256
• Same evaluation process as before
• Compute performance lift
  − % change over best performing baseline
  − Positive % denotes improvement
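A small sketch of the evaluation criterion and the lift computation follows. The AUC is computed by its pairwise definition (the chance that a true link is scored above a non-link), and lift is taken as the percentage change relative to the best baseline AUC; the function names are assumptions, while the two baseline values come from the slide.

```python
BASELINE_AUC = {"CS": 0.6087, "MIP": 0.6256}  # values reported on the slide

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def performance_lift(scheme_auc, baseline_aucs):
    """Percentage change over the best performing baseline; positive means improvement."""
    best = max(baseline_aucs)
    return 100.0 * (scheme_auc - best) / best

no_lift = performance_lift(0.6256, BASELINE_AUC.values())
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; a rank-based formulation would be preferable on the full set of sampled pairs.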

Page 18: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Comparison to Tag-based similarity metrics

• Not all schemes can beat the baseline
  − For 10% training data, ≤10% AUC loss
  − But significant speedup due to minimal training dataset
• Latent Topics & Local Structure scheme consistently better

[Figure: x-axis: training dataset size; panels: Latent Topics & Local Structure; Latent Topics]

Page 19: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

Concluding Remarks

• Three generative models of tripartite graphs in social tagging systems
• Modeling of users’ interests in a latent space over resources and metadata
• Limitations
  − Ignore several aspects of real-world annotation process, such as topic correlation and user interaction
• Achieve great performance in the recommendation task
  − Accurate predictors of social ties in conjunction with structural evidence
  − Proposed aggregation strategy to reduce number of training samples
• Future work
  − Incorporate other types of resources
  − Automatically identify most discriminative latent topics and discard uninformative resources and metadata

Page 20: Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media

• Questions? [email protected]

Thank you!
