Pharos: Social Map Based Recommendation for Content-Centric Social Websites

Upload: gu-wendong

Post on 05-Dec-2014

Page 1: Pharos Social Map Based Recommendation For Content Centric Social Websites

IBM China Research Laboratory

IBM Research - China

Presenter: Shiwan Zhao ([email protected])

Pharos Team:

Advisor: Michelle Zhou, Rongyao Fu, Changyan Chi

Social Map Based Recommendation for Content-Centric Social Websites

赵石顽

袁泉

张夏天

郑文涛

Page 2: Pharos Social Map Based Recommendation For Content Centric Social Websites

About me

1993~1998 – B.S. Computer Science, Tsinghua University

1998~2000 – M.S. Computer Science, Tsinghua University

2000~now – IBM Research - China

2007~now – Focus on recommendation technologies

Page 3: Pharos Social Map Based Recommendation For Content Centric Social Websites

Agenda

Part 1:

– Problem & challenges

– Pharos solution overview

– Demo

Part 2:

– Some technology details

Page 4: Pharos Social Map Based Recommendation For Content Centric Social Websites

Problem

Content-centric social websites (e.g., forums, wikis, and blogs) have flourished with the exponential growth of user-generated information

– Overwhelming amount

– Evolving over time

– Not well organized

It is hard for users, especially new users, to grasp what’s out there and then find the information that interests them

Page 5: Pharos Social Map Based Recommendation For Content Centric Social Websites

Example

A blog website contains a huge amount of dynamically evolving content (blog entries), yet does not provide effective navigation approaches

– Search

• Useful when users have well-defined goals

– Recent entries

– Top entries by

• most comments

• most ratings

• most visits

– Featured blog entries

– Tag cloud

– …

Without guidance, it is like looking for needles in a haystack: novice users cannot find anything interesting, leave BlogCentral quickly (low stickiness), and do not come back

Page 6: Pharos Social Map Based Recommendation For Content Centric Social Websites

Existing solutions & challenges

Researchers have developed recommender systems to solve this information overload problem

– E.g. Blog/News/Webpage recommender

However, current recommenders must address two challenges:

– It is difficult to make effective recommendations for new users (the cold start problem) due to the lack of user information

– It is difficult to explain recommendation rationales to end users to make the recommendations more trustworthy

Page 7: Pharos Social Map Based Recommendation For Content Centric Social Websites

Pharos Solution

Dynamically create a social map that helps users find out who's talking about what in an online site.

Social map creation

– Model and summarize time-sensitive user behaviors on content-centric online sites as a set of “latent communities”

Social map based recommendations

– Provide social landmarks for new users to jump start

– Provide a personalized social map for experienced users to effectively navigate the community

Page 8: Pharos Social Map Based Recommendation For Content Centric Social Websites

Demo screenshot

[Screenshot: social map with people labels Tom, Alice, Michael, Steve, John]

Page 9: Pharos Social Map Based Recommendation For Content Centric Social Websites

Agenda

Part 1:

– Problem & challenges

– Pharos solution overview

– Demo

Part 2:

– Some technology details

Page 10: Pharos Social Map Based Recommendation For Content Centric Social Websites

Pharos Overview

[Architecture diagram. Labels: Triggers (explicit, implicit); Content Modeling; Behavior Mining; Info item (page, fragment); People (reference to Bluepages, URL); Community (latent, dynamic community); User behavior on content; Social Map along a Time axis; Recommendation Algorithms; Visual Recommendation Explanations; target user; * Multi-faceted recommendation]

Time-sensitive social map as recommendation context

Page 11: Pharos Social Map Based Recommendation For Content Centric Social Websites

Pharos Technical Focus

[Architecture diagram. Labels: Content Modeling; Behavior Mining; User behavior on content; Social Map along a Time axis; Recommendation Algorithms; Visual Recommendation Explanations; target user]

1. Latent community extraction

2. Community/item/people ranking

3. Community summary

Page 12: Pharos Social Map Based Recommendation For Content Centric Social Websites

Latent community extraction

Three approaches

– Directly model user-content relationships by using co-clustering methods

– Group people first, then find the associated content

– Group content first, then find the associated people

Page 13: Pharos Social Map Based Recommendation For Content Centric Social Websites

Approach 1: time-elastic co-clustering

How long a time window should we use to mine the communities?

[Figure: community maps along a time axis (April 2009), illustrating time-elastic ad hoc community detection. How long is right?]

GraphScope: Parameter-free Mining of Large Time-evolving Graphs, Jimeng Sun, et al. KDD’07

Page 14: Pharos Social Map Based Recommendation For Content Centric Social Websites

Input Data – Graph Stream

User actions as a stream

[Figure: user actions plotted along a time axis, then split into frames]

Split the click stream into many small atomic time frames

The click stream data of one frame can be represented by a user-item matrix (graph).

– In the matrix, 1 means one interaction between a user and an item.
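The framing step above can be sketched in a few lines of Python. This is a minimal illustration, not the Pharos implementation: the `(timestamp, user, item)` tuple layout, the function names, and the fixed frame length in seconds are all assumptions made for the example.

```python
from collections import defaultdict


def split_into_frames(clicks, frame_seconds):
    """Group (timestamp, user, item) clicks into consecutive time frames.

    Frames without any click are skipped in this sketch.
    """
    frames = defaultdict(list)
    for timestamp, user, item in clicks:
        frames[int(timestamp // frame_seconds)].append((user, item))
    return [frames[key] for key in sorted(frames)]


def to_matrix(frame, users, items):
    """Build the binary user-item matrix of one frame (1 = an interaction)."""
    user_index = {user: row for row, user in enumerate(users)}
    item_index = {item: col for col, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item in frame:
        matrix[user_index[user]][item_index[item]] = 1
    return matrix


# Hypothetical click stream: (seconds since start, user, item)
clicks = [(0, "u1", "a"), (30, "u2", "b"), (70, "u1", "b")]
frames = split_into_frames(clicks, frame_seconds=60)
matrices = [to_matrix(frame, ["u1", "u2"], ["a", "b"]) for frame in frames]
```

Every matrix shares the same user and item ordering, so consecutive frames can be compared cell by cell.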

Page 15: Pharos Social Map Based Recommendation For Content Centric Social Websites

Approach

Two steps

– Co-cluster the graphs

– Decide whether a newly arrived graph should be merged with the current segment or start a new segment

Based on the MDL (Minimum Description Length) of graphs

– MDL is the limit to which the graphs can be compressed

– Decide whether to merge or split a segment

• If compressing the graphs together saves more encoding cost than compressing them separately, we merge the new graph with the current segment.

• Otherwise, we start a new segment with the new graph.
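The merge-or-split rule can be illustrated with a simplified encoding cost. The sketch below is a stand-in, not GraphScope's exact encoding: it assumes the row and column partitions are already given (GraphScope searches for them), charges each co-cluster block its size times the binary entropy of its density, and adds a flat cost for writing down the partition. All function names are hypothetical.

```python
import math


def entropy_bits(p):
    """Binary entropy in bits; 0 for a pure (all 0s or all 1s) block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def encoding_cost(graphs, partition):
    """Simplified cost (bits) of encoding graphs under one shared partition."""
    row_labels, col_labels = partition
    k, l = max(row_labels) + 1, max(col_labels) + 1
    ones = [[0] * l for _ in range(k)]
    cells = [[0] * l for _ in range(k)]
    for graph in graphs:
        for i, row in enumerate(graph):
            for j, value in enumerate(row):
                a, b = row_labels[i], col_labels[j]
                ones[a][b] += value
                cells[a][b] += 1
    data_bits = sum(
        cells[a][b] * entropy_bits(ones[a][b] / cells[a][b])
        for a in range(k) for b in range(l) if cells[a][b]
    )
    # Assumed flat cost of storing the cluster label of every row and column.
    partition_bits = len(row_labels) * math.log2(k) + len(col_labels) * math.log2(l)
    return data_bits + partition_bits


def should_merge(segment, segment_partition, new_graph, new_partition):
    """Merge when encoding everything together is no more costly than apart."""
    together = encoding_cost(list(segment) + [new_graph], segment_partition)
    apart = (encoding_cost(segment, segment_partition)
             + encoding_cost([new_graph], new_partition))
    return together <= apart


blocks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
flipped = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
partition = ([0, 0, 1, 1], [0, 0, 1, 1])
same_structure = should_merge([blocks], partition, blocks, partition)
changed_structure = should_merge([blocks], partition, flipped, partition)
```

A graph with the same block structure merges because the partition is paid for only once, while a graph whose structure flipped makes the pooled blocks noisy and starts a new segment.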

Page 16: Pharos Social Map Based Recommendation For Content Centric Social Websites

Pros and cons

Pros

– Clusters users and items at the same time

– Parameter free

• No need to specify the number of clusters

– Automatically decides the size of the time window

Cons

– Fixed graph size

• All graphs must have the same size (rows and columns)

• Can’t handle new users and items

– Can’t handle large-scale graphs

– Can’t guarantee the optimal result

– Results on very sparse graphs are not very good

• Communities don’t make sense

• Our data is extremely sparse (< 0.1%)

Page 17: Pharos Social Map Based Recommendation For Content Centric Social Websites

Approach 2: evolutionary spectral clustering for user clustering

[Figure: community map in the BlogCentral domain along a time axis: Jan 2009, Feb 2009, Mar 2009, Apr 2009]

Discover communities within a time window

– Get high-quality clustering in each time window

Model community evolution over a sequence of time windows

– Make the evolution between time windows smooth

Page 18: Pharos Social Map Based Recommendation For Content Centric Social Websites

Evolutionary framework

Basic idea

– Cost function: Cost = α*CS + β*CT

• Snapshot cost (CS) measures the snapshot quality of the current clustering result with respect to the current data features

• Temporal cost (CT) measures the temporal smoothness in terms of the goodness-of-fit of the current clustering result with respect to either historic data features or historic clustering results

Two evolutionary frameworks

– PCQ (preserving cluster quality): the current partition is applied to historic data, and the resulting cluster quality determines the temporal cost.

– PCM (preserving cluster membership): the current partition is directly compared with the historic partition, and the resulting difference determines the temporal cost.

– PCQ is our currently implemented framework

Evolutionary Spectral Clustering by Incorporating Temporal Smoothness, Yun Chi, et al. KDD’07
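To make the PCQ cost concrete, the sketch below scores a candidate partition against the current and the previous similarity matrix. It assumes the quality measure is the negated average association (trace of the similarity matrix minus the summed within-cluster association per member); the slides do not state which measure Pharos uses, and the weights `alpha` and `beta` are arbitrary illustration values.

```python
def negated_average_association(similarity, labels):
    """Trace(W) minus the average within-cluster association; lower is better."""
    trace = sum(similarity[i][i] for i in range(len(similarity)))
    association = 0.0
    for cluster in set(labels):
        members = [i for i, label in enumerate(labels) if label == cluster]
        within = sum(similarity[i][j] for i in members for j in members)
        association += within / len(members)
    return trace - association


def pcq_cost(current, previous, labels, alpha=0.8, beta=0.2):
    """Cost = alpha * CS + beta * CT, both evaluated on the SAME partition."""
    snapshot_cost = negated_average_association(current, labels)
    # PCQ: apply the current partition to historic data to get the temporal cost.
    temporal_cost = negated_average_association(previous, labels)
    return alpha * snapshot_cost + beta * temporal_cost


# Hypothetical user-user similarity with two clear groups: {0, 1} and {2, 3}
w = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
good = pcq_cost(w, w, [0, 0, 1, 1])
bad = pcq_cost(w, w, [0, 1, 0, 1])
```

A partition that fits both the current and the previous window scores lower than one that splits the groups, which is how the temporal term discourages abrupt changes between windows.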

Page 19: Pharos Social Map Based Recommendation For Content Centric Social Websites

Approach 3: LDA for content clustering

Latent Dirichlet Allocation (LDA), a probabilistic latent semantic model for topic analysis [Blei et al. 03]

LDA is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.

$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$$

Here θ is the topic mixture of a document, z_n is the topic of the n-th word w_n, and α and β are the model parameters.
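The generative story behind LDA can be sketched with the standard library alone. This only illustrates how one document is generated from a topic mixture; it is not an inference procedure and not the Pharos code. The function names, the toy topics, and the vocabulary indices are assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a topic mixture from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [draw / total for draw in draws]


def generate_document(alpha, topics, n_words, rng):
    """Generate word ids: pick a topic per word, then a word from that topic."""
    theta = sample_dirichlet(alpha, rng)
    topic_ids = range(len(topics))
    word_ids = range(len(topics[0]))
    words = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]    # topic of this word
        w = rng.choices(word_ids, weights=topics[z])[0]  # word from topic z
        words.append(w)
    return theta, words


# Two toy topics over a 4-word vocabulary (each row is a word distribution).
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
theta, words = generate_document([0.5, 0.5], topics, 20, random.Random(0))
```

Fitting LDA reverses this process: given only the words, it estimates the per-document mixtures and the per-topic word distributions, and the dominant topic of a document can then serve as its content cluster.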

Page 20: Pharos Social Map Based Recommendation For Content Centric Social Websites

Graphical Model of LDA

Page 21: Pharos Social Map Based Recommendation For Content Centric Social Websites

Latent community extraction - comparison

Co-clustering

– Does not work well for extremely sparse data (<0.1%)

Spectral clustering for users

– Most behaviors come from anonymous users, so it is difficult to distinguish users

– Topics are not concentrated within each community

* LDA for content clustering

– Users are more likely to be interested in content

Page 22: Pharos Social Map Based Recommendation For Content Centric Social Websites

Pharos Technical Focus

[Architecture diagram. Labels: Content Modeling; Behavior Mining; User behavior on content; Social Map along a Time axis; Recommendation Algorithms; Visual Recommendation Explanations; target user]

1. Latent community extraction

2. Item/people ranking

3. Community summary

Page 23: Pharos Social Map Based Recommendation For Content Centric Social Websites

Item/People Ranking

Authority-based ranking by context-sensitive PageRank, considering

– Time factor

– Context information, e.g., item attributes, report chain of people

$$PR(p_i) = (1 - d)\, cv_i + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$

where cv is the context vector (e.g., item attributes), d is the damping factor, M(p_i) is the set of nodes that link to p_i, and L(p_j) is the number of outgoing links of p_j.

[Figure: graph linking people (A, B, C, D) and blog entries (1, 2, 3, 4) with four kinds of authority edges]

– Authority from author to entry

– Authority from entry to author

– Authority from commenter/rater to entry

– Authority from visitor to entry

Influential people: active authors with high-quality entries

Influential entry: written by influential authors, highly visited / commented
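The ranking formula can be sketched as a power iteration in which the usual uniform teleport term is replaced by the context vector. This is a minimal sketch under assumptions: the graph, the node names, the uniform context weights, and the choice to redistribute the rank of nodes without outgoing links according to the context vector are all illustrative, and the time factor is not modeled.

```python
def context_pagerank(out_links, context, d=0.85, iterations=50):
    """PR(p_i) = (1 - d) * cv_i + d * sum(PR(p_j) / L(p_j)) by power iteration.

    out_links maps every node to the list of nodes it gives authority to;
    every target must also appear as a key.
    """
    nodes = list(out_links)
    total = sum(context[node] for node in nodes)
    cv = {node: context[node] / total for node in nodes}  # normalized context
    rank = dict(cv)
    for _ in range(iterations):
        updated = {node: (1 - d) * cv[node] for node in nodes}
        for source in nodes:
            targets = out_links[source]
            if targets:
                share = d * rank[source] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Assumption: a node without out-links spreads its rank by cv.
                for node in nodes:
                    updated[node] += d * rank[source] * cv[node]
        rank = updated
    return rank


# Hypothetical people-entry graph: authors and commenters point to entries,
# and entries point back to their authors.
graph = {
    "alice": ["entry1"],
    "bob": ["entry1", "entry2"],
    "entry1": ["alice"],
    "entry2": ["bob"],
}
ranks = context_pagerank(graph, {node: 1.0 for node in graph})
```

Raising the context weight of particular nodes, for example entries with a matching attribute, shifts authority toward them without changing the link structure.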

Page 24: Pharos Social Map Based Recommendation For Content Centric Social Websites

Pharos Technical Focus

[Architecture diagram. Labels: Content Modeling; Behavior Mining; User behavior on content; Social Map along a Time axis; Recommendation Algorithms; Visual Recommendation Explanations; target user]

1. Latent community extraction

2. Item/people ranking

3. Community summary

Page 25: Pharos Social Map Based Recommendation For Content Centric Social Websites

Community Summary & visualization

Visualization

– A bubble chart layout (used by ManyEyes) to pack the top-N communities tightly on the social map

• A bubble’s size is determined by the community’s ‘hotness’

– Inside each community, a Wordle layout is used to pack labels tightly

Community representative keyword extraction

– Modified TF/IDF

– Content topic modeling by LDA (Latent Dirichlet Allocation)
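Keyword extraction per community can be illustrated with plain TF-IDF, treating each community's pooled text as one document. The slides mention a modified TF/IDF without describing the modification, so the sketch below uses the textbook form as an assumption; the function name and the toy token lists are hypothetical, and every community is assumed to have at least one token.

```python
import math
from collections import Counter


def community_keywords(communities, top_n=3):
    """Rank each community's tokens by TF-IDF across all communities."""
    n_communities = len(communities)
    document_frequency = Counter()
    for tokens in communities.values():
        document_frequency.update(set(tokens))
    keywords = {}
    for name, tokens in communities.items():
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens))
            * math.log(n_communities / document_frequency[term])
            for term, count in term_counts.items()
        }
        # Sort by score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[name] = ranked[:top_n]
    return keywords


communities = {
    "c1": ["cloud", "server", "cloud", "ibm"],
    "c2": ["java", "jvm", "java", "ibm"],
}
labels = community_keywords(communities, top_n=2)
```

A term that appears in every community, such as "ibm" here, gets a zero score and drops out of the labels.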

Page 26: Pharos Social Map Based Recommendation For Content Centric Social Websites

Summary

Model, detect, and use a social map that summarizes user behavior on online sites to make accurate and trustworthy recommendations

Increase recommendation accuracy

– Helps with the “cold start” problem by providing new users with “social landmarks” of a social site to jump start their engagement

– Provides users with overall social awareness to compensate for recommendation inaccuracy

Enhance recommendation trustworthiness

– Explains recommendation results in the context of a social map

Interactive recommendation

– Users can navigate through the social map to find what they need

Page 27: Pharos Social Map Based Recommendation For Content Centric Social Websites

Thanks!