
Information Search in Social Networks (Informationssuche in sozialen Netzen)

Ralf Schenkel

Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

February 2, 2009 Perspektivenvorlesung

Social Tagging Networks

Definition: a social tagging network is a website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain a network of friends
• interact with friends

Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• LibraryThing (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)

Some Statistics

Flickr (as of Nov 2008):
• 3+ billion photos, 3 million new photos per day

Facebook (as of Nov 2008):
• 10+ billion photos, 30+ million new photos per day
• 120 million active users
• 150,000 new users per day

Myspace (as of Apr 2007):
• 135 million users (would be the 6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB of videos

StudiVZ.net (as of Nov 2008):
• 11 million users
• 300 million images, 1 million added daily

Huge volume of highly dynamic data

Showcase: librarything.com

[Screenshot callouts: Ratings, Tags, Books, Others]

librarything.com: Social Interaction

[Screenshot callouts: Explicit Friends, Similar Users, Comments]

librarything.com: Tag Clouds

librarything.com: Search

• Search results are independent of the querying user (and of the social context)
• Search is automatically expanded with similar tags (synonyms)

Librarything.com: Recommendations

• Recommendations depend on the user and tags (but not on the social context)
• Each recommendation comes with an explanation

Librarything.com: Explanations

Outline

• Search in Social Tagging Networks
  – Graph Model
  – Different Information Needs
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges

Querying Social Tagging Networks

[Figure: graph of users and the items they tagged. Tags shown on the tagging edges: travel, vldb, norway, harry potter, trip, icde, mexico, probability, data mining, foundations]

Information Need 1: Globally Popular

[Figure: the same tagging graph; for the query "harry potter", which of two candidate items should be returned?]

• Most frequently tagged items are "best"
• Tags by all users are equally important

Information Need 2: Similar Users

[Figure: the same tagging graph; for the query "travel", which of two candidate items should be returned?]

• Tags by users with similar tags/items ("brothers in spirit") are more important

Information Need 3: Trusted Friends

[Figure: tagging graph extended with friendship links; tags shown include travel, vldb, norway, icde, trip, mexico, harry potter, probability, selling, data mining, foundations. Which of two candidate items should be returned?]

• Tags by closely related and well-known users are more important

Towards Social-Aware Social Search

Search results may depend on
– the global popularity of items
– the spiritual context of the querying user (users with similar books and/or tags)
– the social context of the querying user (known and trusted friends)

Outline

• Search in Social Tagging Networks
• Effective Query Scoring
  – Quantifying Friendship Strengths
  – User-specific Scoring Functions
  – Experimental Evaluation
• Efficient Query Evaluation
• Summary & Further Challenges

Notation

U: set of users
T: set of tags
I: set of items

tags(u): tags used by user u
items(u): items tagged by user u
items(t): items tagged with tag t by at least one user

df(t): number of items tagged with tag t
tf_u(i,t): number of times user u tagged item i with tag t
tf(i,t): number of times item i was tagged with tag t

Quantifying Friendship Strengths

• Global "friendship" strength: $P_{global}(u,u') = \frac{1}{|U|}$
• Spiritual friendship strength
• Social friendship strength
• Integrated friendship strength

Spiritual Friendship Strength

$P_{spirit}(u,u')$ reflects the overlap in interests of u and u′.

[Figure: users u and u′ with overlapping tags, e.g. harry potter, wizard, deathly hallows, philosopher stone]

Several alternatives:

• based on overlap of tag usage:

$$P_{spirit}(u,u') \propto \frac{2\,|tags(u) \cap tags(u')|}{|tags(u)| + |tags(u')|}$$

• based on overlap of tagged items:

$$P_{spirit}(u,u') \propto \frac{2\,|items(u) \cap items(u')|}{|items(u)| + |items(u')|}$$

• overlap of behavior (tagging, searching, rating, …)

For all:

• $P_{spirit}(u,u) := 0$
• normalization such that $\sum_{u' \neq u} P_{spirit}(u,u') = 1$

(tags(u): tags used by user u; items(u): items tagged by user u)
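The tag-overlap variant of the spiritual friendship strength can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `dice` and `spiritual_strengths` and the toy tag sets are hypothetical, and the strength of a user to themselves is simply left out of the result (treated as 0).

```python
def dice(a: set, b: set) -> float:
    """Dice overlap 2|a ∩ b| / (|a| + |b|); 0.0 if both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def spiritual_strengths(user: str, tags_by_user: dict) -> dict:
    """Normalized tag-overlap strength of `user` to every other user."""
    raw = {
        other: dice(tags_by_user[user], tags)
        for other, tags in tags_by_user.items()
        if other != user  # the strength of a user to themselves is taken as 0
    }
    norm = sum(raw.values())
    # After normalization the strengths sum to 1 (if there is any overlap at all).
    return {other: (value / norm if norm else 0.0) for other, value in raw.items()}


# Hypothetical toy data, not taken from the talk.
tags_by_user = {
    "u": {"harry potter", "wizard", "travel"},
    "v": {"harry potter", "wizard"},
    "w": {"travel", "norway", "vldb"},
}
p_spirit = spiritual_strengths("u", tags_by_user)
```

In this toy data, v shares two of three tags with u and therefore receives a larger share of the normalized strength than w, who shares only one.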

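Graph-Based Friendship Strength

$P_{social}(u,u')$ is derived from the distance of u and u′ in the user network.

[Figure: friendship graph over users u1, …, u7]

• each edge $(u_i, u_{i+1})$ of the friendship graph has a weight $w(u_i, u_{i+1})$
• the weight $w(p)$ of a path $p = (u_{i_1}, \dots, u_{i_k})$ is derived from the weights $w(u_{i_j}, u_{i_{j+1}})$, $j = 1, \dots, k-1$, of its edges
• $P_{social}(u,u') \propto \frac{1}{\min_{p:\ \text{path from } u \text{ to } u'} w(p)}$
• set $P_{social}(u,u) := 0$
• normalization such that $\sum_{u' \neq u} P_{social}(u,u') = 1$

[Plot: P_social(u, u′) for u′ = u2, …, u7, compared with the uniform level 1/|U|]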
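A possible way to compute such a distance-based strength is a shortest-path search followed by normalization. The sketch below makes assumptions that the slide does not confirm: the weight of a path is taken to be the sum of its edge weights, all edge weights are positive, and the names `shortest_distances` and `social_strengths` as well as the toy graph are hypothetical.

```python
import heapq


def shortest_distances(graph: dict, source: str) -> dict:
    """Dijkstra: minimum summed edge weight from source to each reachable user."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def social_strengths(graph: dict, user: str) -> dict:
    """Strength proportional to 1 / shortest-path weight, normalized to sum 1."""
    dist = shortest_distances(graph, user)
    raw = {other: 1.0 / d for other, d in dist.items() if other != user and d > 0}
    norm = sum(raw.values())
    return {other: value / norm for other, value in raw.items()} if norm else {}


# Hypothetical friendship graph with unit edge weights.
graph = {
    "u1": {"u2": 1.0, "u4": 1.0},
    "u2": {"u1": 1.0, "u3": 1.0},
    "u3": {"u2": 1.0},
    "u4": {"u1": 1.0},
}
p_social = social_strengths(graph, "u1")
```

Direct friends (u2, u4) end up with twice the strength of the friend-of-a-friend u3; unreachable users get no entry, which corresponds to strength 0.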

Integrated Friendship Strength

Query-dependent mixture of
• spiritual friendship strength
• social friendship strength
• background model (global)

$$F(u,u') = \underbrace{\alpha\,P_{social}(u,u') + \beta\,P_{spirit}(u,u')}_{P_{int}(u,u')} + (1-\alpha-\beta)\,\frac{1}{|U|}$$

with $0 \le \alpha, \beta \le 1$ and $\alpha + \beta \le 1$.
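The mixture of social strength, spiritual strength, and the uniform background model can be written down directly. The sketch below assumes the two strengths are already available as numbers; the function name `integrated_strength` and the example values are hypothetical.

```python
def integrated_strength(p_social: float, p_spirit: float, num_users: int,
                        alpha: float, beta: float) -> float:
    """alpha * social + beta * spiritual + (1 - alpha - beta) * uniform background."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    return alpha * p_social + beta * p_spirit + (1 - alpha - beta) / num_users


# Hypothetical strengths for one pair of users in a network of 1000 users.
f_mixed = integrated_strength(p_social=0.4, p_spirit=0.25, num_users=1000,
                              alpha=0.2, beta=0.8)
f_global = integrated_strength(p_social=0.4, p_spirit=0.25, num_users=1000,
                               alpha=0.0, beta=0.0)
```

With alpha = beta = 0 the mixture falls back to the global strength 1/|U|, so every user's tags count equally.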

Excursion: Scoring in Text Retrieval

General scoring framework:

$$score(i,t) = tf(i,t) \cdot idf(t)$$

• tf(i,t): importance of t for item i (the more frequent, the better)
• idf(t): importance of t in the collection (the less frequent, the better)

Hand-tuned instance: Okapi BM25

$$score(i,t) = \frac{(k_1+1)\, tf(i,t)}{k_1 + tf(i,t)} \cdot \log\frac{|I| - df(t) + 0.5}{df(t) + 0.5}$$

Linear combination for query scores:

$$score(i, t_1 \dots t_n) = \sum_{j=1}^{n} score(i, t_j)$$
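As a small illustration, the BM25-style per-tag score and its linear combination over query tags might look as follows in Python. The default `k1 = 1.2` is a commonly used value and an assumption here (the talk does not state one); the function names are hypothetical.

```python
import math


def bm25_score(tf: float, df: int, num_items: int, k1: float = 1.2) -> float:
    """Saturating tf component times a log-odds style idf component."""
    tf_part = (k1 + 1) * tf / (k1 + tf)
    idf_part = math.log((num_items - df + 0.5) / (df + 0.5))
    return tf_part * idf_part


def query_score(tfs: list, dfs: list, num_items: int, k1: float = 1.2) -> float:
    """Sum of the per-tag scores of one item over all query tags."""
    return sum(bm25_score(tf, df, num_items, k1) for tf, df in zip(tfs, dfs))


# Hypothetical numbers: a collection of 1000 items and a two-tag query.
rare = bm25_score(tf=1, df=10, num_items=1000)
common = bm25_score(tf=1, df=100, num_items=1000)
total = query_score(tfs=[1, 2], dfs=[10, 100], num_items=1000)
```

The tf component saturates at k1 + 1, so tagging an item many more times with the same tag yields diminishing gains, while rare tags contribute more than common ones.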

Towards a User-specific Score

$$tf(i,t) = \sum_{u \in U} tf_u(i,t) = |U| \sum_{u \in U} \frac{1}{|U|}\, tf_u(i,t)$$

where $\frac{1}{|U|}$ is the global friendship strength.

Convert into user-specific social frequency:

$$sf_u(i,t) = |U| \sum_{u' \in U} F(u,u')\, tf_{u'}(i,t)$$

Compute user-specific social score [SIGIR 2008]:

$$score_u(i,t) = \frac{(k_1+1)\, sf_u(i,t)}{k_1 + sf_u(i,t)} \cdot \log\frac{|I| - df(t) + 0.5}{df(t) + 0.5}$$
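The step from the global tag frequency to the user-specific social frequency can be illustrated with a few lines of Python. The sketch assumes the friendship strengths F(u,u′) are given as a dictionary; the names `social_frequency` and `social_score`, the value `k1 = 1.2`, and the toy data are hypothetical.

```python
import math


def social_frequency(strength: dict, tf_by_user: dict, num_users: int) -> float:
    """|U| times the strength-weighted sum of per-user tag frequencies."""
    return num_users * sum(strength.get(v, 0.0) * tf for v, tf in tf_by_user.items())


def social_score(sf: float, df: int, num_items: int, k1: float = 1.2) -> float:
    """BM25-style score with the social frequency in place of tf."""
    return (k1 + 1) * sf / (k1 + sf) * math.log((num_items - df + 0.5) / (df + 0.5))


# Toy network with four users; a and b tagged the item once each.
tf_by_user = {"a": 1, "b": 1}
uniform = {v: 0.25 for v in "abcd"}                  # global strength 1/|U|
skewed = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}    # querying user trusts a most

sf_uniform = social_frequency(uniform, tf_by_user, num_users=4)
sf_skewed = social_frequency(skewed, tf_by_user, num_users=4)
```

With uniform strengths the social frequency equals the plain tf(i,t) = 2; with strengths skewed toward a user who tagged the item, it grows, and so does the resulting score.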

Including Tag Expansion

Problem: users use different tags for similar things → poor recall (missing relevant results)

Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, MMCI, UdS, Saarland University, …

Solution:
1. Define a notion of similar tags
2. Expand queries with similar tags
3. Modify the scoring function for expanded queries

Heuristics for finding similar tags

Co-occurrence heuristics: tags t1 and t2 are similar if they occur (almost) always together

$$sim(t_1,t_2) = \frac{2\,|items(t_1) \cap items(t_2)|}{|items(t_1)| + |items(t_2)|}$$

Specialization heuristics: tag t2 is a specialization of t1 if t1 occurs (almost) whenever t2 occurs

$$sim(t_1,t_2) = P[t_1 \mid t_2] = \frac{|items(t_1) \cap items(t_2)|}{|items(t_2)|}$$

Example: t1 = Europe, t2 = Germany
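Both heuristics reduce to simple set operations on the item sets of the two tags. The following sketch uses hypothetical function names and made-up item ids for the Europe/Germany example.

```python
def cooccurrence_sim(items_t1: set, items_t2: set) -> float:
    """Symmetric overlap 2|A ∩ B| / (|A| + |B|) of the two item sets."""
    total = len(items_t1) + len(items_t2)
    return 2 * len(items_t1 & items_t2) / total if total else 0.0


def specialization_sim(items_t1: set, items_t2: set) -> float:
    """Estimate of P[t1 | t2]: share of t2's items that also carry t1."""
    return len(items_t1 & items_t2) / len(items_t2) if items_t2 else 0.0


# Made-up item ids: every item tagged Germany is also tagged Europe.
europe = {1, 2, 3, 4}
germany = {2, 3}

sym = cooccurrence_sim(europe, germany)
germany_given = specialization_sim(europe, germany)   # P[Europe | Germany]
europe_given = specialization_sim(germany, europe)    # P[Germany | Europe]
```

The specialization measure is asymmetric: it is 1.0 for (Europe, Germany) but only 0.5 in the other direction, which is what distinguishes a specialization from a mere co-occurrence.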

Scoring Expanded Queries

Naive approach: for query tag t, add all similar tags t′ with sim(t,t′) > δ to the query
• "international crime" expanded by "mafia camorra yakuza …"
• But: "transportation disaster" expanded by "train car bus plane …": result quality drops due to topic drift

Better: auto-tuning incremental expansion. For query tag t, consider only the expansion with the highest combined score per item:

$$score(i,t) = \max_{t' \in T} sim(t,t') \cdot score(i,t')$$
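The max-based scoring of an expanded query tag can be sketched as follows. The sketch assumes that the candidate set contains the query tag itself with similarity 1.0 and that per-tag scores are available through a lookup function; `expanded_score`, `lookup`, and all numbers are hypothetical.

```python
def expanded_score(item: str, tag: str, similar_tags: dict, score) -> float:
    """Best similarity-weighted score of the item over the tag and its expansions.

    Assumption: the query tag itself is a candidate with similarity 1.0.
    """
    candidates = {tag: 1.0, **similar_tags.get(tag, {})}
    return max(sim * score(item, t2) for t2, sim in candidates.items())


# Hypothetical per-tag scores and tag similarities, for illustration only.
scores = {("book1", "crime"): 1.0, ("book1", "mafia"): 2.0, ("book2", "crime"): 1.5}
similar_tags = {"crime": {"mafia": 0.8, "yakuza": 0.6}}


def lookup(item: str, tag: str) -> float:
    return scores.get((item, tag), 0.0)


best_book1 = expanded_score("book1", "crime", similar_tags, lookup)
best_book2 = expanded_score("book2", "crime", similar_tags, lookup)
```

Because only the single best expansion counts per item, an item cannot accumulate score from many loosely related tags, which is one way to limit topic drift.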

Experimental Evaluation: Effectiveness

Systematic evaluation of result quality is difficult.

Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external information (e.g. DMOZ categories)
• Automated assessments from the context of the user
  – Items tagged by friends
  – Items tagged in the future

Prototype [VLDB/SIGIR 2008 demo]


Preliminary User Study

LibraryThing user study [Data Engineering Bulletin, June 2008]:
• 6 LibraryThing users with reasonably large library and friend sets
• 49 queries overall, such as "mystery magic", "wizard", "yakuza"
• Crawled (part of) LibraryThing: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]

NDCG[10] for different parameter settings; the table axes are α (social) and β (spiritual), and "-" marks combinations with α + β > 1:

        0.0     0.2     0.5     0.8     1.0
0.0     0.546   0.572   0.568   0.565   0.565
0.2     0.564   0.572   0.579   0.581   -
0.5     0.539   0.552   0.559   -       -
0.8     0.515   0.546   -       -       -
1.0     0.465   -       -       -       -

• Result quality is generally very high
• The combination of spiritual and social friends is best
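For readers unfamiliar with the quality measure, NDCG at cutoff 10 can be computed as below. This is one common formulation (gain divided by log2 of rank + 1, ideal ranking taken over the judged list itself); whether the study used exactly this variant is an assumption, and the function names are hypothetical.

```python
import math


def dcg(gains: list) -> float:
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains: list, k: int = 10) -> float:
    """DCG of the first k results, normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1])         # the only relevant item is ranked second
```

A value of 1.0 means the ranking is ideal with respect to the assessments; placing relevant items lower in the list reduces the value.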

Outline

• Search in Social Tagging Networks
• Effective Query Scoring
• Efficient Query Evaluation
  – Threshold Algorithms
  – ContextMerge
  – Experimental Evaluation
• Summary & Further Challenges

Algorithmic Overview

• Input: query q = {t1 … tn} for user u, α, β
• Output: k items with highest scores
• Goals:
  – Avoid computing all results
  – Minimize disk I/O and CPU load
  – Utilize precomputed information on disk

[Figure: querying user + "harry potter"]

Excursion: Threshold Algorithms for Text IR

Input:
• query q = {t1 … tn}
• lists L(t_p) with pairs <i, score(i,t_p)>, sorted by score(i,t_p)↓

Output: k items with highest aggregated score

Family of Threshold Algorithms:
• scan lists in parallel
• maintain partial candidate results with score bounds
• terminate as soon as top-k results are stable

Example: Top-1 for 2-term query (NRA)

Index lists, sorted by descending score:

L1: A: 0.9, G: 0.3, H: 0.3, I: 0.25, J: 0.2, K: 0.2, D: 0.15
L2: D: 1.0, E: 0.7, F: 0.7, B: 0.65, C: 0.6, A: 0.3, G: 0.2

Each candidate carries a score interval [worst score; best score]; min-k is the worst score of the current top-1 item.

• Read A: 0.9 from L1. Candidates: A [0.9; 1.9], unseen items [0.0; 1.9]. Top-1: A, min-k = 0.9.
• Read D: 1.0 from L2. Candidates: A [0.9; 1.9], D [1.0; 1.9], unseen items [0.0; 1.9]. Top-1: D, min-k = 1.0.
• Read G: 0.3 from L1. Candidates: A [0.9; 1.9], D [1.0; 1.3], G [0.3; 1.3], unseen items [0.0; 1.3].
• Read E: 0.7 from L2. Candidates: A [0.9; 1.6], D [1.0; 1.3], G [0.3; 1.0], unseen items [0.0; 1.0]. No unseen item can exceed min-k = 1.0, so no more new candidates are considered, and G can no longer win either.
• Continue scanning. The bounds tighten: D from [1.0; 1.3] to [1.0; 1.25] to [1.0; 1.2], and A from [0.9; 1.6] to [0.9; 1.55] to [0.9; 1.5].
• Read A: 0.3 from L2. A is now fully known: 0.9 + 0.3 = 1.2, so A [1.2; 1.2]. Top-1: A, min-k = 1.2. D [1.0; 1.2] can no longer exceed min-k.

The algorithm safely terminates without reading the rest of the lists.
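The walk-through can be reproduced with a compact Python sketch of an NRA-style scan. It is a simplified illustration rather than a tuned implementation: it reads one entry from every list per round (the slides alternate between the lists), keeps all candidates instead of pruning them, and assumes non-negative scores. The name `nra_top_k` is hypothetical; the two lists are the ones from the example.

```python
import heapq


def nra_top_k(lists: list, k: int) -> list:
    """No-random-access top-k over score-sorted lists of (item, score) pairs.

    Returns (item, worst_score) pairs; worst_score is a lower bound that equals
    the exact score only for items that were seen in every list.
    """
    seen = {}   # item -> {list index: score read from that list}
    worst = {}  # item -> sum of the scores seen so far (lower bound)
    depth_limit = max((len(lst) for lst in lists), default=0)
    for depth in range(depth_limit):
        high = []  # score at the current scan position of each list
        for idx, lst in enumerate(lists):
            if depth < len(lst):
                item, score = lst[depth]
                seen.setdefault(item, {})[idx] = score
                high.append(score)
            else:
                high.append(0.0)  # an exhausted list contributes nothing more
        worst = {item: sum(parts.values()) for item, parts in seen.items()}
        best = {
            item: worst[item] + sum(h for idx, h in enumerate(high) if idx not in parts)
            for item, parts in seen.items()
        }
        top = heapq.nlargest(k, worst, key=worst.get)
        if len(top) == k:
            min_k = worst[top[-1]]
            rival_best = max((best[i] for i in seen if i not in top), default=0.0)
            # Stop when neither a known rival nor an unseen item can beat min-k.
            if min_k >= rival_best and min_k >= sum(high):
                break
    top = heapq.nlargest(k, worst, key=worst.get)
    return [(item, worst[item]) for item in top]


L1 = [("A", 0.9), ("G", 0.3), ("H", 0.3), ("I", 0.25), ("J", 0.2), ("K", 0.2), ("D", 0.15)]
L2 = [("D", 1.0), ("E", 0.7), ("F", 0.7), ("B", 0.65), ("C", 0.6), ("A", 0.3), ("G", 0.2)]
result = nra_top_k([L1, L2], k=1)
```

For these lists the scan identifies A as the top-1 item with score 0.9 + 0.3; the stopping test treats a tie between min-k and a rival's best score as stable.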

Can we reuse this here?

[Figure: one precomputed score list per tag, e.g. harry: 0.95, 0.85, 0.51 and travel: 0.87, 0.82, 0.69]

No, scores are specific to the querying user and the parameter setting!

[Figure: for every user, a separate list for "harry" per parameter setting, e.g. (α=0.2, β=0.5): 0.98, 0.84, 0.45; (α=0.0, β=0.8): 0.90, 0.89, 0.56; (α=1.0, β=0.0): 0.90, 0.89, 0.56; (α=0.5, β=0.5): 0.90, 0.86, 0.64; (α=0.0, β=1.0): 0.90, 0.89, 0.56]

The number of lists to precompute would explode! (#tags × #users × parameter space)

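Revisiting the Social Frequency

$$sf_u(i,t) = |U| \sum_{u' \in U} F(u,u')\, tf_{u'}(i,t)$$

$$= |U| \sum_{u' \in U} \left( (1-\alpha-\beta)\,\frac{1}{|U|} + P_{int}(u,u') \right) tf_{u'}(i,t)$$

$$= |U| \left( \frac{1-\alpha-\beta}{|U|} \sum_{u \in U} tf_u(i,t) + \sum_{u' \in U} P_{int}(u,u')\, tf_{u'}(i,t) \right)$$

$$= (1-\alpha-\beta)\, tf(i,t) + |U| \sum_{u' \in U} P_{int}(u,u')\, tf_{u'}(i,t)$$

The first term is independent of user u; the second term depends on user u:

$$|U| \sum_{u' \in U} P_{int}(u,u')\, tf_{u'}(i,t) = |U| \left( \alpha \sum_{u' \in U} P_{social}(u,u')\, tf_{u'}(i,t) + \beta \sum_{u' \in U} P_{spirit}(u,u')\, tf_{u'}(i,t) \right)$$

Compute sf_u(i,t) on the fly from tf(i,t), the friends of u, and their tagged documents.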
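The decomposition can be checked numerically: computing the social frequency directly from the mixed strengths and computing it as "global part plus user-specific part" should give the same value. The sketch below uses hypothetical function names and toy strengths; users missing from a strength dictionary are taken to have strength 0.

```python
def sf_direct(users, p_social, p_spirit, tf_by_user, alpha, beta):
    """|U| * sum over u' of F(u,u') * tf_u'(i,t), with F the three-way mixture."""
    n = len(users)
    total = 0.0
    for v in users:
        f = (alpha * p_social.get(v, 0.0) + beta * p_spirit.get(v, 0.0)
             + (1 - alpha - beta) / n)
        total += f * tf_by_user.get(v, 0)
    return n * total


def sf_decomposed(users, p_social, p_spirit, tf_by_user, alpha, beta):
    """(1 - alpha - beta) * tf(i,t) plus the user-specific part."""
    n = len(users)
    tf_global = sum(tf_by_user.values())  # user-independent, can be precomputed
    social = sum(p_social.get(v, 0.0) * tf for v, tf in tf_by_user.items())
    spirit = sum(p_spirit.get(v, 0.0) * tf for v, tf in tf_by_user.items())
    return (1 - alpha - beta) * tf_global + n * (alpha * social + beta * spirit)


# Toy data: querying user "u" in a network of four users.
users = ["u", "a", "b", "c"]
p_social = {"a": 0.5, "b": 0.5}
p_spirit = {"a": 0.2, "c": 0.8}
tf_by_user = {"u": 1, "a": 1, "c": 1}

direct = sf_direct(users, p_social, p_spirit, tf_by_user, alpha=0.2, beta=0.5)
split = sf_decomposed(users, p_social, p_spirit, tf_by_user, alpha=0.2, beta=0.5)
```

Only the second part depends on the querying user and the parameters, so the global tf(i,t) can be stored once while the rest is assembled at query time.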

Top-K in Social Networks: ContextMerge

Precomputed lists:
• ITEMS(t): pairs <i, tf(i,t)>, sorted by tf(i,t)↓
• USERITEMS(u′,t): pairs <i, tf_u′(i,t)>, unsorted
• FRIENDS(u): pairs <u′, F(u,u′)>, sorted by F(u,u′)↓

[Figure: example lists ITEMS(harry): 47, 32, 26; FRIENDS(u): 0.12, 0.10, 0.085, …; USERITEMS(u′, harry). Callout: already exist in systems]

ContextMerge

Adapted Threshold Algorithm for query u, t:
• Scan ITEMS(t) and FRIENDS(u) in parallel
• Pick the "best" list
  – If ITEMS(t): read the next entry
  – If FRIENDS(u): read USERITEMS(u′,t) for the next friend u′
  – Maintain candidates with bounds for min and max score and current results

Example with ITEMS(harry): 47, 32, 26 and FRIENDS(u): 0.12, 0.10, 0.085:
• After reading the first entry of ITEMS(harry): the user-independent part of sf is known (47) and gives the min score bound; the user-specific part is unknown (?) but bounded by |U|, which gives the max score bound.
• After reading the first friend (strength 0.12) and that friend's USERITEMS list: an item tagged by this friend has a user-specific part of 0.12·|U| and a still unknown (?) user-independent part; for the item with tf = 47, the bound on the user-specific part shrinks from |U| to 0.88·|U|.
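The bound maintenance for a single candidate might be sketched as follows. This is a rough illustration, not the published ContextMerge algorithm: the function name `sf_bounds`, its parameters, and the numbers are hypothetical, and it assumes that every user tags an item with a given tag at most once and that the strengths in FRIENDS(u) add up to `total_mass`.

```python
from typing import Optional, Tuple


def sf_bounds(tf_seen: Optional[int], tf_scan_bound: int, friend_part: float,
              scanned_mass: float, num_users: int,
              global_weight: float, total_mass: float = 1.0) -> Tuple[float, float]:
    """Return (min, max) bound on the social frequency of one candidate item.

    tf_seen:       tf(i,t) if the item was already read from ITEMS(t), else None
    tf_scan_bound: tf of the last entry read from ITEMS(t); unread items have at most this tf
    friend_part:   sum of strength * tf_u'(i,t) over the friends scanned so far
    scanned_mass:  sum of the strengths of all friends scanned so far
    Assumption: each unscanned friend tagged the item at most once.
    """
    tf_low = tf_seen if tf_seen is not None else 0
    tf_high = tf_seen if tf_seen is not None else tf_scan_bound
    known = num_users * friend_part
    unknown = num_users * max(total_mass - scanned_mass, 0.0)
    return (global_weight * tf_low + known,
            global_weight * tf_high + known + unknown)


# Hypothetical setting: |U| = 1000, alpha = 0.2, beta = 0.5, so the global part
# has weight 0.3 and the user-specific strengths add up to 0.7.
first = sf_bounds(47, 47, 0.0, 0.0, 1000, global_weight=0.3, total_mass=0.7)
after_friend = sf_bounds(47, 47, 0.0, 0.12, 1000, global_weight=0.3, total_mass=0.7)
via_friend = sf_bounds(None, 47, 0.12, 0.12, 1000, global_weight=0.3, total_mass=0.7)
```

Reading from ITEMS(t) tightens the user-independent part, reading from FRIENDS(u) tightens the user-specific part; a scheduler would pick whichever list is expected to shrink the bounds faster.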

Experimental Evaluation: Efficiency

• Testbed: 3 large crawls of real social networks
  – Flickr: 10 million pictures, ~50,000 users
  – Del.icio.us: ~175,000 bookmarks, ~12,000 users
  – LibraryThing: ~6.5 million books, ~10,000 users
• Queries:
  – 150 frequent tag pairs
  – for each query, pick a user with "enough" results & friends
• Abstract cost measure approximating disk load
• Baseline: full merge + sort

Experimental Evaluation: Efficiency (β = 0)

[Plot: cost as a function of α]

2-8 times better than baseline

Outline

• Search in Social Tagging Networks
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges

Summary

• Need for social-aware social search, supporting
  – global
  – social
  – spiritual
  information needs
• Social scoring
  – integrating global, collection, and social context
  – including dynamic tag expansion
• ContextMerge: scalable implementation

Further Challenges

• Meaningful & common benchmark
• Incremental maintenance for high dynamics
• Extend to ratings, user weights, item weights, …
• Extend to non-tags (like image features)
• Automatic query parameterization
• Meaningful explanations of results
• Exploit dynamics (hot topics, evolving groups, …)

Social-Aware Search & Recommendations at planet scale

Thank you.

Questions?
