Information Search in Social Networks

Ralf Schenkel

Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum


February 2, 2009 Perspektivenvorlesung

Social Tagging Networks

Definition: Social Tagging Network
Website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain network of friends
• interact with friends

Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• Librarything (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)


Some Statistics

Flickr: (as of Nov 2008)
• 3+ billion photos, 3 million new photos per day

Facebook: (as of Nov 2008)
• 10+ billion photos, 30+ million new photos per day
• 120 million active users
• 150,000 new users per day

Myspace: (as of Apr 2007)
• 135 million users (6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB videos

StudiVZ.net: (as of Nov 2008)
• 11 million users
• 300 million images, 1 million added daily

Huge volume of highly dynamic data


Showcase: librarything.com

[Screenshot of a librarything.com page with the areas labeled: Ratings, Tags, Books, Others]


librarything.com: Social Interaction

Explicit Friends

Similar Users

Comments


librarything.com: Tag Clouds


librarything.com: Search

Search results are independent of the querying user (and the social context)


librarything.com: Search

Search is automatically expanded with similar tags (synonyms)


Librarything.com: Recommendations

Recommendations depend on user and tags (but not on social context)


Librarything.com: Recommendations

Explanation for the recommendation


Librarything.com: Explanations



Outline

• Search in Social Tagging Networks
– Graph Model
– Different Information Needs

• Effective Query Scoring

• Efficient Query Evaluation

• Summary & Further Challenges


Querying Social Tagging Networks

[Figure: graph of users and their tagged items. Tag sets shown on the items: "travel, vldb"; "travel, norway"; "harry, potter" (several items); "travel, trip"; "travel, icde"; "travel, mexico"; "travel"; "probability, data mining, foundations"]


Information Need 1: Globally Popular

[Figure: the user/item/tag graph from "Querying Social Tagging Networks", with the query "harry potter" and two candidate items to choose between ("or ?")]

• Most frequently tagged items are "best"
• Tags by all users are equally important


Information Need 2: Similar Users

[Figure: the user/item/tag graph from "Querying Social Tagging Networks", with the query "travel" and two candidate items to choose between ("or ?")]

• Tags by users with similar tags/items ("brothers in spirit") are more important


Information Need 3: Trusted Friends

[Figure: user/item/tag graph extended with items tagged "probability, selling" and "probability, harry, potter", with two candidate items to choose between ("or ?")]

• Tags by closely related and well-known users are more important


Towards Social-Aware Social Search

Search results may depend on
– Global popularity of items
– Spiritual context of the querying user (users with similar books and/or tags)
– Social context of the querying user (known and trusted friends)


Outline

• Search in Social Tagging Networks

• Effective Query Scoring
– Quantifying Friendship Strengths
– User-specific Scoring Functions
– Experimental Evaluation

• Efficient Query Evaluation

• Summary & Further Challenges


Notation

• $U$: set of users
• $T$: set of tags
• $I$: set of items
• $tags(u)$: tags used by user $u$
• $items(u)$: items tagged by user $u$
• $items(t)$: items tagged with tag $t$ by at least one user
• $df(t)$: number of items tagged with tag $t$
• $tf_u(i,t)$: number of times user $u$ tagged item $i$ with tag $t$
• $tf(i,t)$: number of times item $i$ was tagged with tag $t$
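The notation above can be made concrete with a small sketch. It assumes the tagging data is available as a list of (user, item, tag) triples; the function name `build_statistics` and the dictionary layout are illustrative choices, not something given in the slides.

```python
from collections import defaultdict


def build_statistics(triples):
    """Derive tags(u), items(u), items(t), df(t), tf_u(i,t) and tf(i,t)
    from an iterable of (user, item, tag) tagging events."""
    tags_of_user = defaultdict(set)    # tags(u)
    items_of_user = defaultdict(set)   # items(u)
    items_of_tag = defaultdict(set)    # items(t)
    tf_user = defaultdict(int)         # tf_u(i,t), keyed by (u, i, t)
    tf = defaultdict(int)              # tf(i,t), keyed by (i, t)
    for user, item, tag in triples:
        tags_of_user[user].add(tag)
        items_of_user[user].add(item)
        items_of_tag[tag].add(item)
        tf_user[(user, item, tag)] += 1
        tf[(item, tag)] += 1
    # df(t) is the number of distinct items carrying tag t.
    df = {tag: len(items) for tag, items in items_of_tag.items()}
    return {
        "tags": dict(tags_of_user),
        "items_of_user": dict(items_of_user),
        "items_of_tag": dict(items_of_tag),
        "df": df,
        "tf_user": dict(tf_user),
        "tf": dict(tf),
    }


events = [
    ("u1", "book1", "harry"),
    ("u2", "book1", "harry"),
    ("u2", "book2", "travel"),
]
stats = build_statistics(events)
```

With these three events, `stats["tf"][("book1", "harry")]` is 2 because two users applied the tag, while `stats["df"]["harry"]` is 1 because only one item carries it.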


Quantifying Friendship Strengths

• Global "friendship" strength: $P_{global}(u,u') = \frac{1}{|U|}$

• Spiritual friendship strength

• Social friendship strength

• Integrated friendship strength


Spiritual Friendship Strength

$P_{spirit}(u,u')$ ~ overlap in interests of $u$ and $u'$
• overlap of behavior (tagging, searching, rating, …)

Several alternatives:

• based on overlap of tag usage:
$P_{spirit}(u,u') = \frac{2 \cdot |tags(u) \cap tags(u')|}{|tags(u)| + |tags(u')|}$

• based on overlap of tagged items:
$P_{spirit}(u,u') = \frac{2 \cdot |items(u) \cap items(u')|}{|items(u)| + |items(u')|}$

For all:
• $P_{spirit}(u,u)$: [value not legible in the transcript]
• normalization such that $\sum_{u' \neq u} P_{spirit}(u,u') = 1$

($tags(u)$: tags used by user $u$; $items(u)$: items tagged by user $u$)

[Figure: two users $u$ and $u'$ with example tags "harry potter", "wizard", "deathly hallows", "philosopher stone"]
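A minimal sketch of the tag-overlap alternative, assuming each user's tags are held in a Python set. The helper names `dice` and `spiritual_strengths` are hypothetical, and excluding `u` itself before normalizing is an assumption that mirrors the normalization condition on the slide.

```python
def dice(a, b):
    """Overlap 2*|a & b| / (|a| + |b|) of two sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def spiritual_strengths(tags_of_user, u):
    """Normalized overlap of u with every other user (values sum to 1
    unless u shares no tag with anybody, in which case all are 0)."""
    raw = {v: dice(tags_of_user[u], tags)
           for v, tags in tags_of_user.items() if v != u}
    total = sum(raw.values())
    if total == 0:
        return {v: 0.0 for v in raw}
    return {v: value / total for v, value in raw.items()}


tags_of_user = {
    "u": {"harrypotter", "wizard"},
    "v": {"harrypotter", "travel"},
    "w": {"travel"},
}
strengths = spiritual_strengths(tags_of_user, "u")
```

The same function works for the item-based alternative if `items(u)` sets are passed in instead of tag sets.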


Graph-Based Friendship Strength

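$P_{social}(u,u')$ ~ distance of $u$ and $u'$ in user network

[Figure: example friendship graph over users $u_1, \dots, u_7$]

• Edge weights $w(u_i, u_{i+1})$ (the slide relates them to 1; the relation symbol is not legible in the transcript)
• Path weight: $w(\langle u_{i_1}, \dots, u_{i_k} \rangle) = \sum_{j=1}^{k-1} w(u_{i_j}, u_{i_{j+1}})$
• $P_{social}(u,u') = \frac{1}{\min_{p:\ \text{path } u \to u'} w(p)}$
• set $P_{social}(u,u) := 0$
• normalization such that $\sum_{u' \neq u} P_{social}(u,u') = 1$

[Figure: plot of $P_{social}(\cdot,u')$ for $u' = u_2, \dots, u_7$, with the reference level $\frac{1}{|U|}$]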
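One way to read the graph-based definition is: compute the cheapest path from u to every reachable user, take the inverse, and normalize. The sketch below does that with Dijkstra's algorithm from the standard library. It assumes positive edge weights, that path weight is the sum of edge weights, and that unreachable users get no entry; the graph layout (dict of dicts) and function names are illustrative.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra over graph[node] = {neighbor: positive edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def social_strengths(graph, u):
    """Inverse shortest-path distance from u, normalized to sum to 1."""
    dist = shortest_distances(graph, u)
    raw = {v: 1.0 / d for v, d in dist.items() if v != u}
    total = sum(raw.values())
    return {v: value / total for v, value in raw.items()} if total else {}


# Undirected chain u1 - u2 - u3 with unit weights (both directions listed).
graph = {
    "u1": {"u2": 1.0},
    "u2": {"u1": 1.0, "u3": 1.0},
    "u3": {"u2": 1.0},
}
p_social = social_strengths(graph, "u1")
```

In the chain example, the direct friend `u2` receives twice the strength of the friend-of-friend `u3`.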


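Integrated Friendship Strength

Query-dependent mixture of
• spiritual friendship strength
• social friendship strength
• background model (global)

$F(u,u') = \underbrace{\alpha \cdot P_{social}(u,u') + \beta \cdot P_{spirit}(u,u')}_{P_{int}(u,u')} + (1-\alpha-\beta) \cdot \frac{1}{|U|}$

($0 \le \alpha, \beta \le 1$; $\alpha + \beta \le 1$)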
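The mixture can be sketched as a small function. It assumes the social and spiritual strengths are given as dictionaries from user to strength (missing users count as 0) and that the remaining weight 1 - alpha - beta goes to the uniform background model; the function name and argument layout are illustrative.

```python
def integrated_strength(p_social, p_spirit, n_users, alpha, beta):
    """Mix social, spiritual and global (1/|U|) friendship strengths.

    Returns F(u, u') for every u' that appears in either input.
    """
    eps = 1e-9  # tolerance for floating-point sums such as 0.7 + 0.3
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1 + eps):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    background = (1 - alpha - beta) / n_users
    users = set(p_social) | set(p_spirit)
    return {
        v: alpha * p_social.get(v, 0.0) + beta * p_spirit.get(v, 0.0) + background
        for v in users
    }


F = integrated_strength({"v": 1.0}, {"w": 1.0}, n_users=4, alpha=0.2, beta=0.5)
```

Setting `alpha = beta = 0` reduces every entry to the global strength 1/|U|, which corresponds to the "globally popular" information need.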


Excursion: Scoring in Text Retrieval

General scoring framework:

$score(i,t) = tf(i,t) \cdot idf(t)$

• $tf(i,t)$: importance of $t$ for item $i$ (the more frequent, the better)
• $idf(t)$: importance of $t$ in the collection (the less frequent, the better)

Hand-tuned instance: Okapi BM25

$score(i,t) = \frac{(k_1+1) \cdot tf(i,t)}{k_1 + tf(i,t)} \cdot \log \frac{|I| - df(t) + 0.5}{df(t) + 0.5}$

Linear combination for query scores:

$score(i, t_1 \dots t_n) = \sum_{j=1}^{n} score(i,t_j)$
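A short sketch of the per-tag score and its linear combination over a query, written directly from the formula shown on the slide. The default `k1 = 1.2` is a commonly used value and an assumption here; the slides do not state which value was used.

```python
import math


def bm25_score(tf, df, n_items, k1=1.2):
    """Per-tag score: saturating tf part times a log-based idf part."""
    if tf <= 0:
        return 0.0
    tf_part = (k1 + 1) * tf / (k1 + tf)
    idf_part = math.log((n_items - df + 0.5) / (df + 0.5))
    return tf_part * idf_part


def query_score(per_tag_stats, n_items, k1=1.2):
    """Sum of per-tag scores; per_tag_stats holds one (tf, df) pair per query tag."""
    return sum(bm25_score(tf, df, n_items, k1) for tf, df in per_tag_stats)


rare = bm25_score(tf=3, df=10, n_items=1000)
common = bm25_score(tf=3, df=400, n_items=1000)
total = query_score([(3, 10), (3, 400)], n_items=1000)
```

A tag that occurs on few items (`df=10`) contributes more than one that occurs on many (`df=400`), and the tf part grows with tf but is bounded by `k1 + 1`.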


Towards a User-specific Score

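$tf(i,t) = \sum_{u \in U} tf_u(i,t) = |U| \cdot \sum_{u \in U} \frac{1}{|U|} \cdot tf_u(i,t)$

(the factor $\frac{1}{|U|}$ is the global friendship strength)

Convert into user-specific social frequency:

$sf_u(i,t) = |U| \cdot \sum_{u' \in U} F(u,u') \cdot tf_{u'}(i,t)$

Compute user-specific social score:

$score_u(i,t) = \frac{(k_1+1) \cdot sf_u(i,t)}{k_1 + sf_u(i,t)} \cdot \log \frac{|I| - df(t) + 0.5}{df(t) + 0.5}$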

[SIGIR 2008]
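The step from tf to the social frequency can be illustrated as follows. The sketch assumes that the friendship strengths F(u,u') of the querying user and the per-user tag counts for one (item, tag) pair are given as dictionaries; the function names and `k1 = 1.2` are illustrative assumptions.

```python
import math


def social_frequency(friend_strength, tf_by_user, n_users):
    """sf_u(i,t) = |U| * sum over u' of F(u,u') * tf_u'(i,t)."""
    return n_users * sum(
        friend_strength.get(v, 0.0) * count for v, count in tf_by_user.items()
    )


def social_score(sf, df, n_items, k1=1.2):
    """BM25-style score with the social frequency in place of tf."""
    if sf <= 0:
        return 0.0
    return (k1 + 1) * sf / (k1 + sf) * math.log((n_items - df + 0.5) / (df + 0.5))


tf_by_user = {"a": 1, "b": 1, "c": 0, "d": 1}   # tf_u'(i,t) for one item and tag
uniform = {v: 0.25 for v in tf_by_user}          # F = 1/|U| for every user
close_to_a = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}

sf_global = social_frequency(uniform, tf_by_user, n_users=4)
sf_personal = social_frequency(close_to_a, tf_by_user, n_users=4)
```

With uniform strengths the social frequency equals the plain tf (3 here); with most weight on friend `a`, who tagged the item, it rises to 3.6.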


Including Tag Expansion

Problem: Users use different tags for similar things ⇒ poor recall (missing relevant results)

Solution:
1. Define notion of similar tags
2. Expand queries with similar tags
3. Modify scoring function for expanded queries

Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, MMCI, UdS, Saarland University, …


Heuristics for finding similar tags

Co-occurrence heuristics:
Tags $t_1$ and $t_2$ are similar if they occur (almost) always together

$sim(t_1,t_2) = \frac{2 \cdot |items(t_1) \cap items(t_2)|}{|items(t_1)| + |items(t_2)|}$

Specialization heuristics:
Tag $t_2$ is a specialization of $t_1$ if $t_1$ occurs (almost) whenever $t_2$ occurs

$sim(t_1,t_2) = P[t_1 | t_2] = \frac{|items(t_1) \cap items(t_2)|}{|items(t_2)|}$

Example: $t_1$=Europe, $t_2$=Germany
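Both heuristics reduce to set arithmetic on `items(t)`. The sketch below assumes the item sets per tag are available as Python sets; the function names and the toy data for "Europe" and "Germany" are illustrative.

```python
def cooccurrence_sim(items_t1, items_t2):
    """Symmetric overlap: 2*|A & B| / (|A| + |B|)."""
    if not items_t1 and not items_t2:
        return 0.0
    return 2 * len(items_t1 & items_t2) / (len(items_t1) + len(items_t2))


def specialization_sim(items_t1, items_t2):
    """P[t1 | t2]: share of t2's items that also carry t1 (asymmetric)."""
    if not items_t2:
        return 0.0
    return len(items_t1 & items_t2) / len(items_t2)


europe = {"i1", "i2", "i3", "i4"}
germany = {"i1", "i2"}

sym = cooccurrence_sim(europe, germany)
germany_implies_europe = specialization_sim(europe, germany)
europe_implies_germany = specialization_sim(germany, europe)
```

Every item tagged "Germany" is also tagged "Europe", so `germany_implies_europe` is 1.0, while the reverse direction is only 0.5; the symmetric measure cannot express this difference.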


Scoring Expanded Queries

Naive approach:
For query tag $t$, add similar tags $t'$ with $sim(t,t') > \delta$ to the query

"international crime" expanded by "mafia camorra yakuza …"
But: "transportation disaster" expanded by "train car bus plane …"
Result quality drops due to topic drift

Better: auto-tuning incremental expansion
For query tag $t$, consider only the expansion with the highest combined score per item

$score(i,t) = \max_{t' \in T} sim(t,t') \cdot score(i,t')$
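The max-based expansion can be sketched in a few lines. It assumes that the per-item scores of all candidate tags and the similarities to the query tag are given as dictionaries, and that the query tag itself is included with similarity 1.0; names and toy values are illustrative.

```python
def expanded_score(item_scores, similar_tags):
    """score(i,t) = max over t' of sim(t,t') * score(i,t').

    item_scores: tag -> score(i, tag) for one item.
    similar_tags: tag -> sim(t, tag), including t itself with 1.0.
    """
    return max(
        (sim * item_scores.get(tag, 0.0) for tag, sim in similar_tags.items()),
        default=0.0,
    )


similar_to_disaster = {"disaster": 1.0, "crash": 0.8, "accident": 0.6}

# Item tagged only with a related tag: the expansion recovers it.
only_crash = expanded_score({"crash": 2.0}, similar_to_disaster)
# Item tagged with the query tag and related tags: only the best one counts.
many_tags = expanded_score(
    {"disaster": 1.5, "crash": 1.0, "accident": 1.0}, similar_to_disaster
)
```

Because only the best-scoring expansion counts per item, an item does not accumulate score from many loosely related tags, which limits topic drift compared with adding all similar tags to the query.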


Experimental Evaluation: Effectiveness

Systematic evaluation of result quality difficult

Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external info (ex: DMOZ categories)
• Automated assessments from context of user
– Items tagged by friends
– Items tagged in the future


Prototype [VLDB/SIGIR 2008 demo]


Preliminary User Study

LibraryThing user study: [Data Engineering Bulletin, June 2008]
• 6 librarything users with reasonably large library and friend sets
• Overall 49 queries like "mystery magic", "wizard", "yakuza"
• Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]

        0.0     0.2     0.5     0.8     1.0
0.0     0.546   0.572   0.568   0.565   0.565
0.2     0.564   0.572   0.579   0.581   -
0.5     0.539   0.552   0.559   -       -
0.8     0.515   0.546   -       -       -
1.0     0.465   -       -       -       -

Axis labels: α (social), β (spiritual)

• Result quality generally very high
• Combination of spiritual and social friends is best


Outline

• Search in Social Tagging Networks

• Effective Query Scoring

• Efficient Query Evaluation
– Threshold Algorithms
– ContextMerge
– Experimental Evaluation

• Summary & Further Challenges


Algorithmic Overview

• Input: query q={t1…tn} for user u, α, β

• Output: k items with highest scores

• Goals:
– Avoid computing all results
– Minimize disk I/O and CPU load
– Utilize precomputed information on disk

[Figure: a user issuing the query "harry potter"]


Excursion: Threshold Algorithms for Text IR

Input:
• query q={t1…tn}
• lists L(tp) with pairs <i,score(i,tp)>, sorted by score(i,tp)↓

Output: k items with highest aggregated score

Family of Threshold Algorithms:
• scan lists in parallel
• maintain partial candidate results with score bounds
• terminate as soon as top-k results are stable
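The three steps can be sketched as a small NRA-style routine (sorted access only). This is a simplified illustration, not the implementation used in the talk: it reads one entry from every list per round, recomputes all bounds in each round, and treats an item missing from a list as scoring 0 there. The example lists follow the lecture's example, with A's score in L2 set to 0.4 as in the final step of that example.

```python
def nra_top_k(lists, k):
    """Return the top-k items as (item, worst_score, best_score) tuples.

    lists: one list per query term with (item, score) pairs sorted by
    score in descending order.
    """
    seen = {}  # item -> {list index: score read so far}
    depth = max((len(lst) for lst in lists), default=0)
    result = []
    for pos in range(depth):
        # Sorted access: read the entry at the current depth of every list.
        for j, lst in enumerate(lists):
            if pos < len(lst):
                item, score = lst[pos]
                seen.setdefault(item, {})[j] = score
        # Upper bound for any score not yet read from list j.
        high = [lst[pos][1] if pos < len(lst) else 0.0 for lst in lists]
        worst = {i: sum(s.values()) for i, s in seen.items()}
        best = {
            i: worst[i] + sum(h for j, h in enumerate(high) if j not in s)
            for i, s in seen.items()
        }
        top = sorted(seen, key=worst.get, reverse=True)[:k]
        result = [(i, worst[i], best[i]) for i in top]
        if len(top) == k:
            min_k = worst[top[-1]]
            rivals_done = all(best[i] <= min_k for i in seen if i not in top)
            # sum(high) bounds the score of items not seen in any list.
            if rivals_done and sum(high) <= min_k:
                return result
    return result


l1 = [("A", 0.9), ("G", 0.3), ("H", 0.3), ("I", 0.25),
      ("J", 0.2), ("K", 0.2), ("D", 0.15)]
l2 = [("D", 1.0), ("E", 0.7), ("F", 0.7), ("B", 0.65),
      ("C", 0.6), ("A", 0.4), ("G", 0.2)]
top1 = nra_top_k([l1, l2], k=1)
```

On this data the routine stops after six rounds, before the last entry of either list is read, and reports A as the top-1 item. The returned scores are bounds, so for k > 1 the order within the result is by worst score and need not match the order by exact score.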

Example: Top-1 for 2-term query (NRA)

L1: A: 0.9, G: 0.3, H: 0.3, I: 0.25, J: 0.2, K: 0.2, D: 0.15
L2: D: 1.0, E: 0.7, F: 0.7, B: 0.65, C: 0.6, A: 0.3, G: 0.2

Each candidate carries a score interval [worst score; best score]. "??" stands for any item not seen so far, and min-k is the worst score of the current top-1 item.

1. A: 0.9 is read from L1.
   A: score [0.9;1.9], ??: score [0.0;1.9]. Top-1 item: A, min-k: 0.9
2. D: 1.0 is read from L2.
   A: score [0.9;1.9], D: score [1.0;1.9], ??: score [0.0;1.9]. Top-1 item: D, min-k: 1.0
3. G: 0.3 is read from L1.
   A: score [0.9;1.9], G: score [0.3;1.3], D: score [1.0;1.3], ??: score [0.0;1.3]. min-k: 1.0
4. The next entry of L2 (0.7) is read.
   A: score [0.9;1.6], D: score [1.0;1.3], G: score [0.3;1.0], ??: score [0.0;1.0]. min-k: 1.0
   No more new candidates considered
5. Scanning continues and the bounds shrink:
   D: score [1.0;1.25], A: score [0.9;1.55]
   D: score [1.0;1.2], A: score [0.9;1.5]
6. A is found in L2 (shown as 0.9 + 0.4 on the slide, although the list above shows A: 0.3).
   A: score [1.3;1.3], D: score [1.0;1.2]. Top-1 item: A, min-k: 1.3
   Algorithm safely terminates


Can we reuse this here?

[Figure: precomputed score lists per tag, "harry": 0.95, 0.85, 0.51 and "travel": 0.87, 0.82, 0.69, next to many user-specific lists for "harry", one per user and parameter setting:
(α=0.2, β=0.5): 0.98, 0.84, 0.45
(α=0.0, β=0.8): 0.90, 0.89, 0.56
(α=1.0, β=0.0): 0.90, 0.89, 0.56
(α=0.5, β=0.5): 0.90, 0.86, 0.64
(α=0.0, β=1.0): 0.90, 0.89, 0.56
repeated for further users]

No, scores are specific to the querying user and the parameter setting!

Number of lists to precompute would explode! (#tags × #users × parameter space)


Revisiting the Social Frequency

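$sf_u(i,t) = |U| \cdot \sum_{u' \in U} F(u,u') \cdot tf_{u'}(i,t)$

$= |U| \cdot \sum_{u' \in U} \left( (1-\alpha-\beta) \cdot \frac{1}{|U|} + P_{int}(u,u') \right) \cdot tf_{u'}(i,t)$

$= \frac{(1-\alpha-\beta) \cdot |U|}{|U|} \cdot \sum_{u' \in U} tf_{u'}(i,t) + |U| \cdot \sum_{u' \in U} P_{int}(u,u') \cdot tf_{u'}(i,t)$

$= (1-\alpha-\beta) \cdot tf(i,t) + |U| \cdot \sum_{u' \in U} P_{int}(u,u') \cdot tf_{u'}(i,t)$

• first term: independent of user $u$
• second term: dependent on user $u$, with

$|U| \cdot \sum_{u' \in U} P_{int}(u,u') \cdot tf_{u'}(i,t) = |U| \cdot \left( \alpha \cdot \sum_{u' \in U} P_{social}(u,u') \cdot tf_{u'}(i,t) + \beta \cdot \sum_{u' \in U} P_{spirit}(u,u') \cdot tf_{u'}(i,t) \right)$

Compute $sf_u(i,t)$ on the fly from $tf(i,t)$, friends of $u$ and their tagged documents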
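The split into a user-independent and a user-dependent part can be checked numerically. The sketch below assumes F(u,u') = α·P_social + β·P_spirit + (1 - α - β)/|U| and uses made-up strengths and tag counts; the function names are illustrative.

```python
def sf_direct(alpha, beta, p_social, p_spirit, tf_by_user, n_users):
    """sf_u(i,t) computed straight from the definition with F(u,u')."""
    total = 0.0
    for v, count in tf_by_user.items():
        strength = (alpha * p_social.get(v, 0.0)
                    + beta * p_spirit.get(v, 0.0)
                    + (1 - alpha - beta) / n_users)
        total += strength * count
    return n_users * total


def sf_split(alpha, beta, p_social, p_spirit, tf_by_user, n_users):
    """Same value as (user-independent part, user-dependent part)."""
    global_tf = sum(tf_by_user.values())            # tf(i,t)
    independent = (1 - alpha - beta) * global_tf
    dependent = n_users * sum(
        (alpha * p_social.get(v, 0.0) + beta * p_spirit.get(v, 0.0)) * count
        for v, count in tf_by_user.items()
    )
    return independent, dependent


p_social = {"a": 0.6, "b": 0.4}
p_spirit = {"b": 0.5, "c": 0.5}
tf_by_user = {"a": 1, "b": 2, "c": 0, "d": 1}

direct = sf_direct(0.2, 0.5, p_social, p_spirit, tf_by_user, n_users=4)
independent, dependent = sf_split(0.2, 0.5, p_social, p_spirit, tf_by_user, n_users=4)
```

Only users with a nonzero social or spiritual strength contribute to the second part, which is why it can be computed at query time from the friends of u, while the first part needs only the global tf(i,t).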


Top-K in Social Networks: ContextMerge

Precomputed lists:
• ITEMS(t): pairs <i,tf(i,t)>, sorted by tf(i,t)↓
• USERITEMS(u',t): pairs <i,tfu'(i,t)>, unsorted
• FRIENDS(u): pairs <u',F(u,u')>, sorted by F(u,u')↓

Example:
• ITEMS(harry): 47, 32, 26
• FRIENDS(u): 0.12, 0.10, 0.085, …
• USERITEMS(u', harry): [items shown as images]

Annotation on the slide: already exist in systems


ContextMerge

Adapted Threshold Algorithm for query u,t:
• Scan ITEMS(t) and FRIENDS(u) in parallel
• Pick "best" list
– If ITEMS(t): read next entry
– If FRIENDS(u): read USERITEMS(u',t) for next friend u'
– Maintain candidates with bounds for min and max score and current results

Example lists:
• ITEMS(harry): 47, 32, 26
• FRIENDS(u): 0.12, 0.10, 0.085
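The following is a heavily simplified sketch of the idea for a single tag, not the ContextMerge implementation itself. Its assumptions: lists are alternated instead of picking the "best" one; every user tags an item with the tag at most once (tf_u' is 0 or 1); FRIENDS holds only the user-specific strengths, whose sum is at most `total_mass`; and items are ranked by social frequency, which for one tag gives the same order as the score. All names are hypothetical.

```python
def context_merge(items, friends, useritems, n_users, bg_weight, k, total_mass=1.0):
    """Top-k items by sf = bg_weight*tf + n_users * (strength of taggers).

    items: (item, tf) sorted by tf descending           -> ITEMS(t)
    friends: (friend, strength) sorted descending       -> FRIENDS(u)
    useritems: friend -> set of items tagged with t     -> USERITEMS(u', t)
    Returns (item, lower bound of sf) pairs.
    """
    tf_known, social = {}, {}
    seen_mass = 0.0
    ip = fp = 0
    while True:
        if ip < len(items):                      # read next ITEMS entry
            item, tf = items[ip]
            tf_known[item] = tf
            ip += 1
        if fp < len(friends):                    # read next friend's items
            friend, strength = friends[fp]
            seen_mass += strength
            for item in useritems.get(friend, ()):
                social[item] = social.get(item, 0.0) + strength
            fp += 1
        # Bounds for what has not been read yet.
        tf_high = items[ip][1] if ip < len(items) else 0.0
        remaining = max(0.0, total_mass - seen_mass) if fp < len(friends) else 0.0
        cands = set(tf_known) | set(social)
        worst = {i: bg_weight * tf_known.get(i, 0.0) + n_users * social.get(i, 0.0)
                 for i in cands}
        best = {i: bg_weight * tf_known.get(i, tf_high)
                + n_users * (social.get(i, 0.0) + remaining) for i in cands}
        top = sorted(cands, key=worst.get, reverse=True)[:k]
        if ip >= len(items) and fp >= len(friends):
            return [(i, worst[i]) for i in top]  # everything read: bounds are exact
        if len(top) == k:
            min_k = worst[top[-1]]
            unseen_best = bg_weight * tf_high + n_users * remaining
            if unseen_best <= min_k and all(
                    best[i] <= min_k for i in cands if i not in top):
                return [(i, worst[i]) for i in top]


items = [("a", 3), ("b", 2), ("c", 1)]
friends = [("f1", 0.6), ("f2", 0.3), ("f3", 0.1)]
useritems = {"f1": {"b"}, "f2": {"a"}, "f3": {"c"}}
top = context_merge(items, friends, useritems, n_users=4, bg_weight=0.2, k=1)
```

In this toy run the routine stops after one entry of each list: item `b`, tagged by the strongest friend, already has a lower bound (2.4) that no other item can exceed, even though `a` has the highest global tf.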

ContextMerge: score bounds for candidates

[Figure: the lists ITEMS(harry): 47, 32, 26 and FRIENDS(u): 0.12, 0.10, 0.085, with two candidates.
• Candidate read from ITEMS(harry): user-independent part of sf: 47; user-specific part of sf: unknown (?), bounded using |U|. These values are used to compute the min score bound and the max score bound.
• Candidate read via the first friend: user-specific part of sf: 0.12·|U|; user-independent part of sf: unknown (?), bounded by 47.
• Further values shown: |U| and 0.88·|U|]


Experimental Evaluation: Efficiency

• Testbed: 3 large crawls of real social networks
– Flickr: 10 million pictures, ~50,000 users
– Del.icio.us: ~175,000 bookmarks, ~12,000 users
– Librarything: ~6.5 million books, ~10,000 users
• Queries:
– 150 frequent tag pairs
– for each query pick user with "enough" results & friends
• Abstract cost measure: disk load
• Baseline: full merge + sort


Experimental Evaluation: Efficiency (β=0)

[Plot with α on the x-axis]

2-8 times better than baseline


Outline

• Search in Social Tagging Networks

• Effective Query Scoring

• Efficient Query Evaluation

• Summary & Further Challenges


Summary

• Need for social-aware social search, supporting global, social, and spiritual information needs
• Social scoring
– integrating global, collection, and social context
– including dynamic tag expansion
• ContextMerge: scalable implementation


Further Challenges

• Meaningful & common benchmark
• Incremental maintenance for high dynamics
• Extend to ratings, user weights, item weights, …
• Extend to non-tags (like image features)
• Automatic query parameterization
• Meaningful explanations of results
• Exploit dynamics (hot topics, evolving groups, …)

Social-Aware Search & Recommendations at planet scale


Thank you.

Questions?