
Informationssuche in sozialen Netzen

Ralf Schenkel

Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

February 2, 2009, Perspektivenvorlesung (perspectives lecture)

Social Tagging Networks

Definition: Social Tagging Network
Website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain a network of friends
• interact with friends

Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• Librarything (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)

Some Statistics

Flickr (as of Nov 2008):
• 3+ billion photos, 3 million new photos per day

Facebook (as of Nov 2008):
• 10+ billion photos, 30+ million new photos per day
• 120 million active users
• 150,000 new users per day

Myspace (as of Apr 2007):
• 135 million users (6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB of videos

StudiVZ.net (as of Nov 2008):
• 11 million users
• 300 million images, 1 million added daily

Huge volume of highly dynamic data

Showcase: librarything.com

[Screenshot annotations: Books, Tags, Ratings, Others]

librarything.com: Social Interaction

[Screenshot annotations: Explicit Friends, Similar Users, Comments]

librarything.com: Tag Clouds

librarything.com: Search

Search results are independent of the querying user (and of the social context)

librarything.com: Search

Search is automatically expanded with similar tags (synonyms)

librarything.com: Recommendations

Recommendations depend on the user and tags (but not on the social context)

librarything.com: Recommendations

Explanation for the recommendation

Librarything.com: Explanations

Outline

• Search in Social Tagging Networks
  – Graph Model
  – Different Information Needs
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges

Querying Social Tagging Networks


[Figure: tagging graph of users and items; items tagged with travel, vldb, norway, icde, mexico, trip, harry potter, and probability, data mining, foundations]

Information Need 1: Globally Popular

[Figure: tagging graph; items tagged with travel, vldb, norway, icde, mexico, trip, harry potter]

• Most frequently tagged items are "best"
• Tags by all users are equally important

Example query: harry potter

Information Need 2: Similar Users

[Figure: tagging graph; items tagged with travel, vldb, norway, icde, mexico, trip, harry potter]


Tags by users with similar tags/items ("brothers in spirit") are more important

Information Need 3: Trusted Friends

[Figure: tagging graph with friendship links; items tagged with travel, vldb, norway, icde, mexico, trip, harry potter, probability, selling]


Tags by closely related and well-known users are more important

Towards Social-Aware Social Search

Search results may depend on
– global popularity of items
– spiritual context of the querying user (users with similar books and/or tags)
– social context of the querying user (known and trusted friends)

Outline

• Search in Social Tagging Networks
• Effective Query Scoring
  – Quantifying Friendship Strengths
  – User-specific Scoring Functions
  – Experimental Evaluation

Notation
U: set of users, T: set of tags, I: set of items
tags(u): tags used by user u
items(u): items tagged by user u
items(t): items tagged with tag t by at least one user
df(t): number of items tagged with tag t
tf_u(i,t): number of times user u tagged item i with tag t
tf(i,t): number of times item i was tagged with tag t
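To make the notation concrete, here is a minimal Python sketch that derives these quantities from a list of tagging actions. The (user, item, tag) triple representation, the toy data, and all function names are assumptions for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical toy data: one (user, item, tag) triple per tagging action.
# A repeated triple means the user assigned the same tag to the same item again.
TAGGINGS: List[Tuple[str, str, str]] = [
    ("alice", "book1", "harry"),
    ("alice", "book1", "potter"),
    ("bob", "book1", "harry"),
    ("bob", "book2", "harry"),
    ("carol", "book3", "travel"),
]


def tags_of_user(u: str) -> Set[str]:
    """tags(u): tags used by user u."""
    return {t for (v, _, t) in TAGGINGS if v == u}


def items_of_user(u: str) -> Set[str]:
    """items(u): items tagged by user u."""
    return {i for (v, i, _) in TAGGINGS if v == u}


def items_of_tag(t: str) -> Set[str]:
    """items(t): items tagged with t by at least one user."""
    return {i for (_, i, s) in TAGGINGS if s == t}


def df(t: str) -> int:
    """df(t): number of items tagged with t."""
    return len(items_of_tag(t))


def tf_user(u: str, i: str, t: str) -> int:
    """tf_u(i,t): number of times user u tagged item i with t."""
    return sum(1 for triple in TAGGINGS if triple == (u, i, t))


def tf(i: str, t: str) -> int:
    """tf(i,t): number of times item i was tagged with t, by any user."""
    return sum(1 for (_, j, s) in TAGGINGS if j == i and s == t)
```

In this toy data, df("harry") is 2 (book1 and book2), while tf("book1", "harry") is also 2 because two different users tagged book1 with harry.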

Quantifying Friendship Strengths
• Global "friendship" strength: $P_{global}(u,u') = \frac{1}{|U|}$
• Spiritual friendship strength
• Social friendship strength
• Integrated friendship strength

Spiritual Friendship Strength

P_spirit(u,u'): overlap in interests of u and u'

Several alternatives:
• based on overlap of tag usage:
$$P_{spirit}(u,u') = \frac{2\,|tags(u) \cap tags(u')|}{|tags(u)| + |tags(u')|}$$
• based on overlap of tagged items:
$$P_{spirit}(u,u') = \frac{2\,|items(u) \cap items(u')|}{|items(u)| + |items(u')|}$$
• based on overlap of behavior (tagging, searching, rating, …)

For all:
• set $P_{spirit}(u,u) := 0$
• normalization such that $\sum_{u'} P_{spirit}(u,u') = 1$

[Figure: users u and u' with overlapping tags harry potter, wizard, deathly hallows, philosopher stone]
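The tag-overlap variant can be sketched in a few lines of Python. The function names and the dictionary that maps every user to tags(user) are hypothetical. The item-overlap variant is identical with items(u) in place of tags(u).

```python
from typing import Dict, Set


def dice(a: Set[str], b: Set[str]) -> float:
    """2|A ∩ B| / (|A| + |B|); defined as 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def spiritual_strengths(u: str, tags: Dict[str, Set[str]]) -> Dict[str, float]:
    """Normalized P_spirit(u, .) based on overlap of tag usage.

    tags maps every user to tags(user). P_spirit(u, u) is set to 0 and the
    remaining values are scaled to sum to 1 (if u overlaps with anybody).
    """
    raw = {v: (0.0 if v == u else dice(tags[u], tags[v])) for v in tags}
    total = sum(raw.values())
    if total == 0:
        return raw  # no overlap with any other user: all strengths stay 0
    return {v: s / total for v, s in raw.items()}


TAGS = {
    "u": {"harry", "potter", "wizard"},
    "v": {"harry", "wizard"},
    "w": {"travel"},
    "x": {"potter", "travel"},
}
p = spiritual_strengths("u", TAGS)  # raw Dice: v 0.8, x 0.4, w 0.0
```

After normalization, v receives 2/3 and x receives 1/3 of u's spiritual friendship mass. User w shares no tag with u and gets 0.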

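Graph-Based Friendship Strength

P_social(u,u'): distance of u and u' in the user network
• edge weights w(u_i, u_j) for friendship links between users u_i and u_j
• P_social(u,u') is derived from the weights w(p) of paths p from u to u' (e.g. u → u3 → u4 → u5 → u6 → u7 → u')
• set $P_{social}(u,u) := 0$
• normalization such that $\sum_{u'} P_{social}(u,u') = 1$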
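The slide text does not fully specify how path weights are aggregated. The Python sketch below uses one simple instantiation, and it is an assumption rather than necessarily the definition used in the talk. Every friendship link has weight 1, P_social(u,u') is proportional to 1 / (shortest-path distance), u itself is excluded, and the values are normalized to sum to 1. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def hop_distances(u: str, friends: Dict[str, Set[str]]) -> Dict[str, int]:
    """Breadth-first search: number of friendship links from u to every reachable user."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in friends.get(v, set()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def social_strengths(u: str, friends: Dict[str, Set[str]]) -> Dict[str, float]:
    """ASSUMED instantiation: P_social(u, u') proportional to 1 / distance(u, u').

    u itself (distance 0) gets no entry, i.e. P_social(u, u) = 0; unreachable
    users get no entry either. The values are normalized to sum to 1.
    """
    raw = {v: 1.0 / d for v, d in hop_distances(u, friends).items() if d > 0}
    total = sum(raw.values())
    return {v: s / total for v, s in raw.items()} if total else {}


FRIENDS_GRAPH = {"u": {"a"}, "a": {"u", "b"}, "b": {"a"}, "c": set()}
ps = social_strengths("u", FRIENDS_GRAPH)  # a: distance 1, b: distance 2
```

Weighted links or a different decay (e.g. exponential in the distance) would only change the raw scores. The normalization step stays the same.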

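Integrated Friendship Strength

Query-dependent mixture of
• spiritual friendship strength
• social friendship strength
• background model (global)

with $\alpha, \beta \in [0,1]$ and $\alpha + \beta \le 1$:
$$F(u,u') = (1-\alpha-\beta)\,\frac{1}{|U|} + \underbrace{\alpha\,P_{social}(u,u') + \beta\,P_{spirit}(u,u')}_{P_{int}(u,u')}$$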
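Here is a direct Python transcription of the mixture, as a sketch. Given the two normalized strength distributions for the querying user u, it returns F(u,u') for every user. The function and argument names are hypothetical. Since P_social and P_spirit each sum to 1, F sums to (1 − α − β) + α + β = 1.

```python
from typing import Dict, List


def integrated_strength(users: List[str], p_social: Dict[str, float],
                        p_spirit: Dict[str, float],
                        alpha: float, beta: float) -> Dict[str, float]:
    """F(u, u') for all u' in users, given P_social(u, .) and P_spirit(u, .).

    Missing entries in p_social / p_spirit count as 0.
    """
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need alpha, beta in [0, 1] and alpha + beta <= 1")
    background = (1 - alpha - beta) / len(users)  # (1 - alpha - beta) * P_global
    return {
        v: background + alpha * p_social.get(v, 0.0) + beta * p_spirit.get(v, 0.0)
        for v in users
    }


F = integrated_strength(["u", "a", "b"], {"a": 2 / 3, "b": 1 / 3}, {"b": 1.0},
                        alpha=0.2, beta=0.5)
```

With α = 0.2 and β = 0.5, every user keeps a background share of 0.3 / 3 = 0.1, so even users outside u's social and spiritual context contribute a little.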

Excursion: Scoring in Text Retrieval

General scoring framework:
$$score(i,t) = tf(i,t) \cdot idf(t)$$
• tf(i,t): importance of t for item i (the more frequent, the better)
• idf(t): importance of t in the collection (the less frequent, the better)

Hand-tuned instance: Okapi BM25
$$score(i,t) = \frac{(k_1+1)\,tf(i,t)}{k_1 + tf(i,t)} \cdot \log\frac{|I| - df(t) + 0.5}{df(t) + 0.5}$$

Linear combination for query scores:
$$score(i, t_1 \ldots t_n) = \sum_{j=1}^{n} score(i, t_j)$$
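The following is a minimal Python sketch of the per-tag BM25 score and the summed query score, following the formulas above. The default k1 = 1.2 is a common choice in the IR literature and does not come from the slides. The dictionary-based inputs are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def idf(df_t: int, num_items: int) -> float:
    """log((|I| - df(t) + 0.5) / (df(t) + 0.5)); negative for very common tags."""
    return math.log((num_items - df_t + 0.5) / (df_t + 0.5))


def bm25(tf_it: float, df_t: int, num_items: int, k1: float = 1.2) -> float:
    """BM25-style score(i, t) without document-length normalization."""
    return (k1 + 1) * tf_it / (k1 + tf_it) * idf(df_t, num_items)


def query_score(item: str, query: List[str], tf: Dict[Tuple[str, str], int],
                df: Dict[str, int], num_items: int, k1: float = 1.2) -> float:
    """score(i, t1..tn) as the sum of the per-tag scores."""
    return sum(bm25(tf.get((item, t), 0), df.get(t, 0), num_items, k1) for t in query)
```

The tf factor saturates: with k1 = 1.2 it can never exceed k1 + 1 = 2.2, however often an item is tagged.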

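Towards a User-specific Score

$$tf(i,t) = \sum_{u \in U} tf_u(i,t) = |U| \sum_{u \in U} \frac{1}{|U|}\, tf_u(i,t)$$
(1/|U| is the global friendship strength)

Convert into user-specific social frequency:
$$sf_u(i,t) = |U| \sum_{u' \in U} F(u,u')\, tf_{u'}(i,t)$$

Compute user-specific social score [SIGIR 2008]:
$$score_u(i,t) = \frac{(k_1+1)\,sf_u(i,t)}{k_1 + sf_u(i,t)} \cdot \log\frac{|I| - df(t) + 0.5}{df(t) + 0.5}$$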
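The Python sketch below computes sf_u(i,t) and the user-specific score. It assumes F is given as a dictionary u' → F(u,u') for the querying user u and that per-user tag counts are keyed by (user, item, tag). The names and the default k1 = 1.2 are hypothetical.

```python
import math
from typing import Dict, Tuple

# (user, item, tag) -> tf_user(i, t)
TfU = Dict[Tuple[str, str, str], int]


def social_frequency(item: str, tag: str, F: Dict[str, float], tf_u: TfU,
                     num_users: int) -> float:
    """sf_u(i, t) = |U| * sum over u' of F(u, u') * tf_u'(i, t)."""
    return num_users * sum(f * tf_u.get((v, item, tag), 0) for v, f in F.items())


def user_score(item: str, tag: str, F: Dict[str, float], tf_u: TfU, num_users: int,
               df_t: int, num_items: int, k1: float = 1.2) -> float:
    """BM25-style score with tf(i, t) replaced by sf_u(i, t)."""
    sf = social_frequency(item, tag, F, tf_u, num_users)
    idf = math.log((num_items - df_t + 0.5) / (df_t + 0.5))
    return (k1 + 1) * sf / (k1 + sf) * idf


TF_U = {("a", "i", "t"): 2, ("b", "i", "t"): 1}
UNIFORM = {v: 1 / 3 for v in ["a", "b", "c"]}  # F(u, u') = 1/|U| for everybody
```

With the uniform F, sf_u(i,t) equals the global tf(i,t) = 3, which recovers the user-independent score. Concentrating F on user a raises sf_u(i,t) to 6.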

Including Tag Expansion

Problem: users use different tags for similar things → poor recall (relevant results are missed)

Solution:
1. Define a notion of similar tags
2. Expand queries with similar tags
3. Modify the scoring function for expanded queries

Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, MMCI, UdS, Saarland University, …

Heuristics for finding similar tags

Co-occurrence heuristics: tags t1 and t2 are similar if they (almost) always occur together
$$sim(t_1,t_2) = \frac{2\,|items(t_1) \cap items(t_2)|}{|items(t_1)| + |items(t_2)|}$$

Specialization heuristics: tag t2 is a specialization of t1 if t1 occurs (almost) whenever t2 occurs
$$sim(t_1,t_2) = P[t_1 \mid t_2] = \frac{|items(t_1) \cap items(t_2)|}{|items(t_2)|}$$
Example: t1 = Europe, t2 = Germany
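Both heuristics can be sketched in Python on top of the items(t) sets. The function names and the toy data are hypothetical.

```python
from typing import Dict, Set


def cooccurrence_sim(t1: str, t2: str, items: Dict[str, Set[str]]) -> float:
    """2|items(t1) ∩ items(t2)| / (|items(t1)| + |items(t2)|)."""
    a, b = items.get(t1, set()), items.get(t2, set())
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def specialization_sim(t1: str, t2: str, items: Dict[str, Set[str]]) -> float:
    """P[t1 | t2]: fraction of the items tagged t2 that are also tagged t1."""
    b = items.get(t2, set())
    if not b:
        return 0.0
    return len(items.get(t1, set()) & b) / len(b)


ITEMS_BY_TAG = {"europe": {"i1", "i2", "i3", "i4"}, "germany": {"i1", "i2"}}
```

The conditional probability is asymmetric, which is what lets it detect specializations. Here P[europe | germany] = 1.0, while P[germany | europe] = 0.5.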

Scoring Expanded Queries

Naive approach: for query tag t, add similar tags t' with sim(t,t') > δ to the query
• "international crime" expanded by "mafia camorra yakuza …"
• But: "transportation disaster" expanded by "train car bus plane …"
• Result quality drops due to topic drift

Better: auto-tuning incremental expansion. For query tag t, consider only the expansion with the highest combined score per item:
$$score'(i,t) = \max_{t' \in T} sim(t,t') \cdot score(i,t')$$
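Here is a Python sketch of the max-based expanded score. It assumes sim(t,t) = 1 for the query tag itself and takes the per-tag score as a callable. All names and the toy scores are hypothetical.

```python
from typing import Callable, Dict


def expanded_score(item: str, tag: str, expansions: Dict[str, float],
                   score: Callable[[str, str], float]) -> float:
    """score'(i, t) = max over t' of sim(t, t') * score(i, t').

    expansions maps candidate tags t' to sim(t, t'); the query tag itself is
    always considered with sim(t, t) = 1.
    """
    best = score(item, tag)
    for other, sim in expansions.items():
        best = max(best, sim * score(item, other))
    return best


SCORES = {("i1", "crime"): 0.0, ("i1", "mafia"): 2.0,
          ("i2", "crime"): 1.0, ("i2", "mafia"): 1.0}


def lookup(i: str, t: str) -> float:
    return SCORES.get((i, t), 0.0)
```

Item i1 is found only through the expansion tag (0.8 · 2.0 = 1.6). For item i2, the naive approach would add both contributions (1.0 + 0.8), but the max keeps only the best one (1.0). This limits how much many weakly related expansion tags can inflate a score.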

Experimental Evaluation: Effectiveness

Systematic evaluation of result quality difficult

Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external info (ex: DMOZ categories)
• Automated assessments from the context of the user
  – Items tagged by friends
  – Items tagged in the future

Prototype [VLDB/SIGIR 2008 demo]

Preliminary User Study

LibraryThing user study [Data Engineering Bulletin, June 2008]:
• 6 librarything users with reasonably large libraries and friend sets
• Overall 49 queries like "mystery magic", "wizard", "yakuza"
• Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]

NDCG[10] for different weights α (social) and β (spiritual); "-" marks combinations with α + β > 1:

        0.0    0.2    0.5    0.8    1.0
0.0    0.546  0.572  0.568  0.565  0.565
0.2    0.564  0.572  0.579  0.581  -
0.5    0.539  0.552  0.559  -      -
0.8    0.515  0.546  -      -      -
1.0    0.465  -      -      -      -

• Result quality is generally high (NDCG[10] between 0.465 and 0.581)
• Combining spiritual and social friends gives the best result (0.581)
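NDCG[10] compares the discounted gain of the top 10 results with the gain of an ideal ordering. The Python sketch below uses the common gain / log2(rank + 1) discount. The slides do not say which gain or discount variant the study used, so treat this as an assumption. The function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: sum of gain / log2(rank + 1), ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k of a ranked item list, given graded relevance judgments (missing = 0)."""
    actual = dcg([relevance.get(item, 0.0) for item in ranking[:k]])
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

A perfect ranking yields 1.0. Moving the only relevant item from rank 1 to rank 2 lowers the value to 1 / log2(3) ≈ 0.63.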

Outline

• Efficient Query Evaluation
  – Threshold Algorithms
  – ContextMerge
  – Experimental Evaluation

Algorithmic Overview

• Input: query q = {t1…tn} for user u, parameters α, β
• Output: k items with highest scores
• Goals:
  – Avoid computing all results
  – Minimize disk I/O and CPU load
  – Utilize precomputed information on disk

[Figure: a user issues the query "harry potter" and receives a ranked result list]

Excursion: Threshold Algorithms for Text IR

Input:
• query q = {t1…tn}
• lists L(tp) with pairs <i, score(i,tp)>, sorted by score(i,tp)↓

Output: k items with highest aggregated score

Family of Threshold Algorithms:
• scan lists in parallel
• maintain partial candidate results with score bounds
• terminate as soon as the top-k results are stable

Example: Top-1 for 2-term query (NRA)

L1: A: 0.9, G: 0.3, H: 0.3, I: 0.25, J: 0.2, K: 0.2, D: 0.15
L2: D: 1.0, E: 0.7, F: 0.7, B: 0.65, C: 0.6, A: 0.3, G: 0.2

Sorted accesses alternate between L1 and L2. Each candidate keeps a score interval [min; max]. Items not seen yet have the interval [0.0; sum of the current list scores]. min-k is the min score of the current top-1 item.
1. Read A: 0.9 (L1): A: [0.9; 1.9]; unseen items: [0.0; 1.9]
2. Read D: 1.0 (L2): D: [1.0; 1.9] becomes the top-1 item, min-k = 1.0
3. Read G: 0.3 (L1): G: [0.3; 1.3], D: [1.0; 1.3]; unseen items: [0.0; 1.3]
4. Read E: 0.7 (L2): A: [0.9; 1.6], G: [0.3; 1.0]; unseen items: [0.0; 1.0]. Unseen and newly read items can reach at most min-k = 1.0, so no more new candidates are considered
5. Reading H, F, I, B, J, C, K shrinks the remaining bounds: D: [1.0; 1.25] → [1.0; 1.2], A: [0.9; 1.55] → [0.9; 1.5]
6. Read A: 0.3 (L2): A: [1.2; 1.2]. A's min score is at least D's max score (1.2), so A is the top-1 item and the algorithm safely terminates
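The worked example can be replayed with a small No-Random-Access (NRA) sketch in Python. The sketch assumes sum aggregation, round-robin sorted access, and a bound check after every access. Unlike the slides, it keeps every seen item as a candidate instead of pruning. The function name and list format are illustrative.

```python
from typing import Dict, List, Optional, Tuple

EPS = 1e-12  # tolerance when comparing floating-point score bounds


def nra_top_k(lists: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """Top-k items by summed score using only sorted accesses (NRA).

    Returns (item, min score) pairs ordered by min score; NRA guarantees the
    top-k set, not necessarily exact scores or their final order.
    """
    m = len(lists)
    pos = [0] * m
    # high[j]: first score of list j before any read, afterwards the last score
    # read (0 once exhausted); an upper bound for all unread entries of list j
    high = [lst[0][1] if lst else 0.0 for lst in lists]
    known: Dict[str, List[Optional[float]]] = {}

    def bounds(item: str) -> Tuple[float, float]:
        scores = known[item]
        low = sum(s for s in scores if s is not None)
        return low, low + sum(high[j] for j, s in enumerate(scores) if s is None)

    def by_min_score() -> List[str]:
        return sorted(known, key=lambda i: bounds(i)[0], reverse=True)

    while any(pos[j] < len(lists[j]) for j in range(m)):
        for j in range(m):  # one round of round-robin sorted accesses
            if pos[j] >= len(lists[j]):
                continue
            item, score = lists[j][pos[j]]
            pos[j] += 1
            high[j] = score if pos[j] < len(lists[j]) else 0.0
            known.setdefault(item, [None] * m)[j] = score
            ranked = by_min_score()
            if len(ranked) < k:
                continue
            min_k = bounds(ranked[k - 1])[0]
            best_other = max((bounds(i)[1] for i in ranked[k:]), default=0.0)
            unseen = sum(high)  # max score of an item not read from any list yet
            if max(best_other, unseen) <= min_k + EPS:
                return [(i, bounds(i)[0]) for i in ranked[:k]]
    ranked = by_min_score()  # all lists exhausted: min scores are exact
    return [(i, bounds(i)[0]) for i in ranked[:k]]


L1 = [("A", 0.9), ("G", 0.3), ("H", 0.3), ("I", 0.25), ("J", 0.2), ("K", 0.2), ("D", 0.15)]
L2 = [("D", 1.0), ("E", 0.7), ("F", 0.7), ("B", 0.65), ("C", 0.6), ("A", 0.3), ("G", 0.2)]
print(nra_top_k([L1, L2], k=1))  # A with min score 1.2
```

As in the example, the sketch stops after A's second score is read (12 sorted accesses), without ever reading D's score from L1.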

Can we reuse this here?

No, scores are specific to the querying user and the parameter setting!

[Figure: per-user precomputed lists for the queries harry and travel, one per parameter setting, e.g. harry (α=0.2, β=0.5), harry (α=0.0, β=0.8), harry (α=1.0, β=0.0), harry (α=0.5, β=0.5), harry (α=0.0, β=1.0), repeated for every user]

Number of lists to precompute would explode! (#tags × #users × parameter space)

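Revisiting the Social Frequency

$$
\begin{aligned}
sf_u(i,t) &= |U| \sum_{u' \in U} F(u,u')\, tf_{u'}(i,t) \\
&= |U| \sum_{u' \in U} \Big( (1-\alpha-\beta)\,\frac{1}{|U|} + P_{int}(u,u') \Big)\, tf_{u'}(i,t) \\
&= (1-\alpha-\beta) \sum_{u' \in U} tf_{u'}(i,t) + |U| \sum_{u' \in U} P_{int}(u,u')\, tf_{u'}(i,t) \\
&= \underbrace{(1-\alpha-\beta)\, tf(i,t)}_{\text{independent of user } u} + \underbrace{|U| \Big( \alpha \sum_{u' \in U} P_{social}(u,u')\, tf_{u'}(i,t) + \beta \sum_{u' \in U} P_{spirit}(u,u')\, tf_{u'}(i,t) \Big)}_{\text{dependent on user } u}
\end{aligned}
$$

Compute sf_u(i,t) on the fly from tf(i,t), the friends of u, and their tagged items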
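The split can be checked numerically with the Python sketch below. It computes sf_u(i,t) once over all users and once from the global tf(i,t) plus the contributions of users with P_int(u,u') > 0. The function names and the toy data (α = 0.2, β = 0.5, three users) are hypothetical.

```python
from typing import Dict, Tuple

TfU = Dict[Tuple[str, str, str], int]  # (user, item, tag) -> tf_user(i, t)


def sf_direct(item: str, tag: str, F: Dict[str, float], tf_u: TfU, num_users: int) -> float:
    """sf_u(i, t) = |U| * sum over ALL users u' of F(u, u') * tf_u'(i, t)."""
    return num_users * sum(f * tf_u.get((v, item, tag), 0) for v, f in F.items())


def sf_decomposed(item: str, tag: str, tf_global: float, p_int: Dict[str, float],
                  tf_u: TfU, num_users: int, alpha: float, beta: float) -> float:
    """Same value, split into a user-independent and a user-dependent part.

    tf_global is tf(i, t) from the precomputed global statistics; p_int only
    needs entries for users with P_int(u, u') > 0, i.e. the friends of u.
    """
    user_independent = (1 - alpha - beta) * tf_global
    user_dependent = num_users * sum(p * tf_u.get((v, item, tag), 0) for v, p in p_int.items())
    return user_independent + user_dependent


TF_U2 = {("a", "i", "t"): 2, ("b", "i", "t"): 1}  # so tf(i, t) = 3
P_INT = {"a": 0.6, "c": 0.1}  # sums to alpha + beta = 0.7
F_ALL = {v: 0.1 + P_INT.get(v, 0.0) for v in ["a", "b", "c"]}  # background 0.3 / 3
```

Both functions return 4.5 for item i and tag t. The decomposed version needs only the global tf value and the friends with nonzero P_int, which is what makes on-the-fly computation feasible.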

Top-K in Social Networks: ContextMerge

Precomputed lists (already exist in systems):
• ITEMS(t): pairs <i, tf(i,t)>, sorted by tf(i,t)↓
• USERITEMS(u',t): pairs <i, tf_u'(i,t)>, unsorted
• FRIENDS(u): pairs <u', F(u,u')>, sorted by F(u,u')↓

Example: ITEMS(harry): 47, 32, 26, …; FRIENDS(u): 0.12, 0.10, 0.085, …; USERITEMS(u', harry)

ContextMerge

Adapted threshold algorithm for query (u, t):
• Scan ITEMS(t) and FRIENDS(u) in parallel
• Pick the "best" list
  – If ITEMS(t): read the next entry
  – If FRIENDS(u): read USERITEMS(u',t) for the next friend u'
  – Maintain candidates with bounds for min and max score, and the current results

[Figure: scanning ITEMS(harry) and FRIENDS(u); the user-independent part of sf comes from ITEMS(harry), the user-specific part from the friends' USERITEMS lists, and both are combined into min and max score bounds (annotations 0.12·|U| and 0.88·|U|)]
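The min and max score bounds in the figure can be sketched in Python for a single candidate item. The sketch rests on three assumptions. First, FRIENDS(u) is ordered by P_int(u,u'), which is equivalent to ordering by F(u,u') since the two differ only by a constant. Second, all tf values are non-negative. Third, an item not yet read from ITEMS(t) has tf(i,t) at most the value at the current scan position. The function and parameter names are hypothetical, and this is not claimed to be ContextMerge's exact bound computation.

```python
from typing import Optional, Tuple


def sf_bounds(tf_known: Optional[float], items_high: float, seen_friend_tf: float,
              seen_friend_contrib: float, next_friend_p: float,
              alpha: float, beta: float, num_users: int) -> Tuple[float, float]:
    """Min/max bounds for sf_u(i, t) during a ContextMerge-style scan (sketch).

    tf_known: tf(i, t) if i was already read from ITEMS(t), else None
    items_high: tf value at the current scan position of ITEMS(t)
    seen_friend_tf: sum of tf_u'(i, t) over the friends u' read so far
    seen_friend_contrib: sum of P_int(u, u') * tf_u'(i, t) over those friends
    next_friend_p: P_int(u, u') of the next unread friend (0 if FRIENDS(u) is exhausted)
    """
    w = 1 - alpha - beta
    # tf(i, t) counts all users, so it is at least what the friends read so far contributed
    tf_low = tf_known if tf_known is not None else seen_friend_tf
    # an item not yet read from ITEMS(t) has tf(i, t) <= items_high
    tf_high = tf_known if tf_known is not None else items_high
    low = w * tf_low + num_users * seen_friend_contrib
    # unread friends have P_int <= next_friend_p and together tagged i with t
    # at most tf_high - seen_friend_tf times
    remaining_tf = max(0.0, tf_high - seen_friend_tf)
    high = w * tf_high + num_users * (seen_friend_contrib + next_friend_p * remaining_tf)
    return low, high
```

As more of FRIENDS(u) and ITEMS(t) is read, the interval narrows. Once tf(i,t) is known and FRIENDS(u) is exhausted, the two bounds coincide with the exact sf_u(i,t).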

Experimental Evaluation: Efficiency
• Testbed: 3 large crawls of real social networks
  – Flickr: 10 million pictures, ~50,000 users
  – Del.icio.us: ~175,000 bookmarks, ~12,000 users
  – Librarything: ~6.5 million books, ~10,000 users
• Queries:
  – 150 frequent tag pairs
  – for each query, pick a user with "enough" results & friends
• Abstract cost measure: disk load
• Baseline: full merge + sort

Experimental Evaluation: Efficiency (=0)

2 to 8 times better than the baseline (abstract disk-load cost)


Summary
• Need for social-aware social search, supporting global, social, and spiritual information needs
• Social scoring
  – integrating global, collection, and social context
  – including dynamic tag expansion
• ContextMerge: scalable implementation

Further Challenges
• Meaningful & common benchmark
• Incremental maintenance for high dynamics
• Extend to ratings, user weights, item weights, …
• Extend to non-tags (like image features)
• Automatic query parameterization
• Meaningful explanations of results
• Exploit dynamics (hot topics, evolving groups, …)

Social-Aware Search & Recommendations at planet scale

Thank you.

Questions?
