Information Search in Social Networks (Informationssuche in sozialen Netzen)
Ralf Schenkel
Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum
February 2, 2009 Perspektivenvorlesung
Social Tagging Networks
Definition: Social Tagging Network
Website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain network of friends
• interact with friends
Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• Librarything (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)
Some Statistics
Flickr (as of Nov 2008):
• 3+ billion photos, 3 million new photos per day
Facebook (as of Nov 2008):
• 10+ billion photos, 30+ million new photos per day
• 120 million active users
• 150,000 new users per day
Myspace (as of Apr 2007):
• 135 million users (6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB videos
StudiVZ.net (as of Nov 2008):
• 11 million users
• 300 million images, 1 million added daily
Huge volume of highly dynamic data
Showcase: librarything.com
[Screenshot of a librarything.com page; callouts: Ratings, Tags, Books, Others]
librarything.com: Social Interaction
Explicit Friends
Similar Users
Comments
librarything.com: Tag Clouds
librarything.com: Search
Search results independent of the querying user (and the social context)
librarything.com: Search
Search automatically expanded with similar tags (synonyms)
Librarything.com: Recommendations
Recommendations depend on user and tags (but not on social context)
Librarything.com: Recommendations
Explanation for the recommendation
Librarything.com: Explanations
Outline
• Search in Social Tagging Networks
– Graph Model
– Different Information Needs
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges
Querying Social Tagging Networks
[Figure: network of users, friendship links, and tagged items. Tag sets shown on the items: {travel, vldb}, {travel, norway}, {harry, potter}, {travel, trip}, {travel, icde}, {travel, mexico}, {travel}, {probability, data mining, foundations}]
Information Need 1: Globally Popular
[Figure: the tagging network from the previous slide, with the query "harry potter" and two candidate items ("… or …?")]
Most frequently tagged items are "best"; tags by all users are equally important.
Information Need 2: Similar Users
[Figure: the same tagging network, with the query "travel" and two candidate items ("… or …?")]
Tags by users with similar tags/items ("brothers in spirit") are more important.
Information Need 3: Trusted Friends
[Figure: the tagging network, now also containing several items tagged {probability, selling} next to the item tagged {probability, data mining, foundations}; two candidate items ("… or …?")]
Tags by closely related and well-known users are more important.
Towards Social-Aware Social Search
Search results may depend on
– Global popularity of items
– Spiritual context of the querying user (users with similar books and/or tags)
– Social context of the querying user (known and trusted friends)
Outline
• Search in Social Tagging Networks
• Effective Query Scoring
– Quantifying Friendship Strengths
– User-specific Scoring Functions
– Experimental Evaluation
• Efficient Query Evaluation
• Summary & Further Challenges
Notation
U: set of users
T: set of tags
I: set of items
tags(u): tags used by user u
items(u): items tagged by user u
items(t): items tagged with tag t by at least one user
df(t): number of items tagged with tag t
tf_u(i,t): number of times user u tagged item i with tag t
tf(i,t): number of times item i was tagged with tag t
Quantifying Friendship Strengths
• Global "friendship" strength: $P_{global}(u,u') = \frac{1}{|U|}$
• Spiritual friendship strength
• Social friendship strength
• Integrated friendship strength
Spiritual Friendship Strength
$P_{spirit}(u,u')$ ~ overlap in interests of u and u'
• overlap of behavior (tagging, searching, rating, …)
Several alternatives:
• based on overlap of tag usage:
$$P_{spirit}(u,u') = \frac{2\,|tags(u) \cap tags(u')|}{|tags(u)| + |tags(u')|}$$
• based on overlap of tagged items:
$$P_{spirit}(u,u') = \frac{2\,|items(u) \cap items(u')|}{|items(u)| + |items(u')|}$$
(tags(u): tags used by user u; items(u): items tagged by user u)
For all:
• $P_{spirit}(u,u)$: [value missing in the transcript]
• normalization such that $\sum_{u' \neq u} P_{spirit}(u,u') = 1$
[Figure: users u and u' with overlapping tags: harrypotter, wizard, deathlyhallows, philosopherstone]
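The tag-overlap variant can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the overlap measure is taken to be the Dice coefficient, a user's strength to themselves is set to 0, and the names `dice` and `spiritual_strengths` are hypothetical.

```python
def dice(a, b):
    """Dice overlap of two collections: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: two empty sets count as no overlap
    return 2 * len(a & b) / (len(a) + len(b))


def spiritual_strengths(user, tags_by_user):
    """Overlap of `user` with every other user, normalized to sum to 1."""
    raw = {
        other: dice(tags_by_user[user], tags)
        for other, tags in tags_by_user.items()
        if other != user  # assumption: a user is not their own "brother in spirit"
    }
    total = sum(raw.values())
    return {other: (s / total if total else 0.0) for other, s in raw.items()}


# Hypothetical toy data in the spirit of the slide's example tags.
tags_by_user = {
    "u": {"harrypotter", "wizard"},
    "v": {"harrypotter", "wizard", "deathlyhallows"},
    "w": {"travel"},
}
strengths = spiritual_strengths("u", tags_by_user)
```

The same function works for the item-based alternative by passing items(u) instead of tags(u).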
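Graph-Based Friendship Strength
$P_{social}(u,u')$ reflects the distance of u and u' in the user network (the closer, the stronger)
[Figure: friendship graph over users u1 … u7]
Edge weight: $w(u_i, u_{i+1}) = 1$
Path weight: $w(u_{i_1}, \ldots, u_{i_k}) = \sum_{j=1}^{k-1} w(u_{i_j}, u_{i_{j+1}})$
$$P_{social}(u,u') \propto \frac{1}{\min_{p:\ \text{path } u \to u'} w(p)}$$
• set $P_{social}(u,u) := 0$
• normalization such that $\sum_{u' \neq u} P_{social}(u,u') = 1$
[Figure: bar chart of P_social(u,u') for u' = u2 … u7, with 1/|U| marked on the vertical axis]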
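A minimal sketch of the distance-based strength, assuming unit edge weights (so the cheapest path is the shortest path found by breadth-first search), strength proportional to the inverse distance, and strength 0 for unreachable users. The function name `social_strengths` and the toy graph are hypothetical.

```python
from collections import deque


def social_strengths(user, friends):
    """Inverse shortest-path distance from `user`, normalized to sum to 1."""
    dist = {user: 0}
    queue = deque([user])
    while queue:  # breadth-first search over the friendship graph
        v = queue.popleft()
        for w in friends.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # P_social(u,u) := 0, so the querying user is left out.
    raw = {v: 1.0 / d for v, d in dist.items() if v != user}
    total = sum(raw.values())
    return {v: s / total for v, s in raw.items()}


# Hypothetical undirected friendship graph as adjacency lists.
friends = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u4"],
    "u3": ["u1"],
    "u4": ["u2"],
}
p_social = social_strengths("u1", friends)
```

Direct friends (distance 1) receive twice the strength of friends-of-friends (distance 2) in this sketch.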
Integrated Friendship Strength
Query-dependent mixture of
• spiritual friendship strength
• social friendship strength
• background model (global)
$$F(u,u') = (1-\alpha-\beta)\,\frac{1}{|U|} + \underbrace{\alpha\, P_{social}(u,u') + \beta\, P_{spirit}(u,u')}_{P_{int}(u,u')}$$
with $0 \le \alpha, \beta \le 1$ and $\alpha + \beta \le 1$
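The mixture can be written down directly. The sketch below assumes the social and spiritual strengths are given as dictionaries keyed by user; `integrated_strength` is a hypothetical name, and users missing from a dictionary are treated as having strength 0 there.

```python
def integrated_strength(p_social, p_spirit, n_users, alpha, beta):
    """F(u,u') = (1-alpha-beta)/|U| + alpha*P_social(u,u') + beta*P_spirit(u,u')."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    background = (1 - alpha - beta) / n_users  # global part, same for every user
    users = set(p_social) | set(p_spirit)
    return {
        v: background + alpha * p_social.get(v, 0.0) + beta * p_spirit.get(v, 0.0)
        for v in users
    }


# Hypothetical strengths for a network with |U| = 4 users.
f = integrated_strength(
    p_social={"v": 1.0},
    p_spirit={"v": 0.5, "w": 0.5},
    n_users=4,
    alpha=0.2,
    beta=0.5,
)
```

With alpha = beta = 0 every user gets the background value 1/|U|, which corresponds to the global "friendship" strength.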
Excursion: Scoring in Text Retrieval
General scoring framework:
$$score(i,t) = tf(i,t) \cdot idf(t)$$
• tf(i,t): importance of t for item i (the more frequent, the better)
• idf(t): importance of t in the collection (the less frequent, the better)
Hand-tuned instance: Okapi BM25
$$score(i,t) = \frac{(k_1+1)\cdot tf(i,t)}{k_1 + tf(i,t)} \cdot \log\frac{|I| - df(t) + 0.5}{df(t) + 0.5}$$
Linear combination for query scores:
$$score(i, t_1 \ldots t_n) = \sum_{j=1}^{n} score(i,t_j)$$
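As a small illustration, the BM25 instance shown on the slide (without document-length normalization) and the summed query score can be coded as follows. The default k1 = 1.2 is a commonly used value and an assumption here, as are the function names.

```python
import math


def bm25(tf, df, n_items, k1=1.2):
    """Per-tag score: saturating tf part times an idf part."""
    tf_part = (k1 + 1) * tf / (k1 + tf)
    idf_part = math.log((n_items - df + 0.5) / (df + 0.5))
    return tf_part * idf_part


def query_score(tfs, dfs, n_items, k1=1.2):
    """Score of one item for a multi-tag query: sum of the per-tag scores."""
    return sum(bm25(tf, df, n_items, k1) for tf, df in zip(tfs, dfs))


# Hypothetical numbers: 1000 items, a tag that occurs on 10 of them.
once = bm25(tf=1, df=10, n_items=1000)
thrice = bm25(tf=3, df=10, n_items=1000)
both = query_score(tfs=[1, 3], dfs=[10, 10], n_items=1000)
```

The tf part grows with the frequency but saturates below k1 + 1, so very heavily tagged items do not dominate without bound.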
Towards a User-specific Score
$$tf(i,t) = \sum_{u \in U} tf_u(i,t) = |U| \cdot \sum_{u \in U} \underbrace{\frac{1}{|U|}}_{\text{global friendship strength}} \cdot tf_u(i,t)$$
Convert into user-specific social frequency:
$$sf_u(i,t) = |U| \cdot \sum_{u' \in U} F(u,u') \cdot tf_{u'}(i,t)$$
Compute user-specific social score: [SIGIR 2008]
$$score_u(i,t) = \frac{(k_1+1)\cdot sf_u(i,t)}{k_1 + sf_u(i,t)} \cdot \log\frac{|I| - df(t) + 0.5}{df(t) + 0.5}$$
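A short sketch of the social frequency for one fixed (item, tag) pair, using hypothetical names and toy numbers. It also shows the sanity check implied by the slide: with the uniform strength 1/|U| the social frequency equals the plain tf(i,t).

```python
import math


def social_frequency(strengths, tf_by_user, n_users):
    """sf_u(i,t) = |U| * sum over u' of F(u,u') * tf_u'(i,t)."""
    return n_users * sum(f * tf_by_user.get(v, 0) for v, f in strengths.items())


def social_score(sf, df, n_items, k1=1.2):
    """BM25-style score with tf replaced by the social frequency (k1 assumed)."""
    return (k1 + 1) * sf / (k1 + sf) * math.log((n_items - df + 0.5) / (df + 0.5))


# Hypothetical data: 4 users; a tagged the item twice, b once.
tf_by_user = {"a": 2, "b": 1}
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
close_to_a = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}

sf_uniform = social_frequency(uniform, tf_by_user, n_users=4)
sf_friend = social_frequency(close_to_a, tf_by_user, n_users=4)
```

For a querying user who is close to a, the item's frequency is boosted, and so is its score.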
Including Tag Expansion
Problem: Users use different tags for similar things → poor recall (missing relevant results)
Solution:
1. Define notion of similar tags
2. Expand queries with similar tags
3. Modify scoring function for expanded queries
Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, MMCI, UdS, Saarland University, …
Heuristics for finding similar tags
Co-occurrence heuristics: tags t1 and t2 are similar if they occur (almost) always together
$$sim(t_1,t_2) = \frac{2\,|items(t_1) \cap items(t_2)|}{|items(t_1)| + |items(t_2)|}$$
Specialization heuristics: tag t2 is a specialization of t1 if t1 occurs (almost) whenever t2 occurs
$$sim(t_1,t_2) = P[t_1 \mid t_2] = \frac{|items(t_1) \cap items(t_2)|}{|items(t_2)|}$$
Example: t1=Europe, t2=Germany
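Both heuristics reduce to set operations on items(t). The sketch below uses hypothetical function names and toy item ids, and takes the co-occurrence measure to be the Dice coefficient.

```python
def cooccurrence_sim(items_t1, items_t2):
    """Symmetric: 2|items(t1) ∩ items(t2)| / (|items(t1)| + |items(t2)|)."""
    a, b = set(items_t1), set(items_t2)
    if not a and not b:
        return 0.0  # assumption: unused tags are not similar
    return 2 * len(a & b) / (len(a) + len(b))


def specialization_sim(items_t1, items_t2):
    """Asymmetric: P[t1 | t2], the share of t2's items that also carry t1."""
    a, b = set(items_t1), set(items_t2)
    return len(a & b) / len(b) if b else 0.0


# Toy data: every item tagged "Germany" is also tagged "Europe".
europe = {1, 2, 3, 4}
germany = {3, 4}

sym = cooccurrence_sim(europe, germany)
spec = specialization_sim(europe, germany)   # P[Europe | Germany]
rev = specialization_sim(germany, europe)    # P[Germany | Europe]
```

The asymmetry is the point of the second heuristic: Germany implies Europe in this data, but not the other way round.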
Scoring Expanded Queries
Naive approach: for query tag t, add similar tags t' with sim(t,t') > δ to the query
Better: auto-tuning incremental expansion. For query tag t, consider only the expansion with the highest combined score per item:
$$score(i,t) = \max_{t' \in T} sim(t,t') \cdot score(i,t')$$
Example: "international crime" expanded by "mafia camorra yakuza …"
But: "transportation disaster" expanded by "train car bus plane …": result quality drops due to topic drift
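The max-based combination can be sketched as follows. The data layout (a nested dictionary of similarities and a dictionary of per-tag scores) and the name `expanded_score` are assumptions for illustration; the sketch also assumes sim(t,t) = 1, so the original tag always competes with its expansions.

```python
def expanded_score(item, tag, sim, score):
    """score(i,t) = max over t' of sim(t,t') * score(i,t')."""
    candidates = {tag: 1.0}              # assumption: sim(t, t) = 1
    candidates.update(sim.get(tag, {}))  # similar tags t' with sim(t, t')
    return max(w * score.get((item, t2), 0.0) for t2, w in candidates.items())


# Hypothetical similarities and per-tag scores.
sim = {"crime": {"mafia": 0.9, "yakuza": 0.8}}
score = {
    ("book1", "crime"): 0.5,
    ("book1", "mafia"): 0.8,
    ("book2", "crime"): 0.6,
}

s1 = expanded_score("book1", "crime", sim, score)  # best via "mafia"
s2 = expanded_score("book2", "crime", sim, score)  # best via "crime" itself
```

Because only the single best expansion counts per item, an item cannot accumulate score from many loosely related tags, which is what limits topic drift.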
Experimental Evaluation: Effectiveness
Systematic evaluation of result quality is difficult
Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external info (ex: DMOZ categories)
• Automated assessments from context of user
– Items tagged by friends
– Items tagged in the future
Prototype [VLDB/SIGIR 2008 demo]
Preliminary User Study
LibraryThing user study: [Data Engineering Bulletin, June 2008]
• 6 librarything users with reasonably large library and friend sets
• Overall 49 queries like "mystery magic", "wizard", "yakuza"
• Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]
NDCG[10] for combinations of the two mixture parameters, α (social) and β (spiritual); "-" marks cells without a value:
       0.0    0.2    0.5    0.8    1.0
0.0    0.546  0.572  0.568  0.565  0.565
0.2    0.564  0.572  0.579  0.581  -
0.5    0.539  0.552  0.559  -      -
0.8    0.515  0.546  -      -      -
1.0    0.465  -      -      -      -
• Result quality generally very high
• Combination of spiritual and social friends is best
Outline
• Search in Social Tagging Networks
• Effective Query Scoring
• Efficient Query Evaluation
– Threshold Algorithms
– ContextMerge
– Experimental Evaluation
• Summary & Further Challenges
Algorithmic Overview
• Input: query q={t1…tn} for user u, parameters α, β
• Output: k items with highest scores
• Goals:
– Avoid computing all results
– Minimize disk I/O and CPU load
– Utilize precomputed information on disk
[Figure: querying user + query "harry potter"]
Excursion: Threshold Algorithms for Text IR
Input:
• query q={t1…tn}
• lists L(tp) with pairs <i,score(i,tp)>, sorted by score(i,tp)↓
Output: k items with highest aggregated score
Family of Threshold Algorithms:
• scan lists in parallel
• maintain partial candidate results with score bounds
• terminate as soon as top-k results are stable
Example: Top-1 for 2-term query (NRA)
L1 (sorted by score↓): A: 0.9, G: 0.3, H: 0.3, I: 0.25, J: 0.2, K: 0.2, D: 0.15
L2 (sorted by score↓): D: 1.0, E: 0.7, F: 0.7, B: 0.65, C: 0.6, A: 0.4, G: 0.2
Each candidate carries a score interval [worst score; best score]; min-k is the worst score of the current top-1 item.
1. Read A: 0.9 (L1) and D: 1.0 (L2) → A: [0.9; 1.9], D: [1.0; 1.9], unseen items: [0.0; 1.9], min-k = 1.0, top-1 item = D
2. Read G: 0.3 (L1) → A: [0.9; 1.9], G: [0.3; 1.3], D: [1.0; 1.3], unseen items: [0.0; 1.3]
3. Read E: 0.7 (L2) → A: [0.9; 1.6], D: [1.0; 1.3], G: [0.3; 1.0], unseen items: [0.0; 1.0]. No more new candidates considered (the best score of any new item is at most min-k)
4. Read H: 0.3 and F: 0.7 → A: [0.9; 1.6], D: [1.0; 1.3]
5. Read I: 0.25 and B: 0.65 → D: [1.0; 1.25], A: [0.9; 1.55]
6. Read J: 0.2 and C: 0.6 → D: [1.0; 1.2], A: [0.9; 1.5]
7. Read K: 0.2 and A: 0.4 → D: [1.0; 1.2], A: [1.3; 1.3], min-k = 1.3, top-1 item = A. Algorithm safely terminates (D's best score 1.2 is below min-k)
Can we reuse this here?
[Figure: precomputed global score lists for the tags "harry" (0.95, 0.85, 0.51) and "travel" (0.87, 0.82, 0.69), next to many user-specific lists for "harry", one per querying user and parameter setting, e.g. (α=0.2, β=0.5), (α=0.0, β=0.8), (α=1.0, β=0.0), (α=0.5, β=0.5), (α=0.0, β=1.0)]
No, scores are specific to the querying user and the parameter setting!
Number of lists to precompute would explode! (#tags × #users × parameter space)
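Revisiting the Social Frequency
$$\begin{aligned}
sf_u(i,t) &= |U| \cdot \sum_{u' \in U} F(u,u') \cdot tf_{u'}(i,t) \\
&= |U| \cdot \sum_{u' \in U} \left( \frac{1-\alpha-\beta}{|U|} + P_{int}(u,u') \right) \cdot tf_{u'}(i,t) \\
&= \frac{(1-\alpha-\beta)\,|U|}{|U|} \sum_{u' \in U} tf_{u'}(i,t) + |U| \sum_{u' \in U} P_{int}(u,u') \cdot tf_{u'}(i,t) \\
&= \underbrace{(1-\alpha-\beta) \cdot tf(i,t)}_{\text{independent of user } u} + \underbrace{|U| \sum_{u' \in U} P_{int}(u,u') \cdot tf_{u'}(i,t)}_{\text{dependent on user } u}
\end{aligned}$$
where
$$|U| \sum_{u' \in U} P_{int}(u,u') \cdot tf_{u'}(i,t) = |U| \left( \alpha \sum_{u' \in U} P_{social}(u,u') \cdot tf_{u'}(i,t) + \beta \sum_{u' \in U} P_{spirit}(u,u') \cdot tf_{u'}(i,t) \right)$$
Compute sf_u(i,t) on the fly from tf(i,t), the friends of u, and their tagged documents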
Top-K in Social Networks: ContextMerge
Precomputed lists:
• ITEMS(t): pairs <i,tf(i,t)>, sorted by tf(i,t)↓
• USERITEMS(u',t): pairs <i,tf_u'(i,t)>, unsorted
• FRIENDS(u): pairs <u',F(u,u')>, sorted by F(u,u')↓
[Figure: example lists ITEMS(harry): 47, 32, 26, …; FRIENDS(u): 0.12, 0.10, 0.085, …; USERITEMS(u', harry) for each friend u'; annotation: "already exist in systems"]
ContextMerge
Adapted Threshold Algorithm for query u,t:
• Scan ITEMS(t) and FRIENDS(u) in parallel
• pick "best" list
– If ITEMS(t): read next entry
– If FRIENDS(u): read USERITEMS(u',t) for next friend u'
– Maintain candidates with bounds for min and max score and current results
[Figure: worked example for the tag "harry" with ITEMS(harry): 47, 32, 26, … and FRIENDS(u): 0.12, 0.10, 0.085, …. For a candidate read from ITEMS(harry), the user-independent part of sf is 47 and the user-specific part is unknown ("?", bound |U|); from these the min and max score bounds are computed. For a candidate read via the first friend, the user-specific part of sf is 0.12·|U| and the user-independent part is unknown ("?", bound 47); the figure also shows the values |U| and 0.88·|U|.]
Experimental Evaluation: Efficiency
• Testbed: 3 large crawls of real social networks
– Flickr: 10 million pictures, ~50,000 users
– Del.icio.us: ~175,000 bookmarks, ~12,000 users
– Librarything: ~6.5 million books, ~10,000 users
• Queries:
– 150 frequent tag pairs
– for each query pick user with "enough" results & friends
• Abstract cost measure ≈ disk load
• Baseline: full merge + sort
Experimental Evaluation: Efficiency (β=0)
[Figure: cost compared to the baseline, plotted over α]
2-8 times better than baseline
Outline
• Search in Social Tagging Networks
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges
Summary
• Need for social-aware social search, supporting
– global
– social
– spiritual
information needs
• Social scoring
– integrating global, collection, and social context
– including dynamic tag expansion
• ContextMerge: scalable implementation
Further Challenges
• Meaningful & common benchmark
• Incremental maintenance for high dynamics
• Extend to ratings, user weights, item weights, …
• Extend to non-tags (like image features)
• Automatic query parameterization
• Meaningful explanations of results
• Exploit dynamics (hot topics, evolving groups, …)
Social-Aware Search & Recommendations at planet scale
Thank you.
Questions?