social systems for smaller communities
DESCRIPTION
A presentation of a Google-supported project on short-term communities, given at Google Pittsburgh on 5/9/2011.
TRANSCRIPT
Social Systems for Smaller Communities
Peter Brusilovsky with
Chirayu Wongchokprasitti
Shaghayegh Sahebi, Danielle Lee,
Claudia Lopez, and other PAWS students
University of Pittsburgh - PAWS Lab 2
Overview
• The context
• The problem
• The goal
• Work done
• Google Integration
Social Systems: the Web of People
http://www.veryweb.it/?page_id=27
• User-generated content
  – Blogs
  – Wikis
• Shared resources
  – Video (YouTube)
  – Bookmarks
  – News
• Secondary content
  – Comments
  – Ratings
  – Tags
• User as a first-class participant, contributor, author
http://www.masternewmedia.org/news/2006/12/01/social_bookmarking_services_and_tools.htm
Key Elements
Sharing and Tagging
• Delicious & Flickr
  – Pioneered the concept of folksonomy
• Collaborative categorization using freely chosen keywords (tags)
Sharing and Tagging: CiteULike
• Encyclopedia to Wikipedia
  – Launched in 2001
  – Largest, fastest-growing, and most popular reference work
• News services to blogosphere
• Books to FanFiction
User-Generated Content
Comments and Ratings
Markets, Feedback, and Trust
– Collective activity of all its users
Voting by Linking - PageRank
– Using the link structure of the web
• Wisdom of Crowds: communities create value!
• A community of authors produces valuable content
• A critical mass of participation acts as a filter for what is valuable
• The web of connections grows organically as an output of the collective activity of all web users
Collective Intelligence
The Weak Link: Participation
• Community-based systems share many issues that should be addressed to produce successful systems
• Participation vs. lurking
• Social capital
• Social networking
• Trust and reputation
• Privacy and presence
[Figure: participation ratio of 1 creator : 10 synthesizers : 100 consumers]
One of 100? One of 500?
9/26/2010
One of 100000?
Diminishing Returns
• 307,006,550: US population
• 10,000,000: Watched the movie (1:30)
• 20,000: Rated the movie in IMDB (1:15,000)
• 238: Wrote a review (1:1,000,000)
• 54: Rated the movie in MovieLens (1:5,000,000)
Social Systems for Small Communities?
• Sharing cultural events in Pittsburgh?
  – Post event, rate event, write a review
  – One of many systems presenting events
  – 334,563 people, 143,739 households, and 74,169 families
  – Expected ratings (1:5,000,000)?
• Sharing research talks at CMU and Pitt?
  – The one and only system of this kind…
  – Expected posts (1:1,000,000)?
  – Expected bookmarks (1:15,000)?
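The participation ratios quoted on these slides are plain divisions, and the open question is what they imply for a city-sized audience. The sketch below redoes that arithmetic. The function names `one_in` and `expected_contributions` are illustrative only, and the calculation assumes that the participation rates seen for a US-wide movie audience carry over to a local community, which is exactly what the slide questions rather than a confirmed fact.

```python
US_POPULATION = 307_006_550
PITTSBURGH_POPULATION = 334_563


def one_in(population, contributors):
    """Return x such that roughly 1 person in x contributed."""
    return population / contributors


def expected_contributions(population, one_in_ratio):
    """Expected number of contributors if 1 in `one_in_ratio` people takes part."""
    return population / one_in_ratio


# Counts taken from the "Diminishing Returns" slide.
observed = {
    "watched the movie": 10_000_000,
    "rated in IMDB": 20_000,
    "wrote a review": 238,
    "rated in MovieLens": 54,
}
for activity, count in observed.items():
    print(f"{activity}: about 1 in {one_in(US_POPULATION, count):,.0f}")

# Rounded ratios as quoted on the slides, applied to Pittsburgh.
quoted_ratios = {"ratings": 5_000_000, "posts": 1_000_000, "bookmarks": 15_000}
for label, ratio in quoted_ratios.items():
    estimate = expected_contributions(PITTSBURGH_POPULATION, ratio)
    print(f"expected {label} in Pittsburgh: {estimate:.2f}")
```

Under these assumptions a Pittsburgh-sized population yields roughly 22 bookmarks, about 0.3 posts, and fewer than 0.1 ratings per item, which is the gap the rest of the talk tries to close.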
Conference Navigator III
CoMeT (http://halley.exp.sis.pitt.edu/comet/)
CoMeT: Collaborative Management of Talks
The Idea
[Figure: Social, Personalized, Ubiquitous]
• Personalization
  – Recommender service
  – Social navigation
  – Adaptive engagement
• Mobile and Ubiquitous
  – Android application
  – Facebook connection (a sidewalk sale)
  – Twitter feed
  – Public displays
The Plan
• Personalization
  – Simple content-based recommender in CoMeT and CN3
  – Offered in navigation support mode
• Mobile and Ubiquitous
  – First Eventur app (search for Eventur in the Android market)
  – Eventur Facebook export
  – Eventur Twitter feed
Where Are We?
CoMeT Navigation Support
Personalization Challenge
• Events: short-lived artifacts
• Need everything that can work
• Content-based recommendation
• Collaborative recommendation
• Social recommendation
• Demographic and group-based recommendation
• Case-based (metadata-based) recommendation
Personalization for Engagement
• Adaptive engagement efforts
  – Based on user knowledge/goals/interests
  – Based on user past experience with the system
• Special efforts to deal with cold start: using information from other social systems
  – Social bookmarking systems (CiteULike, Delicious)
  – Social linking systems (Facebook, LinkedIn)
  – Public data (e.g., Google Scholar)
• HetRec 2011 workshop!
Recommendation Approaches
• Various sources of information:
  – Standard information: keywords of bookmarked talks in CoMeT
  – Keywords of bookmarked papers from CiteULike
  – Tags of talks in CoMeT
  – Tags of papers in CiteULike (CUL)
• Different models for fusion of tags and keywords
Document Representation Models
• Keywords Only (KO)
  – Keywords extracted from documents’ titles and abstracts
• Keywords+n*Tags (KnT)
  – Keywords extracted from documents’ titles and abstracts + tags assigned to documents
• Keywords Concatenated by Tags (KCT)
  – Keywords extracted from documents’ titles and abstracts + tags assigned to documents
Keywords Only (KO) Model
• Each document:
  – a bag of words
  – represented as a vector in keywords vector space
  – TF.IDF weighting scheme

Talks/Papers (rows) × Keywords (columns):

      W1    W2    W3    W4    W5    W6
D1    0     1     0     0     0     0
D2    .5    0     0     .5    0     0
D3    .12   .13   0     .25   .5    0
D4    .25   0     .25   0     .25   .25
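A bag-of-words vector with TF.IDF weights can be sketched in a few lines. The slides do not say which TF.IDF variant was used, so the version below (relative term frequency times the natural log of N/df) is an assumption, and the function name `tfidf_vectors` and the sample keywords are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document (id -> list of keyword tokens) to a TF.IDF vector.

    Returns (vocab, vectors), where each vector follows the order of vocab.
    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    vocab = sorted({w for tokens in docs.values() for w in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each keyword occurs.
    df = Counter(w for tokens in docs.values() for w in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        vectors[doc_id] = [
            (tf[w] / total) * math.log(n_docs / df[w]) if total else 0.0
            for w in vocab
        ]
    return vocab, vectors


# Hypothetical keywords extracted from titles and abstracts.
docs = {
    "D1": ["adaptive", "web"],
    "D2": ["web", "search", "search"],
    "D3": ["social", "tagging"],
}
vocab, vectors = tfidf_vectors(docs)
for doc_id, vector in vectors.items():
    print(doc_id, [round(x, 2) for x in vector])
```

A keyword that appears in every document gets weight zero under this variant, so shared vocabulary contributes nothing to the similarity between two documents.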
Merging CUL and CoMeT Data in KO Model
Dc: CUL Papers’ Matrix (k × l)

      w1    w2    w3    w4
P1    1     0     0     0
P2    .25   0     .5    .25
P3    0     .5    .25   .25

Dt: CoMeT Talks’ Matrix (e × m)

      w3    w4    w5
T1    0     1     0
T2    0     0     .5

D: Merged Documents’ Matrix ((k+e) × (l+m−o))

      w1    w2    w3    w4    w5
T1    0     0     0     1     0
T2    0     0     0     0     .5
P1    1     0     0     0     0
P2    .25   0     .5    .25   0
P3    0     .5    .25   .25   0

• k: the number of CiteULike papers
• l: the number of keywords used in CiteULike papers
• e: total number of talks in CoMeT
• m: total number of keywords in CoMeT
• o: the number of common keywords between the CoMeT and CiteULike systems
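The merge amounts to re-expressing both matrices over the union of the two keyword vocabularies and stacking the rows. The sketch below reproduces the slide's small example. `merge_matrices` is a hypothetical helper rather than code from CoMeT, and it assumes document ids are unique across the two systems.

```python
def merge_matrices(vocab_a, rows_a, vocab_b, rows_b):
    """Stack two document-keyword matrices over their shared vocabulary.

    vocab_*: list of column keywords; rows_*: dict of doc id -> list of weights.
    Keywords a document's source system never used get weight 0.0.
    """
    merged_vocab = sorted(set(vocab_a) | set(vocab_b))

    def realign(vocab, rows):
        realigned = {}
        for doc_id, values in rows.items():
            weights = dict(zip(vocab, values))
            realigned[doc_id] = [weights.get(w, 0.0) for w in merged_vocab]
        return realigned

    merged = realign(vocab_b, rows_b)        # talks first, as on the slide
    merged.update(realign(vocab_a, rows_a))  # then papers
    return merged_vocab, merged


# Example values from the slide: Dc (CiteULike papers) and Dt (CoMeT talks).
cul_vocab = ["w1", "w2", "w3", "w4"]
cul_papers = {
    "P1": [1.0, 0.0, 0.0, 0.0],
    "P2": [0.25, 0.0, 0.5, 0.25],
    "P3": [0.0, 0.5, 0.25, 0.25],
}
comet_vocab = ["w3", "w4", "w5"]
comet_talks = {"T1": [0.0, 1.0, 0.0], "T2": [0.0, 0.0, 0.5]}

merged_vocab, merged = merge_matrices(cul_vocab, cul_papers, comet_vocab, comet_talks)
for doc_id, row in merged.items():
    print(doc_id, row)
```

With l = 4, m = 3, and o = 2 shared keywords, the merged matrix has l + m − o = 5 columns and k + e = 5 rows, matching the dimensions given on the slide.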
Keywords+n*Tags (KnT) Model
• Each document: a bag of words containing:
  – document’s abstract, title, and tags
• Tags: regular keywords
  – Each tag appears n times
• Merge CUL and CoMeT data in this model: same as KO

Talks/Papers (rows) × Keywords and Tags (columns); W3/T1 and W4/T2 are common keywords & tags (W3=T1, W4=T2):

      W1    W2    W3/T1   W4/T2   T3    T4
D1    0     1     1       0       0     0
D2    1     0     3       5       0     0
D3    1     2     3       0       1     0
D4    2     0     5       0       2     1

Example (D3, n=2): Keywords: w1, w2, w3, w2; Tags: T1, T3
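Read literally, the KnT rule adds each tag to the keyword bag n times, so a tag that matches a keyword simply raises that keyword's count. The sketch below applies the rule to the D3 example. `knt_bag` is an illustrative name, and lowercase strings stand in for the slide's W and T labels. Note that the literal rule gives a count of n for a tag-only term such as T3, whereas the slide's table shows 1 in that cell, so treat this as one reading of the model rather than the authors' exact procedure.

```python
from collections import Counter


def knt_bag(keywords, tags, n):
    """Build a Keywords+n*Tags bag: keyword counts plus n per assigned tag."""
    bag = Counter(keywords)
    for tag in tags:
        bag[tag] += n  # a tag equal to a keyword shares the same dimension
    return bag


# D3 from the slide: W3 and T1 are the same term, so tag T1 is written as "w3".
bag = knt_bag(keywords=["w1", "w2", "w3", "w2"], tags=["w3", "t3"], n=2)
print(sorted(bag.items()))
```

For the shared term this gives 1 keyword occurrence plus 2 tag occurrences, which agrees with the value 3 in the W3/T1 column for D3.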
Keywords Concatenated by Tags (KCT) Model
• Tags: a separate source of information
• Each document: a bag of keywords and a bag of tags
  – Concatenating keywords and tags vectors
  – TF.IDF weighting scheme

Talks/Papers (rows) × Keywords and Tags (columns); W3=T1, W4=T2:

      W1    W2    W3    W4    T1    T2    T3    T4
D1    0     1     1     0     0     0     0     0
D2    1     0     3     1     0     2     0     0
D3    1     2     1     0     1     0     1     0
D4    2     3     3     0     1     0     2     1

Example (D3): Keywords: w1, w2, w3, w2; Tags: T1, T3
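In contrast to KnT, KCT keeps keywords and tags in separate dimensions even when the same term occurs in both. A minimal sketch of the concatenation step follows. `kct_vector` is a hypothetical helper, and it returns raw counts only, leaving out the TF.IDF weighting the slide mentions.

```python
from collections import Counter


def kct_vector(keywords, tags, keyword_vocab, tag_vocab):
    """Concatenate a keyword count vector with a separate tag count vector."""
    keyword_counts = Counter(keywords)
    tag_counts = Counter(tags)
    keyword_part = [keyword_counts[w] for w in keyword_vocab]
    tag_part = [tag_counts[t] for t in tag_vocab]
    return keyword_part + tag_part


# D3 from the slide: keywords w1, w2, w3, w2 and tags T1, T3.
d3 = kct_vector(
    keywords=["w1", "w2", "w3", "w2"],
    tags=["t1", "t3"],
    keyword_vocab=["w1", "w2", "w3", "w4"],
    tag_vocab=["t1", "t2", "t3", "t4"],
)
print(d3)
```

The vector has one dimension per keyword plus one per tag, so the space grows with both vocabularies instead of sharing dimensions for overlapping terms.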
Merging CUL and CoMeT Data in KCT Model
Dc: CUL Papers’ Matrix (k × (m+i))

      w1    w2    T1    T2
P1    1     0     0     0
P2    .25   0     .5    .25
P3    0     .5    .25   .25

Dt: CoMeT Talks’ Matrix (e × (l+j))

      w2    w3    T1
C1    0     1     0
C2    0     0     .5

D: Merged Documents’ Matrix ((k+e) × (l+m+i+j−o−p))

      w1    w2    w3    T1    T2
C1    0     0     1     0     0
C2    0     0     0     .5    0
P1    1     0     0     0     0
P2    .25   0     0     .5    .25
P3    0     .5    0     .25   .25

• k: the number of CiteULike papers
• m: the number of keywords used in CiteULike papers
• i: the number of tags used in CiteULike papers
• e: total number of talks in CoMeT
• l: total number of keywords in CoMeT
• j: total number of tags in CoMeT
• o: the number of common keywords between the CoMeT and CiteULike systems
• p: the number of common tags between the CoMeT and CiteULike systems
Recommending Talks to Users
• K-nearest neighbor method
  – recommend top K closest documents to user profile
• User profiles: based on users’ bookmarked and rated talks and papers

U: User Profiles in Talks/Papers Space (users × documents)

      D1    D2    D3    D4
U1    1     0     0     0
U2    .25   0     .5    .25
U3    0     .5    .25   .25

D: Documents in Keywords Space (documents × keywords)

      w1    w2    w3
D1    0     1     0
D2    0     0     .5
D3    0     1     0
D4    0     0     .5

UP: User Profiles in Keywords Space (users × keywords)

      w1    w2    w3
U1    1     0     1
U2    .25   .5    .37
U3    0     .25   .37
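One plausible reading of the three matrices is that UP is the product of U and D, after which the K documents closest to a user's row of UP are recommended. The sketch below follows that reading. Both the matrix product and the use of cosine similarity as the closeness measure are assumptions, since the slide names neither, and the slide's own UP values do not all equal this product. The function names are illustrative.

```python
import math


def project_profiles(U, D):
    """Multiply users x documents by documents x keywords -> users x keywords."""
    n_keywords = len(D[0])
    return [
        [sum(u_row[d] * D[d][w] for d in range(len(D))) for w in range(n_keywords)]
        for u_row in U
    ]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(profile, doc_ids, D, k):
    """Return ids of the k documents most similar to the profile."""
    ranked = sorted(
        zip(doc_ids, D), key=lambda pair: cosine(profile, pair[1]), reverse=True
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# U and D as shown on the slide.
U = [[1, 0, 0, 0], [0.25, 0, 0.5, 0.25], [0, 0.5, 0.25, 0.25]]
D = [[0, 1, 0], [0, 0, 0.5], [0, 1, 0], [0, 0, 0.5]]
UP = project_profiles(U, D)
top_for_u1 = recommend(UP[0], ["D1", "D2", "D3", "D4"], D, k=2)
print(UP)
print(top_for_u1)
```

A deployed recommender would normally skip documents the user has already bookmarked before ranking; that filter is left out here to keep the sketch short.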
Experimental Results
• User study:
  – 8 real users of both CoMeT and CiteULike systems
• Evaluation questionnaire for each recommended talk:
  – Is this talk related to your interest? (yes/no question)
  – How interesting is this talk to you? (on a 5-point scale)
  – If the talk is related to your interests, how novel is this talk to you? (on a 5-point scale)
Experimental Results (Cont’d)
• Compared six models:
  – KO, KnT (with n = 1, 2, 5; best n = 1), and KCT
    • using only CoMeT data
    • using both CoMeT and CiteULike data
• Measures:
  – Relevance: precision by yes/no answers
  – Interest: nDCG by 5-point scale
  – Novelty: averaged the novelty ratings (non-relevant = zero novelty)
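The three measures can be sketched from one user's answers to a ranked list of recommendations. The slides do not give the exact formulas, so the nDCG variant below (linear gain, log2 discount, ideal ordering taken from the same list of ratings) is an assumption, as are the function names and the sample answers.

```python
import math


def precision_at_k(relevant, k):
    """Share of the top-k recommendations answered 'yes' for relevance."""
    return sum(1 for flag in relevant[:k] if flag) / k


def dcg(ratings):
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(r / math.log2(pos + 2) for pos, r in enumerate(ratings))


def ndcg_at_k(ratings, k):
    """DCG of the top-k ratings divided by the DCG of the best possible ordering."""
    ideal = dcg(sorted(ratings, reverse=True)[:k])
    return dcg(ratings[:k]) / ideal if ideal else 0.0


def mean_novelty_at_k(novelty, relevant, k):
    """Average novelty over the top k, counting non-relevant talks as zero."""
    scores = [n if flag else 0 for n, flag in zip(novelty[:k], relevant[:k])]
    return sum(scores) / k


# Hypothetical answers from one user for four recommended talks, in rank order.
relevant = [True, False, True, True]
interest = [4, 2, 5, 3]
novelty = [3, 4, 2, 1]

print(precision_at_k(relevant, 4))
print(round(ndcg_at_k(interest, 4), 3))
print(mean_novelty_at_k(novelty, relevant, 4))
```

Evaluating each function for k = 1 to 10 and averaging across users would produce one row of a results table of the kind reported for each model.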
Precision results for different number of recommendations

Data               Model        1     2     3     4     5     6     7     8     9     10
Only CoMeT         KO           0.83  0.67  0.72  0.63  0.6   0.56  0.57  0.5   0.51  0.51
Only CoMeT         KnT (n = 1)  0.5   0.5   0.58  0.59  0.57  0.58  0.57  0.58  0.6   0.57
Only CoMeT         KCT          0.5   0.33  0.39  0.46  0.47  0.53  0.52  0.5   0.5   0.53
CoMeT + CiteULike  KO           0.83  0.83  0.67  0.75  0.73  0.69  0.64  0.63  0.56  0.57
CoMeT + CiteULike  KnT (n = 1)  0.63  0.69  0.71  0.72  0.73  0.73  0.71  0.7   0.68  0.67
CoMeT + CiteULike  KCT          0.38  0.44  0.42  0.47  0.48  0.52  0.5   0.49  0.53  0.55
Precision results for different number of recommendations (Cont’d)
• Adding tags using KnT → better cumulative precision for top 10 recommendations
• Adding CiteULike data in both KnT and KO → higher precision
• KnT with both CoMeT and CUL data → best cumulative precision
• KCT model → decrease in precision
  – High dimensionality of vector space model → increased distance of documents and user profiles → decreased variance between similarities of user profile to different talks
nDCG Results for different number of recommendations
Data               Model        1     2     3     4     5     6     7     8     9     10
Only CoMeT         KO           0.9   0.88  0.89  0.93  0.92  0.94  0.95  0.95  0.95  0.96
Only CoMeT         KnT (n = 1)  0.9   0.85  0.82  0.83  0.87  0.88  0.89  0.9   0.91  0.93
Only CoMeT         KCT          0.84  0.88  0.89  0.9   0.9   0.91  0.92  0.92  0.94  0.95
CoMeT + CiteULike  KO           0.84  0.91  0.9   0.92  0.93  0.94  0.95  0.96  0.96  0.96
CoMeT + CiteULike  KnT (n = 1)  0.9   0.9   0.89  0.88  0.9   0.92  0.92  0.94  0.94  0.95
CoMeT + CiteULike  KCT          0.77  0.85  0.84  0.81  0.83  0.84  0.86  0.88  0.91  0.92
nDCG Results for different number of recommendations (Cont’d)
• KCT and KnT models: using both CiteULike and CoMeT data → increased user cumulative interest
• Best results: tag-less KO model both with and without CiteULike data
Novelty Results for different number of recommendations
Data               Model        1     2     3     4     5     6     7     8     9     10
Only CoMeT         KO           1.75  1.69  1.67  1.72  1.7   1.65  1.66  1.55  1.49  1.44
Only CoMeT         KnT (n = 1)  1.88  1.75  1.67  1.88  1.88  1.88  2     2.03  1.99  1.93
Only CoMeT         KCT          2     1.5   1.54  1.56  1.55  1.6   1.63  1.58  1.5   1.5
CoMeT + CiteULike  KO           1.88  1.44  1.33  1.5   1.5   1.52  1.61  1.47  1.44  1.36
CoMeT + CiteULike  KnT (n = 1)  1.75  2.19  1.79  2.06  2.2   2.08  2.02  2.19  2.06  1.96
CoMeT + CiteULike  KCT          1.38  1.31  1.38  1.47  1.58  1.6   1.52  1.47  1.61  1.64
Novelty Results for different number of recommendations (Cont’d)
• Adding tags using KnT fusion model → largest positive impact
• Adding different sources of information → improved novelty of recommendations
  – Tags are provided by users → include a broader range of vocabulary
  – Each user’s tags describe a document from her point of view (different from the terms included in the document)
• Adding CUL data in KO model → decreased novelty
  – Distinctive natures of CoMeT and CiteULike systems
    • CiteULike: adding, reviewing, and rating papers related to users’ research fields
    • CoMeT: information about talks happening within a specific time, given on a particular date → users bookmark more novel, less relevant talks
Conclusion
• Relevance: a fit to the user’s research work
• Interest: an overall attraction of an item
• Users are interested in talks on more general topics that have little in common with their research interests
• Increased focus on relevance encapsulated in tags → decreased system ability to recommend interesting talks with the addition of tags
Conclusion (Cont’d)
• Including another reliable user profile → increased precision of recommendations
  – Considering the way to augment the additional profile
• Using CiteULike data for all models
  – Increased relevance of every recommended document
  – Various results of interestingness
• Adding tags
  – Increased novelty of recommendations (both using CoMeT and CUL data)
  – Increased relatedness in larger numbers of recommendations
• Injection of keywords from another source of data: more reliable than including tags for relevance
• Including tags from various sources of information: more reliable for interestingness or novelty
Thank you!