
Social Systems for Smaller Communities

Peter Brusilovsky with

Chirayu Wongchokprasitti

Shaghayegh Sahebi, Danielle Lee,

Claudia Lopez, and other PAWS students

University of Pittsburgh - PAWS Lab

Overview

• The context
• The problem
• The goal
• Work done
• Google Integration

Social Systems: the Web of People

http://www.veryweb.it/?page_id=27

• User-generated content
– Blogs
– Wikis
• Shared resources
– Video (YouTube)
– Bookmarks
– News
• Secondary content
– Comments
– Ratings
– Tags
• User as a first-class participant, contributor, author

http://www.masternewmedia.org/news/2006/12/01/social_bookmarking_services_and_tools.htm

Key Elements

Sharing and Tagging

• Delicious & Flickr
– Pioneered the concept of folksonomy
• Collaborative categorization using freely chosen keywords (tags)


Sharing and Tagging: CiteULike

http://www.citeulike.org/user/brusilovsky

• Encyclopedia to Wikipedia
– Launched in 2001
– Largest, fastest-growing, and most popular reference work
• News services to blogosphere
• Books to FanFiction

User-Generated Content

Comments and Ratings

Markets, Feedback, and Trust

– Collective activity of all its users

Voting by Linking - PageRank

– Using the link structure of the web

• Wisdom of Crowds: communities create value!
• A community of authors produces valuable content
• A critical mass of participation acts as a filter for what is valuable
• The web of connections grows organically as an output of the collective activity of all web users

Collective Intelligence

The Weak Link: Participation

• Community-based systems share many issues that must be addressed to produce successful systems
• Participation vs. lurking
• Social capital
• Social networking
• Trust and reputation
• Privacy and presence

1 creator, 10 synthesizers, 100 consumers


One of 100? One of 500?

9/26/2010


One of 100000?


Diminishing Returns

• 307,006,550: US population
• 10,000,000: Watched the movie (1:30)
• 20,000: Rated the movie in IMDB (1:15,000)
• 238: Wrote a review (1:1,000,000)
• 54: Rated the movie in MovieLens (1:5,000,000)


Social Systems for Small Communities?

• Sharing cultural events in Pittsburgh?
– Post event, rate event, write a review
– One of many systems presenting events
– 334,563 people, 143,739 households, and 74,169 families
– Expected ratings (1:5,000,000)?
• Sharing research talks at CMU and Pitt?
– The one and only system of this kind…
– Expected posts (1:1,000,000)?
– Expected bookmarks (1:15,000)?
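The question marks above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the national "one in N" contribution rates from the Diminishing Returns slide carry over unchanged to Pittsburgh; the function and variable names are hypothetical.

```python
# Rough "one in N" participation rates from the Diminishing Returns slide.
ONE_IN = {
    "rate on IMDB": 15_000,
    "write a review": 1_000_000,
    "rate on MovieLens": 5_000_000,
}

PITTSBURGH_POPULATION = 334_563  # figure quoted on the slide


def expected_contributions(population, one_in):
    """Expected number of contributors if 1 in `one_in` people take part."""
    return population / one_in


for activity, one_in in ONE_IN.items():
    n = expected_contributions(PITTSBURGH_POPULATION, one_in)
    print(f"{activity}: about {n:.2f} expected contributors")
```

Under that assumption a city-sized community yields roughly 22 ratings at the most generous rate and well under one contribution at the other two, which is the problem the rest of the talk addresses.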


Conference Navigator III

http://halley.exp.sis.pitt.edu/cn3/portalindex.php


Eventur.us

http://eventur.sis.pitt.edu/


CoMeT (http://halley.exp.sis.pitt.edu/comet/)


CoMeT: Collaborative Management of Talks

The Idea

Social, Personalized, Ubiquitous

• Personalization
– Recommender service
– Social navigation
– Adaptive engagement
• Mobile and Ubiquitous
– Android application
– Facebook connection (a sidewalk sale)
– Twitter feed
– Public displays

The Plan

• Personalization
– Simple content-based recommender in CoMeT and CN3
– Offered in navigation support mode
• Mobile and Ubiquitous
– First Eventur app (search for Eventur in the Android market)
– Eventur Facebook export
– Eventur Twitter feed

Where Are We?

http://www.facebook.com/pages/Eventur-Events-in-Pittsburgh/117954101600897





CoMeT Navigation Support



Personalization Challenge

• Events: short-lived artifacts
• Need everything that can work
• Content-based recommendation
• Collaborative recommendation
• Social recommendation
• Demographic and group-based recommendation
• Case-based (metadata-based) recommendation


Personalization for Engagement

• Adaptive engagement efforts
– Based on user knowledge/goals/interests
– Based on the user’s past experience with the system
• Special efforts to deal with cold start: using information from other social systems
– Social bookmarking systems (CiteULike, Delicious)
– Social linking systems (Facebook, LinkedIn)
– Public data (e.g., Google Scholar)
• HetRec 2011 workshop!

http://ir.ii.uam.es/hetrec2011/


Recommendation Approaches

• Various sources of information:
– Standard information: keywords of bookmarked talks in CoMeT
– Keywords of bookmarked papers from CiteULike
– Tags of talks in CoMeT
– Tags of papers in CiteULike (CUL)
• Different models for fusion of tags and keywords


Document Representation Models

• Keywords Only (KO)
– Keywords extracted from documents’ titles and abstracts
• Keywords+n*Tags (KnT)
– Keywords extracted from documents’ titles and abstracts + tags assigned to documents
• Keywords Concatenated by Tags (KCT)
– Keywords extracted from documents’ titles and abstracts + tags assigned to documents


Keywords Only (KO) Model

• Each document:
– a bag of words
– represented as a vector in the keyword vector space
– TF-IDF weighting scheme

Rows: talks/papers; columns: keywords

        W1    W2    W3    W4    W5    W6
D1      0     1     0     0     0     0
D2      .5    0     0     .5    0     0
D3      .12   .13   0     .25   .5    0
D4      .25   0     .25   0     .25   .25
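A minimal sketch of how such a keyword matrix could be built. The slides name TF-IDF but not the exact variant, so the formula used here (relative term frequency times log(N / document frequency)) is an assumption, as are the function name and the toy documents.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Weight = (term count / document length) * log(N / document frequency).
    This TF-IDF variant is an assumption, not taken from the slides.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical titles standing in for talk/paper titles and abstracts.
docs = [
    "adaptive social navigation".split(),
    "social tagging systems".split(),
    "adaptive recommender systems".split(),
]
vocab, vectors = tf_idf_vectors(docs)
print(vocab)
for row in vectors:
    print([round(w, 2) for w in row])
```

Each row is one document and each column one keyword, mirroring the layout of the table above.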

Merging CUL and CoMeT Data in KO Model

Dc: CUL Papers’ Matrix (k × l)

        w1    w2    w3    w4
P1      1     0     0     0
P2      .25   0     .5    .25
P3      0     .5    .25   .25

Dt: CoMeT Talks’ Matrix (e × m)

        w3    w4    w5
T1      0     1     0
T2      0     0     .5

D: Merged Documents’ Matrix ((k+e) × (l+m-o))

        w1    w2    w3    w4    w5
T1      0     0     0     1     0
T2      0     0     0     0     .5
P1      1     0     0     0     0
P2      .25   0     .5    .25   0
P3      0     .5    .25   .25   0

k - the number of CiteULike papers
l - the number of keywords used in CiteULike papers
e - the total number of talks in CoMeT
m - the total number of keywords in CoMeT
o - the number of keywords common to the CoMeT and CiteULike systems
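The merge can be sketched as stacking both matrices over the union of their keywords. The sparse dict representation and the function name below are assumptions made for illustration; the numbers are the toy values from the slide.

```python
def merge_matrices(papers, talks):
    """Stack two sparse document matrices over the union of their keywords.

    `papers` and `talks` map a document id to a {keyword: weight} dict.
    Returns (row_ids, columns, dense_rows); a keyword that a document
    does not contain gets weight 0.
    """
    columns = sorted(
        {kw for rows in (papers, talks) for vec in rows.values() for kw in vec}
    )
    row_ids, dense = [], []
    for rows in (talks, papers):  # talks first, as in the merged matrix D
        for doc_id, vec in rows.items():
            row_ids.append(doc_id)
            dense.append([vec.get(kw, 0) for kw in columns])
    return row_ids, columns, dense


# Toy values from the slide (zero entries omitted).
papers = {  # Dc: k = 3 papers
    "P1": {"w1": 1},
    "P2": {"w1": 0.25, "w3": 0.5, "w4": 0.25},
    "P3": {"w2": 0.5, "w3": 0.25, "w4": 0.25},
}
talks = {  # Dt: e = 2 talks
    "T1": {"w4": 1},
    "T2": {"w5": 0.5},
}

row_ids, columns, D = merge_matrices(papers, talks)
print(columns)  # 5 columns: l + m - o = 4 + 3 - 2
for doc_id, row in zip(row_ids, D):
    print(doc_id, row)
```

Because the sketch stores only non-zero weights, a keyword that never occurs in any document would not get a column; a dense implementation would carry an explicit vocabulary instead.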


Keywords+n*Tags (KnT) Model

• Each document: a bag of words containing
– the document’s abstract, title, and tags
• Tags: regular keywords
– Each tag appears n times
• Merge CUL and CoMeT data in this model: same as KO

Rows: talks/papers. Columns: keywords (W1, W2), common keywords & tags (W3/T1, W4/T2), tags (T3, T4).

        W1    W2    W3/T1   W4/T2   T3    T4
D1      0     1     1       0       0     0
D2      1     0     3       5       0     0
D3      1     2     3       0       1     0
D4      2     0     5       0       2     1

Example (n = 2), document D3: keywords w1, w2, w3, w2; tags T1, T3; W3 = T1, W4 = T2.
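A minimal sketch of the KnT idea, assuming raw term counts before any weighting. The function name and the example tokens are hypothetical and are not taken from the slide's table.

```python
from collections import Counter


def knt_bag(keywords, tags, n=1):
    """Bag of words in which every tag counts as n keyword occurrences."""
    bag = Counter(keywords)
    for tag in tags:
        bag[tag] += n
    return bag


# Hypothetical document: four keyword tokens and two user-assigned tags.
keywords = ["social", "web", "tagging", "web"]
tags = ["tagging", "folksonomy"]

bag = knt_bag(keywords, tags, n=2)
print(dict(bag))
```

A tag that is also a keyword ("tagging" here) adds to the same dimension, while a tag that never occurs in the text ("folksonomy") opens a new one.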


Keywords Concatenated by Tags (KCT) Model

• Tags: a separate source of information
• Each document: a bag of keywords and a bag of tags
– Concatenating keyword and tag vectors
– TF-IDF weighting scheme

Rows: talks/papers. Columns: keywords (W1-W4), then tags (T1-T4).

        W1    W2    W3    W4    T1    T2    T3    T4
D1      0     1     1     0     0     0     0     0
D2      1     0     3     1     0     2     0     0
D3      1     2     1     0     1     0     1     0
D4      2     3     3     0     1     0     2     1

Example, document D3: keywords w1, w2, w3, w2; tags T1, T3; W3 = T1, W4 = T2.
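A minimal sketch of the concatenation step, using raw counts and leaving out the TF-IDF weighting. The function name and the fixed vocabularies are assumptions; the example reproduces document D3 from the slide.

```python
from collections import Counter


def kct_vector(keywords, tags, keyword_vocab, tag_vocab):
    """Concatenate a keyword-count vector with a separate tag-count vector."""
    kw_counts = Counter(keywords)
    tag_counts = Counter(tags)
    return [kw_counts[w] for w in keyword_vocab] + [tag_counts[t] for t in tag_vocab]


keyword_vocab = ["w1", "w2", "w3", "w4"]
tag_vocab = ["t1", "t2", "t3", "t4"]

# Document D3 from the slide: keywords w1, w2, w3, w2 and tags T1, T3.
d3 = kct_vector(["w1", "w2", "w3", "w2"], ["t1", "t3"], keyword_vocab, tag_vocab)
print(d3)
```

Unlike KnT, a tag keeps its own dimension even when it is the same word as a keyword (W3 = T1 on the slide), so the vector space grows by the full size of the tag vocabulary.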


Merging CUL and CoMeT Data in KCT Model

Dc: CUL Papers’ Matrix (k × (m+i))

        w1    w2    T1    T2
P1      1     0     0     0
P2      .25   0     .5    .25
P3      0     .5    .25   .25

Dt: CoMeT Talks’ Matrix (e × (l+j))

        w2    w3    T1
C1      0     1     0
C2      0     0     .5

D: Merged Documents’ Matrix ((k+e) × (l+m+i+j-o-p))

        w1    w2    w3    T1    T2
C1      0     0     1     0     0
C2      0     0     0     .5    0
P1      1     0     0     0     0
P2      .25   0     0     .5    .25
P3      0     .5    0     .25   .25

k - the number of CiteULike papers
m - the number of keywords used in CiteULike papers
i - the number of tags used in CiteULike papers
e - the total number of talks in CoMeT
l - the total number of keywords in CoMeT
j - the total number of tags in CoMeT
o - the number of keywords common to the CoMeT and CiteULike systems
p - the number of tags common to the CoMeT and CiteULike systems


Recommending Talks to Users

• K-nearest neighbor method
– Recommend the top K documents closest to the user profile
• User profiles: based on users’ bookmarked and rated talks and papers

U: User Profiles in Talks/Papers Space (rows: users; columns: documents)

        D1    D2    D3    D4
U1      1     0     0     0
U2      .25   0     .5    .25
U3      0     .5    .25   .25

D: Documents in Keywords Space (rows: documents; columns: keywords)

        w1    w2    w3
D1      0     1     0
D2      0     0     .5
D3      0     1     0
D4      0     0     .5

UP: User Profiles in Keywords Space (rows: users; columns: keywords)

        w1    w2    w3
U1      1     0     1
U2      .25   .5    .37
U3      0     .25   .37
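A minimal sketch of the recommendation step under two assumptions: that the keyword-space profile is the matrix product of U and D, as the three matrix captions suggest, and that closeness is measured by cosine similarity, which the slides do not specify. The toy data and all names are hypothetical.

```python
import math


def project_profiles(U, D):
    """Map user profiles from document space to keyword space (UP = U x D)."""
    n_keywords = len(D[0])
    return [
        [sum(u[d] * D[d][w] for d in range(len(D))) for w in range(n_keywords)]
        for u in U
    ]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def recommend(profile, candidates, k):
    """Return the ids of the k candidate documents closest to the profile."""
    ranked = sorted(
        candidates,
        key=lambda doc_id: cosine(profile, candidates[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: 2 users, 3 bookmarked documents, 3 keywords.
U = [[1, 0, 0],
     [0, 0.5, 0.5]]
D = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 1]]
UP = project_profiles(U, D)

# Hypothetical upcoming talks, already expressed in keyword space.
candidates = {"talk_a": [0, 1, 0], "talk_b": [1, 0, 1], "talk_c": [0, 0, 1]}
print(recommend(UP[0], candidates, k=1))
print(recommend(UP[1], candidates, k=2))
```

The first user bookmarked only the first document, so the profile equals that document's keyword vector and the talk with the same keyword ranks first.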


Experimental Results

• User study:
– 8 real users of both the CoMeT and CiteULike systems
• Evaluation questionnaire for each recommended talk:
– Is this talk related to your interests? (yes/no question)
– How interesting is this talk to you? (on a 5-point scale)
– If the talk is related to your interests, how novel is this talk to you? (on a 5-point scale)


Experimental Results (Cont’d)

• Compared six models:
– KO, KnT (with n = 1, 2, 5; best n = 1), and KCT
   • using only CoMeT data
   • using both CoMeT and CiteULike data
• Measures:
– Relevance: precision from the yes/no answers
– Interest: nDCG on the 5-point scale
– Novelty: average of the novelty ratings (non-relevant = zero novelty)
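The three measures can be sketched as follows. The slides do not give formulas, so the log2 discount in nDCG and the choice to compute the ideal ranking from the same list of rated talks are assumptions, as are the function names and the sample answers.

```python
import math


def precision_at_k(relevant, k):
    """Fraction of the top-k recommendations judged relevant (yes = True)."""
    return sum(relevant[:k]) / k


def dcg_at_k(ratings, k):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(ratings[:k]))


def ndcg_at_k(ratings, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal else 0.0


def novelty_at_k(novelty, relevant, k):
    """Mean novelty rating over the top k; non-relevant talks count as 0."""
    scores = [n if rel else 0 for n, rel in zip(novelty[:k], relevant[:k])]
    return sum(scores) / k


# Hypothetical answers from one user for a ranked list of five talks.
relevant = [True, False, True, True, False]
interest = [5, 2, 4, 3, 1]
novelty = [3, 4, 2, 5, 1]

print(precision_at_k(relevant, 5))
print(round(ndcg_at_k(interest, 5), 3))
print(novelty_at_k(novelty, relevant, 5))
```

Evaluating each function for k = 1 to 10 and averaging over users would give curves of the kind reported in the result tables.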


Precision results for different number of recommendations

Precision          1     2     3     4     5     6     7     8     9     10

Only CoMeT Data
KO                 0.83  0.67  0.72  0.63  0.6   0.56  0.57  0.5   0.51  0.51
KnT (n = 1)        0.5   0.5   0.58  0.59  0.57  0.58  0.57  0.58  0.6   0.57
KCT                0.5   0.33  0.39  0.46  0.47  0.53  0.52  0.5   0.5   0.53

CoMeT + CiteULike Data
KO                 0.83  0.83  0.67  0.75  0.73  0.69  0.64  0.63  0.56  0.57
KnT (n = 1)        0.63  0.69  0.71  0.72  0.73  0.73  0.71  0.7   0.68  0.67
KCT                0.38  0.44  0.42  0.47  0.48  0.52  0.5   0.49  0.53  0.55


Precision results for different number of recommendations (Cont’d)

• Adding tags using KnT → better cumulative precision for the top 10 recommendations
• Adding CUL data in both KnT and KO → higher precision
• KnT with both CoMeT and CUL data → best cumulative precision
• KCT model → decrease in precision
– High dimensionality of the vector space model → increased distance between documents and user profiles → decreased variance among the similarities of a user profile to different talks


nDCG Results for different number of recommendations

nDCG               1     2     3     4     5     6     7     8     9     10

Only CoMeT Data
KO                 0.9   0.88  0.89  0.93  0.92  0.94  0.95  0.95  0.95  0.96
KnT (n = 1)        0.9   0.85  0.82  0.83  0.87  0.88  0.89  0.9   0.91  0.93
KCT                0.84  0.88  0.89  0.9   0.9   0.91  0.92  0.92  0.94  0.95

CoMeT + CiteULike Data
KO                 0.84  0.91  0.9   0.92  0.93  0.94  0.95  0.96  0.96  0.96
KnT (n = 1)        0.9   0.9   0.89  0.88  0.9   0.92  0.92  0.94  0.94  0.95
KCT                0.77  0.85  0.84  0.81  0.83  0.84  0.86  0.88  0.91  0.92


nDCG Results for different number of recommendations (Cont’d)

• KCT and KnT models: using both CiteULike and CoMeT data → increased user cumulative interest

• Best results: tag-less KO model both with and without CiteULike data



Novelty Results for different number of recommendations

Novelty            1     2     3     4     5     6     7     8     9     10

Only CoMeT Data
KO                 1.75  1.69  1.67  1.72  1.7   1.65  1.66  1.55  1.49  1.44
KnT (n = 1)        1.88  1.75  1.67  1.88  1.88  1.88  2     2.03  1.99  1.93
KCT                2     1.5   1.54  1.56  1.55  1.6   1.63  1.58  1.5   1.5

CoMeT + CiteULike Data
KO                 1.88  1.44  1.33  1.5   1.5   1.52  1.61  1.47  1.44  1.36
KnT (n = 1)        1.75  2.19  1.79  2.06  2.2   2.08  2.02  2.19  2.06  1.96
KCT                1.38  1.31  1.38  1.47  1.58  1.6   1.52  1.47  1.61  1.64


Novelty Results for different number of recommendations (Cont’d)

• Adding tags using the KnT fusion model → largest positive impact
• Adding different sources of information → improved novelty of recommendations
– Tags are provided by users → they include a broader range of vocabulary
– Each user’s tags describe a document from her point of view (different from the terms included in the document)
• Adding CUL data in the KO model → decreased novelty
– Distinctive natures of the CoMeT and CiteULike systems
   • CiteULike: users add, review, and rate papers related to their research field
   • CoMeT: information about talks given at a specific time on a particular date; users bookmark more novel, less relevant talks


Conclusion

• Relevance: a fit to the user’s research work
• Interest: the overall attraction of an item
• Users are interested in talks on more general topics that have little in common with their research interests
• Increased focus on relevance encapsulated in tags → the system’s ability to recommend interesting talks decreases when tags are added


Conclusion (Cont’d)

• Including another reliable user profile → increased precision of recommendations
– The way the additional profile is added needs consideration
• Using CiteULike data for all models
– Increased relevance of every recommended document
– Mixed results for interestingness
• Adding tags
– Increased novelty of recommendations (with both CoMeT and CUL data)
– Increased relatedness for larger numbers of recommendations
• Injecting keywords from another data source: more reliable than including tags for relevance
• Including tags from various sources of information: more reliable for interestingness or novelty


Thank you!
