Finding High-Quality Content in Social Media

Post on 16-Feb-2016

chenwq, 2011/11/26

Authors: Eugene Agichtein

Emory University

Research: Intelligent Information Access Lab (IRLab)

News: our team won the "Best Paper" award at SIGIR 2011.

Abstract
– Since the early 2000s, user-generated content has become popular on the web.
– The quality of user-generated content varies drastically, from excellent to abuse and spam.
– Goal: automatically separate high-quality content from the rest.
– Approach: a graph-based framework that combines the different sources of evidence in a classification formulation.

Contents

1. Related work
2. CONTENT QUALITY ANALYSIS
3. MODELING CONTENT QUALITY
4. EXPERIMENT & Conclusion

Related work

Link analysis in social media

Propagating reputation

Question/answering portals and forums

Expert finding

Text analysis for content quality

Implicit feedback for ranking

Link analysis in social media
– G = (V, E)
  • V corresponds to the users of a question/answer system
  • a directed edge e = (u, v) ∈ E goes from a user u ∈ V to a user v ∈ V if user u has answered at least one question of user v
– G′ = (V, E′): the transposed graph, in which every edge of E is reversed
– Link-based ranking algorithms: PageRank, ExpertiseRank, HITS
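The slide only names the ranking algorithms. As an illustration, here is a minimal Python sketch, assuming the answer log is available as (answerer, asker) pairs (a hypothetical input format): it builds G, its transpose G′, and runs a plain power-iteration PageRank on both. PageRank over G′ propagates score from the asker to the user who answered, which is the idea behind ExpertiseRank. The function names, the damping factor 0.85, and the fixed iteration count are illustrative choices, not settings taken from the paper.

```python
from collections import defaultdict


def build_answer_graph(answers):
    """Edge (u, v) whenever user u answered at least one question of user v.

    `answers` is a list of (answerer, asker) pairs; this input format is a
    hypothetical assumption. Self-answers are skipped.
    """
    return {(answerer, asker) for answerer, asker in answers if answerer != asker}


def transpose(edges):
    """G' = (V, E'): every edge of E reversed."""
    return {(v, u) for (u, v) in edges}


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread their score uniformly."""
    nodes = sorted(nodes)
    n = len(nodes)
    out_links = defaultdict(list)
    for u, v in edges:
        out_links[u].append(v)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank


answers = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
graph = build_answer_graph(answers)
users = {u for edge in graph for u in edge}
pr = pagerank(users, graph)  # score flows from answerers to askers
er = pagerank(users, transpose(graph))  # ExpertiseRank-style: score flows to answerers
```

In this toy log alice answers the most questions, so she ranks highest under the transposed-graph score, while carol, who only asks, ranks highest under PageRank on G.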


CONTENT QUALITY ANALYSIS——Intrinsic content quality

As a baseline, we use textual features only: all word n-grams up to length 5 that appear in the collection more than 3 times are used as features.
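A minimal sketch of this n-gram baseline, assuming lowercase whitespace tokenization (the slide does not say how text was tokenized). Only n-grams with n ≤ 5 that occur more than 3 times across the collection enter the vocabulary; each text then becomes a vector of n-gram counts. The function names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, max_n=5):
    """All word n-grams of length 1..max_n (lowercase whitespace tokens: an assumption)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(collection, max_n=5, min_count=3):
    """Keep the n-grams that occur more than `min_count` times in the whole collection."""
    counts = Counter()
    for text in collection:
        counts.update(word_ngrams(text, max_n))
    return sorted(gram for gram, count in counts.items() if count > min_count)


def featurize(text, vocabulary, max_n=5):
    """Count vector of the vocabulary n-grams in one text."""
    counts = Counter(word_ngrams(text, max_n))
    return [counts[gram] for gram in vocabulary]


collection = ["good answer"] * 4 + ["bad"]
vocab = build_vocabulary(collection)  # ['answer', 'good', 'good answer']
vector = featurize("good answer good", vocab)  # [1, 2, 1]
```

"bad" appears only once, so it falls below the threshold and never becomes a feature.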

Punctuation and typos
1. Punctuation
2. Capitalization
3. Spacing density
4. Character-level entropy
5. Spelling mistakes
6. Out-of-vocabulary words

Syntactic and semantic complexity
1. Average number of syllables per word
2. Entropy of word lengths
3. Readability measures

Grammaticality
1. Part-of-speech sequences
2. Formality score
3. Distance between the text's (trigram) language model and several given language models
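To make a few of these features concrete, here is a rough Python sketch computing a subset of them. The tokenization regexes, the vowel-group syllable counter, the uppercase-letter ratio as the capitalization feature, and Flesch reading ease as the readability measure are all assumptions. Spelling mistakes, out-of-vocabulary words, part-of-speech sequences, formality, and language-model distances need external dictionaries, taggers, or reference models and are left out.

```python
import math
import re
import string
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def count_syllables(word):
    # Crude heuristic (assumption): one syllable per run of vowels, at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def intrinsic_features(text):
    words = re.findall(r"[A-Za-z']+", text)
    letters = [ch for ch in text if ch.isalpha()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    n_sentences = max(len(sentences), 1)
    syllables = sum(count_syllables(w) for w in words)
    return {
        # Punctuation and typos
        "punctuation_density": sum(ch in string.punctuation for ch in text) / n_chars,
        "capitalization_ratio": sum(ch.isupper() for ch in letters) / max(len(letters), 1),
        "spacing_density": sum(ch.isspace() for ch in text) / n_chars,
        "char_entropy": shannon_entropy(text),
        # Syntactic and semantic complexity
        "avg_syllables_per_word": syllables / n_words,
        "word_length_entropy": shannon_entropy(len(w) for w in words),
        # Flesch reading ease, used here as one example readability measure
        "flesch_reading_ease": 206.835
        - 1.015 * (n_words / n_sentences)
        - 84.6 * (syllables / n_words),
    }


features = intrinsic_features("Hello world. Bye!")
```

Each feature is a ratio or an entropy rather than a raw count, so texts of different lengths remain comparable.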

CONTENT QUALITY ANALYSIS——User relationships

Item-user graph: a user u is linked to an item q, e.g. a question q that u answered (edge label "answer")

User-user graph: an edge u → v means that u has answered a question from user v

CONTENT QUALITY ANALYSIS——Usage statistics

– The number of clicks on an item
– The dwell time on an item

CONTENT QUALITY ANALYSIS——classification framework

We cast the problem of quality ranking as a binary classification problem. Candidate classifiers:
– support vector machines
– log-linear classifiers
– stochastic gradient boosted trees

Our goal is to discover interesting, well-formulated, and factually accurate content.
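Of the three classifier families, the log-linear one is the easiest to show without a library. The sketch below is a plain binary logistic regression trained by batch gradient descent, assuming label 1 marks high-quality content; the learning rate, epoch count, and function names are illustrative. Support vector machines or stochastic gradient boosted trees would normally come from a machine-learning library rather than be written by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_log_linear(X, y, lr=0.1, epochs=500):
    """Binary logistic regression fitted by batch gradient descent.

    X: list of feature vectors; y: list of 0/1 labels (1 = high quality, an assumption).
    Returns (weights, bias).
    """
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Estimated probability that feature vector x is high-quality content."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy usage: one feature, low values labeled 0, high values labeled 1.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
model = train_log_linear(X, y)
```

Because the output is a probability, the same model can rank items by predicted quality rather than only assigning a binary label.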


MODELING CONTENT QUALITY——user relationships

Our dataset, viewed as a graph, is illustrated in Figure 1.

The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph outlined in Figure 2.

The unique characteristics of the community question/answering domain

Question subtree
– Q: features from the question being answered
– QU: features from the asker of the question being answered
– QA: features from the other answers to the same question

User subtree
– UA: features from the answers of the user
– UQ: features from the questions of the user
– UV: features from the votes of the user
– UQA: features from answers received to the user's questions
– U: other user-based features
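One way to picture how the question and user subtrees feed a single classifier: each subtree contributes its own feature dictionary, and names are prefixed with the subtree label so that, for example, the length of the question (Q) and the length of the user's past answers (UA) stay distinct. This is a hypothetical sketch of the bookkeeping only; the feature names and the dict-based layout are assumptions.

```python
# Subtree labels from the question subtree (Q, QU, QA) and the user subtree (UA, UQ, UV, UQA, U).
SUBTREES = ("Q", "QU", "QA", "UA", "UQ", "UV", "UQA", "U")


def prefixed(prefix, features):
    """Rename every feature to 'prefix:name' so names from different subtrees cannot collide."""
    return {f"{prefix}:{name}": value for name, value in features.items()}


def assemble_features(subtrees):
    """Merge per-subtree feature dicts into one flat feature dict.

    `subtrees` maps a subtree label to a dict of feature name -> value
    (hypothetical layout). Missing subtrees contribute nothing; unknown
    labels are ignored.
    """
    combined = {}
    for label in SUBTREES:
        combined.update(prefixed(label, subtrees.get(label, {})))
    return combined


example = assemble_features({"Q": {"length": 12}, "UA": {"length": 40}})
# {'Q:length': 12, 'UA:length': 40}
```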

Question features

Implicit user-user relations
– G = (V, E), with E = E_a ∪ E_b ∪ E_v ∪ E_s ∪ E_+ ∪ E_−
– For each relation type x, G_x = (V, E_x), with:
  • h_x: the vector of hub scores on the vertices V
  • a_x: the vector of authority scores
  • p_x: the vector of PageRank scores
  • p′_x: the vector of PageRank scores in the transposed graph
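The per-relation scores can be sketched by splitting the typed edge set into one subgraph G_x per relation type and running HITS on each. This minimal Python version, assuming edges arrive as hypothetical (u, v, x) triples, computes only the hub and authority vectors h_x and a_x; p_x and p′_x would come from running PageRank on G_x and on its transpose, which this sketch leaves out. The iteration count and L2 normalization are illustrative choices.

```python
import math
from collections import defaultdict


def hits(nodes, edges, iterations=50):
    """Hub and authority scores (HITS) by power iteration with L2 normalization."""
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of nodes pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(s * s for s in auth.values())) or 1.0
        auth = {v: s / norm for v, s in auth.items()}
        # Hub of u: sum of authority scores of nodes u points to.
        hub = {v: 0.0 for v in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(s * s for s in hub.values())) or 1.0
        hub = {v: s / norm for v, s in hub.items()}
    return hub, auth


def hits_per_relation(typed_edges):
    """Split (u, v, x) triples into subgraphs G_x = (V, E_x) and run HITS on each."""
    nodes = set()
    by_type = defaultdict(set)
    for u, v, x in typed_edges:
        nodes.update((u, v))
        by_type[x].add((u, v))
    return {x: hits(nodes, edges) for x, edges in by_type.items()}


scores = hits_per_relation([
    ("alice", "bob", "a"), ("alice", "carol", "a"),
    ("dave", "bob", "a"), ("bob", "carol", "v"),
])
hub_a, auth_a = scores["a"]
```

All subgraphs share the same vertex set V, so every user gets a score (possibly zero) under every relation type, which keeps the resulting feature vectors aligned.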

Content features for QA
– Goal: identify the most salient features for the specific tasks of question or answer quality classification
– Features computed between the two texts of a question/answer pair:
  • the KL-divergence between their language models
  • their non-stopword overlap
  • the ratio between their lengths
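A sketch of these three question/answer pair features, under several assumptions the slide does not fix: whitespace tokenization, unigram language models with add-one smoothing over the pair's joint vocabulary, KL-divergence measured from the answer model to the question model, a tiny hypothetical stopword list, and overlap measured as the Jaccard coefficient of the non-stopword sets.

```python
import math
from collections import Counter

# Tiny, hypothetical stopword list used only for illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or", "it"}


def tokenize(text):
    return text.lower().split()


def unigram_lm(tokens, vocabulary):
    """Add-one (Laplace) smoothed unigram model over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share the same (smoothed) support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def qa_pair_features(question, answer):
    q_tokens, a_tokens = tokenize(question), tokenize(answer)
    vocabulary = set(q_tokens) | set(a_tokens)
    q_lm = unigram_lm(q_tokens, vocabulary)
    a_lm = unigram_lm(a_tokens, vocabulary)
    q_content = set(q_tokens) - STOPWORDS
    a_content = set(a_tokens) - STOPWORDS
    overlap = len(q_content & a_content) / max(len(q_content | a_content), 1)
    return {
        "kl_answer_question": kl_divergence(a_lm, q_lm),
        "non_stopword_overlap": overlap,
        "length_ratio": len(a_tokens) / max(len(q_tokens), 1),
    }


pair = qa_pair_features("how to boil an egg", "boil the egg for ten minutes")
```

Smoothing over the joint vocabulary keeps every probability positive, so the KL-divergence stays finite even when the answer uses words the question never mentions.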

Usage features for QA
– number of item views (clicks)
– question metadata
  • how long ago the question was posted
– derived statistics
  • the expected number of views for a given category
  • the deviation from the expected number of views
– other second-order statistics
  • the click frequency
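The derived usage statistics can be sketched as follows, assuming each question record carries a category, a view count, and a posting time in days (a hypothetical layout), and taking the mean views of the category as the expected number of views. The click frequency is approximated as views per day since posting; both choices are assumptions, not the paper's definitions.

```python
from collections import defaultdict


def usage_features(questions, now):
    """Per-question usage features.

    `questions`: list of dicts with keys 'id', 'category', 'views', and 'posted'
    (posting time in days); `now` is the current time in days. This record
    layout is a hypothetical assumption.
    """
    view_totals = defaultdict(float)
    question_counts = defaultdict(int)
    for q in questions:
        view_totals[q["category"]] += q["views"]
        question_counts[q["category"]] += 1

    features = {}
    for q in questions:
        # Expected views: mean views of questions in the same category (assumption).
        expected = view_totals[q["category"]] / question_counts[q["category"]]
        age = now - q["posted"]  # how long ago the question was posted
        features[q["id"]] = {
            "views": q["views"],
            "age_days": age,
            "expected_views": expected,
            "views_deviation": q["views"] - expected,
            # Views per day since posting; at least one day to avoid division by zero.
            "click_frequency": q["views"] / max(age, 1),
        }
    return features


questions = [
    {"id": 1, "category": "pets", "views": 30, "posted": 0},
    {"id": 2, "category": "pets", "views": 10, "posted": 5},
    {"id": 3, "category": "cars", "views": 7, "posted": 3},
]
usage = usage_features(questions, now=10)
```

Normalizing by category and by age separates genuinely popular questions from ones that are merely old or sit in a busy category.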


Experiment & Conclusions——EXPERIMENTAL SETTING

Dataset

Edges induced from the whole dataset.

Dataset statistics

Thank you for your attention!