Finding High-Quality Content in Social Media (chenwq, 2011/11/26)
![Page 1: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/1.jpg)
Finding High-Quality Content in Social Media
chenwq, 2011/11/26
![Page 2: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/2.jpg)
Authors
Eugene Agichtein
Emory University
Research: Intelligent Information Access Lab (IRLab)
News: our team wins the "Best Paper" award at SIGIR 2011.
![Page 3: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/3.jpg)
Abstract
From the early 2000s, user-generated content has become popular on the web. The quality of user-generated content varies drastically, from excellent to abusive content and spam.
To separate high-quality content from the rest automatically, a graph-based framework is proposed:
– combine the different sources of evidence in a classification formulation
![Page 4: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/4.jpg)
Contents
1. Related work
2. CONTENT QUALITY ANALYSIS
3. MODELING CONTENT QUALITY
4. EXPERIMENT & Conclusion
![Page 5: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/5.jpg)
Related work
Link analysis in social media
Propagating reputation
Question/answering portals and forums
Expert finding
Text analysis for content quality
Implicit feedback for ranking
![Page 6: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/6.jpg)
Related work
Link analysis in social media
– G = (V, E)
– V corresponds to the users of a question/answer system
– a directed edge e = (u, v) ∈ E goes from a user u ∈ V to a user v ∈ V if user u has answered at least one question of user v
– G’ = (V, E’)
PageRank, ExpertiseRank, HITS
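The graph G defined above can be built directly from a log of answer events and then scored with a link-analysis measure. The sketch below is a minimal illustration under stated assumptions: the `(answerer, asker)` input format and the names `build_answer_graph` and `pagerank` are hypothetical, and only plain power-iteration PageRank is shown (not ExpertiseRank or HITS).

```python
from collections import defaultdict


def build_answer_graph(answer_events):
    """Build G = (V, E) with an edge u -> v if user u answered at least one
    question of user v. `answer_events` is an iterable of (answerer, asker)
    pairs, an assumed input format."""
    out_edges = defaultdict(set)
    users = set()
    for answerer, asker in answer_events:
        users.update((answerer, asker))
        out_edges[answerer].add(asker)  # repeated answers collapse into one edge
    return {u: sorted(out_edges[u]) for u in sorted(users)}


def pagerank(adj, damping=0.85, iters=100, tol=1e-12):
    """Power-iteration PageRank; dangling nodes spread their score uniformly."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for u in nodes:
            for v in adj[u]:
                new[v] += damping * rank[u] / len(adj[u])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


events = [("alice", "bob"), ("carol", "bob"), ("bob", "dave"), ("alice", "dave")]
graph = build_answer_graph(events)
scores = pagerank(graph)
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```

With the edge direction used on the slide (u → v when u answered v), score flows toward askers; running the same function on the reversed edges would reward answerers instead.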
![Page 7: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/7.jpg)
![Page 8: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/8.jpg)
CONTENT QUALITY ANALYSIS——Intrinsic content quality
As a baseline, we use textual features only: all word n-grams up to length 5 that appear in the collection more than 3 times are used as features.
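The baseline feature set can be sketched as a two-pass procedure: count every word n-gram (n ≤ 5) over the collection, keep those seen more than 3 times, then map each item to a sparse count vector. This is a minimal sketch; the lowercase whitespace tokenizer and the function names are assumptions, since the slide does not specify them.

```python
from collections import Counter


def ngrams(tokens, max_n=5):
    """Yield every word n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tokenize(text):
    # Assumption: lowercase + whitespace split; no tokenizer is given on the slide.
    return text.lower().split()


def build_vocabulary(docs, max_n=5, more_than=3):
    """Keep n-grams (n <= max_n) seen more than `more_than` times in the collection."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), max_n))
    kept = sorted(g for g, c in counts.items() if c > more_than)
    return {g: i for i, g in enumerate(kept)}


def featurize(doc, vocab, max_n=5):
    """Sparse bag-of-n-grams vector: feature index -> count."""
    return dict(Counter(vocab[g] for g in ngrams(tokenize(doc), max_n) if g in vocab))


docs = ["The answer is yes"] * 4 + ["no idea"]
vocab = build_vocabulary(docs)
print(len(vocab), featurize("no idea the answer", vocab))
```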
![Page 9: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/9.jpg)
Punctuation and typos
1. Punctuation
2. Capitalization
3. Spacing density
4. Character-level entropy
5. Spelling mistakes
6. Out-of-vocabulary words
Syntactic and semantic
1. Average number of syllables per word
2. Entropy of word lengths
3. Readability measures
Grammaticality
1. Part-of-speech sequences
2. Formality score
3. Distance between its (trigram) language model and several given language models
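Several of the surface features in the "punctuation and typos" and "syntactic and semantic" groups reduce to simple counts and entropies. The sketch below computes a few of them under stated assumptions: the exact normalizations are guesses, and `vocabulary` is a hypothetical set of known words used for the out-of-vocabulary ratio. Readability measures, part-of-speech sequences, the formality score, and language-model distances need external resources and are not shown.

```python
import math
import string
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def intrinsic_features(text, vocabulary):
    """A few intrinsic text features; `vocabulary` is a hypothetical set of
    known lowercase words used for the OOV ratio."""
    n_chars = max(len(text), 1)
    letters = [ch for ch in text if ch.isalpha()]
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    return {
        "punctuation_ratio": sum(ch in string.punctuation for ch in text) / n_chars,
        "capitalization_ratio": sum(ch.isupper() for ch in letters) / max(len(letters), 1),
        "spacing_density": sum(ch.isspace() for ch in text) / n_chars,
        "char_entropy": entropy(Counter(text)),
        "word_length_entropy": entropy(Counter(len(w) for w in words)),
        "oov_ratio": sum(w not in vocabulary for w in words) / max(len(words), 1),
    }


print(intrinsic_features("Hello wrld!!", {"hello", "world"}))
```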
![Page 10: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/10.jpg)
CONTENT QUALITY ANALYSIS——User relationships
– items-and-users graph (user u, question q, answer)
– user-user graph: an edge from u to v means that u has answered a question from user v
![Page 11: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/11.jpg)
CONTENT QUALITY ANALYSIS——Usage statistics
– The number of clicks on some item
– The dwell time on some item
![Page 12: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/12.jpg)
CONTENT QUALITY ANALYSIS——classification framework
We cast the problem of quality ranking as a binary classification problem and consider several classifiers:
– support vector machines
– log-linear classifiers
– stochastic gradient boosted trees
Our goal is to discover interesting, well-formulated, and factually accurate content.
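To make the binary classification formulation concrete, here is a minimal log-linear classifier (binary logistic regression) trained with stochastic gradient descent. It is a sketch, not the classifier used in the study: the two-dimensional feature vectors, labels, learning rate, and L2 penalty are all hypothetical, and SVMs and gradient boosted trees are not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_log_linear(X, y, epochs=200, lr=0.1, l2=1e-3, seed=0):
    """Binary logistic regression trained with SGD on log loss + L2 penalty."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = sigmoid(score(w, b, X[i])) - y[i]  # d(log loss)/d(score)
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


# Hypothetical feature vectors (two normalized quality signals) and labels:
# 1 = high quality, 0 = not high quality.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_log_linear(X, y)
print(sigmoid(score(w, b, [0.85, 0.85])), sigmoid(score(w, b, [0.15, 0.15])))
```

A predicted probability above 0.5 would be read as "high quality"; ranking items by that probability gives a quality ranking from the same model.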
![Page 13: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/13.jpg)
![Page 14: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/14.jpg)
MODELING CONTENT QUALITY——user relationships
Our dataset, viewed as a graph, is illustrated in Figure 1.
![Page 15: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/15.jpg)
MODELING CONTENT QUALITY——user relationships
The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph, outlined in Figure 2.
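One way to hold the tripartite graph in memory is to store question and answer nodes with links to the users who created them; the user-user "answered a question of" relation then falls out as a projection. The `QAGraph` class and its method names below are hypothetical, a minimal sketch rather than the data model of the study.

```python
from dataclasses import dataclass, field


@dataclass
class QAGraph:
    """Tripartite graph of users, questions, and answers (hypothetical schema)."""
    question_asker: dict = field(default_factory=dict)  # question id -> asking user
    answer_info: dict = field(default_factory=dict)     # answer id -> (question id, answering user)

    def add_question(self, qid, asker):
        self.question_asker[qid] = asker

    def add_answer(self, aid, qid, answerer):
        if qid not in self.question_asker:
            raise KeyError(f"unknown question {qid!r}")
        self.answer_info[aid] = (qid, answerer)

    def answers_to(self, qid):
        """Answer nodes attached to a question node."""
        return sorted(a for a, (q, _) in self.answer_info.items() if q == qid)

    def user_user_edges(self):
        """Project onto users: edge answerer -> asker (u answered a question of v)."""
        return sorted({(u, self.question_asker[q]) for q, u in self.answer_info.values()})


g = QAGraph()
g.add_question("q1", "alice")
g.add_question("q2", "bob")
g.add_answer("a1", "q1", "bob")
g.add_answer("a2", "q1", "carol")
g.add_answer("a3", "q2", "alice")
print(g.answers_to("q1"), g.user_user_edges())
```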
![Page 16: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/16.jpg)
MODELING CONTENT QUALITY——user relationships
The unique characteristics of the community question/answering domain
![Page 17: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/17.jpg)
MODELING CONTENT QUALITY——user relationships
Question subtree
– Q: features from the question being answered
– QU: features from the asker of the question being answered
– QA: features from the other answers to the same question
![Page 18: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/18.jpg)
MODELING CONTENT QUALITY——user relationships
User subtree
– UA: features from the answers of the user
– UQ: features from the questions of the user
– UV: features from the votes of the user
– UQA: features from answers received to the user’s questions
– U: other user-based features
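The feature groups of the question and user subtrees amount to aggregating over the neighbours of an answer in the tripartite graph. The sketch below computes one toy count per group for a single answer; the input dictionaries and feature names are hypothetical, and the real groups contain many more (textual, vote, and rating) features than these counts. The UV group is omitted because no vote data is modeled here.

```python
from collections import Counter


def subtree_features(answer_id, questions, answers):
    """Toy counts from the question and user subtrees of one answer.

    questions: question id -> asker; answers: answer id -> (question id, answerer).
    Inputs and feature names are hypothetical illustrations.
    """
    qid, answerer = answers[answer_id]
    asker = questions[qid]
    answers_per_question = Counter(q for q, _ in answers.values())
    answers_per_user = Counter(u for _, u in answers.values())
    questions_per_user = Counter(questions.values())
    return {
        # Question subtree
        "QA_other_answers": answers_per_question[qid] - 1,
        "QU_asker_questions": questions_per_user[asker],
        # User subtree (the user who wrote this answer)
        "UA_user_answers": answers_per_user[answerer],
        "UQ_user_questions": questions_per_user[answerer],
        "UQA_answers_received": sum(
            answers_per_question[q] for q, u in questions.items() if u == answerer
        ),
    }


questions = {"q1": "alice", "q2": "bob", "q3": "bob"}
answers = {"a1": ("q1", "bob"), "a2": ("q1", "carol"), "a3": ("q2", "alice"),
           "a4": ("q3", "carol"), "a5": ("q3", "alice")}
print(subtree_features("a1", questions, answers))
```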
![Page 19: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/19.jpg)
MODELING CONTENT QUALITY——user relationships
Question features
![Page 20: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/20.jpg)
MODELING CONTENT QUALITY——user relationships
Implicit user-user relations
– $G = (V, E)$, with $E = E_a \cup E_b \cup E_v \cup E_s \cup E_+ \cup E_-$
– for each edge type x, $G_x = (V, E_x)$:
  – $h_x$: the vector of hub scores on the vertices V
  – $a_x$: the vector of authority scores
  – $p_x$: the vector of PageRank scores
  – $p'_x$: the vector of PageRank scores in the transposed graph
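For each edge type x, the four score vectors $h_x$, $a_x$, $p_x$, and $p'_x$ can be computed on $G_x$ with standard power iterations and concatenated into per-user features. The sketch below assumes edges are given as `(u, v)` pairs grouped by a type label; the label `"a"` in the example and all function names are hypothetical, and the iteration counts are arbitrary.

```python
import math
from collections import defaultdict


def hits(edges, nodes, iters=50):
    """HITS hub/authority scores by power iteration (L2-normalized)."""
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iters):
        auth = {v: 0.0 for v in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(s * s for s in auth.values())) or 1.0
        auth = {v: s / norm for v, s in auth.items()}
        hub = {v: 0.0 for v in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(s * s for s in hub.values())) or 1.0
        hub = {v: s / norm for v, s in hub.items()}
    return hub, auth


def pagerank(edges, nodes, damping=0.85, iters=100):
    """Power-iteration PageRank; dangling nodes spread their score uniformly."""
    out = defaultdict(set)
    for u, v in edges:
        out[u].add(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def user_graph_features(typed_edges, nodes):
    """For every edge type x: h_x, a_x, p_x on G_x and p'_x on the transposed G_x."""
    features = {v: {} for v in nodes}
    for x, edges in typed_edges.items():
        hub, auth = hits(edges, nodes)
        p = pagerank(edges, nodes)
        p_t = pagerank([(v, u) for u, v in edges], nodes)
        for v in nodes:
            features[v].update({f"h_{x}": hub[v], f"a_{x}": auth[v],
                                f"p_{x}": p[v], f"p'_{x}": p_t[v]})
    return features


nodes = ["alice", "bob", "carol"]
typed_edges = {"a": [("bob", "alice"), ("carol", "alice"), ("alice", "bob")]}
f = user_graph_features(typed_edges, nodes)
print(f["alice"])
```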
![Page 21: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/21.jpg)
MODELING CONTENT QUALITY——user relationships
Implicit user-user relations
![Page 22: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/22.jpg)
MODELING CONTENT QUALITY——user relationships
Content features for QA
– to identify the most salient features for the specific tasks of question or answer quality classification, the question and answer texts are also compared using:
  • the KL-divergence between the language models of the two texts
  • their non-stopword overlap
  • the ratio between their lengths
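The three question-answer comparison features can be sketched as follows. The choices here are assumptions, not the study's definitions: unigram language models with add-one smoothing over the pair's joint vocabulary (so the KL-divergence is finite), a tiny illustrative stopword list, and Jaccard similarity of non-stopword types as the "overlap".

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a full list).
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "of", "to", "in", "and", "it", "what", "how"})


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram language model over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; both models must cover the same vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())


def qa_content_features(question, answer):
    q_tokens, a_tokens = question.lower().split(), answer.lower().split()
    vocab = set(q_tokens) | set(a_tokens)
    q_content, a_content = set(q_tokens) - STOPWORDS, set(a_tokens) - STOPWORDS
    union = q_content | a_content
    return {
        "kl_answer_question": kl_divergence(unigram_lm(a_tokens, vocab),
                                            unigram_lm(q_tokens, vocab)),
        # Jaccard overlap of non-stopword types (one possible definition of "overlap")
        "nonstop_overlap": len(q_content & a_content) / len(union) if union else 0.0,
        "length_ratio": len(a_tokens) / max(len(q_tokens), 1),
    }


print(qa_content_features("what is python", "python is a language"))
```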
![Page 23: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/23.jpg)
MODELING CONTENT QUALITY——user relationships
Usage features for QA
– number of item views (clicks)
– metadata of the question
  • how long ago the question was posted
– derived statistics
  • the expected number of views for a given category
  • the deviation from the expected number of views
– other second-order statistics
  • the click frequency
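The derived usage statistics combine raw view counts with question metadata. A minimal sketch, assuming each item is a dict with hypothetical keys `id`, `category`, `views`, and `age_days`: the expected number of views is taken as the category mean, the deviation is the difference from it, and click frequency is views per day of age (with age floored at one day, also an assumption).

```python
from collections import defaultdict
from statistics import mean


def usage_features(items):
    """items: list of dicts with 'id', 'category', 'views', 'age_days' (assumed schema)."""
    by_category = defaultdict(list)
    for it in items:
        by_category[it["category"]].append(it["views"])
    expected = {c: mean(v) for c, v in by_category.items()}  # expected views per category
    features = {}
    for it in items:
        exp = expected[it["category"]]
        features[it["id"]] = {
            "views": it["views"],
            "age_days": it["age_days"],
            "expected_views": exp,
            "view_deviation": it["views"] - exp,
            "click_frequency": it["views"] / max(it["age_days"], 1),
        }
    return features


items = [
    {"id": "q1", "category": "sports", "views": 30, "age_days": 10},
    {"id": "q2", "category": "sports", "views": 10, "age_days": 5},
    {"id": "q3", "category": "music", "views": 8, "age_days": 2},
]
print(usage_features(items))
```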
![Page 24: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/24.jpg)
![Page 25: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/25.jpg)
Experiment & Conclusions——EXPERIMENTAL SETTING
Dataset
Edges induced from the whole dataset.
![Page 26: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/26.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 27: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/27.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 28: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/28.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 29: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/29.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 30: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/30.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 31: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/31.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 32: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/32.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 33: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/33.jpg)
MODELING CONTENT QUALITY——EXPERIMENTAL SETTING
Dataset statistics
![Page 34: Finding High-Quality Content in Social Media chenwq 2011/11/26](https://reader030.vdocuments.net/reader030/viewer/2022033107/56649d2f5503460f94a06a97/html5/thumbnails/34.jpg)
Thank you for your attention!