Classification of unanswerable questions: the rhetoric of Twitter
David Tomás
Department of Software and Computing Systems
University of Alicante, Spain
CERI 2012
This presentation is about…
Questions that look like questions when in fact they are not
Corpus-based question classification
A preliminary evaluation
A lot of future work
Twitter: a perfect way to spread information
Fast, fast, fast
Immediacy: many people are asking questions
Proposal
Wouldn’t it be nice if someone came to
your aid when you needed an answer?
New paradigm: systems going to the user
First problem: who really needs an answer?
Proposal
Question classification problem
Real questions vs. rhetorical questions
Supervised / corpus-based
Corpus + Features + Algorithms
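The supervised, corpus-based setup (a labelled corpus, a feature representation and a learning algorithm) can be sketched in a few lines. This is a minimal sketch, not the author's implementation: it assumes toy two-dimensional feature vectors, uses a simplified 1-nearest-neighbour classifier in the spirit of IB1 (one of the algorithms evaluated in the experiments; no attribute normalization here), and estimates accuracy with leave-one-out, which the slides do not specify. The names `predict_1nn` and `leave_one_out_accuracy` are hypothetical.

```python
import math

# Hypothetical example type: (feature vector, class label)
Example = tuple[list[float], str]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train: list[Example], x: list[float]) -> str:
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, label = min(train, key=lambda ex: distance(ex[0], x))
    return label


def leave_one_out_accuracy(corpus: list[Example]) -> float:
    """Classify each example using all the other examples as training data."""
    hits = 0
    for i, (x, y) in enumerate(corpus):
        train = corpus[:i] + corpus[i + 1:]
        hits += predict_1nn(train, x) == y
    return hits / len(corpus)


# Toy corpus with made-up feature values (not data from the study)
corpus = [
    ([1.0, 0.0], "real"), ([0.9, 0.1], "real"),
    ([0.0, 1.0], "rhetorical"), ([0.1, 0.9], "rhetorical"),
]
print(leave_one_out_accuracy(corpus))  # 1.0
```

Swapping `predict_1nn` for another learner (SVM, Naive Bayes, random forest) leaves the corpus and feature steps unchanged, which is the point of keeping the three components separate.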
Corpus
Real question: expects an answer, either from the crowd
or from a specific individual
Rhetorical question: any other question
100 questions for each wh-word: what, who, whom, whose, which, when, where, why, how
9 × 100 = 900 questions = 220 real + 680 rhetorical
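With 220 real and 680 rhetorical questions the corpus is heavily skewed, so accuracy has to be read against a baseline. The slides do not define their baseline; assuming it is the usual majority-class baseline (always predict the most frequent class), this sketch computes it. The assumption is consistent with the 50% baseline reported for the balanced corpus. `majority_baseline` is a hypothetical helper.

```python
from collections import Counter

# Class counts of the first corpus, as given in the slides
labels = ["real"] * 220 + ["rhetorical"] * 680


def majority_baseline(labels: list[str]) -> tuple[str, float]:
    """Accuracy obtained by always predicting the most frequent class."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


label, acc = majority_baseline(labels)
print(len(labels), label, round(acc * 100, 1))  # 900 rhetorical 75.6
```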
Features
Punctuation marks: ?, !, “
Twitter language: @, #, links
Words: interjections
Part-of-speech: NN, NP, V
Named entity recognition: entities
WordNet: average length, % of terms found, total terms found
Sentiment analysis: polarity
Relations: friends, followers, friends/followers
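Several of the listed features are simple counts over the tweet text or the author's profile. This is a minimal sketch of the Punctuation marks, Twitter language and Relations groups only, assuming a hypothetical `Tweet` record with `text`, `friends` and `followers` fields and simple regular expressions for mentions, hashtags and links; the exact definitions used in the study are not given. The WordNet, part-of-speech, entity and polarity features are omitted because they depend on external tools and resources the slides do not specify.

```python
import re
from dataclasses import dataclass


# Hypothetical tweet record; field names are illustrative, not from the slides
@dataclass
class Tweet:
    text: str
    friends: int
    followers: int


URL_RE = re.compile(r"https?://\S+")


def surface_features(tweet: Tweet) -> dict[str, float]:
    """Count-based features for the Punctuation, Twitter language and Relations groups."""
    text = tweet.text
    return {
        # Punctuation marks (straight and curly double quotes both counted)
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "quotes": text.count('"') + text.count("\u201c") + text.count("\u201d"),
        # Twitter language: mentions, hashtags and links
        "mentions": len(re.findall(r"@\w+", text)),
        "hashtags": len(re.findall(r"#\w+", text)),
        "links": len(URL_RE.findall(text)),
        # Relations; the ratio is set to 0.0 for accounts with no followers
        "friends": tweet.friends,
        "followers": tweet.followers,
        "friends_followers_ratio": (
            tweet.friends / tweet.followers if tweet.followers else 0.0
        ),
    }


t = Tweet("@ana who knows a good #python book? http://example.com",
          friends=150, followers=300)
print(surface_features(t))
```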
Experiments and results
Figure: Accuracy (%) of SVM, NB, IB1 and RF on the real + rhetorical corpus
Experiments and results
Figure: Accuracy (%) of SVM, NB, IB1 and RF on the real + rhetorical corpus, compared with the baseline
Experiments and results
Figure: Precision (%) of SVM, NB, IB1 and RF for the real and rhetorical classes
Figure: Recall (%) of SVM, NB, IB1 and RF for the real and rhetorical classes
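Per-class precision and recall reveal what overall accuracy hides on an unbalanced corpus. The sketch below computes them for one class from parallel lists of gold and predicted labels; the example uses a hypothetical degenerate classifier that always answers "rhetorical" on the 220/680 corpus, purely for illustration and not as a result from the study. It reaches about 75.6% accuracy while never recognising a single real question.

```python
def per_class_scores(gold: list[str], pred: list[str], label: str) -> tuple[float, float]:
    """Precision and recall for one class (0.0 when the denominator is empty)."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    predicted = sum(p == label for p in pred)
    actual = sum(g == label for g in gold)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


gold = ["real"] * 220 + ["rhetorical"] * 680
pred = ["rhetorical"] * 900  # degenerate classifier, for illustration only
for label in ("real", "rhetorical"):
    p, r = per_class_scores(gold, pred, label)
    print(label, round(p, 3), round(r, 3))
# real 0.0 0.0
# rhetorical 0.756 1.0
```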
Corpus (2nd attempt)
The unbalanced corpus biases the classification towards the majority (rhetorical) class
Problem: more real questions are needed
Solution: #lazyweb
Corpus (2nd attempt)
Balanced corpus of 1,360 questions:
680 rhetorical
680 real (taken from a set of 2,800 #lazyweb questions)
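The balanced corpus pairs the 680 rhetorical questions with 680 real ones drawn from the 2,800 #lazyweb questions. The slides do not say how those 680 were chosen; this sketch assumes simple random sampling without replacement with a fixed seed. The function name `build_balanced_corpus` and the placeholder question texts are hypothetical.

```python
import random


def build_balanced_corpus(
    rhetorical: list[str], lazyweb: list[str], seed: int = 0
) -> list[tuple[str, str]]:
    """Pair all rhetorical questions with an equal-sized random sample of #lazyweb ones."""
    rng = random.Random(seed)
    # Sampling without replacement; raises ValueError if lazyweb is too small
    real = rng.sample(lazyweb, k=len(rhetorical))
    corpus = [(q, "real") for q in real] + [(q, "rhetorical") for q in rhetorical]
    rng.shuffle(corpus)
    return corpus


rhetorical = [f"rhetorical question {i}" for i in range(680)]  # placeholder texts
lazyweb = [f"#lazyweb question {i}" for i in range(2800)]      # placeholder texts
corpus = build_balanced_corpus(rhetorical, lazyweb)
print(len(corpus))  # 1360
```

On a corpus built this way, the majority-class baseline drops to 50%.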
Experiments and results
Figure: Accuracy (%) of SVM, NB, IB1 and RF on the balanced real + rhetorical corpus (baseline: 50%)
Experiments and results
Figure: Accuracy (%) in the ablation study for the feature groups Punctuation, Language, Entities, POS, WordNet, Polarity and Relations, plus Selection and All
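An ablation study scores the classifier on different subsets of feature groups. The slides do not say whether each bar uses one group on its own or all groups except one, so this sketch supports both through a `leave_out` flag. `ablation` and `toy_evaluate` are hypothetical: in practice `evaluate` would train and test a classifier on the given features, while here it is replaced by made-up per-feature weights so the example runs on its own.

```python
from typing import Callable


def ablation(
    groups: dict[str, list[str]],
    evaluate: Callable[[list[str]], float],
    leave_out: bool = False,
) -> dict[str, float]:
    """Score each feature group, plus all groups together ("All").

    With leave_out=False each group is scored on its own; with leave_out=True
    each score uses every group except the named one.
    """
    names = list(groups)
    scores = {}
    for name in names:
        if leave_out:
            features = [f for other in names if other != name for f in groups[other]]
        else:
            features = list(groups[name])
        scores[name] = evaluate(features)
    scores["All"] = evaluate([f for n in names for f in groups[n]])
    return scores


# Hypothetical per-feature contributions standing in for a real train/test run
weights = {"?": 0.25, "!": 0.125, "friends": 0.0625, "followers": 0.0}
groups = {"Punctuation": ["?", "!"], "Relations": ["friends", "followers"]}


def toy_evaluate(features: list[str]) -> float:
    return 0.5 + sum(weights[f] for f in features)


print(ablation(groups, toy_evaluate))
# {'Punctuation': 0.875, 'Relations': 0.5625, 'All': 0.9375}
print(ablation(groups, toy_evaluate, leave_out=True))
# {'Punctuation': 0.5625, 'Relations': 0.875, 'All': 0.9375}
```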
Conclusions and future work
Just a first step
Room for improvement
Augment the corpus
Truly analyze the rhetoric of Twitter
Integrate into a QA system