Classification of unanswerable questions: the rhetoric of Twitter

Classification of unanswerable questions: the rhetoric of Twitter Clasificación de preguntas sin respuesta: la retórica de Twitter David Tomás Department of Software and Computing Systems University of Alicante, Spain [email protected] CERI 2012

Upload: david-tomas

Post on 18-Jan-2017


Classification of unanswerable questions: the rhetoric of Twitter

Clasificación de preguntas sin respuesta: la retórica de Twitter

David Tomás

Department of Software and Computing Systems

University of Alicante, Spain

[email protected]

CERI 2012

This presentation is about…

Twitter

Questions that look like questions when in fact they are not

Corpus-based question classification

A preliminary evaluation

A lot of future work

Twitter

friend

mention @

hashtag #

url

follower

tweet

140 characters

RT

Twitter

A perfect way to spread information

Fast, fast, fast

Immediacy: many people are asking questions

Proposal

Wouldn’t it be nice if someone came to your aid when you needed an answer?

New paradigm: systems going to the user

First problem: who really needs an answer?

Proposal

Question classification problem

Real questions vs. rhetorical questions

Supervised / corpus-based

Corpus + Features + Algorithms
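
To make the "Corpus + Features + Algorithms" recipe concrete, here is a minimal sketch of one possible algorithm: a one-nearest-neighbour rule, the idea behind instance-based learners such as IB1. The two-dimensional feature vectors and the function name `nearest_neighbour` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import math

def nearest_neighbour(train, query):
    """Return the label of the training example closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label

# Hypothetical labelled corpus: (question marks, exclamation marks) per tweet.
train = [
    ((1.0, 0.0), "real"),
    ((0.0, 3.0), "rhetorical"),
]

prediction = nearest_neighbour(train, (0.9, 0.2))
```

The corpus supplies the labelled pairs, the features define the vectors, and the algorithm decides how a new question is compared against them.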

Corpus

Real question: expects an answer, from the crowd or from an individual

Rhetorical question: all the others

what, who, whom, whose, which, when, where, why, how

9 question words x 100 questions = 900 questions = 220 real + 680 rhetorical

Features

punctuation marks: ? ! “

part-of-speech

named entity recognition: entities

WordNet

relations: friends

Twitter language: @ # links

words

interjections

part-of-speech: NN NP V

WordNet

average length

% terms found

total terms found

sentiment analysis: polarity

friends

followers

friends/followers
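
A few of the surface features listed above can be sketched in a handful of lines. This is only an illustration under stated assumptions: the regular expressions, the feature names and the function `extract_features` are hypothetical, and the part-of-speech, named entity, WordNet and polarity features are left out because they need external tools that the slides do not name.

```python
import re

# Simple patterns for Twitter-specific tokens (assumed, not from the slides).
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")

def extract_features(text, friends, followers):
    """Return punctuation, Twitter-language and relation features for a tweet."""
    return {
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "quotes": sum(text.count(q) for q in '"“”'),
        "mentions": len(MENTION.findall(text)),
        "hashtags": len(HASHTAG.findall(text)),
        "links": len(LINK.findall(text)),
        "friends": friends,
        "followers": followers,
        # Guard against accounts with zero followers.
        "friends_per_follower": friends / followers if followers else 0.0,
    }

features = extract_features(
    "@bob why is the sky blue?! #lazyweb http://example.com",
    friends=50,
    followers=200,
)
```

Each tweet becomes one such dictionary, which can then be turned into a numeric vector for any of the classifiers.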

Algorithm

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the real + rhetorical corpus]

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the real + rhetorical corpus, with the baseline marked]

Experiments and results

[Charts: precision and recall (%) of SVM, NB, IB1 and RF, shown separately for real and rhetorical questions]
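
For readers who want to reproduce per-class figures like these, precision and recall for one class can be computed directly from gold and predicted labels. The sketch below uses a made-up five-question example; the label lists are hypothetical and are not the results shown in the charts.

```python
def precision_recall(gold, predicted, label):
    """Return (precision, recall) for `label` given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data, invented for illustration only.
gold = ["real", "real", "rhetorical", "rhetorical", "rhetorical"]
predicted = ["real", "rhetorical", "rhetorical", "rhetorical", "real"]

real_scores = precision_recall(gold, predicted, "real")
rhetorical_scores = precision_recall(gold, predicted, "rhetorical")
```

Reporting the two classes separately matters here because overall accuracy can hide poor performance on the smaller class.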

Corpus (2nd attempt)

An unbalanced corpus biases classification

Problem: need for more real questions

Solution: #lazyweb

Corpus (2nd attempt)

Balanced corpus of 1,360 questions:

680 rhetorical

680 real (from a set of 2,800 #lazyweb tweets)
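
One way to build such a balanced corpus is to downsample the larger pool until both classes have the same size. The slides do not say how the 680 real questions were picked from the 2,800 #lazyweb tweets, so the random sampling below is an assumption, and the placeholder strings stand in for actual tweets.

```python
import random

def downsample(items, size, seed=0):
    """Pick `size` distinct items at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(items, size)

# Placeholder data standing in for the collected tweets.
lazyweb = [f"lazyweb question {i}" for i in range(2800)]
rhetorical = [f"rhetorical question {i}" for i in range(680)]

real = downsample(lazyweb, len(rhetorical))
corpus = [(q, "real") for q in real] + [(q, "rhetorical") for q in rhetorical]
```

Fixing the seed keeps the sample stable between runs, which makes experiments on the balanced corpus repeatable.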

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the balanced real + rhetorical corpus]

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the balanced real + rhetorical corpus, with the baseline at 50]
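
A baseline of 50 on the balanced corpus is what a classifier that always predicts the most frequent class would score. The slides do not state how the baseline was computed, so the majority-class reading below is an assumption; the corpus sizes are the ones given earlier (220 real + 680 rhetorical, then 680 + 680).

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

first_corpus = ["real"] * 220 + ["rhetorical"] * 680
balanced_corpus = ["real"] * 680 + ["rhetorical"] * 680

first_baseline = majority_baseline(first_corpus)        # 680 / 900, about 0.756
balanced_baseline = majority_baseline(balanced_corpus)  # 0.5
```

Under this assumption the unbalanced corpus sets a much higher bar than the balanced one, since always answering "rhetorical" is already right about three times out of four.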

Experiments and results

[Chart: accuracy (%) in the ablation study, by feature set: Punctuation, Language, Entities, POS, WordNet, Polarity, Relations, Selection, All]
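
An ablation study of this kind can be organised as a loop over feature groups. The slides do not say whether each bar corresponds to a group used alone or a group left out; the sketch below assumes the leave-one-group-out variant. The `evaluate` callback is hypothetical, and `toy_evaluate` is only a stand-in that counts groups rather than measuring accuracy.

```python
GROUPS = ["Punctuation", "Language", "Entities", "POS",
          "WordNet", "Polarity", "Relations"]

def ablation(groups, evaluate):
    """Score the full feature set, then each set with one group removed."""
    results = {"All": evaluate(groups)}
    for held_out in groups:
        kept = [g for g in groups if g != held_out]
        results[f"without {held_out}"] = evaluate(kept)
    return results

def toy_evaluate(groups):
    # Stand-in scorer: returns the number of groups used, not a real accuracy.
    return len(groups)

results = ablation(GROUPS, toy_evaluate)
```

In a real run, `evaluate` would train and test a classifier using only the listed groups, so a large drop after removing a group signals that the group carries useful information.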

Conclusions and future work

Just a first step

Room for improvement

Augment the corpus

Truly analyze the rhetoric of Twitter

Integrate in a QA system

Thank you very much
