Machine Learning for Coreference Resolution

Isabelle Tellier

27/02/2015

Isabelle Tellier Coreference Resolution 27/02/2015 1 / 33

Introduction: purpose of this presentation

introduce coreference resolution: a difficult and important NLP task
introduce supervised machine learning: a general approach, several algorithms
see how the approach can be applied to the task

Outline

1 Coreference Resolution
  Definition of Coreference
  Hardness of the Task
  Ways of Treating it

2 Supervised Machine Learning
  General Properties
  Supervised Machine Learning applied to Classification
  Demo Weka

3 Supervised Classification for Coreference Resolution
  ANCOR Corpus
  Coding Coreference into a Classification Problem
  Variants of the Problem
  Experiments and Results with ANCOR

Coreference Resolution

Coreference Resolution: Definition of Coreference

Anaphora: a non-symmetric relation between an anaphoric expression (a pronoun, for instance) and its antecedent
Coreference: a relation shared by all the expressions that refer to a single entity

"Hello, I am Mrs Nom1 and I would like to speak to Mrs Nom2, please."
"Yes, that is me. What can I do for you, madam?"

Coreference Resolution: Hardness of the Task (example from D. Kayser)

The teacher sent the pupil to the principal.

He had enough of him.
He was talking to his neighbor.
He wanted to see him.

Coreference Resolution: Hardness of the Task (examples from Rahman & Ng)

James asked Robert for a favor, but he refused.
James asked Robert for a favor, but he was refused.

Keith fired Blaine but he did not regret.
Keith fired Blaine although he is diligent.

Emma did not pass the ball to Janie, although she was open.
Emma did not pass the ball to Janie, although she should have.

Medvedev will cede the presidency to Putin because he is more popular.
Medvedev will cede the presidency to Putin because he is less popular.

Coreference Resolution: Hardness of the Task, Interest

well-known constraints (agreement in gender and number...) are necessary but not sufficient!
some examples require extensive knowledge about the world
"Winograd Schema Challenge": a possible alternative to the Turing test!
useful for Information Retrieval (search engines), Information Extraction (automatic filling of a predefined form from a text), Question Answering (answering a natural language question)
useful for summarization, translation...

Coreference Resolution: Ways of Treating it

two steps:
  identifying the entities (and their properties such as gender, number...)
  grouping co-referential entities together

various approaches for the second step:
  rule-based handwritten systems
  as a clustering problem (unsupervised)
  by supervised machine learning

Supervised Machine Learning

General Schema of Supervised Machine Learning (from Cornuéjols & Miclet)

General Properties

Learning (for a machine):
  transforms isolated examples (x, y) into a rule, a function f(x) = y
  makes it possible to predict the value of f(x) for some new x
  requires generalizing (learning "by heart" is useless!)... but not too much!
  there are some traps...

General Properties

Three possible "optimal" functions f for a given set of examples (x, y)

General Properties

Two steps: (1) learning from examples, (2) applying the result to new data.
What is necessary for "learning a function f" from examples (x, y)?

(1) a relevant search space of functions
(1) a distance criterion to optimize
but to avoid overfitting, the real criterion is (2) the ability to predict new values
thus: a learning protocol is needed, with a training set (1) distinct from a test set (2)
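
The protocol above can be sketched on a toy problem. This is only an illustration under stated assumptions: the data, the two learners and every name below (`split_train_test`, `RoteLearner`, `ThresholdLearner`, `accuracy`) are hypothetical and do not come from the presentation. The rote learner memorizes its training pairs, while the threshold learner searches a small space of functions f(x) = (x >= t).

```python
import random


def split_train_test(examples, test_ratio=0.25, seed=0):
    """Shuffle the examples and hold out a fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


class RoteLearner:
    """Learns 'by heart': stores each training pair, answers False otherwise."""

    def fit(self, examples):
        self.table = dict(examples)
        return self

    def predict(self, x):
        return self.table.get(x, False)


class ThresholdLearner:
    """Search space: functions f(x) = (x >= t), with t taken from the training xs."""

    def fit(self, examples):
        candidates = sorted(x for x, _ in examples)
        # Criterion to optimize: number of errors on the training set.
        self.t = min(candidates,
                     key=lambda t: sum((x >= t) != y for x, y in examples))
        return self

    def predict(self, x):
        return x >= self.t


def accuracy(model, examples):
    """Fraction of examples whose label is predicted correctly."""
    return sum(model.predict(x) == y for x, y in examples) / len(examples)


# Hypothetical toy data: y is True exactly when x >= 50.
data = [(x, x >= 50) for x in range(100)]
train, test = split_train_test(data)
rote = RoteLearner().fit(train)
rule = ThresholdLearner().fit(train)

print("train:", accuracy(rote, train), accuracy(rule, train))
print("test: ", accuracy(rote, test), accuracy(rule, test))
```

Both learners are perfect on the training set, so only the held-out test set can tell them apart: the rote learner fails on every unseen positive example, while the threshold rule generalizes.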

General Properties

There are many distinct machine learning algorithms, because the choice of one depends on:

the nature of x and y (numbers, symbols, strings, vectors...)
the choice of the search space for f
the preferred evaluation criterion: ability to predict, computing time, readability/interpretability of the result, incrementality of the result, robustness to new domains...
No Free Lunch Theorem: no machine learning approach can do better than every other on every possible problem!

ML applied to Classification

x: data described by features (attributes)
y: a (symbolic) class chosen among a finite number of possible ones

Applications:
  insurance, banking (ability to repay a loan), medicine (diagnosis, drug effectiveness)...
  text classification: bag-of-words representation
  text classification according to the domain, the author, the period of writing, the language variant... (any metadata)
  opinion mining!
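
As an illustration of the bag-of-words representation mentioned above, here is a minimal sketch. The two sentences, the whitespace tokenization and the helper names (`bag_of_words`, `to_vector`) are assumptions made for the example, not part of the presentation.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to its token counts, ignoring word order."""
    return Counter(text.lower().split())


def to_vector(bag, vocabulary):
    """Turn a bag into a fixed-length feature vector x over the vocabulary."""
    return [bag[word] for word in vocabulary]


# Hypothetical mini-corpus.
docs = ["the cat saw the dog", "the dog barked"]
bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]

print(vocabulary)  # ['barked', 'cat', 'dog', 'saw', 'the']
print(vectors)     # [[0, 1, 1, 1, 2], [1, 0, 1, 0, 1]]
```

Each text becomes a vector x of counts, to which any classifier working on numeric attributes can then be applied.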

Demo Weka!

free, open-source, user-friendly software
implements many supervised machine learning algorithms
with various protocols and evaluation metrics
and many unsupervised (clustering) algorithms
and many other things!

Three selected algorithms

Decision Trees (J48): symbolic approach, readability
SVM (SMO): numeric approach, effectiveness (good results), especially for binary classification
NaiveBayes: statistical approach, efficiency (easy computation)

Supervised Classification for Coreference Resolution

ANCOR Corpus

ANCOR: transcribed oral corpora of various origins

Corpus        Discursive Context    Finalisation   Interactivity   Length and Time
ESLO_ANCOR    Interview             Moderate       Weak            417 k words, 25 h
ESLO_CO2      Interview             Moderate       Weak            35 k words, 2.5 h
OTG           Oral Dialogue         Very strong    Strong          26 k words, 2 h
Accueil_UBS   Telephonic Dialogue   Quite strong   Strong          10 k words, 1 h

Corpus                                         ESLO     CO2      OTG      UBS      TOTAL
Entities (of any type)                         97 939   8 399    7 462    1 872    115 672
among which new ones (NEW)                     26.8 %   32.2 %   38.4 %   33.7 %   28.0 %
among which belonging to a coreference chain   73.2 %   67.8 %   61.6 %   66.3 %   72.2 %
Relations (of any type)                        44 597   3 670    2 572    655      51 494

ANCOR Corpus

manually annotated coreference chains
each chain link is connected to the first chain link

ANCOR Corpus

[Figure: two histograms. First: frequency (in %) as a function of chain length (in number of links), with bins 2, 3, 4, 5 < 10, 11 < 50, > 50. Second: frequency (in %) as a function of the distance between two chain links (in number of mentions), with bins 1 to 10, 11 < 20, 21 < 50, 51 < 100, > 100.]

Coding Coreference into a Classification Problem

referring entities e1, e2, e3... are assumed to be already recognized in the text and associated with some features (gender, number, lemma...)
the classification problem is: given a pair (ei, ej) of entities, are they coreferent? (yes/no)
x: set of features associated with (ei, ej)
y: Boolean
required pre-processing of ANCOR: each chain link is connected to its previous and/or next chain link (so that the distance and the context between the entities of a pair are relevant)

Coding Coreference into a Classification Problem: Possible Features for describing x = (ei, ej)

possible non-relational features (related to each entity):
  gender, number...
  syntactic category (pronoun, definite NP, demonstrative NP?)
  new/old entity? (not always available!)
  Named Entity? Class of NE (person/organization/place/date/numeric)?
  semantic category (from Wordnet, ontology...) when available

possible relational features (related to the pair):
  equality/inclusion of ei and ej as strings, rate of common tokens/substrings
  for any non-relational feature: equality of their values for ei and ej
  distance in characters, words, sentences, paragraphs, speech turns...
  embedding: [[my neighbor]i 's cat]j (entities cannot co-refer)
  equality/inclusion of left/right contexts (in terms of tokens/words)
  equality of the speaker (in dialogues)
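
A few of the relational features listed above can be sketched as follows. This is a hypothetical encoding for illustration only: the `Mention` fields, the feature names and the attribute values are assumptions, and the actual 30 features used with ANCOR are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """A recognized entity with a few non-relational features (hypothetical)."""
    text: str
    gender: str
    number: str
    index: int    # rank of the mention in the text
    start: int    # character offset of the first character
    end: int      # character offset just after the last character
    speaker: str


def pair_features(ei, ej):
    """Relational features describing x = (ei, ej)."""
    a, b = ei.text.lower(), ej.text.lower()
    return {
        "same_string": a == b,
        "inclusion": a in b or b in a,
        "same_gender": ei.gender == ej.gender,
        "same_number": ei.number == ej.number,
        "mention_distance": abs(ej.index - ei.index),
        # One span lies inside the other, as in [[my neighbor]i 's cat]j.
        "embedded": (ej.start <= ei.start and ei.end <= ej.end)
                    or (ei.start <= ej.start and ej.end <= ei.end),
        "same_speaker": ei.speaker == ej.speaker,
    }


# The embedding example from the list above, with made-up attribute values.
inner = Mention("my neighbor", "masc", "sg", 0, 0, 11, "A")
outer = Mention("my neighbor's cat", "masc", "sg", 1, 0, 17, "A")
features = pair_features(inner, outer)
print(features)
```

Here `embedded` is True, which is exactly the configuration in which the two entities cannot co-refer; a classifier can learn that regularity from the feature.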

Coding Coreference into a Classification Problem: Reconstructing a coreference chain

reference data are coreference chains, not pairs
the transitivity of coreference must therefore be taken into account

Mentions in text order: {a1, b1, b2, a2, c1, b3, c2, a3}

IF {a1, a2} = COREF AND {a2, a3} = COREF, THEN {a1, a3} = COREF

So the chain is {a1, a2, a3}
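
One common way to apply transitivity is to merge the pairs judged coreferent with a union-find structure. The sketch below is an assumption about how the reconstruction could be done, not the procedure used in the presentation. Only the pairs {a1, a2} and {a2, a3} come from the slide; the b and c pairs are added for illustration.

```python
def build_chains(mentions, coreferent_pairs):
    """Group mentions into chains by transitive closure of the positive pairs."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links up to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for left, right in coreferent_pairs:
        parent[find(right)] = find(left)

    chains = {}
    for m in mentions:  # text order is preserved inside each chain
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


mentions = ["a1", "b1", "b2", "a2", "c1", "b3", "c2", "a3"]
positive_pairs = [("a1", "a2"), ("a2", "a3"),   # from the slide
                  ("b1", "b2"), ("b2", "b3"),   # hypothetical
                  ("c1", "c2")]                 # hypothetical
chains = build_chains(mentions, positive_pairs)
print(chains)
# [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2']]
```

Note that a single wrong positive decision merges two whole chains, which is one reason why pairwise accuracy and chain-level quality can differ.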

Coding Coreference into a Classification Problem: Evaluation Metrics

Tm = reference chains
Sm = predicted chains

Tm = {{1, 2, 4, 7}, {3}, {5, 6}}
Sm = {{1, 2, 3, 4, 7}, {5, 6}}

Four metrics: MUC, B3, BLANC, CEAF

BLANC: $R = \frac{R_c + R_n}{2}$ ; $P = \frac{P_c + P_n}{2}$
CEAF: complex computation...
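
To make the first two metrics concrete, the sketch below computes MUC and B3 on the Tm and Sm example, using the usual textbook definitions (link-based for MUC, mention-based for B3). The slide's own formulas are not recoverable from the transcript, so this is an assumption about what they contained; the function names are hypothetical, and the B3 function assumes both partitions cover the same mentions.

```python
def muc_score(key, response):
    """MUC recall of `key` against `response`; swap the arguments for precision."""
    numerator = denominator = 0
    for chain in key:
        covered = set()
        parts = 0
        for other in response:
            common = chain & other
            if common:
                parts += 1
                covered |= common
        parts += len(chain - covered)    # mentions missing from response
        numerator += len(chain) - parts  # links of the chain that are preserved
        denominator += len(chain) - 1    # links needed to build the chain
    return numerator / denominator if denominator else 0.0


def b_cubed(key, response):
    """B3 (recall, precision), averaged over mentions."""
    key_of = {m: chain for chain in key for m in chain}
    resp_of = {m: chain for chain in response for m in chain}
    mentions = key_of.keys() & resp_of.keys()
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m])
                 for m in mentions) / len(mentions)
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m])
                    for m in mentions) / len(mentions)
    return recall, precision


Tm = [{1, 2, 4, 7}, {3}, {5, 6}]
Sm = [{1, 2, 3, 4, 7}, {5, 6}]

muc_recall, muc_precision = muc_score(Tm, Sm), muc_score(Sm, Tm)
b3_recall, b3_precision = b_cubed(Tm, Sm)
print(muc_recall, muc_precision)            # 1.0 0.8
print(b3_recall, round(b3_precision, 3))    # 1.0 0.771
```

Both metrics give a recall of 1 here, since no reference link is lost, and penalize precision for the wrongly attached mention 3, with B3 being slightly more severe.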

Variants of the Problem: Choice of the Set of Features

all features (30)
only relational features (18)
without the oral-related features, i.e. equality of speaker and speech turn distance (28)

Variants of the Problem: Generating Negative Examples and Training Sets

problem: negative examples are not "given"
for any two entities, it is more likely that they are not coreferent

{a1, b1, b2, b3, a2, c1, c2, a3}

{a1, a2} -> {b1, a2}, {b2, a2}, {b3, a2}
{a2, a3} -> {c1, a3}, {c2, a3}

big training set: 142 498 pairs (24 620 coref, 117 878 not)
medium training set: 101 919 pairs (17 844 coref, 84 075 not)
small training set: 71 881 pairs (11 908 coref, 59 973 not)
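
The generation scheme shown in the example can be sketched as follows: each mention forms a positive pair with the previous link of its chain, and a negative pair with every mention lying in between. This is one reading of the slide's example, written under that assumption; the function name and the `chain_of` mapping are hypothetical, and how the three training-set sizes were obtained is not covered here.

```python
def make_examples(mentions, chain_of):
    """Build (antecedent, mention, label) triples from mentions in text order."""
    last_seen = {}   # chain id -> position of its most recent link
    examples = []
    for j, mention in enumerate(mentions):
        chain = chain_of[mention]
        if chain in last_seen:
            i = last_seen[chain]
            examples.append((mentions[i], mention, True))
            # Every mention between the two links yields a negative example.
            for k in range(i + 1, j):
                examples.append((mentions[k], mention, False))
        last_seen[chain] = j
    return examples


mentions = ["a1", "b1", "b2", "b3", "a2", "c1", "c2", "a3"]
chain_of = {m: m[0] for m in mentions}   # chain id = first letter (toy convention)
examples = make_examples(mentions, chain_of)
negatives = [(x, y) for x, y, label in examples if not label]
print(negatives)
# [('b1', 'a2'), ('b2', 'a2'), ('b3', 'a2'), ('c1', 'a3'), ('c2', 'a3')]
```

The negatives match those listed for the a chain; the b and c chains yield none here because their links are adjacent in the text.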

Variants of the Problem: Choice of the learning algorithm

Decision Trees (J48): symbolic approach, readability
SVM (SMO): numeric approach, effectiveness (good results), especially for binary classification
NaiveBayes: statistical approach, efficiency (easy computation)

Experiments and Results with ANCOR

Experiments and Results with ANCOR: Comments

Naive Bayes is not competitive
small training sets are enough (over-fitting with larger training sets?)
SVM is better overall
using all the features is useful
oral-specific features are not very important (and do not appear high in the decision trees)
important features (from the decision trees): distance between entities, equality of gender, NE nature of entities, embedding

Experiments and Results with ANCOR: Perspectives

learn separately for each sub-corpus
feature selection by incrementally adding one feature after another
learn the types of the links (or learn separately for each type)
combine this system with entity identification: an end-to-end system

Conclusion

coreference resolution is difficult; there is room for improvement
experimental study: many parameters have to be taken into account (the differences between the various results matter more than the results themselves)
machine learning does not exempt us from thinking!
