Error Analysis for Learning-based Coreference Resolution
Olga Uryupina
27.05.08
Outline
• CR: state of the art and our system
• Distribution of errors
• Discussion: possible remedies
Coreference Resolution
"This deal means that Bernard Schwartz can focus most of his time on Globalstar and that is a key plus for Globalstar because Bernard Schwartz is brilliant," said Robert Kaimovitz, a satellite communications analyst at Unterberg Harris in New York.
..Globalstar still needs to raise $600 million, and Schwartz said that the company would try..
Machine Learning Approaches
• Soon et al. (2001)
• Cardie & Wagstaff (1999)
• Strube et al. (2002)
• Ng & Cardie (2001-2004)
• ACE competition
Features: Soon et al. (2001)
1. Anaphor is a pronoun
2. Anaphor is a definite NP
3. Anaphor is an NP with a demonstrative pronoun ("this", ..)
4. Antecedent is a pronoun
5. Both markables are proper names
6. Number agreement
7. Gender agreement
8. Alias
9. Appositive
10. Same surface form
11. Semantic class agreement
12. Distance in sentences
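To make the pairwise set-up concrete, here is a minimal sketch of how a few of the features listed above could be computed for one (antecedent, anaphor) pair. It is not the system's implementation: the `Mention` class, the word lists, and the determiner-stripping rule in `normalize` are assumptions made for illustration, and the features that need a parser, a named-entity recognizer, or WordNet (agreement, alias, appositive, semantic class) are left out.

```python
from dataclasses import dataclass

# Small illustrative word lists (assumptions, not the lists used in the talk).
PRONOUNS = {"he", "she", "it", "they", "we", "i", "you",
            "his", "her", "its", "their", "our", "him", "them", "us"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
DETERMINERS = {"a", "an", "the"} | DEMONSTRATIVES


@dataclass
class Mention:
    text: str      # surface string of the markable
    sentence: int  # index of the sentence that contains it


def normalize(text: str) -> str:
    """Lowercase and drop one leading determiner before comparing strings."""
    tokens = text.lower().split()
    if tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    return " ".join(tokens)


def pair_features(antecedent: Mention, anaphor: Mention) -> dict:
    """Compute a subset of the pairwise features for one candidate pair."""
    ana_tokens = anaphor.text.lower().split()
    return {
        "ana_pronoun": anaphor.text.lower() in PRONOUNS,
        "ana_definite": ana_tokens[0] == "the",
        "ana_demonstrative": len(ana_tokens) > 1 and ana_tokens[0] in DEMONSTRATIVES,
        "ante_pronoun": antecedent.text.lower() in PRONOUNS,
        "same_surface_form": normalize(antecedent.text) == normalize(anaphor.text),
        "sentence_distance": anaphor.sentence - antecedent.sentence,
    }


if __name__ == "__main__":
    print(pair_features(Mention("Globalstar", 0), Mention("the company", 1)))
```

Each candidate pair becomes one feature vector of this kind, which is what the learners compared on the following slides are trained on.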
Features: other approaches
Cardie & Wagstaff: 11 features
Strube et al.: 17 features (the same standard features + approximate matching via minimum edit distance (MED))
Ng & Cardie: 53 features (no improvement with the extended feature set; better results (F=63.4) with manual feature selection)
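For readers unfamiliar with MED, the following is a standard dynamic-programming sketch of minimum edit distance (Levenshtein distance) between two strings. It only illustrates the measure itself; how the distance is turned into features for approximate matching is not stated on the slide, so nothing about that step is assumed here.

```python
def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of insertions, deletions
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the current prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(min_edit_distance("Loral Space", "Loral Spaces"))
```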
Performance: Soon et al.
| System | Learner | R | P | F |
| --- | --- | --- | --- | --- |
| Soon et al.'s system | C5.0, optimized | 56.1 | 65.5 | 60.4 |
| Our reimplementation | C4.5, not optimized | 53.5 | 72.8 | 61.7 |
| Our reimplementation | Ripper | 44.6 | 74.8 | 55.9 |
| Our reimplementation | SVM | 50.9 | 68.8 | 58.5 |
| Our reimplementation | MaxEnt | 49.2 | 64.1 | 55.7 |
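The three numbers in each row are consistent with recall, precision and balanced F-measure, in that order. The sketch below is only an arithmetic check under that assumption: it recomputes F = 2PR / (P + R) from the first two numbers of each row and compares it with the third.

```python
def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F) as read off the slide; the column
# interpretation is an assumption that this check supports.
ROWS = {
    "C5.0, optimized (Soon et al.)": (56.1, 65.5, 60.4),
    "C4.5, not optimized": (53.5, 72.8, 61.7),
    "Ripper": (44.6, 74.8, 55.9),
    "SVM": (50.9, 68.8, 58.5),
    "MaxEnt": (49.2, 64.1, 55.7),
}

if __name__ == "__main__":
    for name, (r, p, reported) in ROWS.items():
        print(f"{name}: computed F = {f_measure(r, p):.1f}, reported F = {reported}")
```

Read this way, the reimplementations trade recall for precision relative to the original system: C4.5 and Ripper reach higher precision at lower recall.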
Performance: Soon et al.
[Figure: learning curve for C5.0. Axis tick labels: 47 to 63 (vertical), 10 to 30 (horizontal).]
Tricky and easy anaphors
Cristea et al. (2002): state-of-the-art coreference resolution systems have essentially the same performance level
Pronominal anaphora: 80%
Full-scale coreference: 60%
Hypothesis: tricky vs. easy anaphors
Our system
Goal: bridge the gap between theory and practice:
sophisticated linguistic knowledge + data-driven coreference resolution algorithm
New Features
Different aspects of CR:
• Surface similarity (122 features)
• Syntax (64)
• Semantic compatibility (29)
• Salience (136)
• (Anaphoricity)
More or less sophisticated linguistic theories exist for all these phenomena
Evaluation
Methodology
• Standard dataset (MUC-7)
• Standard learning set-up
• Compare to Soon et al. (2001)
Performance (F)
| Learner | Basic feature set | Extended feature set |
| --- | --- | --- |
| Soon et al., C5.0 | 60.4 | N/A |
| C4.5 | 61.7 | 64.6 |
| SVM | 58.5 | 65.4 |
| Ripper | 55.9 | 57.5 |
| MaxEnt | 55.7 | 59.4 |
Performance
[Figure: learning curve for SVM. Axis tick labels: 50 to 66 (vertical), 10 to 30 (horizontal).]
Error analysis
Different approaches, same performance:
• Same errors?
• "Tricky anaphors"? (Cristea et al., 2002)
Extensive error analysis needed!
Outline
• CR: state of the art and our system
• Distribution of errors
• Discussion: possible remedies
Recall errors
| Error type | Errors | % |
| --- | --- | --- |
| MUC | 17 | 3.6 |
| Markables | 166 | 35.4 |
| Propagated P | 31 | 6.6 |
| Pronouns | 77 | 16.4 |
| NE-matching | 31 | 6.6 |
| Syntax | 39 | 8.3 |
| Nominal anaphora | 104 | 22.2 |
| total | 469 | 100 |
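The percentage column can be recomputed from the raw counts. The sketch below does only that arithmetic, using the counts exactly as they appear on the slide. Note that the seven listed categories add up to 465, so 4 of the 469 recall errors fall outside the categories shown; the slide does not say where they belong, and no assumption is made about them here.

```python
# Recall error counts per category, as read off the slide.
RECALL_ERRORS = {
    "MUC": 17,
    "Markables": 166,
    "Propagated P": 31,
    "Pronouns": 77,
    "NE-matching": 31,
    "Syntax": 39,
    "Nominal anaphora": 104,
}
REPORTED_TOTAL = 469  # total given on the slide


def percentages(counts: dict, total: int) -> dict:
    """Share of each category in the total, in percent, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, share in percentages(RECALL_ERRORS, REPORTED_TOTAL).items():
        print(f"{name}: {share}%")
    print("listed:", sum(RECALL_ERRORS.values()), "of", REPORTED_TOTAL)
```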
Recall errors - markables
• Auxiliary doc parts
• Tokenization
• Modifiers
• Bracketing/labeling
Recall errors - markables
.. there was no requirement for tether to be manufactured in a contaminant-free environment.
A mesmerizing set.
Recall errors - pronouns
1st pl: reconstructing the group
The retiring Republican chairman of the House Committee on Science wants U.S. businesses to <..> "We need to make it easier for the private sector.." Walker said
3rd sg, 3rd pl: (non-)salience
[The explanation] for the History Channel's success begins with its association with another channel owned by the same parent consortium.
Recall errors - nominal
Mostly common noun phrases with different heads; WordNet does not help much
.. a report on the satellites' findings <..> the abilities of U.S. reconnaissance technology <..> the use of advanced intelligence-gathering tools <..> remote-sensing instruments..
Precision errors
| Error type | Errors | % |
| --- | --- | --- |
| MUC | 30 | 7.4 |
| Markables | 76 | 18.6 |
| Pronouns | 78 | 19.1 |
| NE-matching | 20 | 4.9 |
| Syntax | 22 | 5.4 |
| Nominal anaphora | 182 | 44.6 |
| total | 408 | 100 |
Precision errors - pronouns
• incorrect parsing/tagging
Two key vice presidents, [Wei Yen] and Eric Carlson, are leaving to start their own Silicon Valley companies.
• (non-)salience
• matching (propagated R)
Precision errors - nominal
Mostly same-head descriptions. Possible solutions:
• modifiers?
• anaphoricity detectors?
Precision errors - nominal - modifiers
Idea: "red car" cannot corefer with "blue car"
Problem: list of mutually incompatible properties?
MUC-7 test data:
incompatible modifiers: 30
"new" modifiers on the anaphor: 15
compatible modifiers: 58
no modifiers: 62
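The "red car" / "blue car" idea can be sketched as a check over the modifiers of two same-head descriptions. Everything specific below is an assumption for illustration: the incompatibility groups are a tiny hand-made list (the slide itself points out that no real list of mutually incompatible properties is available), and the mapping onto the four categories is one possible reading of the labels on the slide.

```python
# Hypothetical groups of mutually exclusive modifiers (illustration only).
INCOMPATIBLE_GROUPS = [
    {"red", "blue", "green", "white", "black"},
    {"big", "small"},
    {"first", "second", "third"},
]


def modifier_relation(antecedent_mods, anaphor_mods) -> str:
    """Classify the modifiers of two same-head NPs into one of four classes."""
    ante = {m.lower() for m in antecedent_mods}
    ana = {m.lower() for m in anaphor_mods}
    if not ante and not ana:
        return "no modifiers"
    for group in INCOMPATIBLE_GROUPS:
        ante_in_group = ante & group
        ana_in_group = ana & group
        # Both NPs pick a value from the same group, but never the same one.
        if ante_in_group and ana_in_group and ante_in_group.isdisjoint(ana_in_group):
            return "incompatible modifiers"
    if ana - ante:
        return "new modifiers on the anaphor"
    return "compatible modifiers"


if __name__ == "__main__":
    print(modifier_relation(["red"], ["blue"]))
```

A resolver could use the "incompatible" outcome as a hard filter or as a feature, but the counts on the slide suggest the coverage is limited: only 30 of the 165 listed cases involve incompatible modifiers.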
Precision errors - nominal - discourse-new detection
Idea: identify and discard unlikely anaphors
Problem: even a very good detector does not help
Outline
• CR: state of the art and our system
• Distribution of errors
• Discussion: possible remedies
Discussion - Errors
Problematic areas:
• Data
• Preprocessing modules
• Features
• Resolution strategy
Discussion - Data
• bigger corpus
• more uniform doc selection, text only
• better definition of COREF
• better scoring
Discussion - Preprocessing
• local improvements (e.g. appositions)
• probabilistic architecture to neutralize errors
Discussion - Features
• feature selection
• ensemble learning
• more targeted learning for under-represented phenomena (abbreviations)
Discussion - Resolution
• less local: move to the chains level
• less uniform: specific treatment for different types of anaphors
Discussion - Conclusion
• ML approaches to coreference resolution yield similar performance values
• Some anaphors are indeed tricky (esp. crucial for precision errors)
• But some errors can be eliminated within an ML framework:
  - improving the training material
  - more elaborate integration of preprocessing modules
  - more global resolution strategies
Thank You!
Recall errors - MUC
Mainly incorrect bracketing
..said <COREF .. MIN="vice president">Jim Johannesen, <COREF .. MIN="vice president">vice president of site development for McDonald's</COREF></COREF>..
Only clear typos etc. considered MUC errors
Recall errors – propagated P
The company also said the Marine Corps has begun testing two of [its radars] as part of a short-range ballistic missile defense program. That testing could lead to an order for the radars.
Crucial for pronouns and indicators for intrasentential coreference
Recall errors - matching
Mostly ORGANIZATIONs. Problems:
• Abbreviations
Federal Communication Commission / FCC
• Hyphenated names
Ziff-Davis Publishing / Ziff
• Foreign names
Taiwan President Lee Teng-hui / President Lee
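Two of the three problem types above lend themselves to simple string heuristics, sketched below. These are illustrative assumptions rather than the matching rules of the system described in the talk; the function names are hypothetical, and the third type (foreign names such as "Lee Teng-hui" / "President Lee") is deliberately not handled, since it requires knowing which token is the family name.

```python
def acronym(full_name: str) -> str:
    """Concatenate the initials of the capitalized words in a name."""
    return "".join(word[0] for word in full_name.split() if word[0].isupper())


def is_abbreviation(short: str, full_name: str) -> bool:
    """True if `short` (e.g. "FCC" or "F.C.C.") spells the initials of `full_name`."""
    return short.replace(".", "").upper() == acronym(full_name)


def matches_hyphen_part(short: str, full_name: str) -> bool:
    """True if `short` is one component of a hyphenated token in `full_name`."""
    return any(
        "-" in token and short in token.split("-")
        for token in full_name.split()
    )


if __name__ == "__main__":
    print(is_abbreviation("FCC", "Federal Communication Commission"))
    print(matches_hyphen_part("Ziff", "Ziff-Davis Publishing"))
    print(matches_hyphen_part("Lee", "Taiwan President Lee Teng-hui"))
```

Heuristics of this kind raise recall on organization names but can also introduce false matches, so they are subject to the same precision/recall trade-off as the rest of the matching features.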
Recall errors - syntax
Apposition, copula
Problems:
• Parsing mistakes
• Missing constructions
..the venture will become synonymous with JSkyB
• P/R trade-off
..Kevlar, a synthetic fiber, and Nomex..
Quantitative constructions
.. More than quadruple the three-month daily average of 88,700 shares
Precision errors - matching
Finer NE analysis could help, but mostly too difficult even for humans:
Loral
Loral Space and Communications Corp
Loral Space
Space Systems Loral
Anaphoricity
Some markables are not anaphors. We can tell that by looking at them, without any sophisticated coreference resolution.
Poesio & Vieira, Ng & Cardie – try to identify Discourse New entities automatically
Not used for this talk