Robust Constituent-to-Dependency Conversion for English
DESCRIPTION
This paper suggests a robust way of converting constituent-based trees in the Penn Treebank style into dependency trees for several different English corpora. Conversion tools already exist for English. However, these tools are often customized so closely to a specific corpus that they do not necessarily work as well when applied to other corpora that introduce new POS tags or annotation schemes. The desire to improve conversion portability motivated us to build a new conversion tool that produces more robust results across different corpora. In particular, we have modified the treatment of head-percolation rules, function tags, coordination, gapping, and empty category mappings. We compare our method with the LTH conversion tool used for the CoNLL’07-09 shared tasks. For our experiments, we use six different English corpora from OntoNotes release 4.0. To demonstrate the impact our approach has on parsing, we train and test two state-of-the-art dependency parsers, MaltParser and MSTParser, and our own parser, ClearParser, using converted output from both the LTH tool and our method. Our results show that our method removes certain unnecessary non-projective dependencies and generates fewer unclassified dependencies. All three parsers give higher parsing accuracies on average across these corpora using data generated by our method, especially on semantic dependencies.
TRANSCRIPT
![Page 1: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/1.jpg)
Robust Constituent-to-Dependency Conversion for English
Jinho D. Choi & Martha Palmer
University of Colorado at Boulder
December 3rd, 2010
The 9th International Workshop on Treebanks and Linguistic Theories
![Page 2: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/2.jpg)
Dependency Structure
• What is a dependency?
- Syntactic or semantic relation between a pair of words.
• Phrase structure vs. dependency structure
- Constituents vs. dependencies
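[Figure: dependency examples with the labels LOC, PMOD, and NMOD on "places in this city" and TMP on an example containing "events" and "year"; the phrase-structure tree (S → NP VP) for "He bought a car" next to its dependency tree (root → bought; SBJ: He; OBJ: car; NMOD: a).]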
![Page 3: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/3.jpg)
Dependency Graph
• For a sentence s = w1 ... wn, a dependency graph Gs = (Vs, Es)
- Vs = {w0 = root, w1, ... , wn}
- Es = {(wi, r, wj) : wi ≠ wj, wi ∈ Vs, wj ∈ Vs - {w0}, r ∈ Rs}
- Rs = a set of all dependency relations in s
• A well-formed dependency graph
- Unique root, single head, connected, acyclic
- Projective vs. non-projective.
➜ dependency tree
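[Figure: two dependency graphs contrasting non-projective and projective structures, "root He bought a car yesterday that is red" and "root He bought a car that is red".]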
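The well-formedness conditions and the projective/non-projective distinction on this slide can be checked mechanically. The following is a minimal sketch, not the authors' implementation. It assumes a graph is stored as a list `heads` where `heads[i]` is the head of token `i` and index 0 stands for the artificial root w0. The function names and the head assignments in the two example sentences are illustrative assumptions.

```python
from typing import List


def is_well_formed(heads: List[int]) -> bool:
    """Check unique root, connectedness, and acyclicity.

    heads[i] is the head of token i (1-based); heads[0] is a placeholder
    for the artificial root w0. The single-head property is built into
    this representation, since each token stores exactly one head.
    """
    n = len(heads) - 1
    # Unique root: exactly one token attaches directly to w0.
    if sum(1 for i in range(1, n + 1) if heads[i] == 0) != 1:
        return False
    # Connected and acyclic: every token reaches w0 by following heads.
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen or not 0 <= heads[node] <= n:
                return False
            seen.add(node)
            node = heads[node]
    return True


def is_projective(heads: List[int]) -> bool:
    """A tree rooted at position 0 is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    return not any(l1 < l2 < r1 < r2 for l1, r1 in arcs for l2, r2 in arcs)


# Assumed analysis of "He bought a car yesterday that is red":
# the relative clause attaches to "car" across "yesterday".
NON_PROJECTIVE = [-1, 2, 0, 4, 2, 2, 7, 4, 7]
# Assumed analysis of "He bought a car that is red".
PROJECTIVE = [-1, 2, 0, 4, 2, 6, 4, 6]
```

Under these assumed analyses, both graphs are well-formed trees, and only the first contains a crossing arc.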
![Page 4: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/4.jpg)
Why Constituent-to-Dependency?
• Dependency Treebanks in English
- Prague English Dependency Treebank (≈ 0.28M words)
- ISLE English Dependency Treebank (?)
• Constituent Treebanks in English
- The Penn Treebank (> 1M words)
- The OntoNotes Treebank (> 2M words)
- The Genia Treebank (≈ 0.48M words)
- The Craft Treebank (≈ 0.79M words)
• By performing the conversion, we obtain larger and more diverse dependency corpora.
![Page 5: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/5.jpg)
Previous Conversion Tools
• Constituent-to-dependency conversion tools
- Penn2Malt.
- LTH conversion tool (Johansson and Nugues, 2007).
- Stanford dependencies (de Marneffe et al., 2006).
• LTH conversion tool
- Used for CoNLL 2007 - 2009.
- Generates semantic dependencies from function tags and non-projective dependencies using empty categories.
- Customized for the original Penn Treebank.
• The Penn Treebank style of phrase structure has since undergone some changes.
![Page 6: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/6.jpg)
Changes in Penn Treebank Style Phrase Structure
• Tokenized hyphenated words, inserted NML phrases.
• Introduced some new phrase/token-level tags.
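[Figure: "New York-based" before and after the change, ADJP (NNP New) (JJ York-based) versus ADJP (NML (NNP New) (NNP York)) (HYPH -) (VBN based); and a tree with the new EDITED tag for "He met these people, this group of people", where EDITED covers the NP "these people".]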
![Page 7: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/7.jpg)
Updated Conversion Tool
• Motivations
- The conversion tool needs to be updated as the phrase structure format changes.
- The conversion tool needs to perform robustly across different corpora.
- The conversion tool may generate dependency trees with empty categories.
• Contributions
- Fewer unclassified dependencies.
- Fewer unnecessary non-projective dependencies.
- More robust parsing performance across different corpora.
![Page 8: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/8.jpg)
Constituent-to-Dependency Conversion
• Conversion steps
1. Use head-percolation rules to find the head of each constituent, and make it the parent of all other nodes in the constituent.
2. For certain empty categories (e.g., *T*, *ICH*), make their antecedents children of the empty categories’ parents.
3. Label all dependencies by comparing relations between all head-dependent pairs.
• Head-percolation rules
- A set of rules that defines the head of each constituent.
- e.g., the head of a noun phrase ::= the rightmost noun.
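Step 1 of the conversion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' tool. The `Node` class, the tiny `RULES` table, and the function names are hypothetical, and the rule table covers only the categories needed for "He bought a car". Steps 2 and 3 (empty categories and labeling) are not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                     # phrase label or POS tag
    index: Optional[int] = None    # token position (leaves only, 1-based)
    children: List["Node"] = field(default_factory=list)


# Hypothetical rules: search direction ("l" = left-to-right, "r" =
# right-to-left) and tag groups in priority order. Not the paper's table.
RULES: Dict[str, Tuple[str, List[Tuple[str, ...]]]] = {
    "S": ("l", [("VP",), ("S",)]),
    "VP": ("l", [("VBD", "VBZ", "VBP", "VB"), ("VP",)]),
    "NP": ("r", [("NN", "NNS", "NNP", "NNPS"), ("PRP", "NP")]),
}


def head_child(node: Node) -> Node:
    """Pick the head child of a constituent using the rule for its label."""
    direction, groups = RULES.get(node.label, ("r", []))
    kids = node.children if direction == "l" else node.children[::-1]
    for group in groups:
        for kid in kids:
            if kid.label in group:
                return kid
    return kids[0]  # fallback: first child in the search direction


def convert(node: Node, heads: Dict[int, int]) -> int:
    """Return the head token index of node, recording dependent -> head."""
    if not node.children:
        return node.index
    head = head_child(node)
    head_index = convert(head, heads)
    for kid in node.children:
        if kid is not head:
            heads[convert(kid, heads)] = head_index
    return head_index


# (S (NP (PRP He)) (VP (VBD bought) (NP (DT a) (NN car))))
TREE = Node("S", children=[
    Node("NP", children=[Node("PRP", 1)]),
    Node("VP", children=[
        Node("VBD", 2),
        Node("NP", children=[Node("DT", 3), Node("NN", 4)]),
    ]),
])

HEADS: Dict[int, int] = {}
HEADS[convert(TREE, HEADS)] = 0  # the head of the whole tree attaches to root
```

With these assumed rules, "bought" becomes the root, "He" and "car" attach to "bought", and "a" attaches to "car", matching the dependency tree shown earlier for this sentence.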
![Page 9: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/9.jpg)
Head-percolation Rules
![Page 10: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/10.jpg)
Constituent-to-Dependency Conversion
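[Figure: step-by-step conversion of "How far do you expect *-2 to run *T*-1", with constituents numbered in processing order (1: WHNP-1, 2: VP, 3: VP, 4: S, 5: VP, 6: SQ, 7: SBARQ, 8: TOP). In the resulting tree for "root How far do you expect to run", mapping the antecedent of *T*-1 produces a non-projective dependency.]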
![Page 11: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/11.jpg)
Small Clauses
• Object predicates in adjectival small clauses
- LTH tool: direct children of the main verbs.
- Ours: direct children of the subject-nouns.
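[Figure: phrase-structure tree for "He made us happy", (S (NP He) (VP made (S-1 (NP-SBJ us) (ADJP-PRD happy)))), with two dependency analyses: one labeled SBJ, ROOT, OBJ, OPRD and one labeled SBJ, ROOT, OBJ, PRD. Semantic roles shown for "made" (cause to be): ARG0 = impeller to action, ARG1 = impelled predication.]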
![Page 12: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/12.jpg)
Coordination
• A phrase contains coordination if
- It contains a conjunction (CC) or a conj-phrase (CONJP).
- It is tagged as an unlike coordinated phrase (UCP).
- It contains a child annotated with a function tag, ETC.
• Find correct left and right conjunct pairs
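The three detection conditions above translate directly into a small predicate. This is a minimal sketch, not the authors' code. It assumes that labels follow the Penn Treebank convention of attaching function tags with "-" (and indices with "-" or "="), and the helper names `split_label` and `contains_coordination` are hypothetical. Finding the correct conjunct pairs is not covered.

```python
from typing import List, Set, Tuple


def split_label(label: str) -> Tuple[str, Set[str]]:
    """Split e.g. 'NP-SBJ-1' into ('NP', {'SBJ', '1'})."""
    if label.startswith("-"):  # tags such as -NONE- or -LRB- stay whole
        return label, set()
    parts = label.replace("=", "-").split("-")
    return parts[0], set(parts[1:])


def contains_coordination(phrase_label: str, child_labels: List[str]) -> bool:
    """Apply the three conditions: CC/CONJP child, UCP phrase, ETC child."""
    base, _ = split_label(phrase_label)
    if base == "UCP":
        return True
    for child in child_labels:
        child_base, child_tags = split_label(child)
        if child_base in ("CC", "CONJP") or "ETC" in child_tags:
            return True
    return False
```

For example, a VP with children VP, CC, ADVP, VP satisfies the first condition, while an NP with children DT and NN satisfies none.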
![Page 13: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/13.jpg)
Coordination
[Figure: phrase-structure tree (S → NP VP, with the VP expanding to VP CC ADVP VP) and dependency trees for "root We bought old books and then sold new books".]
![Page 14: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/14.jpg)
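Gapping Relations
• Parsing gapping relations is hard.
- It is hard to distinguish them from coordination.
- There are not many instances to train with.
[Figure: phrase-structure tree for "Some said Putin visited in April, some said May". The first clause contains NP-1 (Some) and NP-2 (April), with an SBAR headed by *0*; the gapped clause refers back to them through NP=1 (some) and NP=2 (May).]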
![Page 15: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/15.jpg)
Gapping Relations
[Figure: two dependency trees for "Some said1 Putin visited in April, some said2 May". Both share ROOT, SBJ, OBJ, TMP, and PMOD on the first clause. One tree labels the gapped clause with GAP-SBJ and GAP-PMOD (alongside P and DEP); the other uses GAP, SBJ, P, and TMP.]
![Page 16: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/16.jpg)
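Empty Category Mappings
[Figure: phrase-structure tree and dependency arcs for "root I know his admiration for *RNR*-1 and trust in *RNR*-1 you", where both *RNR*-1 nodes refer to NP-1 (you); the arcs compare our mapping with the LTH mapping.]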
• *RNR* (right node raising)
- At least two *RNR* nodes refer to the same antecedent.
- Map the antecedent to its closest *RNR* node.
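The *RNR* mapping rule can be sketched as follows. This is an illustration only: the slide does not say how "closest" is measured, so the sketch assumes linear distance in token positions, and the function name `map_rnr` and the position numbering are hypothetical.

```python
from typing import Dict


def map_rnr(antecedent_pos: int, rnr_governors: Dict[int, int]) -> int:
    """Return the new head for the antecedent.

    rnr_governors maps the position of each *RNR* node that refers to the
    antecedent to the position of the word governing that node. The
    antecedent is attached to the governor of the closest *RNR* node
    (assumption: closeness is linear distance in token positions).
    """
    closest = min(rnr_governors, key=lambda pos: abs(pos - antecedent_pos))
    return rnr_governors[closest]


# root(0) I(1) know(2) his(3) admiration(4) for(5) *RNR*-1(6)
# and(7) trust(8) in(9) *RNR*-1(10) you(11)
RNR_GOVERNORS = {6: 5, 10: 9}   # first *RNR* under "for", second under "in"
NEW_HEAD = map_rnr(11, RNR_GOVERNORS)
```

In this example the second *RNR* node is nearer to "you", so "you" attaches to "in" rather than to "for".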
![Page 17: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/17.jpg)
Experiments
• Corpora
- OntoNotes v4.0.
• Dependency parsers
- MaltParser: swap-lazy algorithm, LibLinear
- MSTParser: Chu-Liu-Edmonds algorithm, MIRA
- ClearParser: shift-eager algorithm, LibLinear
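
| | EBC | EBN | SIN | XIN | WEB | WSJ | ALL |
|---|---|---|---|---|---|---|---|
| Train | 14,873 | 11,968 | 7,259 | 3,156 | 13,419 | 12,311 | 62,986 |
| Eval. | 1,291 | 1,339 | 1,066 | 1,296 | 1,172 | 1,381 | 7,545 |
| Avg. | 15.21 | 19.49 | 23.36 | 29.77 | 22.01 | 24.03 | 21.02 |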
![Page 18: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/18.jpg)
Constituent-to-Dependency Conversion
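• Distributions of unclassified dependencies (in %)

| | EBC | EBN | SIN | XIN | WEB | WSJ | ALL |
|---|---|---|---|---|---|---|---|
| LTH | 4.77 | 1.51 | 1.16 | 1.63 | 1.93 | 1.93 | 2.20 |
| Our | 0.86 | 0.57 | 0.33 | 0.44 | 1.03 | 0.25 | 0.60 |

• Distributions of non-projective dependencies (in %)

| | EBC | EBN | SIN | XIN | WEB | WSJ | ALL |
|---|---|---|---|---|---|---|---|
| LTH - Dep | 1.44 | 0.81 | 0.70 | 0.29 | 0.95 | 0.51 | 0.82 |
| Our - Dep | 1.29 | 0.73 | 0.69 | 0.21 | 0.83 | 0.46 | 0.73 |
| LTH - Sen | 11.14 | 8.66 | 8.47 | 5.30 | 11.29 | 7.27 | 9.27 |
| Our - Sen | 9.19 | 7.39 | 8.22 | 3.75 | 9.02 | 6.24 | 7.78 |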
![Page 19: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/19.jpg)
Constituent-to-Dependency Conversion
• Parsing accuracy when trained and tested on the same corpora (in %)

| | EBC | EBN | SIN | XIN | WEB | WSJ | ALL |
|---|---|---|---|---|---|---|---|
| Malt - LTH | 82.91 | 86.38 | 86.20 | 84.61 | 85.10 | 86.93 | 85.44 |
| Malt - Our | 83.20 | 86.40 | 86.03 | 84.85 | 85.45 | 87.40 | 85.65 |
| Clear - LTH | 83.36 | 86.32 | 86.80 | 85.50 | 85.53 | 87.15 | 85.88 |
| Clear - Our | 84.06 | 86.77 | 86.55 | 85.41 | 85.70 | 87.58 | 86.09 |
| MST - LTH | 81.64 | 85.47 | 85.02 | 84.10 | 84.05 | 85.93 | 84.49 |
| MST - Our | 82.54 | 85.68 | 85.11 | 83.85 | 84.03 | 86.43 | 84.69 |
![Page 20: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/20.jpg)
Constituent-to-Dependency Conversion
• Parsing accuracy when trained and tested on different corpora (in %)

| | EBC | EBN | SIN | XIN | WEB | WSJ | ALL |
|---|---|---|---|---|---|---|---|
| Malt - LTH | 74.80 | 82.40 | 81.74 | 79.39 | 80.42 | 80.59 | 80.01 |
| Malt - Our | 75.60 | 83.05 | 81.81 | 81.46 | 80.81 | 81.17 | 80.85 |
| Clear - LTH | 76.37 | 83.16 | 83.53 | 81.29 | 81.83 | 81.29 | 81.36 |
| Clear - Our | 77.14 | 84.16 | 83.66 | 82.45 | 82.26 | 82.32 | 82.16 |
| MST - LTH | 76.65 | 82.45 | 82.29 | 80.46 | 80.64 | 80.02 | 80.49 |
| MST - Our | 77.20 | 83.06 | 82.52 | 80.88 | 80.82 | 81.04 | 81.01 |

With our conversion, all parsers gave significantly more accurate results than with the LTH conversion when trained and tested on different corpora.
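The size of the improvement can be read off the ALL columns of the two accuracy tables. The sketch below only recomputes those differences from the reported numbers and makes no claim about statistical significance. The dictionary layout and the `gains` helper are illustrative choices.

```python
from typing import Dict, Tuple

# ALL-column accuracies (LTH, Our), copied from the tables on these slides.
SAME_CORPUS: Dict[str, Tuple[float, float]] = {
    "Malt": (85.44, 85.65),
    "Clear": (85.88, 86.09),
    "MST": (84.49, 84.69),
}
CROSS_CORPUS: Dict[str, Tuple[float, float]] = {
    "Malt": (80.01, 80.85),
    "Clear": (81.36, 82.16),
    "MST": (80.49, 81.01),
}


def gains(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Accuracy gain (in percentage points) of our conversion over LTH."""
    return {parser: round(ours - lth, 2) for parser, (lth, ours) in table.items()}


SAME_GAINS = gains(SAME_CORPUS)
CROSS_GAINS = gains(CROSS_CORPUS)
```

For every parser, the gain in the cross-corpus setting is larger than in the same-corpus setting, which is consistent with the robustness claim.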
![Page 21: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/21.jpg)
Conclusion
• Aims
- Updated the conversion tool with respect to the changes in Penn Treebank style phrase structure.
- Robust conversion across different corpora.
• Contributions
- Fewer unclassified dependencies.
- Fewer unnecessary non-projective dependencies.
- More robust parsing performance across different corpora.
• ClearParser open-source project
- http://code.google.com/p/clearparser/
![Page 22: Robust Constituent-to-Dependency Conversion for English](https://reader033.vdocuments.net/reader033/viewer/2022052506/55795440d8b42ab6648b48eb/html5/thumbnails/22.jpg)
Acknowledgements
• Special thanks to Joakim Nivre for helpful insights.
• We gratefully acknowledge the support of the National Science Foundation Grants CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation and CISE-CRI 0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.