Developing a TT-MCTAG for German with an RCG-based Parser
Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier⋆, Johannes Dellert
University of Tübingen, Germany; ⋆CNRS-LORIA, France
LREC 2008, 28.05.2008
Developing a TT-MCTAG for German 1
Aims and scope
Presentation of an implementation framework for a German TAG-based grammar
How to design and maintain a grammatical resource (i.e., a German TT-MCTAG)?
How to connect this with a (2-layered) lexical resource?
How to parse German using these resources?
Outline:
1 The formalism: TAG and TT-MCTAG
2 The implementation framework: XMG and TuLiPA
3 The grammar: GerTT
Tree-Adjoining Grammar - Basics
A Tree Adjoining Grammar (TAG) is a set of elementary trees:
a finite set of initial trees
a finite set of auxiliary trees
E.g.:
auxiliary tree: [VP [ADV easily] VP*]
initial tree: [VP NP↓ [VP [V repaired] NP↓]]
Combinatorial operations:
substitution: replacing a non-terminal leaf with an initial tree
adjunction: replacing an internal node with an auxiliary tree
Tree-Adjoining Grammar - Example
Elementary trees:
[NP Peter]
[VP NP↓ [VP [V repaired] NP↓]]
[NP the fridge]
[VP [ADV easily] VP*]
Derived tree:
[VP [NP Peter] [VP [ADV easily] [VP [V repaired] [NP the fridge]]]]
Derivation tree (attachment address in parentheses):
repaired
  Peter (1)
  easily (2)
  the fridge (22)
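The two operations and the example derivation can be sketched in Python. This is only an illustrative sketch, not code from XMG or TuLiPA: the `Node` class and the helper names are hypothetical, and the addresses 1, 2 and 22 are read as Gorn addresses (each digit is a 1-based daughter index in the tree for "repaired"). Both substitutions are applied before the adjunction so that the addresses still refer to the unmodified elementary tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution node, written NP↓ on the slides
    foot: bool = False   # foot node of an auxiliary tree, written VP*


def copy_tree(n: Node) -> Node:
    return Node(n.label, [copy_tree(c) for c in n.children], n.subst, n.foot)


def node_at(tree: Node, address: str) -> Node:
    """Follow a Gorn address: each digit is a 1-based daughter index."""
    node = tree
    for digit in address:
        node = node.children[int(digit) - 1]
    return node


def find_foot(n: Node) -> Optional[Node]:
    if n.foot:
        return n
    for c in n.children:
        found = find_foot(c)
        if found is not None:
            return found
    return None


def substitute(tree: Node, address: str, initial: Node) -> None:
    """Replace the substitution leaf at `address` with a copy of `initial`."""
    target = node_at(tree, address)
    assert target.subst and target.label == initial.label
    target.children = copy_tree(initial).children
    target.subst = False


def adjoin(tree: Node, address: str, auxiliary: Node) -> None:
    """Splice a copy of `auxiliary` in at `address`; the old subtree moves under the foot."""
    target = node_at(tree, address)
    aux = copy_tree(auxiliary)
    foot = find_foot(aux)
    assert foot is not None and target.label == aux.label == foot.label
    foot.children, foot.foot = target.children, False
    target.children = aux.children


def show(n: Node) -> str:
    """Render a tree in bracket notation."""
    if not n.children:
        return n.label
    return "[" + n.label + " " + " ".join(show(c) for c in n.children) + "]"


PETER = Node("NP", [Node("Peter")])
FRIDGE = Node("NP", [Node("the fridge")])
REPAIRED = Node("VP", [
    Node("NP", subst=True),
    Node("VP", [Node("V", [Node("repaired")]), Node("NP", subst=True)]),
])
EASILY = Node("VP", [Node("ADV", [Node("easily")]), Node("VP", foot=True)])

derived = copy_tree(REPAIRED)
substitute(derived, "1", PETER)    # Peter at address 1
substitute(derived, "22", FRIDGE)  # the fridge at address 22
adjoin(derived, "2", EASILY)       # easily at address 2
print(show(derived))
```

The three calls correspond one-to-one to the three edges of the derivation tree, which is why a derivation tree is enough to reconstruct the derived tree.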
Tree-Adjoining Grammar - Basics
TAGs are mildly context-sensitive:
1 Polynomial time parsing complexity
2 Generation of limited crossing dependencies
3 Constant growth property (semilinearity)
Large TAG grammars:
English and Korean (XTAG, UPenn)
French TAG (Benoît Crabbé's PhD thesis)
. . .
Why not TAG for German?
The order of complements (and adjuncts) of a verb is flexible.
(1) Peter liebt Susi.
    1: Peter loves Susi
    2: Susi loves Peter
(2) dass Peter heute den Kühlschrank repariert hat
    dass den Kühlschrank heute Peter repariert hat
    . . .
    ('that Peter has repaired the fridge today')
TAG is inappropriate for German, because it is:
not powerful enough for some constructions (i.e., coherent constructions)
not descriptively adequate (i.e., one elementary tree for each permutation)
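The cost of "one elementary tree for each permutation" can be made concrete with a small count. The sketch below simply enumerates every ordering of the three preverbal elements of example (2); it assumes, for illustration only, that each ordering would need its own elementary tree, and it makes no claim about which orders are actually acceptable in German.

```python
from itertools import permutations
from math import factorial

# Preverbal elements of example (2); the verbal complex stays clause-final.
elements = ("Peter", "heute", "den Kühlschrank")

orders = [
    " ".join(("dass",) + p + ("repariert hat",))
    for p in permutations(elements)
]

for order in orders:
    print(order)

# With n freely ordered elements, the number of orderings grows as n!.
print(len(orders), "orderings for", len(elements), "elements")
```

Three elements already give six orderings, and a fourth would give 24, which is the maintenance problem that tree tuples are meant to avoid.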
TT-MCTAG: a TAG-extension for German
Multi-Component TAG (MCTAG) with shared-nodes locality
Elementary structures are tuples 〈γ, {β1 , ..., βn}〉:
a lexicalized elementary tree γ (the head tree)
a tree set {β1 , ..., βn} (the complement trees)
Meaning of tree tuples: During derivation, the β-trees have to attach to the γ-tree (via node sharing).
Node sharing: In the derivation tree,
1 a β-tree must either be the immediate daughter of its γ-tree,
2 or the β-tree must be connected to the daughter of the γ-tree via a chain of root adjunctions.
E.g.:
〈 [VP [V repariert]], { [VP NPnom↓ VP*], [VP NPacc↓ VP*] } 〉
TT-MCTAG example
(3) dass den Kühlschrank heute Peter repariert
    ("that Peter repairs the fridge today")
Elementary structures:
[VP [ADV heute] VP*]
〈 [VP [V repariert]], { [VP NPnom↓ VP*], [VP NPacc↓ VP*] } 〉
[NP Peter]
[NP den K.]
Derivation tree (attachment address in parentheses):
repariert
  NPnom (0)
    Peter (1)
    heute (0)
      NPacc (0)
        den Kühlschrank (1)
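The node-sharing condition can be checked mechanically on a derivation tree. The sketch below is a hypothetical illustration, not TuLiPA code. It assumes the derivation tree of example (3) is read as follows: NPnom attaches to repariert, Peter and heute attach to NPnom, NPacc attaches to heute, and den Kühlschrank attaches to NPacc. It further assumes that address 0 denotes adjunction at the root.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ROOT = "0"  # assumption: address 0 marks adjunction at the root node


@dataclass(frozen=True)
class Attachment:
    parent: Optional[str]   # mother in the derivation tree (None for the top node)
    address: Optional[str]  # where the tree attaches in its mother


# Derivation tree for "dass den Kühlschrank heute Peter repariert"
derivation: Dict[str, Attachment] = {
    "repariert": Attachment(None, None),
    "NPnom": Attachment("repariert", ROOT),
    "Peter": Attachment("NPnom", "1"),
    "heute": Attachment("NPnom", ROOT),
    "NPacc": Attachment("heute", ROOT),
    "den Kühlschrank": Attachment("NPacc", "1"),
}


def satisfies_node_sharing(beta: str, gamma: str,
                           deriv: Dict[str, Attachment]) -> bool:
    """True if `beta` is a daughter of `gamma`, or is linked to a daughter
    of `gamma` through a chain of root adjunctions."""
    node = deriv[beta]
    if node.parent == gamma:
        return True  # case 1: immediate daughter
    # case 2: climb upwards as long as each step is a root adjunction
    while node.parent is not None and node.address == ROOT:
        node = deriv[node.parent]
        if node.parent == gamma:
            return True
    return False


print(satisfies_node_sharing("NPnom", "repariert", derivation))
print(satisfies_node_sharing("NPacc", "repariert", derivation))
```

Under this reading, NPnom satisfies the condition as an immediate daughter of repariert, while NPacc satisfies it through the chain NPacc, heute, NPnom. If NPacc attached to heute at any address other than the root, the check would fail.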
The implementation framework:
metagrammar → XMG compiler → parser (TuLiPA) → parsing results
lexicon → parser (TuLiPA)
sentence → parser (TuLiPA)
XMG: eXtensible MetaGrammar (Duchier et al., 2004)
TuLiPA: Tübingen Linguistic Parsing Architecture (Parmentier et al., 2008)
eXtensible MetaGrammar (XMG)
(Duchier et al., 2004)
XMG lets one construct a grammar semi-automatically by describing tree fragments and their combination. The output structures are unlexicalized trees (tree schemata).
Essential for: consistency, design and maintenance efforts
Components:
1 a description language
2 a compiler
3 a viewer
4 output format: XML
⇒ XMG has been extended to describe tree sets.
XMG: An example
NP↓ (substitution node) + [VP VP*] (VP-projection) ⇒ [VP NP↓ VP*] (complement tree)
AP⋄ (adverbial anchor) + [VP VP*] (VP-projection) ⇒ [VP AP⋄ VP*] (adverbial tree)
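The point of the example is that one VP-projection fragment is written once and reused with different leaf fragments. The toy Python sketch below mimics only that reuse; it is not XMG's description language, and the `Tree` type and the `combine` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Tuple


class Tree(NamedTuple):
    label: str
    children: Tuple["Tree", ...] = ()


# Shared fragment: a VP root dominating a VP foot node.
VP_PROJECTION = Tree("VP", (Tree("VP*"),))


def combine(leaf: Tree, projection: Tree) -> Tree:
    """Add `leaf` as the leftmost daughter of the projection's root."""
    return Tree(projection.label, (leaf,) + projection.children)


def show(t: Tree) -> str:
    """Render a tree in bracket notation."""
    if not t.children:
        return t.label
    return "[" + t.label + " " + " ".join(show(c) for c in t.children) + "]"


complement_tree = combine(Tree("NP↓"), VP_PROJECTION)
adverbial_tree = combine(Tree("AP⋄"), VP_PROJECTION)

print(show(complement_tree))
print(show(adverbial_tree))
```

A change to `VP_PROJECTION` propagates to both resulting trees, which is the kind of systematic change a metagrammar is meant to make cheap.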
A 2-layered lexicon
Morphological lexicon
maps an (inflected) token to some lemma form, while preserving morphological information in a feature structure.
vergisst    vergessen    [pos=v; num=sg; per=3;]
Lemma lexicon
maps a lemma onto tree tuple families, while also containing selectional restrictions (e.g., case assignment).
*ENTRY: vergessen
*CAT: v
*SEM: BinaryRel[pred=vergessen]
*ACC: 1
*FAM: Vnp2
*FILTERS: []
*EX:
*EQUATIONS:
NParg1 → cas = nom
NParg2 → cas = acc
*COANCHORS:
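The two layers can be pictured as two chained lookups. The following Python sketch uses the data of the entry above but is otherwise hypothetical: the dictionary layout and the `lookup` function are not TuLiPA's actual lexicon API, and matching the morphological `pos` against the lemma's `CAT` is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Layer 1: inflected token -> (lemma, morphological features)
MORPH_LEXICON: Dict[str, List[Tuple[str, dict]]] = {
    "vergisst": [("vergessen", {"pos": "v", "num": "sg", "per": 3})],
}

# Layer 2: lemma -> tree tuple family plus selectional restrictions
LEMMA_LEXICON: Dict[str, dict] = {
    "vergessen": {
        "cat": "v",
        "fam": "Vnp2",
        "equations": {"NParg1": {"cas": "nom"}, "NParg2": {"cas": "acc"}},
    },
}


def lookup(token: str) -> List[dict]:
    """Chain both layers: token -> lemma -> tree tuple family."""
    results = []
    for lemma, features in MORPH_LEXICON.get(token, []):
        entry = LEMMA_LEXICON.get(lemma)
        # Assumption: the category of the lemma entry must match the token's pos.
        if entry is not None and entry["cat"] == features["pos"]:
            results.append({
                "lemma": lemma,
                "family": entry["fam"],
                "morph": features,
                "equations": entry["equations"],
            })
    return results


print(lookup("vergisst"))
```

Keeping inflection in the first layer means that each lemma entry, with its family and case equations, is stated only once for all inflected forms.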
Tübingen Linguistic Parsing Architecture (TuLiPA)
(Parmentier et al., 2008)
Components:
1 TT-MCTAG-to-RCG converter (on-line)
2 RCG parser → RCG derivation forest → TT-MCTAG derivation forest
3 Parse viewer (derived tree, derivation tree, dependency view, semantic representation)
Availability of TuLiPA: written in Java and released under the GNU GPL (http://sourcesup.cru.fr/tulipa/)
TuLiPA: Why RCG?
RCG is useful, because:
it has attractive formal properties (polynomially parsable, full expressive power of MCS-languages);
there exist parsing algorithms.
⇒ Parser can be reused for other mildly context-sensitive formalisms!
NB: RCG properly includes MCS. We use a restricted RCG, called simple RCG, that is included in MCS.
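To give a feel for what a simple RCG looks like, here is a textbook-style grammar for the non-context-free language a^n b^n c^n, with the clauses S(XYZ) → A(X,Y,Z), A(aX,bY,cZ) → A(X,Y,Z) and A(ε,ε,ε) → ε. The Python sketch below hard-codes a recognizer for exactly these three clauses, with predicates holding over ranges (pairs of positions) of the input. It is an illustration only and says nothing about how the TuLiPA parser is implemented.

```python
from functools import lru_cache
from typing import Tuple

Range = Tuple[int, int]  # half-open span (start, end) over the input string


def recognize(w: str) -> bool:
    """Recognize { a^n b^n c^n | n >= 0 } with the three clauses above."""
    n = len(w)

    @lru_cache(maxsize=None)
    def A(x: Range, y: Range, z: Range) -> bool:
        (xi, xj), (yi, yj), (zi, zj) = x, y, z
        # A(ε, ε, ε) → ε
        if xi == xj and yi == yj and zi == zj:
            return True
        # A(aX, bY, cZ) → A(X, Y, Z)
        if (xi < xj and yi < yj and zi < zj
                and w[xi] == "a" and w[yi] == "b" and w[zi] == "c"):
            return A((xi + 1, xj), (yi + 1, yj), (zi + 1, zj))
        return False

    # S(XYZ) → A(X, Y, Z): try every split of the input into three ranges
    return any(
        A((0, i), (i, j), (j, n))
        for i in range(n + 1)
        for j in range(i, n + 1)
    )


for s in ["", "abc", "aabbcc", "aabbc", "abcabc"]:
    print(repr(s), recognize(s))
```

The number of ranges is quadratic in the input length, so memoising predicates over ranges keeps recognition polynomial, which is the property the slide refers to.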
TuLiPA: The graphical frontend
[Figure: screenshots of the TuLiPA graphical frontend]
Ongoing grammar development
GerTT (German TT-MCTAG)
Large-coverage TT-MCTAG for German, including semantics.
Linguistic principles:
no empty elements such as traces and PRO
no control and raising in the syntax
State of implementation:
free word order phenomena: scrambling, coherent constructions, verbal clustering
extraction phenomena: relative clauses, wh-questions, bridging constructions
ca. 70 XMG classes
Coverage testing based on the TSNLP test suite is currently being prepared.
Summary
TT-MCTAG:
More natural support for flexible word order languages, while remaining mildly context-sensitive (in fact, only k-TT-MCTAG is).
The implementation framework:
XMG + TuLiPA: Immediate control over implementational (consistency) and linguistic (coverage) aspects of the grammar.
XMG: Effortless means for making systematic changes in the grammar.
TuLiPA: Easily adaptable to other MCS formalisms (given an RCG conversion algorithm).
And GerTT is on its way . . .
References
Denys Duchier, Joseph Le Roux, Yannick Parmentier (2004): The Metagrammar Compiler: An NLP Application with a Multi-paradigm Architecture. Second International Mozart/Oz Conference (MOZ'2004).
Yannick Parmentier, Laura Kallmeyer, Wolfgang Maier, Timm Lichte, Johannes Dellert (2008): TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms. Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9).