![Page 1: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/1.jpg)
Deep Grammars in Hybrid Machine Translation
University of Bergen
Helge Dyvik
![Page 2: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/2.jpg)
Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian
A 4-year project (2002-2006) involving groups at:
•The University of Oslo
•The University of Bergen
•NTNU (The University of Trondheim)
Cooperation with PARC (John Maxwell) and others
![Page 3: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/3.jpg)
The LOGON system: schematic architecture
![Page 4: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/4.jpg)
XLE: Xerox Linguistic Environment
A platform developed over more than 20 years at Xerox PARC (now PARC). Developer: John Maxwell
•LFG grammar development
•Parsing
•Generation
•Transfer
•Stochastic parse selection
•Interaction with shallow methods
![Page 5: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/5.jpg)
An LFG analysis:
Det regnet 'It rained'
![Page 6: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/6.jpg)
ParGram: The Parallel Grammar Project
A long-term project (1993-)
•Develops parallel grammars on XLE: English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese
•‘Parallel grammars’ means parallel f-structures:
- A common inventory of features
- Common principles of analysis
![Page 7: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/7.jpg)
LOGON Analysis Modules
Input string
→ Preprocessing: •Tokenization •Named entities •Compounds •Morphology (supporting knowledge base: Norsk ordbank lexicon)
→ String of stems and tags
→ XLE Parser with NorGram (•LFG lexicons: NKL-derived and hand-coded •Lexical templates •Syntactic rules •Rule templates)
→ c-structures, f-structures, MRSs
(Figure legend: output-input relations; supporting knowledge bases)
![Page 8: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/8.jpg)
Scope of NorGram
Lexicon: about 80 000 lemmas. In addition:
•Automatically analyzed compounds
•Automatically recognized proper names
•"Guessed" nouns
Syntax: 229 complex rules, giving rise to about 48 000 arcs
Semantics: Minimal Recursion Semantics projections for all readings
![Page 9: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/9.jpg)
Coverage
Performance on an unknown corpus of newspaper text:
•17 randomly selected pieces of coherent text,
•comprising 1000 sentences,
•taken from 9 newspapers:
Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende,
Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys,
•from the editions of 11 November 2005.
![Page 10: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/10.jpg)
![Page 11: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/11.jpg)
The LOGON challenge:
From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.
![Page 12: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/12.jpg)
Semantics for translation: two issues
• The representational subset problem
- Desirable: normalization to flat structures with unordered elements.
• Complete and detailed semantic analyses may be unnecessary.
- Desirable: rich possibilities of underspecification
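One motivation for flat, unordered structures can be shown in a few lines: the same conjunction of predications may be bracketed and ordered differently, so two trees can differ even when their content is the same, while their flattened bags coincide. A minimal sketch, assuming conjunctions are encoded as nested tuples `("and", left, right)`; the encoding and the function name `conjuncts` are illustrative, not taken from LOGON.

```python
from collections import Counter


def conjuncts(term):
    """Flatten nested binary conjunctions ("and", left, right) into a multiset."""
    if isinstance(term, tuple) and term[0] == "and":
        return conjuncts(term[1]) + conjuncts(term[2])
    return Counter([term])


# The same content, bracketed and ordered differently (e.g. by two grammars).
a = ("and", ("and", "old(x)", "white(x)"), "ferry(x)")
b = ("and", "ferry(x)", ("and", "white(x)", "old(x)"))

print(a == b)                        # False: the trees differ
print(conjuncts(a) == conjuncts(b))  # True: the flat bags are identical
```

Comparing bags rather than trees means that rules operating on the flat form need not anticipate every bracketing.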
![Page 13: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/13.jpg)
Basics of Minimal Recursion Semantics
•Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag
•A framework for the representation of semantic information
•Developed in the context of HPSG and machine translation (Verbmobil)
•Sources of inspiration:
- Quasi-Logical Form (H. Alshawi):
underspecification, e.g. of quantifier scope
- Shake-and-bake translation (P. Whitelock):
a bag of words as interface structure
![Page 14: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/14.jpg)
An MRS representation
• is a bag of semantic entities (some corresponding to words, some not), each with a handle,
• plus a bag of handle constraints allowing the underspecification of scope,
• plus a handle and an index.
• Each semantic entity is referred to as an Elementary Predication (EP).
• Relations among EPs are captured by means of shared variables.
• There are three elementary variable types:
- handles (or 'labels') (h)
- events (e)
- referential indices (x)
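The ingredients listed above can be mirrored in a small data structure: a top handle, an index, a bag of EPs (each with a label) and a bag of handle constraints, with relations between EPs expressed only through shared variables. A minimal sketch, assuming a variable's type is encoded by its first letter (h, e, x); the class names and the predicate names `def_q`, `_katt_n`, `_sove_v` are hypothetical, chosen for «Katten sover» 'The cat sleeps'.

```python
from dataclasses import dataclass, field

VAR_TYPES = {"h": "handle", "e": "event", "x": "referential index"}


@dataclass(frozen=True)
class EP:
    """An elementary predication: its handle (label), predicate and role arguments."""
    label: str
    pred: str
    args: tuple = ()  # (role, variable) pairs


@dataclass
class MRS:
    top: str                                   # the top handle
    index: str                                 # the distinguished index
    rels: list = field(default_factory=list)   # bag of EPs
    hcons: list = field(default_factory=list)  # bag of (hole, "qeq", label) constraints


def var_type(var: str) -> str:
    """The variable type is encoded in its first letter (h, e or x)."""
    return VAR_TYPES[var[0]]


def eps_sharing(mrs: MRS, var: str) -> list:
    """The EPs related to each other through the shared variable `var`."""
    return [ep for ep in mrs.rels if var in dict(ep.args).values()]


# «Katten sover» 'The cat sleeps' (hypothetical predicate names).
katten_sover = MRS(
    top="h1",
    index="e2",
    rels=[
        EP("h3", "def_q", (("ARG0", "x4"), ("RSTR", "h5"), ("BODY", "h6"))),
        EP("h7", "_katt_n", (("ARG0", "x4"),)),
        EP("h8", "_sove_v", (("ARG0", "e2"), ("ARG1", "x4"))),
    ],
    hcons=[("h1", "qeq", "h8"), ("h5", "qeq", "h7")],
)

print([ep.pred for ep in eps_sharing(katten_sover, "x4")])  # ['def_q', '_katt_n', '_sove_v']
print(var_type(katten_sover.index))                          # event
```

The quantifier, the noun and the verb are never nested inside each other; they are linked only because all three mention `x4`.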
![Page 15: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/15.jpg)
From standard logical form to MRS
«Every ferry crosses some fjord»
Two readings:
Replace operators with generalized quantifiers:
every(variable, restriction, body)
some(variable, restriction, body)
The first reading (wide-scope every), with variable, restriction and body for each quantifier:
every(x, ferry(x), some(y, fjord(y), cross(x, y)))
![Page 16: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/16.jpg)
Make the structure flat:
• give each EP a handle
• replace embedded EPs by their handles
• collect all EPs on the same level (understood as conjunction)
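The three flattening steps above can be sketched as one recursive function. A minimal sketch, assuming a scoped formula is written as nested Python tuples `(predicate, arg1, arg2, ...)` where any tuple-valued argument is an embedded EP; the handle numbering scheme and the function names are illustrative.

```python
from itertools import count


def flatten(term, eps, fresh):
    """Give `term` a fresh handle, replace each embedded EP by its handle and
    collect every EP in the flat list `eps`. Returns the handle of `term`."""
    handle = f"h{next(fresh)}"
    args = []
    for arg in term[1:]:
        if isinstance(arg, tuple):   # an embedded EP: flatten it, keep only its handle
            args.append(flatten(arg, eps, fresh))
        else:                        # an ordinary variable such as "x"
            args.append(arg)
    eps.append((handle, term[0], tuple(args)))
    return handle


def to_flat(term):
    """Return the top handle and the bag of EPs (sorted by handle for display)."""
    eps, fresh = [], count(1)
    top = flatten(term, eps, fresh)
    return top, sorted(eps, key=lambda ep: int(ep[0][1:]))


# Wide-scope "every": every(x, ferry(x), some(y, fjord(y), cross(x, y)))
reading = ("every", "x", ("ferry", "x"),
           ("some", "y", ("fjord", "y"), ("cross", "x", "y")))

top, eps = to_flat(reading)
print("top:", top)
for handle, pred, args in eps:
    print(f"{handle}: {pred}({', '.join(args)})")
```

This prints `top: h1` followed by `h1: every(x, h2, h3)`, `h2: ferry(x)`, `h3: some(y, h4, h5)`, `h4: fjord(y)` and `h5: cross(x, y)`. Flattening the wide-scope `some` reading yields the same five predications; only the handles filling the quantifier bodies differ.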
![Page 17: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/17.jpg)
Underspecified scope by means of handle constraints:
Wide scope: some
Wide scope: every
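How handle constraints leave scope open can be checked by brute force: leave the quantifier bodies as holes, constrain only the restrictions and the top, and enumerate every way of plugging holes with labels. A minimal sketch, assuming the standard reading of qeq (between the hole and the label only quantifiers may intervene, via their BODY) plus two well-formedness checks (the result is a tree, and every variable is used inside the scope of its quantifier); the handle names are illustrative, and exhaustive enumeration is only practical for toy examples.

```python
from itertools import permutations

# Underspecified MRS for «Every ferry crosses some fjord» (handle names illustrative).
# label -> (predicate, arguments); arguments starting with "h" are holes.
EPS = {
    "h1": ("every", ("x", "h5", "h6")),   # (variable, RSTR hole, BODY hole)
    "h2": ("ferry", ("x",)),
    "h3": ("some", ("y", "h7", "h8")),
    "h4": ("fjord", ("y",)),
    "h9": ("cross", ("x", "y")),
}
QUANTIFIERS = {"every", "some"}
TOP = "h0"
QEQ = [("h0", "h9"), ("h5", "h2"), ("h7", "h4")]  # handle constraints: hole =q label


def holes():
    return [TOP] + [a for _, args in EPS.values() for a in args if a.startswith("h")]


def well_formed(plug):
    """Every label is reached exactly once from the top hole, and every variable
    occurs only inside the scope of the quantifier that binds it."""
    seen = set()

    def walk(label, bound):
        if label in seen:
            return False
        seen.add(label)
        pred, args = EPS[label]
        if pred in QUANTIFIERS:
            bound = bound | {args[0]}
        for a in args:
            if a.startswith("h"):
                if not walk(plug[a], bound):
                    return False
            elif a not in bound:
                return False
        return True

    return walk(plug[TOP], frozenset()) and len(seen) == len(EPS)


def qeq_holds(hole, lbl, plug):
    """hole =q lbl: from the hole, only quantifiers (via their BODY) may intervene."""
    current = plug[hole]
    while current != lbl:
        pred, args = EPS[current]
        if pred not in QUANTIFIERS:
            return False
        current = plug[args[2]]
    return True


def scoped_readings():
    """Try every one-to-one plugging of holes with labels; keep the valid ones."""
    hs = holes()
    readings = []
    for labels in permutations(EPS):
        plug = dict(zip(hs, labels))
        if well_formed(plug) and all(qeq_holds(h, lbl, plug) for h, lbl in QEQ):
            readings.append(plug)
    return readings


def render(label, plug):
    pred, args = EPS[label]
    parts = [render(plug[a], plug) if a.startswith("h") else a for a in args]
    return f"{pred}({', '.join(parts)})"


for plug in scoped_readings():
    print(render(plug[TOP], plug))
```

Exactly two pluggings survive, the wide-scope every reading and the wide-scope some reading, so one underspecified MRS stands for both.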
![Page 18: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/18.jpg)
MRS as feature structure (also adding event variables):
Norwegian translation: «Hver ferge krysser en fjord»
![Page 19: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/19.jpg)
Projecting MRS representations from f-structures
«Katten sover» 'The cat sleeps'
![Page 20: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/20.jpg)
![Page 21: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/21.jpg)
![Page 22: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/22.jpg)
![Page 23: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/23.jpg)
![Page 24: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/24.jpg)
Composition: top-level MRS with unions of HCONS and RELS:
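The composition step can be sketched as a bag union: each f-structure contributes a fragment with its own RELS and HCONS, and the top-level MRS simply collects them all. A minimal sketch, assuming the projection has already identified shared variables (here the SUBJ's `x4` is the verb's ARG1); the `Fragment` class, the dictionary layout and the predicate names are hypothetical, and "union" is taken as bag concatenation since RELS and HCONS are multisets.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """The MRS contribution projected from one f-structure (hypothetical encoding)."""
    rels: list = field(default_factory=list)   # EPs as (label, pred, {role: var})
    hcons: list = field(default_factory=list)  # (hole, "qeq", label) triples


def compose(top, index, fragments):
    """Top-level MRS: RELS and HCONS are the bag unions of the fragments' lists."""
    rels, hcons = [], []
    for frag in fragments:
        rels.extend(frag.rels)
        hcons.extend(frag.hcons)
    return {"LTOP": top, "INDEX": index, "RELS": rels, "HCONS": hcons}


# «Katten sover»: one fragment from the clause f-structure, one from its SUBJ.
clause = Fragment(rels=[("h8", "_sove_v", {"ARG0": "e2", "ARG1": "x4"})],
                  hcons=[("h1", "qeq", "h8")])
subj = Fragment(rels=[("h3", "def_q", {"ARG0": "x4", "RSTR": "h5", "BODY": "h6"}),
                      ("h7", "_katt_n", {"ARG0": "x4"})],
                hcons=[("h5", "qeq", "h7")])

mrs = compose("h1", "e2", [clause, subj])
print([pred for _, pred, _ in mrs["RELS"]])  # ['_sove_v', 'def_q', '_katt_n']
```

Because the fragments are already linked by shared variables, no further structure building is needed at this point.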
![Page 25: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/25.jpg)
![Page 26: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/26.jpg)
Post-processing this structure brings us back to the LOGON MRS format:
http://decentius.aksis.uib.no/logon/xle-mrs.xml
![Page 28: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/28.jpg)
bil 'car' (as in "Han kjøpte bil" 'He bought [a] car')
No SPEC
![Page 29: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/29.jpg)
disse hans mange spørsmål 'these his many questions'
Multiple SPECs
![Page 30: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/30.jpg)
Han jaget barnet ut nakent 'He chased the child out naked'
![Page 31: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/31.jpg)
The Transfer Component
Developer of the formalism: Stephan Oepen
![Page 32: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/32.jpg)
Example of transfer
Source sentence:
Henter han bilen sin?
fetches he car.DEF POSS.REFL.SG.MASC
'Does he fetch his car?'
Alternative reading: 'Does he fetch the one of the car?'
![Page 33: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/33.jpg)
Parse output:
![Page 34: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/34.jpg)
Choosing the first reading of Henter han bilen sin?
![Page 35: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/35.jpg)
The variables have features. Interrogative is coded as [SF ques] (SF = sentential force) on the event variable.
![Page 36: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/36.jpg)
Two of four transfer outputs
![Page 37: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/37.jpg)
Norwegian transfer input
One of four English transfer outputs
![Page 38: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/38.jpg)
Generator output from the chosen transfer output
![Page 39: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/39.jpg)
Transfer formalism (Stephan Oepen)
The form of a transfer rule:
C = context
I = input
F = filter
O = output
![Page 40: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/40.jpg)
Simple example: lexical transfer rule, transferring bekk into creek
No context, no filter, only the predicate is replaced.
![Page 41: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/41.jpg)
Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip.
When its second argument is a _tur_n, _gå_v is transferred to _take_v.
![Page 42: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/42.jpg)
The SEM-I (Semantic Interface)
Documentation of the external semantic interface of a grammar, crucial for the writer of transfer rules.
To enforce maintenance of the SEM-I, LOGON parsing fails if every parse contains at least one predicate that is not in the SEM-I.
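The SEM-I condition amounts to a subset test per parse. A minimal sketch, assuming each parse is reduced to the set of its predicate names and the SEM-I to a set of documented predicates; that non-conforming parses are also discarded when some other parse conforms is an assumption of this sketch, and the predicate names and the `ParseFailure` exception are hypothetical.

```python
class ParseFailure(Exception):
    """Raised when no parse survives the SEM-I check."""


def semi_filter(parses, semi):
    """Keep the parses whose predicates are all listed in the SEM-I (assumption:
    the others are dropped). If every parse has an unlisted predicate, fail."""
    conforming = [preds for preds in parses if preds <= semi]
    if not conforming:
        raise ParseFailure("every parse contains a predicate not in the SEM-I")
    return conforming


SEMI = {"_regne_v", "_regn_n", "_det_q"}  # hypothetical SEM-I entries
parses = [{"_regne_v"}, {"_det_q", "_regn_n"}, {"_regne_v", "_undocumented_x"}]

print(len(semi_filter(parses, SEMI)))  # 2
try:
    semi_filter([{"_undocumented_x"}], SEMI)
except ParseFailure as err:
    print("fail:", err)
```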
![Page 43: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/43.jpg)
A small section of the verb part of the NorGram SEM-I
Size of the Norwegian SEM-I: slightly less than 6000 entries
![Page 44: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/44.jpg)
Parse Selection
Parsing, transfer and generation may each give many solutions, leading to a fan-out tree:
The outputs at each of the three stages are statistically ranked.
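The fan-out can be made concrete by enumerating every path from source sentence to target sentence. A minimal sketch, assuming each stage returns `(output, score)` pairs and that a path's score is the product of its stage scores; both the scoring scheme and the exhaustive enumeration are simplifications chosen here (a practical system would keep only the n best at each stage), and all stage data below are invented toy values.

```python
def table_stage(table):
    """Turn a lookup table into a stage function returning (output, score) pairs."""
    return lambda item: table.get(item, [])


def fan_out(source, parse, transfer, generate):
    """Enumerate every path through the parse/transfer/generate fan-out tree and
    rank the paths by the product of their per-stage scores (highest first)."""
    paths = []
    for analysis, s1 in parse(source):
        for target, s2 in transfer(analysis):
            for sentence, s3 in generate(target):
                paths.append((s1 * s2 * s3, sentence))
    return sorted(paths, reverse=True)


# Toy stage outputs with invented scores.
PARSES = {"Det regnet": [("regne/weather", 0.7), ("regne/calculate", 0.3)]}
TRANSFERS = {"regne/weather": [("rain", 1.0)],
             "regne/calculate": [("calculate", 0.6), ("compute", 0.4)]}
GENERATIONS = {"rain": [("It rained.", 1.0)],
               "calculate": [("It calculated.", 1.0)],
               "compute": [("It computed.", 1.0)]}

ranked = fan_out("Det regnet", table_stage(PARSES),
                 table_stage(TRANSFERS), table_stage(GENERATIONS))
for score, sentence in ranked:
    print(f"{score:.2f} {sentence}")
```

Two parses, one of which has two transfers, already give three complete paths; the number of paths is the product of the branching at each stage, which is why ranking at every stage matters.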
![Page 45: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/45.jpg)
Example of a four-way ambiguity:
Det regnet 'It rained'/'It calculated'/'That one calculated'/'That rain'
The Parsebanker: efficient treebank building by discriminants
Developer: Paul Meurer, Bergen
Predecessors in discriminant analysis: David Carter (1997); Stephan Oepen, Dan Flickinger et al. (2003)
![Page 46: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/46.jpg)
[Figure panels 1 and 2]
![Page 47: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/47.jpg)
[Figure panels 3 and 4]
![Page 48: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/48.jpg)
Packed representations and discriminants (Paul Meurer)
![Page 49: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/49.jpg)
![Page 50: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/50.jpg)
Clicking on one discriminant is in this case sufficient to select a unique solution:
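The idea behind discriminants can be sketched with the four readings of Det regnet: a discriminant is a property that holds for some but not all remaining readings, and accepting or rejecting it narrows the set. A minimal sketch, assuming readings are enumerated explicitly as sets of properties (the Parsebanker works on XLE's packed representation instead); the property labels are invented for illustration.

```python
READINGS = {
    "It rained": {"regne = weather verb", "det = expletive subject"},
    "It calculated": {"regne = 'calculate'", "det = personal pronoun"},
    "That one calculated": {"regne = 'calculate'", "det = demonstrative pronoun"},
    "That rain": {"regnet = noun 'the rain'", "det = demonstrative determiner"},
}


def discriminants(readings):
    """Properties that hold for some, but not all, of the remaining readings."""
    all_props = set().union(*readings.values())
    return {p for p in all_props
            if not all(p in props for props in readings.values())}


def select(readings, discriminant, accept=True):
    """Keep the readings that have (accept=True) or lack (accept=False) the discriminant."""
    return {r: props for r, props in readings.items() if (discriminant in props) == accept}


print(list(select(READINGS, "det = expletive subject")))            # ['It rained']
print(list(select(READINGS, "regne = 'calculate'", accept=False)))  # ['It rained', 'That rain']
```

Accepting a single well-chosen discriminant leaves exactly one reading, which mirrors the one-click case on the slide; rejecting a discriminant is just as informative.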
![Page 51: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/51.jpg)
The Parsebanker
![Page 52: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/52.jpg)
![Page 53: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/53.jpg)
![Page 54: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/54.jpg)
'After all, a human being must be something more than a machine?'
![Page 55: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/55.jpg)
TigerSearch
The implementation is under development by Paul Meurer
Find selected prepositional phrases with sentential objects:
![Page 56: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/56.jpg)
Find selected prepositional phrases with the preposition 'om' and nominal objects:
![Page 57: Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik](https://reader036.vdocuments.net/reader036/viewer/2022062417/551a6deb550346b52d8b4e0d/html5/thumbnails/57.jpg)
Find topicalized objects:
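What the last query looks for can be sketched directly on f-structures. A minimal sketch, assuming (as is common in LFG analyses) that a fronted object is the value of both TOPIC and OBJ through structure sharing, modelled here as the very same Python dict; this is not TigerSearch query syntax, and the attribute values are illustrative. Only the outermost f-structure of each sentence is inspected.

```python
def find_topicalized_objects(fstructures):
    """Return the f-structures whose OBJ is the very same structure as their TOPIC."""
    return [f for f in fstructures
            if f.get("OBJ") is not None and f.get("OBJ") is f.get("TOPIC")]


# «Boka leste han» 'The book, he read': the fronted object fills both TOPIC and OBJ.
book = {"PRED": "bok"}
fronted = {"PRED": "lese<SUBJ,OBJ>", "SUBJ": {"PRED": "pro"}, "OBJ": book, "TOPIC": book}

# «Han leste boka» 'He read the book': no fronted object.
plain = {"PRED": "lese<SUBJ,OBJ>", "SUBJ": {"PRED": "pro"}, "OBJ": {"PRED": "bok"}}

hits = find_topicalized_objects([fronted, plain])
print(len(hits), hits[0] is fronted)  # 1 True
```

Testing identity (`is`) rather than equality matters here: two separate but identical `bok` f-structures would not count as a shared value.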