Linguistics 187/287 Week 6
Martin Forst, Ron Kaplan, and Tracy King
Generation
Term-rewrite System
Machine Translation
Generation

Parsing: string to analysis
Generation: analysis to string
– What type of input?
– How to generate?
Why generate?
Machine translation
– Lang1 string -> Lang1 f-str -> Lang2 f-str -> Lang2 string
Sentence condensation
– Long string -> f-str -> smaller f-str -> new string
Question answering
Production of NL reports
– State of machine or process
– Explanation of logical deduction
Grammar debugging
F-structures as input
Use f-structures as input to the generator
– May parse sentences that shouldn't be generated
– May want to constrain the number of generated options
– Input f-structure may be underspecified
XLE generator
Use the same grammar for parsing and generation
Advantages
– maintainability
– write rules and lexicons once
But
– special generation tokenizer
– different OT ranking
Generation tokenizer/morphology
White space
– Parsing: multiple white space becomes a single TB (token boundary)
  John appears. -> John TB appears TB . TB
– Generation: a single TB becomes a single space (or nothing)
  John TB appears TB . TB -> John appears.  (not *John appears .)
Suppress variant forms
– Parse both favor and favour
– Generate only one
Morphconfig for parsing & generation

STANDARD ENGLISH MORPHOLOGY (1.0)
TOKENIZE:
P!eng.tok.parse.fst G!eng.tok.gen.fst
ANALYZE:
eng.infl-morph.fst G!amerbritfilter.fst
G!amergen.fst
----
Reversing the parsing grammar
The parsing grammar can be used directly as a generator
Adapt the grammar with a special OT ranking: GENOPTIMALITYORDER
Why do this?
– parse ungrammatical input
– have too many options
Ungrammatical input
Linguistically ungrammatical
– They walks.
– They ate banana.
Stylistically ungrammatical
– No ending punctuation: They appear
– Superfluous commas: John, and Mary appear.
– Shallow markup: [NP John and Mary] appear.
Too many options
All the generated options can be linguistically valid, but too many for applications
Occurs when more than one string has the same, legitimate f-structure
PP placement:
– In the morning I left. / I left in the morning.
Using the Gen OT ranking
Generally much simpler than in the parsing direction
– Usually only use standard marks and NOGOOD (no * marks, no STOPPOINT)
– Can have a few marks that are shared by several constructions
  – one or two for dispreferred
  – one or two for preferred
Example: Prefer initial PP
S --> (PP: @ADJUNCT @(OT-MARK GenGood))
NP: @SUBJ;
VP.
VP --> V
(NP: @OBJ)
(PP: @ADJUNCT).
GENOPTIMALITYORDER NOGOOD +GenGood.
parse: they appear in the morning.
generate without OT: In the morning they appear. / They appear in the morning.
generate with OT: In the morning they appear.
Debugging the generator
When generating from an f-structure produced by the same grammar, XLE should always generate
Unless:
– OT marks block the only possible string
– something is wrong with the tokenizer/morphology (regenerate-morphemes: if this gets a string, the tokenizer/morphology is not the problem; see the sketch below)
Hard to debug: XLE has robustness features to help
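A hedged xlerc sketch of this check (the sentence is this deck's own example; the exact argument conventions of regenerate-morphemes may differ):

# parse the string, then generate from the resulting f-structure
regenerate "they appear."
# if this yields a string, the tokenizer/morphology is not the problem
regenerate-morphemes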
Underspecified Input
F-structures provided by applications are not perfect
– may be missing features
– may have extra features
– may simply not match the grammar coverage
Missing and extra features are often systematic
– specify in XLE which features can be added and deleted
Not matching the grammar is a more serious problem
Adding features

English to French translation:
– English nouns have no gender
– French nouns need gender
– Solution: have XLE add gender; the French morphology will control the value
Specify additions in xlerc:
– set-gen-adds add "GEND"
– can add multiple features: set-gen-adds add "GEND CASE PCASE"
– XLE will optionally insert the feature
Note: unconstrained additions make generation undecidable
Example

Input f-structure (no GEND):

[ PRED 'dormir<SUBJ>'
  SUBJ [ PRED 'chat'
         NUM sg
         SPEC def ]
  TENSE present ]

With GEND added:

[ PRED 'dormir<SUBJ>'
  SUBJ [ PRED 'chat'
         NUM sg
         GEND masc
         SPEC def ]
  TENSE present ]

The cat sleeps. -> Le chat dort.
Deleting features
French to English translation
– delete the GEND feature
Specify deletions in xlerc
– set-gen-adds remove "GEND"
– can remove multiple features: set-gen-adds remove "GEND CASE PCASE"
– XLE obligatorily removes the features: no GEND feature will remain in the f-structure
– if a feature takes an f-structure value, that f-structure is also removed
Changing values
If the values of a feature do not match between the input f-structure and the grammar:
– delete the feature and then add it
Example: case assignment in translation
– set-gen-adds remove "CASE"
– set-gen-adds add "CASE"
– allows dative case in the input to become accusative, e.g., an exceptional case marking verb in the input language but regular case in the output language (see the xlerc sketch below)
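Put together, a minimal xlerc sketch for the CASE example (the grammar file name is illustrative):

# load the grammar for generation
create-generator english.lfg
# obligatorily strip CASE from input f-structures ...
set-gen-adds remove "CASE"
# ... and let the generator optionally re-insert a grammar-compatible value
set-gen-adds add "CASE"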
Generation for Debugging
Checking for grammar and lexicon errors
– create-generator english.lfg
– reports ill-formed rules, templates, feature declarations, lexical entries
Checking for ill-formed sentences that can be parsed
– parse a sentence
– see if all the results are legitimate strings
– regenerate "they appear."
Rewriting/Transfer System
Why a Rewrite System
Grammars produce c-/f-structure output
Applications may need to manipulate this
– Remove features
– Rearrange features
– Continue linguistic analysis (semantics, knowledge representation – next week)
XLE has a general purpose rewrite system (aka "transfer" or "xfr" system)
Sample Uses of Rewrite System
Sentence condensation
Machine translation
Mapping to logic for knowledge representation and reasoning
Tutoring systems
What does the system do?
Input: set of "facts"
Apply a set of ordered rules to the facts
– this gradually changes the set of input facts
Output: new set of facts
The rewrite system uses the same ambiguity management as XLE
– can efficiently rewrite packed structures, maintaining the packing
Example F-structure Facts
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)

F-structures get var(#)
Special arg facts
lex_id for each PRED
Facts have two arguments (except arg); the rewrite system allows for any number of arguments
Rule format

Obligatory rule: LHS ==> RHS.
Optional rule: LHS ?=> RHS.
Unresourced fact: |- clause.

LHS
– clause : match and delete
– +clause : match and keep
– -LHS : negation (don't have fact)
– LHS, LHS : conjunction
– ( LHS | LHS ) : disjunction
– { ProcedureCall } : procedural attachment

RHS
– clause : replacement facts
– 0 : empty set of replacement facts
– stop : abandon the analysis
Example rules
"PRS (1.0)"
grammar = toy_rules.
"obligatorily add a determiner if there is a noun with no spec"
+NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).

"optionally make plural nouns singular; this will split the choice space"
NUM(%F, pl) ?=> NUM(%F, sg).
Input facts:
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)
Example Obligatory Rule
"obligatorily add a determiner if there is a noun with no spec"
+NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).

Input facts:
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)

Output facts: all the input facts plus SPEC(var(1),def)
Example Optional Rule
"optionally make plural nouns singular this will split the choice space"
NUM(%F, pl) ?=> NUM(%F, sg).
PERS(var(1),3)PRED(var(1),girl)CASE(var(1),nom)NTYPE(var(1),common)NUM(var(1),pl)SPEC(var(1),def)
SUBJ(var(0),var(1))
PRED(var(0),laugh)TNS-ASP(var(0),var(2))TENSE(var(2),pres)
arg(var(0),1,var(1))lex_id(var(0),1)lex_id(var(1),0)
Output facts: all the input facts plus choice split: A1: NUM(var(1),pl) A2: NUM(var(1),sg)
Output of example rules
Output is a packed f-structure
Generation gives two sets of strings
– The girls {laugh. | laugh! | laugh}
– The girl {laughs. | laughs! | laughs}
Manipulating sets
Sets are represented with an in_set feature
– He laughs in the park with the telescope

ADJUNCT(var(0),var(2))
in_set(var(4),var(2))
in_set(var(5),var(2))
PRED(var(4),in)
PRED(var(5),with)

Might want to optionally remove adjuncts
– but not negation
Example Adjunct Deletion Rules
"optionally remove member of adjunct set"
+ADJUNCT(%%, %AdjSet), in_set(%Adj, %AdjSet), -PRED(%Adj, not)?=> 0.
"obligatorily remove adjunct with nothing in it"
ADJUNCT(%%, %Adj), -in_set(%%,%Adj)==> 0.
He laughs with the telescope in the park.He laughs in the park with the telescopeHe laughs with the telescope.He laughs in the park.He laughs.
Manipulating PREDs
Changing the value of a PRED is easy
– PRED(%F,girl) ==> PRED(%F,boy).
Changing the argument structure is trickier
– Make any changes to the grammatical functions
– Make the arg facts correlate with these
Example Passive Rule
"make actives passive
make the subject NULL; make the object the subject;
put in features"
SUBJ( %Verb, %Subj), arg( %Verb, %Num, %Subj),
OBJ( %Verb, %Obj), CASE( %Obj, acc)
==>
SUBJ( %Verb, %Obj), arg( %Verb, %Num, NULL), CASE( %Obj, nom),
PASSIVE( %Verb, +), VFORM( %Verb, pass).
the girls saw the monkeys ==> The monkeys were seen.
in the park the girls saw the monkeys ==> In the park the monkeys were seen.
Templates and Macros
Rules can be encoded as templates:

n2n(%Eng,%Frn) ::
  PRED(%F,%Eng), +NTYPE(%F,%%)
  ==> PRED(%F,%Frn).

@n2n(man, homme).
@n2n(woman, femme).

Macros encode groups of clauses/facts:

sg_noun(%F) :=
  +NTYPE(%F,%%), +NUM(%F,sg).

@sg_noun(%F), -SPEC(%F)
==> SPEC(%F,def).
Unresourced Facts
Facts can be stipulated in the rules and referred to
– Often used as a lexicon of information not encoded in the f-structure
For example, a list of days and months for the manipulation of dates (a hedged completion of the rule follows below):

|- day(Monday). |- day(Tuesday). etc.
|- month(January). |- month(February). etc.

+PRED(%F,%Pred), ( day(%Pred) | month(%Pred) ) ==> …
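A hedged completion of the elided rule above; the DATE feature on the right-hand side is invented for illustration:

"mark predicates that name a day or a month; DATE is a hypothetical feature"
+PRED(%F,%Pred), ( day(%Pred) | month(%Pred) )
==> DATE(%F,+).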
Rule Ordering
Rewrite rules are ordered (unlike LFG syntax rules, but like finite-state rules)
– Output of rule1 is input to rule2
– Output of rule2 is input to rule3
This allows for feeding and bleeding
– Feeding: insert facts used by later rules
– Bleeding: remove facts needed by later rules
Can make debugging challenging
Example of Rule Feeding
Early rule: insert SPEC on nouns
+NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F, def).

Later rule: allow plural nouns to become singular only if they have a specifier (to avoid bad count nouns)
NUM(%F,pl), +SPEC(%F,%%) ==> NUM(%F,sg).
Example of Rule Bleeding
Early rule: turn actives into passives (simplified)
SUBJ(%F,%S), OBJ(%F,%O) ==> SUBJ(%F,%O), PASSIVE(%F,+).

Later rule: impersonalize actives
SUBJ(%F,%%), -PASSIVE(%F,+) ==> SUBJ(%F,%S), PRED(%S,they), PERS(%S,3), NUM(%S,pl).
– will apply to intransitives and verbs with (X)COMPs, but not transitives
Debugging

XLE command line: tdbg
– steps through rules, stating how they apply (input sentence: girls laughed)

============================================
Rule 1: +(NTYPE(%F,A)), -(SPEC(%F,B)) ==> SPEC(%F,def)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 4-10
Rule 1 matches: [+(2)] NTYPE(var(1),common)
1 --> SPEC(var(1),def)
============================================
Rule 2: NUM(%F,pl) ?=> NUM(%F,sg)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 11-17
Rule 2 matches: [3] NUM(var(1),pl)
1 --> NUM(var(1),sg)
============================================
Rule 5: SUBJ(%Verb,%Subj), arg(%Verb,%Num,%Subj), OBJ(%Verb,%Obj), CASE(%Obj,acc) ==> SUBJ(%Verb,%Obj), arg(%Verb,%Num,NULL), CASE(%Obj,nom), PASSIVE(%Verb,+), VFORM(%Verb,pass)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 28-37
Rule does not apply
Running the Rewrite System
create-transfer : adds menu items
load-transfer-rules FILE : loads rules from a file
The f-structure window's commands menu then has:
– transfer : prints the output of the rules in the XLE window
– translate : runs the output through the generator
Need to do (where the path is $XLEPATH/lib):
setenv LD_LIBRARY_PATH /afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib
(A minimal session sketch follows below.)
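A minimal session sketch using these commands (the rule file name is hypothetical):

# after setting LD_LIBRARY_PATH in the shell, inside xlerc:
create-parser english.lfg
create-transfer
load-transfer-rules toy_rules.pl
parse "the girls laugh."
# then choose transfer or translate from the f-structure window's commands menu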
Rewrite Summary
The XLE rewrite system lets you manipulate the output of parsing
– Creates versions of output suitable for applications
– Can involve significant reprocessing
Rules are ordered
Ambiguity management is as with parsing
Grammatical Machine Translation
Stefan Riezler & John Maxwell
Translation System

[Diagram: source string -> XLE parsing (German LFG) -> source f-structures -> transfer (translation rules) -> target f-structures -> XLE generation (English LFG) -> target string; plus lots of statistics throughout.]
Transfer-Rule Induction from Aligned Bilingual Corpora
1. Use standard techniques to find many-to-many candidate word-alignments in source-target sentence pairs
2. Parse source and target sentences using LFG grammars for German and English
3. Select the most similar f-structures in source and target
4. Define many-to-many correspondences between substructures of the f-structures, based on the many-to-many word alignment
5. Extract primitive transfer rules directly from aligned f-structure units
6. Create the powerset of possible combinations of basic rules and filter according to contiguity and type matching constraints
Induction
Example sentences:
– Dafür bin ich zutiefst dankbar.
– I have a deep appreciation for that.

Many-to-many word alignment:
Dafür{6 7} bin{2} ich{1} zutiefst{3 4 5} dankbar{5}

F-structure alignment: [diagram of the aligned source and target f-structures omitted]
Extracting Primitive Transfer Rules
Rule (1) maps lexical predicates. Rule (2) maps lexical predicates and interprets the subj-to-subj link as an indication to map the subj of the source with this predicate into the subject of the target, and the xcomp of the source into the object of the target.
%X1, %X2, %X3, … are variables for f-structures

(1) PRED(%X1, ich) ==> PRED(%X1, I)

(2) PRED(%X1, sein), SUBJ(%X1,%X2), XCOMP(%X1,%X3)
    ==> PRED(%X1, have), SUBJ(%X1,%X2), OBJ(%X1,%X3)
Extracting Complex Transfer Rules

Complex rules are created by taking all combinations of primitive rules, and filtering
(4) zutiefst dankbar sein ==> have a deep appreciation
(5) zutiefst dankbar dafür sein ==> have a deep appreciation for that
(6) ich bin zutiefst dankbar dafür ==> I have a deep appreciation for that
Transfer Contiguity Constraint

Transfer contiguity constraint:
1. Source and target f-structures each have to be connected
2. F-structures in the transfer source can only be aligned with f-structures in the transfer target, and vice versa
Analogous to the constraint on contiguous and alignment-consistent phrases in phrase-based SMT
Prevents extraction of a rule that would translate dankbar directly into appreciation, since appreciation is also aligned to zutiefst
Transfer contiguity allows learning idioms like es gibt - there is from configurations that are local in f-structure but non-local in string, e.g., es scheint […] zu geben - there seems […] to be
Linguistic Filters on Transfer Rules

Morphological stemming of PRED values
(Optional) filtering of f-structure snippets based on consistency of linguistic categories
– Extraction of the snippet that translates zutiefst dankbar into a deep appreciation maps incompatible categories, adjectival and nominal; valid in a string-based world
– Translation of sein to have might be discarded because of the adjectival vs. nominal types of their arguments
– The larger rule mapping zutiefst dankbar sein to have a deep appreciation is fine, since the verbal types match
Transfer
Parallel application of transfer rules in non-deterministic fashion
– Unlike the XLE ordered-rule rewrite system
Each fact must be transferred by exactly one rule
A default rule transfers any fact as itself
Transfer works on a chart, using the parser's unification mechanism for consistency checking
Selection of the most probable transfer output is done by beam-decoding on the transfer chart
Generation
Bi-directionality allows us to use the same grammar for parsing training data and for generation in the translation application
The generator has to be fault-tolerant in cases where the transfer system operates on a FRAGMENT parse or produces invalid f-structures from valid input f-structures
Robust generation from unknown (e.g., untranslated) predicates and from unknown f-structures
Robust Generation
Generation from unknown predicates:
– The unknown German word "Hunde" is analyzed by the German grammar to extract a stem (e.g., PRED = Hund, NUM = pl) and is then inflected using English default morphology ("Hunds")
Generation from unknown constructions:
– A default grammar that allows any attribute to be generated in any order is mixed in as a suboptimal option in the standard English grammar; e.g., if a SUBJ cannot be generated as a sentence-initial NP, it will be generated in any position as any category
– An extension/combination of set-gen-adds and OT ranking
Statistical Models
1. Log-probability of source-to-target transfer rules, where the probability r(e|f) of a rule that transfers source snippet f into target snippet e is estimated by relative frequency:
2. Log-probability of target-to-source transfer rules, estimated by relative frequency
$$r(e \mid f) = \frac{\operatorname{count}(f \rightarrow e)}{\sum_{e'} \operatorname{count}(f \rightarrow e')}$$
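For concreteness, with invented counts: if the source snippet f was extracted 10 times, 6 of which were aligned to target snippet e, then

$$r(e \mid f) = \frac{6}{10} = 0.6$$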
Statistical Models, cont.
3. Log-probability of lexical translations l(e|f) from source to target snippets, estimated from Viterbi alignments a* between source word positions i = 1, …, n and target word positions j = 1, …, m, for stems f_i and e_j in snippets f and e, with relative word translation frequencies t(e_j|f_i):
4. Log-probability of lexical translations from target to source snippets
$$l(e \mid f) = \prod_{j=1}^{m} \frac{1}{|\{i \mid (i,j) \in a^{*}\}|} \sum_{(i,j) \in a^{*}} t(e_j \mid f_i)$$
Statistical Models, cont.
5. Number of transfer rules
6. Number of transfer rules with frequency 1
7. Number of default transfer rules
8. Log-probability of strings of predicates from root to frontier of the target f-structure, estimated from predicate trigrams in English f-structures
9. Number of predicates in the target f-structure
10. Number of constituent movements during generation, based on the original order of the head predicates of the constituents
Statistical Models, cont.
11. Number of generation repairs
12. Log-probability of target string as computed by trigram language model
13. Number of words in target string
Experimental Evaluation

Experimental setup:
– German-to-English on the Europarl parallel corpus (Koehn '02)
– Training and evaluation on sentences of length 5-15, for quick experimental turnaround
– Resulting in a training set of 163,141 sentences, a development set of 1,967 sentences, and a test set of 1,755 sentences (used in Koehn et al. HLT '03)
– Improved bidirectional word alignment based on GIZA++ (Och et al. EMNLP '99)
– LFG grammars for German and English (Butt et al. COLING '02; Riezler et al. ACL '02)
– SRI trigram language model (Stolcke '02)
– Comparison with PHARAOH (Koehn et al. HLT '03) and IBM Model 4 as produced by GIZA++ (Och et al. EMNLP '99)
Experimental Evaluation, cont.
Around 700,000 transfer rules extracted from f-structures chosen by a dependency similarity measure
System operates on n-best lists of parses (n=1), transferred f-structures (n=10), and generated strings (n=1,000)
Selection of the most probable translations in two steps:
– Most probable f-structure by beam search (n=20) on the transfer chart, using features 1-10
– Most probable string selected from the strings generated from the selected n-best f-structures, using features 11-13
Feature weights for the modules trained by minimum error rate (MER) training on 750 in-coverage sentences of the development set
Automatic Evaluation
NIST scores (ignoring punctuation) & Approximate Randomization for significance testing
44% in-coverage of grammars; 51% FRAGMENT parses and/or generation repair; 5% timeouts
– In-coverage: difference between LFG and P not significant
– Suboptimal robustness techniques decrease overall quality

                 M4      LFG     P
in-coverage      5.13    *5.82   *5.99
full test set    *5.57   *5.62   6.40
Manual Evaluation
Closer look at in-coverage examples:
– Random selection of 500 in-coverage examples
– Two independent judges indicated a preference for LFG or PHARAOH, or equality, in a blind test
– Separate evaluation under the criteria of grammaticality/fluency and translational/semantic adequacy
– Significance assessed by Approximate Randomization via stratified shuffling of preference ratings between systems
Manual Evaluation, cont.
Result differences on agreed-on ratings are statistically significant at p < 0.0001
Net improvement in translational adequacy on agreed-on examples is 11.4% on 500 sentences (57/500), amounting to 5% overall improvement in hybrid system (44% of 11.4%)
Net improvement in grammaticality on agreed-on examples is 15.4% on 500 sentences, amounting to 6.7% overall improvement in hybrid system
              adequacy               grammaticality
j1\j2         P     LFG   equal      P     LFG   equal
P             48    8     7          36    2     9
LFG           10    105   18         6     113   17
equal         53    60    192        51    44    223
Examples: LFG > PHARAOH
src: in diesem fall werde ich meine verantwortung wahrnehmen
ref: then i will exercise my responsibility
LFG: in this case i accept my responsibility
P: in this case i shall my responsibilities
src: die politische stabilität hängt ab von der besserung der lebensbedingungen
ref: political stability depends upon the improvement of living conditions
LFG: the political stability hinges on the recovery the conditions
P: the political stability is rejects the recovery of the living conditions
Examples: PHARAOH > LFG
src: das ist schon eine seltsame vorstellung von gleichheit
ref: a strange notion of equality
LFG: equality that is even a strange idea
P: this is already a strange idea of equality
src: frau präsidentin ich beglückwünsche herrn nicholson zu seinem ausgezeichneten bericht
ref: madam president I congratulate mr nicholson on his excellent report
LFG: madam president I congratulate mister nicholson on his report excellented
P: madam president I congratulate mr nicholson for his excellent report
Discussion

High percentage of out-of-coverage examples
– Accumulation of 2 x 20% error-rates in parsing training data
– Errors in rule extraction
– Together these result in ill-formed transfer rules, causing a high number of generation failures/repairs
Propagation of errors through the system also for in-coverage examples
– Error analysis: 69% transfer errors, 10% due to parse errors
Discrepancy between NIST and manual evaluation
– Suboptimal integration of the generator, making training and translation with large n-best lists infeasible
– Language and distortion models applied after generation
Conclusion
Integration of a grammar-based generator into a dependency-based SMT system achieves state-of-the-art NIST and improved grammaticality and adequacy on in-coverage examples
A hybrid system is possible, since it can be determined when sentences are in the coverage of the system
Grammatical Machine Translation II
Ji Fang, Martin Forst, John Maxwell, and Michael Tepper
Overview of different approaches to MT
Each approach compared by level of transfer, transfer mechanism, and disambiguation:

"Traditional" MT (e.g. Systran)
– Level of transfer: string (with minimal analysis)
– Transfer: mainly hand-developed rules
– Disambiguation: heuristics

Statistical MT (e.g. Google)
– Level of transfer: string (morphological analysis, syntactic rearrangements)
– Transfer: phrase correspondences with statistics acquired on bitexts
– Disambiguation: machine-learned (transfer probabilities, LM)

Grammatical MT I (2006)
– Level of transfer: f-structure
– Transfer: term-rewriting rules with statistics, induced from parsed bitexts
– Disambiguation: machine-learned (ME models, LM)

Context-Based MT (Meaningful Machines)
– Level of transfer: string
– Transfer: semi-automatically developed phrase pairs
– Disambiguation: machine-learned (LM)

Grammatical MT II (2008)
– Level of transfer: f-structure
– Transfer: term-rewriting rules without statistics, induced from semi-automatically developed phrase pairs, potentially bitexts
– Disambiguation: machine-learned (ME models, LM)
Limitations of string-based approaches

Transfer rules/correspondences of little generality
Problems with long-distance dependencies
Perform less well for morphologically rich (target) languages
N-gram LM-based disambiguation seems to have leveled out
Limitations of string-based approaches - little generality

From Europarl: Das tut mir leid. = I'm sorry [about that].
Google (SMT): I'm sorry. Perfect!
But: as soon as the input changes a bit, we get garbage.
– Das tut ihr leid. 'She is sorry about that.' -> It does their suffering.
– Der Tod deines Vaters tut mir leid. 'I am sorry about the death of your father.' -> The death of your father I am sorry.
– Der Tod deines Vaters tut ihnen leid. 'They are sorry about the death of your father.' -> The death of your father is doing them sorry.
Limitations of string-based approaches - problems with LDDs

From Europarl: Dies stellt eine der großen Herausforderungen für die französische Präsidentschaft dar. = This is one of the major issues of the French Presidency.
Google (SMT): This is one of the major challenges for the French presidency represents.
The particle verb is identified and translated correctly
But: the two verbs are ungrammatical; they seem to be too far apart to be filtered by the LM
Limitations of string-based approaches - rich morphology

Language pairs involving morphologically rich languages, e.g., Finnish, are hard
[Chart from Koehn (2005, MT Summit) omitted]
Limitations of string-based approaches - rich morphology, cont.

Morphologically rich, free word order languages, e.g. German, are particularly hard as target languages.
[Again, chart from Koehn (2005, MT Summit) omitted]
Limitations of string-based approaches - n-gram LMs

Even for morphologically poor languages, improving n-gram LMs becomes increasingly expensive.
Adding data helps improve translation quality (BLEU scores), but not enough.
Assuming the best improvement rate observed in Brants et al. (2007), ~400 million times the available data would be needed to attain human translation quality by LM improvement.
Limitations of string-based approaches - n-gram LMs, cont.

From Brants et al. (2007):
– Best improvement rate: +0.7 BLEU points per doubling of data
– Would need 40 more doublings to obtain human translation quality (42 + 0.7*40 ≈ 70)
– Necessary training data in tokens: 1e22 (1e10 * 2^40 ≈ 1e22)
– That is ~4e8 times the current English Web (estimate) (2.5e13 * 4e8 = 1e22)
Limitations of bitext-based approaches

Generally available bitexts are limited in size and specialized in genre
– Parliament proceedings
– UN texts
– Judiciary texts (from multilingual countries)
This makes it hard to repurpose bitext-based systems to new genres
Induced transfer rules/correspondences are often of mediocre quality
– "Loose" translations
– Bad alignments
Limitations of bitext-based approaches - availability and quality

Readily available bitexts are limited in size and specialized in genre
Approaches to auto-extracting bitexts from the web exist.
Additional data help to some degree, but then the effect levels out.
– Still a genre bias in bitexts, despite automatic acquisition?
– Still more general problems with alignment quality etc.?
Limitations of bitext-based approaches - availability and quality, cont.

Much more data needed to attain human translation quality
Logarithmic gains (at best) by adding bitext data
[Chart from Munteanu & Marcu (2005) omitted:
– Base line: 100K - 95M English words
– Mid line (+auto): + 90K - 2.1M
– Top line (+oracle): + 90K - 2.1M]
Context-Based MT / Meaningful Machines
Combines example-based MT (EBMT) and SMT
Very large (target) language model; a large amount of monolingual text is required
No transfer statistics, thus no parallel text required
Translation lexicon is developed semi-automatically (i.e. hand-validated)
Lexicon has slotted phrase pairs (like EBMT), e.g., "NP1 biss ins Gras." = "NP1 bit the dust."
Context-Based MT / Meaningful Machines - pros

A high-quality translation lexicon seems to allow for
– Easier repurposing of system(s) to new genres
– Better translation quality
[Chart from Carbonell (2006) omitted]
Context-Based MT / Meaningful Machines - cons
Works really well for English-Spanish. How about other language pairs?
Same problems with n-gram LMs as "traditional" SMT; probably affects pairs involving a morphologically rich (target) language particularly badly.
How much manual labor is involved in the development of the translation lexicon?
Computationally expensive
Grammatical Machine Translation

Syntactic transfer-based approach
Parsing and generation identical/similar between GMT I and GMT II
[Pyramid diagram: string-level statistical methods at the base, f-structure transfer rules at the apex; left edge: parse source, score f-structures; apex: transfer, score target f-structures; right edge: generate, pick best realization.]
Grammatical Machine Translation: GMT I vs. GMT II

GMT I
– Transfer rules induced from parsed bitexts
– Target f-structures ranked using individual transfer rule statistics

GMT II
– Transfer rules induced from a manually/semi-automatically constructed phrase lexicon
– Target f-structures ranked using monolingually trained bilexical dependency statistics and general transfer rule statistics
GMT II

Where do the transfer rules come from? Where do statistics/machine learning come in?

[Pyramid diagram as above, annotated:
– parse ranking: log-linear model trained on a syntactically annotated monolingual corpus
– transfer ranking: log-linear model trained on bitext data; includes the score from the parse ranking model and very general transfer features
– realization ranking: log-linear model trained on bitext data; includes the scores from the other two models and the features/score of a monolingually trained model for realization ranking
– transfer rules: induced from manually/semi-automatically compiled phrase pairs with "slots"; potentially, but not necessarily, from bitexts]
GMT II - The phrase dictionary

Contains phrase pairs with "slot" categories (Ddeff, Ddef, NP1nom, NP1, etc.) that allow for well-formed phrases without being included in induced rules
Currently hand-written
Will hopefully be compiled (semi-)automatically from bilingual dictionaries
Bitexts might also be used; how exactly remains to be defined.
GMT II - Rule induction from the phrase dictionary

Sub-FSs of "slot" variables are not included
FS attributes can be defined as irrelevant for translation, e.g. CASE (in both en and de), GEND (in de). Attributes so defined are never included in induced rules.
  set-gen-adds remove CASE GEND
FS attributes can be defined as "remove_equal_features". Attributes defined as such are not included in induced rules when they are equal.
  set remove_equal_features NUM OBJ OBL-AG PASSIVE SUBJ TENSE
-> more general rules
GMT II - Rule induction from the phrase dictionary (noun)

Ddeff Verfassung = Ddef constitution

PRED(%X1, Verfassung),
NTYPE(%X1, %Z2),
NSEM(%Z2, %Z3),
COMMON(%Z3, count),
NSYN(%Z2, common)
==>
PRED(%X1, constitution),
NTYPE(%X1, %Z4),
NSYN(%Z4, common).
GMT II - Rule induction from the phrase dictionary (adjective)

europäische = European

PRED(%X1, europäisch) ==> PRED(%X1, European).

To accommodate certain non-parallelisms with respect to the SUBJs of adjectives etc., a special mechanism removes the SUBJs of non-verbs and makes them addable in generation.
GMT II - Rule induction from the phrase dictionary (verb)

NP1nom koordiniert NP2acc. = NP1 coordinates NP2.

PRED(%X1, koordinieren),
arg(%X1, 1, %A2),
arg(%X1, 2, %A3),
VTYPE(%X1, main)
==>
PRED(%X1, coordinate),
arg(%X1, 1, %A2),
arg(%X1, 2, %A3),
VTYPE(%X1, main).
GMT II - Rule induction (argument switching)

NP1nom tut NP2dat leid. = NP2 is sorry about NP1.

PRED(%X1, leid#tun),
SUBJ(%X1, %A2),
OBJ-TH(%X1, %A3),
VTYPE(%X1, main)
==>
PRED(%X1, be),
SUBJ(%X1, %A3),
XCOMP-PRED(%X1, %Z1),
PRED(%Z1, sorry),
OBL(%Z1, %Z2),
PRED(%Z2, about),
OBJ(%Z2, %A2),
VTYPE(%X1, copular).
GMT II - Rule induction (head switching)

Ich versuche nur, mich jeder Demagogie zu enthalten. = It is just that I am trying not to indulge in demagoguery.

NP1nom Vfin nur. = It is just that NP1 Vs.

+ADJUNCT(%X1,%Z2), in_set(%X3,%Z2), PRED(%X3,nur), ADV-TYPE(%X3,unspec)
==>
PRED(%Z4,be), SUBJ(%Z4,%X3), NTYPE(%X3,%Z5), NSYN(%Z5,pronoun), GEND-SEM(%Z5,nonhuman), HUMAN(%Z5,-), NUM(%Z5,sg), PERS(%Z5,3), PRON-FORM(%Z5,it), PRON-TYPE(%Z5,expl_),
arg(%Z4,1,%Z6), PRED(%Z6,just), SUBJ(%Z6,%Z7), arg(%Z6,1,%A1), COMP-FORM(%A1,that), COMP(%Z6,%A1), nonarg(%Z6,1,%Z7), ATYPE(%Z6,predicative), DEGREE(%Z6,positive), nonarg(%Z4,1,%X3),
TNS-ASP(%Z4,%Z8), MOOD(%Z8,indicative), TENSE(%Z8,pres), XCOMP-PRED(%Z4,%Z6), CLAUSE-TYPE(%Z4,decl), PASSIVE(%Z4,-), VTYPE(%A2,copular).
GMT II - Rule induction (more on head switching)

In addition to rewriting terms, the system re-attaches the rewritten FS if necessary. Here, this might be the case for %X1.
(The rule is the one shown on the previous slide.)
GMT II - Pros and cons of rule induction from a phrase dictionary

Development of phrase pairs can be carried out by someone with little knowledge of grammar and transfer system; manual development of transfer rules would require experts (for boring, repetitive labor).
Phrase pairs can remain stable while grammars keep evolving. Since transfer rules are induced fully automatically, they can easily be kept in sync with the grammars.
Induced rules are of much higher quality than rules induced from parsed bitexts (GMT I).
Although there is hope that phrase pairs can be constructed semi-automatically from bilingual dictionaries, it is not yet clear to what extent this can be automated.
If rule induction from parsed bitexts can be improved, the two approaches might well be complementary.
Lessons Learned for Parallel Grammar Development

Absence of a feature like PERF=+/- is not equivalent to PERF=-.
FS-internal features should not say anything about the function of the FS
– Example: PRON-TYPE=poss instead of PRON-TYPE=pers
Compounds should be analyzed similarly, whether spelt together (de) or apart (en)
– Possible with SMOR
– Very hard or even impossible with DMOR
Absence of PERF vs. PERF=- [illustrative f-structures omitted]
No function info in FS-internal features

I think NP1 Vs. = In my opinion NP1 Vs.
Parallel analysis of compounds
More Lessons Learned for Parallel Grammar Development
ParGram needs to agree on a parallel PRED value for (personal) pronouns
We need an “interlingua” for numbers, clock times, dates etc.
Guessers should analyze (composite) names similarly
Parallel PRED values for (personal) pronouns

Otherwise the number of rules we have to learn for them explodes.
de-en: pro/er → he, pro/er → it, pro/sie → she, pro/sie → it, pro/es → it, pro/es → he, pro/es → she
en-de: he → pro/er, she → pro/sie, it → pro/es, it → pro/er, it → pro/sie, …
Also: the PRED-NUM-PERS combination may make no sense! Result: a lot of generator effort for nothing…
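To make the explosion concrete, a hedged sketch of the per-pairing rules that non-parallel pronoun PREDs force (optional rules, since the pairings are alternatives; the rule bodies are illustrative):

"each source/target pronoun pairing needs its own rule"
PRED(%X1, er) ?=> PRED(%X1, he).
PRED(%X1, er) ?=> PRED(%X1, it).
PRED(%X1, sie) ?=> PRED(%X1, she).
PRED(%X1, sie) ?=> PRED(%X1, it).
PRED(%X1, es) ?=> PRED(%X1, it).

With a parallel PRED value such as pro, the default rule that transfers any fact as itself would cover all of these, and none of the rules above would be needed.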
Interlingua for numbers, clock times, dates, etc.
We cannot possibly learn transfer rules for all dates.
Guessed (composite) names
We cannot possibly learn transfer rules for all proper names in this world.
And Yet More Lessons Learned for Grammar Development
Reflexive pronouns - PERS and NUM agreement should be ensured via inside-out function application, e.g. ((SUBJ ^) PERS) = (^ PERS).
Semantically relevant features should not be hidden in CHECK
Reflexive pronouns
Introduce their own values for PERS and NUM
– Overgeneration: *Ich wasche sich.
– NUM ambiguity for (frequent) "sich"
– Less generalization possible in transfer rules for inherently reflexive verbs: 6 rules necessary instead of 1.
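A hedged lexicon sketch of the inside-out alternative recommended earlier (XLE lexicon entry format; the schemata reuse the slide's equation and are illustrative, not the actual ParGram entry):

sich  PRON * (^ PRON-TYPE) = refl
             ((SUBJ ^) PERS) = (^ PERS)
             ((SUBJ ^) NUM) = (^ NUM).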
Reflexive pronouns, cont. [illustration omitted]
Semantically relevant features in CHECK
sie = they
Sie = you (formal)
Since CHECK features are not used for translations, the distinction between “sie” and “Sie” is lost.
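For illustration, if formality were carried by a regular f-structure feature instead of a CHECK feature, transfer rules could preserve the distinction. The FORMAL attribute below is hypothetical:

"hypothetical: formal 'Sie' becomes 'you' when FORMAL is visible"
PRED(%X1, sie), +FORMAL(%X1, +) ==> PRED(%X1, you).
"non-formal plural 'sie' becomes 'they'"
PRED(%X1, sie), -FORMAL(%X1, %%), +NUM(%X1, pl) ==> PRED(%X1, they).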
Planned experiments - Motivation
We do not have the resources to develop a “general purpose” phrase dictionary in the short or medium term.
Nevertheless, we want to get an idea about how well our new approach may scale.
Planned Experiments 1
Manually develop phrase dictionary for a few hundred Europarl sentences
Train target FS ranking model and realization ranking model on those sentences
Evaluate output in terms of BLEU, NIST and manually
Can we make this new idea work under ideal conditions? It seems we can.
Planned Experiments 2
Manually develop phrase dictionary for a few hundred Europarl sentences
Use bilingual dictionary to add possible phrase pairs that may distract the system
Train target FS ranking model and realization ranking model on those sentences
Evaluate output in terms of BLEU, NIST and manually
How well can our system deal with the “distractors”?
Planned Experiments 3
Manually develop phrase dictionary for a few hundred Europarl sentences
Use bilingual dictionary to add possible phrase pairs that may distract the system
Degrade the phrase dictionary at various levels of severity
– Take out a certain percentage of phrase pairs
– Shorter phrases may be penalized less than longer ones
Train target FS ranking model and realization ranking model on those sentences
Evaluate output in terms of BLEU, NIST and manually
How good or bad is the output of the system when the bilingual phrase dictionary lacks coverage?
Main Remaining Challenges
Get a comprehensive and high-quality dictionary of phrase pairs
Get more and better (i.e. more normalized and parallel) analyses from the grammars
Improve ranking models, in particular on the source side
Improve the generation behavior of the grammars - so far, grammar development has mostly been "parsing-oriented"
Efficiency, in particular on the generation side, inter alia packed transfer and generation