international journal of comparative psychology
TRANSCRIPT
![Page 1: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/1.jpg)
Empirical Dependency-Based Head Finalization for Statistical Chinese-, English-, and French-to-Myanmar (Burmese) Machine Translation
Chenchen Ding†, Ye Kyaw Thu‡, Masao Utiyama‡, Andrew Finch‡, Eiichiro Sumita‡
† Department of Computer Science, University of Tsukuba 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
‡ National Institute of Information and Communications Technology 3-5 Hikaridai, Seikacho, Sorakugun, Kyoto, 619-0289, Japan
![Page 2: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/2.jpg)
Normalized Kendall’s Tau• Often used to measure differences in word
order between languages• Related to the number of crossing word
alignments• Calculated from word aligned data• In the range from 0 to 1• 0 indicating identical word order
![Page 3: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/3.jpg)
Kendall’s Tau for Myanmar
Language Pair Normalized Kendall’s Tau
Engish-Myanmar 0.538
French-Myanmar 0.487
Chinese-Myanmar 0.315
Korean-Myanmar 0.156
Japanese Myanmar 0.123
![Page 4: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/4.jpg)
Importance of re-ordering
430 40 50 60 70 80 90
0.5
0.6
0.7
0.8
0.9
1.0
pbsmt
tau
Classification
BLEU
Nor
mal
ized
Ken
dall's
tau
Correlation=0.8
0
1
![Page 5: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/5.jpg)
Pre-ordering• Long distance word re-ordering is a
problem for SMT• Pre-ordering approaches have been
successful in SVO-to-SOV translation• Re-order the source in a pre-processing
step• Use re-ordering heuristics in
combination with a high-precision parser• Efficient
5
![Page 6: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/6.jpg)
Exploiting the Head Final Property
• Languages such as Japanese, have the property that the head word typically follows its dependent words
• Parse the source with a head-driven phrase structure grammar (HPSG)• Pre-order with rules operating on the
parse tree
6
• English-to-Japanese : [Isozaki+ 2012]• Chinese-to-Japanese : [Han+ 2012]
![Page 7: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/7.jpg)
Other approaches• HPSG parsers not available for all
languages• [Xu+ 2009] proposed using a
dependency parser• Statistical approaches are also possible,
e.g. LADER (Neubig+ 2012)• Myanmar (Burmese) is a typical SOV
language→ Head-finalization should work
7
![Page 8: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/8.jpg)
Myanmar Language• SOV language‣ Consistently head-final‣ Similar to Japanese and Korean‣ Function morphemes succeed content
morphemes‣ Unlike Japanese and Korean, Myanmar
is analytic (morphemes are non-inflected syllables)
8
![Page 9: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/9.jpg)
Pre-ordering for English-, Chinese-, French-to-Myanmar SMT
• Myanmar (Burmese) Language– Similar syntax to Japanese/Korean→ Transfer techniques used for JA/KO to MY
彼 が 本 を 先生 に あげるJapanese
그 가 책 을 선생님 에게 올린다Korean
သ" သည$ စ&အuပ$ ကiu ဆရ& eပ/အ&/ သည$Myanmar
he nominativemarker book accusative
marker teacher dativemarker give present
markerEnglish literally
9(Content morphemes in black, function morphemes in grey)
![Page 10: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/10.jpg)
Our Head-Finalization• Dependency-based head-finalization– A combination of [Isozaki+ 2012] and [Xu+
2009]• Available for more source languages• Just move the head after modifiers– Simple–With several exceptions → Examples
10
![Page 11: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/11.jpg)
Head finalization for MyanmarOur rules follow 3 basic principles:• Do not break a coordination structure
• English: conj, cc• Do not reorder across punctuation
• English: punct• Auxiliary verbs are placed after their head
verb• English: aux, auxpass, cop
11
![Page 12: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/12.jpg)
Pre-reordering Examples
’ll havei a slice of pizza with pepperoni and mushroom
’llhavei a sliceofpizzawithpepperoni and mushroom
dial zerofirst , then dial the number you would like to call
dialzerofirst , then dialthe numberyou wouldliketocall
12
![Page 13: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/13.jpg)
Myanmar Oriented Process• In [Isozaki+, 2012] topic, nominative, and
accusative markers are inserted on the source side• In Myanmar these are typically omitted, unless
there is ambiguity• We handle negation in Myanmar• Negation prefix and suffix (“ma … buu” like
“ne … pas” in French)• Place negation word immediately before verb,
and “neg” maker immediately after• Source side articles are deleted
![Page 14: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/14.jpg)
Experimental Methodology• Chinese-, English-, French-to-Myanmar– On BTEC corpus
• Train: 155,121 sentence pairs• Dev. : 5,000 sentence pairs• Test : 2,000 sentence pairs
• For dependency parsing– Chinese : Stanford parser– English : Stanford parser– French : Stanford tagger (CC set) + MALT parser
• Statistical approach for comparison– LADER
![Page 15: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/15.jpg)
Myanmar Segmentation• 2 approaches• Syllable segmentation (Fr-My)• Maximum matching (Ch-My, En-My)
syllable
max-matching
က ျေး ဇူး း တင် ပါ တယ်
က ျေးဇူး းတငူ် ပါ တယ်
morphological က ျေးဇူး း တင် ပါ တယ်
![Page 16: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/16.jpg)
Training LADER• 1000 sentences sampled randomly• Used automatic alignment since no
manually-aligned data was available• Using larger training data set sizes with
automatic alignment gave only a small improvement in the original work [Neubig+ 2012].
• Long training times.
![Page 17: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/17.jpg)
Evaluation• Evaluated using BLEU (default MOSES)• Also used RIBES• For distant language pairs, BLEU has
been shown to correlate poorly with human judgements [Goto+ 2011]
• RIBES was developed specifically to evaluate distant language pairs
• Results had similar characteristics with both metrics
17
![Page 18: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/18.jpg)
Results of Chinese-to-Myanmar• Average Kendall’s τ– Baseline : 0.31– LADER : 0.20– Head Final : 0.17
• Maximum matching segmentation
Test
Set
BLE
U
36.0
37.3
38.5
39.8
41.0
Distortion-Limit (the right-most is infinity)0 5.9667 11.9333 17.9
BaselineLADERHead Final
18
![Page 19: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/19.jpg)
Results of English-to-Myanmar• Average Kendall’s τ– Baseline : 0.47– LADER : 0.21– Head Final : 0.21
• Maximum matching segmentation
Test
Set
BLE
U
49.0
50.3
51.5
52.8
54.0
Distortion-Limit (the right-most is infinity)0 5.9667 11.9333 17.9
BaselineLADERHead Final
19
![Page 20: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/20.jpg)
Results of French-to-Myanmar• Average Kendall’s τ– Baseline : 0.46– LADER : 0.24– Head Final : 0.24
• Maximum matching segmentation
Test
Set
BLE
U
44.0
45.3
46.5
47.8
49.0
Distortion-Limit (the right-most is infinity)0 5.9667 11.9333 17.9
BaselineLADERHead Final
20
![Page 21: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/21.jpg)
Summary• Head-final approach works on distant
languages to Myanmar SMT– Can use dependency structure– Simple set of rules
• Future work– Experiment on larger corpora– Experiment on longer sentences
21
![Page 22: International Journal of Comparative Psychology](https://reader035.vdocuments.net/reader035/viewer/2022071601/613d4a8e736caf36b75b96fc/html5/thumbnails/22.jpg)
Thank you very much for listening!