NAIST at WAT 2014
Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014
Graham Neubig
Nara Institute of Science and Technology (NAIST)
2014-10-4
Features of ASPEC
● Translation between languages with different grammatical structures
流動 プラズマ を 正確 に 測定 する ため に 画像 を 再 構成 した 。
an image was reconstituted in order to measure flowing plasma correctly .
● We all know: Phrase-based MT is not enough
for the accurate measurement of plasma flow image was reconstructed .
Solution?: 2-step Translation Process
● Pre-ordering [Weblio, SAS_MT, NII, TMU, NICT]
● RBMT+Statistical Post Editing [TOSHIBA, EIWA]
Pre-ordering: 我々 は 科学論文 を 翻訳する → 我々 翻訳する 科学論文 → we translate scientific papers
RBMT + post-editing: 我々 は 科学論文 を 翻訳する → we translate science thesis → we translate scientific papers
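The pre-ordering pipeline in this example can be sketched in a few lines of Python. This is only a toy illustration: the particle set, the single reordering rule, and the three-entry lexicon are assumptions made for this one sentence, not the rules used by any of the systems cited on the slide.

```python
# Toy two-step translation: (1) pre-order the Japanese tokens into
# English-like order, (2) translate monotonically, word by word.
# PARTICLES, LEXICON and the single reordering rule are hypothetical.

PARTICLES = {"は", "を"}
LEXICON = {"我々": "we", "翻訳する": "translate", "科学論文": "scientific papers"}


def preorder(tokens):
    """Drop case particles and move the sentence-final verb after the subject."""
    content = [t for t in tokens if t not in PARTICLES]
    if len(content) < 2:
        return content
    subject, *rest, verb = content
    return [subject, verb, *rest]


def translate_monotone(tokens):
    """Translate each token in place, without any further reordering."""
    return " ".join(LEXICON[t] for t in tokens)


source = "我々 は 科学論文 を 翻訳する".split()
reordered = preorder(source)
english = translate_monotone(reordered)
```

Even this tiny example needs a hand-written rule that is specific to one language pair, which is the maintenance burden the talk goes on to question.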
This is a lot of work... :(
● How do I make good Japanese-English preordering rules?!
● How do I make good Japanese-Chinese preordering rules?!
● What about error propagation?
● What if better preordering accuracy doesn't equal better translation accuracy?
Evidence
Our Solution: Tree-to-String Translation [Liu+ 06]
友達 と ご飯 を 食べ た
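[Figure: parse tree over the source sentence, (VP0-5 (PP0-1 (N0 友達) (P1 と)) (VP2-5 (PP2-3 (N2 ご飯) (P3 を)) (VP4-5 (V4 食べ) (SUF5 た)))), with target-side rule fragments attached to its nodes: "my friend", "a meal", "ate", "with", "x1 x0", "x1 with x0"]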
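As a rough illustration of how tree-to-string rules turn the parse of 友達 と ご飯 を 食べ た into English, the sketch below applies one rule per tree node. It is a simplification under stated assumptions: real tree-to-string rules match multi-level tree fragments and are scored by a model, whereas here each rule is keyed on a node label alone, and the assignment of target strings to nodes in `RULES` is a guess at the slide's figure, not taken from Travatar.

```python
# Source parse as (label, children) tuples; leaves are source words.
TREE = ("VP0-5", [
    ("PP0-1", [("N0", ["友達"]), ("P1", ["と"])]),
    ("VP2-5", [
        ("PP2-3", [("N2", ["ご飯"]), ("P3", ["を"])]),
        ("VP4-5", [("V4", ["食べ"]), ("SUF5", ["た"])]),
    ]),
])

# Hypothetical rules: label -> (child indices bound to x0, x1, ..., template).
RULES = {
    "VP0-5": ([0, 1], "x1 x0"),
    "PP0-1": ([0], "with x0"),
    "VP2-5": ([0, 1], "x1 x0"),
    "PP2-3": ([0], "x0"),
    "VP4-5": ([], "ate"),
    "N0": ([], "my friend"),
    "N2": ([], "a meal"),
}


def translate(node):
    """Recursively expand the rule template for this node."""
    label, children = node
    bound_children, template = RULES[label]
    output = []
    for token in template.split():
        if token.startswith("x") and token[1:].isdigit():
            # Variable: translate the child subtree bound to it.
            child = children[bound_children[int(token[1:])]]
            output.append(translate(child))
        else:
            # Terminal: copy the target word as is.
            output.append(token)
    return " ".join(output)


result = translate(TREE)
```

The reordering is carried by templates such as "x1 x0", so no separate pre-ordering step is needed.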
Requirements for a Tree-to-String Model
● Parallel Corpus (e.g. "This is a test . It uses data ." / "これはテストです。 データを使用します。")
● Source Sentence Parser
● Alignments
● Rule Extraction, Rule Scoring, Optimization
→ Tree-to-String Model
Reducing our work load.
● How do I make good Japanese-English preordering rules?!
● How do I make good Japanese-Chinese preordering rules?!
● What about error propagation?
● What if better preordering accuracy doesn't equal better translation accuracy?
[Three X marks on the slide cross out concerns from this list.]
Forest-to-string Translation [Mi+ 08]
I saw a girl with a telescope
[Figure: packed parse forest over the sentence, with nodes PRP0,1 NP0,1 VBD1,2 DT2,3 NN3,4 NP2,4 IN4,5 DT5,6 NN6,7 NP5,7 PP4,7 NP2,7 VP1,7 S0,7]
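A packed forest stores several parses compactly by letting a node have more than one way of being built. The sketch below encodes the nodes named on this slide as a small hypergraph and counts the trees it contains. The node names come from the slide; the hyperedges (which children build which node) are an assumption based on the usual two readings of this sentence, and the data structure is illustrative rather than Travatar's internal format.

```python
from functools import lru_cache

# node -> alternative ways to build it; each alternative is a tuple of children.
# Nodes with no entry are preterminals sitting directly over a word.
FOREST = {
    "S0,7": (("NP0,1", "VP1,7"),),
    "NP0,1": (("PRP0,1",),),
    "VP1,7": (
        ("VBD1,2", "NP2,7"),            # saw [a girl with a telescope]
        ("VBD1,2", "NP2,4", "PP4,7"),   # saw [a girl] [with a telescope]
    ),
    "NP2,7": (("NP2,4", "PP4,7"),),
    "NP2,4": (("DT2,3", "NN3,4"),),
    "PP4,7": (("IN4,5", "NP5,7"),),
    "NP5,7": (("DT5,6", "NN6,7"),),
}


@lru_cache(maxsize=None)
def count_trees(node):
    """Number of distinct trees rooted at this node."""
    alternatives = FOREST.get(node, ())
    if not alternatives:
        return 1  # preterminal: exactly one (trivial) tree
    total = 0
    for children in alternatives:
        product = 1
        for child in children:
            product *= count_trees(child)
        total += product
    return total


num_parses = count_trees("S0,7")
```

Because shared nodes such as NP2,4 and PP4,7 are stored once, the decoder can consider both attachments without committing to a single 1-best parse.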
Travatar Toolkit
● Forest-to-string translation toolkit
● Supports training, decoding
● Includes preprocessing scripts for parsing, etc.
● Many other features (optimization, Hiero, etc...)
Available open source! http://phontron.com/travatar
NAIST WAT System
WAT Results
[Figure: two bar charts, BLEU (axis 0 to 50) and HUMAN (axis 0 to 60), each over en-ja, ja-en, zh-ja, ja-zh, comparing NAIST with Other. Annotations on the bars: +2.2, +2.7, +3.6, +1.8, +13.0, +15.0, +28.3, +3.8]
First place in all tasks!
System Elements
● Travatar! Same as [Neubig & Duh, ACL2014]
● Recurrent Neural Net Language Model
● Pre/post Processing (UNK splitting, transliteration)
● Dictionaries
Recurrent Neural Network LM
● Vector representation → robustness
● Recurrent architecture → longer context
I can eat an apple </s>
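To make the two bullets concrete, here is a minimal Elman-style recurrent language model in plain Python. It is a sketch under stated assumptions, not the model used in the NAIST system: the weights are random and untrained, the hidden size is arbitrary, and `</s>` is reused as the start symbol. It shows only how each word is mapped to a vector and how the hidden state carries the whole history forward.

```python
import math
import random

VOCAB = ["I", "can", "eat", "an", "apple", "</s>"]
HIDDEN = 4  # arbitrary toy size
rng = random.Random(0)


def matrix(rows, cols):
    """Random weight matrix; a real model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


E = matrix(len(VOCAB), HIDDEN)  # word vectors (input weights)
W = matrix(HIDDEN, HIDDEN)      # recurrent weights
U = matrix(len(VOCAB), HIDDEN)  # output weights


def step(hidden, word):
    """Consume one word; return the new hidden state and next-word probabilities."""
    emb = E[VOCAB.index(word)]
    new_hidden = [
        math.tanh(emb[i] + sum(W[i][j] * hidden[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]
    scores = [sum(U[k][i] * new_hidden[i] for i in range(HIDDEN))
              for k in range(len(VOCAB))]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    norm = sum(exps)
    return new_hidden, [e / norm for e in exps]


def sentence_logprob(words):
    """Sum of log P(word | all previous words) over the sentence."""
    hidden = [0.0] * HIDDEN
    previous = "</s>"  # assumption: </s> doubles as the start symbol
    logp = 0.0
    for word in words:
        hidden, probs = step(hidden, previous)
        logp += math.log(probs[VOCAB.index(word)])
        previous = word
    return logp


score = sentence_logprob("I can eat an apple </s>".split())
```

Unlike an n-gram model, nothing here truncates the history: the hidden state passed into `step` depends on every earlier word.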
Pre/post processing
● UNK segmentation (ja-en)
球内部 → 球 内部
試験 管立て → 試験 管 立て
● Kanji Normalization (ja-zh, zh-ja)
イチョウ黄叶 → イチョウ黄葉
臭気鉴定师 → 臭気鑑定師
● Transliteration (ja-en)
Japan インテック → Japan Intekku
● Dictionary addition (ja-en)
膿瘍 → apostema
典型 → archetype
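Two of these steps are simple enough to sketch in Python. The character table below contains only the three mappings visible in the kanji normalization examples, and the greedy longest-match splitter with its tiny vocabulary is an assumed stand-in for UNK segmentation; the slide does not say how the actual system implements either step.

```python
# Kanji normalization: map simplified Chinese characters to their Japanese
# forms. The table holds only the pairs visible in the examples.
KANJI_TABLE = str.maketrans({"叶": "葉", "鉴": "鑑", "师": "師"})


def normalize_kanji(text):
    return text.translate(KANJI_TABLE)


# UNK segmentation: split an unknown token into known pieces.
# KNOWN is a hypothetical vocabulary; greedy longest match is an assumption.
KNOWN = {"球", "内部", "試験", "管", "立て"}


def split_unknown(token, vocab=KNOWN):
    """Return known sub-words covering the token, or [token] if that fails."""
    pieces, start = [], 0
    while start < len(token):
        for end in range(len(token), start, -1):  # try the longest piece first
            if token[start:end] in vocab:
                pieces.append(token[start:end])
                start = end
                break
        else:
            return [token]  # no known piece fits here: leave the token alone
    return pieces


normalized = [normalize_kanji(s) for s in ("イチョウ黄叶", "臭気鉴定师")]
segmented = [split_unknown(t) for t in ("球内部", "管立て")]
```

Tokens that cannot be fully covered by known pieces are returned unchanged, so the step cannot make an input worse than leaving it unknown.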
Conclusion
Future Work
LOSE at next year's WAT. (Make Travatar so easy to use that others can use it to make really good MT systems for Asian languages.)
Starting soon! Training scripts to be available: http://phontron.com/project/wat2014