Summary of Rule-based Reordering Space in Statistical Machine Translation
![Page 1: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/1.jpg)
Paper introduction
Nagaoka University of Technology, Natural Language Processing Laboratory
松本宏
![Page 2: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/2.jpg)
Paper
• Title:
• Rule-based Reordering Space in Statistical Machine Translation
• Author:
• Nicolas Pécheux, Alexandre Allauzen, and François Yvon
• Booktitle:
• Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
• Pages:
• 1800-1806
![Page 3: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/3.jpg)
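In statistical machine translation
• Reordering is important
• The reordering problem involves
• Combinatorial explosion
• Ambiguity
• Rules are needed to narrow the search down to the most plausible orderings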
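To make the combinatorial explosion concrete, the following minimal sketch counts the word orders of a sentence when no constraint is applied. It assumes every permutation of the source positions is allowed; the function name `num_orderings` is introduced here for illustration only.

```python
from math import factorial


def num_orderings(sentence_length: int) -> int:
    """Number of possible word orders when every permutation is allowed."""
    return factorial(sentence_length)


# The count grows factorially with sentence length.
for n in (5, 10, 20):
    print(n, num_orderings(n))
```

Even a 10-word sentence already has more than three million orderings, which is why the search must be restricted.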
![Page 4: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/4.jpg)
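In phrase-based SMT
• Reordering is performed phrase by phrase
• Reordering within a phrase is taken into account
• However, only a restricted search space is explored because of pruning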
![Page 5: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/5.jpg)
Contributions of this paper
1. n-gram SMT system:
• Split into 2 steps
  1. Reordering
  • Build a permutation lattice of the source sentence
  2. Decoding
2. Introduction of the SMT toolkit NCODE
• Crego, Josep, François Yvon, and José Mariño. "Ncode: an open source bilingual n-gram smt toolkit." The Prague Bulletin of Mathematical Linguistics 96 (2011): 49-58.
![Page 6: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/6.jpg)
Reordering
Alignment
Reordering
Reordering rules
![Page 7: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/7.jpg)
Reordering Rules Extraction
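Word order of the source sentence
Associated tag sequence
Word order after reordering
Permutation
Set of permutations
Obtaining reordering rules
For a subsequence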
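The extraction step can be sketched roughly as follows. This is a simplified illustration, not the paper's exact procedure: it assumes a one-to-one word alignment given as `(source_index, target_index)` pairs, and the names `extract_rules` and `max_len` are hypothetical. For every source span whose aligned target positions are contiguous and not in monotone order, it records a rule mapping the span's tag sequence to the observed permutation.

```python
from typing import List, Set, Tuple

# A rule maps a tag sequence to a permutation of its positions.
Rule = Tuple[Tuple[str, ...], Tuple[int, ...]]


def extract_rules(tags: List[str],
                  alignment: List[Tuple[int, int]],
                  max_len: int = 4) -> Set[Rule]:
    """Collect (tag sequence, permutation) rules from one aligned sentence.

    Assumption: each source word is aligned to exactly one distinct
    target position.
    """
    target_of = dict(alignment)
    n = len(tags)
    rules: Set[Rule] = set()
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            span = range(i, j)
            targets = sorted(target_of[k] for k in span)
            # Keep only spans that map to a contiguous target block.
            if targets[-1] - targets[0] != len(span) - 1:
                continue
            # Source positions sorted by target order, relative to the span.
            order = sorted(span, key=lambda k: target_of[k])
            perm = tuple(k - i for k in order)
            if perm != tuple(range(len(span))):  # skip monotone spans
                rules.add((tuple(tags[i:j]), perm))
    return rules


# "the red car" -> "la voiture rouge"
example_tags = ["DET", "ADJ", "NOUN"]
example_alignment = [(0, 0), (1, 2), (2, 1)]
example_rules = extract_rules(example_tags, example_alignment)
```

For the toy pair above, the adjective-noun swap yields a rule for `ADJ NOUN` and one for the enclosing `DET ADJ NOUN` span.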
![Page 8: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/8.jpg)
Reordering Lattices Generation
1. Build a lattice based on the sentence
2. For each word subsequence that matches a reordering rule, add a sub-path
3. NCODE performs beam search for the best path
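The lattice construction can be illustrated with a small sketch. It is a simplified stand-in, not NCODE's implementation: the lattice is a plain edge list, rules are a dictionary from tag sequences to permutations, and the names `build_lattice` and `all_paths` are hypothetical. The monotone path is created first, then each rule match adds a sub-path through fresh intermediate nodes.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, str, int]  # (from_node, word, to_node)
Rules = Dict[Tuple[str, ...], List[Tuple[int, ...]]]


def build_lattice(words: List[str], tags: List[str], rules: Rules) -> List[Edge]:
    """Monotone path over nodes 0..n plus one sub-path per rule match."""
    n = len(words)
    edges: List[Edge] = [(k, words[k], k + 1) for k in range(n)]
    next_node = n + 1  # fresh ids for intermediate nodes
    for i in range(n):
        for j in range(i + 2, n + 1):
            for perm in rules.get(tuple(tags[i:j]), []):
                prev = i
                for step, offset in enumerate(perm):
                    is_last = step == len(perm) - 1
                    # The sub-path rejoins the monotone path at node j.
                    nxt = j if is_last else next_node
                    if not is_last:
                        next_node += 1
                    edges.append((prev, words[i + offset], nxt))
                    prev = nxt
    return edges


def all_paths(edges: List[Edge], start: int, end: int) -> List[List[str]]:
    """Enumerate every word order encoded in the lattice (small inputs only)."""
    outgoing: Dict[int, List[Tuple[str, int]]] = {}
    for src, word, dst in edges:
        outgoing.setdefault(src, []).append((word, dst))

    def walk(node: int) -> List[List[str]]:
        if node == end:
            return [[]]
        return [[w] + rest for w, dst in outgoing.get(node, []) for rest in walk(dst)]

    return walk(start)


words = ["the", "red", "car"]
tags = ["DET", "ADJ", "NOUN"]
lattice = build_lattice(words, tags, {("ADJ", "NOUN"): [(1, 0)]})
orders = all_paths(lattice, 0, len(words))
```

Here the lattice encodes two source orders, the original one and the one with the adjective and noun swapped; a decoder would then search over these paths instead of over all permutations.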
![Page 9: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/9.jpg)
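Experiment
• Data:
• English-French Basic Travel Expression Corpus
• English-French and English-German News Commentary from WMT'12
• Difficulty: English-German is considered much harder than English-French
• SMT tool
• NCODE
• Notation
• m: translation, l: lattice considered, u: target-language order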
![Page 10: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/10.jpg)
• oracle: Tromble, Roy W., et al. "Lattice Minimum Bayes-Risk decoding for statistical machine translation." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2008.
![Page 11: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/11.jpg)
![Page 12: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/12.jpg)
![Page 13: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/13.jpg)
![Page 14: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/14.jpg)
Reordering Space Sizes
![Page 15: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/15.jpg)
Reordering Space Sizes
![Page 16: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/16.jpg)
Generalization
• Rewrite rules that use POS tags
• POS (spos): 12 POS tags
• Enhanced POS (e50pos): 50 POS tags
• Brown classes (classes): clustering
![Page 17: Summary of Rule-based Reordering Space in Statistical Machine Translation](https://reader034.vdocuments.net/reader034/viewer/2022042813/548701b0b479590a0d8b5307/html5/thumbnails/17.jpg)
Alternative Tagsets