Learning Machine Translation

Download Learning Machine Translation

Post on 22-Apr-2015

120 views

Category:

Documents

6 download

Embed Size (px)

TRANSCRIPT

<p>Learning Machine TransLaTionedited by Cyril Goutte, Nicola Cancedda, Marc Dymetman, and George FosterLearning Machine TransLaTionGoutte, Cancedda, Dymetman, and Foster, editorsLearning Machine Translationedited by Cyril Goutte, Nicola Cancedda, Marc Dymetman, and George FosterThe Internet gives us access to a wealth of information in lan-guages we dont understand. The investigation of automated or semi-automated approaches to translation has become a thriving research feld with enormous commercial potential. This volume investigates how machine learning techniques can improve statistical machine translation, currently at the fore-front of research in the feld. The book looks frst at enabling technologiestechnol-ogies that solve problems that are not machine translation proper but are linked closely to the development of a machine translation system. These include the acquisition of bilingual sentence-aligned data from comparable corpora, automatic construction of multilingual name dictionaries, and word alignment. The book then presents new or improved statistical machine translation techniques, including a discriminative training framework for leveraging syntactic information, the use of semi-supervised and kernel-based learning methods, and the combination of multiple machine translation outputs in order to improve overall translation quality.Cyril Goutte and George Foster are researchers in the Inter-active Language Technologies Group at the Canadian Nation-al Research Councils Institute for Information Technology. Nicola Cancedda and Marc Dymetman are researchers in the Cross-Language Technologies Research Group at the Xerox Research Centre Europe.Neural Information Processing seriescontributorsSrinivas Bangalore, Nicola Cancedda, Josep M. Crego, Marc Dymetman, Jakob Elming, George Foster, Jess Gimnez, Cyril Goutte, Nizar Habash, Gholamreza Haffari, Patrick Haffner, Hitoshi Isahara, Stephan Kanthak, Alexandre Klementiev, Gregor Leusch, Pierre Mah, Llus Mrquez, Evgeny Matusov, I. Dan Melamed, Ion Muslea, Hermann Ney, Bruno Pouliquen, Dan Roth, Anoop Sarkar, John Shawe-Taylor, Ralf Steinberger, Joseph Turian, Nicola Ueffng, Masao Utiyama, Zhuoran Wang, Benjamin Wellington, Kenji Yamadaof related interestPredicting structured Dataedited by Gkhan Bakr, Thomas Hofmann, Bernhard Schlkopf, Alexander J. Smola, Ben Taskar, and S. V. N. VishwanathanMachine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learnings greatest challenges: learning functional dependencies between arbitrary input and output domains. This volume presents and ana-lyzes the state of the art in machine learning algorithms and theory in this novel feld.Large-scale Kernel Machinesedited by Lon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason WestonPervasive and networked computers have dramatically reduced the cost of collecting and distributing large datasets. In this context, machine learning algorithms that scale poorly could simply become irrel-evant. We need learning algorithms that scale linearly with the volume of the data while maintaining enough statistical effciency to outperform algorithms that simply process a random subset of the data. This volume offers researchers and engineers practical solutions for learning from large-scale datasets, with detailed descriptions of algorithms and experiments carried out on realistically large datasets. At the same time it offers researchers information that can address the relative lack of theoretical grounding for many useful algorithms.The MIT Press | Massachusetts Institute of Technology | Cambridge, Massachusetts 02142 | http://mitpress.mit.edu978-0-262-07297-7computer science/machine learningLearning Machine TranslationNeural Information Processing SeriesMichael I. Jordan and Thomas Dietterich, editorsAdvances in Large Margin Classiers, Alexander J. Smola, Peter L. Bartlett,Bernhard Sch olkopf, and Dale Schuurmans, eds., 2000Advanced Mean Field Methods: Theory and Practice, Manfred Opper and DavidSaad, eds., 2001Probabilistic Models of the Brain: Perception and Neural Function, Rajesh P. N.Rao, Bruno A. Olshausen, and Michael S. Lewicki, eds., 2002Exploratory Analysis and Data Modeling in Functional Neuroimaging, Friedrich T.Sommer and Andrzej Wichert, eds., 2003Advances in Minimum Description Length: Theory and Applications, Peter D.Gr unwald, In Jae Myung, and Mark A. Pitt, eds., 2005Nearest-Neighbor Methods in Learning and Vision: Theory and Practice, GregoryShakhnarovich, Piotr Indyk, and Trevor Darrell, eds., 2006New Directions in Statistical Signal Processing: From Systems to Brains, SimonHaykin, Jose C. Prncipe, Terrence J. Sejnowski, and John McWhirter, eds., 2007Predicting Structured Data, G okhan Bakr, Thomas Hofmann, Bernhard Sch olkopf,Alexander J. Smola, Ben Taskar, S. V. N. Vishwanathan, eds., 2007Toward Brain-Computer Interfacing, Guido Dornhege, Jose del R. Mill an, ThiloHinterberger, Dennis J. McFarland, Klaus-Robert M uller, eds., 2007LargeScale Kernel Machines, Leon Bottou, Olivier Chapelle, Denis DeCoste, JasonWeston, eds., 2007Learning Machine Translation, Cyril Goutte, Nicola Cancedda, Marc Dymetman,George Foster, eds., 2009Learning Machine TranslationCyril GoutteNicola CanceddaMarc DymetmanGeorge FosterThe MIT PressCambridge, MassachusettsLondon, Englandc 2009 Massachusetts Institute of TechnologyAll rights reserved. No part of this book may be reproduced in any form by any electronicor mechanical means (including photocopying, recording, or information storage and retrieval)without permission in writing from the publisher.For information about special quantity discounts, please email special_sales@mitpress.mit.eduThis book was set in LaTex by the authors and was printed and bound in the United States ofAmericaLibrary of Congress Cataloging-in-Publication DataLearning machine translation / edited by Cyril Goutte. . . [et al.].p. cm. (Neural information processing)Includes bibliographical references and index.ISBN 978-0-262-07297-7 (hardcover : alk. paper) 1. MachinetranslatingStatistical methods. I. Goutte, Cyril.P309.L43 2009418.020285dc22200802932410 9 8 7 6 5 4 3 2 1ContentsSeries Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 A Statistical Machine Translation Primer . . . . . . . . . . . . . 1Nicola Cancedda, Marc Dymetman, George Foster, and Cyril Goutte1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Evaluation of Machine Translation . . . . . . . . . . . . . . . . . . . 31.3 Word-Based MT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.4 Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.5 Phrase-Based MT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181.6 Syntax-Based SMT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261.7 Some Other Important Directions . . . . . . . . . . . . . . . . . . . . 301.8 Machine Learning for SMT . . . . . . . . . . . . . . . . . . . . . . . 321.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36I Enabling Technologies 392 Mining Patents for Parallel Corpora . . . . . . . . . . . . . . . . 41Masao Utiyama and Hitoshi Isahara2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422.3 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442.4 Alignment Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 442.5 Statistics of the Patent Parallel Corpus . . . . . . . . . . . . . . . . 482.6 MT Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 Automatic Construction of Multilingual Name Dictionaries . . 59Bruno Pouliquen and Ralf Steinberger3.1 Introduction and Motivation . . . . . . . . . . . . . . . . . . . . . . . 593.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653.3 Multilingual Recognition of New Names . . . . . . . . . . . . . . . . 683.4 Lookup of Known Names and Their Morphological Variants . . . . . 70vi Contents3.5 Evaluation of Person Name Recognition . . . . . . . . . . . . . . . . 723.6 Identication and Merging of Name Variants . . . . . . . . . . . . . 743.7 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 784 Named Entity Transliteration and Discovery in Multilingual Cor-pora . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79Alexandre Klementiev and Dan Roth4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794.2 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 824.3 Co-Ranking: An Algorithm for NE Discovery . . . . . . . . . . . . . 834.4 Experimental Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 864.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 914.6 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 925 Combination of Statistical Word Alignments Based on MultiplePreprocessing Schemes . . . . . . . . . . . . . . . . . . . . . . . . 93Jakob Elming, Nizar Habash, and Josep M. Crego5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 935.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945.3 Arabic Preprocessing Schemes . . . . . . . . . . . . . . . . . . . . . . 955.4 Preprocessing Schemes for Alignment . . . . . . . . . . . . . . . . . . 965.5 Alignment Combination . . . . . . . . . . . . . . . . . . . . . . . . . 975.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 995.7 Postface: Machine Translation and Alignment Improvements . . . . 1075.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106 Linguistically Enriched Word-Sequence Kernels for DiscriminativeLanguage Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 111Pierre Mahe and Nicola Cancedda6.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116.2 Linguistically Enriched Word-Sequence Kernels . . . . . . . . . . . . 1136.3 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . 1196.4 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 125II Machine Translation 1297 Toward Purely Discriminative Training for Tree-Structured Trans-lation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131Benjamin Wellington, Joseph Turian, and I. Dan Melamed7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1317.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1327.3 Learning Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1347.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1407.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148Contents vii8 Reranking for Large-Scale Statistical Machine Translation . . . 151Kenji Yamada and Ion Muslea8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1518.2 Background . . . . . . . ....</p>

Recommended

View more >