TAUS MT Showcase: Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013
DESCRIPTION
This presentation is part of the MosesCore project, which encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter: #MosesCore
TRANSCRIPT
TAUS MACHINE TRANSLATION SHOWCASE
Moses Past, Present and Future
09:20 – 09:40, Wednesday, 12 June 2013
Hieu Hoang, University of Edinburgh
Statistical Machine Translation with Moses
Hieu Hoang, Localization World 2013
Agenda
• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?
Moses by Hieu Hoang, University of Edinburgh 3
What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Warren Weaver, 1949
What is Statistical Machine Translation?
• NLP application
  – search engines, text mining, etc.
• Big data
  – bi-text from the Internet
    • e.g. multilingual websites, documents
  – large monolingual data
• Learn to translate
  – from previous translations
  – models of language
What is Statistical Machine Translation?
Training
• Training data, linguistic tools: bi-text, monolingual data, dictionary
• SMT system: translation model, language model, lots of numbers…
Using
• Source text
• SMT system: translation model, language model, lots of numbers…
• Source text
What is a model?
thanks to Precision Translation Tools
• Translation model
• Language model
  – (of the target language)

What is a model?
• Translation model
  – source → translation
  – probability
source          target          probability
den Vorschlag   the proposal    0.6227
                's proposal     0.1068
                a proposal      0.0341
                the idea        0.0250
                this proposal   0.0227
                proposal        0.0205
…               …
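
As a rough illustration of how such a table might be used, the sketch below stores the entries from the slide in a Python dictionary and picks the most probable translation for a source phrase. The names `PHRASE_TABLE` and `best_translation` are made up for this example; this is not how Moses stores or queries its phrase table, only a minimal picture of the idea of "source → translation with a probability".

```python
# Toy phrase table using the values shown on the slide.
# The data structure and function name are illustrative assumptions,
# not the Moses phrase-table format or API.
PHRASE_TABLE = {
    "den Vorschlag": [
        ("the proposal", 0.6227),
        ("'s proposal", 0.1068),
        ("a proposal", 0.0341),
        ("the idea", 0.0250),
        ("this proposal", 0.0227),
        ("proposal", 0.0205),
    ],
}


def best_translation(source, table=PHRASE_TABLE):
    """Return the (target, probability) pair with the highest probability,
    or None if the source phrase is not in the table."""
    candidates = table.get(source, [])
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


print(best_translation("den Vorschlag"))
```

In a real decoder the translation model is only one of several scores; the most probable phrase in isolation is not necessarily the one chosen once the language model is taken into account.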
What is a model?
• Language model
  – Likelihood of sentence
  – in target language
text                     probability
I would like             0.489
would like to            0.905
like to commend          0.002
to commend the           0.472
commend the rapporteur   0.147
…                        …
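
One way to read this table is as a trigram language model, where each number is taken to be the probability of the third word given the previous two. Under that assumption (the slide does not say so explicitly), a sentence score is the product of its trigram probabilities, usually summed in log space. The sketch below uses the values from the slide; `UNSEEN_PROB` is a hypothetical floor for trigrams not in the table, and the score ignores sentence-start context for simplicity.

```python
import math

# Trigram probabilities copied from the slide, read here as
# P(third word | first two words). That reading is an assumption.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}

# Hypothetical floor for unseen trigrams; real language models use smoothing.
UNSEEN_PROB = 1e-6


def sentence_log_prob(words, table=TRIGRAM_PROB, unseen=UNSEEN_PROB):
    """Sum the log probabilities of every trigram in the word list."""
    log_prob = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        log_prob += math.log(table.get(trigram, unseen))
    return log_prob


fluent = "I would like to commend the rapporteur".split()
scrambled = "I like would to commend the rapporteur".split()

print(sentence_log_prob(fluent))
print(sentence_log_prob(scrambled))
```

The fluent word order scores higher than the scrambled one because every one of its trigrams appears in the table, which is the sense in which the language model measures the likelihood of a sentence in the target language.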
What is Moses?
• Replacement for Pharaoh
  – Academic software
  – Closed-source
• Open source
• Re-written, clean code
  – More features
• Large developer community
  – Initiated by Hieu Hoang
  – Developed at NLP Workshop
Agenda
• What is Statistical Machine Translation?
• What is Moses?
  – Timeline
  – Common misconceptions
• Coming up
• What can we do for you?
What is Moses? Common Misconceptions
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow
Only works on Linux
• Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OSX 10.7 with MacPorts
  – Ubuntu 12.10, 32 and 64-bit
  – Debian 6.0, 32 and 64-bit
  – Fedora 17, 32 and 64-bit
  – openSUSE 12.2, 32 and 64-bit
• Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OSX
Difficult to use
• Easier compile and install
  – Boost bjam
  – No installation required
• Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends
    • IRSTLM
    • GIZA++ and MGIZA
• Ready-made models trained on Europarl
Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
  – Run end-to-end training
  – http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language-pairs
  – http://www.statmt.org/moses/RELEASE-1.0/models/
Only phrase-based model
  – replacement for Pharaoh
  – extension of Pharaoh
• From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
• Since 2009
  – Hierarchical model
  – Syntactic models
Developed by one person
• ANYONE can contribute
  – 50 contributors
[Chart: ‘git blame’ of Moses repository, share of code per contributor, axis from 0% to 40%]
Slow
thanks to Ken!!
Decoding
[Chart: model score (y-axis, -101.7 to -101.4) against CPU seconds/sentence excluding loading (x-axis, 1 to 5) for Moses, cdec and Joshua]
Slow
• Multithreaded
• Reduced disk IO
  – compress intermediate files
• Reduce disk space requirement
Training
Time (mins)     1-core   2-cores     4-cores     8-cores     Size (MB)
Phrase-based    60       47 (79%)    37 (63%)    33 (56%)    893
Hierarchical    1030     677 (65%)   473 (45%)   375 (36%)   8300
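
The percentages in parentheses appear to be the multi-core training time relative to the single-core time. The sketch below recomputes that ratio, plus the corresponding speedup, from the minutes shown on the slide. The function names are illustrative only, and because the slide's minutes are rounded, the recomputed percentages can differ from the printed ones by about a point.

```python
# Training times in minutes, copied from the slide, keyed by core count.
TRAINING_MINUTES = {
    "Phrase-based": {1: 60, 2: 47, 4: 37, 8: 33},
    "Hierarchical": {1: 1030, 2: 677, 4: 473, 8: 375},
}


def relative_time(minutes_by_cores, cores):
    """Time on `cores` cores as a percentage of the single-core time."""
    return 100.0 * minutes_by_cores[cores] / minutes_by_cores[1]


def speedup(minutes_by_cores, cores):
    """How many times faster training is on `cores` cores than on one."""
    return minutes_by_cores[1] / minutes_by_cores[cores]


for model, minutes in TRAINING_MINUTES.items():
    for cores in (2, 4, 8):
        print(
            f"{model}, {cores} cores: "
            f"{relative_time(minutes, cores):.0f}% of single-core time, "
            f"{speedup(minutes, cores):.2f}x speedup"
        )
```

On these figures the hierarchical model benefits more from extra cores than the phrase-based one, and neither gets close to a linear speedup at 8 cores.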
What is Moses? Common Misconceptions
• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical, syntax model
• Developed by one person → everyone
• Slow → Fastest decoder, multithreaded training, less IO
Coming up…
• Code cleanup
• Incremental training
• Better translation
  – smaller model
  – bigger data
  – faster training and decoding
• Applications
  – CAT tools
  – Speech translation

Applications
Computer-Aided Translation
• EU Project
  – CASMACAT
  – MATECAT
What can we do for you?
  – simpler Moses
  – graphical interface
  – Windows compatibility
  – terminology and glossary
  – incremental training
• What can you do for us?
  – code
  – data
  – funding