
Philipp Koehn
Chief Scientist, Omniscien Technologies
Professor of Computer Science, Johns Hopkins University

Philipp Koehn is Professor of Computer Science at Johns Hopkins University and Chief Scientist for Omniscien Technologies. He also holds the Chair for Machine Translation in the School of Informatics at the University of Edinburgh. Philipp is a leader in the field of statistical machine translation research with over 100 publications. He is the author of the seminal textbook in the field. Under his leadership, the open source Moses system became the de-facto standard toolkit for machine translation in research and commercial deployment.

Philipp led international research projects such as EuroMatrix and CASMACAT. Philipp's research has been funded by the European Union, DARPA, Google, Facebook, Amazon, Bloomberg and several other funding agencies.

Philipp received his PhD in 2003 from the University of Southern California and was a postdoctoral research associate at MIT. He was a finalist in the European Patent Office's European Inventor Award in 2013 and received the Award of Honor from the International Association for Machine Translation in 2015.

At Omniscien Technologies, Philipp has refined machine translation technology for use in real-world deployments and helped to develop methods for data acquisition and refinement. Philipp continues to drive innovation and technological development at Omniscien Technologies.

AI, MT and Language Processing Symposium


Philipp Koehn
Chief Scientist, Omniscien Technologies
Professor of Computer Science, Johns Hopkins University

The recent trend of using deep learning to solve a wide variety of problems in Artificial Intelligence has also reached machine translation, thus establishing a new state-of-the-art approach for this application. This approach is not yet settled by any means. New neural architectures are proposed, and ideas coming from such diverse fields as computer vision, game playing, and speech recognition can be applied to machine translation as well.

At the practical end, we are just learning about the deployment challenges of this technology, since old methods, for example for integrating terminology databases or for domain adaptation, no longer apply.

This presentation will give an overview of the latest developments in research and what this means for practical deployment.


Copyright © 2018 Omniscien Technologies. All Rights Reserved.

Facebook.com/omniscien @omniscientech Omniscien Technologies [email protected]

Research in Translation –What is Exciting and Shows Promise Ahead?

Philipp Koehn


Overview

• Evolution of Machine Translation

• Deep Learning

• Neural Machine Translation

• Challenges

• Looking Forward


Evolution of Machine Translation


Machine Translation Paradigms

• Various approaches:
  • Rule-based (1970s)
  • Word-based (1990s)
  • Phrase-based (2000s)
  • Syntax-based (2010s)
  • Neural-based (2016+)

[Diagram (Vauquois triangle): source → target via lexical transfer, syntax transfer, semantic transfer, or interlingua]


Hype and Reality


Better Machine Learning

• Probabilistic models (1990s)

• Increased use of machine learning (2000s)

• Neural networks (since mid 2010s)


Deep Learning


Two Objectives

Fluency

• Translation must be fluent in the target language

• Need model that assigns a language score to each sentence

Adequacy

• Translation must have same meaning as source sentence

• Need model that assigns a translation score to each sentence


Learning from Data

• Detect patterns in aligned segment pairs


Machine Learning

• Key to success:
  • analyze the problem
  • feature engineering

• For instance, machine translation:
  • What features are relevant for word order?
  • What features are relevant for lexical translation?

[Diagram: input → features → output]


Neural Learning

• Promise: no more feature engineering

• Several steps of processing features automatically discovered

[Diagram: input → hidden → output]


Deep Learning

• More layers

• More complex feature interactions

[Diagram: input → hidden → hidden → hidden → output]


Neural Machine Translation


word2vec

• Task: Predict word in the middle


Neural Network Solution

• Learn mapping with a neural network


Map Word to Embedding

• Vector representation of word

• Mathematically:
  • a matrix multiplication
  • followed by a non-linear activation function
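A quick sketch of why "matrix multiplication" here is just a table lookup: multiplying a one-hot vector by the embedding matrix selects one row. All values below are toy numbers for illustration.

```python
import numpy as np

V, D = 5, 3  # toy vocabulary size and embedding dimension
rng = np.random.default_rng(1)
E = rng.normal(size=(V, D))  # embedding matrix: one row per word

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

word_id = 2
# The "matrix multiplication" from the slide: one-hot vector times matrix...
emb_matmul = one_hot(word_id, V) @ E
# ...which selects exactly one row, so in practice it is implemented as a lookup
emb_lookup = E[word_id]

# A non-linear activation (tanh here) can then follow, as on the slide
activated = np.tanh(emb_matmul)
```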


Visualizing Neural Relationships and Features

Relationships are built much like the human brain: collections of concepts and vocabulary.


Visualizing Neural Relationships and Features

Distance indicates closeness of relationships. Groupings are formed.


Visualizing Neural Relationships and Features

Groups are directly and indirectly interrelated, e.g. Sports + Broadcasting and Entertainment.


Neural Machine Translation

• Recall: two models

• Language model

… to ensure fluent output

• Translation model

… to ensure adequate translations

Language Models

• Sequential language models: predict the next word

I … → I like … → I like to … → I like to learn … → I like to learn about … → I like to learn about machine … → I like to learn about machine translation .
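The next-word prediction task above can be demonstrated without any neural network at all; a count-based bigram model makes the task concrete. The tiny corpus below is hypothetical.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training corpus
sentences = [
    "i like to learn about machine translation",
    "i like to read about machine learning",
    "i like machine translation",
]

# Count bigrams: how often does each word follow the previous one?
follow = defaultdict(Counter)
for s in sentences:
    words = ["<s>"] + s.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        follow[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation after `word`."""
    return follow[word].most_common(1)[0][0]

def complete(prefix, max_len=10):
    """Greedily extend a prefix one predicted word at a time."""
    words = prefix.split()
    while len(words) < max_len:
        nxt = predict_next(words[-1])
        if nxt == "</s>":
            break
        words.append(nxt)
    return " ".join(words)
```

A neural language model replaces the count table with a learned function, but the interface — given the words so far, score the next word — is the same.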


Recurrent Neural Language Model

Predict the first word of a sentence (same as before, just drawn top-down).

[Diagram: given word <s> → embedding → hidden state → predicted word "the"]


Recurrent Neural Language Model

Predict the second word of a sentence, re-using the hidden state from the first word prediction.

[Diagram: given words <s>, the → embeddings → hidden states → predicted words "the", "house"]


Recurrent Neural Language Model

[Diagram: the full sequence: given words <s> the house is big . → predicted words the house is big . </s>]
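The recurrent language model in these slides (given word → embedding → hidden state → predicted word) can be sketched as a forward pass in NumPy. All weights are random and untrained here; the dimensions are toy values chosen for illustration.

```python
import numpy as np

V, D, H = 6, 4, 5  # toy vocabulary, embedding, and hidden sizes
rng = np.random.default_rng(2)
E = rng.normal(0, 0.1, (V, D))    # word embeddings
W_h = rng.normal(0, 0.1, (H, H))  # hidden-to-hidden: re-uses the previous state
W_x = rng.normal(0, 0.1, (H, D))  # embedding-to-hidden
W_o = rng.normal(0, 0.1, (V, H))  # hidden-to-output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_lm(word_ids):
    """Run the recurrent LM over a sentence: at each position, return a
    probability distribution over the next word."""
    h = np.zeros(H)
    predictions = []
    for wid in word_ids:
        x = E[wid]                       # embedding of the given word
        h = np.tanh(W_h @ h + W_x @ x)   # new hidden state re-uses the old one
        predictions.append(softmax(W_o @ h))
    return predictions

# Example sentence as word ids: <s>=0, the=1, house=2, is=3, big=4, .=5
preds = rnn_lm([0, 1, 2, 3, 4, 5])
```

Training would adjust E, W_h, W_x, and W_o so that each distribution puts high probability on the actual next word.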


Encoder Decoder Model

• We predicted the words of a sentence

• Why not also predict their translations?


Encoder Decoder Model

[Diagram: two recurrent networks chained: the encoder reads "the house is big . </s>", then the decoder predicts "das Haus ist groß . </s>" word by word]

• Obviously madness

• Proposed by Google (Sutskever et al. 2014)


Attention Mechanism

• What is missing?

• Alignment of source words to target words

• Solution: attention mechanism
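The attention mechanism can be sketched in a few lines. The 2016-era model in this deck uses an additive (MLP-based) scorer; the version below uses a simpler dot-product score, which illustrates the same idea: score each input position against the decoder state, normalize, and take a weighted sum. All tensors are random toy values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(decoder_state, encoder_states):
    """Dot-product attention sketch: score each input position against the
    current decoder state, normalize the scores into weights (the 'soft
    alignment'), and return the weighted sum of encoder states."""
    scores = encoder_states @ decoder_state  # one score per input word
    weights = softmax(scores)                # alignment distribution over inputs
    context = weights @ encoder_states       # weighted sum = input context
    return context, weights

rng = np.random.default_rng(3)
enc = rng.normal(size=(4, 8))  # 4 input words, hidden size 8
dec = rng.normal(size=8)       # current decoder hidden state
context, weights = attention(dec, enc)
```

The weights play the role of a word alignment: a high weight on input position i means the next output word is mostly translated from word i.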

Neural Machine Translation, 2016

[Architecture diagram: input words → input word embeddings → left-to-right and right-to-left recurrent NNs → alignment → input context → hidden state → output words]

Neural Machine Translation, 2016

[Same architecture diagram as on the previous slide]

• State of the art

• Used by Google, WIPO, Systran, Omniscien…

Input Sentence

[Architecture diagram with the input sentence highlighted]

Encode with Word Embeddings

[Architecture diagram with the input word embeddings highlighted]

Output Sentence

[Architecture diagram with the output words highlighted]

Each Word Predicted by Embedding

[Architecture diagram with the hidden state → output word step highlighted]

Embedding Predicted from Input Context

[Architecture diagram with the input context → hidden state step highlighted]

Input Context Selected By Word Alignment

[Architecture diagram with the alignment step highlighted]

Input Context: Weighted Sum of Input Embeddings

[Architecture diagram with the weighted sum into the input context highlighted]


Benefits

• Each output predicted from:
  • encoding of the full input sentence
  • all previously produced output words

• Word embeddings allow generalization:
  • "cat" and "cats" have similar representations
  • "house" and "home" have similar representations
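The generalization point can be illustrated by comparing embeddings with cosine similarity. The 4-dimensional vectors below are hand-picked hypothetical values (real embeddings are learned, not set by hand), chosen so that related words lie close together.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated ones."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for illustration only
emb = {
    "cat":   np.array([0.9, 0.1, 0.0, 0.2]),
    "cats":  np.array([0.8, 0.2, 0.1, 0.2]),
    "house": np.array([0.0, 0.9, 0.8, 0.1]),
    "home":  np.array([0.1, 0.8, 0.9, 0.1]),
}

cat_cats = cosine(emb["cat"], emb["cats"])
cat_house = cosine(emb["cat"], emb["house"])
house_home = cosine(emb["house"], emb["home"])
```

Because "cat" and "cats" share most of their vector, evidence seen for one carries over to the other, which is exactly the generalization the slide describes.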


WMT 2016 Evaluation (News, English-German)

[Chart: evaluation results, Neural MT vs. Statistical MT]


Challenges


Benefits of Neural Machine Translation

• Evidence of overall better translation quality

• Ability to generalize better from training data

• Better handling of sentence-level context

• Better fluency


Neural Machine Translation is Data-Hungry

[Chart: BLEU score (0 to 30) vs. corpus size (roughly 100 thousand to 1 billion words) for phrase-based SMT, phrase-based SMT with a big language model, and neural MT]


Neural Machine Translation Failures


Adequacy or Fluency?

• Language model may take over

• Output unrelated to input


Fluency vs. Adequacy Errors

• Input

Ich will Kuchen essen ("I want to eat cake")

• Fluency error (more common in SMT)

I want cake eat

• Adequacy error (more common in NMT)

I want to cook chicken


Limited Vocabulary

• Words are encoded as high-dimensional vectors

• Only allows for a limited vocabulary size:
  • words are split into subwords
  • maybe even split into characters?
  • fall-back to dictionaries / phrase-based models
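A sketch of how subword splitting keeps the vocabulary small: segment unknown words greedily into known pieces. This is a simplification of BPE-style segmentation, and the subword vocabulary below is hypothetical.

```python
def split_to_subwords(word, vocab):
    """Greedy longest-match segmentation of a word into known subwords.
    Falls back to single characters for unknown material."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):  # try the longest match first
            piece = word[start:end]
            if piece in vocab or end - start == 1:
                pieces.append(piece)
                start = end
                break
    return pieces

# Hypothetical subword vocabulary (in practice learned from training data)
subword_vocab = {"trans", "lation", "un", "break", "able", "house"}
```

With a few tens of thousands of subwords, any word can be represented, so the network's softmax never needs the full open vocabulary.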


NMT More Susceptible to Noisy Training Data

• More harmed by:
  • alignment errors
  • bad language
  • wrong language on the target side

• Severely harmed by un-translated source text (over-learns to copy)

• Data cleaning more important
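The cleaning step can be sketched as a few simple filters over segment pairs. This is only a minimal illustration of the idea; production pipelines add language identification, deduplication, and more, and the thresholds here are arbitrary.

```python
def clean_parallel_corpus(pairs, max_ratio=2.0):
    """Filter obviously noisy segment pairs before NMT training:
    drop empty segments, un-translated copies of the source (which NMT
    over-learns to copy), and pairs with extreme length ratios (often
    alignment errors)."""
    kept = []
    for src, tgt in pairs:
        if not src.strip() or not tgt.strip():
            continue  # empty side
        if src.strip() == tgt.strip():
            continue  # un-translated copy of the source
        ls, lt = len(src.split()), len(tgt.split())
        if max(ls, lt) / min(ls, lt) > max_ratio:
            continue  # suspicious length ratio: likely misaligned
        kept.append((src, tgt))
    return kept

pairs = [
    ("das Haus ist groß", "the house is big"),         # good pair
    ("das Haus ist groß", "das Haus ist groß"),        # copied source
    ("ja", "yes this is a very long unrelated text"),  # length mismatch
    ("", "empty source"),                              # empty side
]
cleaned = clean_parallel_corpus(pairs)
```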


NMT is Worse Out-of-Domain

• In nearly all cases, SMT was better than NMT when content was out of domain.

• More data is required for NMT to meet domain-specific needs.

• When sufficient data is available, NMT will usually be better than SMT for typical sentences.


Deployment Challenges for Neural MT

• Speed:
  • training takes weeks
  • decoding slower than traditional SMT

• Hardware requirements:
  • GPUs needed ($2,000 each)
  • Google even has specialized hardware

• Process is not transparent:
  • practically impossible to find out "why wrong?"
  • mistakes cannot be easily fixed


Neural Machine Translation – A Mystery?

• Decisions of statistical systems are often hard to understand

• Neural systems: even harder

input → MAGIC → output

• New studies reveal inner workings:
  • attention mechanism
  • word sense disambiguation


Attention States

• Attention mechanism plays role of “word alignment”

• “Soft alignment”: distributed over several input words


Word Sense Disambiguation

[Visualization: deep embedding of the word "right" in the encoder]


NMT vs SMT: What We Know By Now

• In ideal conditions, NMT much better

• Different types of error (fluency vs. adequacy)

• NMT more susceptible to noise

• NMT less robust (out-of-domain, low-resource, etc.)

=> Hybrid approach of Omniscien Technologies


Looking Forward


Attention Sequence-to-Sequence Model

• Based on recurrent neural networks

• Attention mechanism (alignment)

• Standard Approach 2015-2017


Deeper Models

• More layers in encoder and decoder

• Models more complex relationships between words

• Significantly higher performance


Google’s “Transformer” Model

• Self-attention

• Encoder: Input words inform each other

• Decoder: Attention on some previous output words


Facebook’s Convolutional Model

• Hierarchical (“convolutions”) instead of sequential

• Faster (but more limited context)

• In encoder and decoder


Synthesizing Data

• Neural machine translation trained on parallel data

• Improve with monolingual data:
  • back-translate target-language text into the source language
  • add as training data

• Can be iterated (“dual learning”)
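The back-translation step above can be sketched as a small pipeline. The `toy_reverse` stub below stands in for a trained target-to-source MT system and is entirely hypothetical; the point is the data flow, not the translation quality.

```python
def back_translate(monolingual_target, reverse_translate):
    """Synthesize training data: translate target-language sentences back
    into the source language and pair them up. The source side is
    machine-translated (hence noisy), but the target side is clean text."""
    synthetic = []
    for tgt in monolingual_target:
        src = reverse_translate(tgt)
        synthetic.append((src, tgt))
    return synthetic

# Hypothetical stub: a toy word-by-word "reverse translator" for illustration
toy_lexicon = {"the": "das", "house": "Haus", "is": "ist", "big": "groß"}
def toy_reverse(sentence):
    return " ".join(toy_lexicon.get(w, w) for w in sentence.split())

extra_pairs = back_translate(["the house is big"], toy_reverse)
```

The synthetic pairs are then mixed into the parallel training data; iterating the process in both directions is the "dual learning" idea the slide mentions.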


Domain Adapted Models

• Various techniques explored for customization

• One simple, effective method:
  • train a general system on all available data
  • fine-tune on in-domain data
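The two-stage recipe can be sketched with a deliberately tiny model: train on "general" data first, then continue training the same parameters on in-domain data. A one-parameter least-squares model stands in for the NMT system here; the data and learning rate are made up for illustration.

```python
def sgd_fit(w, data, lr=0.05, epochs=100):
    """Fit y ~ w*x by stochastic gradient descent on squared error,
    starting from (and updating) the given parameter w."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)^2
    return w

# Stage 1: "general" data, underlying slope 1.0
general_data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
# Stage 2: "in-domain" data, underlying slope 2.0
in_domain_data = [(1.0, 2.0), (2.0, 4.0)]

w_general = sgd_fit(0.0, general_data)           # train general system
w_tuned = sgd_fit(w_general, in_domain_data)     # fine-tune = keep training
```

Fine-tuning is literally just continued training from the general model's parameters, which is why it is simple to deploy.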


Terminology

• Terminology, brand names with fixed translations

Der neue Neurolierer XVQ-72 ist lieferbar.

Neurolizer XVQ-72

• XML markup

Der neue <x translation="Neurolizer XVQ-72">Neurolierer XVQ-72</x> ist lieferbar.

• Use attention states to detect insertion point
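The XML markup scheme on this slide can be sketched as a pre/post-processing pair: wrap known terms in the source, then substitute the fixed translations in the output. This is a simplified illustration; as the slide notes, a real system would use the attention states to place the term at the right position in the output.

```python
import re

# Terminology entry from the slide: source term -> fixed target translation
terminology = {"Neurolierer XVQ-72": "Neurolizer XVQ-72"}

def mark_terms(source, terms):
    """Wrap each known term in XML markup carrying its fixed translation."""
    for src_term, tgt_term in terms.items():
        source = source.replace(
            src_term, f'<x translation="{tgt_term}">{src_term}</x>')
    return source

def apply_markup(text):
    """Replace each marked span by its fixed translation."""
    return re.sub(r'<x translation="([^"]*)">.*?</x>', r"\1", text)

marked = mark_terms("Der neue Neurolierer XVQ-72 ist lieferbar.", terminology)
```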


Dynamic Software Environment

• Major players have released deep learning frameworks:
  • TensorFlow (Google)
  • PyTorch (Facebook)
  • MXNet (Amazon)

• The Theano framework has been discontinued

• Also: dedicated NMT implementations (faster)

• Quick turn-around from research into deployment


Hardware Developments

New GPUs from NVIDIA in 2018

• Faster, more memory

• Enable deeper models


Facebook.com/omniscien @omniscientech Omniscien Technologies [email protected]

Research in Translation –What is Exciting and Shows Promise Ahead?

Philipp Koehn
