Neural Machine Translation Techniques Used By Google

Sophia Mitchellette

Page 1: Neural Machine Translation Techniques Used By Google

Neural Machine Translation Techniques Used By Google

Sophia Mitchellette

Page 2

Outline

● Introduction

● Neural Networks

● Residual Connections

● Attention Network

● Summary

Page 3

Outline

● Introduction

○ What Is Machine Translation?

○ Human Translation vs. Machine Translation

○ How Machine Translation Systems See Words

● Neural Networks

● Residual Connections

● Attention Network

● Summary

Page 4

Introduction - What Is Machine Translation?

Machine Translation - translation from one language to another performed by a machine instead of a human

Google Translate

● Google's machine translation service

● Behind the scenes: Google's Neural Machine Translation system (GNMT)

● Used by over 500 million people every day

Hello! Bonjour!

Page 5

Introduction - Human Translation

● Humans translate by finding words, sentences and phrases that have the same meaning in both languages.

[Figure: "squirrel" paired with "écureuil" (same meaning), "grenouille" (not the same meaning), and "chien" (not the same meaning).]

http://www.clipartpanda.com/clipart_images/squirrel-clip-art-hd-5521910 http://clipartix.com/frog-clip-art-image-10263/

http://cliparting.com/free-dog-clipart-617/

Page 6

Introduction - Machine Translation

● Machine translation systems look at statistical probabilities to choose the best translation.

● To the machine, words and sentences are not a communication of meaning.

[Figure: for the input "squirrel", candidate translations "écureuil" (96%), "grenouille" (3%), and "chien" (1%).]

Page 7

Introduction - How Machine Translation Systems See Words

● Words and sentences are vectors.

● Word segmentation - each word is represented as one vector

[Figure: the sentence "The squirrels eat the acorns ." segmented into six word vectors x1 through x6, where each vector xi has components xi1, xi2, xi3, ..., xin.]

Page 8

Introduction - How Machine Translation Systems See Words

● The points on the coordinate plane represent two-dimensional word vectors.

[Figure: word vectors plotted as points on a coordinate plane, e.g. the two-dimensional vector (0.8, -0.3).]

Page 9

Outline

● Introduction

● Neural Networks

○ An Overview

○ The Structure of a Node

○ A Neural Network

○ Activation Functions

○ Training a Neural Network

○ Training Error vs. Testing Error

● Residual Connections

● Attention Network

● Summary

Page 10

Neural Networks - An Overview

● Nodes - the building blocks of neural networks.

● Nodes map inputs to outputs.

● mapping = function

[Figure: a neural network with an input layer, hidden layers, and an output layer. Each gray circle is a node; gray squares are inputs.]

Page 11

Neural Networks - The Structure of a Node

Steps of a node:

1. Inputs are multiplied by their weights.

2. Weighted inputs are summed.

3. Summation is run through activation function.

4. Result of activation function is the output.

[Figure: a single node. Inputs x1, x2, ..., xn are multiplied by their weights w1, w2, ..., wn, summed, and run through the activation function f to produce the output y.]

y = f(x1*w1 + x2*w2 + … + xn*wn)
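The four steps above can be sketched in a few lines of Python (a minimal illustration; the input values and weights here are made up):

```python
import math

def node_output(inputs, weights, activation=math.tanh):
    """One node: multiply inputs by their weights, sum, apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

# Two made-up inputs and weights.
y = node_output([0.2, 0.2], [1.0, -0.5])
```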

Page 12

Neural Networks - Activation Functions

● Need non-linear activation functions for more complex data patterns.

● Hyperbolic tangent (tanh) and sigmoid (σ) scale values: tanh squashes any input into (-1, 1), sigmoid into (0, 1).
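A quick check of that squashing behavior in Python (sigmoid is written out by hand since it is not in the standard library):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# tanh squashes values into (-1, 1); sigmoid into (0, 1),
# no matter how large or small the input is.
for v in (-10.0, -1.0, 0.0, 1.0, 10.0):
    assert -1.0 < math.tanh(v) < 1.0
    assert 0.0 < sigmoid(v) < 1.0
```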

Page 13

Neural Networks - A Neural Network

● Our neural network will map English words to their French counterparts.

Page 14

Neural Networks - A Neural Network

● Network has two nodes

● Takes in English word

● Outputs French word

[Figure: a two-node network. The input (x1, x2) is the English word vector; node one computes y1 = tanh(x1*w1) and node two computes y2 = tanh(x2*w2), giving the French word vector (y1, y2).]
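This two-node mapping can be sketched directly (a minimal illustration; "squirrel" is the vector (0.2, 0.2) from the upcoming training table, and the weights are the final values reached in the training slides):

```python
import math

def translate_vector(english_vec, weights):
    """The two-node network: each output component is tanh of the
    matching input component times that node's weight."""
    return [math.tanh(x * w) for x, w in zip(english_vec, weights)]

# "squirrel" = (0.2, 0.2); weights from the last training iteration.
french_vec = translate_vector([0.2, 0.2], [1.683, -2.515])
```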

Page 15

Neural Networks - A Neural Network

[Figure: the same two-node network mapping an English word vector to a French word vector, but with the weights w1 and w2 not yet determined.]

● Network has two nodes

● Takes in English word

● Outputs French word


Page 16

Neural Networks - Training a Neural Network

● Neural network is given examples of what the output should be for given inputs.

Inputs → Desired Outputs:

English Word | English Word Vector | French Word Vector | French Word
squirrel | (0.2, 0.2) | (0.3, -0.4) | écureuils
acorns | (0.6, -0.4) | (0.9, 0.8) | glands
eat | (-0.2, 0.4) | (-0.3, -0.8) | mangent
the | (-0.4, -0.4) | (-0.6, 0.8) | les

Page 17

Neural Networks - Training a Neural Network

● Goal: inputs map to desired vectors

● Network starts with initial weights

● Error evaluated. New weights tried.

● Iteration - an attempt

Iteration | w1 | w2
1 | - | -
2 | - | -
3 | - | -
4 | - | -

Page 18

Neural Networks - Training a Neural Network

Iteration | w1 | w2
1 | 1.059 | 1.059
2 | - | -
3 | - | -
4 | - | -

● Goal: inputs map to desired vectors

● Network starts with initial weights

● Error evaluated. New weights tried.

● Iteration - an attempt

Page 19

Neural Networks - Training a Neural Network

Iteration | w1 | w2
1 | 1.059 | 1.059
2 | 1 | -1.059
3 | - | -
4 | - | -

● Goal: inputs map to desired vectors

● Network starts with initial weights

● Error evaluated. New weights tried.

● Iteration - an attempt

Page 20

Neural Networks - Training a Neural Network

Iteration | w1 | w2
1 | 1.059 | 1.059
2 | 1 | -1.059
3 | 1.252 | -2.222
4 | - | -

● Goal: inputs map to desired vectors

● Network starts with initial weights

● Error evaluated. New weights tried.

● Iteration - an attempt

Page 21

Neural Networks - Training a Neural Network

Iteration | w1 | w2
1 | 1.059 | 1.059
2 | 1 | -1.059
3 | 1.252 | -2.222
4 | 1.683 | -2.515

● Goal: inputs map to desired vectors

● Network starts with initial weights

● Error evaluated. New weights tried.

● Iteration - an attempt
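This trial-and-adjust loop can be sketched in Python using the word pairs from the training table. The slides do not say how the weights are actually adjusted, so this sketch assumes plain gradient descent with a made-up learning rate and starting weights:

```python
import math

# Training pairs from the table: English word vectors -> French word vectors.
inputs  = [(0.2, 0.2), (0.6, -0.4), (-0.2, 0.4), (-0.4, -0.4)]
targets = [(0.3, -0.4), (0.9, 0.8), (-0.3, -0.8), (-0.6, 0.8)]

def mse(w):
    """Mean squared error of the two-node network y_k = tanh(x_k * w_k)."""
    total = 0.0
    for x, t in zip(inputs, targets):
        total += sum((math.tanh(xk * wk) - tk) ** 2
                     for xk, wk, tk in zip(x, w, t))
    return total / len(inputs)

# One iteration = try the weights, evaluate the error, adjust the weights.
w, lr = [1.0, 1.0], 0.5
history = [mse(w)]
for _ in range(100):
    for k in range(2):
        grad = 0.0
        for x, t in zip(inputs, targets):
            y = math.tanh(x[k] * w[k])
            grad += 2 * (y - t[k]) * (1 - y * y) * x[k]
        w[k] -= lr * grad / len(inputs)
    history.append(mse(w))
```

As in the slide's table, the second weight is driven negative because the second input and output components tend to have opposite signs.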

Page 22

Neural Networks - Testing Error vs. Training Error

● Training error - the network's error when run over the data it was trained on.

English Word | English Word Vector | French Word Vector | French Word
squirrel | (0.2, 0.2) | (0.3, -0.4) | écureuils
acorns | (0.6, -0.4) | (0.9, 0.8) | glands
eat | (-0.2, 0.4) | (-0.3, -0.8) | mangent
the | (-0.4, -0.4) | (-0.6, 0.8) | les

Page 23

Neural Networks - Testing Error vs. Training Error

● Testing error - the network's error when run over new and unfamiliar data.

English Word | English Word Vector | French Word Vector | French Word
frogs | (0.6, 0.2) | (0.9, -0.3) | grenouilles
loudly | (0.6, -0.3) | (0.9, 0.6) | bruyamment
croak | (0, -0.4) | (0, -0.8) | croassent
two | (-0.2, -0.2) | (-0.3, 0.1) | deux

Page 24

Outline

● Introduction

● Neural Networks

● Residual Connections

○ Definitions

○ Multi-layer networks

○ Difficulty of Training Deep Neural Networks

○ Plain vs. Residual Mapping

○ Evaluations

● Attention Network

● Summary

Page 25

Residual Connections - Plain vs. Residually Connected Networks

● Plain Neural Network - does not have residual connections

● Residually Connected Neural Network - has residual connections

Page 26

Residual Connections - Multi-layer networks

[Figure: the two-node network from before: inputs x1 and x2, weights w1 and w2, tanh nodes, outputs y1 and y2.]

Page 27

Residual Connections - Multi-layer networks

[Figure: the same two-node network, now boxed up and labeled a "network unit".]

Page 28

Residual Connections - Multi-layer networks

[Figure: a single-layer network: the input layer x feeds one network unit (the output layer), which produces the output y.]

Page 29

Residual Connections - Multi-layer networks

[Figure: the single-layer network again: input layer x, one network unit as the output layer, output y.]

● In a single-layer network:

○ output layer input = input layer

○ output layer output = network output

Page 30

Residual Connections - Multi-layer networks

[Figure: a plain multi-layer network: input layer x0 → hidden layer 1 (a network unit) → x1 → hidden layer 2 (a network unit) → x2 → output layer (a network unit) → y.]

● In a plain multi-layer network:

○ 1st hidden layer input = input layer

○ 1st hidden layer output = 2nd hidden layer input

○ etc.

○ output layer output = network output
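That chaining rule can be sketched directly, reusing the elementwise tanh unit from the earlier slides (the three layers of weights here are made up):

```python
import math

def unit(x, w):
    """One network unit: elementwise tanh(x * w) on a vector."""
    return [math.tanh(xi * wi) for xi, wi in zip(x, w)]

def plain_network(x0, layer_weights):
    """Plain multi-layer network: each layer's output is the next
    layer's input; the last layer's output is the network output."""
    x = x0
    for w in layer_weights:  # hidden layer 1, hidden layer 2, ..., output layer
        x = unit(x, w)
    return x

# Three layers of made-up weights.
y = plain_network([0.2, 0.2], [[1.0, 1.0], [0.5, -0.5], [2.0, 1.0]])
```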

Page 31

Residual Connections - The Difficulty of Training Plain DNNs

● Deep Neural Networks (DNNs):

○ Have more than one hidden layer

○ Are more accurate

○ Can account for more complex data patterns

● Plain DNNs are more difficult to train.

● Residually connected DNNs have residual connections to help with training.

[Figure: a deep network: input vector x0 → hidden layer 1 → x1 → hidden layer 2 → x2 → hidden layer 3 → ... → xn-1 → output layer → output vector y, each layer a network unit.]

Page 32

Residual Connections - The Difficulty of Training Plain DNNs

● He et al.'s (2015) testing revealed that plain DNNs suffered from higher training error and, as a result, higher testing error.

[Figure: bar charts of training error and testing error on a 0%-20% axis.]

Page 33

Residual Connections - The Difficulty of Training Deep Neural Networks

● Plain DNNs are more difficult to train because:

○ Different parts of the output lose accuracy

○ It is difficult to achieve identity mappings

○ Identity mapping - output = input

[Figure: the deep network again: input vector x0 → hidden layers 1-3 → xn-1 → output layer → output vector y.]

Page 34

Residual Connections - Plain vs. Residual Mappings

● Perfectly retyping the original paragraph = identity mapping.

Original paragraph:

"Parks are lovely places to feed the birds. Some people have picnics in parks. Frisbee is a common activity to play in the park. If the park has a hill, then people might go sledding in the winter season."

Re-typed paragraph / Plain Neural Network:

"Parks are lvely places to feed the birds. Some people have picnics in parks. Frisbeee is a common activity to play in the park. If the park has a hill, then peeple might go sledding in the winter season."

Page 35

Residual Connections - The Difficulty of Training Deep Neural Networks

● Example:

○ x-axis value is perfect

○ y-value needs improvement

[Figure: the word vector plotted on x-y axes next to the deep network diagram (input vector x0 → hidden layers → output vector y).]

Page 36

Residual Connections - The Difficulty of Training Deep Neural Networks

● Example:

○ x-axis value is perfect

○ y-value needs improvement

○ Goal: Maintain x, improve y

[Figure: the word vector plotted before and after adjustment, next to the deep network diagram.]

Page 37

Residual Connections - The Difficulty of Training Deep Neural Networks

● Example:

○ x-axis value is perfect

○ y-value needs improvement

○ Goal: Maintain x, improve y

○ Reality: x degrades, y improves

[Figure: the word vector plotted before and after adjustment, next to the deep network diagram.]

Page 38

Residual Connections - Plain vs. Residual Mappings

● More room for error when re-typing a paragraph than when copy-and-pasting

Original paragraph:

"Parks are lovely places to feed the birds. Some people have picnics in parks. Frisbee is a common activity to play in the park. If the park has a hill, then people might go sledding in the winter season."

Re-typed paragraph / Plain Neural Network:

"Parks are lvely places to feed the birds. Some people have picnics in parks. Frisbe is a common activity to play in the park. If the park has a hill, then peeple might go sledding in the winter season."

Copy-and-pasted paragraph / Residually Connected Network:

"Parks are lovely places to feed the birds. Some people have picnics in parks. Frisbee is a common activity to play in the park. If the park has a hill, then people might go sledding in the winter season."

Page 39

Residual Connections - Structure

Residual Connections

● layer i learns the mapping m_i = x_i - x_(i-1)

● x_i = x_(i-1) when the weights are 0

[Figure: layers i and i+1, with and without a residual connection. With a residual connection, layer i's network unit produces m_i while the connection carries x_(i-1) around it, so x_i = x_(i-1) + m_i. Without a residual connection, the network unit alone must map x_(i-1) to x_i.]
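A residual layer can be sketched with the same elementwise tanh unit as the earlier slides; the addition form x_i = x_(i-1) + m_i follows from the mapping above:

```python
import math

def unit(x, w):
    """One network unit: elementwise tanh(x * w)."""
    return [math.tanh(xi * wi) for xi, wi in zip(x, w)]

def residual_layer(x_prev, w):
    """Layer i with a residual connection: the unit learns only the
    "edits" m_i, and the input is carried around and added back."""
    m = unit(x_prev, w)                            # the edits
    return [xp + mi for xp, mi in zip(x_prev, m)]  # copy-paste + edits

# With all weights 0, tanh(0) = 0, so the layer is an exact identity mapping.
out = residual_layer([0.8, -0.3], [0.0, 0.0])
```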

Page 40

Residual Connections - Structure

[Figure: the same with/without-residual-connection diagram; the plain side is now annotated with the analogy "Retyping → Result".]

Page 41

Residual Connections - Structure

[Figure: the same diagram; the residual side gains the annotation "Copy" (the input is copied around the unit), alongside "Retyping → Result" on the plain side.]

Page 42

Residual Connections - Structure

[Figure: the same diagram; the residual side is now annotated "Copy" and "Paste" (the copied input is added back in).]

Page 43

Residual Connections - Structure

[Figure: the same diagram; the residual side's network unit is additionally annotated "Edits" (it contributes only the changes m_i).]

Page 44

Residual Connections - Structure

[Figure: the full analogy. On the residual side, the input is "Copied", the network unit makes "Edits" (m_i), and the residual connection "Pastes" the copy back in. On the plain side, "Retyping" alone produces the "Result".]

Page 45

Residual Connections - Evaluations

● Google comparisons not available.

● He et al. (2015) compared four networks:

○ an 18-layer plain network

○ a 34-layer plain network

○ an 18-layer residual network

○ a 34-layer residual network

[Figure: architecture diagrams of the four networks: 18-layer plain, 18-layer residual, 34-layer plain, 34-layer residual.]

Page 46

Residual Connections - Evaluations

Testing Error | plain | residual
18 layers | 27.94% | 27.88%
34 layers | 28.54% | 25.03%

[Figure: bar charts of testing error (20%-60% axis) for the plain and residually connected networks. He et al. (2015)]

Page 47

Outline

● Introduction

● Neural Networks

● Residual Connections

● Attention Network

○ Recurrent Neural Networks

○ Encoder-Decoder Model

○ Attention Mechanism

○ Evaluations

● Summary

Page 48

Attention Networks - Recurrent Neural Networks

● Recurrent Neural Networks (RNNs) loop over the same unit multiple times.

○ Each iteration of the loop is known as a time step.

[Figure: an RNN translating "The squirrels eat the acorns." into "Les écureuils mangent les glands.", one time step per word: RNN1 maps "The" → "Les", RNN2 "squirrels" → "écureuils", RNN3 "eat" → "mangent", RNN4 "the" → "les", RNN5 "acorns" → "glands", and a final step maps "." → ".".]

Page 49

Attention Networks - Recurrent Neural Networks

● Bidirectional RNN - information travels forwards and backwards, giving words the context of the words before and after them.

[Figure: the same unrolled translation diagram, now with information flowing in both directions between time steps.]

Page 50

Attention Networks - Encoder-Decoder

● Two RNNs

● Encoder network - produces the context vector

● Context vector - contains the sentence information

● Decoder network - produces the translated sentence

[Figure: the encoder network reads "The squirrels eat the acorns." and produces the context vector [c1, c2, c3, ..., cn], from which the decoder network produces "Les écureuils mangent les glands."]
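The encoder-decoder flow can be sketched in heavily simplified form. This sketch is only an illustration of the data flow: the hidden state and context "vector" are single numbers, the recurrence weights are made up, and the real GNMT system uses stacked LSTM layers rather than this toy step:

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One RNN time step: new hidden state from the old state and the input."""
    return math.tanh(w_h * h + w_x * x)

def encode(word_values):
    """Encoder RNN: fold the whole sentence into one fixed-size context."""
    h = 0.0
    for x in word_values:
        h = rnn_step(h, x)
    return h  # the context "vector" (a single number in this sketch)

def decode(context, n_words):
    """Decoder RNN: unroll from the fixed context, one output per word."""
    h, outputs = context, []
    for _ in range(n_words):
        h = rnn_step(h, context)
        outputs.append(h)
    return outputs

context = encode([0.2, 0.6, -0.2, -0.4, 0.9])  # one value per input word
outputs = decode(context, 5)
```

Note that however long the input sentence is, everything the decoder sees is squeezed through `context`, which is the fixed-size bottleneck discussed on the next slides.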

Page 51

Attention Networks - Encoder-Decoder

● Issues with longer sentences

● Context vector is a fixed size

[Figure: the same encoder-decoder diagram with its fixed-size context vector [c1, c2, c3, ..., cn].]

Page 52

Attention Networks - Encoder-Decoder

An admitting privilege is the right of a doctor to admit a patient to a hospital or a medical centre to carry out a diagnosis or a procedure, based on his status as a health care worker at a hospital.

Un privilège d’admission est le droit d’un médecin de reconnaître un patient à l’hôpital ou un centre médical d’un diagnostic ou de prendre un diagnostic en fonction de son état de santé.

[Figure: the encoder-decoder diagram squeezing this long sentence through the fixed-size context vector.]

● Issues with longer sentences

● Context vector is a fixed size

Page 53

Attention Networks - Translation Comparison

Input Sentence:

An admitting privilege is the right of a doctor to admit a patient to a hospital or a medical centre to carry out a diagnosis or a procedure, based on his status as a health care worker at a hospital.

Encoder-Decoder Translation:

Un privilège d’admission est le droit d’un médecin de reconnaître un patient à l’hôpital ou un centre médical d’un diagnostic ou de prendre un diagnostic en fonction de son état de santé. [ending means "based on his state of health"]

Attention Network Translation:

Un privilège d’admission est le droit d’un médecin d’admettre un patient à un hôpital ou un centre médical pour effectuer un diagnostic ou une procédure, selon son statut de travailleur des soins de santé à l’hôpital.

[Examples from Bahdanau et al. (2015)]

Page 54

Attention Networks - Addition of an Attention Mechanism

● The attention network model extends the encoder-decoder model.

● Three networks:

○ Encoder Network

○ Attention Network

○ Decoder Network

[Figure: the attention network model. The encoder reads "The squirrels eat the acorns." and produces hidden state vectors h1-h5; the attention network weights them (w1-w5) to form the context vector ct at time step t; the decoder produces "Les écureuils mangent les glands."]

Page 55

Attention Networks - Addition of an Attention Mechanism

● The encoder produces hidden state vectors.

● Hidden state vectors:

○ One for each input word

○ Include information about the whole input sentence

○ Focus strongly on what surrounds their word

[Figure: the attention model diagram, highlighting the encoder producing hidden state vectors h1-h5 from "The squirrels eat the acorns."]

Page 56

Attention Networks - Addition of an Attention Mechanism

[Figure: the same diagram and bullet points repeated from the previous slide.]

Page 57

Attention Networks - Addition of an Attention Mechanism

● There is a context vector for each output word.

● The decoder uses the t-th context vector to translate the t-th word.

[Figure: the attention model diagram: encoder hidden states h1-h5, attention weights w1-w5, context vector ct at time step t, decoder producing "Les écureuils mangent les glands."]

Page 58

Attention Networks - Addition of an Attention Mechanism

[Figure: the same diagram at time step 3: the decoder uses context vector c3 to translate the third output word.]

Page 59

Attention Networks - Addition of an Attention Mechanism

● The t-th context vector is the sum of the weighted hidden state vectors.

● The weights are recalculated for each time step.

○ They determine the strength of each input word's effect on the output word.

[Figure: the attention model diagram: hidden state vectors h1-h5 are weighted by w1-w5 and summed into the context vector ct.]
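The weighted sum above can be sketched directly. In Bahdanau et al.'s model the alignment scores come from a small learned network; here they are made-up numbers, and a softmax turns them into weights that sum to 1:

```python
import math

def softmax(scores):
    """Normalize alignment scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(hidden_states, scores):
    """c_t = sum of the hidden state vectors weighted by the attention weights."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

# Five 2-D hidden states (one per input word) and made-up alignment
# scores that favor the third word.
h = [[0.1, 0.2], [0.4, -0.3], [0.9, 0.5], [-0.2, 0.1], [0.3, -0.6]]
c_t = context_vector(h, scores=[0.1, 0.2, 2.0, 0.1, 0.0])
```

Because the weights are positive and sum to 1, the context vector is a weighted average, pulled toward the hidden states of the most relevant input words.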

Page 60

Attention Mechanisms - Word Influence

[Figure: a word-influence grid aligning "The squirrels eat the acorns ." with "Les écureuils mangent les glands .".]

Page 61

Attention Networks - Addition of an Attention Mechanism

● Attention mechanism

○ Gives the decoder more information for longer sentences, and less information for shorter sentences

● Attention Network

○ Determines the weights of the hidden state vectors

○ Takes in information from the decoder and the hidden state vectors

[Figure: the attention model diagram: encoder, hidden state vectors h1-h5, attention network producing weights w1-w5, context vector ct, decoder.]

Page 62

Attention Networks - Evaluations

● Google comparisons not available.

● Bahdanau et al. (2015) evaluated an encoder-decoder model against an attention network model using the ACL WMT ’14 dataset.

● Higher BLEU scores mean the translation is closer to a human translation.

Model | BLEU Score
RNNencdec-30 (Encoder-Decoder Model) | 13.93
RNNencdec-50 (Encoder-Decoder Model) | 17.82
RNNsearch-30 (Attention Network Model) | 21.50
RNNsearch-50 (Attention Network Model) | 26.75
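At the core of BLEU is modified n-gram precision: the fraction of the candidate's n-grams that also appear in the reference, with counts clipped so a repeated word cannot be over-credited. This is only that core piece (full BLEU also combines several n-gram orders and a brevity penalty), and the example sentence with "noix" is made up:

```python
from collections import Counter

def ngram_precision(reference, hypothesis, n):
    """Fraction of hypothesis n-grams that also appear in the reference,
    with clipped counts (the core of BLEU's modified precision)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    hyp, ref = ngrams(hypothesis), ngrams(reference)
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    total = sum(hyp.values())
    return overlap / total if total else 0.0

ref = "les écureuils mangent les glands .".split()
hyp = "les écureuils mangent les noix .".split()
p1 = ngram_precision(ref, hyp, 1)  # unigram precision: 5 of 6 tokens match
```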

Page 63

Summary

[Figure: the residual connection diagram (layers i and i+1, network units, m_i, the residual connection) and the attention network model diagram (encoder, hidden state vectors h1-h5, attention network, weights w1-w5, context vector ct, decoder), labeled "Residual Connections" and "Attention Network Model".]

Page 64

Acknowledgements

Thank you for your time and attention!

Thank you to my advisor Elena Machkasova, KK Lamberty and Mitchell Finzel for your guidance and feedback.

Page 65

References

[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR), 2015.

[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[3] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.

[4] M. Sundermeyer, H. Ney, and R. Schlüter. From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3), March 2015.

Page 66

Questions?

Page 67

Page 68

Attention Mechanisms - Word Influence

English Sentence:

The girls played tag and then they became tired.

Incorrect French Translation (masculine agreement, "ils sont devenus"):

Les filles ont joué une étiquette et ensuite ils sont devenus fatigués.

Correct French Translation (feminine agreement, "elles sont devenues"):

Les filles ont joué une étiquette et ensuite elles sont devenues fatiguées.