
Fundamentals of Machine Learning for Machine Translation

Dr. John D. Kelleher
ADAPT Centre for Digital Content Technology [1]
Dublin Institute of Technology, Ireland

Translating Europe Forum 2016
Brussels, Charlemagne Building, Rue de la Loi 170
27th October 2016

[1] The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.


Outline

Basic Building Blocks: Neurons

What is a Neural Network?

Word Embeddings

Recurrent Neural Networks

Encoders

Decoders (Language Models)

Neural Machine Translation

Conclusions


What is a function?

A function maps a set of inputs (numbers) to an output (number), e.g.:

sum(2, 5, 4) → 11
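As a concrete, purely illustrative sketch, this sum function is easy to write in Python (the name sum_fn is ours, chosen to avoid shadowing Python's built-in sum):

```python
def sum_fn(*numbers):
    """Map a set of input numbers to a single output number."""
    total = 0
    for n in numbers:
        total += n
    return total

print(sum_fn(2, 5, 4))  # -> 11
```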


What is a weightedSum function?

weightedSum([n1, n2, ..., nm], [w1, w2, ..., wm])
    = (n1 × w1) + (n2 × w2) + ... + (nm × wm)

where [n1, ..., nm] are the input numbers and [w1, ..., wm] are the weights. For example:

weightedSum([3, 9], [−3, 1])
    = (3 × −3) + (9 × 1)
    = −9 + 9
    = 0
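A minimal Python sketch of weightedSum (the function name follows the slide; the implementation itself is ours):

```python
def weighted_sum(numbers, weights):
    """Multiply each input number by its weight and add up the results."""
    return sum(n * w for n, w in zip(numbers, weights))

print(weighted_sum([3, 9], [-3, 1]))  # -> 0
```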


What is an activation function?

An activation function takes the output of our weightedSum function and applies another mapping to it.

[Figure: the logistic function, logistic(x), plotted for x from −10 to 10: an S-shaped curve that maps every input onto the range 0.00 to 1.00]


What is an activation function?

activation = logistic(weightedSum([n1, n2, ..., nm], [w1, w2, ..., wm]))

For example:

logistic(weightedSum([3, 9], [−3, 1]))
    = logistic((3 × −3) + (9 × 1))
    = logistic(−9 + 9)
    = logistic(0)
    = 0.5


What is a Neuron?

The simple list of operations that we have just described defines the fundamental building block of a neural network: the Neuron.

Neuron = activation(weightedSum([n1, n2, ..., nm], [w1, w2, ..., wm]))

where [n1, ..., nm] are the input numbers and [w1, ..., wm] are the weights.
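A minimal Python sketch of such a neuron, wiring together the weightedSum and logistic functions defined above (the names follow the slides; the code is illustrative):

```python
import math

def weighted_sum(numbers, weights):
    return sum(n * w for n, w in zip(numbers, weights))

def logistic(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    """A neuron: a weighted sum of the inputs, passed through an activation."""
    return logistic(weighted_sum(inputs, weights))

print(neuron([3, 9], [-3, 1]))  # -> 0.5
```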


What is a Neural Network?

[Figure: a feed-forward neural network with an input layer (I1, I2), a hidden layer (H1, H2, H3), and an output layer (O1, O2)]
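A hedged sketch of how this 2-3-2 network computes its outputs, treating every unit as the neuron defined above; the weight values are invented purely for illustration:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """Compute one layer: each row of weights defines one neuron."""
    return [logistic(sum(n * w for n, w in zip(inputs, row)))
            for row in weight_rows]

# Invented weights: three hidden neurons with two inputs each,
# then two output neurons with three inputs each.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
output_weights = [[1.0, -1.0, 0.5], [0.2, 0.7, -0.6]]

inputs = [3.0, 9.0]
hidden = layer(inputs, hidden_weights)   # H1, H2, H3
outputs = layer(hidden, output_weights)  # O1, O2
print(outputs)
```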


Where do the weights come from?

[Figure: the same network with a weight on every connection: w1–w6 on the edges from the input layer (I1, I2) to the hidden layer (H1, H2, H3), and w7–w12 on the edges from the hidden layer to the output layer (O1, O2)]


Training a Neural Network

- We train a neural network by iteratively updating the weights.

- We start by randomly assigning weights to each edge.

- We then show the network examples of inputs and expected outputs, and update the weights using backpropagation so that the network's outputs match the expected outputs (see the sketch below).

- We keep updating the weights until the network is working the way we want.
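A minimal sketch of that training loop in Python, for a single neuron and a single training example. Real backpropagation computes exact gradients through every layer; here, purely for illustration, each weight is nudged using a finite-difference estimate of the error gradient (all names and values are our own assumptions):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    return logistic(sum(n * w for n, w in zip(inputs, weights)))

def error(weights, inputs, expected):
    """Squared difference between the network's output and the target."""
    return (neuron(inputs, weights) - expected) ** 2

inputs, expected = [3.0, 9.0], 0.9  # one example: inputs and expected output
weights = [0.1, -0.1]               # start with (pseudo-)random weights
rate, eps = 0.5, 1e-6               # learning rate, finite-difference step

for step in range(1000):
    for i in range(len(weights)):
        bumped = list(weights)
        bumped[i] += eps
        grad = (error(bumped, inputs, expected)
                - error(weights, inputs, expected)) / eps
        weights[i] -= rate * grad   # move this weight downhill

print(neuron(inputs, weights))      # close to 0.9 after training
```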


Word Embeddings

Each word is represented by a vector of numbers that positions the word in a multi-dimensional space, e.g.:

king  = <55, −10, 176, 27>

man   = <10, 79, 150, 83>

woman = <15, 74, 159, 106>

queen = <60, −15, 185, 50>


Word Embeddings

vec(King) − vec(Man) + vec(Woman) ≈ vec(Queen) [2]

[2] Linguistic Regularities in Continuous Space Word Representations (Mikolov et al., 2013)
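Checking the analogy with the toy vectors from the previous slide (these four vectors are small enough to verify by hand; real embeddings have hundreds of dimensions and the match is only approximate):

```python
king  = [55, -10, 176, 27]
man   = [10,  79, 150, 83]
woman = [15,  74, 159, 106]
queen = [60, -15, 185, 50]

# king - man + woman, computed component by component
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)           # [60, -15, 185, 50]
print(result == queen)  # True for these toy vectors
```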


Recurrent Neural Networks

A particular type of neural network that is useful for processing sequential data (such as language) is a Recurrent Neural Network.

Using an RNN we process our sequential data one input at a time.

In an RNN the outputs of some of the neurons for one input are fed back into the network as part of the next input.
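A hedged Python sketch of a single RNN step: the hidden activations from the previous input (the memory) are combined with the current input to produce the new hidden state. The weights and sizes are invented for illustration:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def rnn_step(x, h_prev, w_input, w_memory):
    """One time step: mix the current input x with the previous
    hidden state h_prev to produce the new hidden state."""
    return [logistic(sum(xi * wi for xi, wi in zip(x, w_in_row))
                     + sum(hi * wm for hi, wm in zip(h_prev, w_mem_row)))
            for w_in_row, w_mem_row in zip(w_input, w_memory)]

# Invented weights for a 1-dimensional input and 3 hidden units.
w_input  = [[0.5], [-0.3], [0.8]]
w_memory = [[0.1, 0.2, 0.0],
            [0.0, 0.4, -0.1],
            [0.3, 0.0, 0.2]]

h = [0.0, 0.0, 0.0]              # the memory buffer starts empty
for x in [[1.0], [2.0], [3.0]]:  # a sequence of three inputs
    h = rnn_step(x, h, w_input, w_memory)
print(h)
```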


Recurrent Neural Networks

[Figure: an RNN with an input layer (I1), a hidden layer (H1, H2, H3), a memory buffer (M1, M2, M3), and an output layer (O1); at each step the hidden-layer activations are copied into the memory buffer and fed back into the hidden layer alongside the next input]


Recurrent Neural Networks

[Figure: Recurrent Neural Network, drawn abstractly: the input xt and the previous hidden state ht−1 (the memory) feed into the hidden state ht, which produces the output yt]


Recurrent Neural Networks

[Figure: RNN Unrolled Through Time: inputs x1, x2, x3, ..., xt, xt+1 feed hidden states h1, h2, h3, ..., ht, ht+1, each of which produces an output y1, y2, y3, ..., yt, yt+1 and passes its activations on to the next step]


Recurrent Neural Networks

An RNN can be used in two different ways:

1. RNN Encoders

2. RNN Language Models


Encoders

[Figure: Using an RNN to Generate an Encoding of a Word Sequence: the input words Word1, Word2, ..., Wordm, <eos> are fed in one at a time, producing hidden states h1, h2, ..., hm; the final hidden state, C, is the encoding of the whole sequence]
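A sketch of that encoding loop in Python, reusing rnn_step, w_input and w_memory from the RNN sketch above; the toy embedding table (including the <sos> token used by the decoder sketch later) is our own invention:

```python
# Toy 1-dimensional word embeddings, invented for illustration.
EMBEDDINGS = {"Life": [0.2], "is": [0.5], "beautiful": [0.9],
              "<sos>": [0.1], "<eos>": [0.0]}

def encode(words):
    """Feed the sentence into the RNN one word at a time; the final
    hidden state is the encoding C of the whole word sequence."""
    h = [0.0, 0.0, 0.0]                    # empty memory buffer
    for word in words + ["<eos>"]:
        h = rnn_step(EMBEDDINGS[word], h, w_input, w_memory)
    return h                               # this is C

C = encode(["Life", "is", "beautiful"])
print(C)
```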


Language Models

[Figure: RNN Language Model Unrolled Through Time: at each step the input is the current word (Word1, Word2, Word3, ..., Wordt) and the output is the model's prediction of the next word (∗Word2, ∗Word3, ∗Word4, ..., ∗Wordt+1), computed via hidden states h1, h2, h3, ..., ht]


Decoder

[Figure: Using an RNN Language Model to Generate (Hallucinate) a Word Sequence: only the first word, Word1, is supplied; each predicted word (∗Word2, ∗Word3, ∗Word4, ..., ∗Wordt+1) is fed back in as the input for the next step]
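A sketch of that generation loop, reusing the definitions above. The next_word function is a toy stand-in: a real language model scores every vocabulary word from the hidden state (for example with a softmax) and picks the most probable one:

```python
def next_word(h):
    """Toy stand-in for the real prediction step."""
    return "beautiful" if sum(h) > 1.5 else "<eos>"

def decode(h, first_word="<sos>", max_len=10):
    """Generate a word sequence, feeding each predicted word back in
    as the next input, until <eos> is produced (or max_len is hit)."""
    word, output = first_word, []
    while word != "<eos>" and len(output) < max_len:
        h = rnn_step(EMBEDDINGS[word], h, w_input, w_memory)
        word = next_word(h)
        output.append(word)
    return output
```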


Encoder-Decoder Architecture

[Figure: Sequence to Sequence Translation using an Encoder-Decoder Architecture: the encoder reads Source1, Source2, ..., <eos> through hidden states h1, h2, ... into the encoding C; the decoder then generates Target1, Target2, ..., <eos> through its own hidden states d1, ..., dn]
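Putting the two sketches together, translation is then just encoding followed by decoding, seeded with the start token (the toy vocabulary is the same on both sides here, purely to exercise the pipeline):

```python
def translate(source_words):
    """Encode the source sentence, then decode a target sentence from C."""
    C = encode(source_words)
    return decode(C, first_word="<sos>")

print(translate(["Life", "is", "beautiful"]))
```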


Neural Machine Translation

[Figure: Example Translation using an Encoder-Decoder Architecture: the encoder reads the French source "belle est vie La <eos>" (the sentence "La vie est belle" presented in reverse order) through hidden states h1–h4 into the encoding C; the decoder then generates "Life is beautiful <eos>" through hidden states d1–d3]


Conclusions

- An advantage of the encoder-decoder architecture is that the system processes the entire input before it starts translating.

- This means that the decoder can use both what it has already generated and the entire source sentence when generating the next word in the translation.


Conclusions

- There is ongoing research on the best way to present the source sentence to the encoder.

- There is also ongoing research on giving the decoder the ability to attend to different parts of the input during translation.

- There is also interesting work on improving how these systems handle idiomatic language.


Thank you for your attention

[email protected]

@johndkelleher

www.comp.dit.ie/jkelleher

www.machinelearningbook.com

Acknowledgements: The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.