
Fundamentals of Machine Learning for Machine Translation

Dr. John D. Kelleher, ADAPT Centre for Digital Content Technology¹

Dublin Institute of Technology, Ireland

Translating Europe Forum 2016, Brussels, Charlemagne Building, Rue de la Loi 170

27th October 2016

¹The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.


Outline

Basic Building Blocks: Neurons

What is a Neural Network?

Word Embeddings

Recurrent Neural Networks

Encoders

Decoders (Language Models)

Neural Machine Translation

Conclusions

What is a function?

A function maps a set of inputs (numbers) to an output (number)

sum(2, 5, 4) → 11


What is a weightedSum function?

weightedSum([n1, n2, ..., nm], [w1, w2, ..., wm])
    = (n1 × w1) + (n2 × w2) + ... + (nm × wm)

where [n1, n2, ..., nm] are the input numbers and [w1, w2, ..., wm] are the weights.

For example:

weightedSum([3, 9], [−3, 1])
    = (3 × −3) + (9 × 1)
    = −9 + 9
    = 0
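A minimal sketch of this calculation in Python (the function name and code are my own illustration, not from the slides):

```python
def weighted_sum(inputs, weights):
    """Multiply each input number by its weight and add up the results."""
    assert len(inputs) == len(weights), "inputs and weights must be the same length"
    return sum(n * w for n, w in zip(inputs, weights))

print(weighted_sum([3, 9], [-3, 1]))  # (3 * -3) + (9 * 1) = 0
```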

What is an activation function?

An activation function takes the output of our weightedSum function and applies another mapping to it.

Figure: The logistic function, logistic(x), which maps any input x to a value between 0 and 1.

What is an activation function?

activation = logistic(weightedSum([n1, n2, ..., nm], [w1, w2, ..., wm]))

For example:

logistic(weightedSum([3, 9], [−3, 1]))
    = logistic((3 × −3) + (9 × 1))
    = logistic(−9 + 9)
    = logistic(0)
    = 0.5
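A sketch of the same calculation in Python, assuming the standard logistic function 1 / (1 + e^(−z)) (the slides show its plot but not its formula):

```python
import math

def logistic(z):
    """Squash any real number z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def weighted_sum(inputs, weights):
    return sum(n * w for n, w in zip(inputs, weights))

print(logistic(weighted_sum([3, 9], [-3, 1])))  # logistic(0) = 0.5
```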

What is a Neuron?

The simple list of operations that we have just described defines the fundamental building block of a neural network: the Neuron.

Neuron = activation(weightedSum([n1, n2, ..., nm], [w1, w2, ..., wm]))
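Putting the two pieces together, a single neuron can be sketched as one small Python function (again an illustrative sketch, not code from the talk):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation=logistic):
    """A neuron: the weighted sum of the inputs passed through an activation function."""
    weighted = sum(n * w for n, w in zip(inputs, weights))
    return activation(weighted)

print(neuron([3, 9], [-3, 1]))  # 0.5
```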

What is a Neural Network?

Figure: A feed-forward neural network with an input layer (I1, I2), a hidden layer (H1, H2, H3), and an output layer (O1, O2). The outputs of each layer become the inputs of the next layer.
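As a sketch of how such a network computes its outputs, each layer can be treated as a list of neurons (the layer sizes follow the figure; the weight values are invented purely for illustration):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows):
    """Compute one layer: each row of weights drives one neuron in the layer."""
    return [logistic(sum(n * w for n, w in zip(inputs, row))) for row in weight_rows]

# A 2-input, 3-hidden-neuron, 2-output network, matching the figure.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]   # made-up values
output_weights = [[0.6, -0.1, 0.4], [0.2, 0.9, -0.5]]     # made-up values

inputs = [3, 9]
hidden = layer(inputs, hidden_weights)
outputs = layer(hidden, output_weights)
print(outputs)  # two numbers between 0 and 1, one per output neuron
```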

Where do the weights come from?

Figure: The same network with a weight on every connection: w1 to w6 on the edges from the input layer (I1, I2) to the hidden layer (H1, H2, H3), and w7 to w12 on the edges from the hidden layer to the output layer (O1, O2).

Training a Neural Network

- We train a neural network by iteratively updating the weights.
- We start by randomly assigning weights to each edge.
- We then show the network examples of inputs and expected outputs and update the weights using backpropagation so that the network outputs match the expected outputs (a toy sketch of this idea follows below).
- We keep updating the weights until the network is working the way we want.
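The slides do not include training code, but the idea of nudging the weights to shrink the gap between actual and expected outputs can be sketched for a single neuron; full backpropagation applies the same gradient idea across all layers. Everything below (data, learning rate, loop structure) is invented for illustration:

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(examples, steps=5000, learning_rate=0.5):
    """Fit one logistic neuron by gradient descent: a toy stand-in for backpropagation."""
    n_inputs = len(examples[0][0])
    weights = [random.uniform(-1, 1) for _ in range(n_inputs)]  # start with random weights
    for _ in range(steps):
        for inputs, expected in examples:
            output = logistic(sum(n * w for n, w in zip(inputs, weights)))
            # Gradient of the squared error for each weight, via the logistic derivative.
            grad = (output - expected) * output * (1 - output)
            weights = [w - learning_rate * grad * n for w, n in zip(weights, inputs)]
    return weights

# Learn a simple pattern: output 1 when the first input is larger than the second.
data = [([1, 0], 1), ([0, 1], 0), ([2, 1], 1), ([1, 2], 0)]
print(train_neuron(data))
```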

Word Embeddings

Each word is represented by a vector of numbers that positions the word in a multi-dimensional space, e.g.:

king  = <55, −10, 176, 27>
man   = <10, 79, 150, 83>
woman = <15, 74, 159, 106>
queen = <60, −15, 185, 50>

Word Embeddings

vec(King) − vec(Man) + vec(Woman) ≈ vec(Queen)²

²Linguistic Regularities in Continuous Space Word Representations (Mikolov et al., 2013)
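Using the illustrative vectors from the previous slide, the analogy can be checked with a few lines of Python (these are toy numbers, not real embeddings):

```python
king  = [55, -10, 176, 27]
man   = [10, 79, 150, 83]
woman = [15, 74, 159, 106]
queen = [60, -15, 185, 50]

# vec(King) - vec(Man) + vec(Woman), computed element by element.
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)           # [60, -15, 185, 50]
print(result == queen)  # True: for these toy vectors the analogy holds exactly
```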

Recurrent Neural Networks

A particular type of neural network that is useful for processing sequential data (such as language) is a Recurrent Neural Network.

Using an RNN we process our sequential data one input at a time.

In an RNN the outputs of some of the neurons for one input are fed back into the network as part of the next input.
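A sketch of that recurrence in Python: at each time step the hidden layer sees both the current input and a memory of its own previous outputs. The weight values and sizes here are invented for illustration:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_step(x, h_prev, w_input, w_memory):
    """One RNN step: combine the current input x with the previous hidden state h_prev."""
    return [
        logistic(sum(xi * wi for xi, wi in zip(x, w_input[j]))
                 + sum(hi * wi for hi, wi in zip(h_prev, w_memory[j])))
        for j in range(len(w_input))
    ]

# Three hidden neurons, one input value per step (all weights made up).
w_input = [[0.5], [-0.3], [0.8]]
w_memory = [[0.1, 0.2, -0.1], [0.0, 0.4, 0.3], [-0.2, 0.1, 0.5]]

h = [0.0, 0.0, 0.0]              # the memory buffer starts empty
for x in [[1.0], [0.5], [2.0]]:  # a short input sequence
    h = rnn_step(x, h, w_input, w_memory)
    print(h)                     # the hidden state carries information forward
```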

Recurrent Neural Networks

Figure: An RNN with an input layer (I1), a hidden layer (H1, H2, H3), a memory buffer (M1, M2, M3) that holds the hidden layer's outputs from the previous step, and an output layer (O1).


Recurrent Neural Networks

Figure: Recurrent Neural Network. The same network is shown both in terms of its layers (input, hidden, memory, output) and abstractly as a unit that combines the current input xt with the previous hidden state ht−1 to produce a new hidden state ht and an output yt.

Recurrent Neural Networks

Figure: RNN Unrolled Through Time. At each time step the network reads an input (x1, x2, x3, ..., xt, xt+1), updates its hidden state (h1, h2, h3, ..., ht, ht+1), with each hidden state feeding into the next, and produces an output (y1, y2, y3, ..., yt, yt+1).

Recurrent Neural Networks

1. RNN Encoders

2. RNN Language Models


Encoders

Figure: Using an RNN to Generate an Encoding of a Word Sequence. The words Word1, Word2, ..., Wordm are fed in one at a time, followed by an end-of-sentence marker <eos>; the hidden state is updated at each step (h1, h2, ..., hm) and the final state C is an encoding of the whole sequence.
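A sketch of this encoding loop, treating each word as a small (made-up) embedding vector and keeping only the final hidden state C. The weights, embeddings, and sizes are all invented for illustration:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_step(x, h_prev, w_input, w_memory):
    """One RNN step: mix the current word vector x with the previous hidden state."""
    return [logistic(sum(a * b for a, b in zip(x, w_input[j]))
                     + sum(a * b for a, b in zip(h_prev, w_memory[j])))
            for j in range(len(w_input))]

def encode(word_vectors, w_input, w_memory, hidden_size=2):
    """Run the RNN over the whole sequence and return the final hidden state C."""
    h = [0.0] * hidden_size
    for x in word_vectors:
        h = rnn_step(x, h, w_input, w_memory)
    return h  # C: a fixed-length encoding of the whole sequence

# Tiny made-up embeddings and weights for the sentence used later in the slides.
embeddings = {"Life": [0.2, 0.7], "is": [0.9, 0.1], "beautiful": [0.4, 0.5], "<eos>": [0.0, 0.0]}
w_input = [[0.5, -0.3], [0.8, 0.2]]
w_memory = [[0.1, 0.4], [-0.2, 0.6]]

C = encode([embeddings[w] for w in ["Life", "is", "beautiful", "<eos>"]], w_input, w_memory)
print(C)
```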

Language Models

Figure: RNN Language Model Unrolled Through Time. At each step the network reads the current word (Word1, Word2, Word3, ..., Wordt) and predicts the next one (Word2, Word3, Word4, ..., Wordt+1).

Decoder

Figure: Using an RNN Language Model to Generate (Hallucinate) a Word Sequence. Only the first word, Word1, is supplied; at each step the word the model predicts is fed back in as the next input, so the model generates Word2, Word3, Word4, ..., Wordt+1 one word at a time.
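A toy sketch of this generate-and-feed-back loop, with a hand-written table of most likely next words standing in for a trained RNN language model (every detail here is invented for illustration):

```python
# Stand-in for a trained language model: the most likely next word after each word.
next_word = {
    "<start>": "La",
    "La": "vie",
    "vie": "est",
    "est": "belle",
    "belle": "<eos>",
}

def generate(first_word, max_len=10):
    """Feed each predicted word back in as the next input until <eos> is produced."""
    words, current = [], first_word
    while current != "<eos>" and len(words) < max_len:
        current = next_word[current]
        if current != "<eos>":
            words.append(current)
    return words

print(generate("<start>"))  # ['La', 'vie', 'est', 'belle']
```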

Encoder-Decoder Architecture

Figure: Sequence to Sequence Translation using an Encoder-Decoder Architecture. The encoder reads the source words Source1, Source2, ..., <eos> through hidden states h1, h2, ... and produces the encoding C; the decoder then continues from C through its own states d1, ..., dn, generating the target words Target1, Target2, ..., <eos>.

Neural Machine Translation

Figure: Example Translation using an Encoder-Decoder Architecture. The encoder reads "Life is beautiful <eos>" through hidden states h1 to h4 and produces the encoding C; the decoder then generates the French translation "La vie est belle <eos>" through its states d1, d2, d3, ...
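Putting the two halves together, the control flow of the figure can be sketched in a few lines: encode the whole source sentence into C first, then let the decoder generate target words from C. The helper functions below are deliberately trivial stand-ins (a real system would use the trained encoder and decoder RNNs):

```python
def encode(source_words):
    """Stand-in for the encoder RNN: reduce the source sentence to a single value C."""
    return " ".join(source_words)

def decode(C):
    """Stand-in for the decoder RNN: generate target words conditioned on C."""
    toy_table = {"Life is beautiful <eos>": ["La", "vie", "est", "belle", "<eos>"]}
    return toy_table[C]

def translate(sentence):
    C = encode(sentence.split() + ["<eos>"])  # the encoder reads the entire input first
    return " ".join(decode(C))                # only then does the decoder start generating

print(translate("Life is beautiful"))  # La vie est belle <eos>
```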

Conclusions

- An advantage of the encoder-decoder architecture is that the system processes the entire input before it starts translating.
- This means that the decoder can use what it has already generated and the entire source sentence when generating the next word in the translation.

Conclusions

- There is ongoing research on the best way to present the source sentence to the encoder.
- There is also ongoing research on giving the decoder the ability to attend to different parts of the input during translation.
- There is also interesting work on improving how these systems handle idiomatic language.

Thank you for your attention

john.d.kelleher@dit.ie

@johndkelleher

www.comp.dit.ie/jkelleher

www.machinelearningbook.com

Acknowledgements: The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
