Post on 21-May-2020


Page 1: Networks Long Short Term Memory - rctn.orgLong Short Term Memory Networks Brian Cheung ... Memory cells and gating units allow information to be stored for long periods of time. Memory

Long Short Term Memory Networks

Brian Cheung


LSTM

● Proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997
● Originally trained with an approximate error gradient, combining Real Time Recurrent Learning and truncated backpropagation through time
● Used for processing long range contextual information

Hochreiter & Schmidhuber 1997


LSTMs reduce vanishing gradient problem

[Figure: two panels, Standard Recurrent Network and LSTM Network]

● The darker the shade, the greater the sensitivity
● In the standard recurrent network, the sensitivity decays exponentially over time as new inputs overwrite the activation of the hidden unit and the network ‘forgets’ the first input

Graves et al 2013
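The decay described on this slide can be reproduced numerically. The sketch below is not taken from the slides: it assumes a scalar tanh recurrence h_t = tanh(w * h_{t-1} + u * x_t) with hypothetical weights w and u, and tracks |d h_t / d x_1| through the chain rule.

```python
import numpy as np

def rnn_sensitivity(inputs, w=0.9, u=1.0):
    """Return |d h_t / d x_1| for every step t of a scalar tanh RNN.

    w and u are hypothetical recurrent and input weights chosen for illustration.
    """
    h = 0.0
    grad = 0.0
    sens = []
    for t, x in enumerate(inputs):
        h = np.tanh(w * h + u * x)
        local = 1.0 - h ** 2            # tanh'(a_t)
        if t == 0:
            grad = local * u            # d h_1 / d x_1
        else:
            grad = local * w * grad     # chain rule through the recurrence
        sens.append(abs(grad))
    return np.array(sens)

rng = np.random.default_rng(0)
sens = rnn_sensitivity(rng.normal(size=30))
print(sens[[0, 9, 19, 29]])  # sensitivity to the first input at steps 1, 10, 20, 30
```

Because each step multiplies the gradient by w * tanh'(a_t), whose magnitude is at most |w| < 1 here, the sensitivity to the first input shrinks at least geometrically.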


LSTMs reduce vanishing gradient problem

● Memory cells and gating units allow information to be stored for long periods of time.
● Memory cells are additive in time
  ○ Gradients are also additive in time, which alleviates the vanishing gradient

Graves et al 2013
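A rough way to see why the additive update helps: for a cell update s_t = f_t * s_{t-1} + i_t * g_t, the direct path gives d s_t / d s_{t-1} = f_t, so the gradient over T steps is a product of forget-gate activations instead of a product of squashed recurrent weights. The numbers below are illustrative assumptions (gate activations held constant, a scalar cell), not values from the slides.

```python
import numpy as np

T = 50

# Additive memory cell: the gradient along the cell path is the product of
# forget-gate activations (gates treated as constants, a simplification).
forget = np.full(T, 0.99)        # hypothetical forget-gate activations near 1
cell_grad = np.prod(forget)

# Plain tanh recurrence: each step contributes w * tanh'(a_t), whose magnitude
# is at most |w|, so |w| ** T is an upper bound on the gradient.
w = 0.9                          # hypothetical recurrent weight
rnn_grad_bound = w ** T

print(cell_grad, rnn_grad_bound)
```

With the forget gate fixed at exactly 1, as in the original LSTM without forget gates, the product along the cell path stays at 1 for any T.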


Backpropagation through time

● Derivative of objective function, O, with respect to linear activations, a

● Gradient descent weight update
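The two quantities named above are commonly written as follows. This is a sketch in the notation of Graves, not a transcription of the slide: the learning rate \alpha and the symbol b_i^t (the activation that unit j receives through weight w_{ij} at time t, so that a_j^t = \sum_i w_{ij} b_i^t) are assumptions.

```latex
\delta_j^t \equiv \frac{\partial O}{\partial a_j^t},
\qquad
\frac{\partial O}{\partial w_{ij}}
  = \sum_{t=1}^{T} \frac{\partial O}{\partial a_j^t}\,
                   \frac{\partial a_j^t}{\partial w_{ij}}
  = \sum_{t=1}^{T} \delta_j^t\, b_i^t,
\qquad
\Delta w_{ij} = -\alpha\, \frac{\partial O}{\partial w_{ij}}
```

The sum over t appears because the same weight is reused at every time step.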


Anatomy of an LSTM node

● x: Input to the LSTM node at a particular time point (character, phoneme, word)
● ι: Input Gate. Determines whether the input will be stored in the memory cell
● φ: Forget Gate. Determines whether current contents of memory will be forgotten (erased)
● ω: Output Gate. Determines if current memory contents will be output
● c: Cell State. Memory Cell
● h: Hidden State. Output of LSTM node
● g: Input nonlinearity of Memory Cell (e.g. tanh, sigmoid)

Anatomy of an LSTM node

[Figure: LSTM node diagram showing input X, Input Gate, Forget Gate, Output Gate, Cell State, and Hidden State]

LSTM Forward Propagation

[Figure: LSTM node across time steps t-1 and t, showing input X, Input Gate, Forget Gate, Output Gate, Cell State, Hidden State, the nonlinearity g, and the multiply (*) and add (+) operations]
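One common way to write the forward pass, using the symbols from the anatomy slide, is sketched below. It is an assumption rather than the slide's own equations: it omits peephole connections from the cell state to the gates (which the Graves formulation includes), uses tanh for both g and the output squashing, and adds a linear readout y_t. The weight names W, U, b are notation introduced here.

```latex
\begin{aligned}
\iota_t  &= \sigma\!\left(W_\iota x_t + U_\iota h_{t-1} + b_\iota\right)
         && \text{(1. input gate)} \\
\phi_t   &= \sigma\!\left(W_\phi x_t + U_\phi h_{t-1} + b_\phi\right)
         && \text{(2. forget gate)} \\
c_t      &= \phi_t \odot c_{t-1}
            + \iota_t \odot g\!\left(W_c x_t + U_c h_{t-1} + b_c\right)
         && \text{(3. cell state)} \\
\omega_t &= \sigma\!\left(W_\omega x_t + U_\omega h_{t-1} + b_\omega\right)
         && \text{(4. output gate)} \\
h_t      &= \omega_t \odot \tanh(c_t)
         && \text{(5. hidden state)} \\
y_t      &= W_y h_t + b_y
         && \text{(6. output)}
\end{aligned}
```

Here \sigma is the logistic sigmoid and \odot is elementwise multiplication, matching the * and + nodes in the diagram.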

1. Input Gate
2. Forget Gate
3. Cell State
4. Output Gate
5. Hidden State
6. Output
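The six forward steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: no peephole connections, tanh as the input and output nonlinearity, a linear readout for step 6, and hypothetical parameter names (`W`, `U`, `b`, `Wy`, `by`).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, n_out, rng):
    """Small random weights; names and layout are illustrative assumptions."""
    def mat(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))
    return {
        "W": {k: mat(n_hid, n_in) for k in "ifco"},    # input weights
        "U": {k: mat(n_hid, n_hid) for k in "ifco"},   # recurrent weights
        "b": {k: np.zeros(n_hid) for k in "ifco"},     # biases
        "Wy": mat(n_out, n_hid),
        "by": np.zeros(n_out),
    }

def lstm_step(x, h_prev, c_prev, p):
    """One time step; comments follow the step numbering above."""
    pre = {k: p["W"][k] @ x + p["U"][k] @ h_prev + p["b"][k] for k in "ifco"}
    i = sigmoid(pre["i"])                      # 1. input gate
    f = sigmoid(pre["f"])                      # 2. forget gate
    c = f * c_prev + i * np.tanh(pre["c"])     # 3. cell state (additive update)
    o = sigmoid(pre["o"])                      # 4. output gate
    h = o * np.tanh(c)                         # 5. hidden state
    y = p["Wy"] @ h + p["by"]                  # 6. output (assumed linear readout)
    return h, c, y

rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hid=4, n_out=2, rng=rng)
h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):              # a toy sequence of 5 inputs
    h, c, y = lstm_step(x, h, c, params)
print(h.shape, c.shape, y.shape)
```

The hidden state and cell state are carried from one call to the next, which is the t-1 to t connection in the diagram.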

LSTM Backprop

1. Hidden State
2. Output Gate
3. Cell State
4. Forget Gate
5. Input Gate
6. Input
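The backward steps can be sketched for a single time step and checked against finite differences. As before, this is an illustration under assumptions rather than the slide's derivation: no peephole connections, tanh nonlinearities, one stacked weight matrix `W` acting on the concatenation of x and the previous hidden state, and a toy objective that is linear in h_t and c_t so that the incoming gradients are known exactly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps concat([x, h_prev]) to stacked pre-activations [i, f, o, g]."""
    n = h_prev.size
    v = np.concatenate([x, h_prev])
    a = W @ v + b
    i, f, o = sigmoid(a[:n]), sigmoid(a[n:2 * n]), sigmoid(a[2 * n:3 * n])
    g = np.tanh(a[3 * n:])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c, (v, i, f, o, g, c, c_prev)

def backward(dh, dc_next, cache, W, n_in):
    """dh is dO/dh_t (1. hidden state); dc_next is dO/dc_t arriving from step t+1."""
    v, i, f, o, g, c, c_prev = cache
    tc = np.tanh(c)
    da_o = dh * tc * o * (1 - o)                 # 2. output gate
    dc = dh * o * (1 - tc ** 2) + dc_next        # 3. cell state
    da_g = dc * i * (1 - g ** 2)                 #    cell input through g
    da_f = dc * c_prev * f * (1 - f)             # 4. forget gate
    da_i = dc * g * i * (1 - i)                  # 5. input gate
    da = np.concatenate([da_i, da_f, da_o, da_g])
    dW, db = np.outer(da, v), da                 # weight and bias gradients
    dv = W.T @ da                                # 6. input (and previous hidden state)
    return dW, db, dv[:n_in], dv[n_in:], dc * f  # last item: dO/dc_{t-1}

rng = np.random.default_rng(0)
n_in, n = 3, 4
W = rng.normal(scale=0.5, size=(4 * n, n_in + n))
b = rng.normal(size=4 * n)
x, h_prev, c_prev = rng.normal(size=n_in), rng.normal(size=n), rng.normal(size=n)
r, q = rng.normal(size=n), rng.normal(size=n)    # toy objective: O = r.h_t + q.c_t

def objective(x_):
    h_, c_, _ = forward(x_, h_prev, c_prev, W, b)
    return h_ @ r + c_ @ q

h, c, cache = forward(x, h_prev, c_prev, W, b)
dW, db, dx, dh_prev, dc_prev = backward(r, q, cache, W, n_in)

# Central finite differences on the input as a sanity check of step 6
eps = 1e-6
dx_num = np.array([(objective(x + eps * e) - objective(x - eps * e)) / (2 * eps)
                   for e in np.eye(n_in)])
max_err = np.max(np.abs(dx - dx_num))
print(max_err)
```

In a full backpropagation-through-time loop, `dh_prev` and `dc_prev` returned here would be fed in as part of `dh` and as `dc_next` for step t-1, and `dW` and `db` would be summed over all time steps.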


LSTM Applications

Image Captioning

Vinyals et al 2014
Donahue et al 2014
Kiros et al 2014


LSTM Applications

Handwriting Synthesis

Graves 2014


LSTM Applications

● Speech recognition (Graves et al 2013)
● Neural Machine Translation (Sutskever et al 2014)
● Neural Turing Machine (Graves et al 2014)