Deep Learning with Theano (with a case study) Liangliang “Lyon” Cao Yahoo! Labs

Upload: vannhan

Post on 26-Jun-2018


Page 1: Deep Learning with Theano (with a case study)

Liangliang “Lyon” Cao

Yahoo! Labs

Page 2: Outline

•  An abstract view of deep networks

•  Implementing deep networks with Theano

•  A case study: synonym extraction with a neural network

Page 3: What is deep learning?

Page 4: An abstract view of deep networks

•  Estimate the output: o5 = L5( L4( L3( L2( L1(x) ) ) ) ), with intermediate outputs o1 = L1(x), o2 = L2(L1(x)), and so on

•  Compute the loss function: C = Loss(o5, y)

•  Compute the gradient

[Figure: the input x passes through the stacked layers L1, L2, L3, L4, L5]

Page 5: An abstract view of deep networks (2)

•  Estimate the output (forward propagation): o5 = L5( L4( L3( L2( L1(x) ) ) ) )

•  Compute the gradient (backward propagation)
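The forward and backward passes can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the five scalar layers below are made-up stand-ins (the talk does not specify its layers), and `forward` and `backward` are hypothetical helper names. The backward pass applies the chain rule from L5 down to L1, and a finite-difference estimate is printed next to it for comparison.

```python
import math

# Each toy layer is a pair (forward function, derivative function).
# These scalar layers are illustrative stand-ins, not the layers from the talk.
LAYERS = [
    (lambda v: 2.0 * v, lambda v: 2.0),                   # L1
    (math.tanh, lambda v: 1.0 - math.tanh(v) ** 2),       # L2
    (lambda v: v + 1.0, lambda v: 1.0),                   # L3
    (lambda v: v * v, lambda v: 2.0 * v),                 # L4
    (math.sin, math.cos),                                 # L5
]

def forward(x):
    """Return [x, o1, ..., o5]: the input followed by every layer output."""
    outputs = [x]
    for f, _ in LAYERS:
        outputs.append(f(outputs[-1]))
    return outputs

def backward(outputs):
    """Return d(o5)/dx by applying the chain rule from the top layer down."""
    grad = 1.0
    # outputs[:-1] holds the input seen by each layer, in order L1..L5.
    for (_, df), layer_input in zip(reversed(LAYERS), reversed(outputs[:-1])):
        grad *= df(layer_input)
    return grad

outputs = forward(0.3)
analytic = backward(outputs)

# Central finite difference of the same derivative, as a sanity check.
eps = 1e-6
numeric = (forward(0.3 + eps)[-1] - forward(0.3 - eps)[-1]) / (2 * eps)
print(analytic, numeric)
```

The forward pass stores every intermediate output because the backward pass needs the input each layer saw; that storage is the main memory cost of backpropagation.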

Page 6: An abstract view of deep networks (3)

•  Suppose a layer is in the form of [equation not preserved]

•  We can compute the gradients with respect to the parameters

•  Update the parameters by gradient descent
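Since the slide's own equations are not available, the following is only a sketch of one common case. It assumes a fully connected layer with weights W_k, bias b_k, and an elementwise activation f; these symbols, and the learning rate \eta, are assumptions introduced here rather than the talk's notation.

```latex
% Assumed layer form (the slide's own equation is not available)
o_k = f(z_k), \qquad z_k = W_k\, o_{k-1} + b_k, \qquad o_0 = x

% Backward pass: propagate \delta_k = \partial C / \partial z_k from the top layer down
\delta_5 = \frac{\partial C}{\partial o_5} \odot f'(z_5), \qquad
\delta_k = \left( W_{k+1}^{\top} \delta_{k+1} \right) \odot f'(z_k)

% Gradients with respect to the parameters
\frac{\partial C}{\partial W_k} = \delta_k\, o_{k-1}^{\top}, \qquad
\frac{\partial C}{\partial b_k} = \delta_k

% Gradient-descent update with learning rate \eta
W_k \leftarrow W_k - \eta\, \frac{\partial C}{\partial W_k}, \qquad
b_k \leftarrow b_k - \eta\, \frac{\partial C}{\partial b_k}
```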

Page 7: An abstract view of deep networks (Summary)

•  There are many ways to define layers and cost functions

•  Layer definitions may differ from field to field
   –  Computer vision
   –  NLP
   –  Speech
   –  …

•  But there are only three key steps in a deep network

Page 8: An abstract view of deep networks (Summary)

1.  Forward propagation: o5 = L5( L4( L3( L2( L1(x) ) ) ) )

2.  Backward propagation

3.  Updating

Page 9: Gradient descent is hard for large-scale learning

Very often, a machine learning model with parameter w aims to minimize [equation not preserved]

When N (the number of training samples) is big, we can see that

•  The gradient becomes very expensive to compute.

•  Even worse, we may not be able to load all (xi, yi) into memory!
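The objective itself is missing from the transcript. A typical form, given here as an assumption rather than as the slide's exact equation, is the average loss over the N training pairs; the symbols E, \ell, and f are introduced only for this sketch.

```latex
% Assumed objective: average loss over N training pairs (x_i, y_i)
E(w) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(x_i; w),\, y_i\big)

% Its gradient visits every training example once per evaluation
\nabla_w E(w) = \frac{1}{N} \sum_{i=1}^{N} \nabla_w\, \ell\big(f(x_i; w),\, y_i\big)
```

Under this form, each gradient evaluation costs one pass over all N examples, which is what makes plain gradient descent expensive when N is large.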

Page 10: Stochastic Gradient Descent (SGD)

Idea: estimate the gradient on a randomly picked sample

•  Gradient descent

•  Stochastic gradient descent

Theoretical requirement for convergence: [equation not preserved]

In deep learning practice, we just choose a small learning rate and then decrease it.
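To make the contrast concrete, here is a small sketch comparing the two procedures on a one-parameter least-squares problem. Everything in it is an assumption made for illustration: the synthetic data, the helper names `grad_one`, `gradient_descent`, and `sgd`, and the schedule `rate0 / (1 + 0.01 * t)`. The convergence requirement on the slide is presumably the classical Robbins-Monro pair of conditions (the step sizes sum to infinity while their squares have a finite sum), which this schedule satisfies.

```python
import random

random.seed(0)

# Synthetic data for y = 3*x + noise; the model is a single weight w.
N = 1000
xs = [random.uniform(-1.0, 1.0) for _ in range(N)]
ys = [3.0 * x + random.gauss(0.0, 0.1) for x in xs]

def grad_one(w, x, y):
    """Gradient of the squared error 0.5*(w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def gradient_descent(steps=100, rate=0.5):
    w = 0.0
    for _ in range(steps):
        # Full gradient: one pass over all N samples for every update.
        g = sum(grad_one(w, x, y) for x, y in zip(xs, ys)) / N
        w -= rate * g
    return w

def sgd(steps=5000, rate0=0.5):
    w = 0.0
    for t in range(steps):
        i = random.randrange(N)           # pick one sample at random
        rate = rate0 / (1.0 + 0.01 * t)   # start small, keep decreasing
        w -= rate * grad_one(w, xs[i], ys[i])
    return w

w_gd = gradient_descent()
w_sgd = sgd()
print(w_gd, w_sgd)
```

Both estimates should land near the true slope of 3. The gradient-descent run reads all 1000 samples for each of its 100 updates, while the SGD run reads a single sample per update.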

Page 11: SGD as a typical deep learning solver

For every layer, compute the gradient and update.

Page 12: SGD and GPUs

For every layer, compute the gradient and update.

•  Within every batch, SGD is mainly matrix multiplication: a perfect task for a GPU!

•  See a demo on how much a GPU can help
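The claim that a batch step reduces to matrix multiplication can be checked on a tiny example. The sketch below assumes a single linear layer with a squared-error loss (chosen only for illustration) and uses hypothetical pure-Python helpers `matmul` and `transpose`. On a GPU the same two products would be handled by an optimized matrix-multiply routine.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

# A batch of 3 samples with 2 features each, and a 2-by-2 weight matrix.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
W = [[0.5, -1.0], [2.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Forward pass for the whole batch: one matrix product.
out = matmul(X, W)

# For the loss 0.5 * sum((out - Y)**2), the derivative w.r.t. out is out - Y.
delta = [[o - y for o, y in zip(orow, yrow)] for orow, yrow in zip(out, Y)]

# The gradient with respect to W is a second matrix product: X^T * delta.
grad_W = matmul(transpose(X), delta)

# The same gradient accumulated sample by sample, for comparison.
grad_loop = [[0.0, 0.0], [0.0, 0.0]]
for x, d in zip(X, delta):
    for i in range(2):
        for j in range(2):
            grad_loop[i][j] += x[i] * d[j]

print(grad_W)
print(grad_loop)
```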

Page 13: How to implement deep models?

Page 14: Theano

•  Developed at Univ. Montreal (Yoshua Bengio’s group)

•  Users write in a language similar to NumPy; Theano compiles it into C/CUDA C.

•  Used to be relatively slow, but has now caught up.

•  Easy to use:
   –  Easiest toolkit for computing gradients
   –  No need to touch the details of GPU programming

•  Limited to single machines

Page 15: Theano for Symbolic Math

Example: Sigmoid function

import theano.tensor as T

x = T.scalar()
y = T.scalar()
z = 3*x + 4*y + 5        # linear combination of the two symbolic scalars
s = 1 / (1 + T.exp(-z))  # sigmoid of z: 1 / (1 + exp(-z))
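What "symbolic" means here can be illustrated without Theano. The sketch below is a toy expression graph, not Theano's implementation or API: the class `Expr` and the helpers `scalar` and `exp` are hypothetical names. Arithmetic on `Expr` objects records operations instead of computing numbers, and values appear only when the graph is evaluated. In Theano itself that final step is done by compiling the graph with `theano.function`.

```python
import math

class Expr:
    """Node of a tiny expression graph (a toy, not Theano's implementation)."""

    def __init__(self, op, args=(), name=None):
        self.op, self.args, self.name = op, args, name

    @staticmethod
    def wrap(value):
        # Turn plain numbers into constant nodes so they can join the graph.
        return value if isinstance(value, Expr) else Expr('const', (value,))

    def __add__(self, other):
        return Expr('add', (self, Expr.wrap(other)))

    def __radd__(self, other):
        return Expr('add', (Expr.wrap(other), self))

    def __mul__(self, other):
        return Expr('mul', (self, Expr.wrap(other)))

    def __rmul__(self, other):
        return Expr('mul', (Expr.wrap(other), self))

    def __neg__(self):
        return Expr('neg', (self,))

    def __rtruediv__(self, other):
        return Expr('div', (Expr.wrap(other), self))

    def eval(self, env):
        """Evaluate the graph given a dict mapping variable names to values."""
        if self.op == 'var':
            return env[self.name]
        if self.op == 'const':
            return self.args[0]
        vals = [a.eval(env) for a in self.args]
        if self.op == 'add':
            return vals[0] + vals[1]
        if self.op == 'mul':
            return vals[0] * vals[1]
        if self.op == 'div':
            return vals[0] / vals[1]
        if self.op == 'neg':
            return -vals[0]
        if self.op == 'exp':
            return math.exp(vals[0])
        raise ValueError('unknown op: %s' % self.op)

def scalar(name):
    """Create a named symbolic variable."""
    return Expr('var', name=name)

def exp(e):
    """Symbolic exponential."""
    return Expr('exp', (e,))

x = scalar('x')
y = scalar('y')
z = 3 * x + 4 * y + 5      # builds a graph; nothing is computed yet
s = 1 / (1 + exp(-z))      # sigmoid of z, still symbolic

value = s.eval({'x': 1.0, 'y': -2.0})   # here z = 0, so the sigmoid is 0.5
print(value)
```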

Page 16: Theano for Computing Gradients

Example:

import theano.tensor as T

x = T.scalar()
gx = T.grad(x**2, x)        # symbolic derivative, equal to 2*x
gx2 = T.grad(T.log(x), x)   # equal to 1/x
gx3 = T.grad(1/x, x)        # equal to -1/x**2

T.grad() is the most amazing function in Theano.
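The three derivatives can be sanity-checked numerically without Theano. The sketch below compares the closed forms from calculus against central finite differences at one point; `numeric_grad` is a hypothetical helper, and the evaluation point 1.5 is arbitrary.

```python
import math

def numeric_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# (function, derivative expected from calculus) for the three examples.
cases = [
    (lambda x: x ** 2, lambda x: 2.0 * x),          # matches gx
    (math.log, lambda x: 1.0 / x),                  # matches gx2
    (lambda x: 1.0 / x, lambda x: -1.0 / x ** 2),   # matches gx3
]

x0 = 1.5
results = [(numeric_grad(f, x0), df(x0)) for f, df in cases]
for estimate, exact in results:
    print(estimate, exact)
```

A check like this is a common way to test hand-written gradient code; symbolic differentiation removes the need to write those gradients by hand in the first place.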

Page 17: Theano for SGD

import numpy as np
import theano
import theano.tensor as T

# x, y, n_in, n_out, learning_rate, n_epochs, batches, and hinge_loss
# are assumed to be defined elsewhere.
W = theano.shared(value=np.zeros((n_in, n_out)), name='W')
b = theano.shared(value=np.zeros((n_out,)), name='b')

cost = hinge_loss(T.dot(x, W) + b, y)
g_W = T.grad(cost, W)
g_b = T.grad(cost, b)

# Each pair is (shared variable, its new value after one SGD step).
updates_FP = [(W, W - learning_rate * g_W), (b, b - learning_rate * g_b)]
train_model = theano.function(inputs=[x, y], outputs=cost, updates=updates_FP)

for epoch in range(n_epochs):
    for (xi, yi) in batches:
        train_model(xi, yi)
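For readers without Theano installed, the same training loop can be sketched in plain Python. This is an illustration under several assumptions: a binary problem with labels +1 and -1, a single output unit, the hinge loss max(0, 1 - y * score) with its subgradient written by hand, and made-up toy data. The function `train_step` is a hypothetical stand-in for the compiled `train_model`.

```python
import random

random.seed(1)

# Toy, linearly separable data: label is +1 when x0 + x1 > 0, else -1.
data = []
for _ in range(200):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    if abs(x[0] + x[1]) > 0.2:               # keep a margin between classes
        data.append((x, 1.0 if x[0] + x[1] > 0 else -1.0))

W = [0.0, 0.0]   # plays the role of the shared variable W
b = 0.0          # plays the role of the shared variable b
learning_rate = 0.1

def score(x):
    """Linear score W.x + b for one sample."""
    return sum(w * xi for w, xi in zip(W, x)) + b

def train_step(x, y):
    """One SGD update on the hinge loss max(0, 1 - y * (W.x + b))."""
    global b
    loss = max(0.0, 1.0 - y * score(x))
    if loss > 0.0:
        # Subgradient of the hinge loss: -y * x for W and -y for b.
        for i in range(len(W)):
            W[i] += learning_rate * y * x[i]
        b += learning_rate * y
    return loss

for epoch in range(30):
    random.shuffle(data)
    for x, y in data:
        train_step(x, y)

# Count training samples that end up on the wrong side of the boundary.
errors = sum(1 for x, y in data if y * score(x) <= 0)
print(errors, len(data))
```

The design mirrors the Theano version: parameters live outside the step function, one call performs the forward pass, the gradient, and the update, and the outer loops only feed data.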

Page 18: Theano for SGD (continued)

We can extend this simple model to multi-layer nets.

Page 19: Layer Definitions

class AbstractLayer(object):
    def __init__(self):
        self.input_layer = []
        self.params = []

    def set_params_values(self, param_values):
        for (p, v) in zip(self.params, param_values):
            p.set_value(v)

    def get_params_values(self):
        param_values = []
        for p in self.params:
            param_values.append(p.get_value())
        return param_values

    def output(self, *args, **kwargs):  # child classes must override this!
        return []

    def get_output_shape(self):  # child classes must override this!
        return []

From https://github.com/llcao/babyl

Page 20: Layer Definitions

class HiddenLayer(AbstractLayer):
    def output(self, *args, **kwargs):
        input = self.input_layer.output(*args, **kwargs)
        lin_output = T.dot(input, self.W) + self.b
        return self.activation(lin_output)

class Conv2DLayer(AbstractLayer):
    def output(self, *args, **kwargs):
        conv_out = conv.conv2d(input=self.input_layer.output(), filters=self.W)
        return self.activation(conv_out + self.b.dimshuffle('x', 0, 'x', 'x'))

From https://github.com/llcao/babyl
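The key design idea in these classes is that each layer's `output()` asks the layer below it for its output, so calling `output()` on the top layer walks the whole stack. The sketch below shows that pattern in plain Python. It is not the babyl code: `InputLayer` and `DenseLayer` here are simplified, hypothetical counterparts that work on lists of floats, and the weights are arbitrary.

```python
import math

class InputLayer:
    """Holds the raw input; the bottom of the stack."""

    def __init__(self, values):
        self.values = values
        self.params = []

    def output(self):
        return self.values

class DenseLayer:
    """Toy counterpart of HiddenLayer: activation(W . input + b)."""

    def __init__(self, input_layer, W, b, activation=None):
        self.input_layer = input_layer
        self.W, self.b = W, b            # W holds one row per output unit
        self.activation = activation
        self.params = [self.W, self.b]

    def output(self):
        inp = self.input_layer.output()  # recursively ask the layer below
        lin = [sum(w * v for w, v in zip(row, inp)) + bias
               for row, bias in zip(self.W, self.b)]
        if self.activation is None:      # a linear layer applies no activation
            return lin
        return [self.activation(v) for v in lin]

# Stack the layers the same way the slides do, with layers[-1] as the input.
layers = [InputLayer([1.0, -1.0])]
layers += [DenseLayer(layers[-1], W=[[1.0, 1.0], [0.5, -0.5]], b=[0.0, 0.0],
                      activation=math.tanh)]
layers += [DenseLayer(layers[-1], W=[[2.0, 1.0]], b=[0.5])]

result = layers[-1].output()   # triggers the whole chain down to the input
print(result)
```

With Theano, the same recursion builds a symbolic graph instead of numbers, so a single call to `T.grad` on the final cost covers every layer in the stack.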

Page 21: Other Deep Learning Packages

•  For speech:
   –  Kaldi

•  For computer vision:
   –  Caffe
   –  Torch7
   –  cuda-convnet (1, 2)

•  Others:
   –  John Canny’s BIDMach
   –  Deeplearning4j
   –  Word2vec
   –  RNNLM

Page 22: A Case Study: Synonym Extraction

Page 23: Problem of Synonym Extraction

•  Synonym: a word that has the same or nearly the same meaning as another word

Page 24: Previous Work

•  Previous studies on synonym extraction mostly use small datasets
   –  [Henriksson 2014]: 340 medical synonym pairs
   –  [Wang & Hirst 2009]: 80 TOEFL synonym questions
   –  [Collobert & Weston 2008]: thousands of synonym pairs

•  Our IJCAI’15 work
   –  Word2Vec + feature expansion + linear SVM
   –  F1 = 0.71 on a medical synonym dataset with 2.4M pairs

Page 25: Network-1

Page 26: Theano Implementation for Network-1

input = T.matrix('input')
target = T.ivector('target')

layers = []
layers += [InputLayer(dim, input)]
layers += [HiddenLayer(layers[-1], 100, activation=T.tanh)]
layers += [HiddenLayer(layers[-1], 1, activation=None)]
output = layers[-1].output().flatten()

cost = T.mean(T.switch((output - target) * target > 0.0, 0.0, output - target)**2)

all_para = get_all_parameters()
updates = gen_updates_sgd(cost, all_para, learning_rate)
train_model = theano.function([input, target], cost, updates=updates)
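The cost expression is compact, so it may help to see it unrolled. The sketch below restates it in plain Python under the assumption that targets are +1 or -1 (the slides do not state the label encoding); `one_sided_squared_loss` is a hypothetical name. A sample contributes nothing when its output overshoots the target on the correct side, and contributes the squared error otherwise.

```python
def one_sided_squared_loss(outputs, targets):
    """Mean of (output - target)**2, ignoring outputs that lie beyond the target.

    A term is zeroed when (output - target) * target > 0, which for targets
    of +1 or -1 means the output is on the target's side and farther from 0.
    """
    total = 0.0
    for o, t in zip(outputs, targets):
        diff = o - t
        if diff * t > 0.0:
            diff = 0.0           # confident and correct: no penalty
        total += diff ** 2
    return total / len(outputs)

outputs = [1.5, 0.5, -2.0, 0.0]
targets = [1, 1, -1, -1]
# Per-sample terms: 0 (overshoots +1), 0.25, 0 (overshoots -1), 1.0
loss = one_sided_squared_loss(outputs, targets)
print(loss)
```

Compared with a plain squared error, this variant does not pull confident, correct outputs back toward the target, which makes it behave more like a margin-based loss.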

Page 27: General Feature Expansion

Hand-assigned feature expansion

Machine learned feature expansion

Page 28: Network-2

Page 29: Implementation of Network-2

input = T.matrix('input')
target = T.ivector('target')

layers = []
layers += [InputTensor3Layer(inputshape=[nbatch, nfeature, 3])]
layers += [TensorHiddenLayer(layers[-1], outdim=10, activation=T.tanh)]
layers += [FlattenLayer(layers[-1], flattendim=2)]
layers += [HiddenLayer(layers[-1], outdim=100, activation=T.tanh)]
layers += [HiddenLayer(layers[-1], outdim=1, activation=None)]
output = layers[-1].output().flatten()

cost = T.mean(T.switch((output - target) * target > 0.0, 0.0, output - target)**2)

all_para = get_all_parameters()
updates = gen_updates_sgd(cost, all_para, learning_rate)
train_model = theano.function([input, target], cost, updates=updates)

Page 30: Performance of Deep Model for Synonym Extraction

Experiments on Medical Synonym Dataset

Experiments on WordNet Synonym Dataset

Page 31: Summary

•  We quickly went through how to implement deep learning in Theano
   –  Gradient
   –  Stochastic gradient descent
   –  Layers
   –  … and a case study

•  We hope this experience can help you learn Theano or other deep learning toolkits.

•  Let’s learn deep learning together!

Deep learning reading group:

https://yahoo.jiveon.com/groups/deep-learning-reading-group-nyc-labs

Page 32: Thank you!

Questions and comments?

Page 33: Backup Slides