![Page 1: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/1.jpg)
Roelof Pieters, @graphific
Introduction to Deep Learning for NLP
22 January 2015 Stockholm Natural Language Processing Meetup
FEEDA
Slides at: http://www.slideshare.net/roelofp/220115dlmeetup
1
![Page 2: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/2.jpg)
Deep Learning ???
2
![Page 3: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/3.jpg)
A couple of headlines… [all November ’14]
3
![Page 4: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/4.jpg)
(source: Google Trends)
4
![Page 5: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/5.jpg)
Machine Learning ??
- Audience Check -
5
![Page 6: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/6.jpg)
Deep Learning ??
• “Brain” inspired / simulations:
• vision: make learning algorithms better and easier to use
• goal: revolutionary (practical) advances in machine learning and AI
• Deep Learning = subfield of Machine Learning
6
![Page 7: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/7.jpg)
Biological Inspiration
7
![Page 8: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/8.jpg)
Deep Learning ??
8
![Page 9: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/9.jpg)
DL: Impact
9
Speech Recognition
![Page 10: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/10.jpg)
DL: Impact
10
Deep Learning for the win! A few examples:
• IJCNN 2011 Traffic Sign Recognition Competition
• ISBI 2012 Segmentation of neuronal structures in EM stacks challenge
• ICDAR 2011 Chinese handwriting recognition
![Page 11: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/11.jpg)
Machine Learning ??
• Deals with “construction and study of systems that can learn from data”
A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P), if its performance at tasks in T, as measured by P, improves with experience E
— T. Mitchell 1997
11
![Page 12: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/12.jpg)
Machine Learning ??
Traditional Programming: Data + Program → Output
Machine Learning: Data + Output → Program
12
![Page 13: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/13.jpg)
Types of Learning
Supervised (inductive) learning
• Training data includes desired outputs
Unsupervised learning
• Training data does not include desired outputs
Semi-supervised learning
• Training data includes a few desired outputs
Reinforcement learning
• Rewards from a sequence of actions
13
![Page 14: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/14.jpg)
ML: Traditional Approach
For each new problem/question:
1. Gather as much LABELED data as you can get
2. Throw some algorithms at it (mainly put in an SVM and keep it at that)
3. If you actually tried more algorithms: pick the best
4. Spend hours hand-engineering features / feature selection / dimensionality reduction (PCA, SVD, etc.)
5. Repeat…
14
![Page 15: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/15.jpg)
Machine Learning for NLP
Classic Approach: Data is fed into a learning algorithm:
[Diagram: Data → Learning Algorithm]
15
![Page 16: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/16.jpg)
Machine Learning for NLP
some of the (many) treebank datasets
source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks
16
![Page 17: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/17.jpg)
Penn Treebank
That’s a lot of “manual” work:
17
![Page 18: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/18.jpg)
Penn Treebank
With a lot of issues:
• the students went to class
  DT NN VB P NN
• plays well with others
  VB ADV P NN
  NN NN P DT
• fruit flies like a banana
  NN NN VB DT NN
  NN VB P DT NN
  NN NN P DT NN
  NN VB VB DT NN
18
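To make the tagging ambiguity concrete, here is a minimal Python sketch that enumerates every tag sequence a word-to-tags lexicon allows. The lexicon `LEXICON` and the function `candidate_taggings` are hypothetical, built only from the coarse tags on this slide (not the full Penn Treebank tagset). A real tagger has to choose among these readings using context learned from annotated data.

```python
from itertools import product

# Hypothetical toy lexicon: the candidate tags each word may take,
# using the slide's coarse tag names (DT, NN, VB, P).
LEXICON = {
    "fruit": ["NN"],
    "flies": ["NN", "VB"],
    "like": ["VB", "P"],
    "a": ["DT"],
    "banana": ["NN"],
}

def candidate_taggings(sentence):
    """Return every tag sequence the lexicon licenses for the sentence."""
    options = [LEXICON[word] for word in sentence.split()]
    return [list(tags) for tags in product(*options)]

for tags in candidate_taggings("fruit flies like a banana"):
    print(" ".join(tags))
```

The lexicon alone yields exactly the four readings listed above; nothing in it says which one is right.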
![Page 19: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/19.jpg)
Machine Learning for NLP
[Diagram: Data, “Features”, Learning Algorithm, Prediction/Classifier, Prediction; train set and test set]
19
![Page 20: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/20.jpg)
Machine Learning for NLP
[Diagram: “Features”, Learning Algorithm, Prediction/Classifier, Prediction; train set and test set]
20
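A minimal sketch of this classic pipeline in plain Python, assuming a toy sentiment task: a hand-written `features` function (here just bag-of-words counts) feeds a learning algorithm trained on the train set, and the resulting classifier makes predictions on the test set. The data, the function names, and the choice of multinomial Naive Bayes as the learning algorithm are all illustrative assumptions, not what the slide prescribes.

```python
import math
from collections import Counter, defaultdict

def features(text):
    """Hand-engineered features: here just lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def train_naive_bayes(train_set):
    """Learning algorithm: multinomial Naive Bayes with add-one smoothing."""
    class_counts = Counter(label for _, label in train_set)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train_set:
        feats = features(text)
        word_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Classifier: pick the label with the highest log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for word, n in features(text).items():
            if word in vocab:  # ignore words never seen in training
                score += n * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, split into a train set and a test set.
train_set = [
    ("a great fun movie", "pos"),
    ("great acting and a fun plot", "pos"),
    ("a boring slow movie", "neg"),
    ("boring plot and bad acting", "neg"),
]
test_set = [("fun and great", "pos"), ("slow and boring", "neg")]

model = train_naive_bayes(train_set)
for text, gold in test_set:
    print(text, "->", predict(model, text), "(gold:", gold + ")")
```

Swapping in another learning algorithm only changes `train_naive_bayes` and `predict`; the `features` step stays hand-engineered, which is exactly the manual work deep learning aims to remove.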
![Page 21: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/21.jpg)
Machine Learning for NLP
• Until the early 1990s, NLP systems were built manually with hand-crafted dictionaries and rules.
• As large electronic text corpora became increasingly available, researchers began using machine learning techniques to automatically build NLP systems.
• Today, the vast majority of NLP systems use machine learning.
21
![Page 22: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/22.jpg)
2. Neural Networks and a short history lesson
22
![Page 23: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/23.jpg)
Perceptron (1957)
Frank Rosenblatt (1928-1971)
Original Perceptron
Simplified model:
(From Perceptrons by M. L. Minsky and S. Papert, 1969, Cambridge, MA: MIT Press. Copyright 1969 by MIT Press.)
23
![Page 24: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/24.jpg)
Perceptron (1957)
Perceptron Research, YouTube clip: https://www.youtube.com/watch?v=cNxadbrN_aI&feature=youtu.be&t=12
24
![Page 25: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/25.jpg)
Perceptron (1957)
25
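As a rough illustration, the perceptron learning rule can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions (threshold activation, learning rate 1, zero initial weights), not Rosenblatt's original hardware procedure: whenever an example is misclassified, each weight moves by the error times its input.

```python
def step(z):
    """Threshold activation: the unit fires (1) only if the weighted sum is positive."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights and bias by the error on each example."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so the perceptron can learn it.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```

On AND the rule settles after a few passes; on XOR it never would, which is the single-layer limitation highlighted in Minsky and Papert's *Perceptrons* (1969).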
![Page 26: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/26.jpg)
Multilayer Perceptron (1986)
[Diagram labels: inputs, weights, bias, activation]
26
![Page 27: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/27.jpg)
Neuron Model
All you need to know:
27
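The neuron formula itself is in the slide image; as a hedged reconstruction, the standard model combines inputs $x_i$, weights $w_i$, a bias $b$ and an activation function $f$ (symbols assumed here, matching the inputs/weights/bias/activation labels of the multilayer perceptron diagram):

$$
a = f\Big(\sum_{i=1}^{n} w_i x_i + b\Big) = f(\mathbf{w}^\top \mathbf{x} + b)
$$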
![Page 28: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/28.jpg)
Activation functions
28
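The activation-function plots are not recoverable from the extracted text. As an assumption, the sketch below implements four common choices (step, sigmoid, tanh, ReLU) in plain Python, plus a hypothetical `neuron` helper that applies one of them to a weighted sum plus bias.

```python
import math

def step(z):
    """Heaviside step, as in the original perceptron."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

for f in (step, sigmoid, tanh, relu):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```

This `sigmoid` raises `OverflowError` for very negative inputs (roughly below -709), so library code usually switches to a numerically stable formulation.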
![Page 29: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/29.jpg)
Backpropagation (1974/1986)
• 1974: Paul Werbos invents the backpropagation algorithm for NNs
• 1986: Backprop popularized by Rumelhart, Hinton, Williams
• 1990: Renewed interest in NNs
29
![Page 30: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/30.jpg)
Backprop Renaissance
Forward Propagation
• Sum inputs, produce activation, feed-forward
30
![Page 31: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/31.jpg)
Backprop Renaissance
Back Propagation (of error)
• Calculate total error at the top
• Calculate contributions to error at each step going backwards
31
![Page 32: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/32.jpg)
Backpropagation
• Compute the gradient of the example-wise loss w.r.t. the parameters
• Simply applying the derivative chain rule wisely
• If computing the loss (example, parameters) is an O(n) computation, then so is computing the gradient
32
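A minimal backpropagation sketch, assuming a tiny network (2 inputs, a tanh hidden layer of 3 units, one linear output) and a squared-error loss; the sizes, names and loss are illustrative choices. `backward` applies the chain rule layer by layer, and a central finite-difference check confirms its gradients. `backward` costs about one forward pass plus a constant factor, in line with the O(n) point above, whereas the finite-difference check needs two extra forward passes per parameter.

```python
import math
import random

D, H = 2, 3  # input size and hidden-layer size (arbitrary choices for the sketch)

def init_params(seed=0):
    rng = random.Random(seed)
    return {
        "W1": [rng.uniform(-1, 1) for _ in range(H * D)],  # H x D, stored row-major
        "b1": [0.0] * H,
        "w2": [rng.uniform(-1, 1) for _ in range(H)],
        "b2": [0.0],
    }

def forward(p, x):
    """Forward propagation: weighted sums, tanh hidden layer, linear output."""
    h = [math.tanh(sum(p["W1"][j * D + k] * x[k] for k in range(D)) + p["b1"][j])
         for j in range(H)]
    y = sum(p["w2"][j] * h[j] for j in range(H)) + p["b2"][0]
    return h, y

def loss(p, x, t):
    """Example-wise squared-error loss."""
    return 0.5 * (forward(p, x)[1] - t) ** 2

def backward(p, x, t):
    """Backpropagation: chain rule from the output error back to W1."""
    h, y = forward(p, x)
    dy = y - t                               # dL/dy
    dz = [dy * p["w2"][j] * (1 - h[j] ** 2)  # dL/dz_j, since tanh'(z) = 1 - tanh(z)^2
          for j in range(H)]
    return {
        "W1": [dz[j] * x[k] for j in range(H) for k in range(D)],
        "b1": dz,
        "w2": [dy * h[j] for j in range(H)],
        "b2": [dy],
    }

def numerical_grads(p, x, t, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    grads = {}
    for name, values in p.items():
        grads[name] = []
        for i in range(len(values)):
            original = values[i]
            values[i] = original + eps
            up = loss(p, x, t)
            values[i] = original - eps
            down = loss(p, x, t)
            values[i] = original  # restore the parameter
            grads[name].append((up - down) / (2 * eps))
    return grads

params = init_params()
x, t = [0.5, -1.0], 0.25
analytic = backward(params, x, t)
numeric = numerical_grads(params, x, t)
max_diff = max(abs(a - n) for name in params
               for a, n in zip(analytic[name], numeric[name]))
print("max |backprop - finite difference| =", max_diff)
```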
![Page 33: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/33.jpg)
Simple Chain Rule
33
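For reference, the chain rule in its simplest form (the slide's own diagram is not in the extracted text, so the notation here is assumed): if $z = f(y)$ and $y = g(x)$, then

$$
\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}
$$

Applied to a single neuron $a = f\big(\sum_j w_j x_j + b\big)$ with loss $L$, it gives the per-weight gradient that backpropagation computes:

$$
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial a}\; f'\Big(\sum_j w_j x_j + b\Big)\; x_i
$$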
![Page 34: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/34.jpg)
Training procedure
To reiterate:
• Initialize randomly.
• Sequentially give it data.
• See what the difference is between network output and actual output.
• Update the weights according to this error.
• End result: give the model an input, and it produces a proper output.
Quest for the weights. The weights are the model!
34
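The training procedure can be sketched in plain Python for the smallest possible network: a single sigmoid neuron learning logical AND. The dataset, learning rate, epoch count and seed are illustrative assumptions. The numbered comments follow the slide's steps, and the update uses the gradient of the cross-entropy loss, which for a sigmoid output reduces to (output - target) times the input.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Network output for input x: a single sigmoid neuron."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    # 1. Initialize randomly.
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(data[0][0]))]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        # 2. Sequentially give it data.
        for x, target in data:
            # 3. Difference between network output and actual output.
            error = predict(w, b, x) - target
            # 4. Update the weights according to this error (gradient of the
            #    cross-entropy loss w.r.t. w and b for a sigmoid output).
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
# End result: the learned w and b *are* the model.
predictions = [round(predict(w, b, x)) for x, _ in AND]
print(predictions)
```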
![Page 35: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/35.jpg)
So why only now?
• Inspired by the architectural depth of the brain, researchers had wanted for decades to train deep multi-layer neural networks.
• No successful attempts were reported before 2006 … Exception: convolutional neural networks, LeCun 1998
• SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (shallow architecture).
• Breakthrough in 2006!
35
![Page 36: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/36.jpg)
2006 Breakthrough
• More data
• Faster hardware: GPUs, multi-core CPUs
• Working ideas on how to train deep architectures
36
![Page 37: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/37.jpg)
2006 Breakthrough
37
![Page 38: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/38.jpg)
2006 Breakthrough
38
![Page 39: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/39.jpg)
2006 Breakthrough
39
![Page 40: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/40.jpg)
2006 Breakthrough
40
![Page 41: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/41.jpg)
2006 Breakthrough
41
![Page 42: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/42.jpg)
2006 Breakthrough
Stacked Restricted Boltzmann Machines* (RBM): Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Stacked Autoencoders (AE): Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19.
* called Deep Belief Networks (DBN)
42
![Page 43: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/43.jpg)
3. Deep Learning onwards we go…
43
![Page 44: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/44.jpg)
44
![Page 45: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/45.jpg)
Why go Deep ?
Hierarchies
Efficient
Generalization
Distributed
Sharing
Unsupervised*
Black Box
Training Time
Major PWNAGE!
Much Data
45
![Page 46: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/46.jpg)
No More Handcrafted Features !
46
![Page 47: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/47.jpg)
Deep Learning: Why?
“I’ve worked all my life in Machine Learning, and I’ve never seen one algorithm knock over benchmarks like Deep Learning”
— Andrew Ng
47
![Page 48: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/48.jpg)
Biological Justification
Deep Learning = Brain “inspired”
Audio/Visual Cortex has multiple stages == Hierarchical
“Brainiacs” vs “Pragmatists”:
• “Brainiacs”: Computational Biology, Jorge Dávila-Chacón
• “Pragmatists”: CVAP, “that guy”
48
![Page 49: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/49.jpg)
Different Levels of Abstraction
49
![Page 50: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/50.jpg)
Hierarchical Learning
• Natural progression from low level to high level structure as seen in natural complexity
Different Levels of Abstraction
Feature Representation
50
![Page 51: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/51.jpg)
Hierarchical Learning
• Natural progression from low level to high level structure as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces
Different Levels of Abstraction
Feature Representation
51
![Page 52: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/52.jpg)
Hierarchical Learning
• Natural progression from low level to high level structure as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces
• A good lower level representation can be used for many distinct tasks
Different Levels of Abstraction
Feature Representation
52
![Page 53: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/53.jpg)
Hierarchical Learning
53
![Page 54: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/54.jpg)
Generalizable Learning
• Shared Low Level Representations
• Multi-Task Learning
• Unsupervised Training
54
![Page 55: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/55.jpg)
Generalizable Learning
• Shared Low Level Representations
• Multi-Task Learning
• Unsupervised Training
• Partial Feature Sharing
• Mixed Mode Learning
• Composition of Functions
55
![Page 56: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/56.jpg)
Classic Deep Architecture
Input layer
Hidden layers
Output layer
56
![Page 57: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/57.jpg)
Modern Deep Architecture
Input layer
Hidden layers
Output layer
57
![Page 58: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/58.jpg)
Deep Learning: Why? (again)
Beat state of the art in many areas:
• Language Modeling (2012, Mikolov et al.)
• Image Recognition (Krizhevsky won 2012 ImageNet competition)
• Sentiment Classification (2011, Socher et al.)
• Speech Recognition (2010, Dahl et al.)
• MNIST hand-written digit recognition (Ciresan et al., 2010)
58
![Page 59: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/59.jpg)
Deep Learning: Why for NLP ?
One Model rules them all ? DL approaches have been successfully applied to:
Automatic summarization
Coreference resolution
Discourse analysis
Machine translation
Morphological segmentation
Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Part-of-speech tagging
Parsing
Question answering
Relationship extraction
Sentence boundary disambiguation
Sentiment analysis
Speech recognition
Speech segmentation
Topic segmentation and recognition
Word segmentation
Word sense disambiguation
Information retrieval (IR)
Information extraction (IE)
Speech processing
59
![Page 60: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/60.jpg)
- COFFEE BREAK -
After the break we return with: CODE
Download the code samples now from: https://github.com/graphific/DL-Meetup-intro
Shortened URL: http://goo.gl/abX1E2
60
![Page 61: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/61.jpg)
1. MLP
• Deep Neural Network
• Multilayer Perceptron (MLP) or Artificial Neural Network (ANN)
Logistic regression
Training regime: Stochastic Gradient Descent (SGD) with minibatches
MNIST dataset
Simple hidden layer
61
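The actual code samples live in the linked repository; the stdlib-only sketch below is not that code. It only illustrates, on assumed toy data, the training regime named on the slide: softmax (multinomial logistic) regression trained with minibatch SGD, where each update uses the gradient averaged over a small shuffled batch. The three-class 2-D dataset stands in for MNIST's 784 inputs and 10 classes, and the hidden layer is omitted to keep it short.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def probs(W, b, x):
    """Class probabilities of a softmax (multinomial logistic) regression."""
    return softmax([sum(w * xk for w, xk in zip(W[c], x)) + b[c] for c in range(len(b))])

def mean_loss(W, b, data):
    """Average negative log-likelihood over a dataset."""
    return -sum(math.log(probs(W, b, x)[y]) for x, y in data) / len(data)

def minibatches(data, batch_size, rng):
    """Shuffle once per epoch, then yield consecutive slices of the data."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]

def sgd_minibatch(data, n_classes, batch_size=2, lr=0.5, epochs=100, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W = [[0.0] * n_in for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            gW = [[0.0] * n_in for _ in range(n_classes)]
            gb = [0.0] * n_classes
            for x, y in batch:
                p = probs(W, b, x)
                for c in range(n_classes):
                    d = p[c] - (1.0 if c == y else 0.0)  # dLoss/dscore_c
                    gb[c] += d
                    for k in range(n_in):
                        gW[c][k] += d * x[k]
            # One parameter update per minibatch, using the averaged gradient.
            for c in range(n_classes):
                b[c] -= lr * gb[c] / len(batch)
                for k in range(n_in):
                    W[c][k] -= lr * gW[c][k] / len(batch)
    return W, b

# Hypothetical toy data: 2-D points in three classes.
data = [((1.0, 0.0), 0), ((0.9, 0.1), 0), ((0.0, 1.0), 1),
        ((0.1, 0.9), 1), ((-1.0, -1.0), 2), ((-0.9, -1.1), 2)]
W, b = sgd_minibatch(data, n_classes=3)
print("loss before training:", round(math.log(3), 4))  # zero weights give uniform probabilities
print("loss after training: ", round(mean_loss(W, b, data), 4))
```

The batch size trades gradient noise against the number of updates per epoch: `batch_size=1` recovers plain online SGD, while `batch_size=len(data)` is full-batch gradient descent.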
![Page 62: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/62.jpg)
2. Convolutional Neural Network
62
from: Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks [breakthrough in object recognition, ImageNet 2012]
![Page 63: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/63.jpg)
Convolutional Neural Network
http://ufldl.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
movie time: http://www.cs.toronto.edu/~hinton/adi/index.htm
63
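To make “feature extraction using convolution” concrete, here is a plain-Python sketch of a 'valid' 2-D convolution followed by 2×2 max pooling, the two operations that shape a convolutional layer. The toy image and the edge-detecting kernel are hypothetical. As in many CNN implementations the kernel is not flipped, so strictly this is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers. The kernel is not flipped
    (strictly a cross-correlation), and the output shrinks by kernel size - 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + c] * kernel[a][c]
                 for a in range(kh) for c in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size blocks (leftover edges are dropped)."""
    return [[max(feature_map[i + a][j + c] for a in range(size) for c in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# Hypothetical 5x5 image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]
fmap = conv2d_valid(image, kernel)
pooled = max_pool(fmap)
print(fmap)    # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)  # [[2, 0], [2, 0]]
```

The feature map lights up only where the edge is, and pooling keeps that response while halving the resolution; in a trained CNN the kernel values are learned rather than hand-set.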
![Page 64: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/64.jpg)
That’s it, no more code! (for now)
64
![Page 65: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/65.jpg)
Deep Learning: Future Developments
Currently an explosion of developments:
• Hessian-Free networks (2010)
• Long Short-Term Memory (2011)
• Large Convolutional nets, max-pooling (2011)
• Nesterov’s Gradient Descent (2013)
Currently state of the art, but...
• No way of doing logical inference (extrapolation)
• No easy integration of abstract knowledge
• Hypothesis space bias might not conform with reality
65
![Page 66: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/66.jpg)
Deep Learning: Future Challenges
66
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R. (2013). Intriguing properties of neural networks
Left: correctly identified; Center: added noise (magnified ×10); Right: classified as “Ostrich”
![Page 67: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/67.jpg)
• cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/CUDA, optimized for GTX 580) https://code.google.com/p/cuda-convnet2/
• Caffe (Berkeley) (C++/CUDA, Python) http://caffe.berkeleyvision.org/
• OverFeat (NYU) http://cilvr.nyu.edu/doku.php?id=code:start
Wanna Play ?
![Page 68: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/68.jpg)
• Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/
• Pylearn2 - library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/
• Torch - Matlab-like environment for state-of-the-art machine learning algorithms in Lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/
• more info: http://deeplearning.net/software_links/
Wanna Play ?
![Page 69: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/69.jpg)
In touch!
Academic/Research
as PhD candidate KTH/CSC: “Always interested in discussing Machine Learning, Deep Architectures, Graphs, and Language Technology”
[email protected]/~roelof/
Internship / Entrepreneurship
as CIO/CTO Feeda: “Always looking for additions to our brand new R&D team”
[Internships upcoming on KTH exjobb website…]
Feeda
69
![Page 70: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/70.jpg)
We’re Hiring!
Feeda
• Dev Ops • Software Developers • Data Scientists
70
![Page 71: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/71.jpg)
Thanks for listening
Mingling time!
71
![Page 72: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/72.jpg)
72
Can’t get enough? Come to my talk tomorrow (Friday)
Description on KTH website
Visual-Semantic Embeddings: some thoughts on Language
Roelof Pieters, TCS/CSC
Friday Jan 23, 13:30
Room 304, Teknikringen 14, level 3
![Page 73: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/73.jpg)
Addendum
Some of the exciting recent developments in NLP, especially Distributed Semantics
73
![Page 74: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/74.jpg)
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/
74
![Page 75: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/75.jpg)
75
![Page 76: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/76.jpg)
Word Embeddings: Collobert & Weston (2011)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011). Natural Language Processing (almost) from Scratch
76
![Page 77: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/77.jpg)
Multi-embeddings: Stanford (2012)
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Improving Word Representations via Global Context and Multiple Word Prototypes
77
![Page 78: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/78.jpg)
Linguistic Regularities: Mikolov (2013)
code & info: https://code.google.com/p/word2vec/
Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations
78
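Linguistic regularities are usually demonstrated with vector offsets such as king - man + woman ≈ queen. Below is a minimal sketch of that analogy test: the nearest neighbour of b - a + c by cosine similarity, excluding the three query words. The 3-dimensional vectors are hand-made assumptions for illustration; word2vec learns its vectors (typically hundreds of dimensions) from a corpus.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(vectors, a, b, c):
    """Solve 'a is to b as c is to ?' via the offset b - a + c,
    returning the nearest other word by cosine similarity."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical hand-made vectors: axis 0 ~ "male", axis 1 ~ "female", axis 2 ~ "royal".
vectors = {
    "man":   [1.0, 0.0, 0.1],
    "woman": [0.0, 1.0, 0.1],
    "king":  [1.0, 0.0, 0.9],
    "queen": [0.0, 1.0, 0.9],
    "apple": [0.5, 0.5, 0.0],
}
print(analogy(vectors, "man", "king", "woman"))  # queen
```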
![Page 79: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/79.jpg)
Word Embeddings for MT: Mikolov (2013)
Mikolov, T., Le, Q. V., Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation
79
![Page 80: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/80.jpg)
Recursive Deep Models & Sentiment: Socher (2013)
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013
code & demo: http://nlp.stanford.edu/sentiment/index.html
80
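As a hedged summary of the composition at the core of the paper's Recursive Neural Tensor Network (RNTN): for two child vectors $b, c \in \mathbb{R}^d$ in the parse tree, the parent vector $p$ is

$$
h = \begin{bmatrix} b \\ c \end{bmatrix}^{\top} V^{[1:d]} \begin{bmatrix} b \\ c \end{bmatrix} + W \begin{bmatrix} b \\ c \end{bmatrix}, \qquad p = f(h)
$$

where $V^{[1:d]} \in \mathbb{R}^{2d \times 2d \times d}$ is the tensor (each slice $V^{[i]}$ yields one component $h_i$), $W \in \mathbb{R}^{d \times 2d}$, and $f = \tanh$. Each node's sentiment label is then predicted as $y = \mathrm{softmax}(W_s\, p)$. Dropping the tensor term gives a plain recursive neural network.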
![Page 81: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/81.jpg)
Paragraph Vectors: Le & Mikolov (2014)
Le, Q., Mikolov, T. (2014). Distributed Representations of Sentences and Documents
81
• add context (sentence, paragraph, document) to word vectors during training
Results on Stanford Sentiment Treebank dataset:
![Page 82: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/82.jpg)
Global Vectors, GloVe: Stanford (2014)
Pennington, J., Socher, R., Manning, C. D. (2014). GloVe: Global Vectors for Word Representation
code & demo: http://nlp.stanford.edu/projects/glove/
vs. results on the word analogy task: “similar accuracy”
82
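For reference, GloVe minimizes a weighted least-squares objective over the word-word co-occurrence counts $X_{ij}$ (vocabulary size $V$, word vectors $w_i$, context vectors $\tilde w_j$, biases $b_i, \tilde b_j$), as stated in Pennington et al. (2014):

$$
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
$$

The weighting $f$ damps rare co-occurrences and caps frequent ones; the paper uses $x_{\max} = 100$ and $\alpha = 3/4$.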
![Page 83: Deep Learning, an interactive introduction for NLP-ers](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f9b384760da3da068bd6e0/html5/thumbnails/83.jpg)
Dependency-based Embeddings: Levy & Goldberg (2014)
Levy, O., Goldberg, Y. (2014). Dependency-Based Word Embeddings
code & demo: https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
- Syntactic Dependency Context
Australian scientist discovers star with telescope
- Bag of Words (BoW) Context
[Figure: precision-recall plot; x-axis: Recall, y-axis: Precision]
“Dependency-based embeddings have more
functional similarities”
83
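The two context types for the example sentence can be sketched in plain Python. The bag-of-words window size (k = 2) and the hand-written dependency edges are assumptions; a real pipeline would run a dependency parser. Following the paper's example, the preposition is collapsed into the relation (`prep_with`), and a dependent also receives its head as an inverse context marked `-1`.

```python
def bow_contexts(tokens, i, k=2):
    """Bag-of-words contexts: up to k words on each side of position i."""
    return [tokens[j] for j in range(max(0, i - k), min(len(tokens), i + k + 1)) if j != i]

def dep_contexts(edges, word):
    """Dependency contexts from (head, relation, dependent) triples: a head sees
    'dependent/relation', a dependent sees 'head/relation-1' (inverse arc)."""
    contexts = []
    for head, rel, dep in edges:
        if head == word:
            contexts.append(f"{dep}/{rel}")
        if dep == word:
            contexts.append(f"{head}/{rel}-1")
    return contexts

tokens = "Australian scientist discovers star with telescope".split()
# Hand-written (hypothetical) parse with the preposition collapsed into the label.
edges = [
    ("scientist", "amod", "Australian"),
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("discovers", "prep_with", "telescope"),
]
print(bow_contexts(tokens, tokens.index("discovers")))  # ['Australian', 'scientist', 'star', 'with']
print(dep_contexts(edges, "discovers"))  # ['scientist/nsubj', 'star/dobj', 'telescope/prep_with']
```

The BoW window picks up "Australian" and "with", which are unrelated to what *discovers* does, and misses "telescope". The dependency contexts keep exactly the syntactic arguments, which is one way to read the “more functional similarities” finding.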