DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING
TRANSCRIPT
Prerana Singhal
THE NEED FOR NATURAL LANGUAGE PROCESSING
No. of internet users – huge and growing
Treasure chest of data in the form of Natural Language
APPLICATIONS
Search
Customer Support
Q & A
Summarization
Sentiment Analysis
NATURAL LANGUAGE PROCESSING
Rule based systems (since 1960s)
Statistical Machine Learning (since late 1980s): Naïve Bayes, SVM, HMM, LDA, …
Examples: spam classifiers, Google News, Google Translate
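For a flavour of the statistical approach, here is a minimal Naïve Bayes spam classifier sketched with scikit-learn (a modern stand-in, not code from the talk; the corpus and labels are invented for illustration):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny invented corpus; a real spam filter trains on thousands of mails.
    mails = ["win a free prize now", "meeting moved to 3pm",
             "free money click here", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(mails, labels)
    print(clf.predict(["free prize inside"]))  # [1]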
WHY IS NLP HARD?
“Flipkart is a good website” (Easy)
“I didn’t receive the product on time” (Negation)
“Really shoddy service” (Rare words)
“It’s gr8 to see this” (Misspellings)
“Well played Flipkart! You’re giving IRCTC a run for their money” (Sarcasm)
Accuracy sometimes not good enough for production
EXCITING DEEP LEARNING RESULTS
Amazing results, especially in the image and speech domains:
ImageNet: 6% error rate
Facial Recognition: 97.35% accuracy
Speech Recognition: 25% error reduction
Handwriting Recognition (ICDAR)
IMAGE MODELS
SENSIBLE ERRORS
DEEP LEARNING FOR NLP
Positive–Negative Sentiment Analysis: accuracy increase from 85% to 96% (73% error reduction)
State-of-the-art results on various text classification tasks (same model): Tweets, Reviews, Emails
Beyond Text Classification
Why does it outperform statistical models?
STATISTICAL CLASSIFIERS
RAW DATA
Flipkart! You need to improve your delivery
FEATURE ENGINEERING
Functions which transform input (raw) data into a feature space
Discriminative – for the decision boundary
Feature engineering is painful (example below)
Deep Neural Networks: identify the features automatically
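For contrast with learned features, a hypothetical hand-engineered feature function of the kind statistical classifiers rely on (every feature name here is invented for illustration):

    def extract_features(text):
        # Transform raw text into a hand-crafted feature space.
        tokens = text.split()
        return {
            "num_exclamations": text.count("!"),
            "has_negation": int(any(t.lower() in ("not", "don't", "didn't")
                                    for t in tokens)),
            "num_allcaps_words": sum(t.isupper() for t in tokens),
            "num_tokens": len(tokens),
        }

    print(extract_features("Flipkart! You need to improve your delivery"))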
NEURAL NETWORKS
DEEP NEURAL NETWORKS
Higher layers form higher levels of abstraction.
DEEP NEURAL NETWORKS
Unsupervised pre-training
DEEP LEARNING FOR NLP
Why Deep Learning?
Problems with applying deep learning to natural language
PROBLEMS WITH STATISTICAL MODELS
BAG OF WORDS
“FLIPKART IS BETTER THAN AMAZON”
Word ordering information lost
Data sparsity
Words as atomic symbols
Very hard to find higher-level features (features other than BOW)
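A minimal sketch of the ordering problem, using scikit-learn's CountVectorizer as the bag-of-words featurizer (an assumed choice; any BOW featurizer behaves the same):

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["Flipkart is better than Amazon",
            "Amazon is better than Flipkart"]
    X = CountVectorizer().fit_transform(docs)

    # Both sentences map to the identical count vector:
    # the ordering information is gone.
    print(np.array_equal(X[0].toarray(), X[1].toarray()))  # True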
HOW TO ENCODE THE MEANING OF A WORD?
WordNet: a dictionary of synonyms
Synonyms of “good”: adept, expert, good, practiced, proficient, skillful
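Assuming NLTK with the WordNet corpus downloaded, these synonym sets can be pulled like so:

    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    # Collect lemma names across all synsets of "good".
    synonyms = {lemma.name() for syn in wn.synsets("good")
                for lemma in syn.lemmas()}
    print(sorted(synonyms))  # includes 'adept', 'expert', 'practiced', ...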
WORD EMBEDDINGS: THE FIRST BREAKTHROUGH
NEURAL LANGUAGE MODEL
WORD EMBEDDINGS: VISUALIZATIONS
CAPTURE RELATIONSHIPS
[Embedding visualization figures omitted.]
Trained in a completely unsupervised way
Reduce data sparsity
Semantic Hashing
Appear to carry semantic information about the words
Freely available for out-of-the-box usage
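A sketch of that out-of-the-box usage with gensim and the commonly distributed pretrained Google News vectors (the file name is an assumption):

    from gensim.models import KeyedVectors

    # Pretrained Google News word2vec vectors (large download).
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # The embeddings appear to capture relationships:
    # king - man + woman is closest to queen.
    print(vectors.most_similar(positive=["king", "woman"],
                               negative=["man"], topn=1))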
COMPOSITIONALITY
How do we go beyond words (sentences and paragraphs)?
This turns out to be a very hard problem
Simple approaches:
Word Vector Averaging
Weighted Word Vector Averaging
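A minimal numpy sketch of word vector averaging, assuming a `vectors` lookup like the gensim one above:

    import numpy as np

    def sentence_vector(sentence, vectors):
        # Average the word vectors, skipping out-of-vocabulary words.
        words = [w for w in sentence.split() if w in vectors]
        return np.mean([vectors[w] for w in words], axis=0)

    # Weighted averaging scales each word vector first,
    # e.g. by its TF-IDF weight, before taking the mean.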
CONVOLUTIONAL NEURAL NETWORKS
Excellent feature extractors in images
Features are detected regardless of position in the image
NLP: “Natural Language Processing (Almost) from Scratch” (Collobert et al., 2011) first applied CNNs to NLP
CNN FOR TEXT
[Figure: CNN-for-text walkthrough. Each word in the sentence is mapped to a 3-dimensional embedding vector. A 3 x 9 weight matrix composes each window of 3 consecutive word vectors (9 values) into a single 3-dimensional feature vector, sliding one word at a time across the sentence. Element-wise max pooling over the resulting feature vectors yields a fixed-size representation, which the classifier labels Neutral.]
DEMYSTIFYING MAX POOLING
Finds the most important part(s) of the sentence
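A minimal numpy sketch of the composition and max-pooling steps illustrated above, with toy dimensions and random weights rather than the trained model:

    import numpy as np

    words, dim, window = 5, 3, 3
    embeddings = np.random.randn(words, dim)  # one row per word
    W = np.random.randn(dim, window * dim)    # the 3 x 9 weight matrix

    # Compose each 3-word window (9 values) into one 3-d feature vector.
    features = np.array([W @ embeddings[i:i + window].ravel()
                         for i in range(words - window + 1)])

    # Max pooling over time: keep the strongest response of each feature
    # across all windows, i.e. "the most important part of the sentence".
    pooled = features.max(axis=0)  # shape (3,)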
CNN FOR TEXT
Window sizes: 3, 4, 5
Static mode
Non-static mode
Multichannel mode
Multiclass classification
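A hedged sketch of this architecture in Keras (a modern stand-in for the Theano-era tooling used at the time; vocabulary, filter counts, and sizes are assumptions):

    from tensorflow.keras import layers, models

    VOCAB, DIM, MAXLEN, CLASSES = 20000, 300, 50, 3  # assumed sizes

    inp = layers.Input(shape=(MAXLEN,))
    # Static mode freezes pretrained embeddings (trainable=False);
    # non-static fine-tunes them; multichannel feeds one copy of each.
    emb = layers.Embedding(VOCAB, DIM)(inp)

    pooled = []
    for w in (3, 4, 5):  # the three window sizes from the slide
        conv = layers.Conv1D(100, w, activation="relu")(emb)
        pooled.append(layers.GlobalMaxPooling1D()(conv))

    out = layers.Dense(CLASSES, activation="softmax")(
        layers.Concatenate()(pooled))  # multiclass output
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")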
RESULTS

Dataset | Source | Labels | Statistical Models | CNN
Flipkart Twitter Sentiment | Twitter | Pos, Neg | 85% | 96%
Flipkart Twitter Sentiment | Twitter | Pos, Neg, Neu | 76% | 89%
Fine-grained sentiment in Emails | Emails | Angry, Sad, Complaint, Request | 55% | 68%
SST2 | Movie Reviews | Pos, Neg | 79.4% | 87.5%
SemEval Task 4 | Restaurant Reviews | food / service / ambience / price / misc | 88.5% | 89.6%
SENTIMENT: ANECDOTES
DRAWBACKS & LEARNINGS
Computationally expensive
How to scale training? How to scale prediction?
Libraries for Deep Learning: Theano, PyLearn2, Torch
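For a flavour of Theano's symbolic style, a minimal compiled logistic scorer (an illustration, not code from the talk):

    import numpy as np
    import theano
    import theano.tensor as T

    # Build a symbolic graph: logistic score of a 3-d input.
    x = T.dvector("x")
    w = theano.shared(np.zeros(3), name="w")
    b = theano.shared(0.0, name="b")
    score = T.nnet.sigmoid(T.dot(w, x) + b)

    # Theano compiles the graph to fast (optionally GPU) code.
    predict = theano.function([x], score)
    print(predict([0.1, -0.2, 0.3]))  # 0.5 with zero weights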
“I THINK YOU SHOULD BE MORE EXPLICIT HERE IN STEP TWO”
OPEN SOURCED
https://github.com/flipkart-incubator/optimus
BEYOND TEXT CLASSIFICATION
Text Classification covers a lot of NLP problems (or many problems can be reduced to it)
Word Embeddings: Unsupervised Learning
Sequence Learning: RNN, LSTM
RECURRENT MODELS
RNNs, LSTMs
Machine Translation, Chat, Classification
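A minimal Keras sketch of an LSTM text classifier, with the same hypothetical sizes as the CNN sketch above:

    from tensorflow.keras import layers, models

    inp = layers.Input(shape=(50,))          # token ids
    emb = layers.Embedding(20000, 300)(inp)
    h = layers.LSTM(128)(emb)                # reads tokens in order, so
                                             # word ordering is preserved
    out = layers.Dense(2, activation="softmax")(h)
    model = models.Model(inp, out)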
ANY QUESTIONS?