Deep Learning for Natural Language Processing

Prerana Singhal

Upload: devashish-shanker

Post on 14-Aug-2015


TRANSCRIPT

Page 1: Deep Learning for Natural Language Processing

Prerana Singhal

Page 2: Deep Learning for Natural Language Processing

THE NEED FOR NATURAL LANGUAGE PROCESSING

No. of internet users – huge and growing

Treasure chest of data in the form of Natural Language

Page 3: Deep Learning for Natural Language Processing

APPLICATIONS

Search

Customer Support

Q & A

Summarization

Page 4: Deep Learning for Natural Language Processing

Sentiment Analysis

Page 5: Deep Learning for Natural Language Processing

NATURAL LANGUAGE PROCESSING

Rule-based systems (since 1960s)

Statistical Machine Learning (since late 1980s)

Naïve Bayes, SVM, HMM, LDA, …

Spam classifier, Google News, Google Translate

Page 6: Deep Learning for Natural Language Processing

WHY IS NLP HARD?

“Flipkart is a good website” (Easy)

Page 7: Deep Learning for Natural Language Processing

“I didn’t receive the product on time” (Negation)

Page 8: Deep Learning for Natural Language Processing

“Really shoddy service” (Rare words)

Page 9: Deep Learning for Natural Language Processing

“It’s gr8 to see this” (Misspellings)

Page 10: Deep Learning for Natural Language Processing

“Well played Flipkart! You’re giving IRCTC a run for their money” (Sarcasm)

Page 11: Deep Learning for Natural Language Processing

Accuracy sometimes not good enough for production

Page 12: Deep Learning for Natural Language Processing

EXCITING DEEP LEARNING RESULTS

Amazing results, esp. in image and speech domain

ImageNet: 6% error rate

Facial Recognition: 97.35% accuracy

Speech Recognition: 25% error reduction

Handwriting Recognition (ICDAR)

Page 13: Deep Learning for Natural Language Processing

IMAGE MODELS

Page 14: Deep Learning for Natural Language Processing

SENSIBLE ERRORS

Page 15: Deep Learning for Natural Language Processing

DEEP LEARNING FOR NLP

Positive – Negative Sentiment Analysis

Accuracy increase: 85% to 96%

73% error reduction

State-of-the-art results on various text classification tasks (Same Model)

Tweets, Reviews, Emails

Beyond Text Classification
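The "73% error reduction" figure follows from the two accuracies on this slide: the error rate drops from 15% to 4%, and 11 of the original 15 points are removed. A minimal sketch of that arithmetic (the function name is illustrative, not from the talk):

```python
def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the old error rate that the new model removes."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    return (old_error - new_error) / old_error


# Accuracies quoted on the slide: 85% before, 96% after.
reduction = relative_error_reduction(0.85, 0.96)
print(f"{reduction:.0%}")  # 73%
```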

Page 16: Deep Learning for Natural Language Processing

Why does it outperform statistical models?

Page 17: Deep Learning for Natural Language Processing

STATISTICAL CLASSIFIERS

Page 18: Deep Learning for Natural Language Processing

RAW DATA

Flipkart! You need to improve your delivery

Page 19: Deep Learning for Natural Language Processing

FEATURE ENGINEERING

Functions which transform input (raw) data into a feature space

Discriminative – for decision boundary

Feature engineering is painful

Deep Neural Networks: Identify the features automatically

Page 20: Deep Learning for Natural Language Processing

Neural Networks

Page 21: Deep Learning for Natural Language Processing

DEEP NEURAL NETWORKS

Higher layers form higher levels of abstractions.

Page 22: Deep Learning for Natural Language Processing

DEEP NEURAL NETWORKS

Unsupervised pre-training

Page 23: Deep Learning for Natural Language Processing

DEEP LEARNING FOR NLP

Why Deep Learning?

Problems with applying deep-learning to natural language

Page 24: Deep Learning for Natural Language Processing

PROBLEMS WITH STATISTICAL MODELS

Page 25: Deep Learning for Natural Language Processing

BAG OF WORDS

“FLIPKART IS BETTER THAN AMAZON”

Page 26: Deep Learning for Natural Language Processing

PROBLEMS WITH STATISTICAL MODELS

Word ordering information lost

Data sparsity

Words as atomic symbols

Very hard to find higher level features

Features other than BOW
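The loss of word order can be seen with the bag-of-words example from the earlier slide. The sketch below is an illustration only: it assumes lowercase, whitespace-separated tokens, and the reversed sentence is added here for contrast and does not appear in the talk.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Count lowercase, whitespace-separated tokens; word order is discarded."""
    return Counter(sentence.lower().split())


a = bag_of_words("Flipkart is better than Amazon")
# Hypothetical sentence with the opposite meaning, added for contrast.
b = bag_of_words("Amazon is better than Flipkart")

# Both sentences map to exactly the same feature counts.
print(a == b)  # True
```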

Page 27: Deep Learning for Natural Language Processing

HOW TO ENCODE THE MEANING OF A WORD?

WordNet: Dictionary of synonyms

Synonyms: Adept, expert, good, practiced, proficient, skillful

Page 28: Deep Learning for Natural Language Processing

WORD EMBEDDINGS: THE FIRST BREAKTHROUGH

Page 29: Deep Learning for Natural Language Processing

NEURAL LANGUAGE MODEL

Page 30: Deep Learning for Natural Language Processing

WORD EMBEDDINGS: VISUALIZATIONS

Page 31: Deep Learning for Natural Language Processing

CAPTURE RELATIONSHIPS

Page 32: Deep Learning for Natural Language Processing

WORD EMBEDDING: VISUALIZATIONS

Page 33: Deep Learning for Natural Language Processing

WORD EMBEDDING: VISUALIZATIONS

Page 34: Deep Learning for Natural Language Processing

WORD EMBEDDING: VISUALIZATIONS

Trained in a completely unsupervised way

Reduce data sparsity

Semantic Hashing

Appear to carry semantic information about the words

Freely available for out-of-the-box usage
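One common way to check whether embeddings carry semantic information is to compare vectors by cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors that are purely hypothetical; real pre-trained embeddings have far more dimensions and are learned from data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "good": [0.9, 0.8, 0.1],
    "great": [0.8, 0.9, 0.2],
    "shoddy": [-0.7, -0.6, 0.3],
}

sim_close = cosine_similarity(embeddings["good"], embeddings["great"])
sim_far = cosine_similarity(embeddings["good"], embeddings["shoddy"])

# Words with similar meaning should score higher than unrelated ones.
print(sim_close > sim_far)  # True
```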

Page 35: Deep Learning for Natural Language Processing

COMPOSITIONALITY

How do we go beyond words (sentences and paragraphs)?

This turns out to be a very hard problem

Simple Approaches

Word Vector Averaging

Weighted Word Vector Averaging
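Both simple approaches can be sketched in a few lines. The word vectors and weights below are hypothetical values chosen for illustration; in practice the vectors would come from pre-trained embeddings and the weights from a scheme such as TF-IDF (an assumption, since the slide does not name one).

```python
def average_vectors(vectors, weights=None):
    """Element-wise mean of equal-length vectors, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors)) / total
        for i in range(dim)
    ]


# Hypothetical 2-dimensional word vectors, invented for illustration.
word_vectors = {
    "really": [0.0, 0.2],
    "shoddy": [-1.0, 0.4],
    "service": [0.1, 0.0],
}
tokens = ["really", "shoddy", "service"]
vectors = [word_vectors[t] for t in tokens]

# Plain averaging treats every word equally.
plain = average_vectors(vectors)

# Weighted averaging lets an informative word ("shoddy") dominate.
weighted = average_vectors(vectors, weights=[0.5, 2.0, 0.5])
```

Either way the sentence vector ignores word order, which is one reason the talk moves on to convolutional models.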

Page 36: Deep Learning for Natural Language Processing
Page 37: Deep Learning for Natural Language Processing

CONVOLUTIONAL NEURAL NETWORKS

Excellent feature extractors in image

Features are detected regardless of position in image

NLP Almost from Scratch: Collobert et al. 2011

First applied CNN for NLP

Page 38: Deep Learning for Natural Language Processing
Page 39: Deep Learning for Natural Language Processing

CNN FOR TEXT

Page 40: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Page 41: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Composition: [0.46 0.04 -0.09]

Page 42: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Weight Matrix (3 x 9)

Window: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16]

Output: [0.46 0.04 -0.09]
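The composition step on this slide, a 3 x 9 weight matrix applied to a window of 9 input values, can be sketched as follows. Several details are assumptions not stated in the transcript: the window is taken to slide 3 values at a time (one word per step), a tanh nonlinearity is applied, and the weights are placeholder values, so the outputs will not match the numbers on the slide.

```python
import math

# The 15 input values shown on the slide.
inputs = [-0.33, 0.56, 0.98, -0.13, -0.81, -0.01, 0.17, 0.64, -0.16,
          0.97, 0.99, 0.90, -0.23, 0.16, 0.68]

WINDOW = 9  # matches the 3 x 9 weight matrix on the slide
STRIDE = 3  # assumption: each word contributes 3 values


def windows(values, size, stride):
    """All contiguous slices of length `size`, stepping by `stride`."""
    return [values[i:i + size]
            for i in range(0, len(values) - size + 1, stride)]


def compose(weight_matrix, window):
    """Apply tanh(W x) to one window; the nonlinearity is an assumption."""
    return [math.tanh(sum(w * x for w, x in zip(row, window)))
            for row in weight_matrix]


# Hypothetical 3 x 9 weights, not the values used in the talk.
weights = [[0.1 * ((r + c) % 3 - 1) for c in range(WINDOW)]
           for r in range(3)]

# One 3-value output per window; the same weights are reused each time.
outputs = [compose(weights, w) for w in windows(inputs, WINDOW, STRIDE)]
print(len(outputs))  # 3
```

Reusing one weight matrix at every position is what lets the model detect a feature regardless of where it occurs in the sentence.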

Page 43: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Window outputs: [0.46 0.04 -0.09] [-0.57 0.81 0.25]

Page 44: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Window outputs: [0.46 0.04 -0.09] [-0.57 0.81 0.25] [-0.18 0.26 0.40]

Page 45: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Window outputs: [0.46 0.04 -0.09] [-0.57 0.81 0.25] [-0.13 0.26 0.40]

Page 46: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Window outputs: [0.46 0.04 -0.09] [-0.57 0.81 0.25] [-0.13 0.26 0.40]

Max-pooled output: [0.46 0.81 0.40]

Page 47: Deep Learning for Natural Language Processing

Input values: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16 0.97 0.99 0.90 -0.23 0.16 0.68]

Window outputs: [0.46 0.04 -0.09] [-0.57 0.81 0.25] [-0.13 0.26 0.40]

Max-pooled output: [0.46 0.81 0.40]

Classification: Neutral

Page 48: Deep Learning for Natural Language Processing

DEMYSTIFYING MAX POOLING

Finds the most important part(s) of sentence
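Max pooling keeps, for each feature, the largest value seen in any window. The sketch below uses the three window outputs from the preceding slides and reproduces the pooled vector shown there; the function name is illustrative.

```python
def max_pool(feature_vectors):
    """Element-wise maximum across a list of equal-length vectors."""
    return [max(column) for column in zip(*feature_vectors)]


# Window outputs as shown on the earlier slides.
window_outputs = [
    [0.46, 0.04, -0.09],
    [-0.57, 0.81, 0.25],
    [-0.13, 0.26, 0.40],
]

pooled = max_pool(window_outputs)
print(pooled)  # [0.46, 0.81, 0.4]
```

Each pooled value comes from whichever window responded most strongly to that feature, which is the sense in which pooling picks out the most important part of the sentence.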

Page 49: Deep Learning for Natural Language Processing

CNN FOR TEXT

Window sizes: 3, 4, 5

Static mode

Non-static mode

Multichannel mode

Multiclass Classification
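Using several window sizes means the same sentence is scanned as overlapping 3-, 4- and 5-word spans. A minimal sketch, assuming whitespace tokenization and using an example sentence from earlier in the talk:

```python
def word_windows(tokens, size):
    """All contiguous runs of `size` tokens, in order."""
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


tokens = "I didn't receive the product on time".split()

# A 7-token sentence yields 5, 4 and 3 windows for sizes 3, 4 and 5.
for size in (3, 4, 5):
    print(size, len(word_windows(tokens, size)))
```

The size-3 windows include "I didn't receive", so a negation and the word it modifies fall inside a single window.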

Page 50: Deep Learning for Natural Language Processing

RESULTS

Dataset | Source | Labels | Statistical Models | CNN
Flipkart Twitter Sentiment | Twitter | Pos, Neg | 85% | 96%
Flipkart Twitter Sentiment | Twitter | Pos, Neg, Neu | 76% | 89%
Fine grained sentiment in Emails | Emails | Angry, Sad, Complaint, Request | 55% | 68%
SST2 | Movie Reviews | Pos, Neg | 79.4% | 87.5%
SemEval Task 4 | Restaurant Reviews | food / service / ambience / price / misc | 88.5% | 89.6%

Page 51: Deep Learning for Natural Language Processing

SENTIMENT: ANECDOTES

Page 52: Deep Learning for Natural Language Processing

DRAWBACKS & LEARNINGS

Computationally Expensive

How to scale training?

How to scale prediction?

Libraries for Deep Learning

Theano

PyLearn2

Torch

Page 53: Deep Learning for Natural Language Processing

“I THINK YOU SHOULD BE MORE EXPLICIT HERE IN STEP TWO”

Page 54: Deep Learning for Natural Language Processing

OPEN SOURCED

https://github.com/flipkart-incubator/optimus

Page 55: Deep Learning for Natural Language Processing

BEYOND TEXT CLASSIFICATION

Text Classification covers a lot of NLP problems (or problems can be reduced to it)

Word Embedding

Unsupervised Learning

Sequence Learning

RNN, LSTM

Page 56: Deep Learning for Natural Language Processing

RECURRENT MODELS

RNNs, LSTMs

Machine Translation, Chat, Classification

Page 57: Deep Learning for Natural Language Processing

ANY QUESTIONS?