CS11-747 Neural Networks for NLP
Convolutional Networks for Text
Pengfei Liu
Site: https://phontron.com/class/nn4nlp2020/
With some slides by Graham Neubig
Outline
1. Feature Combinations
2. CNNs and Key Concepts
3. Case Study on Sentiment Classification
4. CNN Variants and Applications
5. Structured CNNs
6. Summary
An Example Prediction Problem: Sentiment Classification
I hate this movie → ? (very good, good, neutral, bad, very bad)
I love this movie → ? (very good, good, neutral, bad, very bad)
How does our machine do this task?
Continuous Bag of Words (CBOW)
[Figure: each word of "I hate this movie" is looked up as a vector; the vectors are summed, multiplied by W, and added to a bias to produce scores]
• One of the simplest methods
• Discrete symbols to continuous vectors
• Average all vectors
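The CBOW pipeline above (lookup, combine, multiply by W, add bias) can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's code: the vocabulary, the embedding values, the weights, and the function name `cbow_scores` are all made up for the example.

```python
# Toy CBOW scorer. The vocabulary, embedding values, and weights below are
# made-up illustration values, not taken from the lecture.
EMB = {
    "I":     [0.1, 0.0],
    "hate":  [-1.0, 0.5],
    "love":  [1.0, 0.5],
    "this":  [0.0, 0.1],
    "movie": [0.2, 0.2],
}
W = [[1.0, 0.0],     # row 0: weights for the "positive" score
     [-1.0, 0.0]]    # row 1: weights for the "negative" score
BIAS = [0.0, 0.0]

def cbow_scores(words, average=False):
    """Look up each word, sum (or average) the vectors, then apply W and the bias."""
    dim = len(W[0])
    h = [0.0] * dim
    for word in words:
        for i, value in enumerate(EMB[word]):
            h[i] += value
    if average:
        h = [value / len(words) for value in h]
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W, BIAS)]

hate = cbow_scores("I hate this movie".split())   # "negative" score is larger
love = cbow_scores("I love this movie".split())   # "positive" score is larger
```

Shuffling the words leaves the scores unchanged (up to floating-point rounding), which is the "bag" property: word order plays no role.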
Deep CBOW
[Figure: the word vectors of "I hate this movie" are summed into h, passed through tanh(W1*h + b1) and tanh(W2*h + b2), then multiplied by W and added to a bias to produce scores]
• More linear transformations followed by activation functions (Multilayer Perceptron, MLP)
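As a rough sketch of Deep CBOW, the summed word vector is passed through one or more tanh layers before the final linear scoring step. The helper names (`affine_tanh`, `deep_cbow_scores`) and all numbers are hypothetical, chosen only to keep the example small.

```python
import math

def affine_tanh(W, b, h):
    """One MLP layer: tanh(W*h + b), with W given as a list of rows."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
            for row, bi in zip(W, b)]

def deep_cbow_scores(word_vectors, layers, W_out, b_out):
    # Bag-of-words step: sum the word vectors dimension by dimension.
    h = [sum(column) for column in zip(*word_vectors)]
    # Hidden layers: each one is a linear map followed by tanh.
    for W, b in layers:
        h = affine_tanh(W, b, h)
    # Final linear map to class scores.
    return [sum(w * x for w, x in zip(row, h)) + bi
            for row, bi in zip(W_out, b_out)]

# Illustrative values: two 2-d word vectors, one hidden unit, two classes.
scores = deep_cbow_scores(
    word_vectors=[[1.0, 0.0], [0.0, 1.0]],
    layers=[([[1.0, 1.0]], [0.0])],
    W_out=[[1.0], [-1.0]],
    b_out=[0.0, 0.0],
)
```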
What’s the Use of “Deep”?
• Multiple MLP layers allow us to easily learn feature combinations (a node in the second layer might be “feature 1 AND feature 5 are active”)
• e.g. capture things such as “not” AND “hate”
• BUT! Cannot handle the phrase “not hate”, because summing the word vectors discards word order
Handling Combinations
Bag of n-grams
I hate this movie
[Figure: the n-gram features are summed with a bias to give scores; a softmax turns the scores into probs]
• A contiguous sequence of words
• Concatenate word vectors
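A minimal sketch of n-gram extraction, assuming whitespace tokenization. The function names `ngrams` and `concat_vectors` and the toy embedding table are illustrative, not part of the lecture.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def concat_vectors(ngram, emb):
    """Concatenate the word vectors of one n-gram into a single vector."""
    out = []
    for word in ngram:
        out.extend(emb[word])
    return out

tokens = "I hate this movie".split()
bigrams = ngrams(tokens, 2)
# bigrams == [('I', 'hate'), ('hate', 'this'), ('this', 'movie')]

toy_emb = {"I": [0.1], "hate": [-1.0], "this": [0.0], "movie": [0.2]}
first = concat_vectors(bigrams[0], toy_emb)   # [0.1, -1.0]
```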
Why Bag of n-grams?
• Allows us to capture combination features in a simple way, e.g. “don’t love”, “not the best”
• Decent baseline and works pretty well
What Problems w/ Bag of n-grams?
• Same as before: parameter explosion
• No sharing between similar words/n-grams
• Lose the global sequence order
Other solutions?
Neural Sequence Models
Most NLP tasks can be cast as a sequence representation learning problem
char: i-m-p-o-s-s-i-b-l-e
word: I-love-this-movie
CBOW, Bag of n-grams, CNNs, RNNs, Transformer, GraphNNs
Convolutional Neural Networks
Definition of Convolution
Convolution → mathematical operation
• Continuous
• Discrete
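For reference, the standard textbook definitions of the two cases are given below (the symbols f, g, t, n are the usual conventions, not notation taken from the slide).

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$

$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$$

In neural network practice the filter is usually applied without the flip (cross-correlation); since the filter weights are learned, the two forms are interchangeable.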
Intuitive Understanding
Input: feature vector
Filter: learnable param.
Output: hidden vector
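The input/filter/output picture can be sketched as a one-dimensional convolution in plain Python. The function name `conv1d` and the numbers are illustrative; the filter is fixed here, whereas in a real model it would be learned.

```python
def conv1d(x, w, b=0.0):
    """Slide filter w over x (no padding, stride 1) and return the hidden vector."""
    n = len(w)
    return [sum(w[j] * x[i + j] for j in range(n)) + b
            for i in range(len(x) - n + 1)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # input: feature vector
w = [1.0, 0.0, -1.0]            # filter: learnable parameters (fixed here)
h = conv1d(x, w)                # output: hidden vector, [-2.0, -2.0, -2.0]
```

Each output depends only on a window of neighboring inputs, and the same `w` is reused at every position.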
Priors Entailed by CNNs
Local bias: different words can interact with their neighbors.
Parameter sharing: the parameters of the composition function are the same at every position.
Basics of CNNs
Concept: 2d Convolution
• Deals with 2-dimensional signals, e.g., images
Concept: Stride
Stride: the number of units the filter shifts over the input matrix at each step.
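A small sketch of the effect of stride, shown in one dimension for brevity (the slide's picture is two-dimensional). The function name `conv1d_strided` and the values are illustrative.

```python
def conv1d_strided(x, w, stride=1):
    """1-d convolution without padding; the window start advances by `stride`."""
    n = len(w)
    return [sum(w[j] * x[i + j] for j in range(n))
            for i in range(0, len(x) - n + 1, stride)]

x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 1, 1]
dense = conv1d_strided(x, w, stride=1)    # [6, 9, 12, 15, 18]
sparse = conv1d_strided(x, w, stride=2)   # [6, 12, 18]
```

With input length m, filter width n, and stride s, the output length is floor((m - n) / s) + 1.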
Concept: Padding
Padding: dealing with the units at the boundary of the input vector.
Three Types of Convolutions (m: input length, n: filter width)
Narrow: m=7, n=3, output length m-n+1=5
Equal: m=7, n=3, output length m=7
Wide: m=7, n=3, output length m+n-1=9
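The three output lengths follow from how much zero padding is added on each side. The sketch below assumes zero padding and stride 1; the function name `conv1d_padded` and the mode strings are illustrative.

```python
def conv1d_padded(x, w, mode="narrow"):
    """1-d convolution with zero padding chosen by mode: narrow, equal, or wide."""
    n = len(w)
    if mode == "narrow":        # no padding: output length m - n + 1
        left = right = 0
    elif mode == "equal":       # n - 1 zeros in total: output length m
        left = (n - 1) // 2
        right = (n - 1) - left
    elif mode == "wide":        # n - 1 zeros per side: output length m + n - 1
        left = right = n - 1
    else:
        raise ValueError("mode must be 'narrow', 'equal', or 'wide'")
    padded = [0.0] * left + list(x) + [0.0] * right
    return [sum(w[j] * padded[i + j] for j in range(n))
            for i in range(len(padded) - n + 1)]

x = [1.0] * 7          # m = 7
w = [1.0] * 3          # n = 3
lengths = {mode: len(conv1d_padded(x, w, mode))
           for mode in ("narrow", "equal", "wide")}
# lengths == {"narrow": 5, "equal": 7, "wide": 9}
```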
Concept: Multiple Filters
Motivation: each filter represents a unique feature of the convolution window.
Concept: Pooling
• Pooling is an aggregation operation, aiming to select informative features
• Max pooling: “Did you see this feature anywhere in the range?” (most common)
• Average pooling: “How prevalent is this feature over the entire range?”
• k-Max pooling: “Did you see this feature up to k times?”
• Dynamic pooling: “Did you see this feature in the beginning? In the middle? In the end?”
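The four pooling variants can be sketched on a single feature map as follows. The function names are illustrative. `dynamic_pool` is one possible reading of the slide's description (max-pooling over contiguous regions) and assumes the feature map has at least `parts` entries.

```python
def max_pool(h):
    """Did the feature fire anywhere? Keep the single largest value."""
    return max(h)

def mean_pool(h):
    """How prevalent is the feature? Average over the whole range."""
    return sum(h) / len(h)

def k_max_pool(h, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(h)), key=lambda i: h[i], reverse=True)[:k]
    return [h[i] for i in sorted(top)]

def dynamic_pool(h, parts=3):
    """Max-pool each of `parts` contiguous regions (beginning, middle, end)."""
    bounds = [p * len(h) // parts for p in range(parts + 1)]
    return [max(h[a:b]) for a, b in zip(bounds, bounds[1:])]

h = [1.0, 4.0, 2.0, 5.0, 3.0, 0.0]
pooled = (max_pool(h), mean_pool(h), k_max_pool(h, 2), dynamic_pool(h))
# pooled == (5.0, 2.5, [4.0, 5.0], [4.0, 5.0, 3.0])
```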
Case Study: Convolutional Networks for Text Classification (Kim 2015)
CNNs for Text Classification (Kim 2015)
• Task: sentiment classification
• Input: a sentence
• Output: a class label (positive/negative)
• Model:
  • Embedding layer
  • Multi-Channel CNN layer
  • Pooling layer/Output layer
Overview of the Architecture
[Figure: pipeline diagram with the labels Dict, Input, Filter, CNN, Pooling, Output]
Embedding Layer
[Figure: Input → Look-up Table]
• Build a look-up table (pre-trained? fine-tuned?)
• Discrete → distributed
Conv. Layer
• Stride size? 1
• Wide, equal, narrow? Narrow
• How many filters? 4
Pooling Layer
• Max-pooling
• Concatenate
Output Layer
• MLP layer
• Dropout
• Softmax
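Putting the layers together, a forward pass through this kind of model might look like the sketch below. It is not the paper's implementation. The vocabulary, sizes, random weights, the tanh activation, and the split of four filters into two widths of two filters each are all assumptions for illustration. It uses a single embedding channel, omits the output bias, and leaves out dropout, which applies only during training. It also assumes the sentence is at least as long as the widest filter.

```python
import math
import random

random.seed(0)
VOCAB = ["<unk>", "i", "hate", "love", "this", "movie"]
DIM = 4                  # embedding size (toy value)
WIDTHS = [2, 3]          # filter widths, in words
FILTERS_PER_WIDTH = 2    # 2 widths x 2 filters = 4 feature maps
CLASSES = 2              # positive / negative

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

EMB = dict(zip(VOCAB, rand_matrix(len(VOCAB), DIM)))
# One filter is a flat weight vector over a window of `width` word vectors.
FILTERS = {w: rand_matrix(FILTERS_PER_WIDTH, w * DIM) for w in WIDTHS}
W_OUT = rand_matrix(CLASSES, len(WIDTHS) * FILTERS_PER_WIDTH)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(tokens):
    # Embedding layer: look up one vector per word.
    vecs = [EMB.get(t, EMB["<unk>"]) for t in tokens]
    pooled = []
    for width, filters in FILTERS.items():
        # Narrow convolution, stride 1: one concatenated window per start position.
        windows = [sum(vecs[i:i + width], [])
                   for i in range(len(vecs) - width + 1)]
        for f in filters:
            feature_map = [math.tanh(sum(a * b for a, b in zip(f, win)))
                           for win in windows]
            pooled.append(max(feature_map))   # max-pooling over time
    # Concatenated pooled features, then a linear layer and softmax.
    scores = [sum(a * b for a, b in zip(row, pooled)) for row in W_OUT]
    return softmax(scores)

probs = predict("i hate this movie".split())   # two class probabilities
```

Because of max-pooling, the output size depends only on the number of filters, not on the sentence length.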
CNN Variants
Priors Entailed by CNNs
• Local bias
• Parameter sharing
How to handle long-term dependencies?
How to handle different types of compositionality?
[Table with columns: Prior, Advantage, Limitation]
CNN Variants
• Locality bias → long-term dependency
  • increase receptive fields (dilated)
• Sharing parameters → complicated interaction
  • dynamic filters
Dilated Convolution (e.g. Kalchbrenner et al. 2016)
i _ h a t e _ t h i s _ f i l m
[Figure: stacked dilated convolutions over the characters; the outputs can feed sentence class (classification), next char (language modeling), or word class (tagging)]
• Long-term dependency with fewer layers
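A minimal sketch of why dilation helps: spacing the filter taps apart widens the span covered by each output, so a stack with growing dilation reaches far more of the input than the same number of ordinary layers. The function names and the dilation schedule 1, 2, 4, 8 are illustrative assumptions.

```python
def dilated_conv1d(x, w, dilation=1):
    """Filter taps are `dilation` positions apart; no padding, stride 1."""
    span = (len(w) - 1) * dilation + 1     # input positions covered by one output
    return [sum(w[j] * x[i + j * dilation] for j in range(len(w)))
            for i in range(len(x) - span + 1)]

def receptive_field(width, dilations):
    """Input positions seen by one top-layer unit of a stack of stride-1 layers."""
    return 1 + sum((width - 1) * d for d in dilations)

y = dilated_conv1d(list(range(10)), [1, 1, 1], dilation=2)   # first output: 0 + 2 + 4
plain = receptive_field(3, [1, 1, 1, 1])     # 9 positions after four layers
dilated = receptive_field(3, [1, 2, 4, 8])   # 31 positions after four layers
```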
Dynamic Filter CNN (e.g. Brabandere et al. 2016)
• In a standard CNN, the parameters of filters are static, failing to capture rich interaction patterns.
• Filters are generated dynamically conditioned on an input.
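The idea can be sketched as a convolution whose filter is computed from a context vector instead of being stored as a fixed parameter. This is a simplified illustration, not the cited paper's architecture: the linear generator `G`, the context vector, and the function names are hypothetical.

```python
def generate_filter(context, G):
    """Hypothetical filter generator: a linear map from a context vector to filter weights."""
    return [sum(g * c for g, c in zip(row, context)) for row in G]

def dynamic_conv1d(x, context, G):
    """Convolve x with a filter that depends on `context` (no padding, stride 1)."""
    w = generate_filter(context, G)
    n = len(w)
    return [sum(w[j] * x[i + j] for j in range(n))
            for i in range(len(x) - n + 1)]

G = [[1.0, 0.0],
     [0.0, 1.0]]                 # learned in practice; identity here for clarity
x = [1.0, 2.0, 3.0]
out_a = dynamic_conv1d(x, [1.0, 0.0], G)   # filter [1, 0] -> [1.0, 2.0]
out_b = dynamic_conv1d(x, [0.0, 1.0], G)   # filter [0, 1] -> [2.0, 3.0]
```

The same input sequence yields different feature maps under different contexts, which a static filter cannot do.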
Common Applications
CNN Applications
• Word-level CNNs
  • Basic unit: word
  • Learn the representation of a sentence
  • Phrasal patterns
• Char-level CNNs
  • Basic unit: character
  • Learn the representation of a word
  • Extract morphological patterns
CNN Applications
• Word-level CNN
  • Sentence representation
NLP (Almost) from Scratch (Collobert et al. 2011)
• One of the most important papers in NLP
• Proposed as early as 2008
CNN Applications
• Word-level CNN
  • Sentence representation
• Char-level CNN
  • Text Classification
CNN-RNN-CRF for Tagging (Ma et al. 2016)
• A classic framework and de-facto standard for tagging
• Char-CNN is used to learn word representations (extract morphological information).
• Complementarity
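As a rough sketch of the char-CNN component only (the RNN and CRF layers are not shown), a word representation can be built by convolving over character vectors and max-pooling each filter. The alphabet, sizes, random weights, and the name `char_cnn` are assumptions for illustration; characters outside the toy alphabet are not handled.

```python
import random

random.seed(0)
CHAR_DIM = 3      # character embedding size (toy value)
WIDTH = 3         # filter width, in characters
N_FILTERS = 4     # size of the resulting word vector
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_EMB = {c: [random.uniform(-1, 1) for _ in range(CHAR_DIM)] for c in ALPHABET}
FILTERS = [[random.uniform(-1, 1) for _ in range(WIDTH * CHAR_DIM)]
           for _ in range(N_FILTERS)]
PAD = [0.0] * CHAR_DIM

def char_cnn(word):
    """Word vector = max over character windows of each filter's response."""
    # Wide padding so that even one-letter words produce at least one window.
    vecs = ([PAD] * (WIDTH - 1)
            + [CHAR_EMB[c] for c in word.lower()]
            + [PAD] * (WIDTH - 1))
    windows = [sum(vecs[i:i + WIDTH], []) for i in range(len(vecs) - WIDTH + 1)]
    return [max(sum(a * b for a, b in zip(f, win)) for win in windows)
            for f in FILTERS]

vec = char_cnn("impossible")   # one value per filter
```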
Structured Convolution
![Page 81: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/81.jpg)
Why Structured Convolution?
The man ate the egg.
• Some convolutional operations are not necessary
• e.g. noun-verb pairs are very informative, but are not captured by normal CNNs
• Language has structure; we would like to use it to localize features
Structure provides a stronger prior!
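A small sketch of the point above, using the slide's example sentence. The head indices are an assumed dependency parse written by hand, and the helper names are hypothetical. It compares the contiguous width-2 windows a vanilla CNN sees with the word pairs linked by the parse.

```python
words = ["The", "man", "ate", "the", "egg"]
# Assumed parse: heads[i] is the index of word i's head, -1 marks the root.
heads = [1, 2, -1, 4, 2]

def contiguous_windows(tokens, width):
    """Windows a vanilla CNN of the given width convolves over."""
    return [tuple(tokens[i:i + width]) for i in range(len(tokens) - width + 1)]

def dependency_pairs(tokens, heads):
    """(head, dependent) pairs taken from the parse."""
    return [(tokens[h], tokens[i]) for i, h in enumerate(heads) if h >= 0]

bigrams = contiguous_windows(words, 2)
pairs = dependency_pairs(words, heads)

print(("ate", "egg") in bigrams)  # the verb-object pair is not a contiguous bigram
print(("ate", "egg") in pairs)    # but it is a direct dependency link
```

A wider window would cover "ate the egg", but only by also mixing in the intervening word, which is the sense in which structure helps localize features.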
![Page 82: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/82.jpg)
Tree-structured Convolution (Mou et al. 2014, Ma et al. 2015)
• Convolve over parents, grandparents, siblings
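A simplified sketch of convolving over a word's ancestors rather than its linear neighbours. This is not the exact formulation of either cited paper: the parse, the 2-dimensional embeddings, the filter weights, and the function names are all toy assumptions chosen for illustration.

```python
words = ["The", "man", "ate", "the", "egg"]
heads = [1, 2, -1, 4, 2]  # assumed parse; -1 marks the root

def ancestor_window(i, heads, size=3):
    """Indices of word i, its parent, its grandparent (at most `size` nodes)."""
    window, node = [], i
    while node >= 0 and len(window) < size:
        window.append(node)
        node = heads[node]
    return window

def siblings(i, heads):
    """Other words that share word i's head."""
    if heads[i] < 0:
        return []
    return [j for j, h in enumerate(heads) if h == heads[i] and j != i]

def tree_conv(i, heads, emb, filt):
    """One filter applied to the ancestor window; missing ancestors contribute 0."""
    window = ancestor_window(i, heads, size=len(filt))
    return sum(w * x
               for w_vec, idx in zip(filt, window)
               for w, x in zip(w_vec, emb[idx]))

# Toy embeddings (one per word) and one filter with weights for
# the word itself, its parent, and its grandparent.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 2.0]]
filt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = [tree_conv(i, heads, emb, filt) for i in range(len(words))]
pooled = max(scores)  # max-pooling over all tree windows
```

Here `ancestor_window(3, heads)` groups "the", "egg", and "ate", and `siblings(1, heads)` finds that "man" and "egg" share the head "ate"; a sibling-based filter would be built over such groups in the same way.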
![Page 83: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/83.jpg)
Graph Convolution (e.g. Marcheggiani et al. 2017)
• Convolution is shaped by graph structure
• For example, a dependency tree is a graph with
  1) Self-loop connections
  2) Dependency connections
  3) Reverse connections
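The three connection types can be sketched as a single graph-convolution layer in which each type has its own weight matrix. This is a minimal sketch, not the cited model: the parse, the hidden vectors, the weight values, and the name `gcn_layer` are assumptions, and dependency edges are taken to run from head to dependent.

```python
def matvec(W, x):
    """Matrix-vector product on plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def gcn_layer(h, heads, W_self, W_dep, W_rev):
    """One layer: each node sums messages from itself, its head, and its dependents."""
    out = [matvec(W_self, h_v) for h_v in h]              # 1) self-loop
    for dep, head in enumerate(heads):
        if head < 0:
            continue                                      # the root has no head
        out[dep] = add(out[dep], matvec(W_dep, h[head]))  # 2) head -> dependent
        out[head] = add(out[head], matvec(W_rev, h[dep])) # 3) dependent -> head
    return [[max(0.0, x) for x in vec] for vec in out]    # ReLU

heads = [1, 2, -1, 4, 2]  # assumed parse of "The man ate the egg"
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 2.0]]
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_dep = [[0.5, 0.0], [0.0, 0.5]]
W_rev = [[0.25, 0.0], [0.0, 0.25]]

out = gcn_layer(h, heads, W_self, W_dep, W_rev)
```

Stacking several such layers lets information travel along longer paths in the graph, one edge per layer.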
![Page 84: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/84.jpg)
Summary
![Page 85: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/85.jpg)
Neural Sequence Models
• CBOW
• Bag of n-grams
• CNNs
• RNNs
• Transformer
• GraphNNs
![Page 86: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/86.jpg)
Neural Sequence Models
How do we choose among different neural sequence models?
![Page 87: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/87.jpg)
Understand the design philosophy of a model
• Inductive bias: the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered (from Wikipedia)
• Structural bias: prior knowledge incorporated into your model design
![Page 88: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/88.jpg)
Structural Bias
• Structural bias: prior knowledge incorporated into your model design
• Locality: local vs. non-local
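One way to make the local vs. non-local distinction concrete is to list which input positions a single layer can look at when computing the output at position `i`. The sketch below contrasts a width-3 convolution with a layer that attends to every position, as self-attention does. Pairing these two layer types with the two labels is an illustrative assumption, and the function names are hypothetical.

```python
def local_context(i, length, width=3):
    """Positions visible to a width-`width` convolution centred on position i."""
    half = width // 2
    return [j for j in range(i - half, i + half + 1) if 0 <= j < length]

def nonlocal_context(i, length):
    """Positions visible to a layer that attends over the whole sequence."""
    return list(range(length))

sentence_length = 5
print(local_context(2, sentence_length))     # a small neighbourhood around i
print(nonlocal_context(2, sentence_length))  # every position in the sentence
```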
![Page 89: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/89.jpg)
Structural Bias
• Structural bias: prior knowledge incorporated into your model design
• Topological structure: sequential, tree, graph
![Page 90: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/90.jpg)
What inductive bias does a neural component entail?
[Table: locality bias (local, non-local) × topological structure (sequential, tree, graph)]
![Page 91: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/91.jpg)
What inductive bias does a neural component entail?
[Table: locality bias (local, non-local) × topological structure (sequential, tree, graph); entries shown: CNN, RNN]
![Page 92: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/92.jpg)
What inductive bias does a neural component entail?
[Table: locality bias (local, non-local) × topological structure (sequential, tree, graph); entry shown: Structured CNN]
![Page 93: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/93.jpg)
What inductive bias does a neural component entail?
[Table: locality bias (local, non-local) × topological structure (sequential, tree, graph); entry shown: ?]
![Page 96: CS11-747 Neural Networks for NLP Convolutional …...Outline 1. Feature Combinations 2. CNNs and Key Concepts 3. Case Study on Sentiment Classification 4. CNN Variants and Applications](https://reader033.vdocuments.net/reader033/viewer/2022060223/5f07eb8e7e708231d41f6bda/html5/thumbnails/96.jpg)
Questions?