DCNN for Text
A CNN for Modelling Sentences
Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." arXiv:1404.2188 (2014).
Sentence model
• Sentence -> feature vector, that's all!
• However, it is the core of: sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning, image retrieval, …
How to model a sentence?
• Composition-based methods
– Need human knowledge to compose
– Or automatically extracted logical forms
• Ex. RNN, TDNN
Brief network structure
• Interleaving k-max pooling and one-dimensional convolutions (TDNN-style) => generates a feature graph over the sentence
A kind of syntax tree?
NN sentence model with a syntax tree (Recursive NN, RecNN)
References a syntax tree while training
Shares weights and stacks up to form the network
K-max pooling
• Given k, no matter how many dimensions the input has, pool the top-k activations as output; "the order of the outputs corresponds to their order in the input"
• Better than max-TDNN:
– Preserves the order of features
– Discerns more finely how highly a feature is activated
• Guarantees that the length of the input to the FC layer is independent of the sentence length
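In pure Python, k-max pooling over a single feature row might be sketched as:

```python
def k_max_pooling(values, k):
    """Select the k largest values from a sequence, preserving their
    original left-to-right order (unlike plain max pooling, which keeps
    only a single maximum)."""
    if len(values) <= k:
        return list(values)
    # indices of the k largest activations
    top_idx = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # re-sort by position so the output keeps the input order
    return [values[i] for i in sorted(top_idx)]

# The three largest activations, in their original order:
print(k_max_pooling([0.2, 0.9, 0.1, 0.7, 0.5], k=3))  # [0.9, 0.7, 0.5]
```

Note how the output length is always k (for inputs of length ≥ k), regardless of the sentence length.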
Only the fully connected layer needs a fixed length
• Intermediate layers can be more flexible
• Dynamic k-max pooling!
Dynamic k-max Pooling
• k is a function of the length of the input sentence and the depth of the network:

  k_l = max( k_top , ⌈ ((L − l) / L) · s ⌉ )

– k_l: the k of the currently concerned layer l
– k_top: the fixed k-max pooling size at the topmost layer
– L: total number of convolutional layers in the network (the depth)
– s: input sentence length
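The paper's schedule k_l = max(k_top, ⌈((L − l)/L) · s⌉) can be sketched as:

```python
import math

def dynamic_k(l, L, k_top, s):
    """k for pooling after conv layer l (1-indexed):
    k_l = max(k_top, ceil((L - l) / L * s)),
    where L is the total number of conv layers and s the sentence length.
    Integer product first, so the ceiling is numerically exact."""
    return max(k_top, math.ceil((L - l) * s / L))

# A sentence of length 18 in a 3-layer network with k_top = 3:
print([dynamic_k(l, L=3, k_top=3, s=18) for l in (1, 2, 3)])  # [12, 6, 3]
```

The pooled width shrinks smoothly with depth until it reaches the fixed k_top at the final pooling layer.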
Folding
• Without folding, feature detectors in different rows are independent of each other until the top fully connected layer
• Folding: simply sum every pair of adjacent rows (a vector sum)
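A toy folding step, summing every pair of adjacent rows of a feature map (here the rows play the role of the feature dimensions):

```python
def fold(feature_map):
    """Fold: sum every pair of adjacent rows of the feature map,
    halving the number of rows so that row features interact before
    the fully connected layer."""
    assert len(feature_map) % 2 == 0, "folding assumes an even number of rows"
    return [[a + b for a, b in zip(feature_map[i], feature_map[i + 1])]
            for i in range(0, len(feature_map), 2)]

print(fold([[1, 2], [3, 4], [5, 6], [7, 8]]))  # [[4, 6], [12, 14]]
```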
Properties
• Sensitive to the order of words
• Filters of the first layer model n-grams, n ≤ m
• Invariance to absolute position is captured by the upper convolutional layers
• Induces the feature graph property
A CNN for matching natural language sentences
Hu, Baotian, et al. "Convolutional Neural Network Architectures for Matching Natural Language Sentences." Advances in Neural Information Processing Systems. 2014.
Contribution
• Hierarchical sentence modeling
• Captures rich matching patterns at different levels of abstraction
A trick on zero-padding
• The lengths of sentences may vary over a fairly broad range, so inputs are zero-padded to a fixed length
• Introduce a gate operation on each input window z:
• g(z) = 0 (the all-zero vector) when z = 0, i.e. the window is pure padding; otherwise g(z) = 1
• No bias term, so padded positions stay exactly zero!
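A minimal, illustrative sketch of such a gated unit (the window and weights here are toy values; the point is that an all-zero padding window is forced to output zero, and with no bias term nothing nonzero can leak in):

```python
def gated_conv_unit(window, weights):
    """One convolution unit with a zero-padding gate: if the input
    window is all zeros (pure padding), the output is gated to zero.
    There is deliberately no bias term, so padded positions contribute
    nothing to downstream layers."""
    gate = 0.0 if all(x == 0.0 for x in window) else 1.0
    # linear response without bias, then a ReLU-style activation
    z = sum(w * x for w, x in zip(weights, window))
    return gate * max(z, 0.0)

print(gated_conv_unit([0.0, 0.0, 0.0], [1.0, -1.0, 2.0]))  # 0.0 (padding)
print(gated_conv_unit([3.0, 1.0, 0.0], [1.0, -1.0, 2.0]))  # 2.0
```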
RNN vs ConvNet
                              ConvNet   RNN
Hierarchical structure           W       L
Parallelism                      W       L
Capture far-away information     -       -
Explainable                      W       L
Variety                          L       W

(W = wins, L = loses)
Architecture-I
• Drawback: in the forward phase, the representation of each sentence is built without knowledge of the other
Architecture-II
• Builds directly on the interaction space between the two sentences
• Moves from 1-D to 2-D convolution
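As an illustrative sketch (not the paper's exact first layer, which matches sliding windows of words rather than single words), a 2-D interaction "image" between two sentences can be built like this, with 2-D convolutions then applied on top:

```python
def interaction_matrix(sent_a, sent_b):
    """Entry (i, j) scores word i of sentence A against word j of
    sentence B -- here a plain dot product of toy word vectors.  The
    resulting 2-D map is what the 2-D convolutions operate on."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return [[dot(a, b) for b in sent_b] for a in sent_a]

A = [[1.0, 0.0], [0.0, 1.0]]                   # two 2-d "word vectors"
B = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]       # three 2-d "word vectors"
print(interaction_matrix(A, B))  # [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
```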
Good trick at pooling
Zhang, Xiang, and Yann LeCun. "Text Understanding from Scratch." arXiv preprint arXiv:1502.01710 (2015)
Text Understanding from Scratch
The model
• Characters are encoded in the character space; a character that is not encoded, or a space, => all-zero vector
• Fixed-length window over the characters, e.g.:
  H e l l o   w o r l
What about varying input lengths?
• Set the frame to the longest text we are going to see (1014 characters in their experiments)
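A toy version of the character encoding (the alphabet and frame length here are illustrative; the paper uses a 70-character alphabet and frames of 1014 characters):

```python
# Illustrative alphabet; the paper's has 70 characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789.,!?'-"

def encode(text, length):
    """One-hot encode `text` character by character into a fixed-length
    frame; characters outside the alphabet, spaces, and padding beyond
    the text all become all-zero vectors."""
    index = {c: i for i, c in enumerate(ALPHABET)}
    frame = []
    for pos in range(length):
        vec = [0] * len(ALPHABET)
        if pos < len(text) and text[pos].lower() in index:
            vec[index[text[pos].lower()]] = 1
        frame.append(vec)
    return frame

frame = encode("Hi!", length=5)
print(len(frame), sum(frame[0]), sum(frame[3]))  # 5 1 0  (position 3 is padding)
```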
Data augmentation - Thesaurus
• Thesaurus: “a book that lists words in groups of synonyms and related concepts”
• http://www.libreoffice.org/
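A rough sketch of thesaurus-based augmentation (the tiny THESAURUS dict is hypothetical; the paper draws the number of replacements and the synonym index from geometric distributions, using the English thesaurus shipped with LibreOffice):

```python
import random

# Hypothetical toy thesaurus mapping words to synonym lists.
THESAURUS = {"good": ["great", "fine"], "movie": ["film"]}

def augment(sentence, p=0.5, seed=0):
    """Replace each word that has synonyms with a randomly chosen
    synonym with probability p -- a simplified stand-in for the
    paper's geometric sampling scheme."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word in THESAURUS and rng.random() < p:
            out.append(rng.choice(THESAURUS[word]))
        else:
            out.append(word)
    return " ".join(out)

print(augment("a good movie"))
```

Each augmented copy of a training text is a slightly different paraphrase, enlarging the dataset without changing labels.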
Comparison models
• Bag-of-words: counts over the 5000 most frequent words
• Bag-of-centroids: word2vec vectors (trained on the Google News corpus) clustered by k-means with 5000 centroids
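A miniature version of the bag-of-words baseline (vocab_size = 3 stands in for the paper's 5000 most frequent words):

```python
from collections import Counter

def bag_of_words_features(docs, vocab_size):
    """Build count features over the `vocab_size` most frequent words,
    a tiny analogue of the 5000-word bag-of-words baseline."""
    counts = Counter(w for d in docs for w in d.split())
    vocab = [w for w, _ in counts.most_common(vocab_size)]
    return vocab, [[d.split().count(w) for w in vocab] for d in docs]

docs = ["the cat sat", "the dog sat on the mat"]
vocab, feats = bag_of_words_features(docs, vocab_size=3)
print(vocab[:2], feats[0][:2])  # ['the', 'sat'] [1, 1]
```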
Amazon review sentiment analysis
• Ratings 1~5 indicating the user's subjective rating of a product
• Collected by the SNAP project
News Categorization in Chinese
• SogouCA and SogouCS news corpora
• Chinese text converted to Pinyin with the pypinyin package + the jieba Chinese segmentation system