DCNN for Text
A CNN for Modelling Sentences
Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." arXiv:1404.2188 (2014).
Sentence model
• Sentence -> feature vector, that's all!
• However, it is the core of: sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning, image retrieval, …
How to model a sentence?
• Composition-based methods
– Need human knowledge to compose
– Or automatically extracted logical forms
• Ex. RNN, TDNN
Brief network structure
• Interleaving k-max pooling and one-dimensional convolutions (TDNN-style) => generates a feature graph over the sentence
A kind of syntax tree?
NN sentence model with a syntax tree (Recursive NN, RecNN)
References a syntax tree while training
Shares weights and stacks up to form the network
K-max pooling
• Given k, no matter how many dimensions the input has, pool the top-k activations as output; "the order of the outputs corresponds to their order in the input"
• Better than max-TDNN:
– Preserves the order of features
– Discerns more finely how highly a feature is activated
• Guarantees that the length of the input to the FC layer is independent of the sentence length
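In pure Python, k-max pooling over a single feature row might be sketched as:

```python
def k_max_pooling(values, k):
    """Select the k largest values from a sequence, preserving their
    original left-to-right order (unlike plain max pooling, which keeps
    only a single maximum)."""
    if len(values) <= k:
        return list(values)
    # indices of the k largest activations
    top_idx = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # re-sort by position so the output keeps the input order
    return [values[i] for i in sorted(top_idx)]

# The three largest activations, in their original order:
print(k_max_pooling([0.2, 0.9, 0.1, 0.7, 0.5], k=3))  # [0.9, 0.7, 0.5]
```

Note how the output length is always k (for inputs of length ≥ k), regardless of the sentence length.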
Only the fully connected layer needs a fixed length
• Intermediate layers can be more flexible
• Dynamic k-max pooling!
Dynamic k-max Pooling
• k is a function of the length of the input sentence and the depth of the network:

  k_l = max( k_top , ⌈ ((L − l) / L) · s ⌉ )

– k_l: the k of the currently concerned layer l
– k_top: the fixed k-max pooling size at the topmost layer
– L: total number of convolutional layers in the network (the depth)
– s: input sentence length
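The paper's schedule k_l = max(k_top, ⌈((L − l)/L) · s⌉) can be sketched as:

```python
import math

def dynamic_k(l, L, k_top, s):
    """k for pooling after conv layer l (1-indexed):
    k_l = max(k_top, ceil((L - l) / L * s)),
    where L is the total number of conv layers and s the sentence length.
    Integer product first, so the ceiling is numerically exact."""
    return max(k_top, math.ceil((L - l) * s / L))

# A sentence of length 18 in a 3-layer network with k_top = 3:
print([dynamic_k(l, L=3, k_top=3, s=18) for l in (1, 2, 3)])  # [12, 6, 3]
```

The pooled width shrinks smoothly with depth until it reaches the fixed k_top at the final pooling layer.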
Folding
• Without folding, feature detectors in different rows are independent of each other until the top fully connected layer
• Folding: simply sum every pair of adjacent rows (a vector sum)
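A toy folding step, summing every pair of adjacent rows of a feature map (here the rows play the role of the feature dimensions):

```python
def fold(feature_map):
    """Fold: sum every pair of adjacent rows of the feature map,
    halving the number of rows so that row features interact before
    the fully connected layer."""
    assert len(feature_map) % 2 == 0, "folding assumes an even number of rows"
    return [[a + b for a, b in zip(feature_map[i], feature_map[i + 1])]
            for i in range(0, len(feature_map), 2)]

print(fold([[1, 2], [3, 4], [5, 6], [7, 8]]))  # [[4, 6], [12, 14]]
```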
Properties
• Sensitive to the order of words
• Filters of the first layer model n-grams, n ≤ m
• Invariance to absolute position is captured by the upper convolutional layers
• Induces the feature graph property
A CNN for matching natural language sentences
Hu, Baotian, et al. "Convolutional Neural Network Architectures for Matching Natural Language Sentences." Advances in Neural Information Processing Systems. 2014.
Contribution
• Hierarchical sentence modeling
• Captures rich matching patterns at different levels of abstraction
A trick on zero-padding
• The lengths of sentences may vary over a fairly broad range, so inputs are zero-padded to a fixed length
• Introduce a gate operation on each input window z:
• g(z) = 0 (the all-zero vector) when z = 0, i.e. the window is pure padding; otherwise g(z) = 1
• No bias term, so padded positions stay exactly zero!
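A minimal, illustrative sketch of such a gated unit (the window and weights here are toy values; the point is that an all-zero padding window is forced to output zero, and with no bias term nothing nonzero can leak in):

```python
def gated_conv_unit(window, weights):
    """One convolution unit with a zero-padding gate: if the input
    window is all zeros (pure padding), the output is gated to zero.
    There is deliberately no bias term, so padded positions contribute
    nothing to downstream layers."""
    gate = 0.0 if all(x == 0.0 for x in window) else 1.0
    # linear response without bias, then a ReLU-style activation
    z = sum(w * x for w, x in zip(weights, window))
    return gate * max(z, 0.0)

print(gated_conv_unit([0.0, 0.0, 0.0], [1.0, -1.0, 2.0]))  # 0.0 (padding)
print(gated_conv_unit([3.0, 1.0, 0.0], [1.0, -1.0, 2.0]))  # 2.0
```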
RNN vs ConvNet
                              ConvNet   RNN
Hierarchical structure           W       L
Parallelism                      W       L
Capture far-away information     -       -
Explainable                      W       L
Variety                          L       W

(W = wins, L = loses)
Architecture-I
• Drawback: in the forward phase, the representation of each sentence is built without knowledge of the other
Architecture-II
• Builds directly on the interaction space between the two sentences
• Moves from 1-D to 2-D convolution
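As an illustrative sketch (not the paper's exact first layer, which matches sliding windows of words rather than single words), a 2-D interaction "image" between two sentences can be built like this, with 2-D convolutions then applied on top:

```python
def interaction_matrix(sent_a, sent_b):
    """Entry (i, j) scores word i of sentence A against word j of
    sentence B -- here a plain dot product of toy word vectors.  The
    resulting 2-D map is what the 2-D convolutions operate on."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return [[dot(a, b) for b in sent_b] for a in sent_a]

A = [[1.0, 0.0], [0.0, 1.0]]                   # two 2-d "word vectors"
B = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]       # three 2-d "word vectors"
print(interaction_matrix(A, B))  # [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
```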
Good trick at pooling
Zhang, Xiang, and Yann LeCun. "Text Understanding from Scratch." arXiv preprint arXiv:1502.01710 (2015)
Text Understanding from Scratch
The model
• Characters are encoded in the character space; a character that is not encoded, or a space, => all-zero vector
• Fixed-length window over the characters, e.g.:
  H e l l o   w o r l
What about varying input lengths?
• Set the frame to the longest text we are going to see (1014 characters in their experiments)
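A toy version of the character encoding (the alphabet and frame length here are illustrative; the paper uses a 70-character alphabet and frames of 1014 characters):

```python
# Illustrative alphabet; the paper's has 70 characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789.,!?'-"

def encode(text, length):
    """One-hot encode `text` character by character into a fixed-length
    frame; characters outside the alphabet, spaces, and padding beyond
    the text all become all-zero vectors."""
    index = {c: i for i, c in enumerate(ALPHABET)}
    frame = []
    for pos in range(length):
        vec = [0] * len(ALPHABET)
        if pos < len(text) and text[pos].lower() in index:
            vec[index[text[pos].lower()]] = 1
        frame.append(vec)
    return frame

frame = encode("Hi!", length=5)
print(len(frame), sum(frame[0]), sum(frame[3]))  # 5 1 0  (position 3 is padding)
```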
Data augmentation - Thesaurus
• Thesaurus: “a book that lists words in groups of synonyms and related concepts”
• http://www.libreoffice.org/
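A rough sketch of thesaurus-based augmentation (the tiny THESAURUS dict is hypothetical; the paper draws the number of replacements and the synonym index from geometric distributions, using the English thesaurus shipped with LibreOffice):

```python
import random

# Hypothetical toy thesaurus mapping words to synonym lists.
THESAURUS = {"good": ["great", "fine"], "movie": ["film"]}

def augment(sentence, p=0.5, seed=0):
    """Replace each word that has synonyms with a randomly chosen
    synonym with probability p -- a simplified stand-in for the
    paper's geometric sampling scheme."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word in THESAURUS and rng.random() < p:
            out.append(rng.choice(THESAURUS[word]))
        else:
            out.append(word)
    return " ".join(out)

print(augment("a good movie"))
```

Each augmented copy of a training text is a slightly different paraphrase, enlarging the dataset without changing labels.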
Comparison models
• Bag-of-words: counts over the 5000 most frequent words
• Bag-of-centroids: word2vec vectors (trained on the Google News corpus) clustered by k-means with 5000 centroids
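A miniature version of the bag-of-words baseline (vocab_size = 3 stands in for the paper's 5000 most frequent words):

```python
from collections import Counter

def bag_of_words_features(docs, vocab_size):
    """Build count features over the `vocab_size` most frequent words,
    a tiny analogue of the 5000-word bag-of-words baseline."""
    counts = Counter(w for d in docs for w in d.split())
    vocab = [w for w, _ in counts.most_common(vocab_size)]
    return vocab, [[d.split().count(w) for w in vocab] for d in docs]

docs = ["the cat sat", "the dog sat on the mat"]
vocab, feats = bag_of_words_features(docs, vocab_size=3)
print(vocab[:2], feats[0][:2])  # ['the', 'sat'] [1, 1]
```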
Amazon review sentiment analysis
• Ratings 1~5 indicating the user's subjective rating of a product
• Collected by the SNAP project
News Categorization in Chinese
• SogouCA and SogouCS news corpora
• Chinese text converted to Pinyin with the pypinyin package + the jieba Chinese segmentation system