virtual adversarial training for semi-supervised …virtual adversarial training for semi-supervised...

49
Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee Aharoni BIU NLP summer 2016 reading group

Upload: others

Post on 03-Aug-2020

27 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for Semi-Supervised Text Classification

Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Slides By Roee Aharoni BIU NLP summer 2016 reading group

Page 2: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Motivation

Page 3: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Outline• Introduction to (virtual) adversarial training

• Virtual adversarial training for text classification

• Experimental Setup

• Results (and some analysis)

• Conclusions

Page 4: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

A basic NN training procedure

Page 5: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

A basic NN training procedure

Page 6: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

A basic NN training procedure

Page 7: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

A basic NN training procedure

Page 8: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

Prediction

A basic NN training procedure

Page 9: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

True Label

Prediction

A basic NN training procedure

Page 10: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

True Label

Prediction

Compute Loss

A basic NN training procedure

Page 11: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Training Example

True Label

Prediction

Compute Loss

Backpropagate loss

A basic NN training procedure

Page 12: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Examples

Page 13: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Examples

That’s problematic. How can we solve it?

Page 14: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Page 15: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

Page 16: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

Page 17: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

Perturbed Example

Page 18: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

Perturbed Example

Page 19: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

Perturbed Example

Page 20: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

Prediction

Perturbed Example

Page 21: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

True Label

Prediction

Perturbed Example

“panda”

Page 22: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

True Label

Prediction

Compute Loss Function

Perturbed Example

“panda”

Page 23: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

Training Example

True Label

Prediction

Compute Loss Function

Backpropagate loss

Perturbed Example

“panda”

Page 24: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Adversarial Training for NN’s

• The general addition to the cost function for adversarial training:

• In practice, take a change depending on the gradient :

Page 25: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training

• Extends adversarial training to the semi-supervised regime

• The key idea - make the output distribution for an original and perturbrated example close to each other

• Enables the use of large amounts of unlabeled data

Page 26: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Page 27: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Unlabeled Example

Page 28: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Unlabeled Example

Perturbed Example

Page 29: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Unlabeled Example

Perturbed Example

Page 30: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Unlabeled Example

Perturbed Example

Prediction Distribution Prediction Distribution

Page 31: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Unlabeled Example

Perturbed Example

Prediction Distribution Prediction Distribution

Measure KL Divergence-based loss

Page 32: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

Unlabeled Example

Backpropagate loss

Perturbed Example

Prediction Distribution Prediction Distribution

Measure KL Divergence-based loss

Page 33: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for NN’s

• The general addition to the cost function for virtual adversarial training:

• Again, in practice, there is an efficient way to approximate this (as detailed in Miyato et. al., 2016)

Page 34: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Model - Adversarial Training for Text Classification

• Adversarial perturbations typically consist of making small modifications to very many real-valued inputs (i.e. pixels in the previous examples)

• For text classification, the input is discrete, and usually represented as a series of high-dimensional one-hot vectors (where such small modifications are impossible).

• Solution: define the perturbation on continuous word embeddings instead of discrete word inputs.

Page 35: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Model - Adversarial Training for Text Classification

• The perturbation is introduced to normalized embeddings to avoid the network from learning to ignore them:

Page 36: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

• As we model the input text as:

• The perturbation is defined as:

• And the addition to the loss function is:

Adversarial Training for Text Classification

=

Page 37: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Virtual Adversarial Training for Text Classification

• Here, the perturbation is defined as:

• And the addition to the loss function is then:

Page 38: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Experimental Settings• 5 datasets:

• Sentiment classification (binary): IMDB, Rotten Tomatoes, Elec

• Topic classification (multiclass): DBpedia, RCV1

Page 39: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

• Treat punctuation as spaces

• Convert words to lower case

• Remove words which appear in only one document

• RCV1 - remove stop words

Experimental Settings - Preprocessing

Page 40: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Pre-Training Tricks and Hyperparams

• Initialize word embeddings and LSTM weights with RNNLM on labeled and unlabeled examples

• Single layer LSTM, 1024 units (512 for BiLSTM)

• Embedding size: 256(IMDB, BiLSTM)/512(Rest)

• Sampled softmax loss with 1024 candidate samples (?)

• Adam optimization, 256 samples per batch

• 0.5 dropout rate on the word embeddings

Page 41: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Classification Model Tricks and Hyperparams

• 1 Hidden layer before softmax, 30(IMDB, Elec, Rotten)/128(Rest) units

• ReLU activation function

• batch size - 64(IMDB, Elec, RCV1)/128(Rest)

• 10k-20k training steps for each model

• Truncated back propagation - stop back propagating after 400 steps

• Generate perturbation after dropout

• Optimize epsilon, dropout rate on validation set

Page 42: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Results - IMDB• Adversarial and virtual adversarial training show lower

negative log-likelihood

• Virtual adversarial training also improves the adversarial training loss

Page 43: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Results - IMDB

Page 44: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Embedding-Based Analysis

Page 45: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Results - Topic classifiction: Elec, RCV1

• Improved SOTA on Elec, RCV1, without using CNN’s

Page 46: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Results - Sentiment analysis: Rotten Tomatoes

• Adversarial+Virtual adv. performs equally to SOTA

• Virtual adversarial is weaker than baseline - could be due to small amount of supervised examples, short sentences

Page 47: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Results - DBpedia• Baseline itself improves over SOTA, virtual

adversarial performs best

Page 48: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Related Work

• Dropout/Random Noise

• Generative Models

• Pre-Training as semi-supervised learning

Page 49: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee

Conclusion

• Adversarial and virtual adversarial training provides good regularization performance for text classification with RNN’s

• Provides SOTA results or on-par results for the examined datasets

• “Improved Quality” of word embeddings