
Page 1:

Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge

Pasquale Minervini, Sebastian Riedel

Presented by: Tiantian Feng

Page 2:

Contributions

The authors explored the use of adversarial examples to:

1. Identify cases where models violate existing background knowledge, expressed in the form of logic rules.

2. Train models that are robust to such violations.

Page 3:

What is NLI (Natural Language Inference)?

In NLI, an input consists of two sentences, a premise p and a hypothesis h, with three possible relationships:

1. Entailment – h is definitely true given p (p entails h)

2. Contradiction – h is definitely not true given p (p contradicts h)

3. Neutral – h might be true given p

An NLI model is asked to classify the relationship between p and h.

Page 4:

An example of NLI

For instance, the premise "A soccer game with multiple males playing." entails the hypothesis "Some men are playing a sport." (an example pair from the SNLI corpus).

The figure is from: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning, "A large annotated corpus for learning natural language inference", In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Page 5:

NLI Model - Background

Input sentences $a = (a_1, \ldots, a_{\ell_a})$ and $b = (b_1, \ldots, b_{\ell_b})$, with lengths $\ell_a$ and $\ell_b$, belong to the sentence set $\mathcal{S}$; each word is represented by an embedding of size $k$.

Sentences $a$ and $b$ are first encoded into fixed-size representations by a model-dependent encoder.

The conditional probability distribution over all three classes is computed as:

$$p_\theta(\cdot \mid a, b) = \mathrm{softmax}(f_\theta(a, b))$$

where $f_\theta$, given parameters $\theta$, is a model-dependent score function.
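A minimal sketch of this interface in Python; the mean-pooling encoder and linear scorer here are toy stand-ins of my own, not the paper's actual architectures:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def score(a_emb, b_emb, W):
    """Toy model-dependent score function f_theta: a linear map over
    the concatenated mean-pooled sentence embeddings."""
    pooled = np.concatenate([a_emb.mean(axis=0), b_emb.mean(axis=0)])
    return W @ pooled        # one score per class: (ent, con, neu)

# Toy example: sentences of lengths 5 and 7, embedding size k = 50.
k = 50
a_emb, b_emb = np.random.randn(5, k), np.random.randn(7, k)
W = np.random.randn(3, 2 * k)
probs = softmax(score(a_emb, b_emb, W))   # p(ent|a,b), p(con|a,b), p(neu|a,b)
```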

Page 6:

NLI Model - Background

State-of-the-art models:

1. cBiLSTM (Rocktäschel et al., 2016)
2. Decomposable Attention Model (DAM) (Parikh et al., 2016)
3. Enhanced LSTM model (ESIM) (Chen et al., 2017)

Evaluation datasets:

1. Stanford Natural Language Inference (SNLI) (Bowman et al., 2015)
2. MultiNLI (Williams et al., 2017)

Training minimises the cross-entropy loss over the labelled dataset $\mathcal{D}$:

$$\mathcal{J}_{\text{data}}(\theta) = -\sum_{(a, b, y) \in \mathcal{D}} \log p_\theta(y \mid a, b)$$
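A sketch of this loss, reusing `softmax` and `score` from the earlier block; the batch format is an assumption for illustration:

```python
def cross_entropy_loss(batch, W):
    """Average negative log-likelihood of the gold labels.
    `batch` is a list of (a_emb, b_emb, y), y in {0: ent, 1: con, 2: neu}."""
    loss = 0.0
    for a_emb, b_emb, y in batch:
        probs = softmax(score(a_emb, b_emb, W))
        loss -= np.log(probs[y])
    return loss / len(batch)
```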

Page 7:

Background Knowledge

In this paper, background knowledge is defined as a set of First-Order Logic (FOL) rules, each having the body ⇒ head form:

$$\text{body}_1 \land \ldots \land \text{body}_n \Rightarrow \text{head}$$

For example, Rule 2 (symmetry of contradiction) is $\text{con}(X_1, X_2) \Rightarrow \text{con}(X_2, X_1)$, and Rule 5 (transitivity of entailment) is $\text{ent}(X_1, X_2) \land \text{ent}(X_2, X_3) \Rightarrow \text{ent}(X_1, X_3)$.

Page 8:

Background Knowledge Violation

Violation example for a First-Order Logic rule:

Consider two sentences $s_1$ and $s_2$, and Rule 2, $\text{con}(X_1, X_2) \Rightarrow \text{con}(X_2, X_1)$. A violation occurs when, according to the NLI model:

1. Sentence $s_1$ contradicts $s_2$, but

2. Sentence $s_2$ does not contradict $s_1$.

Page 9:

Background Knowledge - Inconsistency Loss

To measure the degree of Rule 2 violation, we define the inconsistency loss as:

$$\mathcal{J}_I(S) = \left[\, p_\theta(\text{con} \mid s_1, s_2) - p_\theta(\text{con} \mid s_2, s_1) \,\right]_+$$

where $[x]_+ = \max(0, x)$, and $S = \{X_1 \mapsto s_1, X_2 \mapsto s_2\}$ is a substitution set that maps the variables $X_1$ and $X_2$ in Rule 2 to the sentences $s_1$ and $s_2$; $p_\theta(\text{con} \mid s_1, s_2)$ is the conditional probability that $s_1$ contradicts $s_2$.
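A sketch of this loss for Rule 2, reusing the toy `softmax` and `score` from above; the hinge $[\cdot]_+$ keeps the loss at zero whenever the rule is satisfied:

```python
CON = 1  # index of the "contradiction" class in the model's output

def inconsistency_loss_rule2(s1_emb, s2_emb, W):
    """[ p(con | s1, s2) - p(con | s2, s1) ]_+  for Rule 2."""
    p_forward  = softmax(score(s1_emb, s2_emb, W))[CON]
    p_backward = softmax(score(s2_emb, s1_emb, W))[CON]
    return max(0.0, p_forward - p_backward)
```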

Page 10:

Background Knowledge - Inconsistency Loss

We can generalise the inconsistency loss to any body ⇒ head rule:

$$\mathcal{J}_I(S) = \left[\, p_\theta(\text{body}; S) - p_\theta(\text{head}; S) \,\right]_+$$

To compute the probability of a body with multiple conjunctive atoms, as in Rule 5, the authors apply the Gödel t-norm (Gupta and Qi, 1991):

$$p(a_1 \land a_2) = \min\{p(a_1),\, p(a_2)\}$$
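A sketch of the generalised loss for Rule 5 (transitivity of entailment), with the Gödel t-norm realised as a `min` over the body atoms; the class index and helpers are assumptions carried over from the sketches above:

```python
ENT = 0  # index of the "entailment" class

def inconsistency_loss_rule5(s1, s2, s3, W):
    """[ min(p(ent|s1,s2), p(ent|s2,s3)) - p(ent|s1,s3) ]_+  for Rule 5."""
    body = min(softmax(score(s1, s2, W))[ENT],
               softmax(score(s2, s3, W))[ENT])   # Goedel t-norm over conjuncts
    head = softmax(score(s1, s3, W))[ENT]
    return max(0.0, body - head)
```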

Page 11:

Generating Adversarial Examples - Constraining

Adversarial examples generated from the inconsistency loss alone can lead NLI models to violate the available background knowledge, but the generated sentences may not be well-formed or meaningful.

Solution: constrain the perplexity of the generated sentences, as measured by a language model.
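Perplexity here is the usual exponentiated average negative log-likelihood; a minimal sketch, assuming some language model exposes per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity of a sentence from its per-token probabilities under
    a language model: exp of the mean negative log-probability."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A fluent sentence gets high token probabilities, hence low perplexity:
perplexity([0.3, 0.5, 0.4, 0.6])   # ~ 2.3
perplexity([0.01, 0.02, 0.05])     # ~ 46, likely ill-formed
```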

Page 12:

Generating Adversarial Examples - Summary

The search for adversarial examples can be formalised as an optimisation problem: maximise the inconsistency loss $\mathcal{J}_I(S)$ over substitution sets $S$, subject to the sentences in $S$ having a low perplexity.

The goals of the optimisation problem:

1. Maximise the inconsistency loss described in Eq. (4)

2. Compose sentences with a low perplexity

Page 13:

Generating Adversarial Examples With Low Perplexity

Generating a low-perplexity sentence set S:

1. Sample sentences close to the data manifold (i.e., with a low perplexity)

2. Make small variations to the sentences, by one of the following (a sketch of these operations follows the list):

a. Change one word in one of the input sentences

b. Remove one parse subtree from one of the input sentences

c. Insert one parse subtree from one sentence into the parse tree of an existing sentence
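A sketch of the three perturbations, operating on plain token lists, with contiguous spans standing in for parse subtrees (the paper operates on actual parse trees; this simplification is mine):

```python
import random

def change_one_word(tokens, vocabulary):
    """(a) Replace one randomly chosen token with a vocabulary word."""
    out = list(tokens)
    out[random.randrange(len(out))] = random.choice(vocabulary)
    return out

def remove_span(tokens):
    """(b) Drop one contiguous span, standing in for a parse subtree."""
    i = random.randrange(len(tokens))
    j = random.randrange(i + 1, len(tokens) + 1)
    return tokens[:i] + tokens[j:]

def insert_span(tokens, donor_tokens):
    """(c) Splice a span from another sentence into this one."""
    i = random.randrange(len(donor_tokens))
    j = random.randrange(i + 1, len(donor_tokens) + 1)
    k = random.randrange(len(tokens) + 1)
    return tokens[:k] + donor_tokens[i:j] + tokens[k:]
```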

Page 14:

Adversarial Regularisation

Instead of minimising the data loss $\mathcal{J}_{\text{data}}(\theta)$ alone, we use the adversarial examples to regularise the training process:

$$\min_\theta \;\; \mathcal{J}_{\text{data}}(\theta) + \lambda \, \mathcal{J}_I(\hat{S})$$

$\lambda$ specifies the trade-off between the data loss and the inconsistency loss, measured using the substitution set $\hat{S}$.

Generating $\hat{S}$ with the procedure above ensures the perplexity of its sentences is low.

Page 15:

Adversarial Regularisation

The paper solves this optimisation problem using mini-batch gradient descent.
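A minimal mini-batch step combining the two losses, sketched in PyTorch; `model` is any module returning class logits for a batch of sentence pairs, and the class index, λ value, and batch format are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

CON = 1          # index of the "contradiction" class
LAMBDA = 0.1     # trade-off between data loss and inconsistency loss (assumed value)

def training_step(model, optimizer, batch, adversarial_pairs):
    """One mini-batch gradient step on J_data + lambda * J_I."""
    a, b, y = batch                                  # tensors for a labelled batch
    data_loss = F.cross_entropy(model(a, b), y)      # J_data on gold labels

    # Inconsistency loss (Rule 2) on adversarially selected sentence pairs.
    s1, s2 = adversarial_pairs
    p_fwd = F.softmax(model(s1, s2), dim=-1)[:, CON]
    p_bwd = F.softmax(model(s2, s1), dim=-1)[:, CON]
    inconsistency = F.relu(p_fwd - p_bwd).mean()     # hinge [x]_+ per pair

    loss = data_loss + LAMBDA * inconsistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```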

Page 16:

Experiment - Background Knowledge Violations

The results show background knowledge violations on the SNLI dataset, produced by the originally published cBiLSTM, DAM, and ESIM models.

Observations:

The model tends to detect entailment relationships between longer (i.e., possibly more specific) and shorter (i.e., possibly more general) sentences.

Page 17:

Experiment - Adversarial Regularisation

The authors did not specify which λ value produces the accuracies reported in this table.

Page 18:

Page 19:

Experiment - Adversarial Regularisation

Page 20:

Experiment - Generating Adversarial Examples

To further validate the robustness of adversarial regularisation, the authors crafted a series of datasets for evaluation.

$\mathcal{D}^{m}_{k}$ is a generated dataset, where $m$ identifies the model used for selecting sentences, and $k$ denotes the number of samples in the generated dataset; $\mathcal{D}$ is the original dataset.

Page 21:

Experiment - Generating Adversarial Examples, Continued

For each sentence pair $(s_1, s_2)$ in $\mathcal{D}$, we consider the substitution sets $S_1 = \{X_1 \mapsto s_1, X_2 \mapsto s_2\}$ and $S_2 = \{X_1 \mapsto s_2, X_2 \mapsto s_1\}$.

The summed inconsistency loss is $\mathcal{J}_I(S_1) + \mathcal{J}_I(S_2)$.

1. We rank the sentence pairs by this loss, and select the top $k$ instances with the highest summed inconsistency loss (see the sketch below).

2. For each selected sentence pair $(s_1, s_2)$, we create $(s_1, s_2)$ and $(s_2, s_1)$.

3. We add both $(s_1, s_2)$ and $(s_2, s_1)$, where the label of $(s_1, s_2)$ is known and $(s_2, s_1)$ is annotated by human annotators.
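A sketch of the selection step, reusing `inconsistency_loss_rule2` from above; `pairs` holds embedded sentence pairs, and the summed loss covers both orderings of each pair:

```python
def select_top_k(pairs, W, k):
    """Rank sentence pairs by summed Rule 2 inconsistency loss, keep top k."""
    def summed_loss(pair):
        s1, s2 = pair
        return (inconsistency_loss_rule2(s1, s2, W)
                + inconsistency_loss_rule2(s2, s1, W))
    return sorted(pairs, key=summed_loss, reverse=True)[:k]
```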

Page 22:

Experiment - Generating Adversarial Examples, Continued

Page 23:

Conclusions

1. Results showed that the proposed method consistently yields significant increases in predictive accuracy on adversarially crafted datasets, up to a 79.6% relative improvement.

2. It drastically reduces the number of background knowledge violations.

3. Adversarial examples transfer across model architectures, and the proposed adversarial training procedure produces generally more robust models.

Page 24:

References

1. Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In International Conference on Learning Representations (ICLR).

2. Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, pages 2249–2255.

3. Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pages 1657–1668. Association for Computational Linguistics.

4. Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 632–642. The Association for Computational Linguistics.

5. Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2017. A broad-coverage challenge corpus for sentence understanding through inference. CoRR, abs/1704.05426.

6. M. M. Gupta and J. Qi. 1991. Theory of t-norms and fuzzy inference methods. Fuzzy Sets and Systems, 40(3):431–450.

Page 25:

End

Thank you!