
Page 1:

Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge

Pasquale Minervini, Sebastian Riedel

Presented by: Tiantian Feng

Page 2:

Contributions

The authors explored the use of adversarial examples to:

1. Identify cases where models violate existing background knowledge, expressed in the form of logic rules.

2. Train models that are robust to such violations.

Page 3:

What is NLI (Natural Language Inference)?

In NLI, an input consists of two sentences, a premise p and a hypothesis h, with three possible relationships:

1. Entailment – h is definitely true given p (p entails h)

2. Contradiction – h is definitely not true given p (p contradicts h)

3. Neutral – h might be true given p

An NLI model is asked to classify the relationship between p and h.

Page 4:

An example of NLI

For instance, the premise "A soccer game with multiple males playing." entails the hypothesis "Some men are playing a sport." (an example pair from the SNLI corpus).

The figure is from: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning, "A large annotated corpus for learning natural language inference", In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Page 5:

NLI Model - Background

Input sentences $a = (a_1, \ldots, a_{\ell_a})$ and $b = (b_1, \ldots, b_{\ell_b})$, with lengths $\ell_a$ and $\ell_b$, belong to the sentence set $\mathcal{S}$; each word is represented by an embedding of size $k$.

Sentences $a$ and $b$ are first encoded into fixed-size representations by a model-dependent encoder.

The conditional probability distribution over all three classes is computed as:

$$p_\theta(\cdot \mid a, b) = \mathrm{softmax}(f_\theta(a, b))$$

where $f_\theta$, given parameters $\theta$, is a model-dependent score function.
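A minimal sketch of this interface in Python; the mean-pooling encoder and linear scorer here are toy stand-ins of my own, not the paper's actual architectures:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def score(a_emb, b_emb, W):
    """Toy model-dependent score function f_theta: a linear map over
    the concatenated mean-pooled sentence embeddings."""
    pooled = np.concatenate([a_emb.mean(axis=0), b_emb.mean(axis=0)])
    return W @ pooled        # one score per class: (ent, con, neu)

# Toy example: sentences of lengths 5 and 7, embedding size k = 50.
k = 50
a_emb, b_emb = np.random.randn(5, k), np.random.randn(7, k)
W = np.random.randn(3, 2 * k)
probs = softmax(score(a_emb, b_emb, W))   # p(ent|a,b), p(con|a,b), p(neu|a,b)
```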

Page 6:

NLI Model - Background

State-of-the-art models:

1. cBiLSTM (Rocktäschel et al., 2016)
2. Decomposable Attention Model (DAM) (Parikh et al., 2016)
3. Enhanced LSTM model (ESIM) (Chen et al., 2017)

Evaluation datasets:

1. Stanford Natural Language Inference (SNLI) (Bowman et al., 2015)
2. MultiNLI (Williams et al., 2017)

Training minimises the cross-entropy loss over the labelled dataset $\mathcal{D}$:

$$\mathcal{J}_{\text{data}}(\theta) = -\sum_{(a, b, y) \in \mathcal{D}} \log p_\theta(y \mid a, b)$$
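A sketch of this loss, reusing `softmax` and `score` from the earlier block; the batch format is an assumption for illustration:

```python
def cross_entropy_loss(batch, W):
    """Average negative log-likelihood of the gold labels.
    `batch` is a list of (a_emb, b_emb, y), y in {0: ent, 1: con, 2: neu}."""
    loss = 0.0
    for a_emb, b_emb, y in batch:
        probs = softmax(score(a_emb, b_emb, W))
        loss -= np.log(probs[y])
    return loss / len(batch)
```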

Page 7:

Background Knowledge

In this paper, background knowledge is defined as a set of First-Order Logic (FOL) rules, each having the body ⇒ head form:

$$\text{body}_1 \land \ldots \land \text{body}_n \Rightarrow \text{head}$$

For example, Rule 2 (symmetry of contradiction) is $\text{con}(X_1, X_2) \Rightarrow \text{con}(X_2, X_1)$, and Rule 5 (transitivity of entailment) is $\text{ent}(X_1, X_2) \land \text{ent}(X_2, X_3) \Rightarrow \text{ent}(X_1, X_3)$.

Page 8:

Background Knowledge Violation

Violation example for a First-Order Logic rule:

Consider two sentences $s_1$ and $s_2$, and Rule 2, $\text{con}(X_1, X_2) \Rightarrow \text{con}(X_2, X_1)$. A violation occurs when, according to the NLI model:

1. Sentence $s_1$ contradicts $s_2$, but

2. Sentence $s_2$ does not contradict $s_1$.

Page 9:

Background Knowledge - Inconsistency Loss

To measure the degree of Rule 2 violation, we define the inconsistency loss as:

$$\mathcal{J}_I(S) = \left[\, p_\theta(\text{con} \mid s_1, s_2) - p_\theta(\text{con} \mid s_2, s_1) \,\right]_+$$

where $[x]_+ = \max(0, x)$, and $S = \{X_1 \mapsto s_1, X_2 \mapsto s_2\}$ is a substitution set that maps the variables $X_1$ and $X_2$ in Rule 2 to the sentences $s_1$ and $s_2$; $p_\theta(\text{con} \mid s_1, s_2)$ is the conditional probability that $s_1$ contradicts $s_2$.
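A sketch of this loss for Rule 2, reusing the toy `softmax` and `score` from above; the hinge $[\cdot]_+$ keeps the loss at zero whenever the rule is satisfied:

```python
CON = 1  # index of the "contradiction" class in the model's output

def inconsistency_loss_rule2(s1_emb, s2_emb, W):
    """[ p(con | s1, s2) - p(con | s2, s1) ]_+  for Rule 2."""
    p_forward  = softmax(score(s1_emb, s2_emb, W))[CON]
    p_backward = softmax(score(s2_emb, s1_emb, W))[CON]
    return max(0.0, p_forward - p_backward)
```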

Page 10:

Background Knowledge - Inconsistency Loss

We can generalise the inconsistency loss to any body ⇒ head rule:

$$\mathcal{J}_I(S) = \left[\, p_\theta(\text{body}; S) - p_\theta(\text{head}; S) \,\right]_+$$

To compute the probability of a body with multiple conjunctive atoms, as in Rule 5, the authors apply the Gödel t-norm (Gupta and Qi, 1991):

$$p(a_1 \land a_2) = \min\{p(a_1),\, p(a_2)\}$$
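A sketch of the generalised loss for Rule 5 (transitivity of entailment), with the Gödel t-norm realised as a `min` over the body atoms; the class index and helpers are assumptions carried over from the sketches above:

```python
ENT = 0  # index of the "entailment" class

def inconsistency_loss_rule5(s1, s2, s3, W):
    """[ min(p(ent|s1,s2), p(ent|s2,s3)) - p(ent|s1,s3) ]_+  for Rule 5."""
    body = min(softmax(score(s1, s2, W))[ENT],
               softmax(score(s2, s3, W))[ENT])   # Goedel t-norm over conjuncts
    head = softmax(score(s1, s3, W))[ENT]
    return max(0.0, body - head)
```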

Page 11:

Generating Adversarial Examples - Constraining

Adversarial examples generated from the inconsistency loss alone can lead NLI models to violate the available background knowledge, but the generated sentences may not be well-formed or meaningful.

Solution: constrain the perplexity of the generated sentences, as measured by a language model.
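Perplexity here is the usual exponentiated average negative log-likelihood; a minimal sketch, assuming some language model exposes per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity of a sentence from its per-token probabilities under
    a language model: exp of the mean negative log-probability."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A fluent sentence gets high token probabilities, hence low perplexity:
perplexity([0.3, 0.5, 0.4, 0.6])   # ~ 2.3
perplexity([0.01, 0.02, 0.05])     # ~ 46, likely ill-formed
```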

Page 12:

Generating Adversarial Examples - Summary

The search for adversarial examples can be formalised as an optimisation problem: maximise the inconsistency loss $\mathcal{J}_I(S)$ over substitution sets $S$, subject to the sentences in $S$ having a low perplexity.

The goals of the optimisation problem:

1. Maximise the inconsistency loss described in Eq. (4)

2. Compose sentences with a low perplexity

Page 13:

Generating Adversarial Examples With Low Perplexity

Generating a low-perplexity sentence set S:

1. Sample sentences close to the data manifold (i.e., with a low perplexity)

2. Make small variations to the sentences, by one of the following (a sketch of these operations follows the list):

a. Change one word in one of the input sentences

b. Remove one parse subtree from one of the input sentences

c. Insert one parse subtree from one sentence into the parse tree of an existing sentence
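A sketch of the three perturbations, operating on plain token lists, with contiguous spans standing in for parse subtrees (the paper operates on actual parse trees; this simplification is mine):

```python
import random

def change_one_word(tokens, vocabulary):
    """(a) Replace one randomly chosen token with a vocabulary word."""
    out = list(tokens)
    out[random.randrange(len(out))] = random.choice(vocabulary)
    return out

def remove_span(tokens):
    """(b) Drop one contiguous span, standing in for a parse subtree."""
    i = random.randrange(len(tokens))
    j = random.randrange(i + 1, len(tokens) + 1)
    return tokens[:i] + tokens[j:]

def insert_span(tokens, donor_tokens):
    """(c) Splice a span from another sentence into this one."""
    i = random.randrange(len(donor_tokens))
    j = random.randrange(i + 1, len(donor_tokens) + 1)
    k = random.randrange(len(tokens) + 1)
    return tokens[:k] + donor_tokens[i:j] + tokens[k:]
```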

Page 14:

Adversarial Regularisation

Instead of minimising the data loss $\mathcal{J}_{\text{data}}(\theta)$ alone, we use the adversarial examples to regularise the training process:

$$\min_\theta \;\; \mathcal{J}_{\text{data}}(\theta) + \lambda \, \mathcal{J}_I(\hat{S})$$

$\lambda$ specifies the trade-off between the data loss and the inconsistency loss, measured using the substitution set $\hat{S}$.

Generating $\hat{S}$ with the procedure above ensures the perplexity of its sentences is low.

Page 15:

Adversarial Regularisation

The paper solves this optimisation problem using mini-batch gradient descent.
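A minimal mini-batch step combining the two losses, sketched in PyTorch; `model` is any module returning class logits for a batch of sentence pairs, and the class index, λ value, and batch format are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

CON = 1          # index of the "contradiction" class
LAMBDA = 0.1     # trade-off between data loss and inconsistency loss (assumed value)

def training_step(model, optimizer, batch, adversarial_pairs):
    """One mini-batch gradient step on J_data + lambda * J_I."""
    a, b, y = batch                                  # tensors for a labelled batch
    data_loss = F.cross_entropy(model(a, b), y)      # J_data on gold labels

    # Inconsistency loss (Rule 2) on adversarially selected sentence pairs.
    s1, s2 = adversarial_pairs
    p_fwd = F.softmax(model(s1, s2), dim=-1)[:, CON]
    p_bwd = F.softmax(model(s2, s1), dim=-1)[:, CON]
    inconsistency = F.relu(p_fwd - p_bwd).mean()     # hinge [x]_+ per pair

    loss = data_loss + LAMBDA * inconsistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```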

Page 16:

Experiment - Background Knowledge Violations

The results show background knowledge violations on the SNLI dataset, produced by the originally published cBiLSTM, DAM, and ESIM models.

Observations:

The model tends to detect entailment relationships between longer (i.e., possibly more specific) and shorter (i.e., possibly more general) sentences.

Page 17:

Experiment - Adversarial Regularisation

The authors did not specify which λ value produces the accuracies reported in this table.

Page 18:

Page 19:

Experiment - Adversarial Regularisation

Page 20:

Experiment - Generating Adversarial Examples

To further validate the robustness of adversarial regularisation, the authors crafted a series of datasets for evaluation.

$\mathcal{D}^{m}_{k}$ is a generated dataset, where $m$ identifies the model used for selecting sentences, and $k$ denotes the number of samples in the generated dataset; $\mathcal{D}$ is the original dataset.

Page 21:

Experiment - Generating Adversarial Examples, Continued

For each sentence pair $(s_1, s_2)$ in $\mathcal{D}$, we consider the substitution sets $S_1 = \{X_1 \mapsto s_1, X_2 \mapsto s_2\}$ and $S_2 = \{X_1 \mapsto s_2, X_2 \mapsto s_1\}$.

The summed inconsistency loss is $\mathcal{J}_I(S_1) + \mathcal{J}_I(S_2)$.

1. We rank the sentence pairs by this loss, and select the top $k$ instances with the highest summed inconsistency loss (see the sketch below).

2. For each selected sentence pair $(s_1, s_2)$, we create $(s_1, s_2)$ and $(s_2, s_1)$.

3. We add both $(s_1, s_2)$ and $(s_2, s_1)$, where the label of $(s_1, s_2)$ is known and $(s_2, s_1)$ is annotated by human annotators.
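A sketch of the selection step, reusing `inconsistency_loss_rule2` from above; `pairs` holds embedded sentence pairs, and the summed loss covers both orderings of each pair:

```python
def select_top_k(pairs, W, k):
    """Rank sentence pairs by summed Rule 2 inconsistency loss, keep top k."""
    def summed_loss(pair):
        s1, s2 = pair
        return (inconsistency_loss_rule2(s1, s2, W)
                + inconsistency_loss_rule2(s2, s1, W))
    return sorted(pairs, key=summed_loss, reverse=True)[:k]
```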

Page 22:

Experiment - Generating Adversarial Examples, Continued

Page 23:

Conclusions

1. Results showed that the proposed method consistently yields significant increases in predictive accuracy on adversarially crafted datasets, up to a 79.6% relative improvement.

2. It drastically reduces the number of background knowledge violations.

3. Adversarial examples transfer across model architectures, and the proposed adversarial training procedure produces generally more robust models.

Page 24:

References

1. Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In International Conference on Learning Representations (ICLR).

2. Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, pages 2249–2255.

3. Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pages 1657–1668. Association for Computational Linguistics.

4. Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 632–642. The Association for Computational Linguistics.

5. Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2017. A broad-coverage challenge corpus for sentence understanding through inference. CoRR, abs/1704.05426.

6. M. M. Gupta and J. Qi. 1991. Theory of t-norms and fuzzy inference methods. Fuzzy Sets and Systems, 40(3):431–450.

Page 25:

End

Thank you!