
Interpretability in Machine Learning

Tameem Adel

Machine Learning Group, University of Cambridge, UK

February 22, 2018


Growth of ML

ML algorithms are optimized:

Not only for task performance, e.g. accuracy.

But also for other criteria, e.g. safety, fairness, and providing the right to explanation.

There are often trade-offs among these goals.

However,

Accuracy can be quantified.

That is not precisely the case for the other criteria.


What is interpretability?

Interpret means to explain or to present in understandable terms.

In the ML context: The ability to explain or to present in understandable terms to humans.

What constitutes an explanation? What makes some explanations better than others? How are explanations generated? When are explanations sought?

Automatic ways to generate and, to some extent, evaluate interpretability.


Taxonomy

Task-related:

Global interpretability: A general understanding of how the system is working as a whole, and of the patterns present in the data.

Local interpretability: Providing an explanation of a particular prediction or decision.

Method-related (what are the basic units of the explanation?):

Raw features.

Derived features that have some semantic meaning to the expert.

Prototypes.

The nature of the data/tasks should match the type of the explanation.


Visualizing Deep Neural Network Decisions: Prediction Difference Analysis

Zintgraf, Cohen, Adel, Welling, ICLR 2017


Idea

Visualize the response of a deep neural network to a specific input.

For an individual classifier prediction, assign each feature a relevance value reflecting its contribution towards or against the predicted class.


Visualizing deep networks

Looking under the hood: explaining why a decision was made.

Can help to understand the strengths and limitations of a model, and help to improve it [e.g., telling wolves from huskies based on the presence/absence of snow].

Important for liability: why does the algorithm decide this patient has Alzheimer's?

Can lead to new insights and theories in poorly understood domains.


Approach

Relevance of a feature xi can be estimated by measuring how the prediction changes if the feature is unknown.

The difference between p(c|x) and p(c|x\i), where x\i denotes the set of all input features except xi.

But how would a classifier recognize a feature as unknown?

Label the feature as unknown.

Retrain the classifier with the feature left out.

Marginalize the feature.


Marginalization of a feature

p(c|x\i) = Σ_xi p(xi|x\i) p(c|x\i, xi)    (1)

Assume xi is independent of x\i:

p(c|x\i) ≈ Σ_xi p(xi) p(c|x\i, xi)    (2)
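Eq. (2) can be approximated by Monte-Carlo sampling: draw values for xi from its empirical marginal (here, the feature's values in other dataset samples) and average the classifier's output. A minimal sketch; the logistic classifier, the synthetic data, and the variable names are illustrative stand-ins, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def classifier(x):
    # Toy stand-in for p(c=1 | x): a fixed logistic model.
    w = np.array([1.5, -2.0, 0.5])
    return 1.0 / (1.0 + np.exp(-x @ w))

def marginal_prediction(x, i, feature_samples):
    # Eq. (2): average p(c | x\i, xi) over xi drawn from its marginal p(xi).
    probs = []
    for xi in feature_samples:
        x_mod = x.copy()
        x_mod[i] = xi  # replace feature i with a sampled value
        probs.append(classifier(x_mod))
    return float(np.mean(probs))

# Empirical marginal of feature 0, taken from a small synthetic dataset.
data = rng.normal(size=(100, 3))
x = np.array([2.0, 0.3, -1.0])

p_full = classifier(x)
p_marg = marginal_prediction(x, 0, data[:, 0])
print(p_full, p_marg)
```

A large gap between the two probabilities indicates that the marginalized feature mattered for this particular prediction.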


Weight of evidence

Compare p(c|x\i) to p(c|x):

odds(c|x) = p(c|x) / (1 − p(c|x))

WEi(c|x) = log2(odds(c|x)) − log2(odds(c|x\i))    (3)

A large prediction difference → the feature contributed substantially to the classification.

A small prediction difference → the feature was not important for the decision.

A positive value of WEi → the feature contributed evidence for the class of interest.

A negative value of WEi → the feature displays evidence against the class.
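Eq. (3) in code: a small helper (the function names are mine, not from the paper) that turns the full and marginalized class probabilities into a weight of evidence in bits:

```python
import math

def odds(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # clamp away from 0 and 1
    return p / (1.0 - p)

def weight_of_evidence(p_full, p_marg):
    # Eq. (3): WE_i(c|x) = log2 odds(c|x) - log2 odds(c|x\i)
    return math.log2(odds(p_full)) - math.log2(odds(p_marg))

print(weight_of_evidence(0.9, 0.5))  # positive: evidence for the class
print(weight_of_evidence(0.2, 0.5))  # negative: evidence against
```

The clamping in `odds` guards against probabilities of exactly 0 or 1, where the odds (and hence the log) would be undefined.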


Conditional sampling

A pixel depends most strongly on a small neighbourhood around it.

The conditional of a pixel given its neighbourhood does not depend on the position of the pixel in the image.

p(xi|x\i) ≈ p(xi|x̂\i)    (4)

where x̂\i denotes a small patch surrounding pixel xi.
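A minimal sketch of the idea behind Eq. (4): sample candidate values for a pixel from a distribution fitted to its local neighbourhood rather than from the global marginal. Here a univariate Gaussian over an l × l patch is used as a simplification (the paper conditions on the surrounding pixels with a multivariate normal); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pixel_conditional(image, row, col, l=5, n=10):
    # Fit a Gaussian to the l x l neighbourhood around (row, col) and
    # draw n candidate values for that pixel, approximating Eq. (4).
    h = l // 2
    patch = image[max(0, row - h): row + h + 1,
                  max(0, col - h): col + h + 1]
    mu, sigma = patch.mean(), patch.std() + 1e-8
    return rng.normal(mu, sigma, size=n)

image = rng.normal(size=(28, 28))
samples = sample_pixel_conditional(image, 10, 10)
print(samples.shape)
```

Because the fitted distribution reflects local image statistics, the sampled replacement values are far more plausible than draws from the dataset-wide marginal.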


Multivariate Analysis

A neural network is relatively robust to the marginalization of just one feature.

Remove several features at once:

Connected pixels.

Patches of size k × k.
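The per-pixel procedure extends to k × k patches with a sliding window: marginalize a whole patch at a time and credit the resulting prediction difference to every pixel it covers, averaging over the number of windows touching each pixel. A schematic loop, with a toy classifier standing in for the network and raw probability differences standing in for the weight of evidence, for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)

def classifier(image):
    # Toy stand-in for a network's p(c | x), driven by the image mean.
    return 1.0 / (1.0 + np.exp(-10.0 * image.mean()))

def relevance_map(image, k=4, n_samples=5):
    H, W = image.shape
    relevance = np.zeros((H, W))
    counts = np.zeros((H, W))
    p_full = classifier(image)
    for r in range(H - k + 1):
        for c in range(W - k + 1):
            diffs = []
            for _ in range(n_samples):
                x_mod = image.copy()
                # Marginalize the whole k x k patch at once.
                x_mod[r:r + k, c:c + k] = rng.normal(size=(k, k))
                diffs.append(p_full - classifier(x_mod))
            # Credit the average difference to every pixel in the patch.
            relevance[r:r + k, c:c + k] += np.mean(diffs)
            counts[r:r + k, c:c + k] += 1
    return relevance / counts

image = rng.normal(loc=0.2, size=(12, 12))
rmap = relevance_map(image)
print(rmap.shape)
```

Positive entries in the map mark pixels whose removal lowered the class probability, i.e. evidence for the predicted class.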


Experiments

Conditional sampling

Red: evidence for the predicted class.

Blue: evidence against it.


Experiments

Multivariate analysis


MRI data


MRI data


Conclusions

A method for visualizing deep neural networks by using a more powerful conditional, multivariate model.

The visualization method shows which pixels of a specific input image are evidence for or against a node in the network.


InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Chen, Duan, Houthooft, Schulman, Sutskever, Abbeel, NIPS 2016
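The slides for this part are figure-only in the transcript. For context, the objective from the InfoGAN paper (Chen et al., 2016), stated here from the paper itself rather than recovered from the slides, augments the standard GAN game with a variational lower bound L_I on the mutual information between a latent code c and the generator output G(z, c):

```latex
\min_{G,Q}\ \max_{D}\ V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q),
\qquad
L_I(G, Q) = \mathbb{E}_{c \sim P(c),\ x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)
\le I\big(c;\, G(z, c)\big)
```

where V(D, G) is the usual GAN minimax objective and Q is an auxiliary network approximating the posterior P(c|x). Maximizing L_I makes the latent codes c recoverable from generated samples, which is what yields interpretable, disentangled factors of variation.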

[Slides 19–28: InfoGAN figures; no text in the transcript.]

The End
