
TRANSCRIPT

Page 1: Modern Deep Learning through Bayesian Eyes
mlg.eng.cam.ac.uk/yarin/PDFs/2015_UCL_Bayesian_Deep_Learning_t…

Modern Deep Learning through Bayesian Eyes

Yarin Gal

[email protected]

To keep things interesting, a photo or an equation in every slide! (Unless specified otherwise, photos are either original work or taken from Wikimedia, under a Creative Commons license.)

Page 2:

Modern deep learning

Conceptually simple models...

I Attracts tremendous attention from popular media,
I Fundamentally affected the way ML is used in industry,
I Driven by pragmatic developments...
I of tractable models...
I that work well...
I and scale well.

2 of 52

Page 3: Modern Deep Learning through Bayesian Eyesmlg.eng.cam.ac.uk/yarin/PDFs/2015_UCL_Bayesian_Deep_Learning_t… · Modern Deep Learning through Bayesian Eyes ... deep learning models

But many unanswered questions...I Why does my model work

We don’t understand many of the tools that we use...E.g. stochastic reg. techniques (dropout, MGN1) are used in mostdeep learning models to avoid over-fitting. Why do they work?

I What does my model know?

I Why does my model predict this and not that?1Wager et al. (2013) and Baldi and Sadowski (2013) attempt to explain dropout

as sparse regularisation but cannot generalise to other techniques.3 of 52

Page 4:

But many unanswered questions...

I Why does my model work?
I What does my model know?

We can't tell whether our models are certain or not... E.g. what would be the CO2 concentration level at Mauna Loa, Hawaii, in 20 years' time?

I Why does my model predict this and not that?

Page 5:

But many unanswered questions...

I Why does my model work?
I What does my model know?
I Why does my model predict this and not that?

Our models are black boxes and not interpretable... Physicians and others need to understand why a model predicts an output.

Page 6:

But many unanswered questions...

I Why does my model work?
I What does my model know?
I Why does my model predict this and not that?

Surprisingly, we can use Bayesian modelling to answer the questions above.

Page 7:

Outline

I Many unanswered questions
I Why does my model work?
I What does my model know?
I Why does my model predict this and not that, and other open problems
I Conclusions

Page 8:

Outline

I Many unanswered questions
I Why does my model work?
  I Bayesian modelling and neural networks
  I Modern deep learning as approximate inference
  I Real-world implications
I What does my model know?
I Why does my model predict this and not that, and other open problems
I Conclusions

Page 9:

Bayesian modelling and inference

I Observed inputs X = {x_i}_{i=1}^N and outputs Y = {y_i}_{i=1}^N
I Capture the stochastic process believed to have generated the outputs
I Define the model parameters ω as random variables
I Prior distribution over ω: p(ω)
I Likelihood: p(Y|ω, X)
I Posterior (Bayes' theorem):

  p(ω|X, Y) = p(Y|ω, X) p(ω) / p(Y|X)

I Predictive distribution given a new input x*, integrating the likelihood against the posterior:

  p(y*|x*, X, Y) = ∫ p(y*|x*, ω) p(ω|X, Y) dω

I But... p(ω|X, Y) is often intractable
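To make the predictive integral above concrete, here is a minimal sketch (not from the slides; the model and all numbers are illustrative): observations y_i = ω + ε_i with a conjugate Gaussian prior on ω, so the posterior p(ω|Y) is Gaussian in closed form, and the predictive mean can be approximated by Monte Carlo sampling from the posterior.

```python
import numpy as np

# Toy conjugate model (illustrative, not from the slides): y_i = ω + ε_i,
# ε_i ~ N(0, noise_var), with prior ω ~ N(0, prior_var). The posterior
# p(ω|Y) is then Gaussian with closed-form mean and variance.
rng = np.random.default_rng(0)

def posterior_params(y, prior_var=1.0, noise_var=1.0):
    """Closed-form Gaussian posterior p(ω|Y) for the model above."""
    n = len(y)
    var = 1.0 / (1.0 / prior_var + n / noise_var)
    mean = var * np.sum(y) / noise_var
    return mean, var

Y = rng.normal(loc=2.0, scale=1.0, size=50)   # observed outputs
mu, var = posterior_params(Y)

# Monte Carlo approximation of the predictive mean: draw ω ~ p(ω|Y)
# and average the model's prediction (here simply ω itself).
omegas = rng.normal(mu, np.sqrt(var), size=10_000)
predictive_mean = omegas.mean()
```

With 50 observations the posterior variance has already shrunk from the prior's 1.0 to 1/51, the "uncertainty decreases with more data" behaviour that later slides rely on.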


Page 12:

Approximate inference

I Approximate p(ω|X, Y) with a simple distribution q_θ(ω)
I Minimise the divergence from the posterior w.r.t. θ:

  KL(q_θ(ω) || p(ω|X, Y))

(Figure: the approximating distribution q_θ(ω) is fitted to the posterior p(ω|X, Y) over a sequence of steps θ_1, ..., θ_5.)


Page 17:

Approximate inference

I Approximate p(ω|X, Y) with a simple distribution q_θ(ω)
I Minimise the divergence from the posterior w.r.t. θ:

  KL(q_θ(ω) || p(ω|X, Y))

I Identical to minimising

  L_VI(θ) := − ∫ q_θ(ω) log p(Y|X, ω) dω + KL(q_θ(ω) || p(ω))

  (the first term involves the likelihood, the second term the prior)

I We can approximate the predictive distribution:

  q_θ(y*|x*) = ∫ p(y*|x*, ω) q_θ(ω) dω
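The KL minimisation can be sketched end to end when both distributions are 1-D Gaussians, so KL(q_θ || p) has a closed form (all values below are illustrative assumptions, not from the slides); plain gradient descent on θ = (m, log s) then recovers the target.

```python
import numpy as np

def kl_gauss(m_q, s_q, m_p, s_p):
    """Closed-form KL(N(m_q, s_q²) || N(m_p, s_p²))."""
    return np.log(s_p / s_q) + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 0.5

m_p, s_p = 1.5, 0.4        # the "posterior" p(ω|X,Y) we want to match
m, log_s = 0.0, 0.0        # variational parameters θ of q_θ(ω)
lr, eps = 0.1, 1e-5

for _ in range(2000):
    # finite-difference gradients of the KL w.r.t. θ
    g_m = (kl_gauss(m + eps, np.exp(log_s), m_p, s_p)
           - kl_gauss(m - eps, np.exp(log_s), m_p, s_p)) / (2 * eps)
    g_s = (kl_gauss(m, np.exp(log_s + eps), m_p, s_p)
           - kl_gauss(m, np.exp(log_s - eps), m_p, s_p)) / (2 * eps)
    m -= lr * g_m
    log_s -= lr * g_s
```

In a real model the posterior is intractable, which is exactly why the objective is rewritten as L_VI(θ): that form needs only the likelihood and the prior.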

Page 18:

What does this have to do with deep learning?

We'll look at dropout specifically:

I Used in most modern deep learning models
I It somehow circumvents over-fitting
I And improves performance

With Bayesian modelling we can explain why.

Page 19:

The link: Bayesian neural networks

I Place a prior p(W_i):

  W_i ~ N(0, I)

  for i ≤ L (and write ω := {W_i}_{i=1}^L).

I The output is a random variable:

  f(x, ω) = W_L σ(... W_2 σ(W_1 x + b_1) ...)

I Softmax likelihood for classification:

  p(y|x, ω) = softmax(f(x, ω))

  or a Gaussian for regression:

  p(y|x, ω) = N(y; f(x, ω), τ⁻¹ I)

I But it is difficult to evaluate the posterior p(ω|X, Y).

Many have tried...
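A small sketch of the construction above (the layer sizes and the choice σ = ReLU are illustrative assumptions, not from the slides): sampling ω from the prior and pushing an input through f(x, ω) gives a different output on every draw, i.e. the output is itself a random variable.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 16, 16, 3]   # input, two hidden layers, output (illustrative)

def sample_weights():
    """Draw ω = {W_i} from the prior W_i ~ N(0, I)."""
    return [rng.standard_normal((d_in, d_out))
            for d_in, d_out in zip(sizes[:-1], sizes[1:])]

def f(x, omega):
    """Deterministic forward pass given one weight sample ω."""
    h = x
    for W in omega[:-1]:
        h = np.maximum(h @ W, 0.0)   # σ = ReLU (an assumed choice)
    return h @ omega[-1]

x = rng.standard_normal(4)
# five prior draws give five different outputs for the same input
outputs = np.stack([f(x, sample_weights()) for _ in range(5)])
```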


Page 23:

Long history¹

I Denker, Schwartz, Wittner, Solla, Howard, Jackel, Hopfield (1987)
I Denker and LeCun (1991)
I MacKay (1992)
I Hinton and van Camp (1993)
I Neal (1995)
I Barber and Bishop (1998)

And more recently...

I Graves (2011)
I Blundell, Cornebise, Kavukcuoglu, and Wierstra (2015)
I Hernández-Lobato and Adams (2015)

But we don't use these... do we?

¹Complete references at the end of the slides

Page 24:

Outline

I Many unanswered questions
I Why does my model work?
  I Bayesian modelling and neural networks
  I Modern deep learning as approximate inference
  I Real-world implications
I What does my model know?
I Why does my model predict this and not that, and other open problems
I Conclusions

Page 25:

Deep learning as approx. inference

Approximate inference in Bayesian NNs

I Define q_θ(ω) to approximate the posterior p(ω|X, Y)
I KL divergence to minimise:

  KL(q_θ(ω) || p(ω|X, Y)) ∝ − ∫ q_θ(ω) log p(Y|X, ω) dω + KL(q_θ(ω) || p(ω)) =: L(θ)

I Approximate the integral with MC integration, ω̂ ~ q_θ(ω):

  L̂(θ) := − log p(Y|X, ω̂) + KL(q_θ(ω) || p(ω))
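The MC step can be checked numerically. A sketch under toy assumptions (1-D ω and a made-up log-likelihood, not the slides' model): averaging many single-sample estimates of −log p(Y|X, ω̂) recovers the integral term of L(θ), which is why one draw per optimisation step suffices.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(omega):
    # toy stand-in for log p(Y|X, ω) (an assumption for illustration)
    return -0.5 * (omega - 2.0) ** 2

m, s = 0.0, 1.0   # q_θ(ω) = N(m, s²)

# the integral term −∫ q_θ(ω) log p(Y|X, ω) dω, by Riemann sum
grid = np.linspace(-10.0, 10.0, 200_001)
dx = grid[1] - grid[0]
q = np.exp(-0.5 * ((grid - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
exact = -(q * log_lik(grid)).sum() * dx

# average of many single-sample MC estimates −log p(Y|X, ω̂), ω̂ ~ q_θ(ω)
draws = rng.normal(m, s, size=200_000)
mc = -log_lik(draws).mean()
```

For this toy choice the integral is 0.5·E[(ω − 2)²] = 0.5·(1 + 4) = 2.5, and the single-sample average lands on the same value.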


Page 28:

Deep learning as approx. inference

Stochastic approx. inference in Bayesian NNs

I Unbiased estimator:

  E_{ω̂ ~ q_θ(ω)}(L̂(θ)) = L(θ)

I Converges to the same optima as L(θ)
I For inference, repeat:
  I Sample ω̂ ~ q_θ(ω)
  I And minimise (one step)

    L̂(θ) = − log p(Y|X, ω̂) + KL(q_θ(ω) || p(ω))

  w.r.t. θ.


Page 31:

Deep learning as approx. inference

Specifying q_θ(·)

I Given Bernoulli r.v.s z_{i,j} and variational parameters θ = {M_i}_{i=1}^L (a set of matrices):

  z_{i,j} ~ Bernoulli(p_i)  for i = 1, ..., L, j = 1, ..., K_{i−1}

  W_i = M_i · diag([z_{i,j}]_{j=1}^{K_i})

  q_θ(ω) = ∏_i q_{M_i}(W_i)
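A sketch of sampling one layer's weights from this q_θ(ω) (the layer shape and p_i = 0.5 are illustrative assumptions): multiplying M_i by diag(z_i) zeroes exactly the columns whose z_{i,j} = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
p_i = 0.5
M = rng.standard_normal((4, 6))    # variational parameter M_i
z = rng.binomial(1, p_i, size=6)   # z_{i,j} ~ Bernoulli(p_i)
W = M * z                          # = M_i · diag(z_i): dropped columns vanish
```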

Page 32:

Deep learning as approx. inference

In summary, minimise the divergence between q_θ(ω) and p(ω|X, Y):

I Repeat:
  I Sample z_{i,j} ~ Bernoulli(p_i) and set

    W_i = M_i · diag([z_{i,j}]_{j=1}^{K_i}),  ω̂ = {W_i}_{i=1}^L

  I Minimise (one step)

    L̂(θ) = − log p(Y|X, ω̂) + KL(q_θ(ω) || p(ω))

  w.r.t. θ = {M_i}_{i=1}^L (a set of matrices).

Page 33:

Deep learning as approx. inference

In summary, minimise the divergence between q_θ(ω) and p(ω|X, Y):

I Repeat:
  I = Randomly set columns of M_i to zero
  I Minimise (one step)

    L̂(θ) = − log p(Y|X, ω̂) + KL(q_θ(ω) || p(ω))

  w.r.t. θ = {M_i}_{i=1}^L (a set of matrices).

Page 34:

Deep learning as approx. inference

In summary, minimise the divergence between q_θ(ω) and p(ω|X, Y):

I Repeat:
  I = Randomly set units of the network to zero
  I Minimise (one step)

    L̂(θ) = − log p(Y|X, ω̂) + KL(q_θ(ω) || p(ω))

  w.r.t. θ = {M_i}_{i=1}^L (a set of matrices).

Page 35:

Deep learning as approx. inference

Sounds familiar?²

  L̂(θ) = − log p(Y|X, ω̂)  [= loss]  +  KL(q_θ(ω) || p(ω))  [= L2 reg.]

²For more details see the appendix of Gal and Ghahramani (2015) – yarin.co/dropout

Page 36:

Why does my model work?

Now we can answer: "Why does dropout work?"

I It adds noise
I Sexual reproduction³
I Because it approximately integrates over the model parameters
I The noise is a side-effect of the approximate integration
I Explains model over-specification, "adaptive model capacity"
I We fit the process that generated our data

³Srivastava et al. (2014)


Page 41:

Diving deeper into dropout

I "Why this q_θ(·)?"
  I Bernoullis are cheap
  I Dropout at test time ≈ propagate the mean E(W_i) = p_i M_i
  I Constrains the weights to near the origin:
    I Posterior uncertainty decreases with more data
    I Var(W_i) = M_i M_i^T (p_i − p_i²)
    I For fixed p_i, to decrease the uncertainty we must decrease ||M_i||
    I Smallest ||M_i||, i.e. strongest regularisation, at p_i = 0.5.
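The variance claim is easy to verify numerically in scalar form (the values of m and p below are illustrative assumptions): for a single weight w = m·z with z ~ Bernoulli(p), Var(w) = m²(p − p²), so for a fixed p the only way to shrink the uncertainty is to shrink |m|, and the factor p(1 − p) peaks at p = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 2.0, 0.3
z = rng.binomial(1, p, size=1_000_000)
empirical_var = np.var(m * z)           # sample variance of w = m·z
analytic_var = m**2 * (p - p**2)        # Var(w) = m²(p − p²)

# p(1 − p), and hence the pull towards small ||M_i||, is largest at p = 0.5
ps = np.linspace(0.01, 0.99, 99)
strongest = ps[np.argmax(ps * (1 - ps))]
```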


Page 45:

Other stochastic reg. techniques

I Multiplicative Gaussian noise (Srivastava et al., 2014):
  I Multiply the network units by N(1, 1)
  I Same performance as dropout

Multiplicative Gaussian noise as approximate inference⁴:

  z_{i,j} ~ N(1, 1)  for i = 1, ..., L, j = 1, ..., K_{i−1}

  W_i = M_i · diag([z_{i,j}]_{j=1}^{K_i})

  q_θ(ω) = ∏_i q_{M_i}(W_i)

Similarly for drop-connect (Wan et al., 2013) and hashed neural networks (Chen et al., 2015).

⁴See Gal and Ghahramani (2015) and Kingma et al. (2015)
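The same sampling sketch carries over with a Gaussian in place of the Bernoulli (shapes are illustrative assumptions): z_{i,j} ~ N(1, 1), so E(z) = 1 and the noise leaves the weights unchanged in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 6))
z = rng.normal(1.0, 1.0, size=6)   # multiplicative Gaussian noise z_{i,j} ~ N(1, 1)
W = M * z                          # = M_i · diag(z_i)

# E(z) = 1, so averaging many noise draws recovers M itself
Z = rng.normal(1.0, 1.0, size=(50_000, 6))
W_mean = M * Z.mean(axis=0)
```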

Page 46:

Outline

I Many unanswered questions
I Why does my model work?
  I Bayesian modelling and neural networks
  I Modern deep learning as approximate inference
  I Real-world implications
I What does my model know?
I Why does my model predict this and not that, and other open problems
I Conclusions

Page 47:

Real-world implications

"A theory is worth nothing if you can't use it to make better code."
– DeadMG, Jun 10 '12, stackexchange

I Better use of dropout
I Model structure selection
  I (No time: use Bayesian statistics to understand model architecture)

Page 48:

Better use of dropout

How do we use dropout with convolutional neural networks (convnets)?

Figure: LeNet convnet structure. Image source: LeCun et al. (1998)

Page 49-50:

(The same LeNet figure repeated, with "Dropout" and "No dropout" annotations marking where dropout is and is not applied in the standard setup.)

Page 51:

Better use of dropout

Why not use dropout et al. with convolutions?

I It doesn't work
  I Low co-adaptation in convolutions
I Because it's not used correctly
  I Standard dropout averages the weights at test time


Page 53:

Better use of dropout

Instead, use the predictive mean, approximated with MC integration:

  E_{q_θ(y*|x*)}(y*) ≈ (1/T) Σ_{t=1}^T f(x*, ω̂_t)  with ω̂_t ~ q_θ(ω).

I In practice: average T stochastic forward passes through the network (referred to as "MC dropout").⁵
I Dropout after convolutions and averaging forward passes = approximate inference in Bayesian convnets.⁶

⁵Also suggested in Srivastava et al. (2014) as model averaging.
⁶See yarin.co/bcnn for more details.
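A minimal sketch of MC dropout at test time (a one-hidden-layer network with illustrative shapes and p = 0.5, not the slides' architecture): keep dropout switched on, average T stochastic forward passes, and read off both a predictive mean and a spread. For comparison, the weight-averaging shortcut propagates E(z) = p; here the two means agree because no non-linearity follows the dropped units, whereas with non-linearities after the dropout layer the shortcut is only an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
M1 = rng.standard_normal((8, 32))   # illustrative weights
M2 = rng.standard_normal((32, 2))
p = 0.5

def stochastic_forward(x):
    """One forward pass with a fresh dropout mask, i.e. one ω̂_t ~ q_θ(ω)."""
    z = rng.binomial(1, p, size=32)
    h = np.maximum(x @ M1, 0.0) * z   # drop hidden units
    return h @ M2

x_star = rng.standard_normal(8)
T = 20_000
samples = np.stack([stochastic_forward(x_star) for _ in range(T)])
predictive_mean = samples.mean(axis=0)   # (1/T) Σ_t f(x*, ω̂_t)
predictive_std = samples.std(axis=0)     # uncertainty comes for free

# standard dropout's test-time shortcut: propagate the mean E(z) = p
weight_avg = (np.maximum(x_star @ M1, 0.0) * p) @ M2
```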


Page 56-62:

Huge improvement (MNIST)

(Figure, repeated across these slides: test error (%), 0.4 to 1.8, against training batches, 10⁴ to 10⁷, for five models: standard dropout (lenet-all), MC dropout (lenet-all), standard dropout (lenet-ip), MC dropout (lenet-ip), and no dropout (lenet-none).)

Captions added in turn:

I Red: standard LeNet (no dropout)
I Green: standard dropout LeNet (dropout at the end)
I Dashed blue: Bayesian LeNet (weight averaging – FAIL)
I Solid blue: Bayesian LeNet (MC dropout)

Plot annotations: "Before dropout", "Empirical dropout", "Principled dropout".

Page 63:

Over-fitting on small data

I Robustness to over-fitting on smaller datasets:

(Figure, four panels of test error (%) against training batches, 0 to 10⁶:
 (a) Entire MNIST, standard dropout convnet (roughly 0.66 to 0.82%);
 (b) Entire MNIST, Bayesian convnet (roughly 0.45 to 0.75%);
 (c) 1/4 of MNIST, standard dropout convnet (roughly 1.00 to 1.25%);
 (d) 1/4 of MNIST, Bayesian convnet (roughly 0.75 to 1.05%).)

State-of-the-art on CIFAR-10

CIFAR Test Error (and Std.)

Model          | Standard Dropout          | MC Dropout
NIN            | 10.43 (Lin et al., 2013)  | 10.27 ± 0.05
DSN            | 9.37 (Lee et al., 2014)   | 9.32 ± 0.02
Augmented-DSN  | 7.95 (Lee et al., 2014)   | 7.71 ± 0.09

Table: Bayesian techniques (MC dropout) with existing state-of-the-art

26 of 52


RW: Camera pose localisation

I Find the location from which a picture was taken7

I Kendall and Cipolla (2015) show 10–15% improvement on state-of-the-art with Bayesian convnets

Localisation accuracy for different error thresholds

7Figures used with author permission

27 of 52

Outline

I Many unanswered questions
I Why does my model work?
I What does my model know?
  I Why should I care about uncertainty?
  I How can I get uncertainty in deep learning?
  I What does this uncertainty look like?
  I Real-world implications
I Why does my model predict this and not that, and other open problems
I Conclusions

28 of 52


Why should I care about uncertainty?

I We train a model to recognise dog breeds
I And are given a cat to classify
I What would you want your model to do?
I Similar problems in decision making, physics, life science, etc.8
I For the practitioner: pass inputs with low confidence to specialised models
I But I already have uncertainty in classification! Well... no

(a) Softmax input as a function of data x: f(x)
(b) Softmax output as a function of data x: σ(f(x))

I We need to be able to tell what our model knows and what it doesn't.9

8Complete references at end of slides
9Friendly introduction given in yarin.co/blog

29 of 52
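The softmax figure's point, that a point estimate pushed through the softmax can look certain far from the data while averaging the softmax over a distribution of functions does not, can be reproduced in a toy 1-D sigmoid example. Everything below (the linear model, the slope distribution, the numbers) is an illustrative assumption, not the talk's figure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Point estimate: f(x) = 3x. Far from the data the sigmoid saturates,
# so the output alone reads as near-certain.
x_far = 10.0
point_pred = sigmoid(3.0 * x_far)  # very close to 1: looks confident

# Averaging the sigmoid over draws of the slope (a posterior-style
# spread over functions) pulls the probability back towards 0.5,
# revealing the uncertainty the point estimate hides.
rng = np.random.default_rng(0)
slopes = rng.normal(3.0, 2.0, size=10000)
mc_pred = sigmoid(slopes * x_far).mean()  # noticeably below point_pred
```

The same effect is why passing the full distribution over f(x) through the softmax, rather than a single point estimate, is needed to read off meaningful confidence.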


How to get uncertainty in deep learning

I We fit a distribution; we already used its first moment:

\[ \mathbb{E}(y^*) = \frac{1}{T} \sum_{t=1}^{T} y(x^*, \omega_t), \qquad \omega_t \sim q_\theta(\omega) \]

I For uncertainty (in regression) look at the second moment:

\[ \mathrm{Var}(y^*) = \tau^{-1} I + \frac{1}{T} \sum_{t=1}^{T} y(x^*, \omega_t)^T y(x^*, \omega_t) - \mathbb{E}(y^*)^T \mathbb{E}(y^*) \]

I As simple as looking at the sample variance of stochastic forward passes through the network (plus obs. noise).10

10See yarin.co/dropout for more details

30 of 52
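The two moments can be estimated in a few lines. A minimal NumPy sketch, assuming a caller-supplied function `stochastic_forward` that runs the network on x* with a fresh dropout mask on each call (the function name and the per-output diagonal variance, rather than the full covariance on the slide, are illustrative choices):

```python
import numpy as np

def mc_dropout_moments(stochastic_forward, x_star, T=100, tau=1.0):
    """Predictive mean and variance from T stochastic forward passes
    through a dropout network (dropout kept ON at test time).

    `stochastic_forward(x_star)` is assumed to return a 1-D output and
    to sample a fresh dropout mask internally on every call."""
    samples = np.stack([stochastic_forward(x_star) for _ in range(T)])  # (T, D)
    mean = samples.mean(axis=0)            # first moment, E(y*)
    # per-output sample variance plus observation noise 1/tau
    var = samples.var(axis=0) + 1.0 / tau
    return mean, var
```

With a deterministic network the sample variance collapses and only the observation-noise term remains, which matches the τ⁻¹I term in the slide's formula.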


Outline

I Many unanswered questions
I Why does my model work?
I What does my model know?
  I Why should I care about uncertainty?
  I How can I get uncertainty in deep learning?
  I What does this uncertainty look like?
  I Real-world implications
I Why does my model predict this and not that, and other open problems
I Conclusions

31 of 52

What does this uncertainty look like?

What would be the CO2 concentration level in Mauna Loa, Hawaii, in 20 years' time?

I Normal dropout (weight averaging, 5 layers, ReLU units):

I Same network, Bayesian perspective:

32 of 52


What does this uncertainty look like?

I How good is our uncertainty estimate?

34 of 52

Outline

I Many unanswered questions
I Why does my model work?
I What does my model know?
  I Why should I care about uncertainty?
  I How can I get uncertainty in deep learning?
  I What does this uncertainty look like?
  I Real-world implications
I Why does my model predict this and not that, and other open problems
I Conclusions

35 of 52

Deep Reinforcement Learning

I We have a "Roomba"12
I Penalised −5 for walking into a wall, +10 reward for collecting dirt
I Our environment is stochastic and ever changing
I We want a net to learn what actions to take in different situations

12Code based on Karpathy and authors. github.com/karpathy/convnetjs

36 of 52

Deep Reinforcement Learning

Behavioural policies:
I Epsilon-greedy: take random actions with probability ε and optimal actions otherwise
I Using uncertainty we can learn faster
I Thompson sampling: draw a realisation from the current belief over the world, choose the action with highest value
I In practice: simulate a stochastic forward pass through the dropout network and choose the action with highest value

37 of 52
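The "in practice" bullet can be sketched directly: one stochastic forward pass through a dropout Q-network is treated as a single draw from the approximate belief over Q-functions, and the agent acts greedily on that draw. A toy NumPy sketch, where the two-layer network, weight names, and dropout rate are hypothetical stand-ins for the talk's Roomba code:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_q_forward(state, W1, W2, p=0.5):
    """One stochastic forward pass through a tiny Q-network: a fresh
    Bernoulli dropout mask is sampled on the hidden layer every call."""
    h = np.maximum(0.0, state @ W1)        # ReLU hidden layer
    mask = rng.random(h.shape) > p         # fresh dropout mask
    h = h * mask / (1.0 - p)               # inverted-dropout scaling
    return h @ W2                          # Q-values, one per action

def thompson_action(state, W1, W2):
    """Thompson sampling: one stochastic pass = one draw from the
    belief over Q-functions; act greedily on that draw."""
    q_sample = dropout_q_forward(state, W1, W2)
    return int(np.argmax(q_sample))
```

Because the sampled mask changes between passes, exploration falls out automatically where the network is uncertain, with no ε schedule to tune.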


Deep Reinforcement Learning

[Figure: average reward over time (log scale, batches). Curves: Thompson sampling (using uncertainty) and epsilon greedy.]

39 of 52

RW: Camera local. (with uncertainty!)

I Where was a picture taken? (Kendall and Cipolla, 2015)14
I Uncertainty increases as a test photo diverges from the training distribution
I Test photos with high uncertainty (strong occlusion from vehicles, pedestrians or other objects)
I Localisation error correlates with uncertainty

14Figures used with author permission

40 of 52


RW: Image segmentation

I Scene understanding: what's in a photo and where? (Kendall, Badrinarayanan, and Cipolla, 2015)15

15Figures used with author permission

41 of 52


RW: DNA methylation

I Angermueller and Stegle (2015) fit a network to predict DNA methylation, used for gene regulation
I Look at the methylation rate of different embryonic stem cells. Uncertainty increases in genomic contexts that are hard to predict (e.g. LMR or H3K27me3)

42 of 52


Outline

I Many unanswered questions
I Why does my model work?
I What does my model know?
I Why does my model predict this and not that, and other open problems
I Conclusions

43 of 52

What lies ahead

Use the theory to answer many questions: How can we...

I ... build interpretable models?
I ... combine Bayesian techniques & deep models?
I ... practically use deep learning uncertainty in existing models?
I ... extend deep learning in a principled way?

44 of 52

Many unanswered questions left

I Interpretable models?
  I Will you trust a decision made by a black box?
  I Rich literature in interpretable Bayesian models (e.g. Sun (2006), Letham (2014))
  I Combine Bayesian and deep models in a principled way?
I Combine Bayesian techniques & deep models?
  I Unsupervised learning – Bayesian data analysis?
  I Bayesian models with complex data? (sequence data, image data)

45 of 52


Many unanswered questions left

I Practical deep learning uncertainty?
  I Capture language ambiguity?
    Image source: cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
  I Weight uncertainty for model debugging?
I Principled extensions of deep learning?
  I Dropout in recurrent networks?
    Image source: karpathy.github.io/2015/05/21/rnn-effectiveness
  I New approximating distributions = new stochastic regularisation techniques? q_θ(ω) = ?
  I Model compression: W_i ∼ discrete distribution with continuous base measure?

Work in progress!

46 of 52


Outline

I Many unanswered questions
I Why does my model work?
I What does my model know?
I Why does my model predict this and not that, and other open problems
I Conclusions

47 of 52

Conclusions

The theory above means that modern deep learning:

I captures stochastic processes underlying observed data
I can use the vast Bayesian statistics literature
I can be explained by mathematically rigorous theory
I can be extended in a principled way
I can be combined with Bayesian models / techniques in a practical way (we saw this!)
I has uncertainty estimates built in (we saw this as well!)

But...

48 of 52



New horizons

Most exciting is the work to come:

I Practical uncertainty in deep learning
I Principled extensions to deep learning
I Hybrid deep learning – Bayesian models

and much, much more.

Thank you for listening.

49 of 52

References

I Y Gal, Z Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", arXiv preprint, arXiv:1506.02142 (2015).
I Y Gal, Z Ghahramani, "Dropout as a Bayesian Approximation: Appendix", arXiv preprint, arXiv:1506.02157 (2015).
I Y Gal, Z Ghahramani, "Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference", arXiv preprint, arXiv:1506.02158 (2015).
I A Kendall, R Cipolla, "Modelling Uncertainty in Deep Learning for Camera Relocalization", arXiv preprint, arXiv:1509.05909 (2015).
I C Angermueller and O Stegle, "Multi-task deep neural network to predict CpG methylation profiles from low-coverage sequencing data", NIPS MLCB workshop (2015).
I JM Hernández-Lobato, RP Adams, "Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks", ICML (2015).
I DP Kingma, T Salimans, M Welling, "Variational Dropout and the Local Reparameterization Trick", NIPS (2015).
I DJ Rezende, S Mohamed, D Wierstra, "Stochastic Backpropagation and Approximate Inference in Deep Generative Models", ICML (2014).

50 of 52

Refs – Bayesian neural networks

I Denker, Schwartz, Wittner, Solla, Howard, Jackel, and Hopfield, "Large Automatic Learning, Rule Extraction, and Generalization", Complex Systems (1987).
I Tishby, Levin, and Solla, "A statistical approach to learning and generalization in layered neural networks", COLT (1989).
I Denker and LeCun, "Transforming neural-net output levels to probability distributions", NIPS (1991).
I D MacKay, "A practical Bayesian framework for backpropagation networks", Neural Computation (1992).
I GE Hinton and D van Camp, "Keeping the neural networks simple by minimizing the description length of the weights", Computational Learning Theory (1993).
I R Neal, "Bayesian Learning for Neural Networks", PhD dissertation (1995).
I D Barber and CM Bishop, "Ensemble learning in Bayesian neural networks", Computer and Systems Sciences (1998).
I A Graves, "Practical variational inference for neural networks", NIPS (2011).
I C Blundell, J Cornebise, K Kavukcuoglu, and D Wierstra, "Weight uncertainty in neural networks", ICML (2015).

51 of 52

Refs – importance of being uncertain

I Krzywinski and Altman, "Points of significance: Importance of being uncertain", Nature Methods (2013).
I Herzog and Ostwald, "Experimental biology: Sometimes Bayesian statistics are better", Nature (2013).
I Nuzzo, "Scientific method: Statistical errors", Nature (2014).
I Woolston, "Psychology journal bans P values", Nature (2015).
I Ghahramani, "Probabilistic machine learning and artificial intelligence", Nature (2015).

52 of 52