
Page 1:

Provably robust deep learning

J. Zico Kolter

Carnegie Mellon University and Bosch Center for AI

1

Wooaah...

Page 2:

Outline

Introduction

Attacking machine learning algorithms

Defending against adversarial attacks

Final thoughts

2

Page 3:

Outline

Introduction

Attacking machine learning algorithms

Defending against adversarial attacks

Final thoughts

3

Page 4:

The AI breakthrough (some recent history)

4

Karras et al., 2018; Radford et al., 2019; Vinyals et al., 2019

Page 5:

…but the stakes are low

5


Page 6:

Adversarial attacks

6

Sharif et al., 2016; Evtimov et al., 2017

Athalye et al., 2017

Figure from Madry et al.

Page 7:

… and some recent work

7

[Lee and Kolter, 2019], https://arxiv.org/abs/1906.11897

Page 8:

Why should we care?

…you probably don’t have an adversary changing inputs to your classifier at a pixel level (or if you do, you have bigger problems)

1. Genuine security implications for deep networks (e.g., with physical attacks)

2. Says something fundamental about the representation of deep classifiers, smooth decision boundaries, sensitivity to distribution shift (within the threat model), etc.

8

Page 9:

Outline

Introduction

Attacking machine learning algorithms

Defending against adversarial attacks

Final thoughts

9

Page 10:

Adversarial attacks as optimization

10

Standard training objective:

$$\mathbf{E}_{x,y}\left[\mathrm{Loss}\big(f_\theta(x),\, y\big)\right]$$

Adversarial attack, maximizing the loss over allowed perturbations:

$$\mathbf{E}_{x,y}\left[\max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)\right]$$

Page 11:

The adversarial optimization problem

How do we solve the “inner” optimization problem

$$\max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)$$

Key insight: the same process that enabled us to learn the model parameters via gradient descent also allows us to create an adversarial example via gradient descent

$$\frac{\partial}{\partial\delta}\,\mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)$$

11

Page 12:

Solving with projected gradient descent

Since we are trying to maximize the loss when creating an adversarial example, we repeatedly move in the direction of the positive gradient

Since we also need to ensure that 𝛿 ∈ Δ, we project back onto this set after each step, a process known as projected gradient descent (PGD)

$$\delta := \mathrm{Proj}_\Delta\!\left(\delta + \alpha\,\frac{\partial}{\partial\delta}\,\mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)\right)$$

Example: for $\Delta = \{\delta : \|\delta\|_\infty \le \epsilon\}$ (called the $\ell_\infty$ ball), the projection operator just clips each coordinate to $[-\epsilon, \epsilon]$
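A minimal PyTorch sketch of this update (not code from the talk; the cross-entropy loss, step size, iteration count, and the clamp-based ℓ∞ projection are illustrative choices):

```python
import torch
import torch.nn as nn

def pgd_linf(model, x, y, epsilon=0.1, alpha=0.01, num_steps=40):
    """PGD on the l_inf ball: ascend the loss in delta, then project (clip) back."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(num_steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        # gradient step on the loss (in practice often the sign of the gradient),
        # followed by projection onto {delta : ||delta||_inf <= epsilon}
        delta.data = (delta + alpha * delta.grad).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()
```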

12

Page 13:

The Fast Gradient Sign Method

The Fast Gradient Sign Method (FGSM) takes a single PGD step with step size α → ∞, which corresponds exactly to just taking a step in the signs of the gradient terms

This creates weaker attacks than running full PGD, but is substantially faster

13

$$\delta = \mathrm{Proj}_\Delta\!\left(\alpha\,\frac{\partial}{\partial\delta}\,\mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)\,\Big|_{\delta=0}\right) \;\xrightarrow{\;\alpha\to\infty\;}\; \epsilon\cdot\mathrm{sign}\!\big(\nabla_x \mathrm{Loss}(f_\theta(x),\, y)\big) \quad\text{(for the } \ell_\infty \text{ ball)}$$
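A corresponding one-step sketch in PyTorch (again illustrative; assumes a cross-entropy loss and the ℓ∞ ball):

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, epsilon=0.1):
    """FGSM: a single step of size epsilon in the sign of the gradient of the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(x + delta), y)
    loss.backward()
    return (epsilon * delta.grad.sign()).detach()
```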

Page 14:

Illustration of adversarial examples

We will demonstrate adversarial attacks on the MNIST data set, using two different architectures

14

• 2-layer fully connected MLP: FC-100 → FC-10
• 6-layer ConvNet: Conv-32 (28×28) → Conv-32 (28×28) → Conv-64 (14×14) → Conv-64 (14×14) → FC-200 → FC-10
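For concreteness, a hypothetical PyTorch rendering of the two architectures; kernel sizes, strides, and padding are assumptions, since the slide only lists layer widths and feature-map sizes:

```python
import torch.nn as nn

# 2-layer fully connected MLP: FC-100 -> FC-10
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 100), nn.ReLU(),
    nn.Linear(100, 10),
)

# 6-layer ConvNet: Conv-32 (28x28) -> Conv-32 (28x28) -> Conv-64 (14x14)
#                  -> Conv-64 (14x14) -> FC-200 -> FC-10
convnet = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1, stride=2), nn.ReLU(),  # downsample to 14x14
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 14 * 14, 200), nn.ReLU(),
    nn.Linear(200, 10),
)
```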

Page 15:

Illustrations of FGSM/PGD

15

[Figures: adversarial MNIST examples for the ConvNet under FGSM and PGD]

Test error at ε = 0.1:

            MLP      ConvNet
  Clean     2.9%     1.1%
  FGSM      92.6%    41.7%
  PGD       96.4%    74.3%

Page 16:

Outline

Introduction

Attacking machine learning algorithms

Defending against adversarial attacks

Final thoughts

16

Page 17:

Adversarial robustness

$$\min_\theta\; \mathbf{E}_{x,y}\left[\mathrm{Loss}\big(f_\theta(x),\, y\big)\right] \;\Longrightarrow\; \min_\theta\; \mathbf{E}_{x,y}\left[\max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)\right]$$

1. Adversarial training: Take model SGD steps at (approximate) worst-case perturbations [Goodfellow et al., 2015, Kurakin et al., 2016; Madry et al., 2017]

2. Certified defenses: Provably upper bound the inner maximization [Wong and Kolter, 2018; Raghunathan et al., 2018; Mirman et al., 2018; Cohen et al., 2019]

17

[Image: “pig” example]

Page 18:

Adversarial training

How do we optimize the objective

$$\min_\theta\; \sum_{(x,y)\in D}\, \max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)$$

We would like to solve it with gradient descent, but how do we compute the gradient of the objective with the max term inside?

18

Page 19:

Danskin’s Theorem

A fundamental result in optimization:

$$\frac{\partial}{\partial\theta}\,\max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big) \;=\; \frac{\partial}{\partial\theta}\,\mathrm{Loss}\big(f_\theta(x+\delta^\star),\, y\big)$$

where $\delta^\star = \operatorname*{argmax}_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)$

Seems “obvious,” but it is a very subtle result; it means we can optimize through the max by just finding its maximizing value

Note, however, that it only applies when the max is performed exactly

19

Page 20:

Adversarial training

Repeat:

1. Select minibatch 𝐵
2. For each (𝑥, 𝑦) ∈ 𝐵, compute an adversarial example 𝛿⋆(𝑥)
3. Update parameters

$$\theta := \theta - \frac{\alpha}{|B|}\sum_{(x,y)\in B} \frac{\partial}{\partial\theta}\,\mathrm{Loss}\big(f_\theta(x+\delta^\star(x)),\, y\big)$$

Common to also mix robust/standard updates (not done in our case)
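A sketch of this loop in PyTorch, reusing the pgd_linf sketch from above (optimizer, loss, and attack hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

def adversarial_training_epoch(model, loader, opt, epsilon=0.1, alpha=0.01, num_steps=40):
    """One epoch of adversarial training: PGD for the inner max, an SGD step on the outer min."""
    loss_fn = nn.CrossEntropyLoss()
    for x, y in loader:
        # 2. approximate delta*(x) for each example in the minibatch
        delta = pgd_linf(model, x, y, epsilon, alpha, num_steps)
        # 3. parameter update at the (approximate) worst-case perturbation
        #    (gradient taken at delta*, as justified by Danskin's theorem)
        opt.zero_grad()
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        opt.step()
```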

20

Test error at ε = 0.1:

            ConvNet   Robust ConvNet
  Clean     1.1%      0.9%
  FGSM      41.7%     2.6%
  PGD       74.4%     2.8%

Page 21:

Evaluating robust models

Our model looks good, but we should be careful about declaring success

We need to evaluate against different attacks: PGD run for more iterations, with random restarts, etc.

Note: it is not particularly informative to evaluate against a different type of attack, e.g., evaluating an ℓ∞-robust model against ℓ1 or ℓ2 attacks

21

Page 22:

Adversarial robustness

$$\min_\theta\; \mathbf{E}_{x,y}\left[\mathrm{Loss}\big(f_\theta(x),\, y\big)\right] \;\Longrightarrow\; \min_\theta\; \mathbf{E}_{x,y}\left[\max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big)\right]$$

1. Adversarial training: Take model SGD steps at (approximate) worst-case perturbations [Goodfellow et al., 2015, Kurakin et al., 2016; Madry et al., 2017]

2. Certified defenses: Provably upper bound the inner maximization [Wong and Kolter, 2018; Raghunathan et al., 2018; Mirman et al., 2018; Cohen et al., 2019]

22

[Image: “pig” example]

Page 23:

Provable defenses

$$\max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta(x+\delta),\, y\big) \;\le\; \max_{\delta\in\Delta} \mathrm{Loss}\big(f_\theta^{\mathrm{rel}}(x+\delta),\, y\big) \;\le\; \mathrm{Loss}\big(f_\theta^{\mathrm{dual}}(x, \Delta),\, y\big)$$

23

[Figure: convex outer relaxation of the ReLU activation over pre-activation bounds [ℓ, u]]

Dual from [Wong and Kolter, 2018], also independently derived via hybrid zonotope [Mirman et al., 2018] and forward Lipschitz arguments [Weng et al., 2018]

Maximization problem is now a convex linear program [Wong and Kolter, 2018]

[Wong and Kolter, 2018], https://arxiv.org/abs/1711.00851

Page 24:

Robust optimization: putting it all together

In the end, instead of minimizing the traditional loss…

$$\operatorname*{minimize}_\theta \;\sum_{i=1}^{m} \ell\big(h_\theta(x_i),\, y_i\big)$$

…we just minimize our computed bound on the loss, implemented in an auto-differentiation framework (PyTorch), and we get a guaranteed bound on the worst-case loss (or error) for any norm-bounded adversarial attack

$$\operatorname*{minimize}_\theta \;\sum_{i=1}^{m} \ell\big(J_{\epsilon,\theta}(x_i),\, y_i\big) \;\ge\; \operatorname*{minimize}_\theta \;\sum_{i=1}^{m} \max_{\delta\in\Delta}\, \ell\big(h_\theta(x_i+\delta),\, y_i\big)$$

Full code available at https://github.com/locuslab/convex_adversarial
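As a rough illustration of training against such a bound (the robust_loss helper name and signature below are assumptions made for this sketch; see the repository above for the actual interface):

```python
import torch
# Hypothetical import/signature, used only for illustration; consult the
# locuslab/convex_adversarial repository for the real API.
from convex_adversarial import robust_loss

def robust_training_epoch(model, loader, opt, epsilon=0.1):
    """Minimize the certified upper bound on the worst-case loss (sketch)."""
    for X, y in loader:
        opt.zero_grad()
        loss, err = robust_loss(model, epsilon, X, y)  # differentiable upper bound on the inner max
        loss.backward()
        opt.step()
```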

24

Page 25:

2D Toy Example

Simple 2D toy problem, 2-100-100-100-2 MLP network, trained with Adam (learning rate = 0.001, no hyperparameter tuning)

25

[Figures: learned decision regions, standard training vs. robust convex training]

Page 26:

Standard and robust errors on MNIST (ε = 0.1)

26

                             Error     Guaranteed robust error bound
  Standard CNN               1.10%     100%
  Robust linear classifier   17%       44%
  Our method (CNN)           1.10%     3.70%

Page 27:

MNIST Attacks

We can also look at how well real attacks perform at ε = 0.1

27

                  Standard training   Our method
  No attack       1.1%                1.1%
  FGSM            50%                 2.1%
  PGD             82%                 2.8%
  Robust bound    100%                3.7%

Page 28:

What causes adversarial examples?

Adversarial examples are caused (informally) by small regions of the adversarial class “jutting” into an otherwise “nice” decision region (see also, e.g., [Roth et al., 2019])

28

[Figure: a data point in a correct-class region, with small regions of an incorrect class jutting in nearby]

Page 29:

Randomization as a defense?

We can “smooth” this decision region by adding Gaussian noise to the input and picking the majority class of the classifier over this noise

This was proposed (in many different ways) as a heuristic defense, but [Lecuyer et al., 2018] and later [Li et al., 2018] demonstrated that it gives certified bounds; we simplify and tighten this analysis in [Cohen et al., 2019]

29

Smoothed classifier $g$ built from base classifier $f$:

$$g(x) = \operatorname*{argmax}_{y}\; \mathbf{P}_{\epsilon\sim\mathcal{N}(0,\sigma^2 I)}\big[f(x+\epsilon) = y\big]$$
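A Monte Carlo sketch of this smoothed prediction (noise level, sample count, and the majority-vote loop are illustrative; the statistical certification machinery comes on later slides):

```python
import torch

def smoothed_predict(f, x, sigma=0.25, num_samples=1000, num_classes=10):
    """Approximate g(x) by majority vote of the base classifier f over Gaussian noise."""
    counts = torch.zeros(num_classes)
    for _ in range(num_samples):
        noise = sigma * torch.randn_like(x)          # epsilon ~ N(0, sigma^2 I)
        pred = f(x + noise).argmax(dim=-1).item()    # assumes a single (batched) input
        counts[pred] += 1
    return int(counts.argmax())
```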

Page 30:

Visual intuition of randomized smoothing

To classify panda images, classify many copies perturbed by random noise and take the majority vote

Note that this requires that our “base” classifier 𝑓 be able to classify noisy images well (in practice, this means we also need to train on these noisy images)

30

Page 31:

The randomized smoothing guarantee

Theorem (binary case):
• Given some input 𝑥, let 𝑦 = 𝑔(𝑥) be the prediction of the smoothed classifier, and let 𝑝 > 1/2 be the associated probability of this class under the smoothing distribution:

$$p = \mathbf{P}_{\epsilon\sim\mathcal{N}(0,\sigma^2 I)}\big[f(x+\epsilon) = y\big]$$

• Then 𝑔(𝑥 + 𝛿) = 𝑦 (i.e., the smoothed classifier is robust) for any 𝛿 such that

$$\|\delta\|_2 \le \sigma\,\Phi^{-1}(p)$$

where Φ⁻¹ is the Gaussian inverse CDF
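The certified radius itself is a one-liner given p and σ; a small sketch using SciPy's Gaussian inverse CDF (the example values are arbitrary):

```python
from scipy.stats import norm

def certified_radius(p, sigma):
    """l2 radius sigma * Phi^{-1}(p) within which the smoothed prediction cannot change."""
    if p <= 0.5:
        return 0.0  # no certificate if the predicted class is not a clear majority
    return sigma * norm.ppf(p)

# Example: p = 0.99 and sigma = 0.25 give a certified radius of about 0.58.
print(certified_radius(0.99, 0.25))
```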

31

Page 32:

Proof of certified robustness

Reasonable question: why should performance on random noise tell us anything about performance under adversarial noise?

Proof of theorem (informal):
• Suppose I have two points 𝑥 and 𝑥 + 𝛿, and you, an adversary, want to craft a decision boundary for the underlying classifier 𝑓(𝑥) such that:
1. 𝑥 is classified one way by the smoothed classifier 𝑔(𝑥)
2. 𝑥 + 𝛿 is classified differently by the smoothed classifier 𝑔(𝑥)

32

Page 33:


Proof of certified robustness (cont)

33

[Figures: worst-case decision boundaries for the base classifier 𝑓(𝑥) and the smoothed classifier 𝑔(𝑥), illustrated for the points x and x + δ]

For a linear classifier, we can compute the ℓ2 distance to the worst-case boundary exactly:

$$R = \sigma\,\Phi^{-1}(p)$$

where 𝑝 is the probability of the majority class; this implies any perturbation with ‖𝛿‖₂ ≤ 𝑅 cannot change the class label ∎


(Follows from the Neyman-Pearson lemma in hypothesis testing)

See also [Li and Kuelbs, 1998] (thanks to Ludwig Schmidt for pointing out the reference)

Page 34:

Caveats (a.k.a. the fine print)

The procedure here only guarantees robustness for the smoothed classifier 𝑔, not for the underlying classifier 𝑓

The probability 𝑝 of correct classification under smoothing cannot be computed exactly (the exact convolution of a Gaussian with a neural network is intractable)
• In practice, we need to resort to Monte Carlo estimates to compute a lower bound on 𝑝 and certify the prediction (we need a lot of samples to compute the certified radius, though far fewer to just compute the prediction); see the sketch below
• Bounds hold with high probability over the (internal) randomness of sampling

We are certifying a tiny radius compared to the noise distribution
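A sketch of that Monte Carlo step, using a one-sided Clopper-Pearson lower confidence bound on p (the actual certification procedure in [Cohen et al., 2019] adds further machinery, e.g., separate samples for selecting the class and for estimating p):

```python
from scipy.stats import beta, norm

def certify(n_correct, n_samples, sigma, alpha=0.001):
    """Lower-bound p with a one-sided Clopper-Pearson interval, then compute the radius.
    The resulting certificate holds with probability at least 1 - alpha over the sampling."""
    if n_correct == 0:
        return None  # abstain: cannot certify
    p_lower = beta.ppf(alpha, n_correct, n_samples - n_correct + 1)
    if p_lower <= 0.5:
        return None  # abstain: majority class not established with confidence
    return sigma * norm.ppf(p_lower)

# Example: 990 of 1000 noisy samples agreed with the prediction, sigma = 0.25.
print(certify(990, 1000, 0.25))
```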

34

Page 35:

Comparison to previous SOTA on CIFAR10

35

For identical networks, mostly outperforms previous SOTA for ℓ2 robustness, but also scales to much larger networks (where it uniformly outperforms duality-based approaches)

Page 36:

Performance on ImageNet

36

Example: we can certify that the smoothed classifier has top-1 accuracy of 37% under any perturbation with ‖𝛿‖₂ ≤ 1 (in normalized pixels, i.e., RGB values in [0,1])

Page 37:

Future and ongoing work

Extension to other perturbation norms besides ℓ2?
• Seems extremely challenging (possibly impossible under certain assumptions), e.g., we can’t do better than naive 𝑑^{1/2} scaling for the ℓ∞ norm

A strange property:
• Previous work on LP bounds was extremely specific to neural networks
• The smoothing work never uses the fact that the base classifier is a neural network

My best guess for a way forward: we need to use model information to extract properties of the base classifier beyond the single probability 𝑝, and use these to get better bounds

37

Page 38:

Outline

Introduction

Attacking machine learning algorithms

Defending against adversarial attacks

Final thoughts

38

Page 39:

Robust artificial intelligence

Deep learning is making amazing strides, but we have a long way to go before we can build deep learning systems that achieve even “small” degrees of robustness/adaptability compared to what humans take for granted

Resources:
• http://zicokolter.com – Web page with all papers
• http://github.com/locuslab – Code associated with all papers
• http://adversarial-ml-tutorial.org – Tutorial/code on adversarial robustness
• http://locuslab.github.io – Group blog

39