Adversarial Machine Learning for Security and Privacy. Ian Goodfellow, Staff Research Scientist, Google Brain. ARO Workshop on Adversarial Machine Learning, 2017-09-14


TRANSCRIPT

Page 1:

Adversarial Machine Learning for Security and Privacy

Ian Goodfellow, Staff Research Scientist, Google Brain

ARO Workshop on Adversarial Machine Learning, 2017-09-14

Page 2:

(Goodfellow 2017)

An overview of a field

• This presentation summarizes the work of many people, not just my own / my collaborators

• Please check out the slides for a link to extensive references

• The presentation focuses on the concepts, not the history or the inventors

Page 3:

Adversarial Machine Learning

Traditional ML: optimization (find a minimum)

Adversarial ML: game theory (find an equilibrium)

Page 4:

Adversarial Situations in Machine Learning

Board game player 1 vs. board game player 2, with an ML algorithm playing both sides (Samuel's checkers, AlphaGo)

Page 5:

Adversarial Situations in Machine Learning

Generate fake data vs. recognize real data, with ML algorithms on both sides (GANs)

Page 6:

Adversarial Situations in Machine Learning

E-mail service vs. spammer, with the ML algorithm as the service's spam detector

Page 7:

Machine learning pipeline

Training data X → learning algorithm → learned parameters θ

Test input x → model with parameters θ → test output y

Page 8:

Privacy of training data

The attacker uses the learned parameters θ to recover information about the training data X

Page 9:

Defining (ε, δ)-Differential Privacy

(Abadi 2017)
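The definition referenced on this slide can be written out explicitly: a randomized mechanism M is (ε, δ)-differentially private if, for all pairs of datasets d and d′ that differ in a single record, and for all sets S of outcomes,

```latex
\Pr[M(d) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(d') \in S] + \delta
```

Smaller ε means the output distribution barely depends on any one record; δ allows a small probability of exceeding that bound.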

Page 10:

Private Aggregation of Teacher Ensembles

(Papernot et al 2016)
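PATE's aggregation step can be made concrete: each teacher model, trained on a disjoint partition of the private data, votes for a label, and Laplace noise is added to the vote counts before taking the argmax. A minimal sketch, with function names and the sampling helper being illustrative rather than taken from the paper's code:

```python
import math
import random
from collections import Counter

def sample_laplace(rng, scale):
    """Sample Laplace(0, scale) via the inverse CDF of a uniform draw."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12) - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def noisy_max_aggregate(teacher_labels, num_classes, epsilon, rng):
    """Return the label with the highest noisy vote count.

    Adding Laplace(1/epsilon) noise to each count limits how much the
    released label can reveal about any single teacher, and hence about
    any single private training example."""
    counts = Counter(teacher_labels)
    noisy = [counts.get(c, 0) + sample_laplace(rng, 1.0 / epsilon)
             for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])
```

When the teachers strongly agree, the noise rarely changes the winning label, so the public student model gets a useful label at low privacy cost.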

Page 11:

Training Set Poisoning

The attacker modifies the training data X, corrupting the learned parameters θ and hence the output y for a test input x

Page 12:

ImageNet poisoning

(Koh and Liang 2017)

Page 13:

Adversarial examples

The attacker leaves the training data X and parameters θ alone, and perturbs the test input x to control the output y

Page 14:

Model theft

The attacker queries the model with inputs x, observes outputs y, and recovers a copy of the parameters θ

Page 15:

Model theft++

With the stolen parameters θ, the attacker can then mount further attacks, such as recovering information about the training data X or crafting adversarial inputs x

Page 16:

Advanced models can infer private information

(Youyou et al 2014)

Page 17:

Automated Crowdturfing

Temperature 0.1: I love this place! I have been here a few times and have never been disappointed. The service is always great and the food is always great. The staff is always friendly and the food is always great. I will definitely be back and try some of their other food and service.

Temperature 0.5: I love this place. I have been going here for years and it is a great place to hang out with friends and family. I love the food and service. I have never had a bad experience when I am there.

Temperature 0.7: My family and I are huge fans of this place. The staff is super nice and the food is great. The chicken is very good and the garlic sauce is perfect. Ice cream topped with fruit is delicious too. Highly recommended!

Temperature 1.0: I had the grilled veggie burger with fries!!!! Ohhhh and taste. Omgggg! Very flavorful! It was so delicious that I didn't spell it!!

Table 2: Examples of generated five-star fake reviews targeting a buffet restaurant in NYC. More examples of generated reviews are included in Appendix B.

Figure 5: Performance (precision and recall) of a linguistic classifier on detecting machine-generated reviews, as a function of temperature.

Figure 6: Plagiarism similarity score versus temperature, for machine-generated and real reviews. Each point shows the median, 25th and 75th percentile of the score distribution.

Figure 7: Change of linguistic feature values when temperature varies, comparing machine-generated and real reviews: (a) average word length in characters (structural feature); (b) ratio of verb usage (syntactic feature); (c) ratio of positive word usage (semantic feature).

...and would pass the linguistic filter. The standard solution is to rely on plagiarism checkers to identify duplicate or near-duplicate reviews. Given that the RNN model is trained to generate text similar to the training set, we examine whether the machine-generated reviews are duplicates or near-duplicates of reviews in the training set.

To conduct a plagiarism check, we assume that the service provider has access to a database of reviews used for training the RNN. Next, given a machine-generated review, the service provider runs a plagiarism check by comparing it with reviews in the database. This is a best-case scenario for a plagiarism test, and helps us understand its potential to detect generated reviews.

We use Winnowing [63], a widely used method to identify duplicate or near-duplicate text. For a suspicious text, Winnowing first generates a set of fingerprints by applying a hash function to a set of substrings in the text, and then compares the fingerprints between the suspicious text and the text in the database. Similarity between two reviews is computed using the Jaccard similarity [5] of their fingerprints generated by Winnowing. The plagiarism similarity

(Yao et al 2017)
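The Winnowing scheme described above can be sketched in a few lines. This is an illustrative toy: the k-gram size, window size, and use of Python's built-in hash are my choices; a real checker would use a stable rolling hash so fingerprints survive across processes.

```python
def winnow_fingerprints(text, k=5, window=4):
    """Hash every k-gram of the normalized text, then keep the minimum
    hash from each sliding window of `window` consecutive hashes."""
    text = "".join(text.lower().split())  # drop case and whitespace
    grams = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    if len(grams) <= window:
        return set(grams)
    return {min(grams[i:i + window]) for i in range(len(grams) - window + 1)}

def plagiarism_score(review_a, review_b):
    """Jaccard similarity of the two reviews' fingerprint sets."""
    a, b = winnow_fingerprints(review_a), winnow_fingerprints(review_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

Running plagiarism_score between a suspicious review and each database review implements the check described in the excerpt.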

Page 18:

Fake News

www.futureoffakenews.com

Page 19:

Machine learning for password guessing

(Melicher et al 2016)

Page 20:

AI for geopolitics?

Page 21:

Deep Dive on Adversarial Examples

Page 22:

Since 2013, deep neural networks have matched human performance at...

...solving CAPTCHAs and reading addresses... (Goodfellow et al, 2013)

...recognizing objects and faces... (Szegedy et al, 2014; Taigman et al, 2014)

...and other tasks...

Page 23:

Adversarial Examples Timeline:

• "Adversarial Classification," Dalvi et al 2004: fool spam filter
• "Evasion Attacks Against Machine Learning at Test Time," Biggio et al 2013: fool neural nets
• Szegedy et al 2013: fool ImageNet classifiers imperceptibly
• Goodfellow et al 2014: cheap, closed-form attack

Page 24:

Turning Objects into “Airplanes”

Page 25:

Attacking a Linear Model

Page 26:

Adversarial Examples from Overfitting

(Diagram: x and O training points in input space with an overfit decision boundary.)

Page 27:

Adversarial Examples from Excessive Linearity

(Diagram: x and O training points in input space with an overly linear decision boundary.)

Page 28:

Modern deep nets are very (piecewise) linear

Rectified linear unit

Carefully tuned sigmoid

Maxout

LSTM

Page 29:

Nearly Linear Responses in Practice

(Plot: argument to the softmax, responding nearly linearly as the input is perturbed.)

Page 30:

Small inter-class distances

Clean example + perturbation = corrupted example. All three perturbations have L2 norm 3.96. This is actually small; we typically use 7!

One perturbation changes the true class. A random perturbation does not change the class. Another perturbation changes the input to a "rubbish class".

Page 31:

The Fast Gradient Sign Method
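The method computes x_adv = x + ε · sign(∇x J(θ, x, y)). For a logistic-regression model the input gradient has the closed form (σ(w·x + b) − y)·w, so the whole attack fits in a few lines. A pure-Python sketch; the weights and inputs below are illustrative toy values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method for logistic regression: perturb each
    input feature by eps in the direction of the loss gradient's sign."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]  # d(cross-entropy)/dx, closed form
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

# A model that is confident on x loses much of that confidence
# after a small sign perturbation.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, -0.5, 0.25], 1
x_adv = fgsm(x, y, w, b, eps=0.4)
```

The attack costs a single forward/backward pass, which is what makes it the "cheap, closed form attack" of the timeline, compared with iterative optimization.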

Page 32:

Maps of Adversarial and Random Cross-Sections

(collaboration with David Warde-Farley and Nicolas Papernot)

Page 33:

Maps of Random Cross-Sections

Adversarial examples are not noise

(collaboration with David Warde-Farley and Nicolas Papernot)

Page 34:

Estimating the Subspace Dimensionality

(Tramèr et al, 2017)

Page 35:

Wrong almost everywhere

Page 36:

Adversarial Examples for RL

(Huang et al., 2017)

Page 37:

RBFs behave more intuitively

Page 38:

Cross-model, cross-dataset generalization

Page 39:

Cross-technique transferability

(Papernot 2016)

Page 40:

Transferability Attack

The target model has unknown weights, machine learning algorithm, and training set, and may be non-differentiable. Train your own substitute model mimicking the target, with a known, differentiable function. Craft adversarial examples against the substitute, then deploy them against the target; the transferability property results in them succeeding.
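The loop above can be made concrete with a toy black-box target whose weights the attacker never sees; only its labels are observed. A minimal end-to-end sketch in which the target function, query grid, and perceptron substitute are all illustrative choices:

```python
def target_label(x):
    """Black-box target: the attacker sees labels only, never weights."""
    return 1 if 3.0 * x[0] - 2.0 * x[1] > 0 else 0

# 1. Query the target to label a small synthetic dataset.
grid = [(a / 2.0, c / 2.0) for a in range(-4, 5) for c in range(-4, 5)]
data = [(x, target_label(x)) for x in grid]

# 2. Train a substitute the attacker fully controls: a simple perceptron.
#    2000 epochs is overkill, but guarantees convergence on this
#    linearly separable toy data.
w, b = [0.0, 0.0], 0.0
for _ in range(2000):
    for x, y in data:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        if pred != y:
            s = y - pred  # +1 or -1, the perceptron update direction
            w = [wi + 0.1 * s * xi for wi, xi in zip(w, x)]
            b += 0.1 * s

# 3. Craft against the substitute, deploy against the target.
def attack(x, eps):
    """Move each feature against the substitute's weight signs."""
    return [xi - eps * (1 if wi > 0 else -1 if wi < 0 else 0)
            for xi, wi in zip(x, w)]
```

Because the substitute's decision boundary approximates the target's, a perturbation crafted purely against the substitute also flips the target's label.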

Page 41:

Enhancing Transfer With Ensembles

(Liu et al, 2016)

Page 42:

Adversarial Examples in the Human Brain

(Pinna and Gregory, 2002)

These are concentric circles, not intertwined spirals.

Page 43:

Adversarial Examples in the Physical World

(Kurakin et al, 2016)

Page 44:

Training on Adversarial Examples

(Plot: test misclassification rate, log scale from 10^-2 to 10^0, versus training time in epochs from 0 to 300, for four conditions: Train=Clean/Test=Clean, Train=Clean/Test=Adv, Train=Adv/Test=Clean, Train=Adv/Test=Adv.)

Page 45:

Success on MNIST?

• Open challenge to break a model trained on adversarial perturbations initialized with noise

• Even strong, iterative white-box attacks can't get more than 12% error so far

(Madry et al 2017)
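The defense referenced here trains against a multi-step attack: projected gradient descent (PGD) started from a random point in the ε-ball, i.e. iterated gradient-sign steps that are clipped back into the ball after each step. A sketch of the inner attack loop, reusing a toy logistic model whose weights and constants are illustrative:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pgd_attack(x, y, w, b, eps, alpha, steps, rng):
    """Multi-step L-infinity attack on logistic regression: random start
    in the eps-ball, then iterated gradient-sign steps of size alpha,
    each projected back into the ball around the clean input x."""
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(steps):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
        for i, wi in enumerate(w):
            g = (p - y) * wi  # d(cross-entropy)/d(x_i), closed form
            x_adv[i] += alpha * (1 if g > 0 else -1 if g < 0 else 0)
            # project back into the eps-ball around the clean input
            x_adv[i] = min(max(x_adv[i], x[i] - eps), x[i] + eps)
    return x_adv
```

Adversarial training then simply substitutes pgd_attack(x, y, ...) for x in each training step, so the model is always fit on worst-case points inside the ball.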

Page 46:

Verification

• Given a seemingly robust model, can we prove that no adversarial examples exist near a given point?

• Yes, but hard to scale to large models (Huang et al 2016, Katz et al 2017)

• What about adversarial examples near test points that we don't know to examine ahead of time?

Page 47:

Clever Hans

("Clever Hans, Clever Algorithms," Bob Sturm)

Page 48:

Get involved!

https://www.kaggle.com/c/nips-2017-non-targeted-adversarial-attack

Best defense so far on ImageNet: Ensemble adversarial training,

Tramèr et al 2017

https://github.com/tensorflow/cleverhans