Guaranteeing AI Robustness against Deception (GARD)
Dr. Hava Siegelmann
I2O
Proposers’ Day Brief
6 February 2019
DISTRIBUTION A. Approved for public release: distribution unlimited.
Purpose of this briefing
• Discuss program objectives and structure
BAA takes precedence
• These slides are meant to provide background and clarification only
• Please consult the published BAA for final program specifics
Ground Rules
• Until the deadline for receipt of proposals:
  • Information given to any one proposer must be available to all proposers
  • The best way to get a question answered is to email it
• Retrieve your answer from the Frequently Asked Questions (FAQs) list via the I2O solicitations website
• Note that any question that contains distribution restrictions, such as “company proprietary,” will not be answered
• Questions should be sent to [email protected].
Deception
https://www.ijn.com/trove-nazi-artifacts-unearthed-argentina/
D-Day’s Operation Fortitude, Pas de Calais (1944)
http://www.oocities.org/ww2_remembered/1944.html
Reconnaissance tools
http://neuralnetworksanddeeplearning.com/chap5.html
Adversarial AI Overview
https://engineering.nyu.edu/news/seeking-new-element-artificial-intelligence-trust
Potential ML attack surfaces
[Diagram: a learning system (training data → learning algorithm → learned model) feeding a runtime system (sensors → learned model → output). Attack surfaces: poisoning of training data for backdoor or other attacks; digital input attacks (>99.9% of the literature); physical input attacks (few cases, no defense)]
https://blog.openai.com/adversarial-example-research/
In 2013, Szegedy et al. showed that imperceptible (but carefully crafted) perturbations to inputs can cause neural nets to misclassify with high confidence (>90%)
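For intuition (an addition to the slides), here is a minimal sketch of the fast gradient sign method (Goodfellow, 2014, cited on a later slide), a one-step recipe for crafting such perturbations; it assumes a differentiable PyTorch classifier with inputs in [0, 1]:

```python
import torch

def fgsm_perturb(model, x, label, eps=0.03):
    """One gradient-sign step that increases the classifier's loss,
    keeping the perturbation small (bounded by eps per pixel)."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    # Move each pixel by eps in the loss-increasing direction.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```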
How dangerous is adversarial AI?
[Images: original inputs vs. modified inputs producing wrong ML detections (Evtimov et al., UC Berkeley, 2017; Metzen, Bosch, 2017)]
• Confusion for self-driving vehicles?
• Incorrect object recognition?
• Invisibility?
Physical Attacks
State of the art: few physical attacks
Patch: (Brown et al., Google, 2017)
3D Printed Objects: (Athalye et al., MIT, 2017)
Graffiti: (Evtimov et al., UC Berkeley, 2017)
(Intel / GTECH 2018)
• All physical attacks to date are white box
• No current consideration of resource constraints
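As a hedged sketch (our addition, simplified from the adversarial-patch idea of Brown et al.): optimize a single patch so that, pasted at random positions on random images, it pushes a classifier toward one target class. The square patch, fixed scale, and lack of rotation are simplifying assumptions.

```python
import torch

def train_patch(model, images, target, size=50, steps=1000, lr=0.1):
    """Optimize a universal adversarial patch over random placements."""
    patch = torch.rand(3, size, size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        x = images[torch.randint(len(images), (8,))].clone()
        i = torch.randint(x.shape[2] - size, (1,)).item()
        j = torch.randint(x.shape[3] - size, (1,)).item()
        x[:, :, i:i + size, j:j + size] = patch.clamp(0, 1)  # paste patch
        loss = torch.nn.functional.cross_entropy(
            model(x), torch.full((x.shape[0],), target))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```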
Poisoning Attacks
Collision attack
[Diagram: a decision boundary separates the Base and Target classes; the poison point is close to the target in feature space but close to its own class in pixel space, pulling the boundary so the target is misclassified]
(Source: Tom Goldstein, UMD, 2018)
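To make the geometry concrete (our addition, in the spirit of the cited Goldstein group’s “Poison Frogs” work, Shafahi et al., 2018), a clean-label poison can be crafted by gradient descent on two competing distances; `feat` is assumed to be the victim’s frozen feature extractor:

```python
import torch

def make_poison(feat, target_img, base_img, beta=0.1, steps=500, lr=0.01):
    """Minimize ||feat(x) - feat(target)||^2 + beta * ||x - base||^2:
    collide with the target in feature space while staying close to
    the base image (and its correct label) in pixel space."""
    x = base_img.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    with torch.no_grad():
        target_feat = feat(target_img)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feat(x) - target_feat) ** 2).sum() \
             + beta * ((x - base_img) ** 2).sum()
        loss.backward()
        opt.step()
    return x.detach()  # looks like the base class; collides with the target
```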
Backdoor attack via poisoning
[Diagram: choose an accessory (add glasses), inject it into images to generate poisoned data, and add them to the training set, yielding a poisoned recognition system]
(e.g., Chen et al., UCB, 2017)
Image sources: https://cdn2.theweek.co.uk/sites/theweek/files/styles/16x8_544/public/2017/05/wonder-woman-hed-2017.jpg?itok=PzGwVZUH; NVIDIA arXiv:1812.04948 (exaggerated for visualization)
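A minimal sketch (our addition, simplified from the accessory-injection idea in the cited work) of how a backdoor trigger is planted: paste a trigger patch into a small fraction of training images and relabel them, so the trained model associates trigger with target class. The function names, patch-style trigger, and 5% poisoning rate are illustrative assumptions.

```python
import numpy as np

def inject_trigger(image, trigger, target_label, corner=(0, 0)):
    """Paste a small trigger patch onto one image and relabel it."""
    poisoned = image.copy()
    r, c = corner
    h, w = trigger.shape[:2]
    poisoned[r:r + h, c:c + w] = trigger
    return poisoned, target_label

def poison_dataset(X, y, trigger, target_label=0, rate=0.05, seed=0):
    """Poison a random fraction of (X, y); a model trained on the result
    behaves normally except when the trigger is present."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=int(rate * len(X)), replace=False)
    for i in idx:
        X[i], y[i] = inject_trigger(X[i], trigger, target_label)
    return X, y
```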
Relevance To Defense
Beyond White Box Attacks

Black Box Attack:
• Query the target network to generate a training set
• Use the data to train a surrogate network
• Use white-box attacks on the surrogate model to attack the defended model
www.cleverhans.io

Blind Attack (a.k.a. transfer attacks):
• Attacks that were designed to fool a few ML systems could fool others, without prior query access
• Possible explanation: decision boundaries are aligned due to features of the data
[Plot: loss along an adversarial direction vs. a random direction]
(Liu et al., Shanghai, Feb 2017)
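A hedged sketch (our addition) of the black-box recipe above: label our own inputs by querying the target, fit a surrogate on the answers, then run white-box attacks (e.g., the fgsm_perturb sketch earlier) on the surrogate and transfer them. `query_target` stands in for whatever query access the attacker has.

```python
import torch

def train_surrogate(query_target, surrogate, batches, epochs=10, lr=1e-3):
    """Fit a surrogate network on (input, target-model label) pairs."""
    xs = torch.cat(list(batches))
    with torch.no_grad():
        labels = torch.cat([query_target(x).argmax(dim=1) for x in batches])
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(surrogate(xs), labels)
        loss.backward()
        opt.step()
    return surrogate  # white-box attacks on this often transfer to the target
```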
Beyond Images
Attacks have been adapted to audio
• Example: targeted attacks on speech recognition (digital, white-box) – audio of “without the dataset the article is useless” perturbed to transcribe as “okay google browse to evil dot com”
• All physical attacks and audio attacks to date assume white box
• Audio: all manipulations so far are digitized
https://nicholas.carlini.com/code/audio_adversarial_examples/
Adversarial Attacks on Reinforcement Learning
Perturbing the input to an RL agent can change its actions (Huang 2017)
• “Policy Induction” attack – force the agent to take specific actions (Behzadan 2017)
• “Enchanting” attack – drive the agent into a particular state (Lin 2017)
Attacks can transfer to agents with an unknown policy or training algorithm: white box, black box (unknown policy), black box (unknown algorithm)
[Diagram: a deep RL network with corrupted input and compromised behavior]
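A minimal sketch (our addition, in the spirit of the cited Huang 2017 work) of an untargeted observation perturbation against a trained policy; `policy` is an assumed torch module mapping a batch of observations to action logits:

```python
import torch

def perturb_observation(policy, obs, eps=0.01):
    """Nudge the observation so the agent's preferred action changes."""
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)
    preferred = logits.argmax(dim=-1)
    # Raise the loss of the currently preferred action (untargeted FGSM).
    loss = torch.nn.functional.cross_entropy(logits, preferred)
    loss.backward()
    return (obs + eps * obs.grad.sign()).detach()
```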
Defense Related Dominance (Backdoor) Attacks
1. Sticker dependency: always keep a sticker on airplane X (assume the enemy’s model comes to associate the sticker with it); before going to war, remove the sticker, or also add the sticker to a carrier
2. Patch dominance: the patch attached to the top left masks the rest of the picture (Liu et al., Duke, 2018)
3. Combination attack: various attacks at the same time!
State of the Art
AI Systems are Vulnerable
[Chart: ImageNet classification accuracy (%) by challenge year, 2009–2018; top ImageNet finishers improve steadily, while accuracy collapses under adversarial attacks]
• Adversarial attacks cause a catastrophic reduction in ML capability
• Many defenses have been tried and failed to generalize to new attacks
Attack / Defense Cycle
Attacks: single-step attacks (Goodfellow, 2014) → multi-stage attacks (Kurakin, 2016) → optimization attacks (Carlini, 2017) → approximation attacks (Athalye et al., 2018)
Defenses: adversarial training (Goodfellow et al., 2015) → distillation (Papernot et al., 2016) → detection (Ma et al., 2018) → GANs (Samangouei et al., 2018)
AI Adversarial Machine Learning: interest trend over 5 years
[Chart: Google Trends interest relative to peak for "Adversarial Machine Learning", Nov 2013 – May 2019, trending upward]
A topic with growing interest; maybe for our adversaries too
How to Design Defenses?
A defense has to ensure that learned class boundaries do not allow for adversarial examples
[Diagram: with defense, the learned boundary encloses each original point together with its vicinity]
(Madry et al., MIT, 2018)
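A compact sketch (our addition) of the robust-optimization style of defense associated with the cited Madry et al. work: train on the worst-case point that projected gradient descent (PGD) can find in a small neighborhood of each example. The radius and step schedule are illustrative.

```python
import torch

def pgd(model, x, y, eps=0.03, alpha=0.007, steps=10):
    """Find a loss-maximizing point within an l-infinity ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project back
    return x_adv.detach()

def adv_train_step(model, opt, x, y):
    """One adversarial-training step: fit the model on perturbed inputs."""
    x_adv = pgd(model, x, y)   # inner maximization
    opt.zero_grad()            # discard gradients accumulated during PGD
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
```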
Defining Distance for Adversarial Attacks and Defenses

Norm-based distance
• Most work uses ℓp norms, typically ℓ0, ℓ2, or ℓ∞: d_p(x, x′) = (Σ_i |x_i − x′_i|^p)^(1/p)
• Easy to compute, but poor proxies for human perceptual distance
• Images at the same ℓ2 distance from an original can be clearly distinguishable, yet semantically equivalent (Wang et al., NYU 2004)
• Distance varies:
  A = (*-----------)
  B = (---------*--)
  C = (----------*-)

Perceptual distance: how distinguishable two points look to humans
Semantic distance: how conceptually “different” two images seem to humans
The neighborhoods we really want are sets of humanly indistinguishable images
Realistic, defensible AI requires hardening systems to these distances
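A quick sketch (our addition) of why ℓp norms are poor perceptual proxies: two perturbations with the same ℓ∞ distance can differ wildly in ℓ0 and ℓ2, and none of the three tracks how visible the change actually is.

```python
import numpy as np

x = np.zeros((32, 32))

p1 = x.copy()
p1[0, 0] = 0.5        # perturbation 1: a single pixel changed by 0.5
p2 = x + 0.5          # perturbation 2: every pixel changed by 0.5

for name, p in [("single pixel", p1), ("all pixels", p2)]:
    d = (p - x).ravel()
    print(name,
          " l0:", np.count_nonzero(d),
          " l2:", round(float(np.linalg.norm(d)), 2),
          " linf:", float(np.abs(d).max()))
# Both have linf = 0.5, yet they look nothing alike to a human observer.
```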
The GARD Program
The GARD Vision
Today’s Defenses (without GARD):
• One-off, attack-specific defenses
• No understanding of failure
• No bounds or worst-case guarantees

Tomorrow’s Defenses (with GARD):
• Defenses that work across many types of attacks
• Known failure modes
• Bounds and algorithms for defended systems
• Reliable testbeds of defended systems
[Charts: ImageNet classification accuracy (%), top finisher by challenge year (2009–2018), projected forward in time; GARD adds robustness guarantees and more accurate testing]
GARD does NOT aim to develop new attacks
GARD Objectives and Technical Areas
1. Develop theoretical foundations for defensible ML (Technical Area 1.1). These foundations will include:
• Metrics for measuring ML vulnerability
• Identification of ML properties that enhance system robustness
2. Create, and empirically test, principled defense algorithms in diverse settings (Technical Area 1.2)
3. Construct and support a scenario-based evaluation framework to characterize defenses under multiple objectives and threat models, such as the physical world and multimodal settings (Technical Area 2)
[Diagram: the TA2 Evaluation Framework (physical and simulated scenarios), run by a Government Evaluator, evaluates TA1.2 Principled Defenses, which build on TA1.1 Theoretical Foundations]
TA1 / TA2 Conflict of Interest Avoidance
• Performers may not be selected for both TA1 and TA2
• Abstract and Proposal Submission Options:
  • TA1.1 & TA1.2
  • TA1.1 only
  • TA1.2 only
  • TA2 only
• This BAA is not soliciting for the Government Evaluator role
Abstract Submissions
• Encouraged, not mandatory
  • Saves proposers’ time and expense in case their approach is deemed not to address GARD
  • Helps to define teams
• Format
  • 3-5 pages, including all figures, tables, and charts
  • Page count does not include the cover sheet, a brief bibliography, or NSF-style resumes of key personnel (up to 3 per abstract)
• Contents
  • Goals and impact
  • Technical plan
  • Team capabilities and management plan
  • Statement of Work, cost, and schedule (all preliminary; 1 page)
GARD Program Timeline
Phase progression: single-sensor defenses → multi-sensor defenses → multi-modality, active, adaptive defenses
Out of Scope
Data theft, privacy, model inversion: GARD is focused on attacks that induce incorrect behavior in ML models by manipulating inputs.
General use of AI in adversarial contexts: These are broad problems in themselves and more appropriate to other programs. Methods from these domains that can be shown relevant to the specific problem of GARD are permissible.
Attacks on military or other government systems, extension to military datasets: GARD is a basic research program exploring the characteristics and limitations of ML methods under general adversarial assumptions.
General noise robustness for ML models: Adversarial inputs that could be ignored as having negligible probability under most noise models may be reliably produced by an attacker.
Generic cybersecurity of ML systems: GARD is concerned with the specific vulnerabilities introduced by the ML model itself. Application of traditional cybersecurity analysis to systems that happen to use ML is better addressed elsewhere.
Methods that focus solely on attack detection: Early identification of attacks may be an effective component of real-time defense in some settings, allowing, for example, the system to close itself to communication and input; however, this does not fully address the problem.
Research entirely confined to MNIST, etc.: While small datasets such as MNIST remain valuable for rapid experimentation and exploration of ideas, many adversarial-example results shown on MNIST completely fail to transfer to more relevant datasets. GARD will focus on richer datasets as primary indicators of progress.
TA1.1: Theoretical Foundations
Foundational theory of robust generalization in ML
Adversarial inputs are a symptom of deeper limitations in current ML
TA1.1 will:
• Create metrics for robust generalization
• Identify key factors of vulnerability and robustness
  • Distance measures beyond ℓp norms
  • Effect of resource constraints / economy of defenses within threat scenarios
  • Impact of multi-modality and active / continual learning
• Assess theoretical risk for state-of-the-art models and datasets
• Model TA1.2 defenses and suggest potential improvements
• Suggest metrics / instrumentation to TA2 for more informative evaluation
• Determine how to test findings to establish their credibility and contribution to robust ML
• Possibly draw inspiration for robust AI/ML from the biological sciences
TA1.2: Principled Defenses
Defense algorithms for existing and new ML systems
Current defenses lag behind attacks and address limited threat scenarios
TA1.2 will:
• Develop defenses against published attacks
• Explore possible ideas for new defensible systems inspired by the life sciences
• Address physical and digital settings, and inference- and training-time attacks
• Protect from over-associations used for backdoor attacks
• Detect an attack situation in real time
• Expand defense domains:
  • Tasks (e.g., detection, localization, prediction)
  • Modalities (e.g., audio, video), multiple sensors
  • Active input collection
  • Minimized computational constraints
  • Varied knowledge (white-box, black-box, blind/transfer)
TA1.2 Modalities
All Teams (Government-led Evaluation Scenarios):
• Images: classification, detection, segmentation
  • Phases 2 & 3 will add other multichannel image modalities (e.g., RGB-depth or IR)
Choose 1 or Both (including physical-world scenarios):
• Audio: audio classification, speech recognition
  AND / OR
• Video: action recognition, prediction
Optional (Performer-selected Evaluation Tasks):
• Explore impact on other domains / modalities, e.g.:
  • IR, LIDAR
  • Novel sensors* (e.g., neuromorphic spiking camera)
  • Text / NLP
  • Reinforcement learning
* Department of Defense (DoD)-specific sensors are not the focus of GARD
TA2: Evaluation Framework
Tools and protocols for rigorous risk assessment for ML
Bad defenses might have proliferated because of bad testing
TA2 will:
• Develop a testbed for holistic risk evaluation of TA1 defenses in diverse settings defined by Government-created Evaluation Scenarios. The testbed will cover training- and inference-time attacks, digital and physical scenarios, multiple modalities and sensors, and active input collection
• Implement baseline defenses to serve as benchmarks
• Port state-of-the-art attacks from the literature and tune them to scenarios
• Bring their own metrics, and apply TA1.1 assessments of vulnerability
• To the extent possible, make the testbed available as open source
TA2 is not simply a software-engineering TA; understanding of adversarial attacks and defenses is crucial
Base Metrics for Defense Effectiveness
Basic metrics attempt to quantify how GARD defensive measures maintain or improve the success rate of ML when the system is attacked:
• N – baseline accuracy (unattacked, undefended)
• P – accuracy after attack (undefended)
• B – accuracy after attack (baseline defense)
• M – accuracy after attack (proposed defense)
Defense Figure of Merit (DFOM): (M − P) / (N − P)
Defense Improvement over Baseline (DIB): (M − P) / (B − P)
[Diagram: accuracy scale from 0% to 100%; the attack drops accuracy from N to P, a baseline defense recovers it to B, and the proposed defense recovers it to M]
In each phase, the target success will be higher
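A trivial sketch (our addition) of the two metrics above, with purely illustrative accuracy numbers:

```python
def dfom(n, p, m):
    """Defense Figure of Merit: fraction of the attack-induced accuracy
    loss that the proposed defense recovers (1.0 = full recovery)."""
    return (m - p) / (n - p)

def dib(b, p, m):
    """Defense Improvement over Baseline: recovery measured against a
    baseline defense rather than the clean, unattacked accuracy."""
    return (m - p) / (b - p)

# Illustrative numbers only: clean 95%, attacked 10%,
# baseline defense 40%, proposed defense 70%.
print(dfom(n=0.95, p=0.10, m=0.70))  # ~0.71
print(dib(b=0.40, p=0.10, m=0.70))   # 2.0
```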
Additional Metrics
While DFOM and DIB serve as the primary initial metrics of robustness, GARD considers adversarial robustness an inherently multi-objective problem. Defenses will also be evaluated using additional metrics, such as:
• Errors during defense and errors during detection
  • Including Type I, II, and III errors (over-predict, under-predict, and recover from targeted attacks) made at inference time when an attack detection mechanism is used
• Cost of defense in operation
• Cost of defense training
• Computational effort
Scenario-based Evaluation
Meaningful results require evaluations grounded in credible threat scenarios
Example Scenarios:
• Blind attacks on object recognition [image: “not_a_pipe”]
• “Over the Air” attacks on speech recognition
Each Scenario will define:
• ML task (e.g., classification, detection, prediction)
• Input access: digital vs. physical
• Attack phase: during training and/or inference
• Attacker (or defender) knowledge of the ML system (e.g., white box, black box, blind / transfer)
• Input modalities (e.g., image, video, or audio) and multi-modal and multi-sensor settings
• Attack/defense constraints (e.g., stealth, computation, accuracy, energy) and corresponding metrics
The Government Evaluator will create Evaluation Scenarios to test the factors most relevant to different security models
Features of a strong proposal
• Demonstrates knowledge of the state of the art in this area; cites relevant work
• Focuses on scenario-based defenses that are realistic (physical, not only white-box) and cover multiple modalities
• Presents strong justification for the proposed approach, not simply intuition
• Does not rely entirely on simplistic datasets (e.g., MNIST) or attacks (e.g., FGSM)
• Discusses previous accomplishments and work in closely related research areas, including prior work that will provide a starting point for the proposed research
• Presents a clear plan to achieve the project’s goal
• Candidly discusses meaningful technical risks and a strategy for mitigating them
Features of a strong proposal (TA-specific)
• TA1.1
  • Considers realistic and multiple-source attacks
  • Articulates the key directions of theoretical exploration and the justification for this choice
  • Presents a clear approach for testing the theoretical findings
  • Clearly indicates which aspects of the work are likely to depend on the modality or ML model and which are independent of such choices
  • Discusses the potential impact of results on TA1.2 and TA2 and plans for fostering this collaboration
• TA1.2
  • Articulates what is new in your algorithm; why should it work?
  • Clearly addresses physical-world attacks in addition to digital, multiple-source attacks, and realistic constraints
  • Specifies the modalities that will be chosen and their particular properties in relation to adversarial AI
  • Demonstrates a credible approach to flexibly adapting defense methods to the different threat models and resource constraints required by the Evaluation Scenarios
• TA2
  • Describes prior experience implementing and testing deception attacks/defenses on ML systems
  • Describes metrics and approaches for testing
  • Describes your approach to developing the software testbed to support the diversity of the Evaluation Scenarios
  • Presents a plan for working with the Government Evaluator and TA1 performers throughout the program to define and develop necessary testbed features
www.darpa.mil