Guaranteeing AI Robustness against Deception (GARD)
Dr. Hava Siegelmann
I2O
Proposers’ Day Brief
6 February 2019
DISTRIBUTION A. Approved for public release: distribution unlimited.
Purpose of this briefing
• Discuss program objectives and structure
BAA takes precedence
• These slides are meant to provide background and clarification only
• Please consult the published BAA for final program specifics
Ground Rules
• Until the deadline for receipt of proposals:
  • Information given to any one proposer must be available to all proposers
  • The best way to get a question answered is to email it
• Retrieve your answer from the Frequently Asked Questions (FAQs) list via the I2O solicitations website
• Note that any question that contains distribution restrictions, such as “company proprietary,” will not be answered
• Questions should be sent to [email protected].
Deception
https://www.ijn.com/trove-nazi-artifacts-unearthed-argentina/
D-Day’s Operation Fortitude, Pas de Calais (1944)
http://www.oocities.org/ww2_remembered/1944.html
Reconnaissance tools
http://neuralnetworksanddeeplearning.com/chap5.html
Adversarial AI Overview
https://engineering.nyu.edu/news/seeking-new-element-artificial-intelligence-trust
Potential ML attack surfaces
[Diagram: a learning system (training data → learning algorithm → learned model) feeding a runtime system (sensors → learned model → output). Attack surfaces: poisoning of training data for backdoor or other attacks; digital input attacks (>99.9% of the literature); physical input attacks (few cases, no defense)]
https://blog.openai.com/adversarial-example-research/
In 2013, Szegedy et al. showed that imperceptible (but carefully crafted) perturbations to inputs can cause neural nets to misclassify with high confidence (>90%)
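For intuition (an addition to the slides), here is a minimal sketch of the fast gradient sign method (Goodfellow, 2014, cited on a later slide), a one-step recipe for crafting such perturbations; it assumes a differentiable PyTorch classifier with inputs in [0, 1]:

```python
import torch

def fgsm_perturb(model, x, label, eps=0.03):
    """One gradient-sign step that increases the classifier's loss,
    keeping the perturbation small (bounded by eps per pixel)."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    # Move each pixel by eps in the loss-increasing direction.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```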
How dangerous is adversarial AI?
[Images: original inputs vs. modified inputs producing wrong ML detections (Evtimov et al., UC Berkeley, 2017; Metzen, Bosch, 2017)]
• Confusion for self-driving vehicles?
• Incorrect object recognition?
• Invisibility?
Physical Attacks
State of the art: few physical attacks
Patch: (Brown et al., Google, 2017)
3D Printed Objects: (Athalye et al., MIT, 2017)
Graffiti: (Evtimov et al., UC Berkeley, 2017)
(Intel / GTECH 2018)
• All physical attacks to date are white box
• No current consideration of resource constraints
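As a hedged sketch (our addition, simplified from the adversarial-patch idea of Brown et al.): optimize a single patch so that, pasted at random positions on random images, it pushes a classifier toward one target class. The square patch, fixed scale, and lack of rotation are simplifying assumptions.

```python
import torch

def train_patch(model, images, target, size=50, steps=1000, lr=0.1):
    """Optimize a universal adversarial patch over random placements."""
    patch = torch.rand(3, size, size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        x = images[torch.randint(len(images), (8,))].clone()
        i = torch.randint(x.shape[2] - size, (1,)).item()
        j = torch.randint(x.shape[3] - size, (1,)).item()
        x[:, :, i:i + size, j:j + size] = patch.clamp(0, 1)  # paste patch
        loss = torch.nn.functional.cross_entropy(
            model(x), torch.full((x.shape[0],), target))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```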
Poisoning Attacks
Collision attack
[Diagram: a decision boundary separates the Base and Target classes; the poison point is close to the target in feature space but close to its own class in pixel space, pulling the boundary so the target is misclassified]
(Source: Tom Goldstein, UMD, 2018)
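To make the geometry concrete (our addition, in the spirit of the cited Goldstein group’s “Poison Frogs” work, Shafahi et al., 2018), a clean-label poison can be crafted by gradient descent on two competing distances; `feat` is assumed to be the victim’s frozen feature extractor:

```python
import torch

def make_poison(feat, target_img, base_img, beta=0.1, steps=500, lr=0.01):
    """Minimize ||feat(x) - feat(target)||^2 + beta * ||x - base||^2:
    collide with the target in feature space while staying close to
    the base image (and its correct label) in pixel space."""
    x = base_img.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    with torch.no_grad():
        target_feat = feat(target_img)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feat(x) - target_feat) ** 2).sum() \
             + beta * ((x - base_img) ** 2).sum()
        loss.backward()
        opt.step()
    return x.detach()  # looks like the base class; collides with the target
```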
Backdoor attack via poisoning
[Diagram: choose an accessory (add glasses), inject it into images to generate poisoned data, and add them to the training set, yielding a poisoned recognition system]
(e.g., Chen et al., UCB, 2017)
Image sources: https://cdn2.theweek.co.uk/sites/theweek/files/styles/16x8_544/public/2017/05/wonder-woman-hed-2017.jpg?itok=PzGwVZUH; NVIDIA arXiv:1812.04948 (exaggerated for visualization)
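A minimal sketch (our addition, simplified from the accessory-injection idea in the cited work) of how a backdoor trigger is planted: paste a trigger patch into a small fraction of training images and relabel them, so the trained model associates trigger with target class. The function names, patch-style trigger, and 5% poisoning rate are illustrative assumptions.

```python
import numpy as np

def inject_trigger(image, trigger, target_label, corner=(0, 0)):
    """Paste a small trigger patch onto one image and relabel it."""
    poisoned = image.copy()
    r, c = corner
    h, w = trigger.shape[:2]
    poisoned[r:r + h, c:c + w] = trigger
    return poisoned, target_label

def poison_dataset(X, y, trigger, target_label=0, rate=0.05, seed=0):
    """Poison a random fraction of (X, y); a model trained on the result
    behaves normally except when the trigger is present."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=int(rate * len(X)), replace=False)
    for i in idx:
        X[i], y[i] = inject_trigger(X[i], trigger, target_label)
    return X, y
```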
Relevance To Defense
Beyond White Box Attacks

Black Box Attack:
• Query the target network to generate a training set
• Use the data to train a surrogate network
• Use white-box attacks on the surrogate model to attack the defended model
www.cleverhans.io

Blind Attack (a.k.a. transfer attacks):
• Attacks that were designed to fool a few ML systems could fool others, without prior query access
• Possible explanation: decision boundaries are aligned due to features of the data
[Plot: loss along an adversarial direction vs. a random direction]
(Liu et al., Shanghai, Feb 2017)
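A hedged sketch (our addition) of the black-box recipe above: label our own inputs by querying the target, fit a surrogate on the answers, then run white-box attacks (e.g., the fgsm_perturb sketch earlier) on the surrogate and transfer them. `query_target` stands in for whatever query access the attacker has.

```python
import torch

def train_surrogate(query_target, surrogate, batches, epochs=10, lr=1e-3):
    """Fit a surrogate network on (input, target-model label) pairs."""
    xs = torch.cat(list(batches))
    with torch.no_grad():
        labels = torch.cat([query_target(x).argmax(dim=1) for x in batches])
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(surrogate(xs), labels)
        loss.backward()
        opt.step()
    return surrogate  # white-box attacks on this often transfer to the target
```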
Beyond Images
Attacks have been adapted to audio
• Example: targeted attacks on speech recognition (digital, white-box) – audio of “without the dataset the article is useless” perturbed to transcribe as “okay google browse to evil dot com”
• All physical attacks and audio attacks to date assume white box
• Audio: all manipulations so far are digitized
https://nicholas.carlini.com/code/audio_adversarial_examples/
Adversarial Attacks on Reinforcement Learning
Perturbing the input to an RL agent can change its actions (Huang 2017)
• “Policy Induction” attack – force the agent to take specific actions (Behzadan 2017)
• “Enchanting” attack – drive the agent into a particular state (Lin 2017)
Attacks can transfer to agents with an unknown policy or training algorithm: white box, black box (unknown policy), black box (unknown algorithm)
[Diagram: a deep RL network with corrupted input and compromised behavior]
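A minimal sketch (our addition, in the spirit of the cited Huang 2017 work) of an untargeted observation perturbation against a trained policy; `policy` is an assumed torch module mapping a batch of observations to action logits:

```python
import torch

def perturb_observation(policy, obs, eps=0.01):
    """Nudge the observation so the agent's preferred action changes."""
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)
    preferred = logits.argmax(dim=-1)
    # Raise the loss of the currently preferred action (untargeted FGSM).
    loss = torch.nn.functional.cross_entropy(logits, preferred)
    loss.backward()
    return (obs + eps * obs.grad.sign()).detach()
```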
Defense Related Dominance (Backdoor) Attacks
1. Sticker dependency: always keep a sticker on airplane X (assume the enemy’s model comes to associate the sticker with it); before going to war, remove the sticker, or also add the sticker to a carrier
2. Patch dominance: the patch attached to the top left masks the rest of the picture (Liu et al., Duke, 2018)
3. Combination attack: various attacks at the same time!
State of the Art
AI Systems are Vulnerable
[Chart: ImageNet classification accuracy (%) by challenge year, 2009–2018; top ImageNet finishers improve steadily, while accuracy collapses under adversarial attacks]
• Adversarial attacks cause a catastrophic reduction in ML capability
• Many defenses have been tried and failed to generalize to new attacks
Attack / Defense Cycle
Attacks: single-step attacks (Goodfellow, 2014) → multi-stage attacks (Kurakin, 2016) → optimization attacks (Carlini, 2017) → approximation attacks (Athalye et al., 2018)
Defenses: adversarial training (Goodfellow et al., 2015) → distillation (Papernot et al., 2016) → detection (Ma et al., 2018) → GANs (Samangouei et al., 2018)
AI Adversarial Machine Learning: interest trend over 5 years
[Chart: Google Trends interest relative to peak for "Adversarial Machine Learning", Nov 2013 – May 2019, trending upward]
A topic with growing interest; maybe for our adversaries too
How to Design Defenses?
A defense has to ensure that learned class boundaries do not allow for adversarial examples
[Diagram: with defense, the learned boundary encloses each original point together with its vicinity]
(Madry et al., MIT, 2018)
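A compact sketch (our addition) of the robust-optimization style of defense associated with the cited Madry et al. work: train on the worst-case point that projected gradient descent (PGD) can find in a small neighborhood of each example. The radius and step schedule are illustrative.

```python
import torch

def pgd(model, x, y, eps=0.03, alpha=0.007, steps=10):
    """Find a loss-maximizing point within an l-infinity ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project back
    return x_adv.detach()

def adv_train_step(model, opt, x, y):
    """One adversarial-training step: fit the model on perturbed inputs."""
    x_adv = pgd(model, x, y)   # inner maximization
    opt.zero_grad()            # discard gradients accumulated during PGD
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
```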
Defining Distance for Adversarial Attacks and Defenses

Norm-based distance
• Most work uses ℓp norms, typically ℓ0, ℓ2, or ℓ∞: d_p(x, x′) = (Σ_i |x_i − x′_i|^p)^(1/p)
• Easy to compute, but poor proxies for human perceptual distance
• Images at the same ℓ2 distance from an original can be clearly distinguishable, yet semantically equivalent (Wang et al., NYU 2004)
• Distance varies:
  A = (*-----------)
  B = (---------*--)
  C = (----------*-)

Perceptual distance: how distinguishable two points look to humans
Semantic distance: how conceptually “different” two images seem to humans
The neighborhoods we really want are sets of humanly indistinguishable images
Realistic, defensible AI requires hardening systems to these distances
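A quick sketch (our addition) of why ℓp norms are poor perceptual proxies: two perturbations with the same ℓ∞ distance can differ wildly in ℓ0 and ℓ2, and none of the three tracks how visible the change actually is.

```python
import numpy as np

x = np.zeros((32, 32))

p1 = x.copy()
p1[0, 0] = 0.5        # perturbation 1: a single pixel changed by 0.5
p2 = x + 0.5          # perturbation 2: every pixel changed by 0.5

for name, p in [("single pixel", p1), ("all pixels", p2)]:
    d = (p - x).ravel()
    print(name,
          " l0:", np.count_nonzero(d),
          " l2:", round(float(np.linalg.norm(d)), 2),
          " linf:", float(np.abs(d).max()))
# Both have linf = 0.5, yet they look nothing alike to a human observer.
```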
The GARD Program
The GARD Vision
Today’s Defenses (without GARD):
• One-off, attack-specific defenses
• No understanding of failure
• No bounds or worst-case guarantees

Tomorrow’s Defenses (with GARD):
• Defenses that work across many types of attacks
• Known failure modes
• Bounds and algorithms for defended systems
• Reliable testbeds of defended systems
[Charts: ImageNet classification accuracy (%), top finisher by challenge year (2009–2018), projected forward in time; GARD adds robustness guarantees and more accurate testing]
GARD does NOT aim to develop new attacks
GARD Objectives and Technical Areas
1. Develop theoretical foundations for defensible ML (Technical Area 1.1). These foundations will include:
• Metrics for measuring ML vulnerability
• Identification of ML properties that enhance system robustness
2. Create, and empirically test, principled defense algorithms in diverse settings (Technical Area 1.2)
3. Construct and support a scenario-based evaluation framework to characterize defenses under multiple objectives and threat models, such as the physical world and multimodal settings (Technical Area 2)
[Diagram: the TA2 Evaluation Framework (physical and simulated scenarios), run by a Government Evaluator, evaluates TA1.2 Principled Defenses, which build on TA1.1 Theoretical Foundations]
TA1 / TA2 Conflict of Interest Avoidance
• Performers may not be selected for both TA1 and TA2
• Abstract and Proposal Submission Options:
  • TA1.1 & TA1.2
  • TA1.1 only
  • TA1.2 only
  • TA2 only
• This BAA is not soliciting for the Government Evaluator role
Abstract Submissions
• Encouraged, not mandatory
  • Saves proposers’ time and expense in case their approach is deemed not to address GARD
  • Helps to define teams
• Format
  • 3-5 pages, including all figures, tables, and charts
  • Page count does not include the cover sheet, a brief bibliography, or NSF-style resumes of key personnel (up to 3 per abstract)
• Contents
  • Goals and impact
  • Technical plan
  • Team capabilities and management plan
  • Statement of Work, cost, and schedule (all preliminary; 1 page)
GARD Program Timeline
Phase progression: single-sensor defenses → multi-sensor defenses → multi-modality, active, adaptive defenses
Out of Scope
Data theft, privacy, model inversion: GARD is focused on attacks that induce incorrect behavior in ML models by manipulating inputs.
General use of AI in adversarial contexts: These are broad problems in themselves and more appropriate to other programs. Methods from these domains that can be shown relevant to the specific problem of GARD are permissible.
Attacks on military or other government systems, extension to military datasets: GARD is a basic research program exploring the characteristics and limitations of ML methods under general adversarial assumptions.
General noise robustness for ML models: Adversarial inputs that could be ignored as having negligible probability under most noise models may be reliably produced by an attacker.
Generic cybersecurity of ML systems: GARD is concerned with the specific vulnerabilities introduced by the ML model itself. Application of traditional cybersecurity analysis to systems that happen to use ML is better addressed elsewhere.
Methods that focus solely on attack detection: Early identification of attacks may be an effective component of real-time defense in some settings, allowing, for example, the system to close itself to communication and input; however, this does not fully address the problem.
Research entirely confined to MNIST, etc.: While small datasets such as MNIST remain valuable for rapid experimentation and exploration of ideas, many adversarial-example results shown on MNIST completely fail to transfer to more relevant datasets. GARD will focus on richer datasets as primary indicators of progress.
TA1.1: Theoretical Foundations
Foundational theory of robust generalization in ML
Adversarial inputs are a symptom of deeper limitations in current ML
TA1.1 will:
• Create metrics for robust generalization
• Identify key factors of vulnerability and robustness
  • Distance measures beyond ℓp norms
  • Effect of resource constraints / economy of defenses within threat scenarios
  • Impact of multi-modality and active / continual learning
• Assess theoretical risk for state-of-the-art models and datasets
• Model TA1.2 defenses and suggest potential improvements
• Suggest metrics / instrumentation to TA2 for more informative evaluation
• Determine how to test findings to establish their credibility and contribution to robust ML
• Possibly draw inspiration for robust AI/ML from the biological sciences
TA1.2: Principled Defenses
Defense algorithms for existing and new ML systems
Current defenses lag behind attacks and address limited threat scenarios
TA1.2 will:
• Develop defenses against published attacks
• Explore possible ideas for new defensible systems inspired by the life sciences
• Address physical and digital settings, and inference- and training-time attacks
• Protect from over-associations used for backdoor attacks
• Detect an attack situation in real time
• Expand defense domains:
  • Tasks (e.g., detection, localization, prediction)
  • Modalities (e.g., audio, video), multiple sensors
  • Active input collection
  • Minimized computational constraints
  • Varied knowledge (white-box, black-box, blind/transfer)
TA1.2 Modalities
All Teams (Government-led Evaluation Scenarios):
• Images: classification, detection, segmentation
  • Phases 2 & 3 will add other multichannel image modalities (e.g., RGB-depth or IR)
Choose 1 or Both (including physical-world scenarios):
• Audio: audio classification, speech recognition
  AND / OR
• Video: action recognition, prediction
Optional (Performer-selected Evaluation Tasks):
• Explore impact on other domains / modalities, e.g.:
  • IR, LIDAR
  • Novel sensors* (e.g., neuromorphic spiking camera)
  • Text / NLP
  • Reinforcement learning
* Department of Defense (DoD)-specific sensors are not the focus of GARD
TA2: Evaluation Framework
Tools and protocols for rigorous risk assessment for ML
Bad defenses might have proliferated because of bad testing
TA2 will:
• Develop a testbed for holistic risk evaluation of TA1 defenses in diverse settings defined by Government-created Evaluation Scenarios. The testbed will cover training- and inference-time attacks, digital and physical scenarios, multiple modalities and sensors, and active input collection
• Implement baseline defenses to serve as benchmarks
• Port state-of-the-art attacks from the literature and tune them to scenarios
• Bring their own metrics, and apply TA1.1 assessments of vulnerability
• To the extent possible, make the testbed available as open source
TA2 is not simply a software-engineering TA; understanding of adversarial attacks and defenses is crucial
Base Metrics for Defense Effectiveness
Basic metrics attempt to quantify how GARD defensive measures maintain or improve the success rate of ML when the system is attacked:
• N – baseline accuracy (unattacked, undefended)
• P – accuracy after attack (undefended)
• B – accuracy after attack (baseline defense)
• M – accuracy after attack (proposed defense)
Defense Figure of Merit (DFOM): (M − P) / (N − P)
Defense Improvement over Baseline (DIB): (M − P) / (B − P)
[Diagram: accuracy scale from 0% to 100%; the attack drops accuracy from N to P, a baseline defense recovers it to B, and the proposed defense recovers it to M]
In each phase, the target success will be higher
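A trivial sketch (our addition) of the two metrics above, with purely illustrative accuracy numbers:

```python
def dfom(n, p, m):
    """Defense Figure of Merit: fraction of the attack-induced accuracy
    loss that the proposed defense recovers (1.0 = full recovery)."""
    return (m - p) / (n - p)

def dib(b, p, m):
    """Defense Improvement over Baseline: recovery measured against a
    baseline defense rather than the clean, unattacked accuracy."""
    return (m - p) / (b - p)

# Illustrative numbers only: clean 95%, attacked 10%,
# baseline defense 40%, proposed defense 70%.
print(dfom(n=0.95, p=0.10, m=0.70))  # ~0.71
print(dib(b=0.40, p=0.10, m=0.70))   # 2.0
```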
Additional Metrics
While DFOM and DIB serve as the primary initial metrics of robustness, GARD considers adversarial robustness an inherently multi-objective problem. Defenses will also be evaluated using additional metrics, such as:
• Errors during defense and errors during detection
  • Including Type I, II, and III errors (over-predict, under-predict, and recover from targeted attacks) made at inference time when an attack detection mechanism is used
• Cost of defense in operation
• Cost of defense training
• Computational effort
Scenario-based Evaluation
Meaningful results require evaluations grounded in credible threat scenarios
Example Scenarios:
• Blind attacks on object recognition [image: “not_a_pipe”]
• “Over the Air” attacks on speech recognition
Each Scenario will define:
• ML task (e.g., classification, detection, prediction)
• Input access: digital vs. physical
• Attack phase: during training and/or inference
• Attacker (or defender) knowledge of the ML system (e.g., white box, black box, blind / transfer)
• Input modalities (e.g., image, video, or audio) and multi-modal and multi-sensor settings
• Attack/defense constraints (e.g., stealth, computation, accuracy, energy) and corresponding metrics
The Government Evaluator will create Evaluation Scenarios to test the factors most relevant to different security models
Features of a strong proposal
• Demonstrates knowledge of the state of the art in this area; cites relevant work
• Focuses on scenario-based defenses that are realistic (physical, not only white-box) and cover multiple modalities
• Presents strong justification for the proposed approach, not simply intuition
• Does not rely entirely on simplistic datasets (e.g., MNIST) or attacks (e.g., FGSM)
• Discusses previous accomplishments and work in closely related research areas, including prior work that will provide a starting point for the proposed research
• Presents a clear plan to achieve the project’s goal
• Candidly discusses meaningful technical risks and a strategy for mitigating them
Features of a strong proposal (TA-specific)
• TA1.1
  • Considers realistic and multiple-source attacks
  • Articulates the key directions of theoretical exploration and the justification for this choice
  • Presents a clear approach for testing the theoretical findings
  • Clearly indicates which aspects of the work are likely to depend on the modality or ML model and which are independent of such choices
  • Discusses the potential impact of results on TA1.2 and TA2 and plans for fostering this collaboration
• TA1.2
  • Articulates what is new in your algorithm; why should it work?
  • Clearly addresses physical-world attacks in addition to digital, multiple-source attacks, and realistic constraints
  • Specifies the modalities that will be chosen and their particular properties in relation to adversarial AI
  • Demonstrates a credible approach to flexibly adapting defense methods to the different threat models and resource constraints required by the Evaluation Scenarios
• TA2
  • Describes prior experience implementing and testing deception attacks/defenses on ML systems
  • Describes metrics and approaches for testing
  • Describes your approach to developing the software testbed to support the diversity of the Evaluation Scenarios
  • Presents a plan for working with the Government Evaluator and TA1 performers throughout the program to define and develop necessary testbed features
www.darpa.mil