Edge Cases and Autonomous Vehicle Safety SATURN, Pittsburgh PA May 7, 2019 © 2019 Philip Koopman Prof. Philip Koopman @PhilKoopman


Page 1: Edge Cases and Autonomous Vehicle Safety (koopman/lectures/1905_Koopman_Saturn_Slides.pdf)

SATURN, Pittsburgh PA
May 7, 2019
© 2019 Philip Koopman
Prof. Philip Koopman
@PhilKoopman

Page 2: Overview

Making safe robots
– Doer/Checker safety
Edge cases matter
– Robust perception matters
The heavy tail distribution
– Fixing stuff you see in testing isn’t enough
Perception stress testing
– Finding the weaknesses in perception
UL 4600: autonomy safety standard
[General Motors]

Page 3: 98% Solved For 20+ Years

Washington DC to San Diego, July 1995
– CMU Navlab 5
– Dean Pomerleau
– Todd Jochem
– https://www.cs.cmu.edu/~tjochem/nhaa/nhaa_home_page.html
AHS San Diego demo, Aug 1997

Page 4: NREC: 30+ Years Of Cool Robots

Carnegie Mellon University
– Faculty, staff, students
– Off-campus Robotics Institute facility
[Timeline figure, 1985 to 2010: DARPA Grand Challenge, DARPA LAGR, ARPA Demo II, DARPA SC-ALV, NASA Lunar Rover, NASA Dante II, Auto Excavator, Auto Harvesting, Auto Forklift, Mars Rovers, Urban Challenge, DARPA PerceptOR, DARPA UPI, Auto Haulage, Auto Spraying, Laser Paint Removal, Army FCS, Software Safety]

Page 5: Before Autonomy Software Safety

The Big Red Button era

Page 6: Traditional Validation Vs. Machine Learning

Use traditional software safety where you can
..BUT..
Machine Learning (inductive training)
No requirements
– Training data is difficult to validate
No design insight
– Generally inscrutable; prone to gaming and brittleness

Page 7: APD (Autonomous Platform Demonstrator)

Safety critical speed limit enforcement
TARGET GVW: 8,500 kg
TARGET SPEED: 80 km/hr
Approved for Public Release. TACOM Case #20247 Date: 07 OCT 2009

Page 8: Safety Envelope Approach to ML Deployment

Specify unsafe regions
Specify safe regions
– Under-approximate to simplify
Trigger system safety response upon transition to unsafe region
[Figure label: UNSAFE!]

Page 9: Architecting A Safety Envelope System

Doer/Checker Pair
“Doer” subsystem
– Implements normal, untrusted functionality
“Checker” subsystem – Traditional SW
– Implements failsafes (safety functions)
Checker entirely responsible for safety
– Doer can be at low Safety Integrity Level
– Checker must be at higher SIL
(Also known as a “safety bag” approach; also monitor/actuator pair)
[Figure: Doer (ML, Low SIL) paired with a Simple Safety Envelope Checker (High SIL)]
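
The Doer/Checker pair and the safety envelope from the previous slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture used on APD: the names `SpeedEnvelope`, `checker`, `run_pair`, and `untrusted_ml_doer` are hypothetical, the 80 km/hr figure is borrowed from the APD target speed purely as an example limit, and the 5 km/hr margin is invented to show under-approximation of the safe region.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SpeedEnvelope:
    """Hypothetical safety envelope for commanded speed, in km/hr."""
    hard_limit_kph: float  # assumed boundary of the unsafe region
    margin_kph: float      # assumed margin: under-approximate the safe region

    def is_safe(self, speed_kph: float) -> bool:
        # The safe region is deliberately smaller than "everything not unsafe",
        # which keeps the check simple enough to build at a high SIL.
        return 0.0 <= speed_kph <= self.hard_limit_kph - self.margin_kph


def checker(envelope: SpeedEnvelope, commanded_kph: float) -> Tuple[str, float]:
    """Checker: pass safe commands through, otherwise trigger a safety response."""
    if envelope.is_safe(commanded_kph):
        return ("PASS", commanded_kph)
    # Leaving the safe region triggers the safety response (here: command a stop).
    return ("SAFETY_STOP", 0.0)


def run_pair(doer: Callable[[float], float],
             envelope: SpeedEnvelope,
             sensor_input: float) -> Tuple[str, float]:
    """Run the untrusted doer, then let the checker decide what reaches the actuator."""
    return checker(envelope, doer(sensor_input))


def untrusted_ml_doer(sensor_input: float) -> float:
    # Stand-in for an ML component; any function could be plugged in here.
    return sensor_input * 1.2


envelope = SpeedEnvelope(hard_limit_kph=80.0, margin_kph=5.0)
print(run_pair(untrusted_ml_doer, envelope, 50.0))  # command stays inside the envelope
print(run_pair(untrusted_ml_doer, envelope, 70.0))  # command leaves the envelope
```

The point of the split is that only `checker` and `SpeedEnvelope` need to be trusted; the doer can be arbitrarily complex because nothing it outputs reaches the vehicle unchecked.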

Page 10: Validating an Autonomous Vehicle Pipeline

Control Systems: Control Software Validation; Doer/Checker Architecture
Autonomy Interface To Vehicle: Traditional Software Validation
Randomized & Heuristic Algorithms: Run-Time Safety Envelopes; Doer/Checker Architecture
Machine Learning Based Approaches: ???
Perception presents a uniquely difficult assurance challenge

Page 11: Brute Force AV Validation: Public Road Testing

Good for identifying “easy” cases
Expensive and potentially dangerous
http://bit.ly/2toadfa

Page 12: Validation Via Brute Force Road Testing?

If 100M miles/critical mishap…
– Test 3x–10x longer than mishap rate
– Need 1 Billion miles of testing
That’s ~25 round trips on every road in the world
– With fewer than 10 critical mishaps…
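
The arithmetic behind the 3x to 10x multiplier can be checked with a short calculation. The sketch below assumes mishaps arrive as a Poisson process, which is a common modeling choice and not something stated on the slide; the function names are hypothetical. Under that assumption, a mishap-free test needs about 3 times the target interval to support the claim at 95% confidence, and a 1 billion mile test at the target rate would be expected to see 10 mishaps.

```python
import math


def miles_for_zero_mishap_claim(miles_per_mishap: float, confidence: float) -> float:
    """Mishap-free miles needed to claim the target rate at the given confidence.

    Assumes a Poisson arrival model (an assumption, not from the slide).
    """
    return -math.log(1.0 - confidence) * miles_per_mishap


def prob_fewer_than(k: int, test_miles: float, miles_per_mishap: float) -> float:
    """P(fewer than k mishaps) when the true rate is one per miles_per_mishap."""
    lam = test_miles / miles_per_mishap  # expected number of mishaps during the test
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


target = 100e6  # 100M miles per critical mishap, from the slide
test_miles = 1e9  # 1 Billion miles of testing, from the slide

print(miles_for_zero_mishap_claim(target, 0.95) / target)  # multiplier of roughly 3
print(test_miles / target)                                 # expected mishaps in the test
print(prob_fewer_than(10, test_miles, target))             # chance of seeing fewer than 10
```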

Page 13: Closed Course Testing

Safer, but expensive
Not scalable
Only tests things you have thought of!
[Volvo / Motor Trend]

Page 14: Simulation

Highly scalable; less expensive
Scalable; need to manage fidelity vs. cost
Only tests things you have thought of!
http://bit.ly/2K5pQCN
[Udacity, ANSYS]

Page 15: What About Edge Cases?

Gaps in training data can lead to perception failure
– Safety needs to know: “Is that a person?”
– Machine learning provides: “Is that thing like the people in my training data?”
Edge cases are surprises
– You won’t see these in training or testing
Edge cases are the stuff you didn’t think of!
https://www.clarifai.com/demo
http://bit.ly/2In4rzj

Page 16: Need An Edge Case “Zoo”

Novel objects (missing from zoo) are triggering events
http://bit.ly/2top1KD
http://bit.ly/2tvCCPK
https://goo.gl/J3SSyu

Page 17: Why Edge Cases Matter

Where will you be after 1 Billion miles of validation testing?
Assume 1 Million miles between unsafe “surprises”
Example #1: 100 “surprises” @ 100M miles / surprise
– All surprises seen about 10 times during testing
– With luck, all bugs are fixed
Example #2: 100,000 “surprises” @ 100B miles / surprise
– Only 1% of surprises seen during 1B mile testing
– Bug fixes give no real improvement (1.01M miles / surprise)
https://goo.gl/3dzguf
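
The two examples can be reproduced numerically. The sketch below is an illustration, assuming each surprise type arrives independently as a Poisson process and that every surprise seen at least once during testing is fixed perfectly; the function name `after_testing` is hypothetical. Both examples start at 1 Million miles between surprises in aggregate, but only the first improves meaningfully after 1 Billion miles of testing.

```python
import math
from typing import Tuple


def after_testing(num_surprises: int,
                  miles_per_surprise_each: float,
                  test_miles: float) -> Tuple[float, float]:
    """Return (fraction of surprise types seen, aggregate miles per surprise afterwards).

    Assumes independent Poisson arrivals and perfect fixes for anything seen.
    """
    # Expected number of times any one surprise type shows up during the test.
    expected_sightings = test_miles / miles_per_surprise_each
    # Probability a given surprise type is seen at least once.
    fraction_seen = 1.0 - math.exp(-expected_sightings)
    # Surprise types still lurking after the seen ones are fixed.
    remaining = num_surprises * (1.0 - fraction_seen)
    return fraction_seen, miles_per_surprise_each / remaining


test_miles = 1e9  # 1 Billion miles of validation testing

# Example #1: 100 surprises at 100M miles each (about 10 sightings apiece).
seen_1, miles_1 = after_testing(100, 100e6, test_miles)
# Example #2: 100,000 surprises at 100B miles each (about 0.01 sightings apiece).
seen_2, miles_2 = after_testing(100_000, 100e9, test_miles)

print(f"Example 1: {seen_1:.3%} seen, {miles_1:.3g} miles/surprise afterwards")
print(f"Example 2: {seen_2:.3%} seen, {miles_2:.3g} miles/surprise afterwards")
```

In the second example only about 1% of surprise types are ever observed, so the aggregate interval moves from 1M to roughly 1.01M miles per surprise, which matches the figure on the slide.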

Page 18: Real World: Heavy Tail Distribution(?)

[Figure: heavy tail distribution. Head: Common Things Seen In Testing. Tail: Edge Cases Not Seen In Testing.]

Page 19: The Heavy Tail Testing Ceiling

Need to find “Triggering Events” to inject into sims/testing

Page 20: Edge Cases Pt. 1: Triggering Event Zoo

Need to collect surprises
– Novel objects
– Novel operational conditions
Corner Cases vs. Edge Cases
– Corner cases: infrequent combinations
– Not all corner cases are edge cases
– Edge cases: combinations that behave unexpectedly
Issue: novel for person ≠ novel for Machine Learning
– ML can have “edges” in unexpected places
– ML might train on features that seem irrelevant to people
https://goo.gl/Ni9HhU
[Image caption: Not A Pedestrian]

Page 21: Edge Cases Part 2: Brittleness

Sensor data corruption experiments
– Synthetic Equipment Faults
– Gaussian blur
Defocus & haze are a significant issue
Gaussian Blur & Gaussian Noise cause similar failures
Exploring the response of a DNN to environmental perturbations, from “Robustness Testing for Perception Systems,” RIOT Project, NREC, DIST-A.
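
The kind of synthetic corruption named on this slide can be sketched with the standard library alone. This is a toy illustration, not the RIOT project's tooling: images are plain nested lists of grayscale values, the blur is a fixed 3x3 binomial approximation of a Gaussian kernel, and the function names are hypothetical. A real experiment would feed the perturbed images to the perception system under test and compare its outputs against the unperturbed baseline.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 255]


def add_gaussian_noise(image: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise to every pixel, clamped to [0, 255]."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def gaussian_blur_3x3(image: Image) -> Image:
    """Blur with a 3x3 binomial kernel, replicating pixels at the border."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))  # clamp to image bounds
                    xx = min(w - 1, max(0, x + dx))
                    total += kernel[dy + 1][dx + 1] * image[yy][xx]
            row.append(total / 16.0)
        out.append(row)
    return out


flat = [[100.0] * 4 for _ in range(3)]               # uniform gray patch
edge = [[0.0, 0.0, 255.0, 255.0] for _ in range(3)]  # sharp vertical edge

print(gaussian_blur_3x3(edge)[1])               # the edge is smeared across columns
print(add_gaussian_noise(flat, sigma=10.0)[0])  # the gray patch becomes speckled
```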

Page 22: Hologram Detects Edge Cases

Brittle perception behavior indicates Edge Cases
Can uncover false negatives and detect novel objects
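
The idea that brittle behavior points at edge cases can be illustrated with a toy check. The slide does not describe how Hologram works internally, so the sketch below is only an assumed stand-in: it flags an input as a candidate edge case when a small perturbation flips the detector's answer. `toy_detector`, `dim`, `brittleness_report`, and `is_candidate_edge_case` are all hypothetical names, and the threshold detector is deliberately trivial.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale pixels in [0, 255]


def brittleness_report(detector: Callable[[Image], bool],
                       image: Image,
                       perturbations: Dict[str, Callable[[Image], Image]]) -> Dict[str, bool]:
    """For each named perturbation, record whether it flips the detector's output."""
    baseline = detector(image)
    return {name: detector(perturb(image)) != baseline
            for name, perturb in perturbations.items()}


def is_candidate_edge_case(report: Dict[str, bool]) -> bool:
    """An input is flagged if any mild perturbation changed the answer."""
    return any(report.values())


def toy_detector(image: Image) -> bool:
    # Stand-in for a perception model: "detects" when any pixel is bright enough.
    return max(max(row) for row in image) > 200.0


def dim(image: Image) -> Image:
    # Mild perturbation: reduce brightness by 10 percent.
    return [[p * 0.9 for p in row] for row in image]


perturbations = {"dim": dim}
robust_input = [[10.0, 250.0], [10.0, 10.0]]   # detected with plenty of margin
brittle_input = [[10.0, 210.0], [10.0, 10.0]]  # detected, but only barely

print(is_candidate_edge_case(brittleness_report(toy_detector, robust_input, perturbations)))
print(is_candidate_edge_case(brittleness_report(toy_detector, brittle_input, perturbations)))
```

Inputs that get flagged this way are candidates for review, since a detection that disappears under a mild change suggests the model is near one of its "edges".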

Page 23: Context-Dependent Perception Failures

Perception failures are often context-dependent
– False positives and false negatives are both a problem
[Image labels]
– False positive on lane marking
– False negative real bicyclist
– False negative when in front of dark vehicle
– False negative when person next to light pole
Will this pass a “vision test” for bicyclists?

Page 24: Example Triggering Events via Hologram

Mask R-CNN: examples of systemic problems we found
– “Red objects”
– “Columns”
– “Camouflage”
– “Sun glare”
– “Bare legs”
– “Children”
– “Single Lane Control”
Notes: These are baseline, un-augmented images // Your mileage may vary.

Pages 25-27: [no slide text in transcript]

Page 28: Operations & Human Interactions

Drivers do more than just drive
– Occupant behavior, passenger safety
– Detecting and managing equipment faults
Operational limitations & situations
– System exits Operational Design Domain
– Vehicle fire or catastrophic failure
– Post-crash response
Interacting with non-drivers
– Pedestrians, passengers
– Police, emergency responders
https://bit.ly/2GvDkUN
https://bit.ly/2PhzilT

Page 29: Lifecycle Issues

Handling updates
– Fully recertify after every weekly update?
– Security in general
Vehicle maintenance
– Pre-flight checks, cleaning
– Corrective maintenance
Supply chain issues
– Quality fade
– Supply chain faults
Is windshield cleaning fluid life critical?
https://bit.ly/2IKlZJ9
https://bit.ly/2VavsjM

Page 30: Safety Standard Landscape

Page 31: Ways To Improve AV Safety

More safety transparency
– Independent safety assessments
– Industry collaboration on safety
Minimum performance standards
– “Driver test” is necessary -- but not sufficient
– How do you measure maturity?
Autonomy software safety standards
– ISO 26262/21448 + UL 4600 + IEEE P700x
– Dealing with uncertainty and brittleness
http://bit.ly/2MTbT8F (sign modified)
[Image label: Mars]

Thanks!