TRANSCRIPT
Humans to the Rescue: Troubleshooting AI Systems with Human-in-the-loop
Ece Kamar
Senior Researcher, Microsoft Research AI
Exciting Times
AI and the Crowd
[Figure labels: training data, accuracy, test data]
Power of Data
[Banko & Brill, 2001]
In the Wild
Hybrid Intelligence
Human Intelligence
AI Systems
AI Applied to Critical Domains
Power of the Hybrid
[Courtesy of Murray Campbell]
Troubleshooting of ML Systems
In the lab: training data, test data, accuracy
In the wild: query, system, response, execution data
What is the performance in the wild?
How does the system fail?
Why does the system fail?
How can the system be improved?
Biases in ML
[Lakkaraju, K., Caruana, Horvitz; AAAI 2017]
Where do Blind Spots Come From?
[Figure: model M for cats vs. dogs; example prediction: cat (conf = 0.96)]
Unknown unknowns: data points with confident but incorrect predictions.
Blind spots: regions of the feature space with a high concentration of unknown unknowns.
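The two definitions above can be made concrete with a small sketch. It assumes a human has supplied true labels for a sample of predictions. The `Prediction` record, the 0.9 confidence threshold, and the toy cat/dog data are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    features: tuple     # hypothetical descriptive features, e.g. ("white", "dog")
    predicted: str      # label output by the model
    confidence: float   # model confidence in that label
    true_label: str     # label supplied by a human

def unknown_unknowns(preds, threshold=0.9):
    """Return predictions that are confident (>= threshold) but wrong."""
    return [p for p in preds
            if p.confidence >= threshold and p.predicted != p.true_label]

def confident_error_rates(preds, threshold=0.9):
    """Per feature group: share of confident predictions that are wrong."""
    totals, errors = {}, {}
    for p in preds:
        if p.confidence < threshold:
            continue  # skip low-confidence predictions; their mistakes are not unknown unknowns
        totals[p.features] = totals.get(p.features, 0) + 1
        if p.predicted != p.true_label:
            errors[p.features] = errors.get(p.features, 0) + 1
    return {group: errors.get(group, 0) / n for group, n in totals.items()}

# Toy data (made up for illustration).
preds = [
    Prediction(("white", "dog"), "cat", 0.96, "dog"),
    Prediction(("white", "cat"), "cat", 0.97, "cat"),
    Prediction(("black", "dog"), "dog", 0.95, "dog"),
    Prediction(("white", "dog"), "cat", 0.93, "dog"),
    Prediction(("black", "cat"), "dog", 0.55, "cat"),  # wrong, but not confident
]

uus = unknown_unknowns(preds)
rates = confident_error_rates(preds)
```

In this toy sample the group `("white", "dog")` has a confident-error rate of 1.0, which is what a blind spot looks like under these assumptions: a describable region where the model is confidently wrong.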
Blind Spot Detection
execution data
Beat the Machine [Attenberg, Ipeirotis, Provost, 2011]
Exploration of Unknown Unknowns [Lakkaraju, K., Caruana, Horvitz, 2011]
Step 1: Descriptive Space Partitioning
execution data
Step 2: Multi-armed Bandit-based Exploration
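The slides name the two steps without further detail, so the following is only a rough sketch of how bandit-based exploration over partitions might look. It assumes the partitions from Step 1 are already given, treats each partition as an arm, and uses the standard UCB1 rule as a stand-in policy, which is not necessarily the one used in the cited work. The `oracle` callback is a hypothetical placeholder for a human who checks whether a queried point is an unknown unknown.

```python
import math
import random

def explore_partitions(partitions, oracle, budget, seed=0):
    """Spend `budget` oracle queries across partitions using UCB1.

    Reward is 1 when the oracle says the sampled point is an unknown unknown.
    Returns the discovered points and the number of queries per partition.
    """
    rng = random.Random(seed)
    pools = [list(p) for p in partitions]
    for pool in pools:
        rng.shuffle(pool)              # sample without replacement, in random order
    pulls = [0] * len(pools)
    hits = [0] * len(pools)
    found = []
    for t in range(1, budget + 1):
        live = [i for i, pool in enumerate(pools) if pool]
        if not live:
            break                      # every partition is exhausted

        def ucb(i):
            if pulls[i] == 0:
                return float("inf")    # try each partition at least once
            return hits[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])

        arm = max(live, key=ucb)
        point = pools[arm].pop()
        reward = 1 if oracle(point) else 0
        pulls[arm] += 1
        hits[arm] += reward
        if reward:
            found.append(point)
    return found, pulls

# Toy partitions of (id, is_unknown_unknown) pairs; the flag is hidden from
# the explorer and only revealed through the oracle.
partitions = [
    [(f"a{i}", False) for i in range(100)],      # no unknown unknowns
    [(f"b{i}", i < 90) for i in range(100)],     # a dense blind spot
    [(f"c{i}", i < 2) for i in range(100)],      # almost none
]
found, pulls = explore_partitions(partitions, oracle=lambda pt: pt[1], budget=60)
```

With a limited labeling budget, most queries end up in the partition where unknown unknowns are concentrated, while the other partitions still receive occasional exploratory queries.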
Troubleshooting Complex Systems
Challenge
Possible fixes for each component
Limited development time
Where to invest development time for the biggest impact?
Human-assisted troubleshooting methodology
[Figure: system of Component 1, Component 2, Component 3 connected by intermediate I/O, producing the system output; Evaluation, Failures, Fixes]
[Nushi, K., Kossmann, Horvitz, 2011]
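One way to read the methodology is as a "what if this component were fixed?" experiment: correct one component's output, rerun everything downstream, and measure the change in system quality. The sketch below follows that reading under loud assumptions. The toy three-component pipeline, the fix functions (which stand in for human corrections), and the exact-match evaluation are all hypothetical and are not taken from the cited work.

```python
VOCAB = {"cat", "dog"}

# Toy pipeline: tokenize -> detect objects -> summarize (all hypothetical).
components = [
    lambda text: text.split(),                        # keeps case and punctuation
    lambda toks: [t for t in toks if t in VOCAB],     # misses plurals
    lambda objs: sorted(set(objs)),
]

# Simulated fixes: each receives a component's input and output and returns a
# corrected output, standing in for what a human would repair.
fixes = {
    0: lambda text, toks: [t.lower().strip(".,") for t in toks],
    1: lambda toks, objs: [t.rstrip("s") for t in toks if t.rstrip("s") in VOCAB],
    2: lambda objs, summary: summary,                 # control: nothing to fix
}

GOLD = {
    "A Cat sat.": ["cat"],
    "dogs bark": ["dog"],
    "a dog and a cat": ["cat", "dog"],
    "Dog. Cat.": ["cat", "dog"],
}

def run_pipeline(components, x, fixes=None):
    """Run components in order, applying a fix to a component's output if given."""
    fixes = fixes or {}
    value = x
    for i, component in enumerate(components):
        out = component(value)
        if i in fixes:
            out = fixes[i](value, out)
        value = out
    return value

def mean_quality(components, inputs, evaluate, fixes=None):
    """Average evaluation score of the system output over all inputs."""
    return sum(evaluate(x, run_pipeline(components, x, fixes)) for x in inputs) / len(inputs)

def rank_fixes(components, fixes, inputs, evaluate):
    """Apply one fix at a time and rank components by gain in system quality."""
    baseline = mean_quality(components, inputs, evaluate)
    gains = {i: mean_quality(components, inputs, evaluate, {i: fix}) - baseline
             for i, fix in fixes.items()}
    return baseline, sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

evaluate = lambda x, out: 1.0 if out == GOLD[x] else 0.0
baseline, ranking = rank_fixes(components, fixes, list(GOLD), evaluate)
# baseline == 0.25; ranking == [(0, 0.5), (1, 0.25), (2, 0.0)]
```

In this toy setting, fixing the first component yields the largest gain in end-to-end quality, so under limited development time it would be the first candidate for investment.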
Complex Issues
Fairness
Biases
Transparency
Responsibility
Good vs. Bad
Policy & Law
Complex challenges require collective efforts
No AI is perfect