on human predictions with explanations and predictions of … · 2020. 5. 13. · explanationshelp...
TRANSCRIPT
On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection
Vivian Lai and Chenhao Tan
@vivwylai | @chenhaotan
vivlai.github.io | chenhaot.com
University of Colorado Boulder
deception.machineintheloop.com
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Risk assessment: COMPAS
Most previous studies are concerned with the impact of such tools used in full automation
In Wisconsin, judges are required to take the algorithm’s limitations into account
In the end, though, Justice Bradley allowed sentencing judges to use Compas. They must take account of the algorithm's limitations and the secrecy surrounding it, she wrote, but she said the software could be helpful “in providing the sentencing court with as much information as possible in order to arrive at an individualized sentence.”
https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret-algorithms.html
Full automation is not desired
How do judges make decisions with COMPAS?
How do humans make decisions with machine assistance in challenging tasks?
A spectrum between full human agency and full automation
Full human agency
Full automation
Showing machine predicted labels
Showing machine predicted labels and explanations
Showing machine predicted labels and suggesting high accuracy
Showing only explanations (by highlighting salient information)
Deception Detection as a Case Study
Machine accuracy: 87%; human accuracy: ~50%
I would not stay at this hotel again. The rooms had a fowl odor. It seemed as though the carpets have never been cleaned. The neighborhood was also less than desirable. The housekeepers seemed to be snooping around while they were cleaning the rooms. I will say that the front desk staff was friendly albeit slightly dimwitted.
The machine predicts that the below review is deceptive
I would not stay at this hotel again. The rooms had a fowl odor. It seemed as though the carpets have never been cleaned. The neighborhood was also less than desirable. The housekeepers seemed to be snooping around while they were cleaning the rooms. I will say that the front desk staff was friendly albeit slightly dimwitted.
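The "highlighting salient information" style of explanation can be illustrated on the review above. The following is only a minimal sketch: the function name `highlight_top_words`, the `[[...]]` marker, and the word weights are all hypothetical and are not taken from the study's model. It assumes some classifier assigns a numeric weight to each word and marks the `k` words with the largest absolute weight.

```python
import re


def highlight_top_words(text, weights, k=3):
    """Wrap the k words with the largest absolute weight in [[...]].

    `weights` maps lowercase words to scores; all values used below are
    hypothetical and only serve to illustrate the idea.
    """
    # Split into alternating word and non-word tokens so that joining the
    # tokens reproduces the original text exactly.
    tokens = re.findall(r"\w+|\W+", text)
    present = {t.lower() for t in tokens if t.lower() in weights}
    top = set(sorted(present, key=lambda w: abs(weights[w]), reverse=True)[:k])
    return "".join(f"[[{t}]]" if t.lower() in top else t for t in tokens)


# Hypothetical weights, not taken from any trained model.
weights = {"hotel": 0.9, "rooms": 0.2, "staff": -0.4, "again": 0.6}
text = "I would not stay at this hotel again. The rooms had a fowl odor."
highlighted = highlight_top_words(text, weights, k=2)
print(highlighted)
```

A heatmap-style explanation could follow the same pattern by mapping each weight to a color intensity instead of a binary marker.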
Can explanations alone improve human performance?
Explanations alone slightly improve human performance
[Bar chart: Accuracy (%)]
Machine: 87%
Heatmap: 57.6%
Highlight: 55.9%
Examples: 54.4%
Control: 51.1%
p-values shown on the chart: p=0.006, p<0.001, p=0.056
Predicted labels > explanations
Explicit accuracy improves human performance drastically
[Bar chart: Accuracy (%)]
Machine: 87%
Predicted label with accuracy: 74.6%
Predicted label without accuracy: 61.9%
Heatmap: 57.6%
Control: 51.1%
p-values shown on the chart: p<0.001 for all three marked comparisons
Tradeoff between human performance and human agency
Higher agency, lower performance
Lower agency, higher performance
Can explanations moderate this tradeoff?
Predicted labels + explanations ≈ explicit accuracy
[Bar chart: Accuracy (%)]
Machine: 87%
Predicted label with accuracy: 74.6%
Predicted label & heatmap: 72.5%
Predicted label without accuracy: 61.9%
p-values shown on the chart: p<0.001 for both marked comparisons
How much do humans trust the predictions?
Explanations help increase human trust in predictions
[Bar chart: Trust (%)]
Predicted label with accuracy: 79.6%
Predicted label & heatmap: 78.7%
Predicted label without accuracy: 64.4%
p-values shown on the chart: p<0.001 for both marked comparisons
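Humans are more likely to trust predictions when they are correct
[Bar chart: Trust (%), split by whether the machine prediction was correct or incorrect]
Predicted label with accuracy: 81.1% (correct), 69.8% (incorrect)
Predicted label & heatmap: 79.4% (correct), 74.1% (incorrect)
Predicted label without accuracy: 65.1% (correct), 60% (incorrect)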
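The slides do not spell out how trust is measured. One common way to operationalize it, assumed here, is the share of human predictions that agree with the machine's predicted label, computed overall and separately for instances where the machine was correct or incorrect. The function names and the toy data below are hypothetical and serve only as a sketch.

```python
from typing import Dict, Sequence


def trust_rate(human: Sequence[str], machine: Sequence[str]) -> float:
    """Fraction of instances where the human label matches the machine label."""
    if len(human) != len(machine):
        raise ValueError("human and machine must have the same length")
    if not human:
        raise ValueError("need at least one instance")
    agree = sum(h == m for h, m in zip(human, machine))
    return agree / len(human)


def trust_by_correctness(
    human: Sequence[str], machine: Sequence[str], truth: Sequence[str]
) -> Dict[str, float]:
    """Trust rate split by whether the machine prediction was correct."""
    groups = {"correct": ([], []), "incorrect": ([], [])}
    for h, m, t in zip(human, machine, truth):
        key = "correct" if m == t else "incorrect"
        groups[key][0].append(h)
        groups[key][1].append(m)
    # Skip a group that has no instances rather than dividing by zero.
    return {k: trust_rate(hs, ms) for k, (hs, ms) in groups.items() if hs}


# Toy data (hypothetical): four reviews, one machine error (third review).
truth = ["deceptive", "genuine", "deceptive", "genuine"]
machine = ["deceptive", "genuine", "genuine", "genuine"]
human = ["deceptive", "genuine", "deceptive", "deceptive"]

overall = trust_rate(human, machine)
split = trust_by_correctness(human, machine, truth)
print(overall, split)
```

Under this definition, the overall trust rate is the average of the two split rates weighted by how often the machine is correct, so the split view adds information that the overall number hides.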
Other analysis
Showing varying accuracies (50, 60, 70)
Heterogeneity between participants
Takeaway
Explanations alone only slightly improve human performance
Explanations can moderate the tradeoff