![Page 1: Debugging the Machine Learning Pipelinepages.cs.wisc.edu/~jerryzhu/machineteaching/pub/debugML.pdfHarry Potter Toy Example 0 0.5 1 magical heritage 0 0.5 1 education 0 0.5 1 magical](https://reader030.vdocuments.net/reader030/viewer/2022040515/5e7390c9a23510133c749cd1/html5/thumbnails/1.jpg)
Debugging the Machine Learning Pipeline
Jerry Zhu
University of Wisconsin-Madison
joint work with Xuezhou Zhang, Stephen Wright
Interpretable ML Symposium, NIPS 2017
Debugging provides an opportunity for machine learning interpretability.
Harry Potter toy example
Hired by the Ministry of Magic?
[Figure: scatter plot of the training data. x-axis: magical heritage (0 to 1); y-axis: education (0 to 1). Markers: + yes, o no.]
Data contain historical biases
Learned vs. ideal decision boundary
[Figure: plot with x-axis magical heritage (0 to 1) and y-axis education (0 to 1).]
(RBF kernel logistic regression)
Trusted items
• obtained by expensive vetting
• insufficient to learn from
[Figure: plot with x-axis magical heritage (0 to 1) and y-axis education (0 to 1).]
Debugging using trusted items
• propose training label bugs
• flipping them makes the re-trained model agree with trusted items
[Figure: plot with x-axis magical heritage (0 to 1) and y-axis education (0 to 1).]
Proposed bugs
• given to experts to interpret
[Figure: plot with x-axis magical heritage (0 to 1) and y-axis education (0 to 1).]
The ML pipeline
data (X, Y) → learner ℓ → parameters λ → model θ
$$\theta = \arg\min_{\theta \in \Theta} \; \ell(X, Y, \theta) + \lambda \|\theta\|$$
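A minimal sketch of one such learner, to make the argmin concrete. The slide leaves ℓ generic; this sketch assumes binary logistic loss and a squared-norm regularizer (as on the later slides), and solves the argmin with Newton's method. The names `train` and `objective` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(theta, X, y, lam):
    """Mean logistic loss plus lam * ||theta||^2, for labels y in {0, 1}."""
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z) + lam * theta @ theta

def train(X, y, lam, iters=50):
    """Approximate the argmin over theta with Newton's method."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + 2.0 * lam * theta
        hess = (X.T * (p * (1.0 - p))) @ X / n + 2.0 * lam * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

# Hypothetical toy data: one feature plus a constant column for the bias.
X = np.array([[0.1, 1.0], [0.4, 1.0], [0.6, 1.0], [0.9, 1.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
theta = train(X, y, lam=0.1)
print(theta, objective(theta, X, y, lam=0.1))
```

Here data, learner, and λ are the three inputs of the pipeline, and `theta` is its output; a bug in any of the three changes the trained model.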
Postconditions
Ψ(θ)
Examples:
• “the learned model must correctly predict an important item (x, y)”
$$\theta(x) = y$$
• “the learned model must satisfy individual fairness”
$$\forall x, x': \; |p(y = 1 \mid x, \theta) - p(y = 1 \mid x', \theta)| \le L \|x - x'\|$$
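The two example postconditions can be written as boolean checks on a trained model. This is a sketch under assumptions: `predict` and `prob` stand in for a hypothetical trained model, and the fairness check only tests pairs from a finite sample, so passing it is necessary but not sufficient for the "for all x, x′" condition.

```python
import math

def predicts_item(predict, x, y):
    """Postcondition theta(x) = y for one important item."""
    return predict(x) == y

def individually_fair_on(prob, xs, L):
    """Check |p(x) - p(x')| <= L * ||x - x'|| on every pair drawn from xs."""
    return all(
        abs(prob(a) - prob(b)) <= L * math.dist(a, b) + 1e-12
        for a in xs
        for b in xs
    )

# Hypothetical model: a steep logistic curve over a single feature in [0, 1].
def prob(x):
    return 1.0 / (1.0 + math.exp(-8.0 * (x[0] - 0.5)))

def predict(x):
    return int(prob(x) >= 0.5)

xs = [(i / 20.0,) for i in range(21)]
print(predicts_item(predict, (0.9,), 1))
print(individually_fair_on(prob, xs, L=2.0))
print(individually_fair_on(prob, xs, L=1.0))
```

The curve's steepest slope is 8/4 = 2, so the sample check passes for L = 2 and fails for L = 1.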
Bug Assumptions
• Ψ satisfied if we were to train through “clean pipeline”
• bugs are changes to the clean pipeline
• Ψ violated on the dirty pipeline
This is not our goal
Just to learn a better model:
$$\min_{\theta \in \Theta} \; \ell(X, Y, \theta) + \lambda \|\theta\| \quad \text{s.t.} \quad \Psi(\theta) = \text{true}$$
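This is our goal
To identify bugs and fix them (and learn a better model):
$$
\begin{aligned}
\min_{Y', \theta} \quad & \|Y - Y'\| \\
\text{s.t.} \quad & \Psi(\theta) = \text{true} \\
& \theta = \arg\min_{\theta \in \Theta} \; \ell(X, Y', \theta) + \lambda \|\theta\|
\end{aligned}
$$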
Special case: bugs in training labels
• Ψ satisfied if we were to train on “clean data” (X, Y′)
• bugs are changes to clean labels
$$(X, Y) = (X, Y' + \Delta)$$
• not just about outliers
• may contain systematic biases
Input / output to our debugger
Input:
1. dirty training set (X, Y)
2. trusted items $(\tilde X, \tilde Y)$
3. the learner
Output:
1. Y′
2. confidence
Formulation equivalent to machine teaching
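$$
\begin{aligned}
\min_{Y'} \quad & \|Y' - Y\| \\
\text{s.t.} \quad & \theta(\tilde X) = \tilde Y \\
& \theta = \arg\min_{\theta \in \Theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, y'_i, \theta) + \lambda \|\theta\|^2
\end{aligned}
$$
where $(\tilde X, \tilde Y)$ are the trusted items.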
Difficult!
• combinatorial
• bilevel optimization (Stackelberg game)
[Dec. 9 Workshop on Teaching Machines, Robots, and Humans]
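To see why the exact formulation is hard, the sketch below solves it by brute force on a tiny hypothetical dataset: it tries every set of label flips in order of size, retrains after each, and stops at the first set whose model agrees with the trusted items. To keep retraining in closed form, it swaps in a regularized least-squares classifier for the learner; that choice, the function names, and the data are assumptions for illustration. The search can need up to 2^n retrainings, and the smallest fix need not be unique (the loop returns the first one it finds).

```python
import itertools
import numpy as np

def train(X, y, lam):
    """Closed-form argmin of mean squared error to +/-1 targets plus lam * ||theta||^2."""
    n, d = X.shape
    t = 2.0 * y - 1.0  # map labels {0, 1} to targets {-1, +1}
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ t / n)

def predict(theta, X):
    return (X @ theta > 0).astype(int)

def debug_by_enumeration(X, y, X_trusted, y_trusted, lam=1e-3):
    """Try 0 flips, then every single flip, then every pair, and so on."""
    n = len(y)
    for k in range(n + 1):
        for flips in itertools.combinations(range(n), k):
            idx = np.array(flips, dtype=int)
            y_new = y.copy()
            y_new[idx] = 1 - y_new[idx]
            theta = train(X, y_new, lam)  # lower level: retrain on edited labels
            if np.array_equal(predict(theta, X_trusted), y_trusted):
                return flips, y_new, theta  # upper level: fewest flips found
    return None

# Hypothetical data: one feature plus a bias column; one trusted item.
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
X = np.column_stack([x, np.ones_like(x)])
y = np.array([0, 0, 0, 0, 1, 1])
X_trusted = np.array([[0.55, 1.0]])
y_trusted = np.array([1])

flips, y_fixed, theta = debug_by_enumeration(X, y, X_trusted, y_trusted)
print("flipped indices:", flips)
```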
Combinatorial to continuous relaxation
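step 1. label to probability simplex
$$y'_i \to \delta_i \in \Delta$$
step 2. counting to probability mass
$$\|Y' - Y\| \to \frac{1}{n} \sum_{i=1}^{n} (1 - \delta_{i, y_i})$$
step 3. soften postcondition (average loss on the m trusted items)
$$\theta(\tilde X) = \tilde Y \to \frac{1}{m} \sum_{i=1}^{m} \ell(\tilde x_i, \tilde y_i, \theta)$$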
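Steps 1 and 2 can be checked numerically. In this sketch (function names and the four-item example are hypothetical), a hard label becomes a one-hot vertex of the simplex, and the relaxed distance at a vertex equals the fraction of flipped labels; interior points of the simplex give values in between.

```python
import numpy as np

def one_hot(y, k):
    """Step 1: map each label to a vertex of the k-class probability simplex."""
    delta = np.zeros((len(y), k))
    delta[np.arange(len(y)), y] = 1.0
    return delta

def relaxed_distance(delta, y):
    """Step 2: (1/n) * sum_i (1 - delta[i, y_i]), the mass moved off the given labels."""
    return float(np.mean(1.0 - delta[np.arange(len(y)), y]))

y = np.array([0, 1, 1, 0])        # given (possibly buggy) labels
y_prime = np.array([0, 1, 0, 0])  # candidate fix: one of four labels flipped

print(relaxed_distance(one_hot(y_prime, k=2), y))   # vertex: fraction of flips
print(relaxed_distance(np.full((4, 2), 0.5), y))    # interior point of the simplex
```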
Continuous now, but still bilevel
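$$
\begin{aligned}
\arg\min_{\delta \in \Delta^n, \theta} \quad & \frac{1}{m} \sum_{i=1}^{m} \ell(\tilde x_i, \tilde y_i, \theta) + \gamma \frac{1}{n} \sum_{i=1}^{n} (1 - \delta_{i, y_i}) \\
\text{s.t.} \quad & \theta = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} \delta_{ij} \ell(x_i, j, \theta) + \lambda \|\theta\|^2
\end{aligned}
$$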
Removing the lower level problem
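$$\theta = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} \delta_{ij} \ell(x_i, j, \theta) + \lambda \|\theta\|^2$$
step 1. the KKT condition
$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} \delta_{ij} \nabla_\theta \ell(x_i, j, \theta) + 2 \lambda \theta = 0$$
step 2. plug implicit function θ(δ) into upper level problem
$$\arg\min_{\delta} \; \frac{1}{m} \sum_{i=1}^{m} \ell(\tilde x_i, \tilde y_i, \theta(\delta)) + \gamma \frac{1}{n} \sum_{i=1}^{n} (1 - \delta_{i, y_i})$$
step 3. compute gradient $\nabla_\delta$ with implicit function theorem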
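A sketch of step 3 for one concrete case. It assumes binary logistic loss (k = 2, classes 0 and 1), which the slides do not fix; the helper names, `u`, `v`, and the toy data are hypothetical. Differentiating the KKT condition gives ∂θ/∂δ_ij = −H⁻¹ (1/n) ∇θ ℓ(x_i, j, θ), where H is the Hessian of the lower-level objective, and the chain rule then gives the gradient of the upper-level objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lower_grad_hess(theta, X, delta, lam):
    """Gradient and Hessian of (1/n) sum_ij delta_ij * loss(x_i, j, theta) + lam * ||theta||^2."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    w = delta.sum(axis=1)  # total weight on item i (equals 1 on the simplex)
    grad = X.T @ (w * p - delta[:, 1]) / n + 2.0 * lam * theta
    hess = (X.T * (w * p * (1.0 - p))) @ X / n + 2.0 * lam * np.eye(d)
    return grad, hess

def solve_lower(X, delta, lam, iters=50):
    """theta(delta): Newton's method on the lower-level problem (the KKT condition)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad, hess = lower_grad_hess(theta, X, delta, lam)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def upper_objective(delta, X, y, Xt, yt, lam, gamma):
    """Mean loss on trusted items (Xt, yt) plus gamma times the relaxed label distance."""
    theta = solve_lower(X, delta, lam)
    z = Xt @ theta
    trusted_loss = np.mean(np.logaddexp(0.0, z) - yt * z)
    return trusted_loss + gamma * np.mean(1.0 - delta[np.arange(len(y)), y])

def upper_gradient(delta, X, y, Xt, yt, lam, gamma):
    """Gradient of the upper objective w.r.t. each delta_ij via the implicit function theorem."""
    n = X.shape[0]
    theta = solve_lower(X, delta, lam)
    _, hess = lower_grad_hess(theta, X, delta, lam)
    u = Xt.T @ (sigmoid(Xt @ theta) - yt) / len(yt)  # d(trusted loss)/d(theta)
    v = np.linalg.solve(hess, u)                      # H^{-1} u
    p = sigmoid(X @ theta)
    grad = np.empty_like(delta)
    for j in (0, 1):
        # d(theta)/d(delta_ij) = -H^{-1} (1/n) (p_i - j) x_i, contracted with u
        grad[:, j] = -((p - j) * (X @ v)) / n
    grad[np.arange(n), y] -= gamma / n  # direct term from the label-distance penalty
    return grad

# Hypothetical toy problem: four training items, one trusted item.
X = np.array([[0.1, 1.0], [0.4, 1.0], [0.6, 1.0], [0.9, 1.0]])
y = np.array([0, 1, 0, 1])
Xt = np.array([[0.5, 1.0]])
yt = np.array([1.0])
delta = np.eye(2)[y]  # start at the one-hot encoding of the given labels
g = upper_gradient(delta, X, y, Xt, yt, lam=0.1, gamma=0.5)
print(g)
```

A finite-difference comparison against `upper_objective` is a useful sanity check for a gradient like this before handing it to an optimizer over δ.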
Software available.
Harry Potter Toy Example
[Figure: six panels. Five scatter panels (x-axis: magical heritage, 0 to 1; y-axis: education, 0 to 1): data, our debugger, influence function, nearest neighbor, label noise detection. One panel, average PR (x-axis: recall, 0 to 1; y-axis: precision, 0 to 1), with curves for DUTI, INF, NN, and LND (Oracle).]
Another special case: bug in regularization weight
[Figure: pass/fail (0 to 1) versus score (0 to 100).]
(logistic regression)
Postcondition violated
Ψ(θ): Individual fairness (Lipschitz condition)
$$\forall x, x': \; |p(y = 1 \mid x, \theta) - p(y = 1 \mid x', \theta)| \le L \|x - x'\|$$
Bug assumption
Learner’s regularization weight λ = 0.001 was inappropriate
$$\theta = \arg\min_{\theta \in \Theta} \; \ell(X, Y, \theta) + \lambda \|\theta\|^2$$
Debugging formulation
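$$
\begin{aligned}
\min_{\lambda', \theta} \quad & (\lambda' - \lambda)^2 \\
\text{s.t.} \quad & \Psi(\theta) = \text{true} \\
& \theta = \arg\min_{\theta \in \Theta} \; \ell(X, Y, \theta) + \lambda' \|\theta\|^2
\end{aligned}
$$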
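A minimal sketch of this formulation for a one-parameter case. It assumes a scalar logistic model p(y = 1 | x) = sigmoid(w·x) with no bias term, for which the largest slope is |w|/4, so the Lipschitz postcondition holds exactly when |w|/4 ≤ L. Because |w| at the optimum does not grow as the regularization weight grows, the closest feasible λ′ can be found by bisection. The function names, the constant L = 0.5, and the data are hypothetical, and the result is not meant to reproduce the numbers in the talk.

```python
import numpy as np

def train(x, y, lam, iters=100):
    """Scalar logistic regression: argmin over w of mean loss + lam * w^2 (Newton's method)."""
    w = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-w * x))
        grad = np.mean((p - y) * x) + 2.0 * lam * w
        hess = np.mean(p * (1.0 - p) * x * x) + 2.0 * lam
        w -= grad / hess
    return w

def max_slope(w):
    """Largest |dp/dx| of sigmoid(w * x), attained at x = 0."""
    return abs(w) / 4.0

def debug_lambda(x, y, lam, L, hi=1e3, tol=1e-6):
    """Regularization weight closest to lam whose trained model is L-Lipschitz."""
    if max_slope(train(x, y, lam)) <= L:
        return lam  # postcondition already holds; nothing to fix
    if max_slope(train(x, y, hi)) > L:
        raise ValueError("postcondition not satisfiable below the search bound hi")
    lo = lam
    while hi - lo > tol:  # bisection: |w| shrinks as the weight grows
        mid = 0.5 * (lo + hi)
        if max_slope(train(x, y, mid)) <= L:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical standardized scores and pass/fail labels.
x = np.array([-1.0, -0.6, -0.2, 0.2, 0.6, 1.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
lam_fixed = debug_lambda(x, y, lam=0.001, L=0.5)
print(lam_fixed, max_slope(train(x, y, lam_fixed)))
```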
Suggested bug
λ = 0.001 → λ′ = 121
[Figure: pass/fail (0 to 1) versus score (0 to 100).]
Call for ML bug repository
• like software bug repositories in software engineering
• need data provenance
• which training items (or other things) were wrong
• what they should be
References
• Xuezhou Zhang, Xiaojin Zhu, and Stephen Wright. Training set debugging using trusted items. AAAI 2018.
• Gabriel Cadamuro, Ran Gilad-Bachrach, and Xiaojin Zhu. Debugging machine learning models. ICML Workshop on Reliable Machine Learning in the Wild, 2016.
• Shalini Ghosh, Patrick Lincoln, Ashish Tiwari, and Xiaojin Zhu. Trusted machine learning for probabilistic models. –
• http://www.cs.wisc.edu/~jerryzhu/machineteaching/