Cross-validation to assess decoder performance: the good, the bad, and the ugly
TRANSCRIPT
Cross-validation to assess decoder performance: the good, the bad, and the ugly
Gaël Varoquaux
https://hal.archives-ouvertes.fr/hal-01332785
Measuring prediction accuracy
To find the best method (computer scientists)
For information mapping = omnibus test (cognitive neuroimaging)
Cross-validation: asymptotically unbiased, non-parametric
1 Some theory
2 Empirical results on brain imaging
1 Some theory
[Diagram: the full data is split into a train set and a test set]
1 Cross-validation: test on independent data
[Diagram: the full data is split into a train set and a test set; a loop repeats the train/validation split]
Measures prediction accuracy
1 Choice of cross-validation strategy: test on independent data
Be robust to confounding dependencies: leave subjects out, or sessions out
Loop: more loops = more data points
Need to balance the error in training the model against the error on the test set
1 Choice of cross-validation strategy: theory
Negative bias (underestimate of performance), decreasing with the size of the training set [Arlot & Celisse 2010, sec. 5.1]
Variance decreases with the size of the test set [Arlot & Celisse 2010, sec. 5.2]
Fraction of data left out: 10–20%; many random splits of the data, respecting the dependency structure (a scikit-learn sketch follows)
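A minimal sketch of this recommendation with scikit-learn, under assumptions: X, y and the per-sample subject labels in groups are hypothetical stand-ins, and the linear SVM is just an illustrative decoder. Many random splits leave out 20% of the subjects, so samples from one subject never straddle the train/test boundary.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

# Hypothetical stand-in data: X (n_samples, n_features), labels y, subject IDs in groups
rng = np.random.RandomState(0)
X = rng.randn(200, 50)
y = rng.randint(0, 2, 200)
groups = np.repeat(np.arange(20), 10)  # 20 subjects, 10 samples each

# 50 random splits, each leaving out 20% of the subjects:
# the split respects the dependency structure of the samples
cv = GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_val_score(LinearSVC(), X, y, groups=groups, cv=cv)
print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))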
1 Tuning hyper-parameters
The computer scientist says: you need to set C in your SVM.
[Plot: training-set and validation-set accuracy as a function of the parameter C, for C from 10^-4 to 10^4]
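A curve of this kind can be reproduced, under assumptions, with scikit-learn's validation_curve; the data here are synthetic and the linear SVM is illustrative, with C spanning 10^-4 to 10^4 as on the slide.

import numpy as np
from sklearn.model_selection import validation_curve, ShuffleSplit
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Synthetic stand-in for the decoding data
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Training and validation accuracy for each value of C
param_range = np.logspace(-4, 4, 9)
train_scores, valid_scores = validation_curve(
    SVC(kernel="linear"), X, y,
    param_name="C", param_range=param_range,
    cv=ShuffleSplit(n_splits=10, test_size=0.2, random_state=0))

for C, tr, va in zip(param_range, train_scores.mean(1), valid_scores.mean(1)):
    print("C=%8.0e  train=%.2f  validation=%.2f" % (C, tr, va))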
1 Nested cross-validation: test on independent data
[Diagram: two loops over the full data; the outer loop splits it into a train set and a test set, and the nested loop splits the train set into a train set and a validation set]
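A minimal sketch of nested cross-validation with scikit-learn (synthetic data; the grid of C values is illustrative): the inner GridSearchCV loop tunes C, and the outer loop measures prediction accuracy on left-out data.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Nested loop: tune C by cross-validation inside the training set
inner_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
tuned_svm = GridSearchCV(SVC(kernel="linear"),
                         param_grid={"C": np.logspace(-4, 4, 9)},
                         cv=inner_cv)

# Outer loop: measure prediction accuracy on left-out test sets
outer_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=1)
scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
print("nested CV accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))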
2 Empirical results on brain imaging
2 Datasets and tasks
7 fMRI datasets (6 from OpenfMRI); Haxby: 5 subjects, 15 intra-subject predictions; inter-subject predictions on the 6 other studies
OASIS VBM, gender discrimination
HCP MEG task, intra-subject, working memory
# samples: ∼200 (min 80, max 400); accuracy: min 62%, max 96%
2 Experiment 1: measuring cross-validation error
Leave out a large validation set
Measure error by cross-validation on the rest
Compare the two
[Diagram: a validation set is held out from the full data; cross-validation (outer and nested loops) runs on the rest]
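A sketch of this experiment on synthetic data (the dataset and the linear SVM are illustrative stand-ins, not the study's actual data): hold out a large validation set, estimate accuracy by cross-validation on the rest, and compare the two numbers.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# Hold out a large validation set, keep ~200 samples for decoding
X_dec, X_val, y_dec, y_val = train_test_split(X, y, train_size=200, random_state=0)

# Accuracy measured by cross-validation on the decoding set
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
cv_acc = cross_val_score(LinearSVC(), X_dec, y_dec, cv=cv).mean()

# Accuracy measured on the held-out validation set
val_acc = LinearSVC().fit(X_dec, y_dec).score(X_val, y_val)
print("cross-validation: %.2f   validation set: %.2f" % (cv_acc, val_acc))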
2 Cross-validated measure versus validation set
[Scatter plot: accuracy measured by cross-validation (y-axis) versus accuracy on the validation set (x-axis), both from 50% to 100%; intra-subject and inter-subject points]
2 Different cross-validation strategies
Difference between accuracy measured by cross-validation and accuracy on the validation set (spread of observed differences, axis from -40% to +40%):

Cross-validation strategy        Intra-subject    Inter-subject
Leave one sample out             -22% to +19%     +3% to +43%
Leave one subject/session out    -10% to +10%     -21% to +17%
20% left out, 3 splits           -11% to +11%     -24% to +16%
20% left out, 10 splits          -9% to +9%       -24% to +14%
20% left out, 50 splits          -9% to +8%       -23% to +13%
2 Simple simulations
[Plots: scatter of X1 versus X2, and X1 over time]
2 Gaussian-separated clouds
Auto-correlated noise
200 decoding samples, 10 000 validation samples ⇒ validation = asymptotics
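A sketch of this simulation under assumptions (the exact class separation and noise parameters of the study are not given here, so the values below are illustrative): two Gaussian-separated classes with temporally auto-correlated noise, 200 samples for decoding and 10 000 for validation, the large validation set playing the role of the asymptotic truth.

import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.svm import LinearSVC
from sklearn.model_selection import ShuffleSplit, cross_val_score

def simulate(n_samples, n_features=2, separation=1.0, smoothing=3, seed=0):
    # Two Gaussian-separated clouds plus auto-correlated (smoothed) noise over time
    rng = np.random.RandomState(seed)
    y = rng.randint(0, 2, n_samples)
    X = separation * np.outer(y, np.ones(n_features))
    noise = gaussian_filter1d(rng.randn(n_samples, n_features), smoothing, axis=0)
    return X + noise, y

X_dec, y_dec = simulate(200, seed=0)       # decoding samples
X_val, y_val = simulate(10_000, seed=1)    # large validation set = asymptotics

cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
cv_acc = cross_val_score(LinearSVC(), X_dec, y_dec, cv=cv).mean()
val_acc = LinearSVC().fit(X_dec, y_dec).score(X_val, y_val)
print("CV estimate: %.2f   asymptotic (validation) accuracy: %.2f" % (cv_acc, val_acc))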
2 Different cross-validation strategies
Difference between accuracy measured by cross-validation and accuracy on the validation set (spread of observed differences, axis from -40% to +40%):

Cross-validation strategy     MEG data         Simulations
Leave one sample out          -16% to +14%     +4% to +33%
Leave one block out           -15% to +13%     -8% to +8%
20% left out, 3 splits        -15% to +12%     -10% to +11%
20% left out, 10 splits       -13% to +10%     -8% to +8%
20% left out, 50 splits       -12% to +10%     -7% to +7%
2 Experiment 2: parameter tuning
Compare different strategies on the validation set:
1. Use the default C = 1
2. Use C = 1000
3. Choose the best C by cross-validation and refit
4. Average the best models found in cross-validation
[Diagram: nested-loop cross-validation with a held-out validation set, as in Experiment 1]
Non-sparse decoders: SVM ℓ2, log-reg ℓ2
Sparse decoders: SVM ℓ1, log-reg ℓ1
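A sketch of the four tuning strategies compared on a held-out validation set (synthetic data; the ℓ2 linear SVM stands in for the non-sparse decoders, and step 4 is a simple sketch of model averaging over the per-split best models, under assumptions about the exact averaging scheme used in the study):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit, train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=100, random_state=0)
X_dec, X_val, y_dec, y_val = train_test_split(X, y, train_size=200, random_state=0)

Cs = np.logspace(-4, 4, 9)
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# 1. and 2.: fixed values of C
for C in (1.0, 1000.0):
    acc = LinearSVC(C=C).fit(X_dec, y_dec).score(X_val, y_val)
    print("C = %g: %.2f" % (C, acc))

# 3. Choose the best C by cross-validation and refit on all decoding data
grid = GridSearchCV(LinearSVC(), {"C": Cs}, cv=cv).fit(X_dec, y_dec)
print("CV + refitting: %.2f" % grid.score(X_val, y_val))

# 4. Model averaging: average the coefficients of the best model of each CV split
# (an illustrative averaging scheme, not necessarily the study's exact procedure)
coefs, intercepts = [], []
for train, test in cv.split(X_dec):
    fits = [LinearSVC(C=C).fit(X_dec[train], y_dec[train]) for C in Cs]
    best = max(fits, key=lambda f: f.score(X_dec[test], y_dec[test]))
    coefs.append(best.coef_)
    intercepts.append(best.intercept_)
avg = LinearSVC().fit(X_dec, y_dec)  # fitted container for the averaged linear model
avg.coef_, avg.intercept_ = np.mean(coefs, axis=0), np.mean(intercepts, axis=0)
print("CV + averaging: %.2f" % avg.score(X_val, y_val))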
2 Cross-validation for tuning?
[Bar plots: impact on prediction accuracy (from -8% to +8%) of each tuning strategy (CV + averaging, CV + refitting, C = 1, C = 1000), for SVM and logistic regression; one panel for non-sparse models, one for sparse models]
@GaelVaroquaux
Cross-validation: lessons learned
Don't use leave-one-out; prefer random 10–20% splits respecting the sample structure
Cross-validation has error bars of ±10%
Cross-validation is inefficient for parameter tuning: C = 1 for SVM-ℓ2, model averaging for SVM-ℓ1
https://hal.archives-ouvertes.fr/hal-01332785
References I
S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.