201ab Quantitative Methods
Resampling: Cross-Validation
Ed Vul
Resampling
Using our existing data to generate possible samples, and thus obtain a sampling distribution:

- Bootstrap: of a statistic, for confidence intervals.
- Randomization: under the null, for NHST.
- Cross-validation: for prediction.
The problem: overfitting
[Figure: scatterplot of the data, y against x.]
The problem: overfitting
A 9th-order polynomial fit to 10 data points:

[Figure: the 10 data points with the 9th-order polynomial fit overlaid, y against x.]
- Complex models can fit weird patterns.
- They will fit noise, not just signal.
- Fitting noise yields terrible prediction performance, even though the "fit" to observed data looks very good (see the sketch below).
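A minimal sketch of this (with hypothetical data; the `demo` data frame, seed, and linear signal are mine, not from the slides): a 9th-order polynomial has 10 coefficients, so it can pass through all 10 points exactly.

```r
# Hypothetical data: a linear signal plus noise.
set.seed(1)
demo <- data.frame(x = seq(0, 1, length.out = 10))
demo$y <- 2 * demo$x + rnorm(10)

# Ten coefficients, ten points: the training "fit" is perfect,
# even though most of what is being fit is noise.
M <- lm(y ~ poly(x, 9), data = demo)
sum(residuals(M)^2)   # essentially 0
```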
Overfitting yields worse prediction error
[Figure: y against x, illustrating the fitted model's prediction error.]
Overfitting yields worse prediction error

[Figure: y against x, a second illustration of the model's prediction error.]
The problem: overfitting
We want to...

- know how well our model will predict new data, not just how well it fits the observed data/noise.
- pick models that will predict new data well, and not overfit.
But we obviously have not yet seen future data.
Solution: Hold out / validation data
- Use part of the existing data as though we have not seen it: split the data into two sets:
  - training: used to fit the model
  - test ("holdout"): used to evaluate the model
- Doing this once is OK if we have a lot of data, so that both training and test sets can be big even after the split.
- With little data, we will have lots of variability in the evaluation.
Cross-validation
We will do the hold-out process a bunch of times on the same data, to try to reduce the noise in our test-set performance.

This gives us a better estimate of prediction accuracy for the model class (but not for any one particular set of parameter values!).
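As a sketch of what "a bunch of times" could look like on the 10-point example (assuming `dat` holds the x/y data used in these slides):

```r
# Repeated random hold-out: average test SSE over many 5/5 splits.
reps <- 200
test_sse <- replicate(reps, {
  test_idx <- sample(nrow(dat), 5)                  # 5 test, 5 train
  M <- lm(y ~ poly(x, 3), data = dat[-test_idx, ])
  sum((dat$y[test_idx] - predict(M, dat[test_idx, ]))^2)
})
mean(test_sse)   # less noisy than any single split's test SSE
```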
Hold-out: example
```r
# Alternate rows into training and test sets
dat <- dat %>%
  mutate(use_as = ifelse((1:n()) %% 2 == 1, 'train', 'test'))

training_data = dat %>% filter(use_as == 'train')
test_data = dat %>% filter(use_as == 'test')
```

|    x |     y | use_as |
|-----:|------:|:-------|
| 0.00 | -0.21 | train  |
| 0.11 | -0.93 | test   |
| 0.22 | -0.93 | train  |
| 0.33 |  0.65 | test   |
| 0.44 | -1.06 | train  |
| 0.56 |  0.11 | test   |
| 0.67 |  2.40 | train  |
| 0.78 |  1.29 | test   |
| 0.89 |  0.99 | train  |
| 1.00 |  0.94 | test   |
Hold-out: example
- Fit the model on the training data:

```r
M = lm(data = training_data, y ~ poly(x, 3))
```

- Generate predictions on the test data:

```r
prediction = predict(M, test_data)
```

- Measure prediction error, here as the sum of squared errors:

```r
sum((test_data$y - prediction)^2)
```

```
## [1] 7.616142
```
Train vs. test performance as a function of complexity
| poly.order | train.SSE | test.SSE |
|-----------:|----------:|---------:|
|          1 |      2.52 |     5.86 |
|          2 |      1.88 |    23.67 |
|          3 |      0.94 |   587.04 |
|          4 |      0.00 | 15690.46 |

Note: 10 total data points, splitting into 5 train, 5 test, over and over again. (More on this later.)
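A single-split sketch of how one row of such a table could be computed (the slide repeats the split many times; the `sse` helper is mine, not from the slides):

```r
# Train vs. test SSE for polynomial fits of increasing order,
# using the training_data / test_data split from the previous slide.
sse <- function(y, yhat) sum((y - yhat)^2)
for (p in 1:4) {
  M <- lm(y ~ poly(x, p), data = training_data)
  cat(p,
      sse(training_data$y, predict(M, training_data)),
      sse(test_data$y, predict(M, test_data)), "\n")
}
```

Note that a 4th-order polynomial has 5 coefficients, so it fits the 5 training points exactly (train.SSE of 0.00) while the test SSE explodes.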
Leave-one-out cross-validation
Run the hold-out process n times for n data points: each time, use one data point as the test data and the remaining n-1 data points as training.
Leave-one-out cross-validation
```r
n = nrow(dat)
loo_error = rep(NA, n)
for (i in 1:n) {
  training_data = dat[(1:n)[-i], ]
  test_data = dat[i, ]
  M = lm(data = training_data, y ~ poly(x, 3))
  prediction = predict(M, test_data)
  loo_error[i] = (test_data$y - prediction)^2
}
```
Leave-one-out cross-validation
[Figure: histogram of the squared leave-one-out errors (count vs. error).]

```r
mean(loo_error)
```

```
## [1] 0.9759489
```
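As an aside (not from the slides): for lm-style models, the same leave-one-out estimate can be computed with `boot::cv.glm`, refitting the model as a gaussian glm (equivalent to lm):

```r
library(boot)
G <- glm(y ~ poly(x, 3), data = dat)   # gaussian glm: same fit as lm
cv.glm(dat, G)$delta[1]                # raw LOO estimate of the MSE;
                                       # should match mean(loo_error)
```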
Varieties of cross-validation
- Repeated random sub-sampling (suitable for larger sample sizes and replicates)
- Leave-k-out (LOO: k=1): exhaustive, for small sample sizes
- K-fold (LOO: K=n)

For both fitting and evaluation: nested cross-validation.

There are lots of varieties of error/fit measures, depending on what you are after.
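The slides do not show K-fold code, but a minimal sketch (the fold assignment and variable names are mine) would be:

```r
# K-fold cross-validation: each row is held out exactly once.
K <- 5
fold <- sample(rep(1:K, length.out = nrow(dat)))   # random fold labels
fold_mse <- rep(NA, K)
for (k in 1:K) {
  M <- lm(y ~ poly(x, 3), data = dat[fold != k, ])  # train on K-1 folds
  pred <- predict(M, dat[fold == k, ])              # predict held-out fold
  fold_mse[k] <- mean((dat$y[fold == k] - pred)^2)
}
mean(fold_mse)
```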
Larger-scale example: data
```
## Rows: 251
## Columns: 14
## $ bf.percent <dbl> 12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7...
## $ age        <dbl> 23, 22, 22, 26, 24, 24, 26, 25, 25, 23, 26, 27, 32, 30, ...
## $ weight     <dbl> 154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, ...
## $ height     <dbl> 67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, ...
## $ neck       <dbl> 36.2, 38.5, 34.0, 37.4, 34.4, 39.0, 36.4, 37.8, 38.1, 42...
## $ chest      <dbl> 93.1, 93.6, 95.8, 101.8, 97.3, 104.5, 105.1, 99.6, 100.9...
## $ abdomen    <dbl> 85.2, 83.0, 87.9, 86.4, 100.0, 94.4, 90.7, 88.5, 82.5, 8...
## $ hip        <dbl> 94.5, 98.7, 99.2, 101.2, 101.9, 107.8, 100.3, 97.1, 99.9...
## $ thigh      <dbl> 59.0, 58.7, 59.6, 60.1, 63.2, 66.0, 58.4, 60.0, 62.9, 63...
## $ knee       <dbl> 37.3, 37.3, 38.9, 37.3, 42.2, 42.0, 38.3, 39.4, 38.3, 41...
## $ ankle      <dbl> 21.9, 23.4, 24.0, 22.8, 24.0, 25.6, 22.9, 23.2, 23.8, 25...
## $ bicep      <dbl> 32.0, 30.5, 28.8, 32.4, 32.2, 35.7, 31.9, 30.5, 35.9, 35...
## $ forearm    <dbl> 27.4, 28.9, 25.2, 29.4, 27.7, 30.6, 27.8, 29.0, 31.1, 30...
## $ wrist      <dbl> 17.1, 18.2, 16.6, 18.2, 17.7, 18.8, 17.7, 18.8, 18.2, 19...
```
Larger-scale example: Models

```r
# Fit each model to the full body-fat data (dat)
lm.model  = lm(data = dat, bf.percent ~ .)
svr.model = e1071::svm(bf.percent ~ ., data = dat, cross = 0)
lm2.model = lm(data = dat,
               bf.percent ~ polym(age, weight, height, neck, chest,
                                  abdomen, hip, thigh, knee, ankle,
                                  bicep, forearm, wrist,
                                  raw = T, degree = 2))
```
Leave-50-out random sub-sampling

```r
# Root mean squared error
RMSE = function(true_y, predicted_y) {
  sqrt(mean((predicted_y - true_y)^2))
}

n = nrow(dat)
k = 50
repetitions = 100
errors = rep(NA, repetitions)
for (i in 1:repetitions) {
  # hold out a random 50 rows, train on the remaining 201
  test_idx = sort(sample(n, k, replace = F))
  train_idx = (1:n)[-test_idx]
  test_dat = dat[test_idx, ]
  train_dat = dat[train_idx, ]
  M = lm(data = train_dat, bf.percent ~ .)
  pred_y = predict(M, test_dat)
  errors[i] = RMSE(test_dat$bf.percent, pred_y)
}
```
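To compare the three models on the next slide, the same loop can be wrapped in a function and run once per model. A sketch (the `cv_rmse` wrapper is mine, not from the slides):

```r
# Leave-50-out test RMSEs for an arbitrary fitting function.
cv_rmse <- function(fit_fn, reps = 100, k = 50) {
  n <- nrow(dat)
  sapply(1:reps, function(i) {
    test_idx <- sample(n, k)
    M <- fit_fn(dat[-test_idx, ])
    RMSE(dat$bf.percent[test_idx], predict(M, dat[test_idx, ]))
  })
}
lm.err  <- cv_rmse(function(d) lm(bf.percent ~ ., data = d))
svr.err <- cv_rmse(function(d) e1071::svm(bf.percent ~ ., data = d))
# lm2 works the same way, with the polym() formula
```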
Leave-50-out random sub-sampling: Results
[Figure: log10(MSE) for each model (lm, lm2, svr), colored by type (train.err vs. test.err).]
Resampling: Cross-validation
Goal: estimate prediction accuracy/error on future data without actually having data from the future.

Strategy: repeat many times:

- Split the existing data into training and test sets.
- Fit the model to the training set; evaluate error on the test set.
Resampling: Bootstrap
Goal: quantify the sampling error in some statistic, to get confidence intervals.

Strategy:

- Generate new hypothetical samples of the same size as the existing sample by resampling from it (with replacement!).
- Calculate the statistic on each sample to obtain many samples from the sampling distribution of the statistic.
- Use those to get confidence intervals via the quantile function (sketched below).
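A minimal sketch of that strategy (hypothetical data; not from the slides):

```r
# Bootstrap 95% CI for the mean of a sample x.
x <- rnorm(100)
boot_means <- replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))
```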
Resampling: Randomization
Goal: test a null hypothesis that some structure/regularity does not exist in the data.

Strategy:

- Define a statistic to measure the structure.
- Define a shuffling (sampling without replacement) process that destroys only that structure.
- Repeat many times: statistic(shuffle(data)), to obtain many samples from the distribution of the statistic under the null.
- Calculate a p value from those samples (sketched below).
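A minimal sketch of that strategy (assuming a data frame `dat` with columns x and y; statistic = correlation):

```r
# Randomization test for a null of no x-y relationship:
# shuffling y destroys only the x-y pairing.
observed <- cor(dat$x, dat$y)
null_cors <- replicate(10000, cor(dat$x, sample(dat$y)))
mean(abs(null_cors) >= abs(observed))   # two-tailed p value
```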
Resampling recap
- Randomization: shuffle the data to obtain the sampling distribution of a statistic under the null, and thus test a null hypothesis.
- Bootstrapping: resample the current data (with replacement) to obtain the sampling distribution of a statistic, and thus get a confidence interval.
- Cross-validation: subsample the existing data into training and test sets to estimate prediction performance on unseen data.
Resampling: beware
You have lots of responsibility here: there is lots of room to make a mistake, and the temptation is to check for and catch a mistake only when it is unfavorable.
Questions?