Stat 306 - happydog · Stat 306: Finding Relationships in Data. Lecture 15, Sections 4.1 and 4.2
Stat 306: Finding Relationships in Data.
Lecture 15, Sections 4.1 and 4.2
Chapter 4 – Variable selection and additional diagnostics
4.1 Variable Selection algorithms
4.2 Cross-validation and out-of-sample assessment
4.3 Additional diagnostics
4.4 Transforms and nonlinearity
4.5 Diagnostics for data collected sequentially in time
Four categories of scientific study

| | Observational | Experimental |
|---|---|---|
| Goal is Explanation | 1. | 2. |
| Goal is Prediction | 3. | 4. |
Goal is Explanation
1. What questions do you want to ask?
2. Define an appropriate model.
3. Define the hypotheses that correspond to the questions of interest.
4. Collect the data.
5. Fit the model as defined earlier.
6. Answer your questions with uncertainty quantification (i.e., with p-values and confidence intervals).
Goal is Prediction
1. What do you want to predict?
2. Define an appropriate metric for evaluating the quality of predictions (e.g., RMSE, absolute prediction error, ROC curve).
3. Collect the data.
4. Separate your data into “train” and “holdout” subsets.
5. Fit many different models to the “train” subset of the data.
6. Pick the model that is “best” (according to your chosen metric) for making predictions on the “holdout” subset of the data.
7. Note that p-values and confidence intervals are not valid here, because the model was selected using the data.
Goal is Prediction… but you also want some explanations (warning, this is a bit outdated)
1. Collect the data.
2. Select a “model-selection” criterion (e.g., adjusted R² or Cp).
3. Identify all possible regression models with all possible combinations of the predictors.
4. Identify a subset of models that are best in terms of the chosen “model-selection” criterion.
5. Evaluate and refine the models identified in Step 4 by doing residual analyses and transformations, and by checking model assumptions.
6. Pick a “best” model from the refined subset of models that meets the assumptions and allows you to do some explanation.
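Steps 3 and 4 above (score every possible combination of predictors with a model-selection criterion) can be sketched in Python with NumPy. This is a minimal illustration, not the course's code: the function names `fit_sse` and `all_subsets`, the simulated data, and the convention that p counts the intercept are assumptions. Adjusted R² is computed as 1 − [SSE/(n − p)]/[SST/(n − 1)] and Mallows' Cp as SSE/σ̂² − n + 2p, with σ̂² estimated from the model containing every candidate predictor.

```python
import itertools

import numpy as np


def fit_sse(X, y):
    """Least-squares fit of y on X plus an intercept; return (SSE, number of parameters)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), design.shape[1]


def all_subsets(X, y):
    """Score every non-empty subset of X's columns by adjusted R^2 and Mallows' Cp."""
    n, k = X.shape
    sst = float(np.sum((y - y.mean()) ** 2))
    sse_full, p_full = fit_sse(X, y)
    sigma2_full = sse_full / (n - p_full)  # error variance from the full model
    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse, p = fit_sse(X[:, list(cols)], y)
            adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
            cp = sse / sigma2_full - n + 2 * p
            results.append({"cols": cols, "adj_r2": adj_r2, "cp": cp})
    return results


# Simulated example (hypothetical data): only columns 0 and 2 actually affect y.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=n)

results = all_subsets(X, y)
best_adj_r2 = max(results, key=lambda r: r["adj_r2"])  # larger is better
best_cp = min(results, key=lambda r: r["cp"])          # smaller is better
print(len(results), best_adj_r2["cols"], best_cp["cols"])
```

With k candidate predictors the double loop fits 2^k − 1 models, which is exactly why this exhaustive search stops being practical as k grows.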
4.1 Variable Selection algorithms
• Even with a small number of possible covariates, there are a lot of possible models one could fit.
• And think about all the possible interaction terms!
• This can make fitting and comparing every possible model almost impossible.
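To make these bullets concrete, here is a rough count, assuming each of k candidate predictors is simply in or out of the model (the intercept is always included) and, for the interaction case, that every pairwise interaction may enter on its own (ignoring the hierarchy rules that usually tie an interaction to its main effects):

$$
\#\{\text{main-effects models}\} = 2^{k}, \qquad k = 10:\; 2^{10} = 1024
$$

$$
\#\{\text{models with pairwise interactions}\} = 2^{\,k + \binom{k}{2}}, \qquad k = 10:\; 2^{10 + 45} = 2^{55} \approx 3.6 \times 10^{16}
$$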
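The Cp statistic, a “model-selection” criterion
The Cp statistic and the adjusted R² are very similar: both reward goodness of fit while penalizing the number of parameters in the model.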
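The slide's formula did not survive extraction, so what follows is the standard textbook form, stated under the assumption that p counts all regression parameters including the intercept (some texts let p count only the predictors, which shifts the penalty to 2(p + 1)). Here n is the sample size, SSE_p is the residual sum of squares of the candidate model, SST is the total sum of squares, and σ̂² is the mean squared error of the full model containing all candidate predictors.

$$
C_p = \frac{\mathrm{SSE}_p}{\hat\sigma^2} - n + 2p,
\qquad
R^2_{\mathrm{adj}} = 1 - \frac{\mathrm{SSE}_p/(n-p)}{\mathrm{SST}/(n-1)}
$$

Both combine a fit term (SSE_p) with a penalty that grows with p: a small Cp, close to p, is preferred, and maximizing R²_adj is equivalent to minimizing SSE_p/(n − p), the candidate model's residual variance estimate.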
1. Forward Selection
- start with one variable, add one variable at a time
2. Backward Elimination
- start with the full model (all potential variables), remove one variable at a time
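The two greedy procedures above can be sketched in Python with NumPy. The slide does not say which criterion decides each step, so this sketch assumes Mallows' Cp (smaller is better), computed as SSE/σ̂² − n + 2p with p counting the intercept and σ̂² taken from the full model. The function names, the stopping rule (stop when no single addition or removal lowers Cp), and the simulated data are illustrative assumptions. Following the slide, forward selection starts from the single best variable rather than from an intercept-only model.

```python
import numpy as np


def sse_of(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def make_cp(X, y):
    """Return a function computing Mallows' Cp for a list of column indices."""
    n, k = X.shape
    sigma2 = sse_of(X, y, range(k)) / (n - (k + 1))  # MSE of the full model

    def cp(cols):
        p = len(cols) + 1  # parameters, including the intercept
        return sse_of(X, y, cols) / sigma2 - n + 2 * p

    return cp


def forward_selection(X, y):
    cp, k = make_cp(X, y), X.shape[1]
    selected = [min(range(k), key=lambda j: cp([j]))]  # start with one variable
    best = cp(selected)
    while len(selected) < k:
        remaining = [j for j in range(k) if j not in selected]
        j_add = min(remaining, key=lambda j: cp(selected + [j]))
        new = cp(selected + [j_add])
        if new >= best:
            break  # no single addition lowers Cp
        selected, best = selected + [j_add], new
    return sorted(selected), best


def backward_elimination(X, y):
    cp, k = make_cp(X, y), X.shape[1]
    selected = list(range(k))  # start with the full model
    best = cp(selected)
    while len(selected) > 1:
        j_drop = min(selected, key=lambda j: cp([c for c in selected if c != j]))
        new = cp([c for c in selected if c != j_drop])
        if new >= best:
            break  # no single removal lowers Cp
        selected, best = [c for c in selected if c != j_drop], new
    return sorted(selected), best


# Simulated example (hypothetical data): only columns 0 and 2 actually affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=100)
print(forward_selection(X, y))
print(backward_elimination(X, y))
```

Each step fits at most k models instead of all 2^k, which is the point of these algorithms; the price is that a greedy path can miss the best subset, and the two directions need not agree.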
4.2 Train/Test
Goal is Prediction
1. What do you want to predict?
2. Define an appropriate metric for evaluating the quality of predictions (e.g., RMSE, absolute prediction error, ROC curve).
3. Collect the data.
4. Separate your data into “train” and “holdout” subsets.
5. Fit many different models to the “train” subset of the data.
6. Pick the model that is “best” (according to your chosen metric) for making predictions on the “holdout” subset of the data.
7. Note that p-values and confidence intervals are not valid here, because the model was selected using the data.
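Steps 4 to 6 above can be sketched in Python with NumPy. The details are assumptions for illustration, not from the slides: simulated data, a 70/30 random split, ordinary least squares candidate models defined by hypothetical column subsets, and RMSE as the chosen metric.

```python
import numpy as np


def rmse(y, yhat):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def fit_predict(X_tr, y_tr, X_te, cols):
    """Fit OLS (intercept + chosen columns) on train; return predictions for X_te."""
    A_tr = np.column_stack([np.ones(len(y_tr))] + [X_tr[:, j] for j in cols])
    A_te = np.column_stack([np.ones(len(X_te))] + [X_te[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_te @ beta


# Hypothetical data: only column 0 affects y.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + rng.normal(size=n)

# Step 4: random split into "train" (70%) and "holdout" (30%).
idx = rng.permutation(n)
train, hold = idx[:140], idx[140:]

# Step 5: fit several candidate models (column subsets) to the train subset only.
candidates = {"x0": [0], "x0+x1": [0, 1], "x0+x1+x2": [0, 1, 2], "x1+x2": [1, 2]}
scores = {
    name: rmse(y[hold], fit_predict(X[train], y[train], X[hold], cols))
    for name, cols in candidates.items()
}

# Step 6: pick the model with the smallest holdout RMSE.
best = min(scores, key=scores.get)
print(scores, best)
```

The holdout rows are never used for fitting, so each score estimates how well that model predicts new data.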
4.2 Cross-validation
Goal is Prediction
1. What do you want to predict?
2. Define an appropriate metric for evaluating the quality of predictions (e.g., RMSE, absolute prediction error, ROC curve).
3. Collect the data.
4. Separate your data into K random subsets.
5. For k = 1, …, K:
- Fit your model using all the data except the kth subset.
- Calculate the metric (e.g., prediction error) by using that fitted model to predict the kth subset of the data.
6. Calculate the average of the K metrics for each model.
7. Choose the “best model” based on the averaged metric.
8. Note that p-values and confidence intervals are not valid here, because the model was selected using the data.
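The K-fold loop in steps 4 to 7 can be sketched in Python with NumPy. Assumptions for illustration: OLS candidate models given by hypothetical column subsets, mean absolute prediction error as the metric, K = 5, and simulated data. Every candidate is scored on the same folds (same `seed`), so differences in the averaged metric reflect the models rather than the random split.

```python
import numpy as np


def design(X, cols):
    """Intercept column plus the chosen predictor columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])


def kfold_cv(X, y, cols, K=5, seed=0):
    """Return (average, per-fold list) of held-out mean absolute prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)  # step 4: K random subsets
    errors = []
    for k in range(K):  # step 5: loop over the folds
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        beta, *_ = np.linalg.lstsq(design(X[train], cols), y[train], rcond=None)
        pred = design(X[test], cols) @ beta  # predict the held-out kth subset
        errors.append(float(np.mean(np.abs(y[test] - pred))))
    return float(np.mean(errors)), errors  # step 6: average of the K metrics


# Hypothetical data: only column 1 affects y.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=150)

candidates = {"x1": [1], "x0+x1": [0, 1], "x0+x2": [0, 2]}
cv_scores = {name: kfold_cv(X, y, cols)[0] for name, cols in candidates.items()}
best = min(cv_scores, key=cv_scores.get)  # step 7: smallest averaged error
print(cv_scores, best)
```

Compared with a single train/holdout split, every observation is used for evaluation exactly once, at the cost of fitting each model K times.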
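[Figure: 5-fold cross-validation]
For each model, we do 5-fold CV:
Metric: Mean Absolute Prediction Error for each of the 5 folds: 12, 8, 6, 9, 5
K-averaged metric = (12 + 8 + 6 + 9 + 5)/5 = 40/5 = 8
Source: http://blog.goldenhelix.com/goldenadmin/cross-validation-for-genomic-prediction-in-svs/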
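The K-averaged metric on this slide is simply the mean of the five per-fold errors; in Python:

```python
# Mean absolute prediction error for each of the 5 folds, as listed on the slide.
fold_mae = [12, 8, 6, 9, 5]
k_averaged = sum(fold_mae) / len(fold_mae)
print(k_averaged)  # 8.0
```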
4.2 Leave-one-out
Goal is Prediction