Submit Predictions
![Page 1: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/1.jpg)
• Goal: Predict who survived the Titanic disaster
• Hypotheses: Women and children first
• Get Data: Read the dataset into Excel, R, etc.
• Data Management: Some Age data is missing, so analyze gender only
• Statistics & Analysis: Survival rate of 74% for women vs. 19% for men
• Submit Predictions: 320 / 418 correct ≈ 76.6%
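A minimal sketch of the gender-only approach in Python, assuming the training data has already been reduced to (sex, survived) pairs with survived coded 0/1. The function names are hypothetical, and the rows are made-up toy data rather than the real Kaggle file, so the printed rates are not the slide's 74% / 19%.

```python
from collections import defaultdict

def survival_rate_by_sex(rows):
    """Return {sex: fraction survived} for (sex, survived) pairs, survived in {0, 1}."""
    counts = defaultdict(lambda: [0, 0])  # sex -> [survivors, total]
    for sex, survived in rows:
        counts[sex][0] += survived
        counts[sex][1] += 1
    return {sex: s / n for sex, (s, n) in counts.items()}

def predict_by_sex(sex):
    """Gender-only rule: predict that every female survives and every male does not."""
    return 1 if sex == "female" else 0

def accuracy(predictions, actual):
    """Fraction of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)

# Hypothetical toy rows, not the real Titanic data.
train = [("female", 1), ("female", 1), ("female", 0), ("female", 1),
         ("male", 0), ("male", 0), ("male", 1), ("male", 0)]
print(survival_rate_by_sex(train))  # {'female': 0.75, 'male': 0.25}

preds = [predict_by_sex(sex) for sex, _ in train]
print(accuracy(preds, [y for _, y in train]))  # 0.75

# The slide's leaderboard arithmetic: 320 correct out of 418 test passengers.
print(round(320 / 418, 4))  # 0.7656
```

The rule ignores Age entirely, which matches the "analyze gender only" decision; the same `accuracy` helper works unchanged once richer rules are tried.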
![Page 2: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/2.jpg)
Predictor Variables

| Variable | Description | Type | Hypothesis |
|---|---|---|---|
| pclass | Passenger Class | Categorical, Ordinal | 1st class vs. 3rd class |
| name | Name | Text | |
| Sex | Sex | Categorical | |
| age | Age | Numeric | |
| sibsp | Number of Siblings/Spouses Aboard | Integer | |
| parch | Number of Parents/Children Aboard | Integer | |
| ticket | Ticket Number | Text | |
| fare | Passenger Fare | Numeric | |
| cabin | Cabin | Text | |
| embarked | Port of Embarkation | Categorical | |
![Page 3: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/3.jpg)
Age (all passengers): N = 891; missing N = 177; with data N = 714
[Figure: histogram of Age (0 to 90 years), count (0 to 20), Survived vs. Not survived]
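The Age counts above amount to partitioning the rows by whether Age is present (177 + 714 = 891). A small sketch, assuming (age, survived) pairs with `None` standing in for a missing age; the function names and rows are hypothetical.

```python
def split_by_missing_age(rows):
    """Partition (age, survived) rows into those with an age value and those without."""
    present = [(age, s) for age, s in rows if age is not None]
    missing = [(age, s) for age, s in rows if age is None]
    return present, missing

def survival_rate(rows):
    """Fraction of rows with survived == 1; None for an empty group."""
    return sum(s for _, s in rows) / len(rows) if rows else None

# Hypothetical toy rows; None marks a missing age.
rows = [(22, 0), (38, 1), (None, 1), (35, 1), (None, 0), (54, 0), (2, 0), (None, 0)]
present, missing = split_by_missing_age(rows)
print(len(rows), len(missing), len(present))  # 8 3 5
print(survival_rate(present), survival_rate(missing))
```

Comparing the two survival rates is a quick check of whether "missing Age" itself carries information before deciding to drop Age or to fill it in.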
![Page 4: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/4.jpg)
Decision Trees
• Dependent variable (Y): continuous or categorical
• Independent variables (X's): continuous or categorical
• The decision tree looks for the split of the sample at a node that leads to the most differentiation on Y
Example split on Survived: Age less than X vs. Age greater than X
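One way to read "most differentiation on Y" for a single numeric X is to try every candidate threshold and keep the one with the largest gap in survival rate between the two sides. This is an illustrative assumption, not necessarily the criterion the slides used; CART-style implementations usually score splits with an impurity measure such as deviance or Gini. The `min_size` guard and all names are hypothetical.

```python
def best_age_split(rows, min_size=1):
    """Return (threshold, delta) maximizing |rate(age < X) - rate(age >= X)|.

    rows: (age, survived) pairs with no missing ages.
    min_size: hypothetical guard so that neither side of the split is tiny.
    """
    best = None
    for x in sorted({age for age, _ in rows}):
        left = [s for age, s in rows if age < x]
        right = [s for age, s in rows if age >= x]
        if len(left) < min_size or len(right) < min_size:
            continue  # skip splits that leave a side too small
        delta = abs(sum(left) / len(left) - sum(right) / len(right))
        if best is None or delta > best[1]:
            best = (x, delta)
    return best

# Hypothetical toy rows: young children survive, most adults do not.
rows = [(2, 1), (4, 1), (6, 1), (10, 0), (30, 0), (40, 1), (50, 0), (60, 0)]
print(best_age_split(rows))  # (10, 0.8)
print(best_age_split(rows, min_size=4))  # (30, 0.5)
```

A raw rate gap rewards splits that isolate a handful of passengers, which is why the minimum group size matters; impurity measures weight each side by its size instead.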
![Page 5: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/5.jpg)
Age
[Figure: two charts by Age. First: x-axis 1 to 15, y-axis 0% to 100% with a secondary axis 0 to 50; series A, B, Delta, N. Second: x-axis 0 to 100, y-axis 0 to 20]
![Page 6: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/6.jpg)
Decision Trees
• Splits are chosen to maximize the data likelihood (equivalently, to minimize the deviance).
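For a 0/1 outcome, the deviance of a node is -2 times its binomial log-likelihood when the node predicts its own survival rate, so a lower total deviance across the child nodes means a higher likelihood. A minimal sketch with hypothetical function names; many implementations also offer Gini impurity as an alternative criterion.

```python
import math

def node_deviance(ys):
    """Binomial deviance (-2 x log-likelihood) of 0/1 outcomes, using the node's own survival rate."""
    n, k = len(ys), sum(ys)
    dev = 0.0
    for count in (k, n - k):
        if count > 0:  # a term with count 0 contributes 0 (0 * log 0 is taken as 0)
            dev -= 2 * count * math.log(count / n)
    return dev

def split_deviance(left, right):
    """Total deviance of a split: the sum of the two child nodes' deviances."""
    return node_deviance(left) + node_deviance(right)

parent = [1, 1, 1, 0, 0, 0]
print(node_deviance(parent))                 # 12 * ln 2, about 8.318
print(split_deviance([1, 1, 1], [0, 0, 0]))  # 0.0: a perfect split
print(split_deviance([1, 1, 0], [1, 0, 0]))  # about 7.64: a much weaker split
```

The split to prefer is the one with the largest drop from the parent's deviance to the split's total deviance.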
![Page 7: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/7.jpg)
Prediction and Missing Values
| Variable | Description |
|---|---|
| pclass | Passenger Class |
| name | Name |
| Sex | Sex |
| age | Age |
| sibsp | Number of Siblings/Spouses Aboard |
| parch | Number of Parents/Children Aboard |
| ticket | Ticket Number |
| fare | Passenger Fare |
| cabin | Cabin |
| embarked | Port of Embarkation |
Correlation, Association of Age with other Variables?
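If Age is associated with variables that are always present, those variables could help predict or fill in the missing ages. A hand-rolled Pearson correlation sketch, assuming records are dicts with `None` for a missing age; the names and toy records are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def age_correlations(records, columns):
    """Correlate age with each numeric column, using only records where age is present."""
    rows = [r for r in records if r["age"] is not None]
    ages = [r["age"] for r in rows]
    return {c: pearson(ages, [r[c] for r in rows]) for c in columns}

# Hypothetical toy records; None marks a missing age.
records = [
    {"age": 50, "pclass": 1, "fare": 80.0},
    {"age": 40, "pclass": 1, "fare": 70.0},
    {"age": None, "pclass": 3, "fare": 8.0},
    {"age": 35, "pclass": 2, "fare": 20.0},
    {"age": 30, "pclass": 2, "fare": 15.0},
    {"age": 22, "pclass": 3, "fare": 7.0},
    {"age": 20, "pclass": 3, "fare": 7.5},
]
print(age_correlations(records, ["pclass", "fare"]))
```

Treating pclass codes 1/2/3 as numbers is only a rough measure for an ordinal variable, and a column with zero variance would make `pearson` divide by zero.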
![Page 8: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/8.jpg)
![Page 9: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/9.jpg)
Gender
![Page 10: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/10.jpg)
Gender and Age
• The tree grows by optimizing only the split at the current node, rather than optimizing the entire tree
• The tree stops growing when a further split becomes ineffective
[Figure: Female Survival% (0% to 100%) by Age (0 to 70)]
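A minimal sketch of greedy growth with a stopping rule, assuming deviance as the split score and two hypothetical stopping parameters (`min_gain`, `min_rows`). Each call looks only at the best split for the current node and never revisits earlier splits. All names, the 1 = female encoding, and the rows are hypothetical.

```python
import math

def deviance(ys):
    """Binomial deviance of 0/1 outcomes under the node's own survival rate."""
    n, k = len(ys), sum(ys)
    return -2 * sum(c * math.log(c / n) for c in (k, n - k) if c > 0)

def best_split(rows, features):
    """Greedy step: the (feature, threshold, gain) that most reduces deviance at this node only."""
    parent = deviance([y for _, y in rows])
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] < t]
            right = [y for x, y in rows if x[f] >= t]
            if not left or not right:
                continue
            gain = parent - deviance(left) - deviance(right)
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best

def grow(rows, features, min_gain=1.0, min_rows=2):
    """Grow the tree one node at a time; stop when a node is small or no split gains enough."""
    ys = [y for _, y in rows]
    leaf = {"predict": int(2 * sum(ys) >= len(ys)), "n": len(ys)}  # majority vote
    if len(rows) < min_rows:
        return leaf
    split = best_split(rows, features)
    if split is None or split[2] < min_gain:
        return leaf  # a further split would be ineffective
    f, t, _ = split
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r in rows if r[0][f] < t], features, min_gain, min_rows),
        "right": grow([r for r in rows if r[0][f] >= t], features, min_gain, min_rows),
    }

def predict(tree, x):
    """Follow the splits down to a leaf and return its prediction."""
    while "predict" not in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["predict"]

# Hypothetical toy rows; sex is encoded 1 = female, 0 = male.
rows = [
    ({"sex": 1, "age": 30}, 1), ({"sex": 1, "age": 45}, 1),
    ({"sex": 1, "age": 8}, 1), ({"sex": 1, "age": 50}, 1),
    ({"sex": 0, "age": 5}, 1), ({"sex": 0, "age": 7}, 1),
    ({"sex": 0, "age": 25}, 0), ({"sex": 0, "age": 40}, 0),
    ({"sex": 0, "age": 60}, 0),
]
tree = grow(rows, ["sex", "age"])
print(tree)
```

On these toy rows the root splits on sex, the male branch then splits on age at 25, and the all-female branch stops immediately because no split reduces its deviance.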
![Page 11: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/11.jpg)
Prediction: Gender + Age
![Page 12: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/12.jpg)
• Goal: Predict who survived the Titanic disaster
• Hypotheses: Women and children first
• Get Data: Read the dataset into Excel, R, etc.
• Data Management: Some Age data is missing, so analyze gender only
• Statistics & Analysis
• Submit Predictions
![Page 13: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/13.jpg)
• Goal: Predict who survived the Titanic disaster
• Hypotheses: Women and children first
• Get Data: Read the dataset into Excel, R, etc.
• Data Management: Age + Gender
• Statistics & Analysis
• Submit Predictions
![Page 14: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/14.jpg)
Kitchen Sink
![Page 15: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/15.jpg)
Kitchen Sink
![Page 16: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/16.jpg)
Decision Trees
• Popular implementations:
  • CART: Classification And Regression Tree
  • CHAID: CHi-squared Automatic Interaction Detector
• CHAID allows multiway splits (a wider tree); CART uses binary splits
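CHAID scores a candidate multiway split with a chi-squared test of independence between the split variable's categories and the outcome. The sketch below computes only that core statistic for one contingency table; it is not the full CHAID procedure (which also merges similar categories and adjusts p-values). The counts are hypothetical, one row per category of a three-level variable such as pclass.

```python
import math

def chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: one row per category, columns = [survived, died].
three_way = [[30, 10], [20, 20], [10, 30]]
stat = chi_square(three_way)
df = (len(three_way) - 1) * (len(three_way[0]) - 1)  # degrees of freedom = 2
p_value = math.exp(-stat / 2)  # chi-squared upper tail; this closed form holds only for df = 2
print(stat, df)  # 20.0 2
print(p_value)
```

A three-branch split like this one is what CHAID can keep as-is, whereas CART would instead compare binary groupings of the categories, such as first class vs. second and third together.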
![Page 17: Submit Predictions](https://reader036.vdocuments.net/reader036/viewer/2022062310/56815d93550346895dcbac9c/html5/thumbnails/17.jpg)