texas a&m hsc jin is designed by dr. huber. korean female colon cancer risk factors range...
TRANSCRIPT
![Page 1: Texas A&M HSC Jin is designed by Dr. Huber. Korean Female Colon Cancer Risk Factors Range EventNon-event HR95% CIP n%n% Smoking Habits Missing144940079.57407195.70----](https://reader037.vdocuments.net/reader037/viewer/2022103121/56649c755503460f949286ef/html5/thumbnails/1.jpg)
Multiple Imputation with large proportions of missing data: how much is too much?
Texas A&M HSC
Jin is designed by Dr. Huber
Motivations and Examples

Korean Female Colon Cancer

| Risk factor | Range | Event n | Event % | Non-event n | Non-event % | HR | 95% CI lower | 95% CI upper | P |
|---|---|---|---|---|---|---|---|---|---|
| Smoking habits | Missing | 1449400 | 79.57 | 4071 | 95.70 | - | - | - | - |
| | No smoking | 351896 | 19.32 | 93 | 2.19 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Smoked before, but quit | 4611 | 0.25 | 21 | 0.49 | 1.174 | 1.058 | 1.303 | 0.0025 |
| | Currently, 1/2 pack | 8735 | 0.48 | 38 | 0.89 | 0.948 | 0.828 | 1.084 | 0.4339 |
| | Currently, 1/2 to one pack | 5534 | 0.30 | 26 | 0.61 | 0.991 | 0.901 | 1.09 | 0.8457 |
| | Currently, more than one pack | 1410 | 0.08 | 5 | 0.12 | 1.015 | 0.894 | 1.153 | 0.8162 |

☞ Is smoking protective? Not sure, because a huge share of the smoking data is missing!
Background: Types of missing data

The types differ by why the data are missing.

1. Missing Completely At Random (MCAR): missingness depends neither on the observed data nor on the missing data
2. Missing At Random (MAR): missingness depends only on the observed data
3. Not Missing At Random (NMAR): missingness depends on both the observed data and the missing data

The type of missing data affects the effectiveness and bias of methods for handling missing data.
Background: Methods of handling missing data

1. Complete Case Analysis (CCA)
2. Available Case Analysis (ACA)
3. Mean imputation
4. Expectation-Maximization (EM)
5. Multiple Imputation (MI)

Method groups: older methods, single imputation, multiple imputation.

This study considers only CCA and MI.
Background: Methods of handling missing data

1. Complete Case Analysis (CCA)

| Y1 | Y2 | Y3 |
|---|---|---|
| 140 | . | 20 |
| 31 | 25 | . |
| 10 | 35 | 40 |
| 25 | 48 | 57 |
| 30 | 49 | 60 |
| 35 | 55 | 65 |
| 37 | 47 | 70 |
| 140 | 32 | 30 |
| 42 | 65 | 40 |
| 50 | 200 | 20 |

Notes:
1. CCA means not using any method of handling missing data.
2. Deleting cases decreases power (because the sample size is reduced).

Steps:
1. Delete all cases with missing values on Y1, Y2, or Y3
2. Analyze the remaining cases
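The two CCA steps can be sketched in Python on the 10-row example. This is a minimal illustration using only the standard library; the names `rows` and `complete_cases` are chosen here for the example and do not come from the slides, and the mean of Y1 stands in for whatever analysis would actually follow.

```python
from statistics import mean

# The slide's example data as (Y1, Y2, Y3); None marks a missing value (".").
rows = [
    (140, None, 20),
    (31, 25, None),
    (10, 35, 40),
    (25, 48, 57),
    (30, 49, 60),
    (35, 55, 65),
    (37, 47, 70),
    (140, 32, 30),
    (42, 65, 40),
    (50, 200, 20),
]


def complete_cases(data):
    """Step 1: keep only the rows with no missing value in any column."""
    return [row for row in data if all(v is not None for v in row)]


cc = complete_cases(rows)

# Step 2: analyze the remaining cases (here, simply the mean of Y1).
y1_mean_cc = mean(row[0] for row in cc)

print(len(rows), "rows before deletion,", len(cc), "rows after")
print("Mean of Y1 from complete cases:", y1_mean_cc)
```

Two of the ten rows are dropped even though each is missing only one value, which is the loss of sample size the slide warns about.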
Background: Methods of handling missing data

2. Multiple Imputation (MI)

MI has 3 steps:
(1) Imputation step
(2) Analysis step
(3) Combination step
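Background: Methods of handling missing data

2. MI (1) Imputation step

Original data (. = missing):

| Obs | Y | X1 | X2 |
|---|---|---|---|
| 1 | 44 | 11 | 178 |
| 2 | 45 | 10 | 185 |
| 3 | 59 | . | . |
| 4 | 49 | 9 | . |
| 5 | 60 | 8 | 170 |
| 6 | 50 | . | 44 |
| 7 | 11 | 176 | . |
| 8 | 10 | 49 | 8 |
| 9 | 170 | 50 | . |

“5 complete datasets” after imputation:

| Obs | Imputation number | Y | X1 | X2 |
|---|---|---|---|---|
| 1 | 1 | 44 | 11 | 178 |
| 2 | 1 | 45 | 10 | 185 |
| 3 | 1 | 59 | 16.51 | 136.48 |
| 4 | 1 | 49 | 9 | 179.59 |
| 5 | 1 | 60 | 8 | 170 |
| 6 | 1 | 50 | 38.40 | 44 |
| 7 | 1 | 11 | 176 | -608.57 |
| 8 | 1 | 10 | 49 | 8 |
| 9 | 1 | 170 | 50 | -88.94 |
| 10 | 2 | 44 | 11 | 178 |
| 11 | 2 | 45 | 10 | 185 |
| 12 | 2 | 59 | 63.99 | -98.96 |
| 13 | 2 | 49 | 9 | 192.37 |
| 14 | 2 | 60 | 8 | 170 |
| 15 | 2 | 50 | 38.49 | 44 |
| 16 | 2 | 11 | 176 | -644.26 |
| 17 | 2 | 10 | 49 | 8 |
| 18 | 2 | 170 | 50 | -97.00 |
| 19 | 3 | 44 | 11 | 178 |
| 20 | 3 | 45 | 10 | 185 |
| 21 | 3 | 59 | 63.88 | -121.12 |
| 22 | 3 | 49 | 9 | 185.82 |
| 23 | 3 | 60 | 8 | 170 |
| 24 | 3 | 50 | 33.65 | 44 |
| 25 | 3 | 11 | 176 | -665.12 |
| 26 | 3 | 10 | 49 | 8 |
| 27 | 3 | 170 | 50 | -189.96 |
| 28 | 4 | 44 | 11 | 178 |
| 29 | 4 | 45 | 10 | 185 |
| 30 | 4 | 59 | -42.87 | 458.60 |
| 31 | 4 | 49 | 9 | 179.07 |
| 32 | 4 | 60 | 8 | 170 |
| 33 | 4 | 50 | 33.60 | 44 |
| 34 | 4 | 11 | 176 | -706.87 |
| 35 | 4 | 10 | 49 | 8 |
| 36 | 4 | 170 | 50 | -212.18 |
| 37 | 5 | 44 | 11 | 178 |
| 38 | 5 | 45 | 10 | 185 |
| 39 | 5 | 59 | 1.64 | 213.94 |
| 40 | 5 | 49 | 9 | 182.08 |
| 41 | 5 | 60 | 8 | 170 |
| 42 | 5 | 50 | 33.16 | 44 |
| 43 | 5 | 11 | 176 | -720.92 |
| 44 | 5 | 10 | 49 | 8 |
| 45 | 5 | 170 | 50 | -222.16 |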
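The idea of turning one incomplete dataset into M completed copies can be sketched in Python. This is a simplified stochastic regression imputation and is an assumption made for illustration, not the procedure that produced the imputed values on the slide: each missing X is drawn around its least-squares prediction from Y, and the regression parameters are not themselves redrawn as a full MI procedure would do. The helper names (`fit_line`, `impute_once`, `impute`) are hypothetical.

```python
import random
from statistics import mean

# Slide data as (Y, X1, X2); None marks a missing value (".").
data = [
    (44, 11, 178), (45, 10, 185), (59, None, None), (49, 9, None),
    (60, 8, 170), (50, None, 44), (11, 176, None), (10, 49, 8),
    (170, 50, None),
]


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def impute_once(rows, rng):
    """Return one completed copy; observed values are left untouched."""
    completed = [list(r) for r in rows]
    for col in (1, 2):  # X1, then X2, each predicted from Y (column 0)
        observed = [(r[0], r[col]) for r in rows if r[col] is not None]
        a, b, sd = fit_line([o[0] for o in observed], [o[1] for o in observed])
        for r in completed:
            if r[col] is None:
                # Prediction plus random noise, so each copy differs.
                r[col] = rng.gauss(a + b * r[0], sd)
    return completed


def impute(rows, m, seed=0):
    """Create m completed datasets from one incomplete dataset."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(m)]


datasets = impute(data, m=5)
for number, completed in enumerate(datasets, start=1):
    print("Imputation", number, "row 3:", [round(v, 2) for v in completed[2]])
```

Because noise is added to every draw, the same missing cell receives a different value in each copy; that spread is what the combination step later turns into between-imputation variance.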
Background: Methods of handling missing data

2. MI (2) Analysis step

| Obs | Imputation number | Label of model | Type of statistics | Variable names for rows of estimated COV | Dependent variable | Root mean squared error | Intercept | X1 | X2 | Y |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | MODEL1 | PARMS | | Y | 9.49 | 417.91 | -7.96 | -1.64 | -1 |
| 2 | 1 | MODEL1 | COV | Intercept | Y | 9.49 | 722.00 | -15.61 | -3.26 | . |
| 3 | 1 | MODEL1 | COV | X1 | Y | 9.49 | -15.61 | 0.34 | 0.07 | . |
| 4 | 1 | MODEL1 | COV | X2 | Y | 9.49 | -3.26 | 0.07 | 0.02 | . |
| 5 | 2 | MODEL1 | PARMS | | Y | 11.80 | 405.16 | -7.81 | -1.53 | -1 |
| 6 | 2 | MODEL1 | COV | Intercept | Y | 11.80 | 1052.74 | -23.16 | -4.60 | . |
| 7 | 2 | MODEL1 | COV | X1 | Y | 11.80 | -23.16 | 0.52 | 0.10 | . |
| 8 | 2 | MODEL1 | COV | X2 | Y | 11.80 | -4.60 | 0.10 | 0.02 | . |
| 9 | 3 | MODEL1 | PARMS | | Y | 3.86 | 233.43 | -4.31 | -0.80 | -1 |
| 10 | 3 | MODEL1 | COV | Intercept | Y | 3.86 | 28.82 | -0.66 | -0.12 | . |
| 11 | 3 | MODEL1 | COV | X1 | Y | 3.86 | -0.66 | 0.02 | 0.00 | . |
| 12 | 3 | MODEL1 | COV | X2 | Y | 3.86 | -0.12 | 0.00 | 0.00 | . |
| 13 | 4 | MODEL1 | PARMS | | Y | 1.76 | 221.04 | -4.17 | -0.74 | -1 |
| 14 | 4 | MODEL1 | COV | Intercept | Y | 1.76 | 5.20 | -0.12 | -0.02 | . |
| 15 | 4 | MODEL1 | COV | X1 | Y | 1.76 | -0.12 | 0.00 | 0.00 | . |
| 16 | 4 | MODEL1 | COV | X2 | Y | 1.76 | -0.02 | 0.00 | 0.00 | . |
| 17 | 5 | MODEL1 | PARMS | | Y | 1.46 | 215.80 | -4.08 | -0.71 | -1 |
| 18 | 5 | MODEL1 | COV | Intercept | Y | 1.46 | 3.36 | -0.08 | -0.01 | . |
| 19 | 5 | MODEL1 | COV | X1 | Y | 1.46 | -0.08 | 0.00 | 0.00 | . |
| 20 | 5 | MODEL1 | COV | X2 | Y | 1.46 | -0.01 | 0.00 | 0.00 | . |

* Standard statistical procedure: a regression is run on each of the 5 complete datasets separately, so the data are analyzed 5 times.
Background: Methods of handling missing data

2. MI (3) Combination step

> The results from the 5 datasets are combined into ONE result with the combination equations, which give:
1. Combined estimate
2. Total variance
3. Within-imputation variance
4. Between-imputation variance
5. Degrees of freedom (DF)
6. Fraction of missing information
7. Confidence interval
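The quantities in the list above are commonly computed with Rubin's rules. The sketch below assumes those standard rules are what the slide's combination equations refer to. It pools the X1 coefficient and its variance as reported for the five imputations in the analysis-step table; the function name `pool` and the dictionary keys are chosen here for illustration.

```python
from statistics import mean, variance


def pool(estimates, variances):
    """Combine M per-imputation estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)             # combined estimate: mean of the M estimates
    within = mean(variances)            # within-imputation variance W
    between = variance(estimates)       # between-imputation variance B (divisor M-1)
    total = within + (1 + 1 / m) * between      # total variance T = W + (1 + 1/M) B
    r = (1 + 1 / m) * between / within  # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2     # degrees of freedom
    fmi = (r + 2 / (df + 3)) / (r + 1)  # fraction of missing information
    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total": total,
        "df": df,
        "fmi": fmi,
    }


# X1 coefficient (PARMS rows) and its variance (COV row X1, column X1),
# one pair per imputation, as printed (rounded) in the analysis-step table.
x1_estimates = [-7.96, -7.81, -4.31, -4.17, -4.08]
x1_variances = [0.34, 0.52, 0.02, 0.00, 0.00]

pooled = pool(x1_estimates, x1_variances)
for name, value in pooled.items():
    print(name, round(value, 4))
```

The confidence interval then follows as the combined estimate plus or minus a t quantile with `df` degrees of freedom times the square root of the total variance; it is left out here because the Python standard library has no t-distribution quantile function.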
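Background: Methods of handling missing data

* Comparison of methods to handle missing values (O = yes, X = no)

| Criteria | Mechanism | CCA | ACA | Mean imputation | EM method | Multiple imputation |
|---|---|---|---|---|---|---|
| Unbiased parameter estimation | MCAR | O | X | X | O | O |
| | MAR | X | X | X | O | O |
| | NMAR | X | X | X | X | X |
| Good estimates of variability | | X | X | X | X | O |
| Best statistical power | | X | O | O | O | O |

MI is the BEST!!
* Excellent estimation
* Good variability estimates, from the variance among the M estimates of the multiply imputed data
* Best power, by not deleting any cases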
Background: Imputation mechanisms

(1) Imputation step of MI: imputation mechanisms for substituting missing values

| Pattern | Type | Normality | Imputation mechanism |
|---|---|---|---|
| Univariate, monotone | Continuous | O | Regression |
| Univariate, monotone | Continuous | X | Predictive Mean Matching (PMM) |
| Multivariate, not monotone | Continuous | - | MCMC |

MCMC has NOT been tested on univariate missing data.
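Predictive mean matching differs from regression imputation in that it fills each gap with a value that was actually observed. A minimal single-predictor sketch is given below. The toy data, the donor count `k=3`, and the names `fit_line` and `pmm_impute` are assumptions made for illustration, and a full PMM implementation would also redraw the regression parameters for each imputation.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, k=3, seed=0):
    """Fill the None entries of y by predictive mean matching on x.

    For each missing y, the k observed cases whose predicted means are
    closest to the missing case's predicted mean act as donors, and one
    donor's observed y is copied in at random.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([o[0] for o in observed], [o[1] for o in observed])
    predicted_obs = [(a + b * xi, yi) for xi, yi in observed]

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = a + b * xi
        donors = sorted(predicted_obs, key=lambda p: abs(p[0] - target))[:k]
        completed.append(rng.choice(donors)[1])
    return completed


# Hypothetical toy data: x is fully observed, y is partly missing.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, None, 6.2, 7.9, None, 12.1, 13.8, None]

y_completed = pmm_impute(x, y)
print(y_completed)
```

Since every imputed value is borrowed from an observed case, the imputations stay within the range and shape of the observed data, which is why PMM is the usual suggestion when normality does not hold.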
Data: Simulated data

* 3000 observations are generated on Z1 and X1,…,X6 (all variables are continuous; the Xs are observed variables and Z1 is the partly missing variable)
* Z1 and X1,…,X6 are drawn from a multivariate normal distribution with means = 0 and correlation =

| | z1 | x1 | x2 | x3 | x4 | x5 | x6 |
|---|---|---|---|---|---|---|---|
| z1 | 1.0000 | | | | | | |
| x1 | 0.7655 | 1.0000 | | | | | |
| x2 | 0.2764 | 0.3233 | 1.0000 | | | | |
| x3 | 0.0509 | 0.0351 | 0.5352 | 1.0000 | | | |
| x4 | 0.1612 | 0.1415 | -0.0063 | -0.0738 | 1.0000 | | |
| x5 | 0.2924 | 0.3581 | 0.8062 | -0.0640 | 0.0441 | 1.0000 | |
| x6 | 0.1052 | 0.1124 | -0.0061 | -0.0764 | 0.1157 | 0.0420 | 1.0000 |
Data: Example data (“A Predictive Study of Coronary Heart Disease”)

* 3154 observations (all variables are continuous)
- Missing variable: Systolic Blood Pressure, SBP (mean: 128.63)
- Observed variables, with means: DBP (82.02), height (69.78), weight (169.95), age (46.28), BMI (24.52), and cholesterol (226.37)
* Correlation =

| | sbp | dbp | height | weight | age | bmi | chol |
|---|---|---|---|---|---|---|---|
| sbp | 1.0000 | | | | | | |
| dbp | 0.7700 | 1.0000 | | | | | |
| height | 0.0156 | 0.0070 | 1.0000 | | | | |
| weight | 0.2513 | 0.2940 | 0.5333 | 1.0000 | | | |
| age | 0.1701 | 0.1440 | -0.0919 | -0.0331 | 1.0000 | | |
| bmi | 0.2878 | 0.3428 | -0.0633 | 0.8079 | 0.0256 | 1.0000 | |
| chol | 0.1231 | 0.1296 | -0.0889 | 0.0085 | 0.0892 | 0.0706 | 1.0000 |
Method

1. Missing mechanisms
1) MCAR: Z1 (SBP) values are deleted at random
2) MAR: Z1 (SBP) values are deleted after sorting by one of the Xs (observed variables)
3) NMAR: Z1 (SBP) values are deleted after sorting by Z1 (SBP) itself

2. Bias is mainly measured by
RMSE (Root Mean Square Error) = Sqrt(Variance of estimates + Bias^2),
which captures the estimates' accuracy and variability and compares them in the same units.
* True value = mean of Z1 (SBP) at 0% missing
* Estimate = mean of Z1 (SBP) after MI at 10% to 80% missing
* Missing proportions: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%

The smaller the RMSE, the better the estimation.
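The deletion mechanisms and the RMSE criterion can be sketched together in Python. Several assumptions are made here for illustration: only Z1 and X1 are simulated (with the 0.7655 correlation from the simulated-data slide), the sorted deletions remove the largest values (the slides do not say which end is deleted), and the estimate is the complete-case mean rather than an MI estimate. The function names are hypothetical.

```python
import random
from statistics import mean, pvariance

RHO = 0.7655  # correlation between z1 and x1 in the simulated data


def simulate(n, rng):
    """Draw n (x1, z1) pairs: standard normal with correlation RHO."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = RHO * x + (1 - RHO ** 2) ** 0.5 * rng.gauss(0, 1)
        pairs.append((x, z))
    return pairs


def delete_z(pairs, mechanism, proportion, rng):
    """Return the z1 values with the given share set to None.

    MCAR: random rows; MAR: rows with the largest x1; NMAR: rows with
    the largest z1. Deleting the largest values is an assumption.
    """
    n_missing = round(len(pairs) * proportion)
    order = list(range(len(pairs)))
    if mechanism == "MCAR":
        rng.shuffle(order)
    elif mechanism == "MAR":
        order.sort(key=lambda i: pairs[i][0], reverse=True)
    elif mechanism == "NMAR":
        order.sort(key=lambda i: pairs[i][1], reverse=True)
    else:
        raise ValueError(mechanism)
    missing = set(order[:n_missing])
    return [None if i in missing else z for i, (_, z) in enumerate(pairs)]


def rmse(estimates, true_value):
    """Sqrt(variance of estimates + bias^2)."""
    bias = mean(estimates) - true_value
    return (pvariance(estimates) + bias ** 2) ** 0.5


def cca_rmse(pairs, mechanism, proportion, reps, rng):
    """RMSE of the complete-case mean of z1 over repeated deletions."""
    true_mean = mean(z for _, z in pairs)  # mean of z1 at 0% missing
    estimates = []
    for _ in range(reps):
        z_kept = [z for z in delete_z(pairs, mechanism, proportion, rng)
                  if z is not None]
        estimates.append(mean(z_kept))
    return rmse(estimates, true_mean)


rng = random.Random(2024)
pairs = simulate(3000, rng)
results = {m: cca_rmse(pairs, m, 0.5, reps=20, rng=rng)
           for m in ("MCAR", "MAR", "NMAR")}
for mechanism, value in results.items():
    print(mechanism, round(value, 3))
```

With sorted deletion the complete-case estimate is the same in every repetition, so its RMSE is pure bias, whereas under MCAR the RMSE comes almost entirely from sampling variability.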
Method

3. Methods to deal with missing values (to measure the effectiveness of MI)
* Complete Case Analysis (CCA)
* Multiple Imputation (MI)

4. Imputation numbers
M = 10, 20, 30, 40, and 50

5. Imputation model
* (z1 = x1 x2 x3 x4 x5 x6): all variables
* (z1 = x1 x2 x5): variables highly correlated with z1
* (z1 = x3 x4 x6): variables weakly correlated with z1

The z1 = x1 x2 x5 model is the best model because it has the smallest RMSE.
Method

6. Imputation mechanisms
Regression method, PMM, MCMC

7. 500 repetitions of each MI (to reduce the random variability of imputation)
ex) M=10 * 500 reps. → average them → mean of estimates for M=10
…
M=50 * 500 reps. → average them → mean of estimates for M=50

8. Statistical software
Stata 11 (Multiple Imputation)
Result (simulated data)

1. CCA vs. MI, by RMSE

[Figure: RMSE (y-axis) against proportion of missing data, 10% to 80% (x-axis), for CCA and MI. Panels: MCAR (y-axis 0 to 0.12), MAR (0 to 0.25), NMAR (0 to 1.6); the MCAR and MAR panels are shown again on the 0 to 1.6 scale. Callout: "better".]

* Under MCAR and MAR, both CCA and MI are good. After changing the scale of the Y axis, MI is better than CCA under all missing mechanisms.
* As the percent of missing increases, the RMSEs increase linearly, and so does the difference in RMSE between CCA and MI.
> With a high amount of missing data, use Multiple Imputation.
Result (simulated data)

2. Imputation numbers

[Figure: RMSE (y-axis) against proportion of missing data, 10% to 80% (x-axis), with one line each for 10, 20, 30, 40, and 50 imputations. Panels: MCAR, MAR, NMAR.]

Similar (regardless of imputation number)
* Under MCAR and MAR, MI is good!
* Under NMAR, MI gives biased estimates at 80% missing, because the RMSE is large, roughly 1 SD of the data (0.99).
* The 5 lines (M=10 to M=50) run together and look like 1 line.
> No difference among the imputation numbers (M) = 10, 20, 30, 40, 50.
Result (simulated data)

3. Regression, PMM, MCMC

[Figure: RMSE (y-axis) against proportion of missing data, 10% to 80% (x-axis), for the reg, pmm, and mcmc imputation mechanisms. Panels: MCAR, MAR, NMAR. Callout on the NMAR panel: MCMC / Reg.]

1. Under MCAR and MAR, the Reg. method should theoretically be better because of normality, but all methods are good. However, the Reg. method is slightly better under MAR.
2. Under NMAR, even though normality is not met, the Reg. method is better than PMM.

| | Normality | Theory | Practically (MI) |
|---|---|---|---|
| MCAR | Normal | Regression | All imputation mechanisms |
| MAR | Normal | Regression | All imputation mechanisms (Reg. slightly better) |
| NMAR | Not normal | PMM | Regression, MCMC |

* The normality assumption may not be important under NMAR.
* MCMC is good under all missing mechanisms. Thus, MCMC can be used for univariate, continuous missing data.
Result (example data)

1. CCA vs. MI, by RMSE

[Figure: RMSE (y-axis) against proportion of missing data, 10% to 80% (x-axis), for CCA and MI. Panels: MCAR (y-axis 0 to 1.6), MAR (0 to 4), NMAR (0 to 20); the MCAR and MAR panels are shown again on the 0 to 20 scale. Callout: "better".]

* Under MCAR and MAR, both CCA and MI are good. After changing the scale of the Y axis, MI produced significantly less biased values than CCA under MCAR, MAR, and NMAR.
* As the percent of missing increases, the RMSEs increase linearly, and so does the difference in RMSE between CCA and MI.
> With a high amount of missing data, Multiple Imputation is preferable.
Result (example data)

2. Imputation numbers

[Figure: RMSE (y-axis, 0 to 16) against proportion of missing data, 10% to 80% (x-axis), with one line each for 10, 20, 30, 40, and 50 imputations. Panels: MCAR, MAR, NMAR.]

Similar (regardless of imputation number and percent of missing)
* Under MCAR and MAR, MI produces unbiased estimates.
* Under NMAR, MI did not do well at 80% missing, because the RMSE is large, roughly 1 SD of the data (15.11).
* No difference among the increased imputation numbers = 10, 20, 30, 40, 50.
> Increasing the imputation number has no significant effect on correcting bias for data with these characteristics.
Result (example data)

3. Regression, PMM, MCMC

[Figure: RMSE (y-axis, 0 to 18) against proportion of missing data, 10% to 80% (x-axis), for the reg, pmm, and mcmc imputation mechanisms. Panels: MCAR, MAR, NMAR. Callout on the NMAR panel: MCMC / Reg.]

| | Normality | Theory | Practically (MI) |
|---|---|---|---|
| MCAR | Not normal | PMM | All imputation mechanisms |
| MAR | Not normal | PMM | All imputation mechanisms (PMM slightly better) |
| NMAR | Not normal | PMM | Regression, MCMC |

1. Under MCAR and MAR, PMM should theoretically be better because the normality assumption is broken, but all methods are good. However, the PMM method is slightly better under MAR.
2. Under NMAR, even though normality is not met, Reg. has a lower RMSE than PMM.

* The normality assumption may be important only under MAR.
* MCMC is good to use under MCAR, MAR, and NMAR. Thus, MCMC can be used not only for multivariate, continuous missing data but also for univariate, continuous missing data.
Conclusion

1. Multiple Imputation (MI) performed better than Complete Case Analysis (CCA) in every setting examined.
2. There is no significant difference among the imputation numbers in these data.
3. Under MCAR and MAR, MI produces unbiased estimates even with a high amount of missing data.
4. However, under NMAR, the MI estimates are also biased with a high amount of missing data.
5. MCMC is good for univariate, continuous missing data under MCAR, MAR, and NMAR.
Thank you