adv. stat. topicsa-missingdata - biostatistics.dkexample: simplemissinginformationsetup i...
TRANSCRIPT
![Page 1: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/1.jpg)
Adv. Stat. Topics A - Missing dataAfternoon session
Anne Helby Petersen
November 5, 2019
![Page 2: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/2.jpg)
Program outline
12.00-12.45: Imputation and Multiple Imputation using ChainedEquations (MICE)
12.45-14.30: Work with data: Data analysis with missinginformation
13.30-15.00: Further perspectives and more resources
![Page 3: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/3.jpg)
Example: Simple missing information setup
I Imagine that we wish to estimate the effect of X on Y ,controlling for Z .
I X suffers from missing information (MCAR). Assume that weorder the observations such that X1, ...,Xd have missinginformation, while Xd+1, ...,Xn are fully observed.
I Assume that Y and Z are all fully observed.
![Page 4: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/4.jpg)
Simulating a small dataset in R
n <- 200set.seed(1331)Z <- rnorm(n, sd = 1)X <- Z + rnorm(n, sd = 1)Y <- 2*X + Z + rnorm(n, sd = 2)
true_X <- Xtrue_xmean <- mean(X)true_xsd <- sd(X)true_model <- lm(Y ~ X + Z)
d <- 40X[1:d] <- NA
X[36:40]
## [1] NA NA NA NA NA
X[41:45]
## [1] -0.9404489 0.7807026 1.9016603 -0.3728711 -0.5331431
![Page 5: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/5.jpg)
Simulating a small dataset in Rn <- 200set.seed(1331)Z <- rnorm(n, sd = 1)X <- Z + rnorm(n, sd = 1)Y <- 2*X + Z + rnorm(n, sd = 2)
true_X <- Xtrue_xmean <- mean(X)true_xsd <- sd(X)true_model <- lm(Y ~ X + Z)
d <- 40X[1:d] <- NA
X[36:40]
## [1] NA NA NA NA NA
X[41:45]
## [1] -0.9404489 0.7807026 1.9016603 -0.3728711 -0.5331431
![Page 6: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/6.jpg)
Simulating a small dataset in Rn <- 200set.seed(1331)Z <- rnorm(n, sd = 1)X <- Z + rnorm(n, sd = 1)Y <- 2*X + Z + rnorm(n, sd = 2)
true_X <- Xtrue_xmean <- mean(X)true_xsd <- sd(X)true_model <- lm(Y ~ X + Z)
d <- 40X[1:d] <- NA
X[36:40]
## [1] NA NA NA NA NA
X[41:45]
## [1] -0.9404489 0.7807026 1.9016603 -0.3728711 -0.5331431
![Page 7: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/7.jpg)
Simulating a small dataset in Rn <- 200set.seed(1331)Z <- rnorm(n, sd = 1)X <- Z + rnorm(n, sd = 1)Y <- 2*X + Z + rnorm(n, sd = 2)
true_X <- Xtrue_xmean <- mean(X)true_xsd <- sd(X)true_model <- lm(Y ~ X + Z)
d <- 40X[1:d] <- NA
X[36:40]
## [1] NA NA NA NA NA
X[41:45]
## [1] -0.9404489 0.7807026 1.9016603 -0.3728711 -0.5331431
![Page 8: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/8.jpg)
Simulating a small dataset in Rn <- 200set.seed(1331)Z <- rnorm(n, sd = 1)X <- Z + rnorm(n, sd = 1)Y <- 2*X + Z + rnorm(n, sd = 2)
true_X <- Xtrue_xmean <- mean(X)true_xsd <- sd(X)true_model <- lm(Y ~ X + Z)
d <- 40X[1:d] <- NA
X[36:40]
## [1] NA NA NA NA NA
X[41:45]
## [1] -0.9404489 0.7807026 1.9016603 -0.3728711 -0.5331431
![Page 9: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/9.jpg)
A quick overview of the data (no missing info.)
![Page 10: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/10.jpg)
Imputation
Imputation: Filling out missing values in the data.
![Page 11: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/11.jpg)
Mean imputation (1/2)
Mean imputation: Insert the mean (or mode) of Xd+1, ...,Xn into allX1, ...,Xd .
X_meanimp <- X
xobs_mean <- mean(X[(d+1):n])
X_meanimp[1:d] <- xobs_mean
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_meanimp)
## [1] -0.07813252
## [1] -0.1551874
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_meanimp)
## [1] 1.368721
## [1] 1.240263
![Page 12: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/12.jpg)
Mean imputation (1/2)
Mean imputation: Insert the mean (or mode) of Xd+1, ...,Xn into allX1, ...,Xd .X_meanimp <- X
xobs_mean <- mean(X[(d+1):n])
X_meanimp[1:d] <- xobs_mean
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_meanimp)
## [1] -0.07813252
## [1] -0.1551874
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_meanimp)
## [1] 1.368721
## [1] 1.240263
![Page 13: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/13.jpg)
Mean imputation (1/2)
Mean imputation: Insert the mean (or mode) of Xd+1, ...,Xn into allX1, ...,Xd .X_meanimp <- X
xobs_mean <- mean(X[(d+1):n])
X_meanimp[1:d] <- xobs_mean
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_meanimp)
## [1] -0.07813252
## [1] -0.1551874
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_meanimp)
## [1] 1.368721
## [1] 1.240263
![Page 14: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/14.jpg)
Mean imputation (1/2)
Mean imputation: Insert the mean (or mode) of Xd+1, ...,Xn into allX1, ...,Xd .X_meanimp <- X
xobs_mean <- mean(X[(d+1):n])
X_meanimp[1:d] <- xobs_mean
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_meanimp)
## [1] -0.07813252
## [1] -0.1551874
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_meanimp)
## [1] 1.368721
## [1] 1.240263
![Page 15: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/15.jpg)
Mean imputation (1/2)
Mean imputation: Insert the mean (or mode) of Xd+1, ...,Xn into allX1, ...,Xd .X_meanimp <- X
xobs_mean <- mean(X[(d+1):n])
X_meanimp[1:d] <- xobs_mean
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_meanimp)
## [1] -0.07813252
## [1] -0.1551874
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_meanimp)
## [1] 1.368721
## [1] 1.240263
![Page 16: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/16.jpg)
Mean imputation (1/2)
Mean imputation: Insert the mean (or mode) of Xd+1, ...,Xn into allX1, ...,Xd .X_meanimp <- X
xobs_mean <- mean(X[(d+1):n])
X_meanimp[1:d] <- xobs_mean
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_meanimp)
## [1] -0.07813252
## [1] -0.1551874
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_meanimp)
## [1] 1.368721
## [1] 1.240263
![Page 17: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/17.jpg)
Mean imputation (2/2)
Comparing model coefficients:
round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_meanimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) 0.0709 0.1623 0.4371 0.6626## X_meanimp 1.7887 0.1639 10.9102 0.0000## Z 1.6266 0.2150 7.5675 0.0000
Conclusion: Don’t do mean imputation!
![Page 18: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/18.jpg)
Mean imputation (2/2)
Comparing model coefficients:round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_meanimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) 0.0709 0.1623 0.4371 0.6626## X_meanimp 1.7887 0.1639 10.9102 0.0000## Z 1.6266 0.2150 7.5675 0.0000
Conclusion: Don’t do mean imputation!
![Page 19: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/19.jpg)
Mean imputation (2/2)
Comparing model coefficients:round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_meanimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) 0.0709 0.1623 0.4371 0.6626## X_meanimp 1.7887 0.1639 10.9102 0.0000## Z 1.6266 0.2150 7.5675 0.0000
Conclusion: Don’t do mean imputation!
![Page 20: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/20.jpg)
Mean imputation (2/2)
Comparing model coefficients:round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_meanimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) 0.0709 0.1623 0.4371 0.6626## X_meanimp 1.7887 0.1639 10.9102 0.0000## Z 1.6266 0.2150 7.5675 0.0000
Conclusion: Don’t do mean imputation!
![Page 21: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/21.jpg)
Hot deck imputation (1/2)
Hot deck imputation (simplest version): For each missing value, X1, ...,Xd ,pick and insert a random value among the observed values Xd+1, ...,Xn.
X_hdimp <- X
set.seed(13)X_hdimp[1:d] <- sample(X[(d+1):n], size = d,
replace = TRUE)
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_hdimp)
## [1] -0.07813252
## [1] -0.1048372
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_hdimp)
## [1] 1.368721
## [1] 1.3303
![Page 22: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/22.jpg)
Hot deck imputation (1/2)
Hot deck imputation (simplest version): For each missing value, X1, ...,Xd ,pick and insert a random value among the observed values Xd+1, ...,Xn.X_hdimp <- X
set.seed(13)X_hdimp[1:d] <- sample(X[(d+1):n], size = d,
replace = TRUE)
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_hdimp)
## [1] -0.07813252
## [1] -0.1048372
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_hdimp)
## [1] 1.368721
## [1] 1.3303
![Page 23: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/23.jpg)
Hot deck imputation (1/2)
Hot deck imputation (simplest version): For each missing value, X1, ...,Xd ,pick and insert a random value among the observed values Xd+1, ...,Xn.X_hdimp <- X
set.seed(13)X_hdimp[1:d] <- sample(X[(d+1):n], size = d,
replace = TRUE)
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_hdimp)
## [1] -0.07813252
## [1] -0.1048372
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_hdimp)
## [1] 1.368721
## [1] 1.3303
![Page 24: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/24.jpg)
Hot deck imputation (1/2)
Hot deck imputation (simplest version): For each missing value, X1, ...,Xd ,pick and insert a random value among the observed values Xd+1, ...,Xn.X_hdimp <- X
set.seed(13)X_hdimp[1:d] <- sample(X[(d+1):n], size = d,
replace = TRUE)
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_hdimp)
## [1] -0.07813252
## [1] -0.1048372
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_hdimp)
## [1] 1.368721
## [1] 1.3303
![Page 25: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/25.jpg)
Hot deck imputation (1/2)
Hot deck imputation (simplest version): For each missing value, X1, ...,Xd ,pick and insert a random value among the observed values Xd+1, ...,Xn.X_hdimp <- X
set.seed(13)X_hdimp[1:d] <- sample(X[(d+1):n], size = d,
replace = TRUE)
#Compare mean for full X with mean of mean imputed Xtrue_xmean; mean(X_hdimp)
## [1] -0.07813252
## [1] -0.1048372
#Compare sd for full X with sd of mean imputed Xtrue_xsd; sd(X_hdimp)
## [1] 1.368721
## [1] 1.3303
![Page 26: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/26.jpg)
Hot deck imputation (2/2)
Comparing model coefficients:
round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_hdimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0338 0.1618 -0.2088 0.8348## X_hdimp 1.6427 0.1519 10.8141 0.0000## Z 1.6712 0.2136 7.8233 0.0000
Conclusion: Don’t do hot deck imputation!
![Page 27: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/27.jpg)
Hot deck imputation (2/2)
Comparing model coefficients:round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_hdimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0338 0.1618 -0.2088 0.8348## X_hdimp 1.6427 0.1519 10.8141 0.0000## Z 1.6712 0.2136 7.8233 0.0000
Conclusion: Don’t do hot deck imputation!
![Page 28: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/28.jpg)
Hot deck imputation (2/2)
Comparing model coefficients:round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_hdimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0338 0.1618 -0.2088 0.8348## X_hdimp 1.6427 0.1519 10.8141 0.0000## Z 1.6712 0.2136 7.8233 0.0000
Conclusion: Don’t do hot deck imputation!
![Page 29: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/29.jpg)
Hot deck imputation (2/2)
Comparing model coefficients:round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_hdimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0338 0.1618 -0.2088 0.8348## X_hdimp 1.6427 0.1519 10.8141 0.0000## Z 1.6712 0.2136 7.8233 0.0000
Conclusion: Don’t do hot deck imputation!
![Page 30: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/30.jpg)
Regression imputation (1/3)
Regression imputation: Fit a regression model for all the observations, e.g.,
Xi = α+ β1 · Yi + β2 · Zi + εi
for i = d + 1, ..., n and use this model to predict values for the remainingX1, ...,Xd .
obsdata <- data.frame(X = X[(d+1):n],Y = Y[(d+1):n],Z = Z[(d+1):n])
m_regimp <- lm(X ~ Y + Z, obsdata)X_regimp <- XX_regimp[1:d] <- predict(m_regimp,
newdata = data.frame(Y = Y[1:d],Z = Z[1:d]))
![Page 31: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/31.jpg)
Regression imputation (1/3)
Regression imputation: Fit a regression model for all the observations, e.g.,
Xi = α+ β1 · Yi + β2 · Zi + εi
for i = d + 1, ..., n and use this model to predict values for the remainingX1, ...,Xd .
obsdata <- data.frame(X = X[(d+1):n],Y = Y[(d+1):n],Z = Z[(d+1):n])
m_regimp <- lm(X ~ Y + Z, obsdata)X_regimp <- XX_regimp[1:d] <- predict(m_regimp,
newdata = data.frame(Y = Y[1:d],Z = Z[1:d]))
![Page 32: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/32.jpg)
Regression imputation (2/3)
#Compare mean for full X with mean of reg. imputed Xtrue_xmean; mean(X_regimp)
## [1] -0.07813252
## [1] -0.1177343
#Compare sd for full X with mean of reg. imputed Xtrue_xsd; sd(X_regimp)
## [1] 1.368721
## [1] 1.352262
![Page 33: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/33.jpg)
Regression imputation (3/3)
round(summary(true_model)$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) -0.0532 0.1392 -0.3822 0.7028## X 2.0756 0.1382 15.0158 0.0000## Z 1.0260 0.2000 5.1297 0.0000
round(summary(lm(Y ~ X_regimp + Z))$coefficients,4)
## Estimate Std. Error t value Pr(>|t|)## (Intercept) 0.0505 0.1256 0.4024 0.6878## X_regimp 2.2815 0.1264 18.0525 0.0000## Z 0.8383 0.1807 4.6401 0.0000
Conclusion: Don’t do regression imputation!
![Page 34: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/34.jpg)
Stochastic regression imputationStochastic regression imputation: Perform regression imputation, but addnoise to the predictions by sampling from the residuals from the fittedmodel.
X_stocregimp <- X; set.seed(2)X_stocregimp[1:d] <- X_regimp[1:d] +
sample(residuals(m_regimp), size = d,replace = TRUE)
#Estimate from model with full Xround(summary(true_model)$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0756 0.1382 15.0158 0.0000
#Estimate from model with X imputed by stochastic regressionround(summary(lm(Y ~ X_stocregimp + Z))$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0606 0.1278 16.1221 0.0000
Problem: The variance is still underestimated.
![Page 35: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/35.jpg)
Stochastic regression imputationStochastic regression imputation: Perform regression imputation, but addnoise to the predictions by sampling from the residuals from the fittedmodel.X_stocregimp <- X; set.seed(2)X_stocregimp[1:d] <- X_regimp[1:d] +
sample(residuals(m_regimp), size = d,replace = TRUE)
#Estimate from model with full Xround(summary(true_model)$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0756 0.1382 15.0158 0.0000
#Estimate from model with X imputed by stochastic regressionround(summary(lm(Y ~ X_stocregimp + Z))$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0606 0.1278 16.1221 0.0000
Problem: The variance is still underestimated.
![Page 36: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/36.jpg)
Stochastic regression imputationStochastic regression imputation: Perform regression imputation, but addnoise to the predictions by sampling from the residuals from the fittedmodel.X_stocregimp <- X; set.seed(2)X_stocregimp[1:d] <- X_regimp[1:d] +
sample(residuals(m_regimp), size = d,replace = TRUE)
#Estimate from model with full Xround(summary(true_model)$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0756 0.1382 15.0158 0.0000
#Estimate from model with X imputed by stochastic regressionround(summary(lm(Y ~ X_stocregimp + Z))$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0606 0.1278 16.1221 0.0000
Problem: The variance is still underestimated.
![Page 37: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/37.jpg)
The problem with single imputation strategies
Imputing one value for a missing datum cannot be correct ingeneral, because we don’t know what value to impute withcertainty (if we did, it wouldn’t be missing).— Donald B. Rubin
![Page 38: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/38.jpg)
Multiple imputation
(Figure 1.6 from van Buuren 2019)
![Page 39: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/39.jpg)
Variance under imputationRecall: Variance measures the uncertainty of our estimate if we were torepeat the whole thing.
Problem: Variance accumulates; we need to use the (uncertain) sample toestimate the missing data model.
![Page 40: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/40.jpg)
Variance under imputationRecall: Variance measures the uncertainty of our estimate if we were torepeat the whole thing.
Problem: Variance accumulates; we need to use the (uncertain) sample toestimate the missing data model.
![Page 41: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/41.jpg)
Variance under imputationRecall: Variance measures the uncertainty of our estimate if we were torepeat the whole thing.
Problem: Variance accumulates; we need to use the (uncertain) sample toestimate the missing data model.
![Page 42: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/42.jpg)
Variance accumulates
http://xkcd.com/1739/
![Page 43: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/43.jpg)
Total variance (following van Buuren 2019)
It can be shown mathematically that
Total variance = U + B + B · 1m
where m is the number of imputations and
U is the variance due to using a sample rather than thefull population.
B is the extra variance due to there being missing values.
B · 1m is the extra variance due to having to estimate the
missing data model.
The collective method for obtaining a correct estimate of the totalvariance (T ) by use of multiple imputations is referred to as Rubin’srules.
Note: Larger m makes the last term small.
![Page 44: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/44.jpg)
Total variance (following van Buuren 2019)
It can be shown mathematically that
Total variance = U + B + B · 1m
where m is the number of imputations and
U is the variance due to using a sample rather than thefull population.
B is the extra variance due to there being missing values.
B · 1m is the extra variance due to having to estimate the
missing data model.
The collective method for obtaining a correct estimate of the totalvariance (T ) by use of multiple imputations is referred to as Rubin’srules.
Note: Larger m makes the last term small.
![Page 45: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/45.jpg)
Multiple imputation by chained equations (MICE)
I A specific algorithm (method) for performing data analysis withmissing information.
I Also known as imputation with fully conditional specification(FCS).
I Specifies imputation models variable-by-variable for eachvariable with missing information.
I
![Page 46: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/46.jpg)
Multiple imputation by chained equations (MICE)
I A specific algorithm (method) for performing data analysis withmissing information.
I Also known as imputation with fully conditional specification(FCS).
I Specifies imputation models variable-by-variable for eachvariable with missing information.
I Iteratively updates best guesses to allow all variables (eventhose with missing information) to inform the imputation ofthe others.
![Page 47: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/47.jpg)
Multiple imputation by chained equations (MICE)
I A specific algorithm (method) for performing data analysis withmissing information.
I Also known as imputation with fully conditional specification(FCS).
I Specifies imputation models variable-by-variable for eachvariable with missing information.
I Iteratively updates best guesses to allow all variables(even those with missing information) to inform theimputation of the others.
![Page 48: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/48.jpg)
MICE: Sequential guessing (heuristically)
I Assume both X and Z have missing information.
I Let Xobs and Zobs denote the observed values of X and Z ,respectively.
![Page 49: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/49.jpg)
MICE in R
MICE is implemented in the mice package in R:library(mice)data <- data.frame(X = X, Y = Y, Z = Z)set.seed(212)imps <- mice(data, print = FALSE, m = 5)fits <- with(imps, lm(Y ~ X + Z))res <- pool(fits)
summary(res)[, c(1,2,5)]
## estimate std.error p.value## (Intercept) 0.03337294 0.1445683 0.817961274171## X 2.11212569 0.1416823 0.000000000000## Z 0.96902600 0.2014753 0.000004352766
![Page 50: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/50.jpg)
MICE compared with stochastic regression imputation
#Estimate from model with full Xround(summary(true_model)$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0756 0.1382 15.0158 0.0000
#Estimate from model with X imputed by stochastic regressionround(summary(lm(Y ~ X_stocregimp + Z))$coefficients,4)[2,]
## Estimate Std. Error t value Pr(>|t|)## 2.0606 0.1278 16.1221 0.0000
#Estimate from mice model (default settings)round(summary(res)[2, c(1,2,3,5)],4)
## estimate std.error statistic p.value## X 2.1121 0.1417 14.9075 0
![Page 51: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/51.jpg)
Inspecting the variance components from mice
Note: mice delivers estimates of B (b), U (ubar), T (t = std.error2),as well as λ = B·(1+1/m)
T (lambda), riv = B·(1+1/m)U (riv) and more:
> round(summary(res, type = "all"),4)estimate std.error statistic df p.value
(Intercept) 0.0269 0.1465 0.1840 85.8082 0.8545X 2.0990 0.1429 14.6865 84.4084 0.0000Z 1.0358 0.2096 4.9410 73.4801 0.0000
riv lambda fmi ubar b t dfcom(Intercept) 0.1764 0.1500 0.1691 0.0182 0.0027 0.0214 197X 0.1797 0.1523 0.1717 0.0173 0.0026 0.0204 197Z 0.2081 0.1722 0.1939 0.0364 0.0063 0.0439 197
![Page 52: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/52.jpg)
Variable level imputation models
Default choices in mice package:
Numerical variables:Predictive mean matching (pmm). A fusion between regressionimputation and hot deck imputation: Use regression to find aselection of plausible “donor values”, choose one at random amongthem.
Categorical variables (> 2 categories):Multinomial logistic regression (polyreg). A regression imputationmethod.
Categorical variables ( = 2 categories):Logistic regression (logreg). A regression imputation method.
Categorical variables (ordered categories):Ordered logistic regression (polr). A regression imputation method.
![Page 53: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/53.jpg)
Data exercise: Analyze alcodata
→ Go to “Exercise: Analyze” on the course website
https://biostatistics.dk/teaching/advtopicsA/notes.html
and work through the questions in small groups.
→ Add information to the Google slide show (“analyze”)corresponding to your dataset at
https://bit.ly/2N5MoTQ
We will discuss your findings around 14:00.
![Page 54: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/54.jpg)
![Page 55: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/55.jpg)
Back to the Elderly study
![Page 56: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/56.jpg)
Elderly sensitivity analyses - models
We fitted five additional models:
MiD Missing is drinking approach: Treating all missing observations asrelapsers (non-controlled consumption).
MiCC Missing is CC approach: Treating all missing observations ascontrolled consumption.
METiD MET is drinking approach: Treating missing observations for patientstreated with MET as drinking, while missing obsevations fromMET+CRA-patients are treated as controlled consumption.
METiCC MET is CC: Treating missing observations for patients treated withMET+CRA as drinking, while missing obsevations fromMET-patients are treated as controlled consumption.
MICE Mulitple imputation of missing observation using all variables fromthe primary model and controlled consumption information fromprevious time points.
![Page 57: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/57.jpg)
Elderly sensitivity analyses - results
![Page 58: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/58.jpg)
Further resources (1)
Excellent book by Stef van Buuren (2019)
https://stefvanbuuren.name/fimd/
![Page 59: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/59.jpg)
Further resources (2)
Multiple imputation for Cox models:
https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3618
![Page 60: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/60.jpg)
Further resources (3)
Guideline for MICE in practice:
https://onlinelibrary.wiley.com/doi/full/10.1002/sim.4067
![Page 61: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/61.jpg)
Further resources (4)
Multiple imputation with non-linear relationships:
https://doi.org/10.1177/0962280214521348
![Page 62: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/62.jpg)
Further resources (5)
Website with a very thorough collection of material on missing data,emphasis on tools in R:
https://rmisstastic.netlify.com
![Page 63: Adv. Stat. TopicsA-Missingdata - biostatistics.dkExample: Simplemissinginformationsetup I ImaginethatwewishtoestimatetheeffectofX onY, controllingforZ. I X suffersfrommissinginformation(MCAR).Assumethatwe](https://reader034.vdocuments.net/reader034/viewer/2022042203/5ea4fc6b57b7982e8b297eba/html5/thumbnails/63.jpg)
Comments/suggestions for this course day are very much welcome [email protected]