simulation study - handling missing covariates in the context of external validation

1
POSTER PRESENTATION Open Access Simulation study - handling missing covariates in the context of external validation Laura J Bonnett 1* , Tony G Marson 2 , Paula R Williamson 1 , Catrin Tudur-Smith 1 From Clinical Trials Methodology Conference 2011 Bristol, UK. 4-5 October 2011 Objectives Before a predictive or prognostic model, often developed using data from clinical trials, can be introduced into general practice, it needs to be externally validated to ensure that it performs satisfactorily in data sets that are fully independent of the development data. Various methods exist to handle covariates with missing data and many of these are regularly employed in the analysis of data from a single trial. We propose that several of these strategies may be adapted to handle covariates with every entry missing in the context of external validation. Methods A simulation study was undertaken to test the suitability of our five proposed strategies: (1) random selection with replacement, (2) hot deck imputation, (3) single imputation via estimation, (4) random selection with replacement multiple times, (5) using only covariates common to both development and validation data sets. Survival times were simulated via the Cox-exponential distribution with a binary censoring indicator variable. Up to two binary, two continuous and two categorical covariates were simulated via binomial, log-normal and multinomial distributions respectively. To assess how the methods perform in general three statistics were cal- culated across 1000 bootstrap samples: (i) estimated regression coefficients from the model fit to the valida- tion set, (ii) associated standard deviations, (iii) mean square errors of the parameter estimates from the devel- opment and validation sets. Results Preliminary results suggest that random selection with replacement multiple times was the most consistent method; the mean difference between the actual regres- sion coefficients from the development set and those estimated from the validation set was only 0.02 whereas it was 0.10 for random selection with replacement, 0.09 for imputation via estimation and 0.05 for hot-deck imputation. Standard deviations were fairly constant across methods (1) to (4). Results for method (5) are to follow together with mean square errors for all five methods. Conclusion Random selection with replacement multiple times may offer a solution to externally validating a predictive or prognostic model when at least one covariate is missing from the validation data set. The simulation study described is an over-simplification of reality so leads to more favourable results than can be expected in every- day applications. Similarly it does not consider associa- tions between variables. Further work is required to determine how the methods perform in alternative set- tings and also in real life. Acknowledgements This programme (RP-PG-0606-1062) receives financial support from the National Institute for Health Research (NIHR) Programme Grants for Applied Research funding scheme. Author details 1 Department of Biostatistics, University of Liverpool, Liverpool, L69 3GS, UK. 2 Clinical and Molecular Pharmacology, University of Liverpool, Liverpool, L69 3GS, UK. Published: 13 December 2011 doi:10.1186/1745-6215-12-S1-A62 Cite this article as: Bonnett et al.: Simulation study - handling missing covariates in the context of external validation. Trials 2011 12(Suppl 1): A62. * Correspondence: [email protected] 1 Department of Biostatistics, University of Liverpool, Liverpool, L69 3GS, UK Full list of author information is available at the end of the article Bonnett et al. Trials 2011, 12(Suppl 1):A62 http://www.trialsjournal.com/content/12/S1/A62 TRIALS © 2011 Bonnett et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: laura-j-bonnett

Post on 06-Jul-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Simulation study - handling missing covariates in the context of external validation

POSTER PRESENTATION Open Access

Simulation study - handling missing covariates inthe context of external validationLaura J Bonnett1*, Tony G Marson2, Paula R Williamson1, Catrin Tudur-Smith1

From Clinical Trials Methodology Conference 2011Bristol, UK. 4-5 October 2011

ObjectivesBefore a predictive or prognostic model, often developedusing data from clinical trials, can be introduced intogeneral practice, it needs to be externally validated toensure that it performs satisfactorily in data sets that arefully independent of the development data. Variousmethods exist to handle covariates with missing data andmany of these are regularly employed in the analysis ofdata from a single trial. We propose that several of thesestrategies may be adapted to handle covariates with everyentry missing in the context of external validation.

MethodsA simulation study was undertaken to test the suitabilityof our five proposed strategies: (1) random selectionwith replacement, (2) hot deck imputation, (3) singleimputation via estimation, (4) random selection withreplacement multiple times, (5) using only covariatescommon to both development and validation data sets.Survival times were simulated via the Cox-exponentialdistribution with a binary censoring indicator variable.Up to two binary, two continuous and two categoricalcovariates were simulated via binomial, log-normal andmultinomial distributions respectively. To assess howthe methods perform in general three statistics were cal-culated across 1000 bootstrap samples: (i) estimatedregression coefficients from the model fit to the valida-tion set, (ii) associated standard deviations, (iii) meansquare errors of the parameter estimates from the devel-opment and validation sets.

ResultsPreliminary results suggest that random selection withreplacement multiple times was the most consistent

method; the mean difference between the actual regres-sion coefficients from the development set and thoseestimated from the validation set was only 0.02 whereasit was 0.10 for random selection with replacement, 0.09for imputation via estimation and 0.05 for hot-deckimputation. Standard deviations were fairly constantacross methods (1) to (4). Results for method (5) are tofollow together with mean square errors for all fivemethods.

ConclusionRandom selection with replacement multiple times mayoffer a solution to externally validating a predictive orprognostic model when at least one covariate is missingfrom the validation data set. The simulation studydescribed is an over-simplification of reality so leads tomore favourable results than can be expected in every-day applications. Similarly it does not consider associa-tions between variables. Further work is required todetermine how the methods perform in alternative set-tings and also in real life.

AcknowledgementsThis programme (RP-PG-0606-1062) receives financial support from theNational Institute for Health Research (NIHR) Programme Grants for AppliedResearch funding scheme.

Author details1Department of Biostatistics, University of Liverpool, Liverpool, L69 3GS, UK.2Clinical and Molecular Pharmacology, University of Liverpool, Liverpool, L693GS, UK.

Published: 13 December 2011

doi:10.1186/1745-6215-12-S1-A62Cite this article as: Bonnett et al.: Simulation study - handling missingcovariates in the context of external validation. Trials 2011 12(Suppl 1):A62.* Correspondence: [email protected]

1Department of Biostatistics, University of Liverpool, Liverpool, L69 3GS, UKFull list of author information is available at the end of the article

Bonnett et al. Trials 2011, 12(Suppl 1):A62http://www.trialsjournal.com/content/12/S1/A62 TRIALS

© 2011 Bonnett et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative CommonsAttribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction inany medium, provided the original work is properly cited.