Multiple imputation of missing values

Author

Patrick Royston
(MRC Clinical Trials Unit)

Abstract

Following the seminal publications of Rubin about thirty years ago, statisticians have become increasingly aware of the inadequacy of "complete-case" analysis of datasets with missing observations. In medicine, for example, observations may be missing in a sporadic way for different covariates, and a complete-case analysis may omit as many as half of the available cases. Hotdeck imputation was implemented in Stata in 1999 by Mander and Clayton. However, this technique may perform poorly when many rows of data have at least one missing value. This article describes an implementation for Stata of the MICE method of multiple multivariate imputation described by van Buuren, Boshuizen, and Knook (1999). MICE stands for multivariate imputation by chained equations. The basic idea of data analysis with multiple imputation is to create a small number (e.g., 5-10) of copies of the data, each of which has the missing values suitably imputed, and analyze each complete dataset independently. Estimates of parameters of interest are averaged across the copies to give a single estimate. Standard errors are computed according to the "Rubin rules", devised to allow for the between- and within-imputation components of variation in the parameter estimates. This article describes five ado-files. mvis creates multiple multivariate imputations. uvis imputes missing values for a single variable as a function of several covariates, each with complete data. micombine fits a wide variety of regression models to a multiply imputed dataset, combining the estimates using Rubin's rules, and supports survival analysis models (stcox and streg), categorical data models, generalized linear models, and more. Finally, misplit and mijoin are utilities to interconvert datasets created by mvis and by the miset program from John Carlin and colleagues. The use of the routines is illustrated with an example of prognostic modeling in breast cancer. Copyright 2004 by StataCorp LP.
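The pooling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the micombine implementation, which is written in Stata; it is a Python illustration of the standard Rubin's rules formulas for a single scalar parameter. The function name pool_rubin and its arguments are hypothetical, chosen for this example only.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one scalar parameter across m imputed datasets (Rubin's rules).

    estimates  -- point estimate from each imputed dataset
    std_errors -- corresponding standard error from each imputed dataset
    (Hypothetical helper for illustration; not part of any Stata routine.)
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")

    # Pooled point estimate: simple average across the imputed datasets.
    q_bar = mean(estimates)

    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])

    # Between-imputation variance: sample variance (divisor m - 1) of the estimates.
    between = variance(estimates)

    # Total variance, with the (1 + 1/m) correction for a finite number of imputations.
    total = within + (1 + 1 / m) * between

    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Made-up numbers purely to show the call; they are not from the article.
    est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
    print(est, se)
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputations, which is how the rules reflect the extra uncertainty due to the missing data.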

Suggested Citation

Patrick Royston, 2004. "Multiple imputation of missing values," Stata Journal, StataCorp LLC, vol. 4(3), pages 227-241, September.

Handle: RePEc:tsj:stataj:v:4:y:2004:i:3:p:227-241

References listed on IDEAS

Patrick Royston & Gareth Ambler, 1999. "Multivariable fractional polynomials," Stata Technical Bulletin, StataCorp LLC, vol. 8(43).
John B. Carlin & Ning Li & Philip Greenwood & Carolyn Coffey, 2003. "Tools for analyzing multiple imputed datasets," Stata Journal, StataCorp LLC, vol. 3(3), pages 226-244, September.
Adrian Mander & David Clayton, 2000. "Hotdeck imputation," Stata Technical Bulletin, StataCorp LLC, vol. 9(51).
W. Sauerbrei & P. Royston, 1999. "Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 162(1), pages 71-94.
