Rose Medeiros (Statistical Consulting Group, ATS, University of California Los Angeles)
Abstract
It is generally advised that imputation models contain as many “predictor” variables as possible, since the greater the number of variables, the greater the amount of information from which to estimate missing values (van Buuren, Boshuizen & Knook 1999). Ideally, an imputation model would contain all variables in the dataset. Hence, software packages that perform multivariate imputation by chained equations (e.g. ice in Stata) often default to using all other variables in the imputation model to predict missing values. However, in datasets with moderate to large numbers of variables, attempting to use all other variables results in imputation models that are too large to actually run. One solution to this problem is to select a relatively large, but reasonable, number of predictors for each variable based on bivariate correlations, and then drop predictors as necessary to create a regression model that is tractable using the complete data. The resulting set of regression models forms the imputation model for the entire dataset. This presentation outlines this approach in more detail and presents an overview of the Stata package that implements it.
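The selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the Stata package's implementation: the function names (`pairwise_corr`, `select_predictors`), the parameters `max_predictors` and `min_cases_per_predictor`, and the specific dropping rule (remove the weakest predictor until enough complete cases remain) are all assumptions made for illustration, as is the toy dataset.

```python
import math


def pairwise_corr(x, y):
    """Pearson correlation over rows where both values are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0  # too few jointly observed rows to rank on (assumed rule)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_predictors(data, target, max_predictors=2, min_cases_per_predictor=2):
    """Pick predictors for `target` by absolute bivariate correlation, then
    drop the weakest ones until the regression has enough complete cases.

    data: dict mapping variable name -> list of values (None = missing).
    Both thresholds are hypothetical tuning knobs, not values from the source.
    """
    # Step 1: rank all other variables by |r| with the target, keep the top few.
    ranked = sorted(
        (name for name in data if name != target),
        key=lambda name: abs(pairwise_corr(data[target], data[name])),
        reverse=True,
    )
    chosen = ranked[:max_predictors]

    # Step 2: drop the weakest predictor while too few rows are fully observed.
    n_rows = len(data[target])
    while chosen:
        complete = sum(
            1
            for i in range(n_rows)
            if data[target][i] is not None
            and all(data[p][i] is not None for p in chosen)
        )
        if complete >= min_cases_per_predictor * len(chosen):
            break
        chosen.pop()  # the last element has the smallest |r|
    return chosen


# Hypothetical toy dataset; None marks a missing value.
toy = {
    "income": [1, 2, 3, 4, 5, None],
    "age": [2, 4, 6, 8, 10, 12],
    "noise": [5, 1, 4, 2, 3, 6],
    "sparse": [1, None, None, None, None, 2],
}

# One predictor list per variable: together these play the role of the
# set of regression models that make up the imputation model.
models = {var: select_predictors(toy, var, max_predictors=3) for var in toy}
```

In this toy example, `sparse` is initially among the three candidates for `income` but is dropped because only one row is observed on all four variables, leaving `age` and `noise`. A real implementation would likely use a different tractability check, such as whether the regression actually converges.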