Multiple imputation with large proportions of missing data: How much is too much?
Multiple imputation (MI) is known as an effective method for handling missing data. However, it is not clear whether the method remains effective when a variable has a high percentage of missing observations. This study examines the effectiveness of MI in data with 10% to 80% missing observations, using the absolute bias and root mean squared error of the MI estimates under missing completely at random, missing at random, and not missing at random assumptions. In both simulated data drawn from a multivariate normal distribution and example data from the Predictive Study of Coronary Heart Disease, the bias and root mean squared error under MI are much smaller than those obtained with complete case analysis. In addition, the bias of MI remains stable as the number of imputations (M) increases from M = 10 to M = 50. Moreover, alongside the regression method and the predictive mean matching method, the Markov chain Monte Carlo method can also be used as an imputation method for a continuous variable with univariate missingness. In conclusion, MI produces less-biased estimates, but when large proportions of data are missing, the number of imputations, the imputation method, and the missing data mechanism must also be considered for proper imputation.
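The kind of comparison described in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' code or design. It assumes a bivariate normal pair (x, y) with correlation rho, a missing-at-random mechanism in which y is more likely to be missing when x is large, the mean of y as the target quantity, and a bootstrap-based stochastic regression imputation standing in for the regression method. All function names, the sample size, and the missingness model are hypothetical choices made for illustration.

```python
import numpy as np


def mi_mean(x, y, miss, m, rng):
    """Estimate mean(y) by multiple imputation (illustrative regression method).

    For each of the m imputations, the observed cases are bootstrapped so that
    parameter uncertainty is reflected, y is regressed on x, and missing y
    values are drawn from the fitted line plus normal noise. The m estimates
    are averaged, which is the point-estimate part of Rubin's rules.
    """
    xo, yo = x[~miss], y[~miss]
    estimates = []
    for _ in range(m):
        idx = rng.integers(0, xo.size, xo.size)       # bootstrap observed cases
        xb, yb = xo[idx], yo[idx]
        slope, intercept = np.polyfit(xb, yb, 1)      # OLS fit of y on x
        sigma = (yb - (intercept + slope * xb)).std(ddof=2)
        y_imp = y.copy()
        y_imp[miss] = (intercept + slope * x[miss]
                       + rng.normal(0.0, sigma, int(miss.sum())))
        estimates.append(y_imp.mean())
    return float(np.mean(estimates))


def run(frac, n=500, rho=0.6, m=10, reps=200, seed=0):
    """Absolute bias and RMSE of complete case (cc) and MI estimates of mean(y).

    Assumed MAR mechanism: with u the scaled rank of x, y is missing with
    probability u ** (1/frac - 1), which averages to roughly `frac`.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    cc, mi = [], []
    for _ in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x, y = xy[:, 0], xy[:, 1]
        u = (np.argsort(np.argsort(x)) + 0.5) / n     # ranks of x scaled to (0, 1)
        miss = rng.random(n) < u ** (1.0 / frac - 1.0)
        cc.append(y[~miss].mean())                    # complete case estimate
        mi.append(mi_mean(x, y, miss, m, rng))
    cc, mi = np.array(cc), np.array(mi)
    true_mean = 0.0
    return {
        "cc_bias": abs(cc.mean() - true_mean),
        "cc_rmse": float(np.sqrt(np.mean((cc - true_mean) ** 2))),
        "mi_bias": abs(mi.mean() - true_mean),
        "mi_rmse": float(np.sqrt(np.mean((mi - true_mean) ** 2))),
    }


if __name__ == "__main__":
    for frac in (0.1, 0.5, 0.8):
        print(frac, run(frac))
```

Because the imputation model here matches the data-generating model and the mechanism is missing at random, this sketch is a favorable case for MI. It says nothing about the not-missing-at-random setting, where the abstract's caution about missing data mechanisms applies.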
Handle: RePEc:boc:usug11:23