
Multiple Imputation Under Violated Distributional Assumptions: A Systematic Evaluation of the Assumed Robustness of Predictive Mean Matching

Author

Listed:
  • Kristian Kleinke

    (University of Hagen)

Abstract

Predictive mean matching (PMM) is a standard technique for the imputation of incomplete continuous data. PMM imputes an actual observed value whose predicted value is among the k ≥ 1 values (the so-called donor pool) closest to the one predicted for the missing case. When empirical data deviate from their distributional assumptions, PMM is usually better able to preserve the original distribution of the data than fully parametric multiple imputation (MI) approaches. Use of PMM is therefore especially worthwhile in situations where model assumptions of fully parametric MI procedures are violated and where fully parametric procedures would yield highly implausible estimates. Unfortunately, to date only a handful of studies have systematically tested the robustness of PMM, and it is still largely unknown where exactly the limits of this procedure lie. I examined the performance of PMM in situations where data were skewed to varying degrees, under different sample sizes and missing data percentages, and using different settings of the PMM approach. Small donor pools overall yielded better results than large donor pools, and PMM generally worked well unless data were highly skewed and more than about 20% to 30% of the data had to be imputed. PMM also generally performed better when sample size was sufficiently large.
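
The donor-pool mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the function name `pmm_impute`, the use of an ordinary least squares fit, and the simulated data are all assumptions made for illustration. The sketch performs a single imputation pass with fixed regression estimates, whereas proper multiple imputation would typically also draw the regression parameters and repeat the procedure to create several completed data sets.

```python
import numpy as np


def pmm_impute(x, y, k=5, rng=None):
    """Hypothetical helper: fill missing y (np.nan) by predictive mean matching.

    For each missing case, the k observed cases whose predicted values are
    closest to its own predicted value form the donor pool; one donor is
    drawn at random and its actual observed value is imputed.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)

    # Linear prediction model with intercept, fitted on observed cases only
    # (an assumption of this sketch; any prediction model could be used).
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta

    pred_obs = pred[obs]
    y_obs = y[obs]
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        # Donor pool: indices of the k closest predicted values.
        donors = np.argsort(np.abs(pred_obs - pred[i]))[:k]
        y_imp[i] = y_obs[rng.choice(donors)]
    return y_imp


# Usage on simulated right-skewed data (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.exp(x + rng.normal(scale=0.5, size=200))
y_missing = y.copy()
y_missing[:40] = np.nan  # 20% of the values set to missing
y_completed = pmm_impute(x, y_missing, k=3, rng=rng)
```

Because every imputed value is copied from an observed case, the completed variable cannot take values outside the observed range, which is why PMM tends to preserve the shape of a skewed distribution. The parameter `k` corresponds to the donor pool size varied in the study.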

Suggested Citation

  • Kristian Kleinke, 2017. "Multiple Imputation Under Violated Distributional Assumptions: A Systematic Evaluation of the Assumed Robustness of Predictive Mean Matching," Journal of Educational and Behavioral Statistics, vol. 42(4), pages 371-404, August.
  • Handle: RePEc:sae:jedbes:v:42:y:2017:i:4:p:371-404
    DOI: 10.3102/1076998616687084

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.3102/1076998616687084
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kristian Kleinke & Jost Reinecke, 2013. "Multiple imputation of incomplete zero-inflated count data," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 67(3), pages 311-336, August.
    2. Shu Yang & Jae Kwang Kim, 2020. "Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 839-861, September.
    3. Yanqing Sun & Li Qi & Fei Heng & Peter B. Gilbert, 2020. "A hybrid approach for the stratified mark‐specific proportional hazards model with missing covariates and missing marks, with application to vaccine efficacy trials," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 791-814, August.
    4. Jonathan Hambur & Gianni La Cava, 2018. "Do Interest Rates Affect Business Investment? Evidence from Australian Company-level Data," RBA Research Discussion Papers rdp2018-05, Reserve Bank of Australia.
    5. Sullivan, Danielle & Andridge, Rebecca, 2015. "A hot deck imputation procedure for multiply imputing nonignorable missing data: The proxy pattern-mixture hot deck," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 173-185.
    6. Ralf Münnich & Siegfried Gabler & Christian Bruch & Jan Pablo Burgard & Tobias Enderle & Jan-Philipp Kolb & Thomas Zimmermann, 2015. "Tabellenauswertungen im Zensus unter Berücksichtigung fehlender Werte," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 9(3), pages 269-304, December.
    7. Jana Emmenegger & Ralf Münnich & Jannik Schaller, 2022. "Evaluating Data Fusion Methods to Improve Income Modelling," Research Papers in Economics 2022-03, University of Trier, Department of Economics.
    8. Rebecca R. Andridge & Roderick J. A. Little, 2010. "A Review of Hot Deck Imputation for Survey Non‐response," International Statistical Review, International Statistical Institute, vol. 78(1), pages 40-64, April.
    9. Chenyang Gu & Roee Gutman, 2017. "Combining item response theory with multiple imputation to equate health assessment questionnaires," Biometrics, The International Biometric Society, vol. 73(3), pages 990-998, September.
    10. Robert J. Batt & Christian Terwiesch, 2015. "Waiting Patiently: An Empirical Study of Queue Abandonment in an Emergency Department," Management Science, INFORMS, vol. 61(1), pages 39-59, January.
    11. Chia-Ning Wang & Roderick Little & Bin Nan & Siobán D. Harlow, 2011. "A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories," Biometrics, The International Biometric Society, vol. 67(4), pages 1573-1582, December.
    12. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.
    13. Gianluca Gazzola & Myong K. Jeong, 2021. "Support vector regression for polyhedral and missing data," Annals of Operations Research, Springer, vol. 303(1), pages 483-506, August.
    14. Patrick M. Joyce & Donald Malec & Roderick J. A. Little & Aaron Gilary & Alfredo Navarro & Mark E. Asiala, 2014. "Statistical Modeling Methodology for the Voting Rights Act Section 203 Language Assistance Determinations," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 36-47, March.
    15. Gabriele Beissel Durrant, 2009. "Imputation Methods for Handling Item-Nonresponse in the Social Sciences: A Methodological Review," Working Papers id:2007, eSocialSciences.
    16. Anika Rasner & Joachim R. Frick & Markus M. Grabka, 2013. "Statistical Matching of Administrative and Survey Data," Sociological Methods & Research, vol. 42(2), pages 192-224, May.
    17. Grabka, Markus & Westermeier, Christian, 2014. "Estimating the Impact of Alternative Multiple Imputation Methods on Longitudinal Wealth Data," VfS Annual Conference 2014 (Hamburg): Evidence-based Economic Policy 100353, Verein für Socialpolitik / German Economic Association.
    18. Raymundo M. Campos-Vázquez, 2013. "Efectos de los ingresos no reportados en el nivel y tendencia de la pobreza laboral en México," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(2), pages 23-54, November.
    19. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    20. Verbeek, M.J.C.M. & Nijman, T.E., 1992. "Incomplete panels and selection bias : A survey," Discussion Paper 1992-7, Tilburg University, Center for Economic Research.
