Imputation Methods for Handling Item-Nonresponse in the Social Sciences: A Methodological Review

Author

Listed:
  • Gabriele Beissel Durrant

Abstract

Missing data are often a problem in social science data. Imputation methods fill in the missing responses and lead, under certain conditions, to valid inference. This article reviews several imputation methods used in the social sciences and discusses the advantages and disadvantages of these methods in practice. Simpler imputation methods as well as more advanced methods, such as fractional and multiple imputation, are considered. The paper introduces readers new to the imputation literature to key ideas and methods. For those already familiar with imputation methods, the paper highlights some new developments and clarifies some recent misconceptions in the use of imputation methods. The emphasis is on efficient hot deck imputation methods, implemented in either multiple or fractional imputation approaches. Software packages for using imputation methods in practice are reviewed, highlighting newer developments. The paper discusses an example from the social sciences in detail, applying several imputation methods to a missing earnings variable. The objective is to illustrate how to choose between methods in a real data example. A simulation study evaluates various imputation methods, including predictive mean matching, fractional imputation and multiple imputation. Certain forms of fractional and multiple hot deck methods are found to perform well with regard to the bias and efficiency of a point estimator and robustness against model misspecification. Standard parametric imputation methods are found not to be adequate for the application considered. [NCRM WP]
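
To make the hot deck idea in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy data set with a missing earnings variable and a single hypothetical imputation class variable (`occupation`); all function names, field names and values are illustrative assumptions. It shows a random hot deck within classes, a repeated (multiple) hot deck point estimate of the mean, and a simplified fractional hot deck in which every donor in the class receives an equal fractional weight.

```python
import random
from statistics import mean

# Hypothetical toy data: earnings is missing (None) for two respondents.
records = [
    {"occupation": "manual", "earnings": 8.0},
    {"occupation": "manual", "earnings": 10.0},
    {"occupation": "manual", "earnings": None},
    {"occupation": "clerical", "earnings": 12.0},
    {"occupation": "clerical", "earnings": 14.0},
    {"occupation": "clerical", "earnings": None},
]


def donor_pools(records, class_key, target):
    """Group the observed target values by imputation class."""
    pools = {}
    for r in records:
        if r[target] is not None:
            pools.setdefault(r[class_key], []).append(r[target])
    return pools


def hot_deck_impute(records, class_key, target, rng):
    """Return a completed copy of the data: each missing value is replaced
    by the value of a randomly drawn donor from the same class.
    Assumes every class contains at least one donor."""
    pools = donor_pools(records, class_key, target)
    completed = []
    for r in records:
        value = r[target]
        if value is None:
            value = rng.choice(pools[r[class_key]])
        completed.append({**r, target: value})
    return completed


def multiple_hot_deck_mean(records, class_key, target, m=5, seed=0):
    """Repeat the random hot deck m times and average the m
    completed-data means (point estimate only, no variance)."""
    rng = random.Random(seed)
    estimates = [
        mean(r[target] for r in hot_deck_impute(records, class_key, target, rng))
        for _ in range(m)
    ]
    return mean(estimates)


def fractional_hot_deck_mean(records, class_key, target):
    """Simplified fractional hot deck: a missing value is represented by
    all donors in its class, each with weight 1 / (number of donors)."""
    pools = donor_pools(records, class_key, target)
    total = 0.0
    for r in records:
        if r[target] is not None:
            total += r[target]
        else:
            pool = pools[r[class_key]]
            weight = 1.0 / len(pool)
            total += sum(weight * v for v in pool)
    return total / len(records)


print(multiple_hot_deck_mean(records, "occupation", "earnings"))
print(fractional_hot_deck_mean(records, "occupation", "earnings"))
```

With these toy records the fractional estimate of mean earnings is 11.0, because each missing value contributes the mean of the donors in its class. The sketch covers point estimation only: it omits variance estimation, and simply repeating a random hot deck is not a proper multiple imputation procedure without a further step such as the approximate Bayesian bootstrap.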

Suggested Citation

  • Gabriele Beissel Durrant, 2009. "Imputation Methods for Handling Item-Nonresponse in the Social Sciences: A Methodological Review," Working Papers id:2007, eSocialSciences.
  • Handle: RePEc:ess:wpaper:id:2007

    Download full text from publisher

    File URL: http://www.eSocialSciences.com/data/articles/Document1262009150.4561579.pdf
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rebecca R. Andridge & Roderick J. A. Little, 2010. "A Review of Hot Deck Imputation for Survey Non‐response," International Statistical Review, International Statistical Institute, vol. 78(1), pages 40-64, April.
    2. Shu Yang & Jae Kwang Kim, 2020. "Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 839-861, September.
    3. Marco Di Zio & Ugo Guarnera, 2008. "A multiple imputation method for non-Gaussian data," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(1), pages 75-90.
    4. Kristian Kleinke & Jost Reinecke, 2013. "Multiple imputation of incomplete zero-inflated count data," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 67(3), pages 311-336, August.
    5. Patrick M. Joyce & Donald Malec & Roderick J. A. Little & Aaron Gilary & Alfredo Navarro & Mark E. Asiala, 2014. "Statistical Modeling Methodology for the Voting Rights Act Section 203 Language Assistance Determinations," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 36-47, March.
    6. Grabka, Markus & Westermeier, Christian, 2014. "Estimating the Impact of Alternative Multiple Imputation Methods on Longitudinal Wealth Data," VfS Annual Conference 2014 (Hamburg): Evidence-based Economic Policy 100353, Verein für Socialpolitik / German Economic Association.
    7. Brownstone, David, 1997. "Multiple Imputation Methodology for Missing Data, Non-Random Response, and Panel Attrition," University of California Transportation Center, Working Papers qt2zd6w6hh, University of California Transportation Center.
    8. Westermeier, Christian & Grabka, Markus M., 2016. "Longitudinal Wealth Data and Multiple Imputation: An Evaluation Study," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 10(3), pages 237-252.
    9. Jonathan Hambur & Gianni La Cava, 2018. "Do Interest Rates Affect Business Investment? Evidence from Australian Company-level Data," RBA Research Discussion Papers rdp2018-05, Reserve Bank of Australia.
    10. Jared S. Murray & Jerome P. Reiter, 2016. "Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1466-1479, October.
    11. Gabriele B. Durrant & Chris Skinner, 2006. "Using data augmentation to correct for non‐ignorable non‐response when surrogate data are available: an application to the distribution of hourly pay," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 169(3), pages 605-623, July.
    12. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    13. Kristian Kleinke, 2017. "Multiple Imputation Under Violated Distributional Assumptions: A Systematic Evaluation of the Assumed Robustness of Predictive Mean Matching," Journal of Educational and Behavioral Statistics, vol. 42(4), pages 371-404, August.
    14. Chenyang Gu & Roee Gutman, 2017. "Combining item response theory with multiple imputation to equate health assessment questionnaires," Biometrics, The International Biometric Society, vol. 73(3), pages 990-998, September.
    15. Robert J. Batt & Christian Terwiesch, 2015. "Waiting Patiently: An Empirical Study of Queue Abandonment in an Emergency Department," Management Science, INFORMS, vol. 61(1), pages 39-59, January.
    16. Chia-Ning Wang & Roderick Little & Bin Nan & Siobán D. Harlow, 2011. "A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories," Biometrics, The International Biometric Society, vol. 67(4), pages 1573-1582, December.
    17. Gianluca Gazzola & Myong K. Jeong, 2021. "Support vector regression for polyhedral and missing data," Annals of Operations Research, Springer, vol. 303(1), pages 483-506, August.
    18. R Florez-Lopez, 2010. "Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(3), pages 486-501, March.
    19. Frank Potter & Eric Grau & John Czajka & Dan Scheer & Mark Levitan, "undated". "Imputation Variance Estimation Protocols for the NAS Poverty Measure: The New York City Poverty Measure Experience," Mathematica Policy Research Reports 77be49e0f91f41e888de5139e, Mathematica Policy Research.
    20. Chris Skinner & Nigel Stuttard & Gabriele Beissel‐Durrant & James Jenkins, 2002. "The Measurement of Low Pay in the UK Labour Force Survey," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 64(supplemen), pages 653-676, December.

    More about this item

    Keywords

    item-nonresponse; imputation; fractional imputation; multiple imputation; estimation of distribution functions
