IDEAS record: https://ideas.repec.org/p/arx/papers/2102.11334.html

Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem

Authors
  • Charles F Manski
  • Michael Gmeiner
  • Anat Tambur

Abstract

Researchers regularly perform conditional prediction using imputed values of missing data. However, applications of imputation often lack a firm foundation in statistical theory. This paper originated when we were unable to find analysis substantiating claims that imputation of missing data has good frequentist properties when data are missing at random (MAR). We focused on the use of observed covariates to impute missing covariates when estimating conditional means of the form E(y|x, w). Here y is an outcome whose realizations are always observed, x is a covariate whose realizations are always observed, and w is a covariate whose realizations are sometimes unobserved. We examine the probability limit of simple imputation estimates of E(y|x, w) as sample size goes to infinity. We find that these estimates are not consistent when covariate data are MAR. To the contrary, the estimates suffer from a shrinkage problem. They converge to points intermediate between the conditional mean of interest, E(y|x, w), and the mean E(y|x) that conditions only on x. We use a type of genotype imputation to illustrate.
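The shrinkage result can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the paper's estimator or its genotype application. It assumes binary x, w, and y drawn from a hypothetical data-generating process, assumes w is missing completely at random (a special case of MAR) with probability 0.4, and imputes each missing w by a random draw from the observed w values among cases with the same x (a hot-deck style imputation chosen here for illustration). The function name `simulate` and all probabilities are hypothetical. Because the imputed w carries no information about y beyond x, cases assigned w = 1 by imputation have mean outcome E(y|x=1). Under this design the estimate of E(y|x=1, w=1) should therefore converge to the mixture (1 - 0.4)(0.8) + 0.4(0.6) = 0.72, where 0.8 = E(y|x=1, w=1) and 0.6 = E(y|x=1), rather than to 0.8.

```python
import random


def simulate(n=200_000, miss_rate=0.4, seed=0):
    """Return (imputation estimate of E(y|x=1,w=1), true E(y|x=1,w=1), true E(y|x=1)).

    Hypothetical design for illustration only: binary x, w, y; w missing
    completely at random; missing w imputed by random draws from observed w
    values among cases with the same x.
    """
    rng = random.Random(seed)
    p_w_given_x = {0: 0.3, 1: 0.6}  # hypothetical P(w = 1 | x)
    p_y = {(0, 0): 0.2, (0, 1): 0.5,  # hypothetical P(y = 1 | x, w)
           (1, 0): 0.3, (1, 1): 0.8}

    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        w = int(rng.random() < p_w_given_x[x])
        y = int(rng.random() < p_y[(x, w)])
        # Missingness independent of (x, w, y): MCAR, a special case of MAR.
        observed = rng.random() >= miss_rate
        rows.append((x, w if observed else None, y))

    # Donor pools: observed w values, grouped by x.
    donors = {0: [], 1: []}
    for x, w, _ in rows:
        if w is not None:
            donors[x].append(w)

    # Impute each missing w with a random donor value sharing the same x.
    completed = [(x, w if w is not None else rng.choice(donors[x]), y)
                 for x, w, y in rows]

    # Imputation estimate: mean of y among cases with x = 1 and
    # observed-or-imputed w = 1.
    ys = [y for x, w, y in completed if x == 1 and w == 1]
    estimate = sum(ys) / len(ys)

    true_cond = p_y[(1, 1)]
    true_x_only = (p_w_given_x[1] * p_y[(1, 1)]
                   + (1 - p_w_given_x[1]) * p_y[(1, 0)])
    return estimate, true_cond, true_x_only


if __name__ == "__main__":
    est, cond, x_only = simulate()
    print(f"imputation estimate of E(y|x=1,w=1): {est:.3f}")
    print(f"true E(y|x=1,w=1) = {cond:.3f}, true E(y|x=1) = {x_only:.3f}")
```

In this particular design the weight on E(y|x,w) equals the share of cases with observed w; other missingness patterns or imputation rules would change the weights, but the estimate still lands between the two conditional means rather than at E(y|x,w).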

Suggested Citation

  • Charles F Manski & Michael Gmeiner & Anat Tambur, 2021. "Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem," Papers 2102.11334, arXiv.org.
  • Handle: RePEc:arx:papers:2102.11334

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2102.11334
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Charles F. Manski & Anat R. Tambur & Michael Gmeiner, 2019. "Predicting kidney transplant outcomes with partial knowledge of HLA mismatch," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 116(41), pages 20339-20345, October.
    2. Horowitz, Joel L. & Manski, Charles F., 1998. "Censoring of outcomes and regressors due to survey nonresponse: Identification and estimation using weights and imputations," Journal of Econometrics, Elsevier, vol. 84(1), pages 37-58, May.
    3. Charles F. Manski, 2018. "Credible ecological inference for medical decisions with personalized risk assessment," Quantitative Economics, Econometric Society, vol. 9(2), pages 541-569, July.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Charles F. Manski, 2022. "Inference with Imputed Data: The Allure of Making Stuff Up," Papers 2205.07388, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sheyu Li & Valentyn Litvin & Charles F. Manski, 2022. "Partial Identification of Personalized Treatment Response with Trial-reported Analyses of Binary Subgroups," Papers 2208.03381, arXiv.org, revised Sep 2022.
    2. Charles F. Manski, 2022. "Inference with Imputed Data: The Allure of Making Stuff Up," Papers 2205.07388, arXiv.org.
    3. Molinari, Francesca, 2010. "Missing Treatments," Journal of Business & Economic Statistics, American Statistical Association, vol. 28(1), pages 82-95.
    4. International Labour Office., 2004. "Global employment trends : January 2004," Global Employment Trends Reports 994802133402676, International Labour Office, Economic and Labour Market Analysis Department.
    5. Yongwei Chen & Dahai Fu, 2015. "Measuring income inequality using survey data: the case of China," The Journal of Economic Inequality, Springer; Society for the Study of Economic Inequality, vol. 13(2), pages 299-307, June.
    6. Lothar Essig & Joachim K. Winter, 2009. "Item Non-Response to Financial Questions in Household Surveys: An Experimental Study of Interviewer and Mode Effects," Fiscal Studies, Institute for Fiscal Studies, vol. 30(Special I), pages 367-390, December.
    7. Karsten Marshall Elseth Rieck & Kjetil Telle, 2012. "Sick leave before, during and after pregnancy," Discussion Papers 690, Statistics Norway, Research Department.
    8. Manski, Charles F., 2023. "Probabilistic prediction for binary treatment choice: With focus on personalized medicine," Journal of Econometrics, Elsevier, vol. 234(2), pages 647-663.
    9. Keisuke Hirano & Guido W. Imbens & Geert Ridder & Donald B. Rubin, 2001. "Combining Panel Data Sets with Attrition and Refreshment Samples," Econometrica, Econometric Society, vol. 69(6), pages 1645-1659, November.
    10. Kraft, Holger & Kroisandt, Gerald & Müller, Marlene, 2002. "Assessing the discriminatory power of credit scores," SFB 373 Discussion Papers 2002,67, Humboldt University of Berlin, Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes.
    11. Esmeralda A. Ramalho & Richard Smith, 2003. "Discrete choice non-response," CeMMAP working papers 07/03, Institute for Fiscal Studies.
    12. Martin Huber, 2012. "Identification of Average Treatment Effects in Social Experiments Under Alternative Forms of Attrition," Journal of Educational and Behavioral Statistics, vol. 37(3), pages 443-474, June.
    13. Kapteyn, Arie & Michaud, Pierre-Carl & Smith, James P. & van Soest, Arthur, 2006. "Effects of Attrition and Non-Response in the Health and Retirement Study," IZA Discussion Papers 2246, Institute of Labor Economics (IZA).
    14. Kreider, Brent & Pepper, John V., 2007. "Disability and Employment: Reevaluating the Evidence in Light of Reporting Errors," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 432-441, June.
    15. Guido W. Imbens & Charles F. Manski, 2004. "Confidence Intervals for Partially Identified Parameters," Econometrica, Econometric Society, vol. 72(6), pages 1845-1857, November.
    16. John Fitzgerald & Peter Gottschalk & Robert Moffitt, 1998. "An Analysis of Sample Attrition in Panel Data: The Michigan Panel Study of Income Dynamics," Journal of Human Resources, University of Wisconsin Press, vol. 33(2), pages 251-299.
    17. Philip A. Haile & Elie Tamer, 2003. "Inference with an Incomplete Model of English Auctions," Journal of Political Economy, University of Chicago Press, vol. 111(1), pages 1-51, February.
    18. Arie Beresteanu & Francesca Molinari, 2008. "Asymptotic Properties for a Class of Partially Identified Models," Econometrica, Econometric Society, vol. 76(4), pages 763-814, July.
    19. Onur Başer & Joseph C. Gardiner & Cathy J. Bradley & Hüseyin Yüce & Charles Given, 2006. "Longitudinal analysis of censored medical cost data," Health Economics, John Wiley & Sons, Ltd., vol. 15(5), pages 513-525, May.
    20. Martin Huber, 2010. "Identification of average treatment effects in social experiments under different forms of attrition," University of St. Gallen Department of Economics working paper series 2010 2010-22, Department of Economics, University of St. Gallen.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
