
Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework

Author

Listed:
  • Jiang, Wei
  • Josse, Julie
  • Lavielle, Marc

Abstract

Logistic regression is a common classification method in supervised learning. Surprisingly, there are very few solutions for performing logistic regression with missing values in the covariates. A complete approach based on a stochastic approximation version of the EM algorithm is proposed in order to perform statistical inference with missing values, including the estimation of the parameters and their variance, derivation of confidence intervals, and also a model selection procedure. The problem of prediction for new observations on a test set with missing covariate data is also tackled. Supported by a simulation study in which the method is compared to previous ones, it has proved to be computationally efficient, and has good coverage and variable selection properties. The approach is then illustrated on a dataset of severely traumatized patients from Paris hospitals by predicting the occurrence of hemorrhagic shock, a leading cause of early preventable death in severe trauma cases. The aim is to improve the current red flag procedure, a binary alert identifying patients with a high risk of severe hemorrhage. The method is implemented in the R package misaem.
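The abstract names a stochastic approximation version of the EM algorithm (SAEM) but does not spell out the steps. The following is a minimal Python sketch of that idea, not the misaem implementation. It rests on several assumptions that go beyond the abstract: the covariates are jointly Gaussian, the missingness is ignorable, missing entries are coded as NaN, the simulation step uses a Metropolis-Hastings independence sampler, and every function and parameter name (`saem_logistic`, `fit_logistic`, `n_burn`, `n_mh`) is hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p) + 1e-9
        beta = beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))
    return beta

def saem_logistic(X, y, n_iter=50, n_burn=20, n_mh=2, seed=0):
    """Illustrative SAEM for logistic regression with NaN-coded missing covariates.

    Assumes x ~ N(mu, Sigma) and y | x ~ Bernoulli(sigmoid(beta0 + x @ beta)).
    Returns (beta, mu, Sigma), where beta[0] is the intercept.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    miss = np.isnan(X)
    # Initialise with mean imputation and complete-data estimates.
    Xc = np.where(miss, np.nanmean(X, axis=0), X)
    mu = Xc.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False, bias=True)
    beta = fit_logistic(np.column_stack([np.ones(n), Xc]), y)
    for k in range(1, n_iter + 1):
        # Simulation step: draw each row's missing block from p(x_mis | x_obs, y).
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            if o.any():
                K = Sigma[np.ix_(m, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
                cond_mu = mu[m] + K @ (Xc[i, o] - mu[o])
                cond_S = Sigma[np.ix_(m, m)] - K @ Sigma[np.ix_(o, m)]
                cond_S = (cond_S + cond_S.T) / 2.0  # enforce symmetry
            else:
                cond_mu, cond_S = mu, Sigma
            s = 2.0 * y[i] - 1.0  # p(y | x) = sigmoid(s * linear predictor)
            for _ in range(n_mh):
                cand = Xc[i].copy()
                # Proposal is p(x_mis | x_obs), so the acceptance ratio
                # reduces to p(y | candidate) / p(y | current).
                cand[m] = rng.multivariate_normal(cond_mu, cond_S)
                ratio = (sigmoid(s * (beta[0] + cand @ beta[1:]))
                         / sigmoid(s * (beta[0] + Xc[i] @ beta[1:])))
                if rng.random() < ratio:
                    Xc[i] = cand
        # Stochastic approximation step: move the parameters toward the
        # complete-data estimates with a decreasing step size after burn-in.
        gamma = 1.0 if k <= n_burn else 1.0 / (k - n_burn)
        beta_new = fit_logistic(np.column_stack([np.ones(n), Xc]), y)
        beta = beta + gamma * (beta_new - beta)
        mu = mu + gamma * (Xc.mean(axis=0) - mu)
        Sigma = Sigma + gamma * (np.cov(Xc, rowvar=False, bias=True) - Sigma)
    return beta, mu, Sigma
```

Calling `saem_logistic(X, y)` on an `n x p` array `X` with NaN entries and a 0/1 response vector `y` returns the estimated coefficients together with the estimated covariate mean and covariance. The sketch covers point estimation only; the variance estimation, model selection and prediction steps mentioned in the abstract are not reproduced here.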

Suggested Citation

  • Jiang, Wei & Josse, Julie & Lavielle, Marc, 2020. "Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
  • Handle: RePEc:eee:csdana:v:145:y:2020:i:c:s0167947319302622
    DOI: 10.1016/j.csda.2019.106907

    References listed on IDEAS

    1. Joseph G. Ibrahim & Ming-Hui Chen & Stuart R. Lipsitz & Amy H. Herring, 2005. "Missing-Data Methods for Generalized Linear Models: A Comparative Review," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 332-346, March.
    2. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    3. Joseph G. Ibrahim & Ming-Hui Chen & Stuart R. Lipsitz, 1999. "Monte Carlo EM for Missing Covariates in Parametric Regression Models," Biometrics, The International Biometric Society, vol. 55(2), pages 591-596, June.
    4. Jiming Jiang & Thuan Nguyen & J. Sunil Rao, 2015. "The E-MS Algorithm: Model Selection With Incomplete Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1136-1147, September.
    5. Gerda Claeskens & Fabrizio Consentino, 2008. "Variable Selection with Incomplete Covariate Data," Biometrics, The International Biometric Society, vol. 64(4), pages 1062-1069, December.
    6. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    7. W. R. Gilks & P. Wild, 1992. "Adaptive Rejection Sampling for Gibbs Sampling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 41(2), pages 337-348, June.

    Citations



    Cited by:

    1. Shen-Ming Lee & Truong-Nhat Le & Phuoc-Loc Tran & Chin-Shang Li, 2023. "Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods," Computational Statistics, Springer, vol. 38(2), pages 899-934, June.
    2. Daniele Bottigliengo & Giulia Lorenzoni & Honoria Ocagli & Matteo Martinato & Paola Berchialla & Dario Gregori, 2021. "Propensity Score Analysis with Partially Observed Baseline Covariates: A Practical Comparison of Methods for Handling Missing Data," IJERPH, MDPI, vol. 18(13), pages 1-17, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Xue-Dong & Fu, Ying-Zi, 2011. "Model selection for zero-inflated regression with missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 765-773, January.
    2. Lee, Min Cherng & Mitra, Robin, 2016. "Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 24-38.
    3. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    4. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    5. Jiang, Depeng & Zhao, Puying & Tang, Niansheng, 2016. "A propensity score adjustment method for regression models with nonignorable missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 94(C), pages 98-119.
    6. Lei Jin & Suojin Wang, 2010. "A Model Validation Procedure when Covariate Data are Missing at Random," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 37(3), pages 403-421, September.
    7. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    8. Hongtu Zhu & Joseph G. Ibrahim & Xiaoyan Shi, 2009. "Diagnostic Measures for Generalized Linear Models with Missing Covariates," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(4), pages 686-712, December.
    9. Liang, Hua, 2008. "Generalized partially linear models with missing covariates," Journal of Multivariate Analysis, Elsevier, vol. 99(5), pages 880-895, May.
    10. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    11. Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2021. "On the Treatment of Missing Data in Background Questionnaires in Educational Large-Scale Assessments: An Evaluation of Different Procedures," Journal of Educational and Behavioral Statistics, vol. 46(4), pages 430-465, August.
    12. Chen, Qingxia & Ibrahim, Joseph G. & Chen, Ming-Hui & Senchaudhuri, Pralay, 2008. "Theory and inference for regression models with missing responses and covariates," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1302-1331, July.
    13. Göran Kauermann & Mehboob Ali, 2021. "Semi-parametric regression when some (expensive) covariates are missing by design," Statistical Papers, Springer, vol. 62(4), pages 1675-1696, August.
    14. Maria Lucia Parrella & Giuseppina Albano & Michele La Rocca & Cira Perna, 2019. "Reconstructing missing data sequences in multivariate time series: an application to environmental data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(2), pages 359-383, June.
    15. Fang, Fang & Shao, Jun, 2016. "Iterated imputation estimation for generalized linear models with missing response and covariate values," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 111-123.
    16. Nanhua Zhang & Roderick J. Little, 2012. "A Pseudo-Bayesian Shrinkage Approach to Regression with Missing Covariates," Biometrics, The International Biometric Society, vol. 68(3), pages 933-942, September.
    17. Ming‐Hui Chen & Joseph G. Ibrahim, 2001. "Maximum Likelihood Methods for Cure Rate Models with Missing Covariates," Biometrics, The International Biometric Society, vol. 57(1), pages 43-52, March.
    18. Yang Zhao, 2021. "Semiparametric model for regression analysis with nonmonotone missing data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(2), pages 461-475, June.
    19. Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
    20. Jan Kluge & Sarah Lappöhn & Kerstin Plank, 2023. "Predictors of TFP growth in European countries," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 50(1), pages 109-140, February.
