Printed from https://ideas.repec.org/a/bla/jorssc/v70y2021i1p51-65.html

Stacked inverse probability of censoring weighted bagging: A case study in the InfCareHIV Register

Authors

Listed:
  • Pablo Gonzalez Ginestet
  • Ales Kotalik
  • David M. Vock
  • Julian Wolfson
  • Erin E. Gabriel

Abstract

We propose an inverse probability of censoring weighted (IPCW) bagging (bootstrap aggregation) pre‐processing step that enables any machine learning procedure for classification to be used to predict the cause‐specific cumulative incidence, properly accounting for right‐censored observations and competing risks. We consider the IPCW area under the time‐dependent ROC curve (IPCW‐AUC) as a performance evaluation metric. We also suggest a procedure to optimally stack predictions from any set of IPCW bagged methods. We illustrate our proposed method in the Swedish InfCareHIV register by predicting individuals for whom treatment will not maintain an undetectable viral load for at least 2 years following initial suppression. The R package stackBagg, which implements our proposed method, is available on GitHub.
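
The IPCW bagging idea in the abstract can be illustrated with a minimal Python sketch. This is not the stackBagg API, and the paper's exact weighting scheme may differ. It assumes a common IPCW convention: subjects censored before the horizon get weight zero, all others are weighted by the inverse of a Kaplan-Meier estimate of the censoring survival function, and bootstrap samples are drawn with probability proportional to those weights. The function names (`km_censoring`, `ipcw_weights`, `ipcw_bootstrap`), the status coding (0 = censored, 1 = event of interest, 2 = competing event), and the toy data are all hypothetical.

```python
import random

def km_censoring(times, status, t, left_limit):
    """Kaplan-Meier estimate of the censoring survival function G.

    Censoring (status 0) is treated as the "event". Returns G(t-) when
    left_limit is True, otherwise G(t). Assumes no ties between
    censoring times and event times.
    """
    g = 1.0
    censor_times = sorted({x for x, s in zip(times, status) if s == 0})
    for u in censor_times:
        if u > t or (left_limit and u == t):
            break
        at_risk = sum(1 for x in times if x >= u)
        censored = sum(1 for x, s in zip(times, status) if x == u and s == 0)
        g *= 1.0 - censored / at_risk
    return g

def ipcw_weights(times, status, tau):
    """Return (weights, labels) for predicting the event of interest by tau."""
    weights, labels = [], []
    for x, s in zip(times, status):
        if x <= tau and s != 0:
            # Any event observed by tau: weight 1 / G(T-)
            w = 1.0 / km_censoring(times, status, x, left_limit=True)
        elif x > tau:
            # Still under observation at tau: weight 1 / G(tau)
            w = 1.0 / km_censoring(times, status, tau, left_limit=False)
        else:
            # Censored before tau: outcome at tau unknown, weight 0
            w = 0.0
        weights.append(w)
        # Binary target: event of interest (status 1) occurred by tau
        labels.append(1 if (x <= tau and s == 1) else 0)
    return weights, labels

def ipcw_bootstrap(weights, n_boot, rng):
    """Draw n_boot bootstrap index samples with probability proportional to weight."""
    n = len(weights)
    return [rng.choices(range(n), weights=weights, k=n) for _ in range(n_boot)]

# Toy data (hypothetical): follow-up times and status codes
times = [1, 2, 3, 4, 5, 6]
status = [1, 0, 2, 1, 0, 1]
tau = 4

weights, labels = ipcw_weights(times, status, tau)
samples = ipcw_bootstrap(weights, n_boot=3, rng=random.Random(0))
```

Under these assumptions, each index sample in `samples` would be used to fit an ordinary classifier on the binary `labels`, and the fitted classifiers' predicted probabilities would be averaged to estimate the cause‐specific cumulative incidence at `tau`.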

Suggested Citation

  • Pablo Gonzalez Ginestet & Ales Kotalik & David M. Vock & Julian Wolfson & Erin E. Gabriel, 2021. "Stacked inverse probability of censoring weighted bagging: A case study in the InfCareHIV Register," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 51-65, January.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:1:p:51-65
    DOI: 10.1111/rssc.12448

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12448
    Download Restriction: no

    File URL: https://libkey.io/10.1111/rssc.12448?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Timothée Fabre & Vincent Ragel, 2023. "Tackling the Problem of State Dependent Execution Probability: Empirical Evidence and Order Placement," Papers 2307.04863, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xin Huang & Gengsheng Qin & Yixin Fang, 2011. "Optimal Combinations of Diagnostic Tests Based on AUC," Biometrics, The International Biometric Society, vol. 67(2), pages 568-576, June.
    2. Yuanjia Wang & Huaihou Chen & Runze Li & Naihua Duan & Roberto Lewis-Fernández, 2011. "Prediction-Based Structured Variable Selection through the Receiver Operating Characteristic Curves," Biometrics, The International Biometric Society, vol. 67(3), pages 896-905, September.
    3. Chen, Xiwei & Vexler, Albert & Markatou, Marianthi, 2015. "Empirical likelihood ratio confidence interval estimation of best linear combinations of biomarkers," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 186-198.
    4. Osamu Komori, 2011. "A boosting method for maximization of the area under the ROC curve," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(5), pages 961-979, October.
    5. Shen, Pao-sheng, 2010. "Semiparametric estimation of survival function when data are subject to dependent censoring and left truncation," Statistics & Probability Letters, Elsevier, vol. 80(3-4), pages 161-168, February.
    6. Chiang, Chin-Tsang & Chiu, Chih-Heng, 2012. "Nonparametric and semiparametric optimal transformations of markers," Journal of Multivariate Analysis, Elsevier, vol. 103(1), pages 124-141, January.
    7. Rocío Aznar-Gimeno & Luis M. Esteban & Gerardo Sanz & Rafael del-Hoyo-Alonso & Ricardo Savirón-Cornudella, 2021. "Incorporating a New Summary Statistic into the Min–Max Approach: A Min–Max–Median, Min–Max–IQR Combination of Biomarkers for Maximising the Youden Index," Mathematics, MDPI, vol. 9(19), pages 1-17, October.
    8. Cuihong Zhang & Jing Ning & Steven H. Belle & Robert H. Squires & Jianwen Cai & Ruosha Li, 2022. "Assessing predictive discrimination performance of biomarkers in the presence of treatment‐induced dependent censoring," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1137-1157, November.
    9. Schmid Matthias & Hothorn Torsten & Krause Friedemann & Rabe Christina, 2012. "A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-26, October.
    10. Zhang Zhiwei & Ma Shujie & Nie Lei & Soon Guoxing, 2017. "A Quantitative Concordance Measure for Comparing and Combining Treatment Selection Markers," The International Journal of Biostatistics, De Gruyter, vol. 13(1), pages 1-24, May.
    11. Yanqing Wang & Ying‐Qi Zhao & Yingye Zheng, 2020. "Learning‐based biomarker‐assisted rules for optimized clinical benefit under a risk constraint," Biometrics, The International Biometric Society, vol. 76(3), pages 853-862, September.
    12. Matias Busso & Patrick Kline, 2008. "Do Local Economic Development Programs Work? Evidence from the Federal Empowerment Zone Program," Cowles Foundation Discussion Papers 1639, Cowles Foundation for Research in Economics, Yale University.
    13. Mirza Rizwan Sajid & Bader A. Almehmadi & Waqas Sami & Mansour K. Alzahrani & Noryanti Muhammad & Christophe Chesneau & Asif Hanif & Arshad Ali Khan & Ahmad Shahbaz, 2021. "Development of Nonlaboratory-Based Risk Prediction Models for Cardiovascular Diseases Using Conventional and Machine Learning Approaches," IJERPH, MDPI, vol. 18(23), pages 1-16, November.
    14. Glover, Steven & Jones, Sam, 2019. "Can commercial farming promote rural dynamism in sub-Saharan Africa? Evidence from Mozambique," World Development, Elsevier, vol. 114(C), pages 110-121.
    15. Richard J. Cook & Jerald F. Lawless, 2020. "Failure time studies with intermittent observation and losses to follow‐up," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1035-1063, December.
    16. Rachel Axelrod & Daniel Nevo, 2023. "A sensitivity analysis approach for the causal hazard ratio in randomized and observational studies," Biometrics, The International Biometric Society, vol. 79(3), pages 2743-2756, September.
    17. Nazmul Islam & Natalie E. Sheils & Megan S. Jarvis & Kenneth Cohen, 2022. "Comparative effectiveness over time of the mRNA-1273 (Moderna) vaccine and the BNT162b2 (Pfizer-BioNTech) vaccine," Nature Communications, Nature, vol. 13(1), pages 1-7, December.
    18. Wendy Chan, 2018. "Applications of Small Area Estimation to Generalization With Subclassification by Propensity Scores," Journal of Educational and Behavioral Statistics, vol. 43(2), pages 182-224, April.
    19. Yanyao Yi & Ting Ye & Menggang Yu & Jun Shao, 2020. "Cox regression with survival‐time‐dependent missing covariate values," Biometrics, The International Biometric Society, vol. 76(2), pages 460-471, June.
    20. Xiaofeng Lv & Gupeng Zhang & Guangyu Ren, 2017. "Gini index estimation for lifetime data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(2), pages 275-304, April.
