
Stacked inverse probability of censoring weighted bagging: A case study in the InfCareHIV Register

Author

Listed:
  • Pablo Gonzalez Ginestet
  • Ales Kotalik
  • David M. Vock
  • Julian Wolfson
  • Erin E. Gabriel

Abstract

We propose an inverse probability of censoring weighted (IPCW) bagging (bootstrap aggregation) pre‐processing step that enables any machine learning procedure for classification to be used to predict the cause‐specific cumulative incidence, properly accounting for right‐censored observations and competing risks. We consider the IPCW area under the time‐dependent ROC curve (IPCW‐AUC) as a performance evaluation metric. We also suggest a procedure to optimally stack predictions from any set of IPCW bagged methods. We illustrate our proposed method in the Swedish InfCareHIV register by predicting individuals for whom treatment will not maintain an undetectable viral load for at least 2 years following initial suppression. The R package stackBagg that implements our proposed method is available on GitHub.
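The idea of IPCW bagging can be illustrated with a minimal Python sketch. It is not the stackBagg API and not necessarily the authors' exact estimator. It assumes censoring is independent of the covariates, so that the censoring distribution can be estimated with a plain Kaplan-Meier curve, and it uses a fixed prediction horizon `tau`. All function names (`km_censoring`, `ipcw_weights`, `binary_labels`, `ipcw_bagging`) and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are hypothetical choices made for this illustration. Subjects censored before `tau` get weight zero, the rest are weighted by the inverse probability of remaining uncensored, and each bootstrap sample is drawn with probability proportional to those weights so that an ordinary classifier can be fitted to it.

```python
import random


def km_censoring(times, status):
    """Kaplan-Meier estimate of the censoring survival function G.

    status: 0 = censored, 1 = event of interest, 2 = competing event.
    Returns g(t, inclusive): the product over censoring times c < t
    (or c <= t when inclusive=True). Ties get no special handling.
    """
    steps, surv = [], 1.0
    for c in sorted({t for t, s in zip(times, status) if s == 0}):
        at_risk = sum(1 for t in times if t >= c)
        n_cens = sum(1 for t, s in zip(times, status) if t == c and s == 0)
        surv *= 1.0 - n_cens / at_risk
        steps.append((c, surv))

    def g(t, inclusive=False):
        value = 1.0
        for c, s in steps:
            if c < t or (inclusive and c == t):
                value = s
        return value

    return g


def ipcw_weights(times, status, tau):
    """Weight 0 if censored by tau, else 1 / P(still uncensored)."""
    g = km_censoring(times, status)
    weights = []
    for t, s in zip(times, status):
        if t > tau:            # outcome at tau is known: no event yet
            weights.append(1.0 / g(tau, inclusive=True))
        elif s != 0:           # an event (either cause) observed by tau
            weights.append(1.0 / g(t))
        else:                  # censored before tau: outcome unknown
            weights.append(0.0)
    return weights


def binary_labels(times, status, tau):
    """1 if the event of interest occurred by tau, else 0."""
    return [1 if (s == 1 and t <= tau) else 0 for t, s in zip(times, status)]


def ipcw_bagging(fit, features, labels, weights, n_bags=50, seed=0):
    """Fit `fit` on weighted bootstrap samples and average the predictions.

    `fit(X, y)` is any user-supplied trainer that returns a callable
    mapping one feature vector to a predicted probability.
    """
    rng = random.Random(seed)
    n = len(labels)
    models = []
    for _ in range(n_bags):
        # Sampling proportional to the IPCW weights; zero-weight
        # (censored) subjects are never drawn.
        idx = rng.choices(range(n), weights=weights, k=n)
        models.append(fit([features[i] for i in idx], [labels[i] for i in idx]))

    def predict(x):
        return sum(m(x) for m in models) / len(models)

    return predict
```

Because the censoring adjustment is carried entirely by the resampling step, the trainer passed as `fit` does not need to support observation weights or censored outcomes, which is what makes the approach applicable to arbitrary classifiers.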

Suggested Citation

  • Pablo Gonzalez Ginestet & Ales Kotalik & David M. Vock & Julian Wolfson & Erin E. Gabriel, 2021. "Stacked inverse probability of censoring weighted bagging: A case study in the InfCareHIV Register," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 51-65, January.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:1:p:51-65
    DOI: 10.1111/rssc.12448

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12448
    Download Restriction: no



    Citations



    Cited by:

    1. Timothée Fabre & Vincent Ragel, 2023. "Interpretable ML for High-Frequency Execution," Papers 2307.04863, arXiv.org, revised Sep 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xin Huang & Gengsheng Qin & Yixin Fang, 2011. "Optimal Combinations of Diagnostic Tests Based on AUC," Biometrics, The International Biometric Society, vol. 67(2), pages 568-576, June.
    2. Yuanjia Wang & Huaihou Chen & Runze Li & Naihua Duan & Roberto Lewis-Fernández, 2011. "Prediction-Based Structured Variable Selection through the Receiver Operating Characteristic Curves," Biometrics, The International Biometric Society, vol. 67(3), pages 896-905, September.
    3. Chen, Xiwei & Vexler, Albert & Markatou, Marianthi, 2015. "Empirical likelihood ratio confidence interval estimation of best linear combinations of biomarkers," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 186-198.
    4. Osamu Komori, 2011. "A boosting method for maximization of the area under the ROC curve," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(5), pages 961-979, October.
    5. Chiang, Chin-Tsang & Chiu, Chih-Heng, 2012. "Nonparametric and semiparametric optimal transformations of markers," Journal of Multivariate Analysis, Elsevier, vol. 103(1), pages 124-141, January.
    6. Yanqing Wang & Ying‐Qi Zhao & Yingye Zheng, 2020. "Learning‐based biomarker‐assisted rules for optimized clinical benefit under a risk constraint," Biometrics, The International Biometric Society, vol. 76(3), pages 853-862, September.
    7. Shen, Pao-sheng, 2010. "Semiparametric estimation of survival function when data are subject to dependent censoring and left truncation," Statistics & Probability Letters, Elsevier, vol. 80(3-4), pages 161-168, February.
    8. Rocío Aznar-Gimeno & Luis M. Esteban & Gerardo Sanz & Rafael del-Hoyo-Alonso & Ricardo Savirón-Cornudella, 2021. "Incorporating a New Summary Statistic into the Min–Max Approach: A Min–Max–Median, Min–Max–IQR Combination of Biomarkers for Maximising the Youden Index," Mathematics, MDPI, vol. 9(19), pages 1-17, October.
    9. Cuihong Zhang & Jing Ning & Steven H. Belle & Robert H. Squires & Jianwen Cai & Ruosha Li, 2022. "Assessing predictive discrimination performance of biomarkers in the presence of treatment‐induced dependent censoring," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1137-1157, November.
    10. Schmid Matthias & Hothorn Torsten & Krause Friedemann & Rabe Christina, 2012. "A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-26, October.
    11. Zhang Zhiwei & Ma Shujie & Nie Lei & Soon Guoxing, 2017. "A Quantitative Concordance Measure for Comparing and Combining Treatment Selection Markers," The International Journal of Biostatistics, De Gruyter, vol. 13(1), pages 1-24, May.
    12. Nazmul Islam & Natalie E. Sheils & Megan S. Jarvis & Kenneth Cohen, 2022. "Comparative effectiveness over time of the mRNA-1273 (Moderna) vaccine and the BNT162b2 (Pfizer-BioNTech) vaccine," Nature Communications, Nature, vol. 13(1), pages 1-7, December.
    13. Priyam Das, 2023. "Black-box optimization on hyper-rectangle using Recursive Modified Pattern Search and application to ROC-based Classification Problem," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(2), pages 365-404, November.
    14. Deininger, Klaus W. & Liu, Yanyan, 2008. "Economic and Social Impacts of Self-Help Groups in India," 2008 Annual Meeting, July 27-29, 2008, Orlando, Florida 6482, American Agricultural Economics Association (New Name 2008: Agricultural and Applied Economics Association).
    15. Douglas E. Schaubel & Guanghui Wei, 2011. "Double Inverse-Weighted Estimation of Cumulative Treatment Effects Under Nonproportional Hazards and Dependent Censoring," Biometrics, The International Biometric Society, vol. 67(1), pages 29-38, March.
    16. Zhiping Qiu & Jing Qin & Yong Zhou, 2016. "Composite Estimating Equation Method for the Accelerated Failure Time Model with Length-biased Sampling Data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(2), pages 396-415, June.
    17. Salvatore Tedesco & Martina Andrulli & Markus Åkerlund Larsson & Daniel Kelly & Antti Alamäki & Suzanne Timmons & John Barton & Joan Condell & Brendan O’Flynn & Anna Nordström, 2021. "Comparison of Machine Learning Techniques for Mortality Prediction in a Prospective Cohort of Older Adults," IJERPH, MDPI, vol. 18(23), pages 1-18, December.
    18. Richard K. Crump & V. Joseph Hotz & Guido W. Imbens & Oscar A. Mitnik, 2006. "Moving the Goalposts: Addressing Limited Overlap in the Estimation of Average Treatment Effects by Changing the Estimand," NBER Technical Working Papers 0330, National Bureau of Economic Research, Inc.
    19. Greg DiRienzo, 2004. "Nonparametric Comparison of Two Survival-Time Distributions in the Presence of Dependent Censoring," Harvard University Biostatistics Working Paper Series 1000, Berkeley Electronic Press.
    20. Włoch, Renata & Śledziewska, Katarzyna & Rożynek, Satia, 2025. "Who's afraid of automation? Examining determinants of fear of automation in six European countries," Technology in Society, Elsevier, vol. 81(C).
