
Risk prediction with imperfect survival outcome information from electronic health records

Authors
  • Jue Hou
  • Stephanie F. Chan
  • Xuan Wang
  • Tianxi Cai

Abstract

Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses are based on poor proxies. Due to the lack of detailed documentation and the labor intensiveness of manual annotation, it is often feasible to ascertain, for only a small subset, the current status of the disease at a follow‐up time rather than the exact onset time. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for the proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from proxies and limited labels. Starting from an initial estimator based solely on the labeled subset, we perform a one‐step correction with the full data, augmenting against a mean‐zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from the Mass General Brigham Healthcare Biobank.
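
The abstract does not spell out the rank correlation score built from the proxies. As a rough illustration only, the sketch below computes the classical maximum rank correlation objective of Han (1987): the share of ordered pairs in which the proxy time and a linear index of the covariates are ordered the same way. The function name `rank_correlation`, the arrays `Z` and `T_proxy`, and the sign convention (a larger index goes with a larger proxy time) are assumptions made for this example. This is not the authors' semisupervised estimator, which additionally uses current status labels and a one‐step correction.

```python
import numpy as np

def rank_correlation(beta, Z, T_proxy):
    """Han-type rank correlation between proxy times and the index Z @ beta.

    Returns (1 / (n (n - 1))) * sum over i != j of
    1(T_proxy[i] > T_proxy[j]) * 1(Z[i] @ beta > Z[j] @ beta).
    """
    Z = np.asarray(Z, dtype=float)
    T_proxy = np.asarray(T_proxy, dtype=float)
    index = Z @ np.asarray(beta, dtype=float)
    n = len(T_proxy)
    # Pairwise comparisons via broadcasting: entry (i, j) compares i with j.
    t_greater = T_proxy[:, None] > T_proxy[None, :]
    index_greater = index[:, None] > index[None, :]
    # The diagonal never counts because both comparisons are strict.
    return (t_greater & index_greater).sum() / (n * (n - 1))

# Toy data (hypothetical): one covariate, proxy times increasing with it.
Z_toy = np.array([[1.0], [2.0], [3.0]])
T_toy = np.array([1.0, 2.0, 3.0])
score_aligned = rank_correlation([1.0], Z_toy, T_toy)    # index agrees with T
score_reversed = rank_correlation([-1.0], Z_toy, T_toy)  # index reversed
```

In this toy case the aligned direction attains the largest possible value of the objective and the reversed direction attains the smallest, which is the property a rank-based estimator exploits when searching over `beta`. Because the objective depends on `beta` only through orderings, it identifies the direction of `beta` but not its scale.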

Suggested Citation

  • Jue Hou & Stephanie F. Chan & Xuan Wang & Tianxi Cai, 2023. "Risk prediction with imperfect survival outcome information from electronic health records," Biometrics, The International Biometric Society, vol. 79(1), pages 190-202, March.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:1:p:190-202
    DOI: 10.1111/biom.13599

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13599
    Download Restriction: no


    References listed on IDEAS

    1. Donglin Zeng & Lu Mao & D. Y. Lin, 2016. "Maximum likelihood estimation for semiparametric transformation models with interval-censored data," Biometrika, Biometrika Trust, vol. 103(2), pages 253-271.
    2. Chen, Ling & Sun, Jianguo, 2010. "A multiple imputation approach to the analysis of interval-censored failure time data with the additive hazards model," Computational Statistics & Data Analysis, Elsevier, vol. 54(4), pages 1109-1116, April.
    3. Laber, Eric B. & Murphy, Susan A., 2011. "Adaptive Confidence Intervals for the Test Error in Classification," Journal of the American Statistical Association, American Statistical Association, vol. 106(495), pages 904-913.
    4. Lu Tian & Tianxi Cai, 2006. "On the accelerated failure time model for current status and interval censored data," Biometrika, Biometrika Trust, vol. 93(2), pages 329-342, June.
    5. Han, Aaron K., 1987. "Non-parametric analysis of a generalized regression model : The maximum rank correlation estimator," Journal of Econometrics, Elsevier, vol. 35(2-3), pages 303-316, July.
    6. Sherman, Robert P, 1993. "The Limiting Distribution of the Maximum Rank Correlation Estimator," Econometrica, Econometric Society, vol. 61(1), pages 123-137, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Khan, Shakeeb & Tamer, Elie, 2007. "Partial rank estimation of duration models with general forms of censoring," Journal of Econometrics, Elsevier, vol. 136(1), pages 251-280, January.
    2. Jochmans, Koen, 2015. "Multiplicative-error models with sample selection," Journal of Econometrics, Elsevier, vol. 184(2), pages 315-327.
    3. Ming-Yueh Huang & Chin-Tsang Chiang, 2017. "Estimation and Inference Procedures for Semiparametric Distribution Models with Varying Linear-Index," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(2), pages 396-424, June.
    4. repec:spo:wpmain:info:hdl:2441/dambferfb7dfprc9m01h6f4h2 is not listed on IDEAS
    5. Koen Jochmans, 2011. "Identification in Bivariate binary-choice Models with elliptical innovations," Working Papers hal-01069483, HAL.
    6. Yiwei Fan & Jiaqi Gu & Guosheng Yin, 2023. "Sparse concordance‐based ordinal classification," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(3), pages 934-961, September.
    7. Margaret Sullivan Pepe & Tianxi Cai & Gary Longton, 2006. "Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve," Biometrics, The International Biometric Society, vol. 62(1), pages 221-229, March.
    8. repec:spo:wpmain:info:hdl:2441/3vl5fe4i569nbr005tctlc8ll5 is not listed on IDEAS
    9. Aradillas-Lopez, Andres, 2024. "Inference in models with partially identified control functions," Journal of Econometrics, Elsevier, vol. 238(1).
    10. Khan, Shakeeb, 2001. "Two-stage rank estimation of quantile index models," Journal of Econometrics, Elsevier, vol. 100(2), pages 319-355, February.
    11. Feng, Long & Zou, Changliang & Wang, Zhaojun, 2012. "Rank-based inference for the single-index model," Statistics & Probability Letters, Elsevier, vol. 82(3), pages 535-541.
    12. Bijwaard Govert E. & Ridder Geert & Woutersen Tiemen, 2013. "A Simple GMM Estimator for the Semiparametric Mixed Proportional Hazard Model," Journal of Econometric Methods, De Gruyter, vol. 2(1), pages 1-23, July.
    13. Christoph Breunig & Stephan Martin, 2020. "Nonclassical Measurement Error in the Outcome Variable," Papers 2009.12665, arXiv.org, revised May 2021.
    14. Coppejans, Mark, 2001. "Estimation of the binary response model using a mixture of distributions estimator (MOD)," Journal of Econometrics, Elsevier, vol. 102(2), pages 231-269, June.
    15. Hausman, Jerry A. & Woutersen, Tiemen, 2014. "Estimating a semi-parametric duration model without specifying heterogeneity," Journal of Econometrics, Elsevier, vol. 178(P1), pages 114-131.
    16. Liu, Tianqing & Yuan, Xiaohui & Sun, Jianguo, 2021. "Weighted rank estimation for nonparametric transformation models with nonignorable missing data," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    17. Bhattacharjee, Arnab, 2009. "Testing for Proportional Hazards with Unrestricted Univariate Unobserved Heterogeneity," SIRE Discussion Papers 2009-22, Scottish Institute for Research in Economics (SIRE).
    18. Zhang, Haixiang & Huang, Jian & Sun, Liuquan, 2020. "A rank-based approach to estimating monotone individualized two treatment regimes," Computational Statistics & Data Analysis, Elsevier, vol. 151(C).
    19. Caiyun Fan & Wenbin Lu & Rui Song & Yong Zhou, 2017. "Concordance-assisted learning for estimating optimal individualized treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1565-1582, November.
    20. Gorgens, Tue & Horowitz, Joel L., 1999. "Semiparametric estimation of a censored regression model with an unknown transformation of the dependent variable," Journal of Econometrics, Elsevier, vol. 90(2), pages 155-191, June.
    21. Xiaohong Chen & Elie Tamer & Qingsong Yao, 2026. "Online Learning in Semiparametric Econometric Models," Papers 2603.08614, arXiv.org.
    22. Subbotin, Viktor, 2007. "Asymptotic and bootstrap properties of rank regressions," MPRA Paper 9030, University Library of Munich, Germany, revised 20 Mar 2008.
