
Robust risk prediction with biomarkers under two‐phase stratified cohort design

Authors
  • Rebecca Payne
  • Ming Yang
  • Yingye Zheng
  • Majken K. Jensen
  • Tianxi Cai

Abstract

Identification of novel biomarkers for risk prediction is important for disease prevention and optimal treatment selection. However, studies aiming to discover which biomarkers are useful for risk prediction often require the use of stored biological samples from large assembled cohorts, and thus the depletion of a finite and precious resource. To make efficient use of such stored samples, two‐phase sampling designs are often adopted as resource‐efficient sampling strategies, especially when the outcome of interest is rare. Existing methods for analyzing data from two‐phase studies focus primarily on single marker analysis or on fitting the Cox regression model to combine information from multiple markers. However, the Cox model may not fit the data well. Under model misspecification, the composite score derived from the Cox model may not perform well in predicting the outcome. Under a general two‐phase stratified cohort sampling design, we present a novel approach to combining multiple markers to optimize prediction by fitting a flexible nonparametric transformation model. Using inverse probability weighting to account for the outcome‐dependent sampling, we propose to estimate the model parameters by maximizing an objective function which can be interpreted as a weighted C‐statistic for survival outcomes. Regardless of model adequacy, the proposed procedure yields a sensible composite risk score for prediction. A major obstacle to making inference under two‐phase studies is the correlation induced by the finite population sampling, which prevents standard inference procedures such as the bootstrap from being used for variance estimation. We propose a resampling procedure to derive valid confidence intervals for the model parameters and the C‐statistic accuracy measure. We illustrate the new methods with simulation studies and an analysis of a two‐phase study of high‐density lipoprotein cholesterol (HDL‐C) subtypes for predicting the risk of coronary heart disease.
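
As a rough illustration of what a weighted C‐statistic under inverse probability weighting can look like, the sketch below computes a pairwise concordance measure in which each comparable pair is weighted by the product of the two subjects' sampling weights. It is a simplified, hypothetical example and not the estimator from the paper: the function names `ipw_weights` and `weighted_c_statistic` are invented here, the weights are assumed to be the stratum‐level ratio of phase‐one to phase‐two sample sizes, and no adjustment for censoring beyond the usual comparable‐pair rule is attempted.

```python
from typing import Dict, Hashable, List, Sequence


def ipw_weights(strata: Sequence[Hashable],
                n_phase1: Dict[Hashable, int],
                n_phase2: Dict[Hashable, int]) -> List[float]:
    """Assumed weighting scheme: each phase-two subject in stratum s gets
    weight N_s / n_s (cohort size over number sampled in that stratum)."""
    return [n_phase1[s] / n_phase2[s] for s in strata]


def weighted_c_statistic(time: Sequence[float],
                         event: Sequence[int],
                         score: Sequence[float],
                         weight: Sequence[float]) -> float:
    """Weighted concordance between risk scores and survival times.

    A pair (i, j) is comparable when subject i has an observed event and
    time[i] < time[j]. It is concordant when the earlier failure has the
    higher score; tied scores count one half. Each pair contributes
    weight[i] * weight[j] to both numerator and denominator.
    """
    num = 0.0
    den = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects cannot be the earlier failure
        for j in range(n):
            if time[i] < time[j]:
                w = weight[i] * weight[j]
                den += w
                if score[i] > score[j]:
                    num += w
                elif score[i] == score[j]:
                    num += 0.5 * w
    if den == 0.0:
        raise ValueError("no comparable pairs")
    return num / den


# Toy data: three phase-two subjects from two hypothetical strata.
weights = ipw_weights(["case", "control", "control"],
                      n_phase1={"case": 10, "control": 20},
                      n_phase2={"case": 5, "control": 20})
c_stat = weighted_c_statistic(time=[1.0, 2.0, 3.0],
                              event=[1, 1, 0],
                              score=[1.0, 3.0, 2.0],
                              weight=weights)
```

In the approach described in the abstract, a quantity of this kind serves as the objective to be maximized over the parameters of the composite score; the sketch only evaluates it for a fixed score.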

Suggested Citation

  • Rebecca Payne & Ming Yang & Yingye Zheng & Majken K. Jensen & Tianxi Cai, 2016. "Robust risk prediction with biomarkers under two‐phase stratified cohort design," Biometrics, The International Biometric Society, vol. 72(4), pages 1037-1045, December.
  • Handle: RePEc:bla:biomet:v:72:y:2016:i:4:p:1037-1045
    DOI: 10.1111/biom.12515

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ying Yan & Haibo Zhou & Jianwen Cai, 2017. "Improving efficiency of parameter estimation in case-cohort studies with multivariate failure time data," Biometrics, The International Biometric Society, vol. 73(3), pages 1042-1052, September.
    2. Qingning Zhou & Jianwen Cai & Haibo Zhou, 2018. "Outcome‐dependent sampling with interval‐censored failure time data," Biometrics, The International Biometric Society, vol. 74(1), pages 58-67, March.
    3. Soave, David & Lawless, Jerald F., 2023. "Regularized regression for two phase failure time studies," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    4. Jon Arni Steingrimsson & Robert L. Strawderman, 2017. "Estimation in the Semiparametric Accelerated Failure Time Model With Missing Covariates: Improving Efficiency Through Augmentation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1221-1235, July.
    5. Yayun Xu & Soyoung Kim & Mei-Jie Zhang & David Couper & Kwang Woo Ahn, 2022. "Competing risks regression models with covariates-adjusted censoring weight under the generalized case-cohort design," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(2), pages 241-262, April.
    6. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Conditional screening for ultrahigh-dimensional survival data in case-cohort studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 632-661, October.
    7. Sangwook Kang & Jianwen Cai, 2009. "Marginal Hazards Regression for Retrospective Studies within Cohort with Possibly Correlated Failure Time Data," Biometrics, The International Biometric Society, vol. 65(2), pages 405-414, June.
    8. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Feature screening for case‐cohort studies with failure time outcome," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 349-370, March.
    9. Jieli Ding & Tsui-Shan Lu & Jianwen Cai & Haibo Zhou, 2017. "Recent progresses in outcome-dependent sampling with failure time data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(1), pages 57-82, January.
    10. Olli Saarela & James A. Hanley, 2015. "Case-base methods for studying vaccination safety," Biometrics, The International Biometric Society, vol. 71(1), pages 42-52, March.
    11. Stephanie Chan & Xuan Wang & Ina Jazić & Sarah Peskoe & Yingye Zheng & Tianxi Cai, 2021. "Developing and evaluating risk prediction models with panel current status data," Biometrics, The International Biometric Society, vol. 77(2), pages 599-609, June.
    12. Ismaël Mourifié & Marc Henry & Romuald Méango, 2020. "Sharp Bounds and Testability of a Roy Model of STEM Major Choices," Journal of Political Economy, University of Chicago Press, vol. 128(8), pages 3220-3283.
    13. Patrice Bertail & Emilie Chautru & Stephan Clémençon, 2017. "Empirical Processes in Survey Sampling with (Conditional) Poisson Designs," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(1), pages 97-111, March.
    14. Yu Zheng & Tianxi Cai, 2017. "Augmented estimation for t‐year survival with censored regression models," Biometrics, The International Biometric Society, vol. 73(4), pages 1169-1178, December.
    15. Yanyuan Ma & Yuanjia Wang, 2014. "Estimating disease onset distribution functions in mutation carriers with censored mixture data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 63(1), pages 1-23, January.
    16. Sokbae Lee & Myung Hwan Seo & Youngki Shin, 2017. "Correction," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 883-883, April.
    17. Paul Frédéric Blanche & Anders Holt & Thomas Scheike, 2023. "On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 441-482, April.
    18. Youngki Shin & Zvezdomir Todorov, 2021. "Exact computation of maximum rank correlation estimator," The Econometrics Journal, Royal Economic Society, vol. 24(3), pages 589-607.
    19. Subbotin, Viktor, 2007. "Asymptotic and bootstrap properties of rank regressions," MPRA Paper 9030, University Library of Munich, Germany, revised 20 Mar 2008.
    20. Soyoung Kim & Donglin Zeng & Jianwen Cai, 2018. "Analysis of multiple survival events in generalized case‐cohort designs," Biometrics, The International Biometric Society, vol. 74(4), pages 1250-1260, December.
