
Design and analysis of two‐phase studies with multivariate longitudinal data

Authors
  • Chiara Di Gravio
  • Ran Tao
  • Jonathan S. Schildcrout

Abstract

Two‐phase studies are crucial when outcome and covariate data are available in a first‐phase sample (e.g., a cohort study), but costs associated with retrospective ascertainment of a novel exposure limit the size of the second‐phase sample, in whom the exposure is collected. For longitudinal outcomes, one class of two‐phase studies stratifies subjects based on an outcome vector summary (e.g., an average or a slope over time) and oversamples subjects in the extreme value strata while undersampling subjects in the medium‐value stratum. Based on the choice of the summary, two‐phase studies for longitudinal data can increase efficiency of time‐varying and/or time‐fixed exposure parameter estimates. In this manuscript, we extend efficient, two‐phase study designs to multivariate longitudinal continuous outcomes, and we detail two analysis approaches. The first approach is a multiple imputation analysis that combines complete data from subjects selected for phase two with the incomplete data from those not selected. The second approach is a conditional maximum likelihood analysis that is intended for applications where only data from subjects selected for phase two are available. Importantly, we show that both approaches can be applied to secondary analyses of previously conducted two‐phase studies. We examine finite sample operating characteristics of the two approaches and use the Lung Health Study (Connett et al. (1993), Controlled Clinical Trials, 14, 3S–19S) to examine genetic associations with lung function decline over time.
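
The sampling design described in the abstract (stratify phase-one subjects on a summary of their outcome vector, then oversample the extreme strata) can be illustrated with a minimal sketch. This is not the authors' code: the simulated data, the 10th/90th percentile cutoffs, the per-stratum sample sizes, and the function names `subject_summaries` and `two_phase_sample` are all assumptions chosen for illustration, and only a single (univariate) outcome summarized by its slope is shown.

```python
import numpy as np

def subject_summaries(times, outcomes):
    """Per-subject mean and least-squares slope of outcome on time.

    times, outcomes: arrays of shape (n_subjects, n_visits).
    """
    t_centered = times - times.mean(axis=1, keepdims=True)
    y_mean = outcomes.mean(axis=1)
    slope = (t_centered * outcomes).sum(axis=1) / (t_centered ** 2).sum(axis=1)
    return y_mean, slope

def two_phase_sample(summary, cutoffs, n_per_stratum, rng):
    """Stratify on a summary and draw a fixed number of subjects per stratum.

    cutoffs: two increasing values defining low / middle / high strata.
    n_per_stratum: (n_low, n_middle, n_high), hypothetical design choices.
    """
    strata = np.digitize(summary, cutoffs)  # 0 = low, 1 = middle, 2 = high
    selected = np.zeros(summary.shape[0], dtype=bool)
    for s, n_s in enumerate(n_per_stratum):
        members = np.flatnonzero(strata == s)
        take = min(n_s, members.size)
        selected[rng.choice(members, size=take, replace=False)] = True
    return strata, selected

# Simulated phase-one cohort (assumed values, for illustration only).
rng = np.random.default_rng(0)
n, visits = 1000, 5
times = np.tile(np.arange(visits, dtype=float), (n, 1))
true_slopes = rng.normal(-0.5, 0.3, size=n)
outcomes = 10 + true_slopes[:, None] * times + rng.normal(0, 0.5, size=(n, visits))

# Stratify on the estimated slope and oversample the two extreme strata.
_, slope = subject_summaries(times, outcomes)
cutoffs = np.quantile(slope, [0.1, 0.9])
strata, selected = two_phase_sample(slope, cutoffs, (75, 50, 75), rng)

# Sampling fraction within each stratum (low, middle, high).
fractions = np.bincount(strata[selected], minlength=3) / np.bincount(strata, minlength=3)
print(fractions)
```

In this sketch the expensive exposure would be measured only for subjects with `selected == True`. Because selection depends on the outcome, a naive analysis of the selected subjects alone would be biased, which is why the abstract's imputation and conditional likelihood analyses account for the design.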

Suggested Citation

  • Chiara Di Gravio & Ran Tao & Jonathan S. Schildcrout, 2023. "Design and analysis of two‐phase studies with multivariate longitudinal data," Biometrics, The International Biometric Society, vol. 79(2), pages 1420-1432, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:1420-1432
    DOI: 10.1111/biom.13616

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13616
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Brady Ryan & Ananthika Nirmalkanna & Candemir Cigsar & Yildiz E. Yilmaz, 2023. "Evaluation of Designs and Estimation Methods Under Response-Dependent Two-Phase Sampling for Genetic Association Studies," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(2), pages 510-539, July.
    2. Jacob M. Maronge & Ran Tao & Jonathan S. Schildcrout & Paul J. Rathouz, 2023. "Generalized case‐control sampling under generalized linear models," Biometrics, The International Biometric Society, vol. 79(1), pages 332-343, March.
    3. Xiaofei Wang & Haibo Zhou, 2006. "A Semiparametric Empirical Likelihood Method for Biased Sampling Schemes with Auxiliary Covariates," Biometrics, The International Biometric Society, vol. 62(4), pages 1149-1160, December.
    4. Jichang Yu & Haibo Zhou & Jianwen Cai, 2021. "Accelerated failure time model for data from outcome-dependent sampling," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(1), pages 15-37, January.
    5. Xiaofei Wang & Haibo Zhou, 2010. "Design and Inference for Cancer Biomarker Study with an Outcome and Auxiliary-Dependent Subsampling," Biometrics, The International Biometric Society, vol. 66(2), pages 502-511, June.
    6. Qingning Zhou & Jianwen Cai & Haibo Zhou, 2018. "Outcome‐dependent sampling with interval‐censored failure time data," Biometrics, The International Biometric Society, vol. 74(1), pages 58-67, March.
    7. Jonathan S. Schildcrout & Shawn P. Garbett & Patrick J. Heagerty, 2013. "Outcome Vector Dependent Sampling with Longitudinal Continuous Response Data: Stratified Sampling Based on Summary Statistics," Biometrics, The International Biometric Society, vol. 69(2), pages 405-416, June.
    8. Haibo Zhou & Rui Song & Yuanshan Wu & Jing Qin, 2011. "Statistical Inference for a Two-Stage Outcome-Dependent Sampling Design with a Continuous Outcome," Biometrics, The International Biometric Society, vol. 67(1), pages 194-202, March.
    9. Hu, Jianwei & Chai, Hao, 2013. "Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 96-114.
    10. Gustavo Amorim & Ran Tao & Sarah Lotspeich & Pamela A. Shaw & Thomas Lumley & Bryan E. Shepherd, 2021. "Two‐phase sampling designs for data validation in settings with covariate measurement error and continuous outcome," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1368-1389, October.
    11. Haibo Zhou & Guoyou Qin & Matthew P. Longnecker, 2011. "A Partial Linear Model in the Outcome-Dependent Sampling Setting to Evaluate the Effect of Prenatal PCB Exposure on Cognitive Function in Children," Biometrics, The International Biometric Society, vol. 67(3), pages 876-885, September.
    12. Jieli Ding & Tsui-Shan Lu & Jianwen Cai & Haibo Zhou, 2017. "Recent progresses in outcome-dependent sampling with failure time data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(1), pages 57-82, January.
    13. Jonathan S. Schildcrout & Patrick J. Heagerty, 2011. "Outcome-Dependent Sampling from Existing Cohorts with Longitudinal Binary Response Data: Study Planning and Analysis," Biometrics, The International Biometric Society, vol. 67(4), pages 1583-1593, December.
    14. Qingning Zhou & Jianwen Cai & Haibo Zhou, 2020. "Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 26(1), pages 85-108, January.
    15. Lu Chen & Li Hsu & Kathleen Malone, 2009. "A Frailty-Model-Based Approach to Estimating the Age-Dependent Penetrance Function of Candidate Genes Using Population-Based Case-Control Study Designs: An Application to Data on the BRCA1 Gene," Biometrics, The International Biometric Society, vol. 65(4), pages 1105-1114, December.
    16. Benchimol, Jonathan & El-Shagi, Makram & Saadon, Yossi, 2022. "Do expert experience and characteristics affect inflation forecasts?," Journal of Economic Behavior & Organization, Elsevier, vol. 201(C), pages 205-226.
    17. Judith Clarke & Marsha Courchane, 2004. "Implications of Stratified Sampling for Fair Lending Binary Logit Models," The Journal of Real Estate Finance and Economics, Springer, vol. 30(1), pages 5-31, October.
    18. J. F. Lawless, 2018. "Two-phase outcome-dependent studies for failure times and testing for effects of expensive covariates," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 24(1), pages 28-44, January.
    19. Schill, Walter & Enders, Dirk & Drescher, Karsten, 2014. "A SAS Package for Logistic Two-Phase Studies," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 57(i09).
    20. Mukherjee, Bhramar & Liu, Ivy, 2009. "A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes," Journal of Multivariate Analysis, Elsevier, vol. 100(3), pages 459-472, March.
