Printed from https://ideas.repec.org/a/bla/biomet/v77y2021i3p1024-1036.html

Combining primary cohort data with external aggregate information without assuming comparability

Authors
  • Ziqi Chen
  • Jing Ning
  • Yu Shen
  • Jing Qin

Abstract

In comparative effectiveness research (CER) for rare types of cancer, it is appealing to combine primary cohort data containing detailed tumor profiles with aggregate information derived from cancer registry databases. Such integration of data may improve statistical efficiency in CER. A major challenge in combining information from different sources, however, is that the aggregate information from the cancer registry databases could be incomparable with the primary cohort data, which are often collected from a single cancer center or a clinical trial. We develop an adaptive estimation procedure, which uses the combined information to determine the degree of information borrowing from the aggregate data of the external resource. We establish the asymptotic properties of the estimators and evaluate their finite-sample performance via simulation studies. The proposed method yields a substantial gain in statistical efficiency over the conventional method using the primary cohort only, and avoids undesirable biases when the given external information is incomparable to the primary cohort. We apply the proposed method to evaluate the long‐term effect of trimodality treatment for inflammatory breast cancer (IBC) by tumor subtype, combining the IBC patient cohort at The University of Texas MD Anderson Cancer Center with the external aggregate information from the National Cancer Data Base.
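The abstract does not give the form of the estimator, so the following is only a generic illustration of what "adaptively determining the degree of borrowing" can mean; it is not the procedure developed in the paper. It assumes, hypothetically, that the primary cohort yields an estimate beta_hat of the parameter of interest and an estimate theta_hat of a quantity the registry also reports as theta_ext, with known variance and covariance. The discrepancy d = theta_hat - theta_ext is shrunk toward zero with an adaptive-lasso-type soft threshold, and the resulting bias estimate tau_hat controls how much of a control-variate adjustment is applied. All function and parameter names are invented for this sketch.

```python
import math


def soft_threshold(x: float, t: float) -> float:
    """Return sign(x) * max(|x| - t, 0), the scalar lasso solution."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def adaptive_borrow(beta_hat: float, theta_hat: float, theta_ext: float,
                    var_theta: float, cov_beta_theta: float,
                    lam: float, gamma: float = 1.0) -> tuple[float, float]:
    """Hypothetical adaptive-borrowing estimator (illustration only).

    beta_hat       internal estimate of the parameter of interest
    theta_hat      internal estimate of a quantity the external source reports
    theta_ext      external aggregate value, treated as known without error
    var_theta      variance of theta_hat
    cov_beta_theta covariance of beta_hat and theta_hat
    lam, gamma     adaptive-lasso tuning constants
    Returns (combined estimate, estimated bias tau_hat).
    """
    d = theta_hat - theta_ext  # observed internal-vs-external discrepancy
    if d == 0.0:
        tau_hat = 0.0  # no discrepancy, nothing to attribute to bias
    else:
        # Minimising (d - tau)^2 / var_theta + lam * |tau| / |d|**gamma
        # gives a soft threshold at lam * var_theta / (2 * |d|**gamma).
        threshold = lam * var_theta / (2.0 * abs(d) ** gamma)
        tau_hat = soft_threshold(d, threshold)
    # Control-variate adjustment using only the part of d not attributed
    # to bias: tau_hat == 0 -> full borrowing, tau_hat == d -> no borrowing.
    beta_c = beta_hat - (cov_beta_theta / var_theta) * (d - tau_hat)
    return beta_c, tau_hat


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(adaptive_borrow(0.50, 0.62, 0.60, 0.0004, 0.0002, lam=8.0))  # comparable
    print(adaptive_borrow(0.50, 0.62, 0.40, 0.0004, 0.0002, lam=8.0))  # incomparable
```

In the first call the discrepancy is small relative to its standard error, so tau_hat is set exactly to zero and the full adjustment is applied; in the second, tau_hat absorbs almost all of d and the combined estimate stays close to beta_hat. The value lam=8.0 is arbitrary; in practice a data-driven choice would be needed, and the reference list below includes work on BIC-type tuning parameter selectors.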

Suggested Citation

  • Ziqi Chen & Jing Ning & Yu Shen & Jing Qin, 2021. "Combining primary cohort data with external aggregate information without assuming comparability," Biometrics, The International Biometric Society, vol. 77(3), pages 1024-1036, September.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:3:p:1024-1036
    DOI: 10.1111/biom.13356

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13356
    Download Restriction: no

    File URL: https://libkey.io/10.1111/biom.13356?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    2. Chiung-Yu Huang & Jing Qin & Huei-Ting Tsai, 2016. "Efficient Estimation of the Cox Model with Auxiliary Subgroup Survival Information," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 787-799, April.
    3. Guido W. Imbens & Tony Lancaster, 1994. "Combining Micro and Macro Data in Microeconometric Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 61(4), pages 655-680.
    4. Nilanjan Chatterjee & Yi-Hau Chen & Paige Maas & Raymond J. Carroll, 2016. "Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-Level Information From External Big Data Sources," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 107-117, March.
    5. Jing Qin & Han Zhang & Pengfei Li & Demetrius Albanes & Kai Yu, 2015. "Using covariate-specific disease prevalence information to increase the power of case-control studies," Biometrika, Biometrika Trust, vol. 102(1), pages 169-180.
    6. Ziqi Chen & Man-Lai Tang & Wei Gao & Ning-Zhong Shi, 2014. "New Robust Variable Selection Methods for Linear Regression Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(3), pages 725-741, September.
    7. Jianqing Fan & Runze Li, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    8. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Tian Gu & Jeremy Michael George Taylor & Bhramar Mukherjee, 2023. "A synthetic data integration framework to leverage external summary‐level information from heterogeneous populations," Biometrics, The International Biometric Society, vol. 79(4), pages 3831-3845, December.
    2. Yu‐Jen Cheng & Yen‐Chun Liu & Chang‐Yu Tsai & Chiung‐Yu Huang, 2023. "Semiparametric estimation of the transformation model by leveraging external aggregate data in the presence of population heterogeneity," Biometrics, The International Biometric Society, vol. 79(3), pages 1996-2009, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fei Gao & K. C. G. Chan, 2023. "Noniterative adjustment to regression estimators with population‐based auxiliary information for semiparametric models," Biometrics, The International Biometric Society, vol. 79(1), pages 140-150, March.
    2. Ying Sheng & Yifei Sun & Chiung‐Yu Huang & Mi‐Ok Kim, 2022. "Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach," Biometrics, The International Biometric Society, vol. 78(2), pages 679-690, June.
    3. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    4. Yunxiao Chen & Xiaoou Li & Jingchen Liu & Zhiliang Ying, 2017. "Regularized Latent Class Analysis with Application in Cognitive Diagnosis," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 660-692, September.
    5. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    6. Guang Cheng & Hao Zhang & Zuofeng Shang, 2015. "Sparse and efficient estimation for partial spline models with increasing dimension," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(1), pages 93-127, February.
    7. Joel L. Horowitz & Lars Nesheim, 2018. "Using penalized likelihood to select parameters in a random coefficients multinomial logit model," CeMMAP working papers CWP29/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Sakyajit Bhattacharya & Paul McNicholas, 2014. "A LASSO-penalized BIC for mixture model selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 45-61, March.
    9. Jie He & Hui Li & Shumei Zhang & Xiaogang Duan, 2019. "Additive hazards model with auxiliary subgroup survival information," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(1), pages 128-149, January.
    10. Dengke Xu & Zhongzhan Zhang & Liucang Wu, 2014. "Variable selection in high-dimensional double generalized linear models," Statistical Papers, Springer, vol. 55(2), pages 327-347, May.
    11. Fei Jin & Lung-fei Lee, 2018. "Irregular N2SLS and LASSO estimation of the matrix exponential spatial specification model," Journal of Econometrics, Elsevier, vol. 206(2), pages 336-358.
    12. Ping Zeng & Yongyue Wei & Yang Zhao & Jin Liu & Liya Liu & Ruyang Zhang & Jianwei Gou & Shuiping Huang & Feng Chen, 2014. "Variable selection approach for zero-inflated count data via adaptive lasso," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(4), pages 879-894, April.
    13. Fei Wang & Lu Wang & Peter X.‐K. Song, 2016. "Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements," Biometrics, The International Biometric Society, vol. 72(4), pages 1184-1193, December.
    14. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," CeMMAP working papers 35/15, Institute for Fiscal Studies.
    15. Eun Ryung Lee & Hohsuk Noh & Byeong U. Park, 2014. "Model Selection via Bayesian Information Criterion for Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 216-229, March.
    16. Joel L. Horowitz, 2015. "Variable selection and estimation in high‐dimensional models," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 48(2), pages 389-407, May.
    17. Kei Hirose & Shohei Tateishi & Sadanori Konishi, 2013. "Tuning parameter selection in sparse regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 59(C), pages 28-40.
    18. Yuta Umezu & Yusuke Shimizu & Hiroki Masuda & Yoshiyuki Ninomiya, 2019. "AIC for the non-concave penalized likelihood method," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(2), pages 247-274, April.
    19. Han Zhang & Lu Deng & William Wheeler & Jing Qin & Kai Yu, 2022. "Integrative analysis of multiple case‐control studies," Biometrics, The International Biometric Society, vol. 78(3), pages 1080-1091, September.
    20. Mozhgan Taavoni & Mohammad Arashi & Samuel Manda, 2023. "Multicollinearity and Linear Predictor Link Function Problems in Regression Modelling of Longitudinal Data," Mathematics, MDPI, vol. 11(3), pages 1-9, January.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.