Printed from https://ideas.repec.org/a/bla/biomet/v78y2022i3p852-866.html

Multivariate survival analysis in big data: A divide‐and‐combine approach

Author

Listed:
  • Wei Wang
  • Shou‐En Lu
  • Jerry Q. Cheng
  • Minge Xie
  • John B. Kostis

Abstract

Multivariate failure time data are frequently analyzed using marginal proportional hazards models and frailty models. When the sample size is extraordinarily large, either approach can face computational challenges. In this paper, we focus on the marginal model approach and propose a divide‐and‐combine method to analyze large‐scale multivariate failure time data. Our method is motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to nonfederal hospitals and emergency rooms (ERs) from 1995 to 2017. We propose to randomly divide the full data into multiple subsets and to combine the estimators obtained from the individual subsets by a weighted method using three weights. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the estimator that would be obtained by analyzing the full data all at once. In addition, to screen out risk factors with weak signals, we propose to perform regularized estimation on the combined estimator using its combined confidence distribution. Theoretical properties, such as consistency, oracle properties, and asymptotic equivalence between the divide‐and‐combine approach and the full data approach, are studied. Performance of the proposed method is investigated using simulation studies. Our method is applied to the MIDAS data to identify risk factors related to multivariate cardiovascular‐related health outcomes.
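
The divide‐and‐combine idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' estimator: instead of a marginal proportional hazards model it assumes a single constant hazard (exponential model) with right censoring, and it uses plain inverse‐variance weights, which may or may not correspond to any of the three weights mentioned above. All function names and simulation settings are hypothetical.

```python
import random


def fit_exponential_hazard(times, events):
    """Estimate a constant hazard from right-censored data.

    Returns (estimate, variance): events / total follow-up time, with the
    usual large-sample variance estimate rate**2 / events.
    """
    n_events = sum(events)
    if n_events == 0:
        raise ValueError("subset contains no observed events")
    rate = n_events / sum(times)
    return rate, rate ** 2 / n_events


def divide_and_combine(times, events, n_subsets, seed=0):
    """Randomly split the data, fit each subset, combine by inverse variance."""
    rng = random.Random(seed)
    indices = list(range(len(times)))
    rng.shuffle(indices)  # random division of the full data
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(n_subsets):
        subset = indices[k::n_subsets]
        estimate, variance = fit_exponential_hazard(
            [times[i] for i in subset], [events[i] for i in subset]
        )
        weight = 1.0 / variance  # assumption: inverse-variance weight
        weighted_sum += weight * estimate
        weight_total += weight
    # Combined estimate and its approximate variance.
    return weighted_sum / weight_total, 1.0 / weight_total


def simulate(n, hazard, censor_time, seed=1):
    """Simulate exponential failure times censored at a fixed time."""
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        t = rng.expovariate(hazard)
        times.append(min(t, censor_time))
        events.append(1 if t <= censor_time else 0)
    return times, events


times, events = simulate(n=20000, hazard=0.5, censor_time=3.0)
full_est, full_var = fit_exponential_hazard(times, events)
dc_est, dc_var = divide_and_combine(times, events, n_subsets=10)
print(f"full data:          {full_est:.4f}")
print(f"divide-and-combine: {dc_est:.4f}")
```

In this toy setting the combined estimate should land close to the full‐data estimate, which mirrors, in a much simpler model, the asymptotic equivalence the abstract claims for the proposed method.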

Suggested Citation

  • Wei Wang & Shou‐En Lu & Jerry Q. Cheng & Minge Xie & John B. Kostis, 2022. "Multivariate survival analysis in big data: A divide‐and‐combine approach," Biometrics, The International Biometric Society, vol. 78(3), pages 852-866, September.
  • Handle: RePEc:bla:biomet:v:78:y:2022:i:3:p:852-866
    DOI: 10.1111/biom.13469

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13469
    Download Restriction: no


    References listed on IDEAS

    1. Lu Tian & Rui Wang & Tianxi Cai & Lee-Jen Wei, 2011. "The Highest Confidence Density Region and Its Usage for Joint Inferences about Constrained Parameters," Biometrics, The International Biometric Society, vol. 67(2), pages 604-610, June.
    2. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    3. Soyoung Kim & Donglin Zeng & Jianwen Cai, 2018. "Analysis of multiple survival events in generalized case‐cohort designs," Biometrics, The International Biometric Society, vol. 74(4), pages 1250-1260, December.
    4. Dungang Liu & Regina Y. Liu & Minge Xie, 2015. "Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 326-340, March.
    5. Tang, Lu & Zhou, Ling & Song, Peter X.-K., 2020. "Distributed simultaneous inference in generalized linear models via confidence distribution," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    6. Hao Helen Zhang & Wenbin Lu, 2007. "Adaptive Lasso for Cox's proportional hazards model," Biometrika, Biometrika Trust, vol. 94(3), pages 691-703.
    7. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    8. Kauermann G. & Carroll R.J., 2001. "A Note on the Efficiency of Sandwich Covariance Matrix Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1387-1396, December.
    9. Min-ge Xie & Kesar Singh, 2013. "Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review," International Statistical Review, International Statistical Institute, vol. 81(1), pages 3-39, April.
    10. Georg Heinze & Michael Schemper, 2001. "A Solution to the Problem of Monotone Likelihood in Cox Regression," Biometrics, The International Biometric Society, vol. 57(1), pages 114-119, March.
    11. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    12. Jianwen Cai & Jianqing Fan & Runze Li & Haibo Zhou, 2005. "Variable selection for multivariate failure time data," Biometrika, Biometrika Trust, vol. 92(2), pages 303-316, June.
    13. Wang, Hansheng & Leng, Chenlei, 2007. "Unified LASSO Estimation by Least Squares Approximation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1039-1048, September.
    14. Michael I. Jordan & Jason D. Lee & Yun Yang, 2019. "Communication-Efficient Distributed Statistical Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 668-681, April.
    15. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    16. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.
    17. D. Y. Lin & D. Zeng, 2010. "On the relative efficiency of using summary statistics versus individual-level data in meta-analysis," Biometrika, Biometrika Trust, vol. 97(2), pages 321-332.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tang, Lu & Zhou, Ling & Song, Peter X.-K., 2020. "Distributed simultaneous inference in generalized linear models via confidence distribution," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    2. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    3. Zhixuan Fu & Shuangge Ma & Haiqun Lin & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized Variable Selection for Multi-center Competing Risks Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(2), pages 379-405, December.
    4. Xingwei Tong & Xin He & Liuquan Sun & Jianguo Sun, 2009. "Variable Selection for Panel Count Data via Non‐Concave Penalized Estimating Function," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(4), pages 620-635, December.
    5. Wei Qian & Yuhong Yang, 2013. "Model selection via standard error adjusted adaptive lasso," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 65(2), pages 295-318, April.
    6. Hao, Meiling & Lin, Yunyuan & Zhao, Xingqiu, 2016. "A relative error-based approach for variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 250-262.
    7. Yanxin Wang & Qibin Fan & Li Zhu, 2018. "Variable selection and estimation using a continuous approximation to the $L_0$ penalty," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(1), pages 191-214, February.
    8. Li, Jianbo & Gu, Minggao, 2012. "Adaptive LASSO for general transformation models with right censored data," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2583-2597.
    9. Zhixuan Fu & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized variable selection in competing risks regression," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(3), pages 353-376, July.
    10. Pötscher, Benedikt M., 2007. "Confidence Sets Based on Sparse Estimators Are Necessarily Large," MPRA Paper 5677, University Library of Munich, Germany.
    11. Tang, Linjun & Zhou, Zhangong & Wu, Changchun, 2012. "Weighted composite quantile estimation and variable selection method for censored regression model," Statistics & Probability Letters, Elsevier, vol. 82(3), pages 653-663.
    12. Li, Xinjue & Zboňáková, Lenka & Wang, Weining & Härdle, Wolfgang Karl, 2019. "Combining Penalization and Adaption in High Dimension with Application in Bond Risk Premia Forecasting," IRTG 1792 Discussion Papers 2019-030, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    13. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    14. Guang Cheng & Hao Zhang & Zuofeng Shang, 2015. "Sparse and efficient estimation for partial spline models with increasing dimension," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(1), pages 93-127, February.
    15. Ramon I. Garcia & Joseph G. Ibrahim & Hongtu Zhu, 2010. "Variable Selection in the Cox Regression Model with Covariates Missing at Random," Biometrics, The International Biometric Society, vol. 66(1), pages 97-104, March.
    16. Xin Cheng & Wenbin Lu & Mengling Liu, 2015. "Identification of homogeneous and heterogeneous variables in pooled cohort studies," Biometrics, The International Biometric Society, vol. 71(2), pages 397-403, June.
    17. Na You & Shun He & Xueqin Wang & Junxian Zhu & Heping Zhang, 2018. "Subtype classification and heterogeneous prognosis model construction in precision medicine," Biometrics, The International Biometric Society, vol. 74(3), pages 814-822, September.
    18. Jin, Fei & Lee, Lung-fei, 2018. "Irregular N2SLS and LASSO estimation of the matrix exponential spatial specification model," Journal of Econometrics, Elsevier, vol. 206(2), pages 336-358.
    19. Ping Zeng & Yongyue Wei & Yang Zhao & Jin Liu & Liya Liu & Ruyang Zhang & Jianwei Gou & Shuiping Huang & Feng Chen, 2014. "Variable selection approach for zero-inflated count data via adaptive lasso," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(4), pages 879-894, April.
    20. Michael R. Wierzbicki & Li-Bing Guo & Qing-Tao Du & Wensheng Guo, 2014. "Sparse Semiparametric Nonlinear Model With Application to Chromatographic Fingerprints," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1339-1349, December.
