CEDAR: communication efficient distributed analysis for regressions

Authors

  • Changgee Chang
  • Zhiqi Bu
  • Qi Long

Abstract

Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. In particular, patient‐level data in EHRs often cannot be shared across institutions (data sources) because of government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient‐level data. To tackle such challenges, we propose a novel communication‐efficient method that aggregates the optimal estimates of external sites by turning the problem into a missing data problem. In addition, we propose incorporating posterior samples of remote sites, which can provide partial information on the missing quantities and improve the efficiency of parameter estimates while having the differential privacy property, thus reducing the risk of information leakage. The proposed approach allows for proper statistical inference without sharing the raw patient‐level data. We provide a theoretical investigation of the asymptotic properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses in comparison with several recently developed methods.
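
The distributed setting described in the abstract can be illustrated with a small sketch. This is not the CEDAR procedure: the abstract describes that method only at a high level (a missing-data formulation with posterior samples from remote sites), and its details are not given here. The sketch instead shows a generic summary-statistic baseline, assuming a simple linear regression with one covariate, in which each site shares only a slope estimate and its variance and a coordinator combines them by inverse-variance weighting. All names (`SiteSummary`, `local_fit`, `combine`, `simulate_site`) and the simulated data are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """What a site shares: an estimate and its variance, never patient rows."""
    beta_hat: float
    variance: float


def local_fit(x, y):
    """OLS slope for y = a + b*x at one site, with its estimated variance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance estimate
    return SiteSummary(beta_hat=slope, variance=sigma2 / sxx)


def combine(summaries):
    """Inverse-variance weighted average of the site-level estimates."""
    weights = [1.0 / s.variance for s in summaries]
    total = sum(weights)
    beta = sum(w * s.beta_hat for w, s in zip(weights, summaries)) / total
    return SiteSummary(beta_hat=beta, variance=1.0 / total)


def simulate_site(rng, n, slope=2.0, intercept=1.0, noise=1.0):
    """Hypothetical data generator standing in for one site's records."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0, noise) for xi in x]
    return x, y


rng = random.Random(0)
sites = [simulate_site(rng, n) for n in (200, 500, 1000)]
summaries = [local_fit(x, y) for x, y in sites]  # computed locally at each site
pooled = combine(summaries)                      # computed by the coordinator
print(pooled)
```

Only two numbers per site cross institutional boundaries in this baseline, which is what makes it communication efficient. It offers no formal privacy guarantee; differential privacy, which the abstract claims for the proposed method, would require an additional mechanism.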

Suggested Citation

  • Changgee Chang & Zhiqi Bu & Qi Long, 2023. "CEDAR: communication efficient distributed analysis for regressions," Biometrics, The International Biometric Society, vol. 79(3), pages 2357-2369, September.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:3:p:2357-2369
    DOI: 10.1111/biom.13786

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13786
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tang, Lu & Zhou, Ling & Song, Peter X.-K., 2020. "Distributed simultaneous inference in generalized linear models via confidence distribution," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    2. Wei Wang & Shou‐En Lu & Jerry Q. Cheng & Minge Xie & John B. Kostis, 2022. "Multivariate survival analysis in big data: A divide‐and‐combine approach," Biometrics, The International Biometric Society, vol. 78(3), pages 852-866, September.
    3. Bingyao Huang & Yanyan Liu & Liuhua Peng, 2023. "Distributed inference for two‐sample U‐statistics in massive data analysis," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(3), pages 1090-1115, September.
    4. Xiang, Pengcheng & Zhou, Ling & Tang, Lu, 2024. "Transfer learning via random forests: A one-shot federated approach," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    5. Wang, Xiaoqian & Kang, Yanfei & Hyndman, Rob J. & Li, Feng, 2023. "Distributed ARIMA models for ultra-long time series," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1163-1184.
    6. Xingcai Zhou & Zhaoyang Jing & Chao Huang, 2024. "Distributed Bootstrap Simultaneous Inference for High-Dimensional Quantile Regression," Mathematics, MDPI, vol. 12(5), pages 1-53, February.
    7. Luo, Jiyu & Sun, Qiang & Zhou, Wen-Xin, 2022. "Distributed adaptive Huber regression," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    8. Bingyao Huang & Yanyan Liu & Xin Ye, 2026. "Integrating high-dimensional censored data under privacy constraints via localized computations," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 32(1), pages 1-22, March.
    9. Yaohong Yang & Lei Wang, 2023. "Communication-efficient sparse composite quantile regression for distributed data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 86(3), pages 261-283, April.
    10. Zhaohan Hou & Wei Ma & Lei Wang, 2023. "Sparse and debiased lasso estimation and inference for high-dimensional composite quantile regression with distributed data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(4), pages 1230-1250, December.
    11. Lu Lin & Feng Li, 2023. "Global debiased DC estimations for biased estimators via pro forma regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 726-758, June.
    12. Dungang Liu & Regina Y. Liu & Minge Xie, 2015. "Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 326-340, March.
    13. Li, Xing & Peng, Yanjing & Wang, Lei, 2025. "Communication-efficient estimation and inference for high-dimensional longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 208(C).
    14. Delbianco Fernando & Tohmé Fernando, 2023. "What is a relevant control?: An algorithmic proposal," Asociación Argentina de Economía Política: Working Papers 4643, Asociación Argentina de Economía Política.
    15. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Haixiang Zhang & Jian Huang & Liuquan Sun, 2022. "Projection‐based and cross‐validated estimation in high‐dimensional Cox model," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(1), pages 353-372, March.
    17. Kaspar Wuthrich & Ying Zhu, 2019. "Omitted variable bias of Lasso-based inference methods: A finite sample analysis," Papers 1903.08704, arXiv.org, revised Sep 2021.
    18. Jelena Bradic & Weijie Ji & Yuqian Zhang, 2021. "High-dimensional Inference for Dynamic Treatment Effects," Papers 2110.04924, arXiv.org, revised May 2023.
    19. Xuhua Liu & Xingzhong Xu, 2016. "Confidence distribution inferences in one-way random effects model," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(1), pages 59-74, March.
    20. Chenchuan (Mark) Li & Ulrich K. Müller, 2021. "Linear regression with many controls of limited explanatory power," Quantitative Economics, Econometric Society, vol. 12(2), pages 405-442, May.
