Printed from https://ideas.repec.org/a/taf/jnlasa/v114y2019i526p668-681.html

Communication-Efficient Distributed Statistical Inference

Authors

Listed:
  • Michael I. Jordan
  • Jason D. Lee
  • Yun Yang

Abstract

We present a communication-efficient surrogate likelihood (CSL) framework for solving distributed statistical inference problems. CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation, and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schemes and facilitates the construction of confidence intervals. For high-dimensional regularized estimation, CSL leads to a minimax-optimal estimator with controlled communication cost. For Bayesian inference, CSL can be used to form a communication-efficient quasi-posterior distribution that converges to the true posterior. This quasi-posterior procedure significantly improves the computational efficiency of Markov chain Monte Carlo (MCMC) algorithms even in a nondistributed setting. We present both theoretical analysis and experiments to explore the properties of the CSL approximation. Supplementary materials for this article are available online.
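The core of CSL is a gradient-corrected local likelihood: one machine keeps its own loss but shifts it by a linear term so that its gradient at an initial estimate matches the global gradient, which requires only one round of gradient communication. A minimal sketch of this construction follows, using logistic regression; the simulated data, sample sizes, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate data split across k machines (logistic regression).
k, n, d = 4, 500, 5
theta_star = rng.normal(size=d)
X = [rng.normal(size=(n, d)) for _ in range(k)]
y = [(rng.random(n) < 1 / (1 + np.exp(-Xi @ theta_star))).astype(float)
     for Xi in X]

def loss(theta, Xi, yi):
    # Average negative log-likelihood; logaddexp for numerical stability.
    z = Xi @ theta
    return np.mean(np.logaddexp(0.0, z) - yi * z)

def grad(theta, Xi, yi):
    p = 1 / (1 + np.exp(-(Xi @ theta)))
    return Xi.T @ (p - yi) / len(yi)

# Initial estimate, e.g. the local MLE on machine 1.
theta0 = minimize(loss, np.zeros(d), args=(X[0], y[0]), jac=grad).x

# One communication round: every machine sends its local gradient at theta0,
# and the driver averages them into the global gradient.
global_grad = np.mean([grad(theta0, Xi, yi) for Xi, yi in zip(X, y)], axis=0)

# CSL surrogate on machine 1: local loss plus a linear correction so that
# its gradient at theta0 equals the global gradient.
def surrogate(theta):
    return loss(theta, X[0], y[0]) - (grad(theta0, X[0], y[0]) - global_grad) @ theta

def surrogate_grad(theta):
    return grad(theta, X[0], y[0]) - (grad(theta0, X[0], y[0]) - global_grad)

# The CSL estimator: minimize the surrogate locally, no further communication.
theta_csl = minimize(surrogate, theta0, jac=surrogate_grad).x
```

By construction the surrogate's gradient at `theta0` equals the communicated global gradient, so minimizing it mimics one (approximate) global Newton-type refinement while touching only machine 1's data.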

Suggested Citation

  • Michael I. Jordan & Jason D. Lee & Yun Yang, 2019. "Communication-Efficient Distributed Statistical Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 668-681, April.
  • Handle: RePEc:taf:jnlasa:v:114:y:2019:i:526:p:668-681
    DOI: 10.1080/01621459.2018.1429274

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2018.1429274
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/01621459.2018.1429274?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    As access to this document is restricted, you may want to search for a different version of it.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Wang, Kangning & Li, Shaomin, 2021. "Robust distributed modal regression for massive data," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    2. Yang, Yaohong & Wang, Lei & Liu, Jiamin & Li, Rui & Lian, Heng, 2023. "Communication-efficient estimation of quantile matrix regression for massive datasets," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    3. Guangbao Guo & Guoqi Qian & Lu Lin & Wei Shao, 2021. "Parallel inference for big data with the group Bayesian method," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(2), pages 225-243, February.
    4. Wang, Xiaoqian & Kang, Yanfei & Hyndman, Rob J. & Li, Feng, 2023. "Distributed ARIMA models for ultra-long time series," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1163-1184.
    5. Chen, Canyi & Xu, Wangli & Zhu, Liping, 2022. "Distributed estimation in heterogeneous reduced rank regression: With application to order determination in sufficient dimension reduction," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    6. Shaomin Li & Kangning Wang & Yong Xu, 2023. "Robust estimation for nonrandomly distributed data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(3), pages 493-509, June.
    7. Changgee Chang & Zhiqi Bu & Qi Long, 2023. "CEDAR: communication efficient distributed analysis for regressions," Biometrics, The International Biometric Society, vol. 79(3), pages 2357-2369, September.
    8. Bikram Karmakar & Peng Liu & Gourab Mukherjee & Hai Che & Shantanu Dutta, 2022. "Improved retention analysis in freemium role‐playing games by jointly modelling players’ motivation, progression and churn," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 102-133, January.
    9. Benny Ren & Ian Barnett, 2022. "Autoregressive mixture models for clustering time series," Journal of Time Series Analysis, Wiley Blackwell, vol. 43(6), pages 918-937, November.
    10. Lu Lin & Feng Li, 2023. "Global debiased DC estimations for biased estimators via pro forma regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 726-758, June.
    11. Wei Wang & Shou‐En Lu & Jerry Q. Cheng & Minge Xie & John B. Kostis, 2022. "Multivariate survival analysis in big data: A divide‐and‐combine approach," Biometrics, The International Biometric Society, vol. 78(3), pages 852-866, September.
    12. Luo, Jiyu & Sun, Qiang & Zhou, Wen-Xin, 2022. "Distributed adaptive Huber regression," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    13. Xingcai Zhou & Yu Xiang, 2022. "ADMM-Based Differential Privacy Learning for Penalized Quantile Regression on Distributed Functional Data," Mathematics, MDPI, vol. 10(16), pages 1-28, August.
    14. Boris Beranger & Huan Lin & Scott Sisson, 2023. "New models for symbolic data analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 659-699, September.
    15. Xingcai Zhou & Hao Shen, 2022. "Communication-Efficient Distributed Learning for High-Dimensional Support Vector Machines," Mathematics, MDPI, vol. 10(7), pages 1-21, March.
    16. Bao, Yajie & Ren, Haojie, 2023. "Semi-profiled distributed estimation for high-dimensional partially linear model," Computational Statistics & Data Analysis, Elsevier, vol. 188(C).
    17. Zhang, Haixiang & Wang, HaiYing, 2021. "Distributed subdata selection for big data via sampling-based approach," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    18. Yaohong Yang & Lei Wang, 2023. "Communication-efficient sparse composite quantile regression for distributed data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 86(3), pages 261-283, April.
    19. Wang, Kangning & Li, Shaomin & Zhang, Benle, 2021. "Robust communication-efficient distributed composite quantile regression and variable selection for massive data," Computational Statistics & Data Analysis, Elsevier, vol. 161(C).
    20. Shi, Jianwei & Qin, Guoyou & Zhu, Huichen & Zhu, Zhongyi, 2021. "Communication-efficient distributed M-estimation with missing data," Computational Statistics & Data Analysis, Elsevier, vol. 161(C).
    21. Lulu Zuo & Haixiang Zhang & HaiYing Wang & Liuquan Sun, 2021. "Optimal subsample selection for massive logistic regression with distributed data," Computational Statistics, Springer, vol. 36(4), pages 2535-2562, December.
    22. Jiaming Luan & Hongwei Wang & Kangning Wang & Benle Zhang, 2022. "Robust distributed estimation and variable selection for massive datasets via rank regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 435-450, June.
    23. Feifei Wang & Danyang Huang & Tianchen Gao & Shuyuan Wu & Hansheng Wang, 2022. "Sequential one‐step estimator by sub‐sampling for customer churn analysis with massive data sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1753-1786, November.
    24. Zhan Liu & Xiaoluo Zhao & Yingli Pan, 2023. "Communication-efficient distributed estimation for high-dimensional large-scale linear regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 86(4), pages 455-485, May.
    25. Zhou, Ping & Yu, Zhen & Ma, Jingyi & Tian, Maozai & Fan, Ye, 2021. "Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions; when requesting a correction, please mention this item's handle: RePEc:taf:jnlasa:v:114:y:2019:i:526:p:668-681.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Longhurst (email available below). General contact details of provider: http://www.tandfonline.com/UASA20 .

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.