
Model aggregation for doubly divided data with large size and large dimension

Authors

  • Baihua He (Wuhan University)
  • Yanyan Liu (Wuhan University)
  • Guosheng Yin (University of Hong Kong)
  • Yuanshan Wu (Zhongnan University of Economics and Law)

Abstract

Massive datasets often combine high dimensionality with a large sample size, so they typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach to predicting the conditional mean of a response variable, which overcomes both the storage limitation of a single machine and the curse of dimensionality. Specifically, on each local machine, which stores a portion of the data with a relatively moderate sample size, we build a model aggregation procedure that splits the predictors and for which a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed, communication-efficient algorithm. The procedure distributes the workload effectively and substantially reduces the communication cost. Extensive numerical experiments on both simulated and real datasets demonstrate the feasibility of the DGMA method.
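
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' DGMA procedure. It assumes that each machine already holds predictions from K candidate models (for example, one model per block of predictors), that each machine sends the center only the K x K and length-K summaries of those predictions, and that the center picks simplex weights with a simple greedy update. All function names (local_summaries, combine, quad_loss, greedy_weights) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Summary = Tuple[Matrix, List[float]]


def local_summaries(preds: Sequence[Sequence[float]], y: Sequence[float]) -> Summary:
    """Compute G = P^T P and c = P^T y on one machine.

    preds has one row per observation and one column per candidate model.
    Only these K*K + K numbers need to leave the machine (an assumption
    of this sketch, not a detail taken from the paper).
    """
    K = len(preds[0])
    G = [[0.0] * K for _ in range(K)]
    c = [0.0] * K
    for row, yi in zip(preds, y):
        for j in range(K):
            c[j] += row[j] * yi
            for k in range(K):
                G[j][k] += row[j] * row[k]
    return G, c


def combine(summaries: Sequence[Summary]) -> Summary:
    """Sum the per-machine summaries at the central node."""
    K = len(summaries[0][1])
    G = [[sum(s[0][j][k] for s in summaries) for k in range(K)] for j in range(K)]
    c = [sum(s[1][j] for s in summaries) for j in range(K)]
    return G, c


def quad_loss(w: Sequence[float], G: Matrix, c: Sequence[float]) -> float:
    """w'Gw - 2c'w, i.e. the pooled squared error minus the constant y'y."""
    K = len(w)
    quad = sum(w[j] * G[j][k] * w[k] for j in range(K) for k in range(K))
    return quad - 2.0 * sum(c[j] * w[j] for j in range(K))


def greedy_weights(G: Matrix, c: Sequence[float], n_steps: int = 50) -> List[float]:
    """Greedy simplex weights: at step t, shrink the current weights by
    (t-1)/t and give 1/t to the single candidate that lowers the loss most.
    The weights stay nonnegative and sum to one at every step."""
    K = len(c)
    w = [0.0] * K
    for t in range(1, n_steps + 1):
        best_w, best_loss = w, float("inf")
        for k in range(K):
            cand = [(t - 1) / t * wj for wj in w]
            cand[k] += 1.0 / t
            loss = quad_loss(cand, G, c)
            if loss < best_loss:
                best_w, best_loss = cand, loss
        w = best_w
    return w
```

In use, each machine would call local_summaries on its own rows, the center would call combine on the collected results, and greedy_weights would then return the aggregation weights. Because the squared-error loss depends on the data only through the summed summaries, the center never needs the raw observations.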

Suggested Citation

  • Baihua He & Yanyan Liu & Guosheng Yin & Yuanshan Wu, 2023. "Model aggregation for doubly divided data with large size and large dimension," Computational Statistics, Springer, vol. 38(1), pages 509-529, March.
  • Handle: RePEc:spr:compst:v:38:y:2023:i:1:d:10.1007_s00180-022-01242-3
    DOI: 10.1007/s00180-022-01242-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-022-01242-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



