Distributed adaptive Huber regression

Authors

  • Luo, Jiyu
  • Sun, Qiang
  • Zhou, Wen-Xin

Abstract

Distributed data naturally arise in scenarios involving multiple sources of observations, each stored at a different location. Directly pooling all the data together is often prohibited due to limited bandwidth and storage, or due to privacy protocols. A new robust distributed algorithm is introduced for fitting linear regressions when data are subject to heavy-tailed and/or asymmetric errors with finite second moments. The algorithm communicates only gradient information at each iteration and is therefore communication-efficient. To achieve the bias-robustness tradeoff, the key is a novel double-robustification approach that is applied to both the local and global objective functions. Statistically, the resulting estimator achieves the centralized nonasymptotic error bound as if all the data were pooled together and came from a distribution with sub-Gaussian tails. Under a finite (2+δ)-th moment condition, a Berry-Esseen bound for the distributed estimator is established, based on which robust confidence intervals are constructed. In high dimensions, the proposed doubly-robustified loss function is complemented with ℓ1-penalization for fitting sparse linear models with distributed data. Numerical studies further confirm that, compared with extant distributed methods, the proposed methods achieve near-optimal accuracy with low variability and better coverage with narrower confidence intervals.
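
The gradient-only communication pattern mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' double-robustified algorithm, whose details are not given on this page: it is plain gradient descent on a Huber loss in which each machine sends only a p-dimensional gradient vector per round. The function names, the fixed robustification parameter `tau`, the step size, the iteration count, and the simulated Student-t noise are all assumptions made for illustration.

```python
import math
import random


def huber_score(r, tau):
    """Derivative of the Huber loss: the residual r clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))


def local_gradient(X, y, beta, tau):
    """Sum (not average) of Huber-loss gradients over one machine's data."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        resid = yi - sum(b * x for b, x in zip(beta, xi))
        psi = huber_score(resid, tau)
        for j, x in enumerate(xi):
            grad[j] -= psi * x
    return grad


def distributed_huber(shards, tau, step=0.5, iters=200):
    """Gradient descent where each round moves only one p-vector per machine.

    shards is a list of (X, y) pairs, one per machine (hypothetical layout).
    tau, step and iters are hand-picked here, not tuned adaptively.
    """
    p = len(shards[0][0][0])
    n_total = sum(len(y) for _, y in shards)
    beta = [0.0] * p
    for _ in range(iters):
        # Raw data stay local; only these gradient vectors are communicated.
        grads = [local_gradient(X, y, beta, tau) for X, y in shards]
        avg = [sum(g[j] for g in grads) / n_total for j in range(p)]
        beta = [b - step * a for b, a in zip(beta, avg)]
    return beta


def simulate_shards(m, n, beta_true, seed=0):
    """m machines with n rows each; noise is Student-t with 3 degrees of
    freedom, which is heavy-tailed but has a finite second moment."""
    rng = random.Random(seed)
    shards = []
    for _ in range(m):
        X, y = [], []
        for _ in range(n):
            xi = [1.0] + [rng.gauss(0.0, 1.0) for _ in beta_true[1:]]
            chi2 = rng.gammavariate(1.5, 2.0)  # chi-square with 3 df
            noise = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)
            X.append(xi)
            y.append(sum(b * x for b, x in zip(beta_true, xi)) + noise)
        shards.append((X, y))
    return shards


if __name__ == "__main__":
    shards = simulate_shards(m=4, n=300, beta_true=[1.0, 2.0], seed=1)
    print(distributed_huber(shards, tau=3.0))
```

Because the aggregated gradient is the same sum that a single machine holding all the data would compute, this sketch reproduces the pooled Huber fit up to floating-point rounding. It does not address the choice of robustification parameters or the inference results described in the abstract.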

Suggested Citation

  • Luo, Jiyu & Sun, Qiang & Zhou, Wen-Xin, 2022. "Distributed adaptive Huber regression," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
  • Handle: RePEc:eee:csdana:v:169:y:2022:i:c:s016794732100253x
    DOI: 10.1016/j.csda.2021.107419

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794732100253X
    Download Restriction: Full text for ScienceDirect subscribers only.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Han, Dongxiao & Huang, Jian & Lin, Yuanyuan & Shen, Guohao, 2022. "Robust post-selection inference of high-dimensional mean regression with heavy-tailed asymmetric or heteroskedastic errors," Journal of Econometrics, Elsevier, vol. 230(2), pages 416-431.
    2. Fan, Jianqing & Guo, Yongyi & Jiang, Bai, 2022. "Adaptive Huber regression on Markov-dependent data," Stochastic Processes and their Applications, Elsevier, vol. 150(C), pages 802-818.
    3. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    4. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    5. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform post selection inference for LAD regression and other z-estimation problems," CeMMAP working papers CWP74/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    6. Xing, Li-Min & Zhang, Yue-Jun, 2022. "Forecasting crude oil prices with shrinkage methods: Can nonconvex penalty and Huber loss help?," Energy Economics, Elsevier, vol. 110(C).
    7. Yaohong Yang & Lei Wang, 2023. "Communication-efficient sparse composite quantile regression for distributed data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 86(3), pages 261-283, April.
    8. Max H. Farrell, 2013. "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations," Papers 1309.4686, arXiv.org, revised Feb 2018.
    9. Xiao, Xuan & Xu, Xingbai & Zhong, Wei, 2023. "Huber estimation for the network autoregressive model," Statistics & Probability Letters, Elsevier, vol. 203(C).
    10. Yanqin Fan & Fang Han & Wei Li & Xiao-Hua Zhou, 2019. "On rank estimators in increasing dimensions," Papers 1908.05255, arXiv.org.
    11. Fan, Yanqin & Han, Fang & Li, Wei & Zhou, Xiao-Hua, 2020. "On rank estimators in increasing dimensions," Journal of Econometrics, Elsevier, vol. 214(2), pages 379-412.
    12. Lu Lin & Feng Li, 2023. "Global debiased DC estimations for biased estimators via pro forma regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 726-758, June.
    13. Wang, Yibo & Karunamuni, Rohana J., 2022. "High-dimensional robust regression with Lq-loss functions," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    14. Yang, Shuquan & Ling, Nengxiang, 2023. "Robust projected principal component analysis for large-dimensional semiparametric factor modeling," Journal of Multivariate Analysis, Elsevier, vol. 195(C).
    15. Guillaume Lecué & Mathieu Lerasle, 2017. "Robust machine learning by median-of-means : theory and practice," Working Papers 2017-32, Center for Research in Economics and Statistics.
    16. Lu Xia & Bin Nan & Yi Li, 2023. "Debiased lasso for generalized linear models with a diverging number of covariates," Biometrics, The International Biometric Society, vol. 79(1), pages 344-357, March.
    17. Huang, Yuan & Li, Changcheng & Li, Runze & Yang, Songshan, 2022. "An overview of tests on high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    18. Yuyang Liu & Pengfei Pi & Shan Luo, 2023. "A semi-parametric approach to feature selection in high-dimensional linear regression models," Computational Statistics, Springer, vol. 38(2), pages 979-1000, June.
    19. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    20. Luo, Bin & Gao, Xiaoli, 2022. "High-dimensional robust approximated M-estimators for mean regression with asymmetric data," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
