Printed from https://ideas.repec.org/a/eee/csdana/v161y2021ics0167947321000967.html

Robust communication-efficient distributed composite quantile regression and variable selection for massive data

Author

Listed:
  • Wang, Kangning
  • Li, Shaomin
  • Zhang, Benle

Abstract

Statistical analysis of massive data is becoming increasingly common. This paper proposes a distributed composite quantile regression (CQR) method for massive data. Specifically, the global CQR loss function is approximated on the first machine by a surrogate loss that depends on the data held by the other machines only through their gradients; the estimator is then obtained on the first machine by minimizing this surrogate loss. Because the gradients of the local datasets can be communicated efficiently, the communication cost is significantly reduced. To reduce the computational burden, the induced smoothing method is applied. Theoretically, the resulting estimator is proved to be statistically as efficient as the global CQR estimator. Moreover, as a direct application, smooth-threshold distributed CQR estimating equations are proposed for variable selection. The new methods inherit the robustness and efficiency advantages of CQR. Their promising performance is supported by extensive numerical examples and a real data analysis.
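
The gradient-only surrogate construction described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation; it rests on several assumptions that are not in the record. The surrogate loss is assumed to take the form L_1(theta) - <grad L_1(theta_bar) - grad L_N(theta_bar), theta>, a form commonly used in communication-efficient surrogate likelihood methods. Induced smoothing is approximated by Gaussian smoothing of the check loss with a fixed bandwidth h. The minimizer is plain gradient descent. All machines hold equally sized, simulated datasets. The names smoothed_cqr_grad, descend, and demo are hypothetical.

```python
import math
import numpy as np

# Elementwise error function built from the standard library (avoids SciPy).
_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def smoothed_cqr_grad(theta, X, y, taus, h):
    """Gradient of a Gaussian-smoothed CQR loss on one dataset.

    theta stacks K intercepts (one per quantile level) and the shared slopes.
    Smoothing the check loss with N(0, h^2) noise replaces its derivative
    tau - 1{u < 0} by tau - 1 + Phi(u / h). The fixed h is an assumption.
    """
    K = len(taus)
    b, beta = theta[:K], theta[K:]
    u = (y - X @ beta)[:, None] - b[None, :]      # residuals, shape (n, K)
    psi = taus[None, :] - 1.0 + norm_cdf(u / h)   # d(loss) / d(u)
    grad_b = -psi.mean(axis=0)
    grad_beta = -(X.T @ psi.sum(axis=1)) / len(y)
    return np.concatenate([grad_b, grad_beta])


def descend(grad_fn, theta0, steps=400, lr=0.5):
    """Plain gradient descent (an illustrative solver choice)."""
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * grad_fn(theta)
    return theta


def demo(seed=0, machines=5, n=400, h=0.3):
    rng = np.random.default_rng(seed)
    beta_true = np.array([1.0, -1.0, 2.0])
    taus = np.array([0.25, 0.5, 0.75])

    # Simulated data with heavy-tailed t(3) errors, split across machines.
    data = []
    for _ in range(machines):
        X = rng.standard_normal((n, beta_true.size))
        y = X @ beta_true + rng.standard_t(3, size=n)
        data.append((X, y))
    X1, y1 = data[0]

    # Step 1: pilot estimate using only the first machine's data.
    theta0 = np.zeros(taus.size + beta_true.size)
    theta_bar = descend(lambda t: smoothed_cqr_grad(t, X1, y1, taus, h), theta0)

    # Step 2: every machine sends one gradient vector evaluated at theta_bar.
    local_grads = [smoothed_cqr_grad(theta_bar, X, y, taus, h) for X, y in data]
    global_grad = np.mean(local_grads, axis=0)    # equal machine sizes assumed

    # Step 3: minimize the surrogate loss on the first machine. Its gradient
    # is the local gradient minus the constant local-vs-global shift.
    shift = local_grads[0] - global_grad
    theta_hat = descend(
        lambda t: smoothed_cqr_grad(t, X1, y1, taus, h) - shift, theta_bar
    )
    return theta_hat, beta_true, taus


if __name__ == "__main__":
    theta_hat, beta_true, taus = demo()
    print("slopes:", np.round(theta_hat[taus.size:], 3), "truth:", beta_true)
```

In this sketch each machine transmits a single vector of length K + p per round rather than its raw data, which is the source of the communication savings the abstract refers to. The bandwidth, step size, and number of iterations are illustrative values and would need tuning in practice.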

Suggested Citation

  • Wang, Kangning & Li, Shaomin & Zhang, Benle, 2021. "Robust communication-efficient distributed composite quantile regression and variable selection for massive data," Computational Statistics & Data Analysis, Elsevier, vol. 161(C).
  • Handle: RePEc:eee:csdana:v:161:y:2021:i:c:s0167947321000967
    DOI: 10.1016/j.csda.2021.107262

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321000967
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Kangning Wang & Lu Lin, 2019. "Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data," Statistical Papers, Springer, vol. 60(5), pages 1649-1676, October.
    3. Rong Jiang & Wei-Min Qian & Zhan-Gong Zhou, 2016. "Single-index composite quantile regression with heteroscedasticity and general error distributions," Statistical Papers, Springer, vol. 57(1), pages 185-203, March.
    4. Tian, Yuzhu & Zhu, Qianqian & Tian, Maozai, 2016. "Estimation of linear composite quantile regression using EM algorithm," Statistics & Probability Letters, Elsevier, vol. 117(C), pages 183-191.
    5. Bo Kai & Runze Li & Hui Zou, 2010. "Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(1), pages 49-69, January.
    6. B. M. Brown & You-Gan Wang, 2005. "Standard errors and covariance matrices for smoothed rank estimators," Biometrika, Biometrika Trust, vol. 92(1), pages 149-158, March.
    7. Michael I. Jordan & Jason D. Lee & Yun Yang, 2019. "Communication-Efficient Distributed Statistical Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 668-681, April.
    8. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    9. Yan Fan & Wolfgang Karl Härdle & Weining Wang & Lixing Zhu, 2018. "Single-Index-Based CoVaR With Very High-Dimensional Covariates," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(2), pages 212-226, April.
    10. Chen, Lanjue & Zhou, Yong, 2020. "Quantile regression in big data: A divide and conquer based strategy," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    11. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    12. Zhao, Weihua & Lian, Heng & Song, Xinyuan, 2017. "Composite quantile regression for correlated data," Computational Statistics & Data Analysis, Elsevier, vol. 109(C), pages 15-33.
    13. Masao Ueki, 2009. "A note on automatic variable selection using smooth-threshold estimating equations," Biometrika, Biometrika Trust, vol. 96(4), pages 1005-1011.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kangning Wang & Mengjie Hao & Xiaofei Sun, 2021. "Robust and efficient estimating equations for longitudinal data partial linear models and its applications," Statistical Papers, Springer, vol. 62(5), pages 2147-2168, October.
    2. Rong Jiang & Mengxian Sun, 2022. "Single-index composite quantile regression for ultra-high-dimensional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 443-460, June.
    3. Kangning Wang & Wen Shan, 2021. "Copula and composite quantile regression-based estimating equations for longitudinal data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 441-455, June.
    4. Tian, Yuzhu & Song, Xinyuan, 2020. "Bayesian bridge-randomized penalized quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    5. Kangning Wang & Xiaofei Sun, 2020. "Efficient parameter estimation and variable selection in partial linear varying coefficient quantile regression model with longitudinal data," Statistical Papers, Springer, vol. 61(3), pages 967-995, June.
    6. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    7. Shi Chen & Wolfgang Karl Härdle & Brenda López Cabrera, 2020. "Regularization Approach for Network Modeling of German Power Derivative Market," Papers 2009.09739, arXiv.org.
    8. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    9. Lai, Peng & Wang, Qihua & Lian, Heng, 2012. "Bias-corrected GEE estimation and smooth-threshold GEE variable selection for single-index models with clustered data," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 422-432.
    10. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    11. Anders Bredahl Kock, 2012. "On the Oracle Property of the Adaptive Lasso in Stationary and Nonstationary Autoregressions," CREATES Research Papers 2012-05, Department of Economics and Business Economics, Aarhus University.
    12. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    13. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    14. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    15. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    16. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    17. Lee, Ji Hyung & Shi, Zhentao & Gao, Zhan, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    18. Lan Wang & Yichao Wu & Runze Li, 2012. "Quantile Regression for Analyzing Heterogeneity in Ultra-High Dimension," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 214-222, March.
    19. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    20. Jiang, Rong & Qian, Weimin & Zhou, Zhangong, 2012. "Variable selection and coefficient estimation via composite quantile regression with randomly censored data," Statistics & Probability Letters, Elsevier, vol. 82(2), pages 308-317.
