Optimal subsampling for composite quantile regression in big data

Author

Listed:
  • Xiaohui Yuan

    (Jilin University; Changchun University of Technology)

  • Yong Li

    (Jilin University)

  • Xiaogang Dong

    (Changchun University of Technology)

  • Tianqing Liu

    (Jilin University)

Abstract

Composite quantile regression (CQR) is an efficient and robust alternative to least squares for estimating regression coefficients in a linear model. We investigate optimal subsampling for CQR with massive datasets. By establishing the consistency and asymptotic normality of the CQR estimator from a general subsampling algorithm, we derive the optimal subsampling probabilities under the L- and A-optimality criteria. The L-optimality criterion minimizes the trace of the asymptotic variance–covariance matrix of the estimator of a linearly transformed regression parameter vector, and the A-optimality criterion minimizes that of the estimator of the regression parameters themselves. The L-optimal subsampling probabilities are easy to implement because they do not depend on the densities of the responses given the covariates. Based on the L-optimal subsampling probabilities, we propose algorithms for computing the resulting estimators and establish their asymptotic distributions and asymptotic optimality. To obtain standard errors for CQR estimators without estimating the densities of the responses given the covariates, we propose an iterative subsampling procedure based on the L-optimal subsampling probabilities. The proposed methods are illustrated through numerical experiments on simulated and real datasets.
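
As a rough illustration of the objects the abstract refers to, the sketch below writes out a CQR objective (one shared slope vector, one intercept per quantile level) and a with-replacement subsampling step with inverse-probability weights. It is a minimal sketch, not the authors' algorithm: the function names (check_loss, cqr_objective, draw_subsample) are hypothetical, inverse-probability weighting is assumed because it is the usual device in subsampling estimators, and the sampling probabilities are taken as an input because the abstract does not state the formula for the L-optimal ones.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))


def cqr_objective(beta, intercepts, X, y, taus, weights=None):
    """Weighted CQR objective summed over observations and quantile levels.

    All quantile levels share the slope vector `beta`; level taus[k]
    has its own intercept intercepts[k].
    """
    resid = y - X @ beta                       # shape (n,)
    u = resid[:, None] - intercepts[None, :]   # shape (n, K)
    loss = check_loss(u, taus[None, :]).sum(axis=1)
    if weights is None:
        weights = np.ones_like(loss)
    return float(np.sum(weights * loss))


def draw_subsample(probs, r, rng):
    """Draw r indices with replacement using the given probabilities.

    Returns the indices and the inverse-probability weights 1 / (r * p_i)
    (an assumed weighting scheme, see the note above the code).
    """
    idx = rng.choice(len(probs), size=r, replace=True, p=probs)
    return idx, 1.0 / (r * probs[idx])
```

A subsample estimator would then minimize cqr_objective over (beta, intercepts) on the rows X[idx], y[idx] with the returned weights. The minimizer itself is left out here, since the abstract does not describe which solver the paper uses.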

Suggested Citation

  • Xiaohui Yuan & Yong Li & Xiaogang Dong & Tianqing Liu, 2022. "Optimal subsampling for composite quantile regression in big data," Statistical Papers, Springer, vol. 63(5), pages 1649-1676, October.
  • Handle: RePEc:spr:stpapr:v:63:y:2022:i:5:d:10.1007_s00362-022-01292-1
    DOI: 10.1007/s00362-022-01292-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-022-01292-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jiang, Rong & Qian, Wei-Min & Zhou, Zhan-Gong, 2016. "Weighted composite quantile regression for single-index models," Journal of Multivariate Analysis, Elsevier, vol. 148(C), pages 34-48.
    2. Ziyang Wang & HaiYing Wang & Nalini Ravishanker, 2023. "Subsampling in Longitudinal Models," Methodology and Computing in Applied Probability, Springer, vol. 25(1), pages 1-29, March.
    3. Su, Miaomiao & Wang, Ruoyu & Wang, Qihua, 2022. "A two-stage optimal subsampling estimation for missing data problems with large-scale data," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    4. Jun Yu & Jiaqi Liu & HaiYing Wang, 2023. "Information-based optimal subdata selection for non-linear models," Statistical Papers, Springer, vol. 64(4), pages 1069-1093, August.
    5. Jun Yu & HaiYing Wang, 2022. "Subdata selection algorithm for linear model discrimination," Statistical Papers, Springer, vol. 63(6), pages 1883-1906, December.
    6. Tianzhen Wang & Haixiang Zhang, 2022. "Optimal subsampling for multiplicative regression with massive data," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 76(4), pages 418-449, November.
    7. J. Lars Kirkby & Dang H. Nguyen & Duy Nguyen & Nhu N. Nguyen, 2022. "Inversion-free subsampling Newton’s method for large sample logistic regression," Statistical Papers, Springer, vol. 63(3), pages 943-963, June.
    8. Amalan Mahendran & Helen Thompson & James M. McGree, 2023. "A model robust subsampling approach for Generalised Linear Models in big data settings," Statistical Papers, Springer, vol. 64(4), pages 1137-1157, August.
    9. Yujing Shao & Lei Wang, 2022. "Optimal subsampling for composite quantile regression model in massive data," Statistical Papers, Springer, vol. 63(4), pages 1139-1161, August.
    10. Hong-Xia Xu & Guo-Liang Fan & Zhen-Long Chen & Jiang-Feng Wang, 2018. "Weighted quantile regression and testing for varying-coefficient models with randomly truncated data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(4), pages 565-588, October.
    11. Feifei Wang & Danyang Huang & Tianchen Gao & Shuyuan Wu & Hansheng Wang, 2022. "Sequential one‐step estimator by sub‐sampling for customer churn analysis with massive data sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1753-1786, November.
    12. Jiang, Depeng & Zhao, Puying & Tang, Niansheng, 2016. "A propensity score adjustment method for regression models with nonignorable missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 94(C), pages 98-119.
    13. Jiang, Rong & Zhou, Zhan-Gong & Qian, Wei-Min & Chen, Yong, 2013. "Two step composite quantile regression for single-index models," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 180-191.
    14. Duarte, Belmiro P.M. & Atkinson, Anthony C. & Oliveira, Nuno M.C., 2024. "Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets," LSE Research Online Documents on Economics 121641, London School of Economics and Political Science, LSE Library.
    15. Weiming Yang & Yiping Yang, 2020. "Composite quantile regression estimation of linear error-in-variable models using instrumental variables," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(1), pages 1-16, January.
    16. Lu Lin & Feng Li, 2023. "Global debiased DC estimations for biased estimators via pro forma regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 726-758, June.
    17. Xie, Qichang & Sun, Qiankun, 2019. "Computation and application of robust data-driven bandwidth selection for gradient function estimation," Applied Mathematics and Computation, Elsevier, vol. 361(C), pages 274-293.
    18. Rong Jiang & Mengxian Sun, 2022. "Single-index composite quantile regression for ultra-high-dimensional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 443-460, June.
    19. Rong Jiang & Wei-Min Qian & Jing-Ru Li, 2014. "Testing in linear composite quantile regression models," Computational Statistics, Springer, vol. 29(5), pages 1381-1402, October.
    20. Zhang, Haixiang & Wang, HaiYing, 2021. "Distributed subdata selection for big data via sampling-based approach," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
