
Robust optimal subsampling based on weighted asymmetric least squares

Authors
  • Min Ren

    (Qufu Normal University)

  • Shengli Zhao

    (Qufu Normal University)

  • Mingqiu Wang

    (Qufu Normal University)

  • Xinbei Zhu

    (Virginia Tech University)

Abstract

With the development of contemporary science, much of the data generated today exhibits heterogeneity and contains outliers in the response and/or the covariates. Subsampling is an effective way to cope with limited computational resources. However, when the data contain heterogeneity and outliers, inappropriate subsampling probabilities may select poor subdata, and statistical inference based on such subdata can perform much worse. Combining asymmetric least squares with $L_2$ estimation, this paper proposes a double-robustness framework (DRF) that simultaneously handles heterogeneity and outliers in the response and/or covariates. Poisson subsampling based on the DRF is implemented for massive data, and a more robust subsampling probability is derived for selecting the subdata. Under some regularity conditions, we establish the asymptotic properties of the DRF-based subsampling estimator. Numerical studies and a real-data application demonstrate the effectiveness of the proposed method.
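Two ingredients named in the abstract can be illustrated in isolation. The sketch below is not the authors' DRF estimator: it omits the $L_2$ robustness step entirely and uses hypothetical subsampling probabilities. It shows (i) asymmetric least squares (expectile) regression in the sense of Newey and Powell (1987), which minimizes $\sum_i |\tau - I(r_i < 0)|\, r_i^2$ with $r_i = y_i - x_i^\top\beta$ and can be fitted by iteratively reweighted least squares, and (ii) Poisson subsampling, where each observation is kept independently with probability $\pi_i$ and the subsample fit is weighted by $1/\pi_i$. The probabilities in `subsampling_probs` (proportional to the absolute ALS score times $\|(1, x_i)\|$, mixed with uniform sampling) are an assumption for illustration only; because they grow with the residual, they would tend to favour outliers, which is the weakness the paper's more robust probabilities are meant to address. All function names are hypothetical, and the data are simulated with a single covariate.

```python
import math
import random


def expectile_weights(residuals, tau):
    """Asymmetric least-squares weights |tau - 1(r < 0)|: tau if r >= 0, else 1 - tau."""
    return [tau if r >= 0 else 1.0 - tau for r in residuals]


def weighted_ls(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x (one covariate)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_expectile(x, y, tau, base_weights=None, max_iter=100, tol=1e-10):
    """Linear expectile regression by iteratively reweighted least squares.

    base_weights multiplies the ALS weights, e.g. inverse inclusion probabilities 1/pi_i.
    """
    base = base_weights if base_weights is not None else [1.0] * len(x)
    b0, b1 = weighted_ls(x, y, base)  # start from (weighted) ordinary least squares
    for _ in range(max_iter):
        res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        w = [bw * aw for bw, aw in zip(base, expectile_weights(res, tau))]
        new_b0, new_b1 = weighted_ls(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def subsampling_probs(x, y, pilot, tau, r, mix=0.5):
    """HYPOTHETICAL probabilities: |ALS score| * ||(1, x_i)||, mixed with uniform, capped at 1.

    The expected subsample size is about r. These are NOT the paper's DRF-based probabilities.
    """
    b0, b1 = pilot
    n = len(x)
    res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    scores = [aw * abs(ri) * math.sqrt(1.0 + xi * xi)
              for aw, ri, xi in zip(expectile_weights(res, tau), res, x)]
    total = sum(scores)
    return [min(1.0, r * ((1.0 - mix) * s / total + mix / n)) for s in scores]


def poisson_subsample(probs, rng):
    """Poisson sampling: keep index i independently with probability probs[i]."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def demo(seed=0, n=20000, r=1000, tau=0.5):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # simulated, outlier-free data
    pilot_idx = rng.sample(range(n), 500)  # pilot fit on a small uniform subsample
    pilot = fit_expectile([x[i] for i in pilot_idx], [y[i] for i in pilot_idx], tau)
    probs = subsampling_probs(x, y, pilot, tau, r)
    idx = poisson_subsample(probs, rng)
    weights = [1.0 / probs[i] for i in idx]  # inverse-probability weights
    est = fit_expectile([x[i] for i in idx], [y[i] for i in idx], tau, base_weights=weights)
    return est, len(idx)


if __name__ == "__main__":
    for tau in (0.5, 0.9):
        (b0, b1), m = demo(tau=tau)
        print(f"tau={tau}: intercept={b0:.3f}, slope={b1:.3f}, subsample size={m}")
```

Running the script prints subsample estimates for τ = 0.5 and τ = 0.9. With standard normal errors the τ = 0.5 fit targets the mean regression line, while the τ = 0.9 intercept should sit above the true intercept of 1, because the 0.9 expectile of N(0, 1) is positive; the slope is unaffected since the simulated errors are homoscedastic.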

Suggested Citation

  • Min Ren & Shengli Zhao & Mingqiu Wang & Xinbei Zhu, 2024. "Robust optimal subsampling based on weighted asymmetric least squares," Statistical Papers, Springer, vol. 65(4), pages 2221-2251, June.
  • Handle: RePEc:spr:stpapr:v:65:y:2024:i:4:d:10.1007_s00362-023-01480-7
    DOI: 10.1007/s00362-023-01480-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-023-01480-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00362-023-01480-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Ciuperca, Gabriela, 2021. "Variable selection in high-dimensional linear model with possibly asymmetric errors," Computational Statistics & Data Analysis, Elsevier, vol. 155(C).
    2. Yujing Shao & Lei Wang, 2022. "Optimal subsampling for composite quantile regression model in massive data," Statistical Papers, Springer, vol. 63(4), pages 1139-1161, August.
    3. Koenker, Roger, 2005. "Quantile Regression," Cambridge Books, Cambridge University Press, number 9780521845731, July.
    4. Xiong, Shifeng & Li, Guoying, 2008. "Some results on the convergence of conditional distributions," Statistics & Probability Letters, Elsevier, vol. 78(18), pages 3249-3253, December.
    5. Aigner, D J & Amemiya, Takeshi & Poirier, Dale J, 1976. "On the Estimation of Production Frontiers: Maximum Likelihood Estimation of the Parameters of a Discontinuous Density Function," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 17(2), pages 377-396, June.
    6. HaiYing Wang & Min Yang & John Stufken, 2019. "Information-Based Optimal Subdata Selection for Big Data Linear Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(525), pages 393-405, January.
    7. Lina Liao & Cheolwoo Park & Hosik Choi, 2019. "Penalized expectile regression: an alternative to penalized quantile regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(2), pages 409-438, April.
    8. Jun Yu & HaiYing Wang, 2022. "Subdata selection algorithm for linear model discrimination," Statistical Papers, Springer, vol. 63(6), pages 1883-1906, December.
    9. Pollard, David, 1991. "Asymptotics for Least Absolute Deviation Regression Estimators," Econometric Theory, Cambridge University Press, vol. 7(2), pages 186-199, June.
    10. Xiaohui Yuan & Yong Li & Xiaogang Dong & Tianqing Liu, 2022. "Optimal subsampling for composite quantile regression in big data," Statistical Papers, Springer, vol. 63(5), pages 1649-1676, October.
    11. Jun Yu & HaiYing Wang & Mingyao Ai & Huiming Zhang, 2022. "Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators With Massive Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(537), pages 265-276, January.
    12. Koenker, Roger W & Bassett, Gilbert, Jr, 1978. "Regression Quantiles," Econometrica, Econometric Society, vol. 46(1), pages 33-50, January.
    13. Haiying Wang & Yanyuan Ma, 2021. "Optimal subsampling for quantile regression in big data," Biometrika, Biometrika Trust, vol. 108(1), pages 99-112.
    14. HaiYing Wang & Rong Zhu & Ping Ma, 2018. "Optimal Subsampling for Large Sample Logistic Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 829-844, April.
    15. Newey, Whitney K & Powell, James L, 1987. "Asymmetric Least Squares Estimation and Testing," Econometrica, Econometric Society, vol. 55(4), pages 819-847, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xing Li & Yujing Shao & Lei Wang, 2024. "Optimal subsampling for $L_p$-quantile regression via decorrelated score," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 33(4), pages 1084-1104, December.
    2. Yue Chao & Lei Huang & Xuejun Ma & Jiajun Sun, 2024. "Optimal subsampling for modal regression in massive data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 87(4), pages 379-409, May.
    3. Baolin Chen & Shanshan Song & Yong Zhou, 2024. "Estimation and testing of expectile regression with efficient subsampling for massive data," Statistical Papers, Springer, vol. 65(9), pages 5593-5613, December.
    4. Man, Rebeka & Tan, Kean Ming & Wang, Zian & Zhou, Wen-Xin, 2024. "Retire: Robust expectile regression in high dimensions," Journal of Econometrics, Elsevier, vol. 239(2).
    5. Deng, Jiayi & Huang, Danyang & Ding, Yi & Zhu, Yingqiu & Jing, Bingyi & Zhang, Bo, 2024. "Subsampling spectral clustering for stochastic block models in large-scale networks," Computational Statistics & Data Analysis, Elsevier, vol. 189(C).
    6. Qian Yan & Hanyu Li & Chengmei Niu, 2023. "Optimal subsampling for functional quantile regression," Statistical Papers, Springer, vol. 64(6), pages 1943-1968, December.
    7. Hanji He & Jianfeng He & Liwei Zhang, 2025. "Imbalanced data sampling design based on grid boundary domain for big data," Computational Statistics, Springer, vol. 40(1), pages 27-64, January.
    8. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    9. Jun Yu & Jiaqi Liu & HaiYing Wang, 2023. "Information-based optimal subdata selection for non-linear models," Statistical Papers, Springer, vol. 64(4), pages 1069-1093, August.
    10. Jun Yu & Mingyao Ai & Zhiqiang Ye, 2024. "A review on design inspired subsampling for big data," Statistical Papers, Springer, vol. 65(2), pages 467-510, April.
    11. Bernardi, Mauro & Bottone, Marco & Petrella, Lea, 2018. "Bayesian quantile regression using the skew exponential power distribution," Computational Statistics & Data Analysis, Elsevier, vol. 126(C), pages 92-111.
    12. Kim, Joonpyo & Oh, Hee-Seok, 2020. "Pseudo-quantile functional data clustering," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    13. Kuosmanen, Timo & Zhou, Xun, 2021. "Shadow prices and marginal abatement costs: Convex quantile regression approach," European Journal of Operational Research, Elsevier, vol. 289(2), pages 666-675.
    14. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    15. Parente, Paulo M.D.C. & Smith, Richard J., 2011. "Gel Methods For Nonsmooth Moment Indicators," Econometric Theory, Cambridge University Press, vol. 27(1), pages 74-113, February.
    16. Gabriela Ciuperca, 2022. "Real-time detection of a change-point in a linear expectile model," Statistical Papers, Springer, vol. 63(4), pages 1323-1367, August.
    17. Holger Dette & Marc Hallin & Tobias Kley & Stanislav Volgushev, 2011. "Of Copulas, Quantiles, Ranks and Spectra - An L1-Approach to Spectral Analysis," Working Papers ECARES ECARES 2011-038, ULB -- Universite Libre de Bruxelles.
    18. Halkos, George E., 2011. "Nonparametric modelling of biodiversity: Determinants of threatened species," Journal of Policy Modeling, Elsevier, vol. 33(4), pages 618-635, July.
    19. Bonaccolto, Giovanni & Caporin, Massimiliano & Maillet, Bertrand B., 2022. "Dynamic large financial networks via conditional expected shortfalls," European Journal of Operational Research, Elsevier, vol. 298(1), pages 322-336.
    20. Zhang, Feipeng & Li, Qunhua, 2017. "A continuous threshold expectile model," Computational Statistics & Data Analysis, Elsevier, vol. 116(C), pages 49-66.
