
Non-asymptotic analysis and inference for an outlyingness induced winsorized mean

Author

Listed:
  • Yijun Zuo

    (Michigan State University)

Abstract

Robust estimation of a mean vector, a topic regarded as obsolete in the traditional robust statistics community, has surged in the machine learning literature over the last decade. The latest focus is on the sub-Gaussian performance and computability of the estimators in a non-asymptotic setting. Numerous traditional robust estimators are computationally intractable, which partly explains the renewed interest in robust mean estimation. Robust centrality estimators, however, include the trimmed mean and the sample median. The latter has the best robustness but suffers from low efficiency. The trimmed mean and the median of means, both of which achieve sub-Gaussian performance, have been proposed and studied in the literature. This article investigates the robustness of leading sub-Gaussian estimators of the mean and reveals that none of them can resist more than 25% contamination in the data. It consequently introduces an outlyingness induced winsorized mean, which has the best possible robustness (it can resist up to 50% contamination without breakdown) while achieving high efficiency. Furthermore, it has sub-Gaussian performance for uncontaminated samples and a bounded estimation error for contaminated samples at a given confidence level in a finite sample setting. It can be computed in linear time.
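
To make the idea in the abstract concrete, the following is a minimal univariate sketch of a winsorized mean driven by outlyingness. It is not taken from the paper: it assumes outlyingness is measured as |x - median| / MAD, and the function name `outlyingness_winsorized_mean` and the cutoff parameter `beta` are hypothetical choices made for illustration. The paper's exact definition and tuning may differ.

```python
from statistics import median


def outlyingness_winsorized_mean(xs, beta=3.0):
    """Illustrative sketch only (assumed definition, not the paper's code).

    Points whose outlyingness |x - med| / MAD exceeds `beta` are pulled
    back to the nearest boundary med -/+ beta * MAD, then all points
    are averaged.
    """
    xs = list(xs)
    if not xs:
        raise ValueError("xs must be non-empty")

    med = median(xs)
    # Median absolute deviation about the median (unscaled).
    mad = median(abs(x - med) for x in xs)

    lower = med - beta * mad
    upper = med + beta * mad

    # Winsorize: clip each point into [lower, upper].
    # If MAD is 0, every point collapses to the median.
    clipped = [min(max(x, lower), upper) for x in xs]
    return sum(clipped) / len(clipped)


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
    print("plain mean:     ", sum(sample) / len(sample))
    print("winsorized mean:", outlyingness_winsorized_mean(sample))
```

On the sample above, the single gross outlier drags the plain mean to about 169, while the sketch clips it to the upper boundary and returns about 3.83. Note that `statistics.median` sorts its input, so this sketch runs in O(n log n); a linear-time version, as claimed in the abstract, would need a selection algorithm for the two medians.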

Suggested Citation

  • Yijun Zuo, 2023. "Non-asymptotic analysis and inference for an outlyingness induced winsorized mean," Statistical Papers, Springer, vol. 64(5), pages 1465-1481, October.
  • Handle: RePEc:spr:stpapr:v:64:y:2023:i:5:d:10.1007_s00362-022-01353-5
    DOI: 10.1007/s00362-022-01353-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-022-01353-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00362-022-01353-5?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xing, Li-Min & Zhang, Yue-Jun, 2022. "Forecasting crude oil prices with shrinkage methods: Can nonconvex penalty and Huber loss help?," Energy Economics, Elsevier, vol. 110(C).
    2. Xiaowei Yang & Xinqiao Liu & Haoyu Wei, 2022. "Concentration inequalities of MLE and robust MLE," Papers 2210.09398, arXiv.org, revised Dec 2022.
    3. Han, Dongxiao & Huang, Jian & Lin, Yuanyuan & Shen, Guohao, 2022. "Robust post-selection inference of high-dimensional mean regression with heavy-tailed asymmetric or heteroskedastic errors," Journal of Econometrics, Elsevier, vol. 230(2), pages 416-431.
    4. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    5. Pan Shang & Lingchen Kong, 2021. "Regularization Parameter Selection for the Low Rank Matrix Recovery," Journal of Optimization Theory and Applications, Springer, vol. 189(3), pages 772-792, June.
    6. Xiao, Xuan & Xu, Xingbai & Zhong, Wei, 2023. "Huber estimation for the network autoregressive model," Statistics & Probability Letters, Elsevier, vol. 203(C).
    7. Neil Shephard, 2020. "An estimator for predictive regression: reliable inference for financial economics," Papers 2008.06130, arXiv.org.
    8. Wang, Yibo & Karunamuni, Rohana J., 2022. "High-dimensional robust regression with Lq-loss functions," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    9. Yang, Shuquan & Ling, Nengxiang, 2023. "Robust projected principal component analysis for large-dimensional semiparametric factor modeling," Journal of Multivariate Analysis, Elsevier, vol. 195(C).
    10. Peter Bossaerts & Shijie Huang & Nitin Yadav, 2020. "Exploiting Distributional Temporal Difference Learning to Deal with Tail Risk," Risks, MDPI, vol. 8(4), pages 1-20, October.
    11. Yuyang Liu & Pengfei Pi & Shan Luo, 2023. "A semi-parametric approach to feature selection in high-dimensional linear regression models," Computational Statistics, Springer, vol. 38(2), pages 979-1000, June.
    12. Donggyu Kim & Minseog Oh, 2023. "Dynamic Realized Minimum Variance Portfolio Models," Papers 2310.13511, arXiv.org.
    13. Luo, Jiyu & Sun, Qiang & Zhou, Wen-Xin, 2022. "Distributed adaptive Huber regression," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    14. Qian Zhang & Xinyuan Zhao & Chao Ding, 2021. "Matrix optimization based Euclidean embedding with outliers," Computational Optimization and Applications, Springer, vol. 79(2), pages 235-271, June.
    15. Liang, Wanfeng & Wu, Yue & Ma, Xiaoyan, 2022. "Robust sparse precision matrix estimation for high-dimensional compositional data," Statistics & Probability Letters, Elsevier, vol. 184(C).
    16. Donggyu Kim & Minseok Shin, 2023. "Volatility models for stylized facts of high‐frequency financial data," Journal of Time Series Analysis, Wiley Blackwell, vol. 44(3), pages 262-279, May.
    17. Joaquim Fernando Pinto da Costa & Manuel Cabral, 2022. "Statistical Methods with Applications in Data Mining: A Review of the Most Recent Works," Mathematics, MDPI, vol. 10(6), pages 1-22, March.
    18. Elvezio Ronchetti, 2021. "The main contributions of robust statistics to statistical science and a new challenge," METRON, Springer;Sapienza Università di Roma, vol. 79(2), pages 127-135, August.
    19. Li, Kangqiang & Tang, Songqiao & Zhang, Lixin, 2022. "Robust parameter estimation of regression models under weakened moment assumptions," Statistics & Probability Letters, Elsevier, vol. 191(C).
    20. Chen, Huangyue & Kong, Lingchen & Shang, Pan & Pan, Shanshan, 2020. "Safe feature screening rules for the regularized Huber regression," Applied Mathematics and Computation, Elsevier, vol. 386(C).
