Printed from https://ideas.repec.org/a/spr/advdac/v16y2022i4d10.1007_s11634-021-00471-6.html

The minimum weighted covariance determinant estimator for high-dimensional data

Author

Listed:
  • Jan Kalina

    (The Czech Academy of Sciences, Institute of Computer Science
    The Czech Academy of Sciences, Institute of Information Theory and Automation)

  • Jan Tichavský

    (The Czech Academy of Sciences, Institute of Computer Science)

Abstract

In a variety of diverse applications, it is very desirable to perform a robust analysis of high-dimensional measurements without being harmed by the presence of a possibly larger percentage of outlying measurements. The minimum weighted covariance determinant (MWCD) estimator, based on implicit weights assigned to individual observations, represents a promising and flexible extension of the popular minimum covariance determinant (MCD) estimator of the expectation and scatter matrix of multivariate data. In this work, a regularized version of the MWCD denoted as the minimum regularized weighted covariance determinant (MRWCD) estimator is proposed. At the same time, it is accompanied by an outlier detection procedure. The novel MRWCD estimator is able to outperform other available robust estimators in several simulation scenarios, especially in estimating the scatter matrix of contaminated high-dimensional data.
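
As a rough illustration of the ideas named in the abstract (weights assigned to observations implicitly, plus regularization of the scatter matrix), the following is a minimal sketch and not the authors' MRWCD algorithm. It assumes linearly decreasing weights assigned by the rank of each observation's Mahalanobis distance, and a shrinkage of the weighted covariance toward a scaled identity target. The function names `regularized_weighted_cov` and `rank_weighted_estimate`, the `shrink` parameter, the weight function, and the iteration scheme are all hypothetical choices made for this example.

```python
import numpy as np


def regularized_weighted_cov(X, weights, shrink=0.1):
    """Weighted mean and covariance, shrunk toward a scaled identity.

    Assumption for illustration: the shrinkage target is (trace/p) * I.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to one
    mean = w @ X
    centered = X - mean
    cov = (centered * w[:, None]).T @ centered
    p = X.shape[1]
    target = (np.trace(cov) / p) * np.eye(p)
    return mean, (1.0 - shrink) * cov + shrink * target


def rank_weighted_estimate(X, shrink=0.1, n_iter=20):
    """Iteratively reassign rank-based weights (hypothetical scheme)."""
    n = X.shape[0]
    # Weight magnitudes: the closest observation gets 1, the farthest gets 0.
    magnitudes = np.linspace(1.0, 0.0, n)
    weights = np.ones(n)
    for _ in range(n_iter):
        mean, cov = regularized_weighted_cov(X, weights, shrink)
        centered = X - mean
        # Squared Mahalanobis distances with respect to the current estimate.
        d2 = np.einsum("ij,ij->i", centered, np.linalg.solve(cov, centered.T).T)
        ranks = np.argsort(np.argsort(d2))   # rank 0 = smallest distance
        new_weights = magnitudes[ranks]
        converged = np.array_equal(new_weights, weights)
        weights = new_weights
        if converged:
            break
    mean, cov = regularized_weighted_cov(X, weights, shrink)
    return mean, cov, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 50))        # more variables than observations
    X[-1] += 1000.0                      # one gross outlier
    mean, cov, weights = rank_weighted_estimate(X)
    print("smallest weight belongs to row", int(weights.argmin()))
```

The shrinkage term keeps the scatter estimate invertible even when the number of variables exceeds the number of observations, which is what allows Mahalanobis distances to be computed at all in that setting. Observations that end up with small weights are natural candidates for closer inspection as outliers, although the paper's own detection procedure may differ.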

Suggested Citation

  • Jan Kalina & Jan Tichavský, 2022. "The minimum weighted covariance determinant estimator for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 977-999, December.
  • Handle: RePEc:spr:advdac:v:16:y:2022:i:4:d:10.1007_s11634-021-00471-6
    DOI: 10.1007/s11634-021-00471-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00471-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-021-00471-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. DeMiguel, Victor & Martin-Utrera, Alberto & Nogales, Francisco J., 2013. "Size matters: Optimal calibration of shrinkage estimators for portfolio selection," Journal of Banking & Finance, Elsevier, vol. 37(8), pages 3018-3034.
    2. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    3. Couillet, Romain & McKay, Matthew, 2014. "Large dimensional analysis and optimization of robust shrinkage covariance matrix estimators," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 99-120.
    4. Ella Roelant & Stefan Van Aelst & Gert Willems, 2009. "The minimum weighted covariance determinant estimator," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 70(2), pages 177-204, September.
    5. Claudio Agostinelli & Andy Leung & Victor Yohai & Ruben Zamar, 2015. "Rejoinder on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 484-488, September.
    6. Cerioli, Andrea & Farcomeni, Alessio, 2011. "Error rates for multivariate outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 544-553, January.
    7. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    8. Kwangil Ro & Changliang Zou & Zhaojun Wang & Guosheng Yin, 2015. "Outlier detection for high-dimensional data," Biometrika, Biometrika Trust, vol. 102(3), pages 589-599.
    9. Schäfer, Juliane & Strimmer, Korbinian, 2005. "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-32, November.
    10. Claudio Agostinelli & Andy Leung & Victor Yohai & Ruben Zamar, 2015. "Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 441-461, September.
    11. Cerioli, Andrea, 2010. "Multivariate Outlier Detection With High-Breakdown Estimators," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 147-156.
    12. Andrea Cerioli & Marco Riani & Anthony C. Atkinson & Aldo Corbellini, 2018. "The power of monitoring: how to make the most of a contaminated multivariate sample," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 559-587, December.
    13. Marco Marozzi & Amitava Mukherjee & Jan Kalina, 2020. "Interpoint distance tests for high-dimensional comparison studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 47(4), pages 653-665, March.
    14. Todorov, Valentin & Filzmoser, Peter, 2009. "An Object-Oriented Framework for Robust Multivariate Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 32(i03).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Archimbaud, Aurore & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2018. "ICS for multivariate outlier detection with application to quality control," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 184-199.
    2. Cerioli, Andrea & Farcomeni, Alessio & Riani, Marco, 2013. "Robust distances for outlier-free goodness-of-fit testing," Computational Statistics & Data Analysis, Elsevier, vol. 65(C), pages 29-45.
    3. Pokojovy, Michael & Jobe, J. Marcus, 2022. "A robust deterministic affine-equivariant algorithm for multivariate location and scatter," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    4. P. Navarro-Esteban & J. A. Cuesta-Albertos, 2021. "High-dimensional outlier detection using random projections," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 908-934, December.
    5. Henry Velasco & Henry Laniado & Mauricio Toro & Víctor Leiva & Yuhlong Lio, 2020. "Robust Three-Step Regression Based on Comedian and Its Performance in Cell-Wise and Case-Wise Outliers," Mathematics, MDPI, vol. 8(8), pages 1-18, August.
    6. Silvia Salini & Andrea Cerioli & Fabrizio Laurini & Marco Riani, 2016. "Reliable Robust Regression Diagnostics," International Statistical Review, International Statistical Institute, vol. 84(1), pages 99-127, April.
    7. Elisa Cabana & Rosa E. Lillo & Henry Laniado, 2021. "Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators," Statistical Papers, Springer, vol. 62(4), pages 1583-1609, August.
    8. Leung, Andy & Zhang, Hongyang & Zamar, Ruben, 2016. "Robust regression estimation and inference in the presence of cellwise and casewise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 1-11.
    9. Stephane Heritier & Maria-Pia Victoria-Feser, 2018. "Discussion of “The power of monitoring: how to make the most of a contaminated multivariate sample” by Andrea Cerioli, Marco Riani, Anthony C. Atkinson and Aldo Corbellini," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 595-602, December.
    10. Hannart, Alexis & Naveau, Philippe, 2014. "Estimating high dimensional covariance matrices: A new look at the Gaussian conjugate framework," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 149-162.
    11. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    12. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    13. Avagyan, Vahe & Alonso Fernández, Andrés Modesto & Nogales, Francisco J., 2015. "D-trace Precision Matrix Estimation Using Adaptive Lasso Penalties," DES - Working Papers. Statistics and Econometrics. WS 21775, Universidad Carlos III de Madrid. Departamento de Estadística.
    14. Füss, Roland & Miebs, Felix & Trübenbach, Fabian, 2014. "A jackknife-type estimator for portfolio revision," Journal of Banking & Finance, Elsevier, vol. 43(C), pages 14-28.
    15. Christian Bongiorno, 2020. "Bootstraps Regularize Singular Correlation Matrices," Working Papers hal-02536278, HAL.
    16. van Wieringen, Wessel N. & Stam, Koen A. & Peeters, Carel F.W. & van de Wiel, Mark A., 2020. "Updating of the Gaussian graphical model through targeted penalized estimation," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    17. Alessio Farcomeni & Antonio Punzo, 2020. "Robust model-based clustering with mild and gross outliers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(4), pages 989-1007, December.
    18. Christophe Croux & Viktoria Öllerer, 2015. "Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 462-466, September.
    19. Zongliang Hu & Zhishui Hu & Kai Dong & Tiejun Tong & Yuedong Wang, 2021. "A shrinkage approach to joint estimation of multiple covariance matrices," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(3), pages 339-374, April.
    20. Giovanni Saraceno & Claudio Agostinelli & Luca Greco, 2021. "Robust estimation for multivariate wrapped models," METRON, Springer;Sapienza Università di Roma, vol. 79(2), pages 225-240, August.
