
Computationally easy outlier detection via projection pursuit with finitely many directions

Author

Listed:
  • Robert Serfling
  • Satyaki Mazumder

Abstract

Outlier detection is fundamental to data analysis. Desirable properties are affine invariance, robustness, low computational burden, and nonimposition of elliptical contours. However, leading methods fail to possess all of these features. The Mahalanobis distance outlyingness (MD) imposes elliptical contours. The projection outlyingness, powerfully involving projections of the data onto all univariate directions, is highly computationally intensive. Computationally easy variants using projection pursuit with but finitely many directions have been introduced, but these fail to capture at once the other desired properties. Here, we develop a 'robust Mahalanobis spatial outlyingness on projections' (RMSP) function, which indeed satisfies all the four desired properties. Pre-transformation to a strong invariant coordinate system yields affine invariance, 'spatial trimming' yields robustness, and 'spatial Mahalanobis outlyingness' is used to obtain computational ease and smooth, unconstrained contours. From empirical study using artificial and actual data, our findings are that SUP is outclassed by MD and RMSP, that MD and RMSP are competitive, and that RMSP is especially advantageous in describing the intermediate outlyingness structure when elliptical contours are not assumed.
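The two baseline notions named in the abstract can be illustrated with a minimal sketch. It is not the RMSP function of the paper; it only contrasts a classical Mahalanobis distance outlyingness with a projection outlyingness evaluated over finitely many directions. The restriction to two dimensions, the evenly spaced directions, the median/MAD standardization of each projection, and the function names are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def mahalanobis_outlyingness(x, data):
    """Classical Mahalanobis distance of a 2-D point x from the sample mean."""
    n = len(data)
    mx = mean([p[0] for p in data])
    my = mean([p[1] for p in data])
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form d' S^{-1} d, with the 2x2 inverse written out explicitly.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def projection_outlyingness(x, data, n_directions=36):
    """Largest standardized deviation of x over finitely many unit directions.

    For each direction u the data are projected onto u, and x is scored by
    |u.x - median| / MAD of the projected sample (an assumed standardization).
    """
    worst = 0.0
    for k in range(n_directions):
        # Half a circle is enough because u and -u give the same score.
        theta = math.pi * k / n_directions
        u = (math.cos(theta), math.sin(theta))
        proj = [u[0] * p[0] + u[1] * p[1] for p in data]
        med = median(proj)
        mad = median([abs(v - med) for v in proj])
        if mad == 0:
            continue  # direction carries no spread; skip it
        px = u[0] * x[0] + u[1] * x[1]
        worst = max(worst, abs(px - med) / mad)
    return worst


# Small artificial sample, chosen only for illustration.
sample = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (1, 1), (-1, -1), (0.5, -0.5), (-0.5, 0.5)]

for point in [(0.1, 0.1), (5.0, 5.0)]:
    print(point,
          mahalanobis_outlyingness(point, sample),
          projection_outlyingness(point, sample))
```

The Mahalanobis score depends on the point only through a quadratic form, so its contours are ellipses, whereas the projection score takes a maximum over directions and is not tied to that shape. Using the sample mean and covariance here also means the Mahalanobis score in this sketch is not robust.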

Suggested Citation

  • Robert Serfling & Satyaki Mazumder, 2013. "Computationally easy outlier detection via projection pursuit with finitely many directions," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 25(2), pages 447-461, June.
  • Handle: RePEc:taf:gnstxx:v:25:y:2013:i:2:p:447-461
    DOI: 10.1080/10485252.2013.766335

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/10485252.2013.766335
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    2. Lutz Dümbgen & David E. Tyler, 2005. "On the Breakdown Properties of Some Multivariate M‐Functionals," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 32(2), pages 247-264, June.

    Citations

    Cited by:

    1. P. Navarro-Esteban & J. A. Cuesta-Albertos, 2021. "High-dimensional outlier detection using random projections," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 908-934, December.
    2. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    3. Wang, Shanshan & Serfling, Robert, 2018. "On masking and swamping robustness of leading nonparametric outlier identifiers for multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 166(C), pages 32-49.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    2. Thomas Triebs & Subal C. Kumbhakar, 2012. "Management Practice in Production," ifo Working Paper Series 129, ifo Institute - Leibniz Institute for Economic Research at the University of Munich.
    3. David E. Tyler & Frank Critchley & Lutz Dümbgen & Hannu Oja, 2009. "Invariant co‐ordinate selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 549-592, June.
    4. Roelant, E. & Van Aelst, S. & Croux, C., 2009. "Multivariate generalized S-estimators," Journal of Multivariate Analysis, Elsevier, vol. 100(5), pages 876-887, May.
    5. Frahm, Gabriel & Jaekel, Uwe, 2010. "A generalization of Tyler's M-estimators to the case of incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 374-393, February.
    6. Davy Paindaveine & Germain Van Bever, 2017. "Halfspace Depths for Scatter, Concentration and Shape Matrices," Working Papers ECARES ECARES 2017-19, ULB -- Universite Libre de Bruxelles.
    7. Junlong Zhao & Chao Liu & Lu Niu & Chenlei Leng, 2019. "Multiple influential point detection in high dimensional regression spaces," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 385-408, April.
    8. Van Aelst, S. & Vandervieren, E. & Willems, G., 2012. "A Stahel–Donoho estimator based on huberized outlyingness," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 531-542.
    9. C. Croux & C. Dehon & A. Yadine, 2010. "The k-step spatial sign covariance matrix," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 137-150, September.
    10. C. Chatzinakos & L. Pitsoulis & G. Zioutas, 2016. "Optimization techniques for robust multivariate location and scatter estimation," Journal of Combinatorial Optimization, Springer, vol. 31(4), pages 1443-1460, May.
    11. Chung, Hee Cheol & Ahn, Jeongyoun, 2021. "Subspace rotations for high-dimensional outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    12. Albert D. Shieh & Yeung Sam Hung, 2009. "Detecting Outlier Samples in Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-24, February.
    13. Jan Kalina & Jan Tichavský, 2022. "The minimum weighted covariance determinant estimator for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 977-999, December.
    14. Stefano Marchetti & Caterina Giusti & Nicola Salvati & Monica Pratesi, 2017. "Small area estimation based on M-quantile models in presence of outliers in auxiliary variables," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 26(4), pages 531-555, November.
    15. Heewon Park & Teppei Shimamura & Satoru Miyano & Seiya Imoto, 2014. "Robust Prediction of Anti-Cancer Drug Sensitivity and Sensitivity-Specific Biomarker," PLOS ONE, Public Library of Science, vol. 9(10), pages 1-10, October.
    16. Alexander A. Aduenko & Anastasia P. Motrenko & Vadim V. Strijov, 2018. "Object selection in credit scoring using covariance matrix of parameters estimations," Annals of Operations Research, Springer, vol. 260(1), pages 3-21, January.
    17. P. Navarro-Esteban & J. A. Cuesta-Albertos, 2021. "High-dimensional outlier detection using random projections," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 908-934, December.
    18. D. Rosadi & P. Filzmoser, 2019. "Robust second-order least-squares estimation for regression models with autoregressive errors," Statistical Papers, Springer, vol. 60(1), pages 105-122, February.
    19. Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2010. "Detecting influential observations in principal components and common principal components," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2967-2975, December.
    20. Francesca DE BATTISTI & Silvia SALINI, 2011. "Robust analysis of bibliometric data," Departmental Working Papers 2011-36, Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano.
