IDEAS home Printed from https://ideas.repec.org/p/tse/wpaper/129830.html

ICS for complex data with application to outlier detection for density data

Author

Listed:
  • Thomas-Agnan, Christine
  • Mondon, Camille
  • Trinh, Thi-Huong
  • Ruiz-Gazen, Anne

Abstract

Invariant coordinate selection (ICS) is a dimension reduction method, used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, with promising results in scenarios with a low proportion of outliers. Applied to a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016, ICS detects abnormal climate events.
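
As an illustration of the general idea only, here is a minimal sketch of classical multivariate ICS, which jointly diagonalises two scatter matrices. It assumes the common COV-COV4 scatter pair and plain NumPy; the function names `cov4` and `ics`, the simulated data, and the choice of the first squared component as an outlyingness score are illustrative assumptions. It is not the coordinate-free procedure or the density-data preprocessing developed in the paper.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter matrix COV4 (assumed second scatter)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    # Squared Mahalanobis distance of each observation
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

def ics(X):
    """Return generalised kurtosis values and invariant coordinates."""
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    # Whiten with respect to S1, then diagonalise S2 in the whitened space
    w, U = np.linalg.eigh(S1)
    S1_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    kappa, V = np.linalg.eigh(S1_inv_sqrt @ S2 @ S1_inv_sqrt)
    order = np.argsort(kappa)[::-1]          # largest kurtosis first
    kappa, V = kappa[order], V[:, order]
    B = S1_inv_sqrt @ V                      # columns are ICS directions
    return kappa, Xc @ B

# Simulated example (hypothetical data): 2% of rows shifted on one axis
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:10] += np.array([6.0, 0.0, 0.0, 0.0])

kappa, Z = ics(X)
scores = Z[:, 0] ** 2                        # illustrative outlyingness score
flagged = np.argsort(scores)[::-1][:10]      # rows with the largest scores
```

By construction the invariant coordinates have identity covariance, and the components are ordered by decreasing generalised kurtosis, so with a small proportion of outliers the leading components are the natural candidates to inspect.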

Suggested Citation

  • Thomas-Agnan, Christine & Mondon, Camille & Trinh, Thi-Huong & Ruiz-Gazen, Anne, 2024. "ICS for complex data with application to outlier detection for density data," TSE Working Papers 24-1585, Toulouse School of Economics (TSE), revised May 2025.
  • Handle: RePEc:tse:wpaper:129830

    Download full text from publisher

    File URL: https://www.tse-fr.eu/sites/default/files/TSE/documents/doc/wp/2024/wp_tse_1585.pdf
    File Function: Working Paper Version
    Download Restriction: no

    References listed on IDEAS

    1. Ruiz-Gazen, Anne & Thomas-Agnan, Christine & Laurent, Thibault & Mondon, Camille, 2022. "Detecting outliers in compositional data using Invariant Coordinate Selection," TSE Working Papers 22-1320, Toulouse School of Economics (TSE).
    2. Nordhausen, Klaus & Ruiz-Gazen, Anne, 2022. "On the usage of joint diagonalization in multivariate statistics," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Loperfido, Nicola, 2021. "Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    4. David E. Tyler & Frank Critchley & Lutz Dümbgen & Hannu Oja, 2009. "Invariant co‐ordinate selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 549-592, June.
    5. Archimbaud, Aurore & Boulfani, Feriel & Gendre, Xavier & Nordhausen, Klaus & Ruiz-Gazen, Anne & Virta, Joni, 2025. "ICS for multivariate functional anomaly detection with applications to predictive maintenance and quality control," Econometrics and Statistics, Elsevier, vol. 33(C), pages 282-303.
    6. Dai, Wenlin & Mrkvička, Tomáš & Sun, Ying & Genton, Marc G., 2020. "Functional outlier detection and taxonomy by sequential transformations," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    7. Archimbaud, Aurore & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2018. "ICS for multivariate outlier detection with application to quality control," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 184-199.
    8. Tyler, David E., 2010. "A note on multivariate location and scatter statistics for sparse data sets," Statistics & Probability Letters, Elsevier, vol. 80(17-18), pages 1409-1413, September.
    9. Virta, Joni & Li, Bing & Nordhausen, Klaus & Oja, Hannu, 2020. "Independent component analysis for multivariate functional data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    10. J. Machalová & K. Hron & G.S. Monti, 2016. "Preprocessing of centred logratio transformed density functions using smoothing splines," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(8), pages 1419-1435, June.
    11. Thomas-Agnan, Christine & Simioni, Michel & Trinh, Thi-Huong, 2023. "Discrete and smooth scalar-on-density compositional regression for assessing the impact of climate change on rice yield in Vietnam," TSE Working Papers 23-1410, Toulouse School of Economics (TSE), revised Jun 2025.
    12. Aurore Archimbaud & Zlatko Drmac & Klaus Nordhausen & Una Radojicic & Anne Ruiz-Gazen, 2023. "Numerical Considerations and a New Implementation for Invariant Coordinate Selection," Post-Print hal-04038657, HAL.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Archimbaud, Aurore & Boulfani, Feriel & Gendre, Xavier & Nordhausen, Klaus & Ruiz-Gazen, Anne & Virta, Joni, 2025. "ICS for multivariate functional anomaly detection with applications to predictive maintenance and quality control," Econometrics and Statistics, Elsevier, vol. 33(C), pages 282-303.
    2. Nordhausen, Klaus & Ruiz-Gazen, Anne, 2022. "On the usage of joint diagonalization in multivariate statistics," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Ruiz-Gazen, Anne & Thomas-Agnan, Christine & Laurent, Thibault & Mondon, Camille, 2022. "Detecting outliers in compositional data using Invariant Coordinate Selection," TSE Working Papers 22-1320, Toulouse School of Economics (TSE).
    4. Loperfido, Nicola, 2021. "Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    5. Fischer, Daniel & Berro, Alain & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2019. "REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit," TSE Working Papers 19-1001, Toulouse School of Economics (TSE).
    6. Dominique Guégan & Matteo Iacopini, 2018. "Nonparametric forecasting of multivariate probability density functions," Documents de travail du Centre d'Economie de la Sorbonne 18012, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    7. Cristian Preda & Quentin Grimonprez & Vincent Vandewalle, 2021. "Categorical Functional Data Analysis. The cfda R Package," Mathematics, MDPI, vol. 9(23), pages 1-31, November.
    8. Virta, J., 2016. "One-step M-estimates of scatter and the independence property," Statistics & Probability Letters, Elsevier, vol. 110(C), pages 133-136.
    9. Moritz Herrmann & Fabian Scheipl, 2021. "A Geometric Perspective on Functional Outlier Detection," Stats, MDPI, vol. 4(4), pages 1-41, November.
    10. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T2 in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    11. Javed, Farrukh & Loperfido, Nicola & Mazur, Stepan, 2025. "The method of moments for multivariate random sums in the Poisson-Skew-Normal case," Statistics & Probability Letters, Elsevier, vol. 219(C).
    12. Karel Hron & Jitka Machalová & Alessandra Menafoglio, 2023. "Bivariate densities in Bayes spaces: orthogonal decomposition and spline representation," Statistical Papers, Springer, vol. 64(5), pages 1629-1667, October.
    13. Ojo, Oluwasegun Taiwo & Fernández Anta, Antonio & Genton, Marc G. & Lillo Rodríguez, Rosa Elvira, 2022. "Multivariate Functional Outlier Detection using the FastMUOD Indices," DES - Working Papers. Statistics and Econometrics. WS 35665, Universidad Carlos III de Madrid. Departamento de Estadística.
    14. Zhong, Rou & Liu, Shishi & Li, Haocheng & Zhang, Jingxiao, 2022. "Robust functional principal component analysis for non-Gaussian longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    15. Nordhausen, Klaus & Oja, Hannu & Tyler, David E., 2022. "Asymptotic and bootstrap tests for subspace dimension," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    16. Peña, Daniel & Prieto, Francisco J. & Rendón, Carolina, 2014. "Independent components techniques based on kurtosis for functional data analysis," DES - Working Papers. Statistics and Econometrics. WS ws141006, Universidad Carlos III de Madrid. Departamento de Estadística.
    17. Thomas-Agnan, Christine & Simioni, Michel & Trinh, Thi-Huong, 2023. "Discrete and smooth scalar-on-density compositional regression for assessing the impact of climate change on rice yield in Vietnam," TSE Working Papers 23-1410, Toulouse School of Economics (TSE), revised Jun 2025.
    18. Dargel, Lukas & Thomas-Agnan, Christine, 2024. "Pairwise share ratio interpretations of compositional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 195(C).
    19. Taskinen, Sara & Koch, Inge & Oja, Hannu, 2012. "Robustifying principal component analysis with spatial sign vectors," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 765-774.
    20. Talská, R. & Menafoglio, A. & Machalová, J. & Hron, K. & Fišerová, E., 2018. "Compositional regression with functional response," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 66-85.

    More about this item

    Keywords

    Bayes spaces; Distributional data; Extreme weather; Functional data; Invariant coordinate selection; Outlier detection; Temperature distribution;