Printed from https://ideas.repec.org/a/eee/csdana/v174y2022ics0167947322000767.html

Outlier detection in multivariate functional data through a contaminated mixture model

Author

Listed:
  • Amovin-Assagba, Martial
  • Gannaz, Irène
  • Jacques, Julien

Abstract

In an industrial context, the activity of sensors is recorded at a high frequency. A challenge is to automatically detect abnormal measurement behavior. When the sensor measurements are treated as functional data, the problem can be formulated as the detection of outliers in a multivariate functional data set. Because this data set is heterogeneous, the proposed contaminated mixture model both clusters the multivariate functional data into homogeneous groups and detects outliers. The main advantage of this procedure over its competitors is that it does not require the proportion of outliers to be specified. Model inference is performed through an Expectation-Conditional Maximization algorithm, and the BIC is used to select the number of clusters. Numerical experiments on simulated data demonstrate the high performance achieved by the inference algorithm. In particular, the proposed model outperforms the competitors. Applied to the real data that motivated this study, it correctly detects abnormal behaviors.
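
As a rough illustration of the idea behind a contaminated mixture component, the sketch below uses a univariate contaminated normal distribution. It is not the authors' multivariate functional model or their ECM algorithm. The function names, the parameters `alpha` (share of typical points) and `eta` (variance inflation), the 0.5 decision threshold, and the BIC sign convention (lower is better) are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def prob_typical(x, mean, var, alpha, eta):
    """Posterior probability that x comes from the 'typical' component.

    Illustrative contaminated normal density (assumed form):
        f(x) = alpha * N(x; mean, var) + (1 - alpha) * N(x; mean, eta * var)
    with alpha in (0, 1) and eta > 1.
    """
    good = alpha * normal_pdf(x, mean, var)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return good / (good + bad)


def is_outlier(x, mean, var, alpha, eta):
    """Flag x as an outlier when the inflated-variance component is more likely."""
    return prob_typical(x, mean, var, alpha, eta) < 0.5


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'lower is better' convention: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for the demonstration.
    for x in (0.0, 5.0):
        print(x, is_outlier(x, mean=0.0, var=1.0, alpha=0.9, eta=20.0))
```

In this sketch `alpha` is a parameter of the density, so in a fitted model it would be estimated together with the others rather than fixed in advance, which is in the spirit of not having to specify the proportion of outliers. Under the assumed convention, candidate numbers of clusters would be compared by fitting each and keeping the one with the smallest `bic` value.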

Suggested Citation

  • Amovin-Assagba, Martial & Gannaz, Irène & Jacques, Julien, 2022. "Outlier detection in multivariate functional data through a contaminated mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
  • Handle: RePEc:eee:csdana:v:174:y:2022:i:c:s0167947322000767
    DOI: 10.1016/j.csda.2022.107496

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322000767
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107496?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Boente, Graciela & Parada, Daniela, 2023. "Robust estimation for functional quadratic regression models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    2. Golovkine, Steven & Klutchnikoff, Nicolas & Patilea, Valentin, 2022. "Clustering multivariate functional data using unsupervised binary trees," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    3. Alvarez, Agustín & Boente, Graciela & Kudraszow, Nadia, 2019. "Robust sieve estimators for functional canonical correlation analysis," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 46-62.
    4. Fabrizio Maturo & Rosanna Verde, 2023. "Supervised classification of curves via a combined use of functional data analysis and tree-based methods," Computational Statistics, Springer, vol. 38(1), pages 419-459, March.
    5. Carlo Sguera & Sara López-Pintado, 2021. "A notion of depth for sparse functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 630-649, September.
    6. Dai, Wenlin & Mrkvička, Tomáš & Sun, Ying & Genton, Marc G., 2020. "Functional outlier detection and taxonomy by sequential transformations," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    7. Fang, Kuangnan & Chen, Yuanxing & Ma, Shuangge & Zhang, Qingzhao, 2022. "Biclustering analysis of functionals via penalized fusion," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    8. Kuhnt, Sonja & Rehage, André, 2016. "An angle-based multivariate functional pseudo-depth for shape outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 325-340.
    9. Graciela Boente & Matías Salibián-Barrera, 2021. "Robust functional principal components for sparse longitudinal data," METRON, Springer;Sapienza Università di Roma, vol. 79(2), pages 159-188, August.
    10. Qiu, Zhiping & Chen, Jianwei & Zhang, Jin-Ting, 2021. "Two-sample tests for multivariate functional data with applications," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    11. Oluwasegun Taiwo Ojo & Antonio Fernández Anta & Rosa E. Lillo & Carlo Sguera, 2022. "Detecting and classifying outliers in big functional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 725-760, September.
    12. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    13. Carlo Sguera & Pedro Galeano & Rosa Lillo, 2014. "Spatial depth-based classification for functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(4), pages 725-750, December.
    14. Zhuo Qu & Wenlin Dai & Marc G. Genton, 2021. "Robust functional multivariate analysis of variance with environmental applications," Environmetrics, John Wiley & Sons, Ltd., vol. 32(1), February.
    15. Flores Díaz, Ramón Jesús & Lillo Rodríguez, Rosa Elvira & Romo, Juan, 2014. "Homogeneity test for functional data based on depth measures," DES - Working Papers. Statistics and Econometrics. WS ws140101, Universidad Carlos III de Madrid. Departamento de Estadística.
    16. Philip A. White & Alan E. Gelfand, 2021. "Multivariate functional data modeling with time-varying clustering," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 586-602, September.
    17. O. I. Traore & P. Cristini & N. Favretto-Cristini & L. Pantera & P. Vieu & S. Viguier-Pla, 2019. "Clustering acoustic emission signals by mixing two stages dimension reduction and nonparametric approaches," Computational Statistics, Springer, vol. 34(2), pages 631-652, June.
    18. Virta, Joni & Li, Bing & Nordhausen, Klaus & Oja, Hannu, 2020. "Independent component analysis for multivariate functional data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    19. Moritz Herrmann & Fabian Scheipl, 2021. "A Geometric Perspective on Functional Outlier Detection," Stats, MDPI, vol. 4(4), pages 1-41, November.
    20. Olusola Samuel Makinde, 2019. "Classification rules based on distribution functions of functional depth," Statistical Papers, Springer, vol. 60(3), pages 629-640, June.
