Detecting and classifying outliers in big functional data

Authors

  • Oluwasegun Taiwo Ojo

    (IMDEA Networks Institute
    Universidad Carlos III de Madrid)

  • Antonio Fernández Anta

    (IMDEA Networks Institute)

  • Rosa E. Lillo

    (Universidad Carlos III de Madrid)

  • Carlo Sguera

    (Universidad Carlos III de Madrid)

Abstract

We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing three indices for each curve, all based on linear regression and correlation, which measure outlyingness in terms of shape, magnitude and amplitude relative to the other curves in the data. The first method, ‘Semifast-MUOD’, computes the indices using a sample of the observations, while the second, ‘Fast-MUOD’, computes them using the point-wise or $L_1$ median. The classical boxplot is used to separate the indices of the outliers from those of the typical observations. A performance evaluation of the proposed methods on simulated data shows significant improvements over MUOD in both outlier detection and computational time. We show that Fast-MUOD is especially well suited to big and dense functional data sets, with very low computational time compared to other methods. Further comparisons with some recent outlier detection methods for functional data show that the proposed methods achieve superior or comparable outlier detection accuracy. We apply the proposed methods to weather, population growth, and video data.
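The abstract describes the indices only in words. As a rough illustration of the Fast-MUOD idea, here is a minimal Python sketch (standard library only) under several assumptions: all curves are observed on a common grid, the reference curve is the point-wise median (not the $L_1$ median), and the three indices are taken from a least-squares regression of each curve on the median as |1 - correlation| for shape, |slope - 1| for amplitude and |intercept| for magnitude. These formulas are our reading of "linear regression and correlation", not definitions quoted from the paper, and the function names (`fast_muod_style_indices`, `boxplot_flags`) are hypothetical. Each index is then screened with the classical boxplot upper fence, Q3 + 1.5 IQR.

```python
import math
import statistics
from typing import Dict, List, Sequence

Curve = Sequence[float]


def pointwise_median(curves: Sequence[Curve]) -> List[float]:
    """Median across all curves at each grid point (shared grid assumed)."""
    return [statistics.median(column) for column in zip(*curves)]


def fast_muod_style_indices(curves: Sequence[Curve]) -> Dict[str, List[float]]:
    """Hypothetical shape/amplitude/magnitude indices against the pointwise median.

    Assumed definitions (not quoted from the paper): fit Y_i = alpha_i + beta_i * M
    by least squares, where M is the pointwise median, and report
      shape = |1 - corr(Y_i, M)|, amplitude = |beta_i - 1|, magnitude = |alpha_i|.
    Neither any curve nor the median may be constant (zero variance).
    """
    ref = pointwise_median(curves)
    ref_mean = statistics.fmean(ref)
    ref_ss = sum((m - ref_mean) ** 2 for m in ref)
    out: Dict[str, List[float]] = {"shape": [], "amplitude": [], "magnitude": []}
    for y in curves:
        y_mean = statistics.fmean(y)
        cross = sum((a - y_mean) * (m - ref_mean) for a, m in zip(y, ref))
        y_ss = sum((a - y_mean) ** 2 for a in y)
        rho = cross / math.sqrt(y_ss * ref_ss)  # Pearson correlation with M
        beta = cross / ref_ss                   # least-squares slope on M
        alpha = y_mean - beta * ref_mean        # least-squares intercept
        out["shape"].append(abs(1.0 - rho))
        out["amplitude"].append(abs(beta - 1.0))
        out["magnitude"].append(abs(alpha))
    return out


def boxplot_flags(values: Sequence[float]) -> List[int]:
    """Positions lying above the classical upper fence Q3 + 1.5 * IQR.

    The indices are non-negative, so only the upper fence is checked.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [i for i, v in enumerate(values) if v > fence]


# Synthetic example: 50 grid points on [0, 1].
grid = [j / 49 for j in range(50)]
two_pi = 2 * math.pi
# 20 typical curves: small amplitude, shape (cosine) and vertical-shift variations.
curves = [
    [
        (1 + 0.05 * (k % 5 - 2)) * math.sin(two_pi * t)
        + 0.02 * (k % 4 - 1.5) * math.cos(two_pi * t)
        + 0.05 * (k // 5 - 1.5)
        for t in grid
    ]
    for k in range(20)
]
curves.append([math.sin(two_pi * t) + 5.0 for t in grid])  # 20: magnitude outlier
curves.append([3.0 * math.sin(two_pi * t) for t in grid])  # 21: amplitude outlier
curves.append([math.sin(2 * two_pi * t) for t in grid])    # 22: shape outlier

idx = fast_muod_style_indices(curves)
for name, values in idx.items():
    print(name, boxplot_flags(values))
```

In this synthetic example, curve 20 (shifted up by 5) should be flagged by the magnitude index, curve 21 (tripled) by the amplitude index, and curve 22 (doubled frequency) by the shape index; curve 22 also receives a large amplitude index, since its slope on the median is close to zero. Comparing each curve with a single reference curve needs only one regression per curve, which is the intuition behind the speed-up over comparing curves with one another; the paper's actual implementation may differ.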

Suggested Citation

  • Oluwasegun Taiwo Ojo & Antonio Fernández Anta & Rosa E. Lillo & Carlo Sguera, 2022. "Detecting and classifying outliers in big functional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 725-760, September.
  • Handle: RePEc:spr:advdac:v:16:y:2022:i:3:d:10.1007_s11634-021-00460-9
    DOI: 10.1007/s11634-021-00460-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00460-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-021-00460-9?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Heinrich Fritz & Peter Filzmoser & Christophe Croux, 2012. "A comparison of algorithms for the multivariate L1-median," Computational Statistics, Springer, vol. 27(3), pages 393-410, September.
    2. Febrero-Bande, Manuel & de la Fuente, Manuel Oviedo, 2012. "Statistical Computing in Functional Data Analysis: The R Package fda.usc," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 51(i04).
    3. Gerda Claeskens & Mia Hubert & Leen Slaets & Kaveh Vakili, 2014. "Multivariate Functional Halfspace Depth," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 411-423, March.
    4. Carling, Kenneth, 2000. "Resistant outlier rules and the non-Gaussian case," Computational Statistics & Data Analysis, Elsevier, vol. 33(3), pages 249-258, May.
    5. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    6. López-Pintado, Sara & Romo, Juan, 2009. "On the Concept of Depth for Functional Data," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 718-734.
    7. Hubert, M. & Vandervieren, E., 2008. "An adjusted boxplot for skewed distributions," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5186-5201, August.
    8. Ricardo Fraiman & Graciela Muniz, 2001. "Trimmed means for functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 10(2), pages 419-440, December.
    9. López-Pintado, Sara & Romo, Juan, 2011. "A half-region depth for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1679-1695, April.
    10. Eddelbuettel, Dirk & Francois, Romain, 2011. "Rcpp: Seamless R and C++ Integration," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i08).
    11. Naveen N. Narisetty & Vijayan N. Nair, 2016. "Extremal Depth for Functional Data and Applications," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1705-1714, October.
    12. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Rejoinder to ‘multivariate functional outlier detection’," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 269-277, July.
    13. Dai, Wenlin & Genton, Marc G., 2019. "Directional outlyingness for multivariate functional data," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 50-65.
    14. Dai, Wenlin & Mrkvička, Tomáš & Sun, Ying & Genton, Marc G., 2020. "Functional outlier detection and taxonomy by sequential transformations," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Luis Alfonso Menéndez-García & Paulino José García-Nieto & Esperanza García-Gonzalo & Fernando Sánchez Lasheras & Laura Álvarez-de-Prado & Antonio Bernardo-Sánchez, 2023. "Method for the Detection of Functional Outliers Applied to Quality Monitoring Samples in the Vicinity of El Musel Seaport in the Metropolitan Area of Gijón (Northern Spain)," Mathematics, MDPI, vol. 11(12), pages 1-23, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dai, Wenlin & Mrkvička, Tomáš & Sun, Ying & Genton, Marc G., 2020. "Functional outlier detection and taxonomy by sequential transformations," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    2. Dai, Wenlin & Genton, Marc G., 2019. "Directional outlyingness for multivariate functional data," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 50-65.
    3. Francesca Ieva & Anna Maria Paganoni, 2020. "Component-wise outlier detection methods for robustifying multivariate functional samples," Statistical Papers, Springer, vol. 61(2), pages 595-614, April.
    4. Carlo Sguera & Sara López-Pintado, 2021. "A notion of depth for sparse functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 630-649, September.
    5. Ojo, Oluwasegun Taiwo & Fernández Anta, Antonio & Genton, Marc G. & Lillo Rodríguez, Rosa Elvira, 2022. "Multivariate Functional Outlier Detection using the FastMUOD Indices," DES - Working Papers. Statistics and Econometrics. WS 35665, Universidad Carlos III de Madrid. Departamento de Estadística.
    6. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    7. Boente, Graciela & Parada, Daniela, 2023. "Robust estimation for functional quadratic regression models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    8. Nagy, Stanislav & Ferraty, Frédéric, 2019. "Data depth for measurable noisy random functions," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 95-114.
    9. Moritz Herrmann & Fabian Scheipl, 2021. "A Geometric Perspective on Functional Outlier Detection," Stats, MDPI, vol. 4(4), pages 1-41, November.
    10. Kuhnt, Sonja & Rehage, André, 2016. "An angle-based multivariate functional pseudo-depth for shape outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 325-340.
    11. Francesca Ieva & Anna Paganoni, 2015. "Discussion of “multivariate functional outlier detection” by M. Hubert, P. Rousseeuw and P. Segaert," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 217-221, July.
    12. Zhuo Qu & Wenlin Dai & Marc G. Genton, 2021. "Robust functional multivariate analysis of variance with environmental applications," Environmetrics, John Wiley & Sons, Ltd., vol. 32(1), February.
    13. Davy Paindaveine & Germain Van Bever, 2017. "Halfspace Depths for Scatter, Concentration and Shape Matrices," Working Papers ECARES ECARES 2017-19, ULB -- Universite Libre de Bruxelles.
    14. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2017. "Multivariate and functional classification using depth and distance," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 445-466, September.
    15. Antonio Elías & Raúl Jiménez & Han Lin Shang, 2023. "Depth-based reconstruction method for incomplete functional data," Computational Statistics, Springer, vol. 38(3), pages 1507-1535, September.
    16. Graciela Boente & Matías Salibián-Barrera, 2021. "Robust functional principal components for sparse longitudinal data," METRON, Springer;Sapienza Università di Roma, vol. 79(2), pages 159-188, August.
    17. Cleveland, Jason & Zhao, Weilong & Wu, Wei, 2018. "Robust template estimation for functional data with phase variability using band depth," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 10-26.
    18. Nagy, Stanislav & Gijbels, Irène & Hlubinka, Daniel, 2016. "Weak convergence of discretely observed functional data with applications," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 46-62.
    19. Guillermo Vinue & Irene Epifanio, 2021. "Robust archetypoids for anomaly detection in big functional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(2), pages 437-462, June.
    20. Qiu, Zhiping & Chen, Jianwei & Zhang, Jin-Ting, 2021. "Two-sample tests for multivariate functional data with applications," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.