
Multivariate and functional classification using depth and distance

Authors

  • Mia Hubert

    (KU Leuven)

  • Peter Rousseeuw

    (KU Leuven)

  • Pieter Segaert

    (KU Leuven)

Abstract

We construct classifiers for multivariate and functional data. Our approach is based on a kind of distance between data points and classes. The distance measure needs to be robust to outliers and invariant to linear transformations of the data. For this purpose we can use the bagdistance which is based on halfspace depth. It satisfies most of the properties of a norm but is able to reflect asymmetry when the class is skewed. Alternatively we can compute a measure of outlyingness based on the skew-adjusted projection depth. In either case we propose the DistSpace transform which maps each data point to the vector of its distances to all classes, followed by k-nearest neighbor (kNN) classification of the transformed data points. This combines invariance and robustness with the simplicity and wide applicability of kNN. The proposal is compared with other methods in experiments with real and simulated data.
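The DistSpace-plus-kNN pipeline described in the abstract can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The bagdistance and the skew-adjusted projection depth are replaced by a plain Stahel-Donoho-type outlyingness: the median and MAD of projections onto random unit directions. That measure is robust, but it is not skew-adjusted, and with a finite set of random directions it is only approximately affine invariant. The helper names (`unit_directions`, `outlyingness`, `distspace`, `distspace_knn`) and the defaults `n_dir=250` and `k=5` are hypothetical choices made for illustration.

```python
import numpy as np


def unit_directions(p, n_dir=250, seed=0):
    """Random unit vectors in R^p.
    Assumption: random directions stand in for whatever direction scheme
    the paper actually uses."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dir, p))
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def outlyingness(Xc, Z, U):
    """Stahel-Donoho-type outlyingness of each row of Z w.r.t. the class sample Xc:
    max over directions u of |u'z - med(u'Xc)| / MAD(u'Xc).
    NOTE: a simplified stand-in, NOT the bagdistance or the skew-adjusted version."""
    proj_c = Xc @ U.T                                   # (n_c, n_dir) projected class
    med = np.median(proj_c, axis=0)                     # robust location per direction
    mad = np.median(np.abs(proj_c - med), axis=0)       # robust scale per direction
    mad = np.where(mad > 0, mad, np.finfo(float).eps)   # guard against zero scale
    proj_z = Z @ U.T                                    # (n_z, n_dir) projected queries
    return np.max(np.abs(proj_z - med) / mad, axis=1)


def distspace(X, y, Z, U):
    """DistSpace transform: map each row of Z to its vector of
    outlyingness values with respect to every class (one column per class)."""
    classes = np.unique(y)
    return np.column_stack([outlyingness(X[y == c], Z, U) for c in classes])


def distspace_knn(X, y, Z, k=5, n_dir=250, seed=0):
    """Transform training and query points with the same directions,
    then classify the queries by Euclidean kNN with majority vote."""
    U = unit_directions(X.shape[1], n_dir, seed)
    D_train = distspace(X, y, X, U)
    D_test = distspace(X, y, Z, U)
    dists = np.linalg.norm(D_test[:, None, :] - D_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    labels = []
    for votes in y[nearest]:
        values, counts = np.unique(votes, return_counts=True)
        labels.append(values[np.argmax(counts)])       # ties go to the smallest label
    return np.array(labels)


if __name__ == "__main__":
    # Two well-separated Gaussian classes in the plane (synthetic illustration only).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                   rng.normal(4.0, 1.0, size=(100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    Z = np.array([[0.0, 0.0], [4.0, 4.0]])
    print(distspace_knn(X, y, Z, k=5))  # first query near class 0, second near class 1
```

The design mirrors the two-stage idea in the abstract. The robust, class-wise distance does the heavy lifting, so the final kNN step can use an ordinary Euclidean metric on a space whose dimension equals the number of classes rather than the number of variables. Swapping `outlyingness` for a bagdistance or an adjusted-outlyingness routine would leave the rest of the pipeline unchanged.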

Suggested Citation

  • Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2017. "Multivariate and functional classification using depth and distance," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 445-466, September.
  • Handle: RePEc:spr:advdac:v:11:y:2017:i:3:d:10.1007_s11634-016-0269-3
    DOI: 10.1007/s11634-016-0269-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-016-0269-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-016-0269-3?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Alonso, Andrés M. & Casado, David & Romo, Juan, 2012. "Supervised classification for functional data: A weighted distance approach," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2334-2346.
    2. Tatjana Lange & Karl Mosler & Pavlo Mozharovskyi, 2014. "Fast nonparametric classification based on data depth," Statistical Papers, Springer, vol. 55(1), pages 49-69, February.
    3. Andreas Christmann & Paul Fischer & Thorsten Joachims, 2002. "Comparison between various regression depth methods and the support vector machine to approximate the minimum number of misclassifications," Computational Statistics, Springer, vol. 17(2), pages 273-287, July.
    4. Gerda Claeskens & Mia Hubert & Leen Slaets & Kaveh Vakili, 2014. "Multivariate Functional Halfspace Depth," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 411-423, March.
    5. Struyf, Anja & Rousseeuw, Peter J., 2000. "High-dimensional computation of the deepest location," Computational Statistics & Data Analysis, Elsevier, vol. 34(4), pages 415-426, October.
    6. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    7. A. Delaigle & P. Hall & N. Bathia, 2012. "Componentwise classification and clustering of functional data," Biometrika, Biometrika Trust, vol. 99(2), pages 299-313.
    8. Ruts, Ida & Rousseeuw, Peter J., 1996. "Computing depth contours of bivariate point clouds," Computational Statistics & Data Analysis, Elsevier, vol. 23(1), pages 153-168, November.
    9. Marc Hallin & Davy Paindaveine & Miroslav Siman, 2008. "Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth," Working Papers ECARES 2008_042, ULB -- Universite Libre de Bruxelles.
    10. Masse, J. C. & Theodorescu, R., 1994. "Halfplane Trimming for Bivariate Distributions," Journal of Multivariate Analysis, Elsevier, vol. 48(2), pages 188-202, February.
    11. Dyckerhoff, Rainer & Mozharovskyi, Pavlo, 2016. "Exact computation of the halfspace depth," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 19-30.
    12. Davy Paindaveine & Miroslav Šiman, 2012. "Computing multiple-output regression quantile regions from projection quantiles," Computational Statistics, Springer, vol. 27(1), pages 29-49, March.
    13. Hubert, M. & Vandervieren, E., 2008. "An adjusted boxplot for skewed distributions," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5186-5201, August.
    14. Christmann, Andreas & Rousseeuw, Peter J., 1999. "Measuring overlap in logistic regression," Technical Reports 1999,25, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    15. Jun Li & Juan A. Cuesta-Albertos & Regina Y. Liu, 2012. "DD-Classifier: Nonparametric Classification Procedure Based on DD-Plot," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 737-753, June.
    16. Martin-Barragan, Belen & Lillo, Rosa & Romo, Juan, 2014. "Interpretable support vector machines for functional data," European Journal of Operational Research, Elsevier, vol. 232(1), pages 146-155.
    17. Peter J. Rousseeuw & Ida Ruts, 1996. "Bivariate Location Depth," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 45(4), pages 516-526, December.
    18. Li, Bin & Yu, Qingzhao, 2008. "Classification of functional data: A segmentation approach," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4790-4800, June.
    19. Koenker, Roger W & Bassett, Gilbert, Jr, 1978. "Regression Quantiles," Econometrica, Econometric Society, vol. 46(1), pages 33-50, January.
    20. Pigoli, Davide & Sangalli, Laura M., 2012. "Wavelets in functional data analysis: Estimation of multidimensional curves and their derivatives," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1482-1498.
    21. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Rejoinder to ‘multivariate functional outlier detection’," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 269-277, July.
    22. Hubert, Mia & Van Driessen, Katrien, 2004. "Fast and robust discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 45(2), pages 301-320, March.
    23. Anil K. Ghosh & Probal Chaudhuri, 2005. "On Maximum Depth and Related Classifiers," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 32(2), pages 327-350, June.
    24. Paindaveine, Davy & Šiman, Miroslav, 2012. "Computing multiple-output regression quantile regions," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 840-853.
    25. Mia Hubert & Stephan Van der Veeken, 2010. "Robust classification for skewed data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(4), pages 239-254, December.
    26. Daniel Hlubinka & Irène Gijbels & Marek Omelka & Stanislav Nagy, 2015. "Integrated data depth for smooth functions and its application in supervised classification," Computational Statistics, Springer, vol. 30(4), pages 1011-1031, December.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Amparo Baíllo & Javier Cárcamo & Konstantin Getman, 2019. "New distance measures for classifying X-ray astronomy data into stellar classes," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 531-557, June.
    2. Alvarez, Agustín & Boente, Graciela & Kudraszow, Nadia, 2019. "Robust sieve estimators for functional canonical correlation analysis," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 46-62.
    3. Antonio Elías & Raúl Jiménez & J. E. Yukich, 2023. "Localization processes for functional data analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 485-517, June.
    4. Vencalek, Ondrej & Pokotylo, Oleksii, 2018. "Depth-weighted Bayes classification," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 1-12.
    5. Amovin-Assagba, Martial & Gannaz, Irène & Jacques, Julien, 2022. "Outlier detection in multivariate functional data through a contaminated mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    6. Elías, Antonio & Jiménez, Raúl & Shang, Han Lin, 2022. "On projection methods for functional time series forecasting," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    7. Beatriz Sinova & Stefan Van Aelst & Pedro Terán, 2021. "M-estimators and trimmed means: from Hilbert-valued to fuzzy set-valued data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(2), pages 267-288, June.
    8. Guillermo Vinue & Irene Epifanio, 2021. "Robust archetypoids for anomaly detection in big functional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(2), pages 437-462, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    2. Xiaohui Liu & Karl Mosler & Pavlo Mozharovskyi, 2017. "Fast computation of Tukey trimmed regions and median in dimension p > 2," Working Papers 2017-71, Center for Research in Economics and Statistics.
    3. Dyckerhoff, Rainer & Mozharovskyi, Pavlo, 2016. "Exact computation of the halfspace depth," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 19-30.
    4. Davy Paindaveine & Germain Van Bever, 2017. "Halfspace Depths for Scatter, Concentration and Shape Matrices," Working Papers ECARES ECARES 2017-19, ULB -- Universite Libre de Bruxelles.
    5. Olusola Samuel Makinde, 2019. "Classification rules based on distribution functions of functional depth," Statistical Papers, Springer, vol. 60(3), pages 629-640, June.
    6. Victor Chernozhukov & Alfred Galichon & Marc Hallin & Marc Henry, 2014. "Monge-Kantorovich Depth, Quantiles, Ranks, and Signs," Papers 1412.8434, arXiv.org, revised Sep 2015.
    7. Francesca Ieva & Anna Maria Paganoni, 2020. "Component-wise outlier detection methods for robustifying multivariate functional samples," Statistical Papers, Springer, vol. 61(2), pages 595-614, April.
    8. Hamel, Andreas H. & Kostner, Daniel, 2022. "Computation of quantile sets for bivariate ordered data," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    9. Tian, Yahui & Gel, Yulia R., 2019. "Fusing data depth with complex networks: Community detection with prior information," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 99-116.
    10. Kotík, Lukáš & Hlubinka, Daniel, 2017. "A weighted localization of halfspace depth and its properties," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 53-69.
    11. Xiaohui Liu & Shihua Luo & Yijun Zuo, 2020. "Some results on the computing of Tukey’s halfspace median," Statistical Papers, Springer, vol. 61(1), pages 303-316, February.
    12. Vencalek, Ondrej & Pokotylo, Oleksii, 2018. "Depth-weighted Bayes classification," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 1-12.
    13. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    14. Petrella, Lea & Raponi, Valentina, 2019. "Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 70-84.
    15. J. A. Cuesta-Albertos & M. Febrero-Bande & M. Oviedo de la Fuente, 2017. "The DD^G-classifier in the functional setting," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(1), pages 119-142, March.
    16. Daniel Hlubinka & Irène Gijbels & Marek Omelka & Stanislav Nagy, 2015. "Integrated data depth for smooth functions and its application in supervised classification," Computational Statistics, Springer, vol. 30(4), pages 1011-1031, December.
    17. Pavel Boček & Miroslav Šiman, 2017. "On weighted and locally polynomial directional quantile regression," Computational Statistics, Springer, vol. 32(3), pages 929-946, September.
    18. Oluwasegun Taiwo Ojo & Antonio Fernández Anta & Rosa E. Lillo & Carlo Sguera, 2022. "Detecting and classifying outliers in big functional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 725-760, September.
    19. Balcilar, Mehmet & Ozdemir, Zeynel Abidin & Ozdemir, Huseyin & Wohar, Mark E., 2020. "Transmission of US and EU Economic Policy Uncertainty Shock to Asian Economies in Bad and Good Times," IZA Discussion Papers 13274, Institute of Labor Economics (IZA).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:advdac:v:11:y:2017:i:3:d:10.1007_s11634-016-0269-3. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of the provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.