REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit

Author

Listed:
  • Daniel Fischer

    (LUKE - Natural Resources Institute Finland)

  • Alain Berro

    (IRIT-SEPIA - Système d’exploitation, systèmes répartis, de l’intergiciel à l’architecture - IRIT - Institut de recherche en informatique de Toulouse - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - UT2J - Université Toulouse - Jean Jaurès - UT - Université de Toulouse - UT3 - Université Toulouse III - Paul Sabatier - UT - Université de Toulouse - CNRS - Centre National de la Recherche Scientifique - Toulouse INP - Institut National Polytechnique (Toulouse) - UT - Université de Toulouse - TMBI - Toulouse Mind & Brain Institut - UT2J - Université Toulouse - Jean Jaurès - UT - Université de Toulouse - UT3 - Université Toulouse III - Paul Sabatier - UT - Université de Toulouse, UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse)

  • Klaus Nordhausen

    (TU Wien - Vienna University of Technology = Technische Universität Wien)

  • Anne Ruiz-Gazen

    (TSE-R - Toulouse School of Economics - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse)

Abstract

The R package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful as a preprocessing step to find clusters or as an outlier detection tool for multivariate data. Apart from the packages tourr and rggobi, no implementation of exploratory projection pursuit tools is available in R. REPPlab is an R interface to the Java program EPP-lab, which implements four projection indices and three biologically inspired optimization algorithms. It also provides new tools for plotting and combining the results, as well as specific tools for outlier detection. The functionality of the package is illustrated through simulations and real data.
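
To make the idea of one-dimensional projection pursuit concrete, here is a minimal sketch in plain Python. It is not REPPlab's or EPP-lab's API: the function names (`kurtosis`, `pursue`, `make_two_clusters`) are hypothetical, the toy data are simulated, and a plain random search stands in for the biologically inspired optimizers named in the abstract. It assumes a kurtosis index, as listed in the keywords, and relies on the fact that a projection separating two balanced clusters is bimodal and therefore has low kurtosis.

```python
import math
import random


def kurtosis(values):
    """Fourth standardized moment of a list of numbers (3 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def project(data, direction):
    """Project each row of `data` onto the vector `direction`."""
    return [sum(x * d for x, d in zip(row, direction)) for row in data]


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pursue(data, n_trials=1000, minimize=True, seed=0):
    """Random search for the direction with extreme projected kurtosis.

    Assumption: random search is only a stand-in for a real optimizer.
    minimize=True looks for cluster structure, minimize=False for outliers.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    best_dir, best_val = None, None
    for _ in range(n_trials):
        d = random_unit_vector(dim, rng)
        val = kurtosis(project(data, d))
        if best_val is None or (val < best_val if minimize else val > best_val):
            best_dir, best_val = d, val
    return best_dir, best_val


def make_two_clusters(n_per_cluster=200, seed=1):
    """Simulated 3-D data: two clusters separated along the first axis."""
    rng = random.Random(seed)
    data = []
    for center in (-3.0, 3.0):
        for _ in range(n_per_cluster):
            data.append([center + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
    return data


data = make_two_clusters()
direction, value = pursue(data, minimize=True)
print("direction:", [round(x, 2) for x in direction])
print("kurtosis of projection:", round(value, 2))
```

On this toy data the selected direction should load mainly on the first coordinate, where the clusters are separated. Calling `pursue(data, minimize=False)` instead searches for heavy-tailed projections, which is the usual way a kurtosis index is turned toward outlier detection.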

Suggested Citation

  • Daniel Fischer & Alain Berro & Klaus Nordhausen & Anne Ruiz-Gazen, 2021. "REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit," Post-Print hal-03548865, HAL.
  • Handle: RePEc:hal:journl:hal-03548865
    DOI: 10.1080/03610918.2019.1626880
    Note: View the original document on HAL open archive server: https://hal.science/hal-03548865

    Download full text from publisher

    File URL: https://hal.science/hal-03548865/document
    Download Restriction: no


    References listed on IDEAS

    1. Wickham, Hadley & Cook, Dianne & Hofmann, Heike & Buja, Andreas, 2011. "tourr: An R Package for Exploring Multivariate Data with Projections," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i02).
    2. Tyler, David E. & Critchley, Frank & Dümbgen, Lutz & Oja, Hannu, 2009. "Invariant co-ordinate selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 549-592, June.
    3. Huang, Bei & Cook, Dianne & Wickham, Hadley, 2012. "tourrGui: A gWidgets GUI for the Tour to Explore High-Dimensional Data Using Low-Dimensional Projections," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 49(i06).
    4. Nordhausen, Klaus & Oja, Hannu & Tyler, David E., 2008. "Tools for Exploring Multivariate Data: The Package ICS," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i06).

    More about this item

    Keywords

    Unsupervised data analysis; Projection matrix; Tribes; Projection index; Particle swarm optimization; Kurtosis; Java; Genetic algorithms;

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
