Printed from https://ideas.repec.org/a/spr/advdac/v15y2021i1d10.1007_s11634-020-00386-8.html

Efficient regularized spectral data embedding

Author

Listed:
  • Lazhar Labiod

    (LIPADE, Université de Paris)

  • Mohamed Nadif

    (LIPADE, Université de Paris)

Abstract

Data embedding (DE) or dimensionality reduction techniques are particularly well suited to embedding high-dimensional data into a space that in most cases will have just two dimensions. Low-dimensional space, in which data samples (data points) can more easily be visualized, is also often used for learning methods such as clustering. Sometimes, however, DE will identify dimensions that contribute little in terms of the clustering structures that they reveal. In this paper we look at regularized data embedding by clustering, and we propose a simultaneous learning approach for DE and clustering that reinforces the relationships between these two tasks. Our approach is based on a matrix decomposition technique for learning a spectral DE, a cluster membership matrix, and a rotation matrix that closely maps out the continuous spectral embedding, in order to obtain a good clustering solution. We compare our approach with some traditional clustering methods and perform numerical experiments on a collection of benchmark datasets to demonstrate its potential.

Suggested Citation

  • Lazhar Labiod & Mohamed Nadif, 2021. "Efficient regularized spectral data embedding," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(1), pages 99-119, March.
  • Handle: RePEc:spr:advdac:v:15:y:2021:i:1:d:10.1007_s11634-020-00386-8
    DOI: 10.1007/s11634-020-00386-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-020-00386-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-020-00386-8?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Wei‐Chien Chang, 1983. "On Using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 32(3), pages 267-275, November.
    2. Gérard Govaert & Mohamed Nadif, 2018. "Mutual information, phi-squared and model-based co-clustering for contingency tables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 455-488, September.
    3. Vichi, Maurizio & Kiers, Henk A. L., 2001. "Factorial k-means analysis for two-way data," Computational Statistics & Data Analysis, Elsevier, vol. 37(1), pages 49-64, July.
    4. Vichi, Maurizio & Saporta, Gilbert, 2009. "Clustering and disjoint principal component analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3194-3208, June.
    5. Peter Schönemann, 1966. "A generalized solution of the orthogonal procrustes problem," Psychometrika, Springer;The Psychometric Society, vol. 31(1), pages 1-10, March.
    6. Aghiles Salah & Mohamed Nadif, 2019. "Directional co-clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 591-620, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alfonso Iodice D’Enza & Francesco Palumbo, 2013. "Iterative factor clustering of binary data," Computational Statistics, Springer, vol. 28(2), pages 789-807, April.
    2. Jérome SARACCO & Marie CHAVENT & Vanessa KUENTZ, 2010. "Clustering of categorical variables around latent variables," Cahiers du GREThA (2007-2019) 2010-02, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
    3. Dirk Depril & Iven Mechelen & Tom Wilderjans, 2012. "Lowdimensional Additive Overlapping Clustering," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 297-320, October.
    4. Donatella Vicari & Paolo Giordani, 2023. "CPclus: Candecomp/Parafac Clustering Model for Three-Way Data," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 432-465, July.
    5. Michael C. Thrun & Alfred Ultsch, 2021. "Using Projection-Based Clustering to Find Distance- and Density-Based Clusters in High-Dimensional Data," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 280-312, July.
    6. Yoshikazu Terada, 2015. "Strong consistency of factorial K-means clustering," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(2), pages 335-357, April.
    7. Paul Riverain & Simon Fossier & Mohamed Nadif, 2023. "Poisson degree corrected dynamic stochastic block model," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 135-162, March.
    8. Vanessa Kuentz-Simonet & Amaury Labenne & Tina Rambonilaza, 2017. "Using ClustOfVar to Construct Quality of Life Indicators for Vulnerability Assessment Municipality Trajectories in Southwest France from 1999 to 2009," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 131(3), pages 973-997, April.
    9. Cristina Tortora & Mireille Gettler Summa & Marina Marino & Francesco Palumbo, 2016. "Factor probabilistic distance clustering (FPDC): a new clustering method," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 441-464, December.
    10. José Fernando Romero Cañizares & Purificación Vicente Galindo & Yannis Phillis & Evangelos Grigoroudis, 2022. "Graphical sustainability analysis using disjoint biplots," Operational Research, Springer, vol. 22(2), pages 1575-1596, April.
    11. Yoshikazu Terada, 2014. "Strong Consistency of Reduced K-means Clustering," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(4), pages 913-931, December.
    12. Heungsun Hwang & William Dillon & Yoshio Takane, 2006. "An Extension of Multiple Correspondence Analysis for Identifying Heterogeneous Subgroups of Respondents," Psychometrika, Springer;The Psychometric Society, vol. 71(1), pages 161-171, March.
    13. Wang, Zihan & Daeipour, Mohamad & Xu, Hongyi, 2023. "Quantification and propagation of Aleatoric uncertainties in topological structures," Reliability Engineering and System Safety, Elsevier, vol. 233(C).
    14. Kohei Adachi & Nickolay T. Trendafilov, 2018. "Sparsest factor analysis for clustering variables: a matrix decomposition approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 559-585, September.
    15. Ahlquist, John S. & Breunig, Christian, 2009. "Country clustering in comparative political economy," MPIfG Discussion Paper 09/5, Max Planck Institute for the Study of Societies.
    16. McLachlan, G. J. & Peel, D. & Bean, R. W., 2003. "Modelling high-dimensional data by mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 379-388, January.
    17. Yannis Yatracos, 2013. "Detecting Clusters in the Data from Variance Decompositions of Its Projections," Journal of Classification, Springer;The Classification Society, vol. 30(1), pages 30-55, April.
    18. Uno, Kohei & Satomura, Hironori & Adachi, Kohei, 2016. "Fixed factor analysis with clustered factor score constraint," Computational Statistics & Data Analysis, Elsevier, vol. 94(C), pages 265-274.
    19. Blasius, J. & Greenacre, M. & Groenen, P.J.F. & van de Velden, M., 2009. "Special issue on correspondence analysis and related methods," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3103-3106, June.
    20. DeSarbo, Wayne S. & Selin Atalay, A. & Blanchard, Simon J., 2009. "A three-way clusterwise multidimensional unfolding procedure for the spatial representation of context dependent preferences," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3217-3230, June.
