
A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges

Authors
  • C. Biernacki

    (Université de Lille)

  • J. Jacques

    (Université de Lyon)

  • C. Keribin

    (Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d’Orsay)

Abstract

Model-based co-clustering can be seen as a particularly important extension of model-based clustering. It allows for a significant reduction of both the number of rows (individuals) and columns (variables) of a data set in a parsimonious manner, and also allows interpretability of the resulting reduced data set since the meaning of the initial individuals and features is preserved. Moreover, it benefits from the rich statistical theory for both estimation and model selection. Many works have produced new advances on this topic in recent years, and this paper offers a general update of the related literature. In addition, we advocate two main messages, supported by specific research material: (1) co-clustering requires further research to fix some well-identified estimation issues, and (2) co-clustering is one of the most promising approaches for clustering in the (very) high-dimensional setting, which corresponds to the global trend in modern data sets.

Suggested Citation

  • C. Biernacki & J. Jacques & C. Keribin, 2023. "A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 332-381, July.
  • Handle: RePEc:spr:jclass:v:40:y:2023:i:2:d:10.1007_s00357-023-09441-3
    DOI: 10.1007/s00357-023-09441-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-023-09441-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



