
Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

Authors
  • Etienne Côme

    (COSYS/GRETTIA, Université Gustave-Eiffel)

  • Nicolas Jouvin

    (Université Paris 1 Panthéon-Sorbonne
    FP2M, CNRS FR 2036)

  • Pierre Latouche

    (FP2M, CNRS FR 2036
    Université de Paris, MAP5, CNRS)

  • Charles Bouveyron

    (Université Côte d’Azur, CNRS, Laboratoire J.A. Dieudonné
    Inria, Maasai Research Team)

Abstract

Finding a set of nested partitions of a dataset is useful for uncovering relevant structure at different scales, and is often addressed with data-dependent methodologies. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Taking the integrated classification likelihood (ICL) criterion as the objective function, this work applies to any discrete latent variable model (DLVM) for which this quantity is tractable. The first step of the methodology maximizes the criterion with respect to the partition. To address the known problem of sub-optimal local maxima found by greedy hill-climbing heuristics, we introduce a new hybrid algorithm based on a genetic algorithm, which efficiently explores the space of solutions. The resulting algorithm carefully combines and merges different solutions, and allows joint inference of the number $K$ of clusters and of the clusters themselves. Starting from this natural partition, the second step of the methodology uses a bottom-up greedy procedure to extract a hierarchy of clusters. In a Bayesian context, this is achieved by treating the Dirichlet cluster-proportion prior parameter $\alpha$ as a regularization term that controls the granularity of the clustering. A new approximation of the criterion is derived as a log-linear function of $\alpha$, which gives the merge decision criterion a simple functional form. This second step allows the clustering to be explored at coarser scales. The proposed approach is compared with existing strategies on simulated and real data, and its results are shown to be particularly relevant. A reference implementation of this work is available in the R package greed accompanying the paper.
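The abstract does not spell out the criterion, so the following is only a hedged illustration. It assumes the standard exact-ICL decomposition $\mathrm{ICL}(Z) = \log p(X \mid Z) + \log p(Z \mid \alpha)$ with a symmetric Dirichlet($\alpha$) prior on the $K$ cluster proportions integrated out analytically, which gives the partition term $\log\Gamma(K\alpha) - \log\Gamma(K\alpha + n) + \sum_k [\log\Gamma(\alpha + n_k) - \log\Gamma(\alpha)]$. The Python sketch computes this term and its change when two clusters are merged. The function names `log_partition_prior` and `merge_delta` are hypothetical and not part of the greed package; the model-specific term $\log p(X \mid Z)$ is omitted, and this is not the paper's log-linear approximation in $\alpha$.

```python
from math import lgamma


def log_partition_prior(counts, alpha):
    """Sketch: log p(Z | alpha) for cluster sizes `counts` under an assumed
    symmetric Dirichlet(alpha) prior on proportions, integrated out."""
    K = len(counts)
    n = sum(counts)
    value = lgamma(K * alpha) - lgamma(K * alpha + n)
    for n_k in counts:
        value += lgamma(alpha + n_k) - lgamma(alpha)
    return value


def merge_delta(counts, g, h, alpha):
    """Sketch: change in log p(Z | alpha) when clusters g and h are merged,
    computed in closed form without rebuilding the full partition."""
    if g == h:
        raise ValueError("g and h must be different clusters")
    K, n = len(counts), sum(counts)
    n_g, n_h = counts[g], counts[h]
    # Normalizing terms that depend on the number of clusters (K -> K - 1).
    delta = (lgamma((K - 1) * alpha) - lgamma((K - 1) * alpha + n)) - (
        lgamma(K * alpha) - lgamma(K * alpha + n)
    )
    # Cluster-specific terms: clusters g and h are replaced by one cluster.
    delta += lgamma(alpha + n_g + n_h) - lgamma(alpha + n_g) - lgamma(alpha + n_h) + lgamma(alpha)
    return delta


# Usage: score merging clusters 0 and 2 of a 4-cluster partition.
print(merge_delta([4, 7, 2, 9], 0, 2, alpha=1.0))
```

For non-empty clusters, the $\log\Gamma(\alpha)$ term in `merge_delta` grows without bound as $\alpha \to 0$ while the remaining terms stay bounded, so in this sketch the prior term increasingly favors merges as $\alpha$ shrinks; this is one way to read $\alpha$ as a granularity control. In a full criterion, the omitted likelihood term $\log p(X \mid Z)$ is what counterbalances merging.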

Suggested Citation

  • Etienne Côme & Nicolas Jouvin & Pierre Latouche & Charles Bouveyron, 2021. "Hierarchical clustering with discrete latent variable models and the integrated classification likelihood," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 957-986, December.
  • Handle: RePEc:spr:advdac:v:15:y:2021:i:4:d:10.1007_s11634-021-00440-z
    DOI: 10.1007/s11634-021-00440-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00440-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-021-00440-z?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Wyse, Jason & Friel, Nial & Latouche, Pierre, 2017. "Inferring structure in bipartite networks using the latent blockmodel and exact ICL," Network Science, Cambridge University Press, vol. 5(1), pages 45-69, March.
    2. David M. Blei & Alp Kucukelbir & Jon D. McAuliffe, 2017. "Variational Inference: A Review for Statisticians," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 859-877, April.
    3. Eddelbuettel, Dirk & Sanderson, Conrad, 2014. "RcppArmadillo: Accelerating R with high-performance C++ linear algebra," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 1054-1063.
    4. Pablo M. Gleiser & Leon Danon, 2003. "Community Structure In Jazz," Advances in Complex Systems (ACS), World Scientific Publishing Co. Pte. Ltd., vol. 6(04), pages 565-573.
    5. Marco Bertoletti & Nial Friel & Riccardo Rastelli, 2015. "Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion," METRON, Springer;Sapienza Università di Roma, vol. 73(2), pages 177-199, August.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Marino, Maria Francesca & Pandolfi, Silvia, 2022. "Hybrid maximum likelihood inference for stochastic block models," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wilson J. Wright & Peter N. Neitlich & Alyssa E. Shiel & Mevin B. Hooten, 2022. "Mechanistic spatial models for heavy metal pollution," Environmetrics, John Wiley & Sons, Ltd., vol. 33(8), December.
    2. Zhang, Wen-Yao & Wei, Zong-Wen & Wang, Bing-Hong & Han, Xiao-Pu, 2016. "Measuring mixing patterns in complex networks by Spearman rank correlation coefficient," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 451(C), pages 440-450.
    3. Shen Liu & Hongyan Liu, 2021. "Tagging Items Automatically Based on Both Content Information and Browsing Behaviors," INFORMS Journal on Computing, INFORMS, vol. 33(3), pages 882-897, July.
    4. Zhang, Yun & Liu, Yongguo & Li, Jieting & Zhu, Jiajing & Yang, Changhong & Yang, Wen & Wen, Chuanbiao, 2020. "WOCDA: A whale optimization based community detection algorithm," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    5. Rezvanian, Alireza & Meybodi, Mohammad Reza, 2015. "Sampling social networks using shortest paths," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 254-268.
    6. Kong, Hanzhang & Kang, Qinma & Li, Wenquan & Liu, Chao & Kang, Yunfan & He, Hong, 2019. "A hybrid iterated carousel greedy algorithm for community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 536(C).
    7. Loaiza-Maya, Rubén & Smith, Michael Stanley & Nott, David J. & Danaher, Peter J., 2022. "Fast and accurate variational inference for models with many latent variables," Journal of Econometrics, Elsevier, vol. 230(2), pages 339-362.
    8. Yuan, Quan & Liu, Binghui, 2021. "Community detection via an efficient nonconvex optimization approach based on modularity," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    9. Xing Qin & Shuangge Ma & Mengyun Wu, 2023. "Two‐level Bayesian interaction analysis for survival data incorporating pathway information," Biometrics, The International Biometric Society, vol. 79(3), pages 1761-1774, September.
    10. Youngseon Lee & Seongil Jo & Jaeyong Lee, 2022. "A variational inference for the Lévy adaptive regression with multiple kernels," Computational Statistics, Springer, vol. 37(5), pages 2493-2515, November.
    11. Xinyu Huang & Dongming Chen & Dongqi Wang & Tao Ren, 2020. "MINE: Identifying Top-k Vital Nodes in Complex Networks via Maximum Influential Neighbors Expansion," Mathematics, MDPI, vol. 8(9), pages 1-25, August.
    12. Nathaniel Tomasetti & Catherine Forbes & Anastasios Panagiotelis, 2019. "Updating Variational Bayes: Fast Sequential Posterior Inference," Monash Econometrics and Business Statistics Working Papers 13/19, Monash University, Department of Econometrics and Business Statistics.
    13. Bachoc, François & Genton, Mark G. & Nordhausen, Klaus & Ruiz-Gazen, Anne & Virta, Joni, 2019. "Spatial Blind Source Separation," TSE Working Papers 19-998, Toulouse School of Economics (TSE).
    14. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    15. Fatemi, Samira & Salehi, Mostafa & Veisi, Hadi & Jalili, Mahdi, 2018. "A fuzzy logic based estimator for respondent driven sampling of complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 42-51.
    16. Zhao, Shuying & Sun, Shaowei, 2023. "Identification of node centrality based on Laplacian energy of networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    17. Djohan Bonnet & Tifenn Hirtzlin & Atreya Majumdar & Thomas Dalgaty & Eduardo Esmanhotto & Valentina Meli & Niccolo Castellani & Simon Martin & Jean-François Nodin & Guillaume Bourgeois & Jean-Michel P, 2023. "Bringing uncertainty quantification to the extreme-edge with memristor-based Bayesian neural networks," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    18. Ho, Paul, 2023. "Global robust Bayesian analysis in large models," Journal of Econometrics, Elsevier, vol. 235(2), pages 608-642.
    19. Liang, Xinbin & Liu, Zhuoxuan & Wang, Jie & Jin, Xinqiao & Du, Zhimin, 2023. "Uncertainty quantification-based robust deep learning for building energy systems considering distribution shift problem," Applied Energy, Elsevier, vol. 337(C).
    20. Zhe Li & Xinyu Huang, 2023. "Identifying Influential Spreaders Using Local Information," Mathematics, MDPI, vol. 11(6), pages 1-14, March.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.