Printed from https://ideas.repec.org/p/ven/wpaper/201304.html

A merging algorithm for Gaussian mixture components

Author

Listed:
  • Andrea Pastore

    (Department of Economics, University of Venice Ca' Foscari)

  • Stefano Tonellato

    (Department of Economics, University of Venice Ca' Foscari)

Abstract

In finite mixture model clustering, each component of the fitted mixture is usually associated with a cluster. In other words, each component is interpreted as the probability distribution of the variables of interest conditional on membership in a given cluster. The Gaussian mixture model (GMM) is very popular in this context for its simplicity and flexibility. It may happen, however, that the components of the fitted model are not well separated. In such circumstances, the number of clusters is often overestimated, and a better clustering could be obtained by joining some subsets of the partition based on the fitted GMM. Some methods for the aggregation of mixture components have recently been proposed in the literature. In this work, we propose a hierarchical aggregation algorithm based on a generalisation of the definition of silhouette width that takes into account the Mahalanobis distances induced by the precision matrices of the components of the fitted GMM. The algorithm chooses the number of groups corresponding to the hierarchy level that gives the highest average silhouette width. Simulation experiments and real data applications indicate that its performance is at least as good as that of other existing methods.
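
The abstract does not state the exact definition of the generalised silhouette width, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that a point's distance to a group is its smallest Mahalanobis distance to any component in that group, that each point carries a hard component label, that merging is greedy, and that the fitted mixture has at least two components. All function names (`mahalanobis_to_components`, `avg_silhouette`, `merge_components`) and the toy data are hypothetical.

```python
import itertools
import numpy as np

def mahalanobis_to_components(X, means, covs):
    """Return an (n, K) array of Mahalanobis distances, point i to component k."""
    D = np.empty((X.shape[0], len(means)))
    for k, (mu, cov) in enumerate(zip(means, covs)):
        prec = np.linalg.inv(cov)          # precision matrix of component k
        diff = X - mu
        D[:, k] = np.sqrt(np.einsum("ij,jk,ik->i", diff, prec, diff))
    return D

def avg_silhouette(D, labels, groups):
    """Average silhouette-like width for a partition of components into groups.

    Assumption: distance to a group = distance to its nearest component.
    """
    G = np.column_stack([D[:, list(g)].min(axis=1) for g in groups])
    comp_to_group = {k: j for j, g in enumerate(groups) for k in g}
    own = np.array([comp_to_group[k] for k in labels])
    idx = np.arange(len(labels))
    a = G[idx, own]                        # distance to own group
    G_other = G.copy()
    G_other[idx, own] = np.inf
    b = G_other.min(axis=1)                # distance to nearest other group
    denom = np.maximum(a, b)
    s = np.divide(b - a, denom, out=np.zeros_like(denom), where=denom > 0)
    return s.mean()

def merge_components(X, means, covs, labels):
    """Greedily merge groups; keep the level with the highest average width."""
    D = mahalanobis_to_components(X, means, covs)
    groups = [(k,) for k in range(len(means))]
    best_score, best_groups = avg_silhouette(D, labels, groups), groups
    while len(groups) > 2:                 # the width is undefined for one group
        candidates = []
        for i, j in itertools.combinations(range(len(groups)), 2):
            merged = [g for t, g in enumerate(groups) if t not in (i, j)]
            merged.append(groups[i] + groups[j])
            candidates.append((avg_silhouette(D, labels, merged), merged))
        score, groups = max(candidates, key=lambda c: c[0])
        if score > best_score:
            best_score, best_groups = score, groups
    return best_groups, best_score

# Toy example: components 0 and 1 overlap heavily, component 2 is far away.
rng = np.random.default_rng(0)
means = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([10.0, 10.0])]
covs = [np.eye(2)] * 3
X = np.vstack([rng.multivariate_normal(m, c, size=100) for m, c in zip(means, covs)])
labels = mahalanobis_to_components(X, means, covs).argmin(axis=1)
best_groups, best_score = merge_components(X, means, covs, labels)
```

In this toy setting the hard labels are taken from the nearest component rather than from a fitted GMM's posterior probabilities, which keeps the sketch free of any fitting library; in practice the means, covariances, and labels would come from the fitted model.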

Suggested Citation

  • Andrea Pastore & Stefano Tonellato, 2013. "A merging algorithm for Gaussian mixture components," Working Papers 2013:04, Department of Economics, University of Venice "Ca' Foscari".
  • Handle: RePEc:ven:wpaper:2013:04

    Download full text from publisher

    File URL: http://www.unive.it/pag/fileadmin/user_upload/dipartimenti/economia/doc/Pubblicazioni_scientifiche/working_papers/2013/WP_DSE_pastore_tonellato_04_13.pdf
    File Function: First version, 2013
    Download Restriction: no

    References listed on IDEAS

    1. Fraley C. & Raftery A.E., 2002. "Model-Based Clustering, Discriminant Analysis, and Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 611-631, June.
    2. Christian Hennig, 2010. "Methods for merging Gaussian mixture components," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(1), pages 3-34, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kim, Daeyoung & Seo, Byungtae, 2014. "Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 100-120.
    2. Coffey, N. & Hinde, J. & Holian, E., 2014. "Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 14-29.
    3. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    4. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    5. Pourahmadi, Mohsen & Daniels, Michael J. & Park, Trevor, 2007. "Simultaneous modelling of the Cholesky decomposition of several covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 98(3), pages 568-587, March.
    6. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    7. Scrucca, Luca, 2011. "Model-based SIR for dimension reduction," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 3010-3026, November.
    8. Semhar Michael & Volodymyr Melnykov, 2016. "An effective strategy for initializing the EM algorithm in finite mixture models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 563-583, December.
    9. Di Zio, Marco & Guarnera, Ugo & Luzi, Orietta, 2007. "Imputation through finite Gaussian mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5305-5316, July.
    10. Sylvia Frühwirth‐Schnatter & Christoph Pamminger & Andrea Weber & Rudolf Winter‐Ebmer, 2012. "Labor market entry and earnings dynamics: Bayesian inference using mixtures‐of‐experts Markov chain clustering," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 27(7), pages 1116-1137, November.
    11. Montanari, Angela & Viroli, Cinzia, 2011. "Maximum likelihood estimation of mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2712-2723, September.
    12. Giovanna Devetag & Sibilla Guida & Luca Polonio, 2016. "An eye-tracking study of feature-based choice in one-shot games," Experimental Economics, Springer;Economic Science Association, vol. 19(1), pages 177-201, March.
    13. Minjung Kyung & Ju-Hyun Park & Ji Yeh Choi, 2022. "Bayesian Mixture Model of Extended Redundancy Analysis," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 946-966, September.
    14. Wu, Han-Ming, 2011. "On biological validity indices for soft clustering algorithms for gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1969-1979, May.
    15. Marek Śmieja & Magdalena Wiercioch, 2017. "Constrained clustering with a complex cluster structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 493-518, September.
    16. So Pyay Thar & Thiagarajah Ramilan & Robert J. Farquharson & Deli Chen, 2021. "Identifying Potential for Decision Support Tools through Farm Systems Typology Analysis Coupled with Participatory Research: A Case for Smallholder Farmers in Myanmar," Agriculture, MDPI, vol. 11(6), pages 1-20, June.
    17. Seo, Byungtae & Kim, Daeyoung, 2012. "Root selection in normal mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2454-2470.
    18. repec:cte:wsrepe:ws1450804 is not listed on IDEAS
    19. De la Cruz-Mesia, Rolando & Quintana, Fernando A. & Marshall, Guillermo, 2008. "Model-based clustering for longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1441-1457, January.
    20. Henner Gimpel & Daniel Rau & Maximilian Röglinger, 2018. "Understanding FinTech start-ups – a taxonomy of consumer-oriented service offerings," Electronic Markets, Springer;IIM University of St. Gallen, vol. 28(3), pages 245-264, August.
    21. Chang, George T. & Walther, Guenther, 2007. "Clustering with mixtures of log-concave distributions," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6242-6251, August.

    More about this item

    Keywords

    similarity indices; Rand index; mixture models; bootstrap

    JEL classification:

    • C39 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Other
    • C46 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Specific Distributions

