A merging algorithm for Gaussian mixture components
AbstractIn finite mixture model clustering, each component of the fitted mixture is usually associated with a cluster. In other words, each component of the mixture is interpreted as the probability distribution of the variables of interest conditionally on the membership to a given cluster. The Gaussian mixture model (GMM) is very popular in this context for its simplicity and flexibility. It may happen, however, that the components of the fitted model are not well separated. In such a circumstance, the number of clusters is often overestimated and a better clustering could be obtained by joining some subsets of the partition based on the fitted GMM. Some methods for the aggregation of mixture components have been recently proposed in the literature. In this work, we propose a hierarchical aggregation algorithm based on a generalisation of the definition of silhouette-width taking into account the Mahalanobis distances induced by the precison matrices of the components of the fitted GMM. The algorithm chooses the number of groups corresponding to the hierarchy level giving rise to the highest average-silhouette-width. Some simulation experiments and real data applications indicate that its performance is at least as good as the one of other existing methods.
Download InfoIf you experience problems downloading a file, check if you have the proper application to view it first. In case of further problems read the IDEAS help page. Note that these files are not on the IDEAS site. Please be patient as the files may be large.
Bibliographic InfoPaper provided by Department of Economics, University of Venice "Ca' Foscari" in its series Working Papers with number 2013:04.
Length: 27 pagine
Date of creation: 2013
Date of revision:
Contact details of provider:
Postal: Cannaregio, S. Giobbe no 873 , 30121 Venezia
Web page: http://www.unive.it/dip.economia
More information through EDIRC
similarity indices; Rand index; mixture models; bootstrap.;
Find related papers by JEL classification:
- C39 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Other
- C46 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Specific Distributions
This paper has been announced in the following NEP Reports:
You can help add them by filling out this form.
reading list or among the top items on IDEAS.Access and download statisticsgeneral information about how to correct material in RePEc.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: (Geraldine Ludbrook).
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If references are entirely missing, you can add them using this form.
If the full references list an item that is present in RePEc, but the system did not link to it, you can help with this form.
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your profile, as there may be some citations waiting for confirmation.
Please note that corrections may take a couple of weeks to filter through the various RePEc services.