Printed from https://ideas.repec.org/p/ven/wpaper/201304.html

A merging algorithm for Gaussian mixture components

Author

Listed:
  • Andrea Pastore

    (Department of Economics, University of Venice Ca' Foscari)

  • Stefano Tonellato

    (Department of Economics, University of Venice Ca' Foscari)

Abstract

In finite mixture model clustering, each component of the fitted mixture is usually associated with a cluster. In other words, each component is interpreted as the probability distribution of the variables of interest, conditional on membership in a given cluster. The Gaussian mixture model (GMM) is very popular in this context because of its simplicity and flexibility. It may happen, however, that the components of the fitted model are not well separated. In that case, the number of clusters is often overestimated, and a better clustering can be obtained by joining some subsets of the partition based on the fitted GMM. Several methods for aggregating mixture components have recently been proposed in the literature. In this work, we propose a hierarchical aggregation algorithm based on a generalisation of the definition of silhouette width that takes into account the Mahalanobis distances induced by the precision matrices of the components of the fitted GMM. The algorithm chooses the number of groups corresponding to the level of the hierarchy that gives the highest average silhouette width. Simulation experiments and real data applications indicate that its performance is at least as good as that of other existing methods.
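
The abstract does not spell out the generalised silhouette width or the merging rule, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes a simplified silhouette in which the distance from a point to a group is the smallest Mahalanobis distance to the mean of any component in that group, and it assumes a greedy rule that, at each level, merges the pair of groups giving the highest average silhouette width. The function names (`component_distances`, `average_silhouette`, `merge_components`) and the inputs (`X`, `labels`, `means`, `precisions`) are hypothetical.

```python
from itertools import combinations

import numpy as np


def component_distances(X, means, precisions):
    """Return an (n, K) array of Mahalanobis distances from each point
    to each component mean, using that component's precision matrix."""
    diffs = X[:, None, :] - means[None, :, :]                  # shape (n, K, d)
    sq = np.einsum("nkd,kde,nke->nk", diffs, precisions, diffs)
    return np.sqrt(np.maximum(sq, 0.0))


def average_silhouette(D, labels, groups):
    """Average of a simplified silhouette (an assumption of this sketch).

    D      : (n, K) point-to-component distances
    labels : (n,) integer component assigned to each point
    groups : list of tuples of component indices (at least two groups)
    """
    n, K = D.shape
    comp_to_group = np.empty(K, dtype=int)
    for g_idx, g in enumerate(groups):
        comp_to_group[list(g)] = g_idx
    own = comp_to_group[labels]                                # group of each point
    # Distance to a group = smallest distance to any of its components.
    G = np.stack([D[:, list(g)].min(axis=1) for g in groups], axis=1)
    rows = np.arange(n)
    a = G[rows, own]                                           # own-group distance
    G[rows, own] = np.inf
    b = G.min(axis=1)                                          # nearest other group
    denom = np.maximum(a, b)
    s = np.zeros(n)
    ok = denom > 0
    s[ok] = (b[ok] - a[ok]) / denom[ok]
    return float(s.mean())


def merge_components(X, labels, means, precisions):
    """Greedy hierarchical merging; returns (best_score, best_groups)."""
    K = means.shape[0]
    if K < 2:
        raise ValueError("need at least two components")
    D = component_distances(X, means, precisions)
    groups = [(k,) for k in range(K)]
    history = []
    while True:
        history.append((average_silhouette(D, labels, groups), groups))
        if len(groups) == 2:
            break
        # Try every pairwise merge and keep the one with the highest score.
        candidates = []
        for i, j in combinations(range(len(groups)), 2):
            rest = [g for t, g in enumerate(groups) if t not in (i, j)]
            merged = rest + [groups[i] + groups[j]]
            candidates.append((average_silhouette(D, labels, merged), merged))
        groups = max(candidates, key=lambda c: c[0])[1]
    # Choose the hierarchy level with the highest average silhouette width.
    return max(history, key=lambda h: h[0])
```

In this sketch, `labels`, `means`, and `precisions` would come from an already fitted GMM (for instance, the maximum a posteriori component of each observation and the inverse of each component covariance). The hierarchy stops at two groups because a silhouette is undefined for a single group.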

Suggested Citation

  • Andrea Pastore & Stefano Tonellato, 2013. "A merging algorithm for Gaussian mixture components," Working Papers 2013:04, Department of Economics, University of Venice "Ca' Foscari".
  • Handle: RePEc:ven:wpaper:2013:04

    Download full text from publisher

    File URL: http://www.unive.it/pag/fileadmin/user_upload/dipartimenti/economia/doc/Pubblicazioni_scientifiche/working_papers/2013/WP_DSE_pastore_tonellato_04_13.pdf
    File Function: First version, 2013
    Download Restriction: no

    More about this item

    Keywords

    similarity indices; Rand index; mixture models; bootstrap

    JEL classification:

    • C39 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Other
    • C46 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Specific Distributions
