
Model-based clustering of Gaussian copulas for mixed data

Authors
  • Matthieu Marbac
  • Christophe Biernacki
  • Vincent Vandewalle

Abstract

Clustering mixed data is important yet challenging because few conventional distributions exist for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. The model clusters data sets with continuous, integer, and ordinal variables (all of which have a cumulative distribution function) by accounting for intra-component dependencies in a way similar to the Gaussian mixture. Each component of the Gaussian copula mixture yields a correlation coefficient for each pair of variables, and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As a by-product, the model generalizes many well-known approaches and provides visualization tools based on its parameters. Bayesian inference is performed with a Metropolis-within-Gibbs sampler. Numerical experiments on simulated and real data illustrate the benefits of the proposed model: a flexible and meaningful parameterization combined with visualization features.
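
To make the generative side of the abstract concrete, here is a minimal Python sketch of how data could be simulated from a mixture of Gaussian copulas with one continuous (Gaussian), one integer (Poisson), and one ordinal (ordered multinomial) variable. It is an illustration under stated assumptions, not the authors' implementation: the function names, the three-variable layout, and all parameter values are hypothetical, and the sketch covers simulation only, not the Metropolis-within-Gibbs inference.

```python
import math
import numpy as np

# Vectorized error function; otypes lets it accept empty inputs.
_erf = np.vectorize(math.erf, otypes=[float])

def normal_cdf(z):
    """Standard normal CDF, used to map latent Gaussians to uniforms."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam), via a truncated CDF table."""
    kmax = int(lam + 10.0 * math.sqrt(lam) + 20.0)  # assumed truncation point
    k = np.arange(1, kmax + 1)
    pmf = np.exp(-lam) * np.cumprod(np.concatenate(([1.0], lam / k)))
    cdf = np.cumsum(pmf)
    return np.minimum(np.searchsorted(cdf, u, side="left"), kmax)

def sample_component(n, corr, mu, sigma, lam, ord_probs, rng):
    """Draw n rows (continuous, integer, ordinal) from one copula component."""
    # Latent Gaussian vector carrying the component's pairwise correlations.
    z = rng.multivariate_normal(np.zeros(3), corr, size=n)
    # Gaussian margin: inverting the normal CDF of Phi(z) is a shift and scale of z.
    x_cont = mu + sigma * z[:, 0]
    # Discrete margins: push the latent value through Phi, then invert the margin CDF.
    u = normal_cdf(z[:, 1:])
    x_int = poisson_quantile(u[:, 0], lam)
    cum = np.cumsum(ord_probs)
    x_ord = np.minimum(np.searchsorted(cum, u[:, 1], side="left"), len(ord_probs) - 1)
    return np.column_stack([x_cont, x_int, x_ord])

def sample_mixture(n, weights, components, seed=0):
    """Draw component labels, then sample each row from its component."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)
    data = np.empty((n, 3))
    for k, params in enumerate(components):
        idx = np.flatnonzero(labels == k)
        data[idx] = sample_component(len(idx), rng=rng, **params)
    return data, labels

# Hypothetical two-component example; every value below is made up for illustration.
components = [
    dict(corr=[[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]],
         mu=0.0, sigma=1.0, lam=2.0, ord_probs=[0.5, 0.3, 0.2]),
    dict(corr=[[1.0, -0.5, 0.0], [-0.5, 1.0, 0.4], [0.0, 0.4, 1.0]],
         mu=4.0, sigma=0.5, lam=8.0, ord_probs=[0.1, 0.3, 0.6]),
]
data, labels = sample_mixture(400, [0.4, 0.6], components, seed=1)
```

Each row of `data` holds the continuous, integer, and ordinal values in that order, and `labels` records the component that generated each row. The dependence between variables within a component comes entirely from the latent correlation matrix, while each margin keeps its own standard distribution.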

Suggested Citation

  • Matthieu Marbac & Christophe Biernacki & Vincent Vandewalle, 2017. "Model-based clustering of Gaussian copulas for mixed data," Communications in Statistics - Theory and Methods, Taylor & Francis Journals, vol. 46(23), pages 11635-11656, December.
  • Handle: RePEc:taf:lstaxx:v:46:y:2017:i:23:p:11635-11656
    DOI: 10.1080/03610926.2016.1277753

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/03610926.2016.1277753
    Download Restriction: Access to full text is restricted to subscribers.


    Citations

    Cited by:

    1. Selosse, Margot & Jacques, Julien & Biernacki, Christophe, 2020. "Model-based co-clustering for mixed type data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    2. Fuchs, Sebastian & Di Lascio, F. Marta L. & Durante, Fabrizio, 2021. "Dissimilarity functions for rank-invariant hierarchical clustering of continuous variables," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    3. Christophe Biernacki & Matthieu Marbac & Vincent Vandewalle, 2021. "Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 129-157, April.
    4. Mazo, Gildas & Averyanov, Yaroslav, 2019. "Constraining kernel estimators in semiparametric copula mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 138(C), pages 170-189.
    5. L. L. Henn, 2022. "Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data," Computational Statistics, Springer, vol. 37(2), pages 909-946, April.
