Printed from https://ideas.repec.org/a/spr/advdac/v10y2016i2d10.1007_s11634-016-0238-x.html

Model based clustering for mixed data: clustMD

Author

Listed:
  • Damien McParland

    (University College Dublin)

  • Isobel Claire Gormley

    (University College Dublin)

Abstract

A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation maximisation (EM) algorithm is used to estimate clustMD; in the presence of nominal data a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering simulated mixed type data and prostate cancer patients, on whom mixed data have been recorded.
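
The generative idea in the abstract, a latent Gaussian mixture that produces observed variables of several types, can be illustrated with a minimal simulation sketch. It is not the authors' implementation and does not perform estimation. The function name `simulate_mixed_data`, the column layout, the threshold values, and the rules that map latent values to binary, ordinal and nominal levels are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def simulate_mixed_data(n, weights, means, covs, thresholds, seed=0):
    """Draw n rows of mixed-type data from a latent Gaussian mixture.

    Assumed (hypothetical) layout of the 5 latent columns:
      0   -> continuous variable, observed directly
      1   -> binary variable, 1 if the latent value is positive
      2   -> ordinal variable, level set by `thresholds`
      3-4 -> nominal variable with 3 levels
    """
    rng = np.random.default_rng(seed)

    # Pick a mixture component (cluster) for every observation.
    labels = rng.choice(len(weights), size=n, p=weights)

    # Draw the latent vector from the Gaussian of the chosen component.
    z = np.empty((n, means.shape[1]))
    for g in range(len(weights)):
        idx = labels == g
        z[idx] = rng.multivariate_normal(means[g], covs[g], size=idx.sum())

    continuous = z[:, 0]
    binary = (z[:, 1] > 0).astype(int)

    # Ordinal level = number of thresholds at or below the latent value.
    ordinal = np.digitize(z[:, 2], thresholds)

    # Assumed nominal rule: level 0 if both latent values are negative,
    # otherwise 1 + index of the larger latent value.
    nom = z[:, 3:5]
    nominal = np.where(nom.max(axis=1) < 0, 0, nom.argmax(axis=1) + 1)

    return labels, continuous, binary, ordinal, nominal


# Two well-separated clusters with identity covariances (assumed values).
weights = np.array([0.5, 0.5])
means = np.array([[-2.0] * 5, [2.0] * 5])
covs = np.array([np.eye(5), np.eye(5)])
thresholds = np.array([-1.0, 1.0])

labels, cont, binary, ordinal, nominal = simulate_mixed_data(
    200, weights, means, covs, thresholds
)
```

Identity covariances are used here only to keep the sketch short; the six clustMD models mentioned above differ in how the latent covariance is constrained.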

Suggested Citation

  • Damien McParland & Isobel Claire Gormley, 2016. "Model based clustering for mixed data: clustMD," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(2), pages 155-169, June.
  • Handle: RePEc:spr:advdac:v:10:y:2016:i:2:d:10.1007_s11634-016-0238-x
    DOI: 10.1007/s11634-016-0238-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-016-0238-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Mantas Svazas & Valentinas Navickas & Yuriy Bilan & Joanna Nakonieczny & Jana Spankova, 2021. "Biomass Clusterization from a Regional Perspective: The Case of Lithuania," Energies, MDPI, vol. 14(21), pages 1-15, October.
    2. Keefe Murphy & Thomas Brendan Murphy, 2020. "Gaussian parsimonious clustering models with covariates and a noise component," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 293-325, June.
    3. Selosse, Margot & Jacques, Julien & Biernacki, Christophe, 2020. "Model-based co-clustering for mixed type data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    4. Christophe Biernacki & Alexandre Lourme, 2019. "Unifying data units and models in (co-)clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 7-31, March.
    5. Felix Mbuga & Cristina Tortora, 2021. "Spectral Clustering of Mixed-Type Data," Stats, MDPI, vol. 5(1), pages 1-11, December.
    6. Christophe Biernacki & Matthieu Marbac & Vincent Vandewalle, 2021. "Gaussian-Based Visualization of Gaussian and Non-Gaussian-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 129-157, April.
    7. Daniel Fernández & Richard Arnold & Shirley Pledger & Ivy Liu & Roy Costilla, 2019. "Finite mixture biclustering of discrete type multivariate data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 117-143, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Q. & Ip, E.H., 2014. "Variable assessment in latent class models," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 146-156.
    2. Leila Amiri & Mojtaba Khazaei & Mojtaba Ganjali, 2018. "A mixture latent variable model for modeling mixed data in heterogeneous populations and its applications," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(1), pages 95-115, January.
    3. Ranalli, Monia & Rocci, Roberto, 2017. "Mixture models for mixed-type data through a composite likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 87-102.
    4. Marco Guerra & Francesca Bassi & José G. Dias, 2020. "A Multiple-Indicator Latent Growth Mixture Model to Track Courses with Low-Quality Teaching," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 147(2), pages 361-381, January.
    5. Paleti, Rajesh, 2018. "Generalized multinomial probit Model: Accommodating constrained random parameters," Transportation Research Part B: Methodological, Elsevier, vol. 118(C), pages 248-262.
    6. Michael Lechner & Ruth Miquel & Conny Wunsch, 2011. "Long‐Run Effects Of Public Sector Sponsored Training In West Germany," Journal of the European Economic Association, European Economic Association, vol. 9(4), pages 742-784, August.
    7. Simon Hug & Tobias Schulz, 2007. "Referendums in the EU’s constitution building process," The Review of International Organizations, Springer, vol. 2(2), pages 177-218, June.
    8. Michael Gerfin & Michael Lechner, 2002. "A Microeconometric Evaluation of the Active Labour Market Policy in Switzerland," Economic Journal, Royal Economic Society, vol. 112(482), pages 854-893, October.
    9. Michael Prendergast & David Huang & Yih-Ing Hser, 2008. "Patterns of Crime and Drug Use Trajectories in Relation to Treatment Initiation and 5-Year Outcomes," Evaluation Review, vol. 32(1), pages 59-82, February.
    10. Haaijer, Marinus E., 1996. "Predictions in conjoint choice experiments : the x-factor probit model," Research Report 96B22, University of Groningen, Research Institute SOM (Systems, Organisations and Management).
    11. Yai, Tetsuo & Iwakura, Seiji & Morichi, Shigeru, 1997. "Multinomial probit with structured covariance for route choice behavior," Transportation Research Part B: Methodological, Elsevier, vol. 31(3), pages 195-207, June.
    12. Silvia Bacci & Francesco Bartolucci & Giulia Bettin & Claudia Pigini, 2017. "A mixture growth model for migrants' remittances: An application to the German Socio-Economic Panel," Mo.Fi.R. Working Papers 145, Money and Finance Research group (Mo.Fi.R.) - Univ. Politecnica Marche - Dept. Economic and Social Sciences.
    13. Ye, Mao & Lu, Zhao-Hua & Li, Yimei & Song, Xinyuan, 2019. "Finite mixture of varying coefficient model: Estimation and component selection," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 452-474.
    14. Ricardo Smith Ramírez, 2007. "FIML estimation of treatment effect models with endogenous selection and multiple censored responses via a Monte Carlo EM Algorithm," Working papers DTE 403, CIDE, División de Economía.
    15. Ziegler, Andreas, 2002. "Simulated Classical Tests in the Multiperiod Multinomial Probit Model," ZEW Discussion Papers 02-38, ZEW - Leibniz Centre for European Economic Research.
    16. Patrick Sturgis & Louise Sullivan, 2008. "Exploring social mobility with latent trajectory groups," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 171(1), pages 65-88, January.
    17. Getachew A. Dagne, 2016. "A growth mixture Tobit model: application to AIDS studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(7), pages 1174-1185, July.
    18. Dirk Engel & Christoph M. Schmidt & Vivien Procher, 2010. "The Asymmetries of a Small World: Entry Into and Withdrawal From International Markets by French Firms," Ruhr Economic Papers 0192, Rheinisch-Westfälisches Institut für Wirtschaftsforschung, Ruhr-Universität Bochum, Universität Dortmund, Universität Duisburg-Essen.
    19. Bacci, Silvia & Bartolucci, Francesco & Pigini, Claudia & Signorelli, Marcello, 2014. "A finite mixture latent trajectory model for hirings and separations in the labor market," MPRA Paper 59730, University Library of Munich, Germany.
    20. Domanski, Adam, 2009. "Estimating Mixed Logit Recreation Demand Models With Large Choice Sets," 2009 Annual Meeting, July 26-28, 2009, Milwaukee, Wisconsin 49413, Agricultural and Applied Economics Association.
