Model-Based Clustering for Conditionally Correlated Categorical Data

Author

Listed:
  • Matthieu Marbac
  • Christophe Biernacki
  • Vincent Vandewalle

Abstract

An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of variables. The model groups the variables into blocks that are independent of one another but dependent within, so as to capture the main intra-class correlations. The dependency between variables in the same block of a class is modeled by mixing two extreme distributions: independence and maximum dependency. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, since it yields a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood using an EM algorithm. A Gibbs sampler is used for model selection to overcome the computational intractability of the combinatorial block-structure search. Two applications to medical and biological data sets show the relevance of the new model. The results support the view that the model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model. Copyright Classification Society of North America 2015
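
The structure described in the abstract can be written out schematically. The sketch below is only an illustration under assumed notation: the symbols $g$, $d$, $\pi_k$, $\alpha$, $\rho_{kb}$, $\tau_{kb}$, $B_k$ and the names $p_{\mathrm{ind}}$, $p_{\mathrm{dep}}$ are not taken from this page, and the exact form of the maximum dependency distribution is defined in the article itself, not here.

```latex
% Classical latent class model: g classes, d categorical variables,
% variables independent given the class (notation assumed for illustration).
p(\mathbf{x}) = \sum_{k=1}^{g} \pi_k \prod_{j=1}^{d} p(x^{j} ; \alpha_{kj}),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{g} \pi_k = 1 .

% Extension sketched from the abstract: within class k the variables are
% partitioned into B_k blocks; blocks are mutually independent given the class.
p(\mathbf{x} \mid z = k) = \prod_{b=1}^{B_k} p(\mathbf{x}^{\{kb\}} \mid z = k) .

% Each block distribution mixes the two extreme distributions, with a
% proportion \rho_{kb} placed on the maximum dependency distribution.
p(\mathbf{x}^{\{kb\}} \mid z = k)
  = (1 - \rho_{kb}) \, p_{\mathrm{ind}}(\mathbf{x}^{\{kb\}} ; \alpha_{kb})
  + \rho_{kb} \, p_{\mathrm{dep}}(\mathbf{x}^{\{kb\}} ; \tau_{kb}),
\qquad 0 \le \rho_{kb} \le 1 .
```

In this notation, setting every $\rho_{kb} = 0$ recovers the classical latent class model, which is consistent with the abstract's description of the new model as an extension adding only a few parameters.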

Suggested Citation

  • Matthieu Marbac & Christophe Biernacki & Vincent Vandewalle, 2015. "Model-Based Clustering for Conditionally Correlated Categorical Data," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 145-175, July.
  • Handle: RePEc:spr:jclass:v:32:y:2015:i:2:p:145-175
    DOI: 10.1007/s00357-015-9180-4

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00357-015-9180-4
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Beth A. Reboussin & Edward H. Ip & Mark Wolfson, 2008. "Locally dependent latent class models with covariates: an application to under‐age drinking in the USA," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 171(4), pages 877-897, October.
    2. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2009. "Variable selection in model-based clustering: A general variable role modeling," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3872-3882, September.
    3. Sylvia Richardson & Peter J. Green, 1997. "On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(4), pages 731-792.
    4. Dean Harper, 1972. "Local dependence latent structure models," Psychometrika, Springer;The Psychometric Society, vol. 37(1), pages 53-59, March.
    5. Pascal Hattum & Herbert Hoijtink, 2009. "Market Segmentation Using Brand Strategy Research: Bayesian Inference with Respect to Mixtures of Log-Linear Models," Journal of Classification, Springer;The Classification Society, vol. 26(3), pages 297-328, December.
    6. Gilles Celeux & Gérard Govaert, 1991. "Clustering criteria for discrete data and latent class models," Journal of Classification, Springer;The Classification Society, vol. 8(2), pages 157-176, December.

    Citations

    Cited by:

    1. Mazo, Gildas, 2016. "A semiparametric and location-shift copula-based mixture model," LIDAM Discussion Papers ISBA 2016026, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    2. Gildas Mazo, 2017. "A Semiparametric and Location-Shift Copula-Based Mixture Model," Journal of Classification, Springer;The Classification Society, vol. 34(3), pages 444-464, October.
    3. Adelchi Azzalini & Giovanna Menardi, 2016. "Density-based clustering with non-continuous data," Computational Statistics, Springer, vol. 31(2), pages 771-798, June.
    4. Douglas L. Steinley, 2016. "Editorial," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 327-330, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hector E. Najera Catalan, 2017. "Multiple Deprivation, Severity and Latent Sub-Groups: Advantages of Factor Mixture Modelling for Analysing Material Deprivation," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 131(2), pages 681-700, March.
    2. Sunil Kumar & Zakir Husain & Diganta Mukherjee, 2015. "Assessing Consistency of Consumer Confidence Data using Dynamic Latent Class Analysis," Papers 1509.01215, arXiv.org.
    3. Kumar, Sunil & Husain, Zakir & Mukherjee, Diganta, 2017. "Assessing consistency of consumer confidence data using latent class analysis with time factor," Economic Analysis and Policy, Elsevier, vol. 55(C), pages 35-46.
    4. Matthieu Marbac & Christophe Biernacki & Vincent Vandewalle, 2016. "Latent class model with conditional dependency per modes to cluster categorical data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(2), pages 183-207, June.
    5. Matthieu Marbac & Mohammed Sedki & Étienne Patin, 2020. "Variable Selection for Mixed Data Clustering: Application in Human Population Genomics," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 124-142, April.
    6. Brown, Sarah & Greene, William H. & Harris, Mark N. & Taylor, Karl, 2015. "An inverse hyperbolic sine heteroskedastic latent class panel tobit model: An application to modelling charitable donations," Economic Modelling, Elsevier, vol. 50(C), pages 228-236.
    7. Shuang Zhang & Xingdong Feng, 2022. "Distributed identification of heterogeneous treatment effects," Computational Statistics, Springer, vol. 37(1), pages 57-89, March.
    8. Silvia Bianconcini, 2014. "Comments on: Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(3), pages 466-468, September.
    9. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2011. "Variable selection in model-based discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(10), pages 1374-1387, November.
    10. N. T. Longford & Pierpaolo D'Urso, 2011. "Mixture models with an improper component," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(11), pages 2511-2521, January.
    11. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    12. Zhengyi Zhou & David S. Matteson & Dawn B. Woodard & Shane G. Henderson & Athanasios C. Micheas, 2015. "A Spatio-Temporal Point Process Model for Ambulance Demand," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 6-15, March.
    13. Francisco Richter & Bart Haegeman & Rampal S. Etienne & Ernst C. Wit, 2020. "Introducing a general class of species diversification models for phylogenetic trees," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 74(3), pages 261-274, August.
    14. Nalini Ravishanker & Dipak K. Dey, 2000. "Multivariate Survival Models with a Mixture of Positive Stable Frailties," Methodology and Computing in Applied Probability, Springer, vol. 2(3), pages 293-308, September.
    15. Yasutomo Murasawa, 2020. "Measuring public inflation perceptions and expectations in the UK," Empirical Economics, Springer, vol. 59(1), pages 315-344, July.
    16. Minjung Kyung & Ju-Hyun Park & Ji Yeh Choi, 2022. "Bayesian Mixture Model of Extended Redundancy Analysis," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 946-966, September.
    17. Luigi Spezia, 2019. "Modelling covariance matrices by the trigonometric separation strategy with application to hidden Markov models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 399-422, June.
    18. Michael E. Sobel & Bengt Muthén, 2012. "Compliance Mixture Modelling with a Zero-Effect Complier Class and Missing Data," Biometrics, The International Biometric Society, vol. 68(4), pages 1037-1045, December.
    19. Sarah Brown & William Greene & Mark Harris, 2020. "A novel approach to latent class modelling: identifying the various types of body mass index individuals," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 983-1004, June.
    20. Jinsun Kim & Jiyeon Choi & Minji Park & Joong-Hyuk Min & Jong Mun Lee & Jimin Lee & Eun Hye Na & Heeseon Jang, 2022. "A Study on Identifying Priority Management Areas and Implementing Best Management Practice for Effective Management of Nonpoint Source Pollution in a Rural Watershed, Korea," Sustainability, MDPI, vol. 14(21), pages 1-22, October.
