Printed from https://ideas.repec.org/a/spr/advdac/v13y2019i1d10.1007_s11634-018-0324-3.html

Finite mixture biclustering of discrete type multivariate data

Authors

  • Daniel Fernández (CIBERSAM; Victoria University of Wellington)
  • Richard Arnold (Victoria University of Wellington)
  • Shirley Pledger (Victoria University of Wellington)
  • Ivy Liu (Victoria University of Wellington)
  • Roy Costilla (University of Queensland)

Abstract

Many methods for clustering matrices of data rely on mathematical techniques such as distance-based algorithms or matrix decomposition and eigenvalues. In general, these techniques do not permit statistical inference or model selection via information criteria, because there is no underlying probability model. This article summarizes some recent model-based methodologies for matrices of binary, count, and ordinal data, which are modelled under a unified statistical framework using finite mixtures to group the rows and/or columns. The model parameter can be constructed from a linear predictor of parameters and covariates through link functions. This likelihood-based one-mode and two-mode fuzzy clustering provides maximum likelihood estimation of parameters and the option of using likelihood-based information criteria for model comparison. Additionally, a Bayesian approach is presented in which the parameters and the number of clusters are estimated simultaneously from their joint posterior distribution. Visualization tools focused on ordinal data, the fuzziness of the clustering structures, and analogues of various standard plots used in multivariate analysis are presented. Finally, a set of future extensions is enumerated.
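
As a rough illustration of the one-mode (row) clustering idea in the abstract, the sketch below fits a finite mixture of independent Bernoulli distributions to the rows of a binary matrix by EM and reports a BIC value for model comparison. It is not the authors' implementation. Several choices are assumptions made only for this example: the function name fit_row_clusters is hypothetical, each cluster-by-column probability theta[r, j] is estimated freely instead of through a linear predictor and link function, the starting memberships come from sorting rows by their mean, and the BIC penalty uses the number of rows as the sample size.

```python
import numpy as np


def fit_row_clusters(Y, R, n_iter=100):
    """EM for a mixture of independent Bernoullis over the rows of binary Y.

    Illustrative sketch only: theta[r, j] is a free probability per
    (row cluster, column), not a linear predictor passed through a link.
    Returns (pi, theta, z, loglik, bic).
    """
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape

    # Assumed starting point: sort rows by mean and cut into R blocks,
    # giving hard initial memberships (one of many possible choices).
    order = np.argsort(Y.mean(axis=1), kind="stable")
    z = np.zeros((n, R))
    for r, block in enumerate(np.array_split(order, R)):
        z[block, r] = 1.0

    for _ in range(n_iter):
        # M-step: mixing proportions and per-cluster column probabilities,
        # clipped so that the logarithms below stay finite.
        pi = np.clip(z.mean(axis=0), 1e-10, None)
        pi /= pi.sum()
        weights = np.clip(z.sum(axis=0), 1e-10, None)[:, None]
        theta = np.clip((z.T @ Y) / weights, 1e-6, 1 - 1e-6)

        # E-step: posterior (fuzzy) membership of each row, on the log scale.
        log_p = Y @ np.log(theta).T + (1 - Y) @ np.log1p(-theta).T + np.log(pi)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        z = np.exp(log_p - log_norm)

    # Observed-data log-likelihood at the returned pi and theta.
    loglik = float(log_norm.sum())
    n_par = (R - 1) + R * m              # free proportions + probabilities
    bic = -2.0 * loglik + n_par * np.log(n)  # assumes sample size = rows
    return pi, theta, z, loglik, bic
```

Calling fit_row_clusters(Y, R) for several values of R and comparing the returned BIC values mimics, in a simplified way, the information-criterion model comparison the abstract mentions. Each row of z holds the fuzzy membership probabilities of one data row and sums to one.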

Suggested Citation

  • Daniel Fernández & Richard Arnold & Shirley Pledger & Ivy Liu & Roy Costilla, 2019. "Finite mixture biclustering of discrete type multivariate data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 117-143, March.
  • Handle: RePEc:spr:advdac:v:13:y:2019:i:1:d:10.1007_s11634-018-0324-3
    DOI: 10.1007/s11634-018-0324-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-018-0324-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fernández, D. & Arnold, R. & Pledger, S., 2016. "Mixture-based clustering for the ordered stereotype model," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 46-75.
    2. Roy Costilla & Ivy Liu & Richard Arnold & Daniel Fernández, 2019. "Bayesian model-based clustering for longitudinal ordinal data," Computational Statistics, Springer, vol. 34(3), pages 1015-1038, September.
    3. Daniel Fernández & Radim J. Sram & Miroslav Dostal & Anna Pastorkova & Hans Gmuender & Hyunok Choi, 2018. "Modeling Unobserved Heterogeneity in Susceptibility to Ambient Benzo[a]pyrene Concentration among Children with Allergic Asthma Using an Unsupervised Learning Algorithm," IJERPH, MDPI, vol. 15(1), pages 1-18, January.
    4. Álvarez de Toledo, Pablo & Núñez, Fernando & Usabiaga, Carlos, 2018. "Matching and clustering in square contingency tables. Who matches with whom in the Spanish labour market," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 135-159.
    5. Tatjana Miljkovic & Daniel Fernández, 2018. "On Two Mixture-Based Clustering Approaches Used in Modeling an Insurance Portfolio," Risks, MDPI, vol. 6(2), pages 1-18, May.
    6. Pledger, Shirley & Arnold, Richard, 2014. "Multivariate methods using mixtures: Correspondence analysis, scaling and pattern-detection," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 241-261.
    7. Aurore Lomet & Gérard Govaert & Yves Grandvalet, 2018. "Model selection for Gaussian latent block clustering with the integrated classification likelihood," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 489-508, September.
    8. Eleni Matechou & Ivy Liu & Daniel Fernández & Miguel Farias & Bergljot Gjelsvik, 2016. "Biclustering Models for Two-Mode Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 611-624, September.
    9. Christian Carmona & Luis Nieto-Barajas & Antonio Canale, 2019. "Model-based approach for household clustering with mixed scale variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 559-583, June.
    10. Jacques, Julien & Biernacki, Christophe, 2018. "Model-based co-clustering for ordinal data," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 101-115.
    11. Wang, Ketong & Porter, Michael D., 2018. "Optimal Bayesian clustering using non-negative matrix factorization," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 395-411.
    12. Zhang, Q. & Ip, E.H., 2014. "Variable assessment in latent class models," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 146-156.
    13. Fang, Yixin & Wang, Junhui, 2011. "Penalized cluster analysis with applications to family data," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2128-2136, June.
    14. J. Vera & Rodrigo Macías & Willem Heiser, 2013. "Cluster Differences Unfolding for Two-Way Two-Mode Preference Rating Data," Journal of Classification, Springer;The Classification Society, vol. 30(3), pages 370-396, October.
    15. Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
    16. Mantas Svazas & Valentinas Navickas & Yuriy Bilan & Joanna Nakonieczny & Jana Spankova, 2021. "Biomass Clusterization from a Regional Perspective: The Case of Lithuania," Energies, MDPI, vol. 14(21), pages 1-15, October.
    17. Christophe Biernacki & Alexandre Lourme, 2019. "Unifying data units and models in (co-)clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 7-31, March.
    18. Csereklyei, Zsuzsanna & Thurner, Paul W. & Langer, Johannes & Küchenhoff, Helmut, 2017. "Energy paths in the European Union: A model-based clustering approach," Energy Economics, Elsevier, vol. 65(C), pages 442-457.
    19. Lo, Yungtai, 2011. "Bias from misspecification of the component variances in a normal mixture," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2739-2747, September.
    20. Fengqin Tang & Chunning Wang & Jinxia Su & Yuanyuan Wang, 2020. "Spectral clustering-based community detection using graph distance and node attributes," Computational Statistics, Springer, vol. 35(1), pages 69-94, March.
