IDEAS home Printed from https://ideas.repec.org/p/ven/wpaper/201305.html

On the comparison of model-based clustering solutions

Authors
  • Stefano Tonellato

    (Department of Economics, University of Venice Ca' Foscari)

  • Andrea Pastore

    (Department of Economics, University of Venice Ca' Foscari)

Abstract

In this paper we propose a new similarity index that can be used to compare model-based clustering solutions. We also define an adjusted-for-chance version, although we advise that, whenever feasible, bootstrap replications should be preferred to chance-corrected similarity indices. We describe the properties of the proposed index and of its chance-corrected version. Finally, we present some applications to simulated and real data.
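The abstract does not give the definition of the proposed index, so the sketch below does not implement it. It shows instead what "adjusted-for-chance" means for a classical pair-counting index: the Rand index and the Hubert-Arabie adjusted Rand index (Hubert & Arabie, 1985, listed among the references below), both computed from the contingency table of two hard partitions of the same objects. The function names are illustrative only; the chance correction follows the standard form (index - expected) / (max - expected).

```python
from collections import Counter
from math import comb


def _pair_counts(labels_a, labels_b):
    """Return (sum_ij, sum_a, sum_b, total) pair counts for two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")
    table = Counter(zip(labels_a, labels_b))  # contingency counts n_ij
    sum_ij = sum(comb(c, 2) for c in table.values())  # pairs together in both
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())  # together in A
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())  # together in B
    return sum_ij, sum_a, sum_b, comb(n, 2)


def rand_index(labels_a, labels_b):
    """Share of object pairs on which the two partitions agree."""
    sum_ij, sum_a, sum_b, total = _pair_counts(labels_a, labels_b)
    apart_both = total - sum_a - sum_b + sum_ij  # pairs split in both
    return (sum_ij + apart_both) / total


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie chance-corrected Rand index."""
    sum_ij, sum_a, sum_b, total = _pair_counts(labels_a, labels_b)
    expected = sum_a * sum_b / total  # expectation under fixed marginals
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate marginals; convention: return 1
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 0, 1, 2]
    print(rand_index(a, b), adjusted_rand_index(a, b))
```

For these two partitions the Rand index is 5/6 while the adjusted index is 4/7: the correction subtracts the agreement expected by chance given the cluster sizes. The abstract's recommendation is to assess such comparisons with bootstrap replications where feasible rather than rely on this kind of correction.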

Suggested Citation

  • Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
  • Handle: RePEc:ven:wpaper:2013:05

    Download full text from publisher

    File URL: http://www.unive.it/pag/fileadmin/user_upload/dipartimenti/economia/doc/Pubblicazioni_scientifiche/working_papers/2013/WP_DSE_pastore_tonellato_05_13.pdf
    File Function: First version, 2013
    Download Restriction: no

    References listed on IDEAS

    1. Chae, Seong S. & DuBien, Janice L. & Warde, William D., 2006. "A method of predicting the number of clusters using Rand's statistic," Computational Statistics & Data Analysis, Elsevier, vol. 50(12), pages 3531-3546, August.
    2. Fraley C. & Raftery A.E., 2002. "Model-Based Clustering, Discriminant Analysis, and Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 611-631, June.
    3. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    4. Matthijs Warrens, 2008. "On the Equivalence of Cohen’s Kappa and the Hubert-Arabie Adjusted Rand Index," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 177-183, November.
    5. Michael Brusco & J. Cradit, 2001. "A variable-selection heuristic for K-means clustering," Psychometrika, Springer;The Psychometric Society, vol. 66(2), pages 249-270, June.
    6. Matthijs Warrens, 2008. "On Similarity Coefficients for 2×2 Tables and Correction for Chance," Psychometrika, Springer;The Psychometric Society, vol. 73(3), pages 487-502, September.
    7. James G.M. & Sugar C.A., 2003. "Clustering for Sparsely Sampled Functional Data," Journal of the American Statistical Association, American Statistical Association, vol. 98, pages 397-408, January.
    8. Ahmed N. Albatineh & Magdalena Niewiadomska-Bugaj & Daniel Mihalko, 2006. "On Similarity Indices and Correction for Chance Agreement," Journal of Classification, Springer;The Classification Society, vol. 23(2), pages 301-313, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    2. Matthijs J. Warrens, 2014. "New Interpretations of Cohen’s Kappa," Journal of Mathematics, Hindawi, vol. 2014, pages 1-9, September.
    3. Coffey, N. & Hinde, J. & Holian, E., 2014. "Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 14-29.
    4. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    5. Matthijs J. Warrens & Hanneke Hoef, 2022. "Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 487-509, November.
    6. Liu, Xueli & Yang, Mark C.K., 2009. "Simultaneous curve registration and clustering for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1361-1376, February.
    7. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    8. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    9. Theresa Ullmann & Anna Beer & Maximilian Hünemörder & Thomas Seidl & Anne-Laure Boulesteix, 2023. "Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 211-238, March.
    10. Wu, Han-Ming, 2011. "On biological validity indices for soft clustering algorithms for gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1969-1979, May.
    11. Aurora Torrente & Juan Romo, 2021. "Initializing k-means Clustering by Bootstrap and Data Depth," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 232-256, July.
    12. Martina Sundqvist & Julien Chiquet & Guillem Rigaill, 2023. "Adjusting the adjusted Rand Index," Computational Statistics, Springer, vol. 38(1), pages 327-347, March.
    13. José E. Chacón & Ana I. Rastrojo, 2023. "Minimum adjusted Rand index for two clusterings of a given size," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 125-133, March.
    14. Shaikh Mateen & McNicholas Paul D & Desmond Anthony F, 2010. "A Pseudo-EM Algorithm for Clustering Incomplete Longitudinal Data," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-17, March.
    15. J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
    16. De la Cruz-Mesia, Rolando & Quintana, Fernando A. & Marshall, Guillermo, 2008. "Model-based clustering for longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1441-1457, January.
    17. Michael Brusco & Douglas Steinley, 2015. "Affinity Propagation and Uncapacitated Facility Location Problems," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 443-480, October.
    18. Galimberti, Giuliano & Soffritti, Gabriele, 2014. "A multivariate linear regression analysis using finite mixtures of t distributions," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 138-150.
    19. Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
    20. Michael C. Thrun & Alfred Ultsch, 2021. "Using Projection-Based Clustering to Find Distance- and Density-Based Clusters in High-Dimensional Data," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 280-312, July.

    More about this item

    Keywords

    similarity indices; Rand index; mixture models; bootstrap

    JEL classification:

    • C39 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Other
    • C46 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Specific Distributions
    • C49 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Other


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:ven:wpaper:2013:05. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Geraldine Ludbrook. General contact details of provider: https://edirc.repec.org/data/dsvenit.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.