Printed from https://ideas.repec.org/a/spr/jclass/v38y2021i2d10.1007_s00357-020-09370-5.html

k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Author

Listed:
  • Andrzej Młodak

    (Statistical Office in Poznań
    The President Stanisław Wojciechowski State University of Applied Sciences in Kalisz, Inter-Faculty Department of Mathematics and Statistics)

Abstract

We analyze ways of using a contiguity (neighbourhood) matrix as a constraint in clustering by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments that aims to solve the multi-facility location problem (MFLP). To this end, special two-stage algorithms, which are forms of clustering with relational constraint, are proposed. They optimize the division of a set of objects into clusters while respecting the requirement that neighbours have to belong to the same cluster. For probabilistic d-clustering, a relevant modification of its target function is suggested and studied. A varied simulation study and an empirical analysis verify the practical efficiency of these methods. The quality of clustering is assessed using indices of homogeneity, heterogeneity and correctness of clusters, as well as the silhouette index. Using these tools and similarity indices (Rand, Peirce, and Sokal and Sneath), it is shown that probabilistic d-clustering can produce better results than Ward’s algorithm. Compared with the k-means approach, probabilistic d-clustering gives rather similar results but is more robust to the creation of trivial (including empty) clusters, and its results vary less across replications in terms of correctness; that is, it is more predictable in terms of clustering quality.
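
The abstract names the Rand index as one of the similarity indices used to compare partitions. As a minimal sketch of the standard pair-counting definition (the share of object pairs that two partitions treat the same way), and not the paper's own implementation, it could be computed as follows. The function name `rand_index` and the list-of-labels input format are assumptions made for illustration only.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreements += 1
        pairs += 1
    return agreements / pairs
```

Because only co-membership of pairs matters, the index does not depend on how clusters are numbered: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` treats the two partitions as identical. This plain version is not corrected for chance agreement.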

Suggested Citation

  • Andrzej Młodak, 2021. "k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 313-352, July.
  • Handle: RePEc:spr:jclass:v:38:y:2021:i:2:d:10.1007_s00357-020-09370-5
    DOI: 10.1007/s00357-020-09370-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-020-09370-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00357-020-09370-5?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Anuška Ferligoj & Vladimir Batagelj, 1982. "Clustering with relational constraint," Psychometrika, Springer;The Psychometric Society, vol. 47(4), pages 413-426, December.
    2. Kelejian, Harry H & Prucha, Ingmar R, 1998. "A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances," The Journal of Real Estate Finance and Economics, Springer, vol. 17(1), pages 99-121, July.
    3. Jan Kubacki & Alina Jędrzejczak, 2016. "Small Area Estimation Of Income Under Spatial Sar Model," Statistics in Transition New Series, Polish Statistical Association, vol. 17(3), pages 365-390, September.
    4. Adi Ben-Israel & Cem Iyigun, 2008. "Probabilistic D-Clustering," Journal of Classification, Springer;The Classification Society, vol. 25(1), pages 5-26, June.
    5. Monica Pratesi & Nicola Salvati, 2008. "Small area estimation: the EBLUP estimator based on spatially correlated random area effects," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 17(1), pages 113-141, February.
    6. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    7. Ahmed N. Albatineh & Magdalena Niewiadomska-Bugaj & Daniel Mihalko, 2006. "On Similarity Indices and Correction for Chance Agreement," Journal of Classification, Springer;The Classification Society, vol. 23(2), pages 301-313, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antonio D’Ambrosio & Sonia Amodio & Carmela Iorio & Giuseppe Pandolfo & Roberta Siciliano, 2021. "Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 112-128, April.
    2. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    3. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    4. Martina Sundqvist & Julien Chiquet & Guillem Rigaill, 2023. "Adjusting the adjusted Rand Index," Computational Statistics, Springer, vol. 38(1), pages 327-347, March.
    5. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    6. Matthijs J. Warrens & Hanneke Hoef, 2022. "Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 487-509, November.
    7. Jeffrey L. Andrews & Ryan Browne & Chelsey D. Hvingelby, 2022. "On Assessments of Agreement Between Fuzzy Partitions," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 326-342, July.
    8. Ekaterina Kovaleva & Boris Mirkin, 2015. "Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 414-442, October.
    9. Jonathon J. O’Brien & Michael T. Lawson & Devin K. Schweppe & Bahjat F. Qaqish, 2020. "Suboptimal Comparison of Partitions," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 435-461, July.
    10. Jędrzejczak Alina & Kubacki Jan, 2019. "Estimation Of Income Characteristics For Regions In Poland Using Spatio-Temporal Small Area Models," Statistics in Transition New Series, Statistics Poland, vol. 20(4), pages 113-134, December.
    11. Theresa Ullmann & Anna Beer & Maximilian Hünemörder & Thomas Seidl & Anne-Laure Boulesteix, 2023. "Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 211-238, March.
    12. José E. Chacón & Ana I. Rastrojo, 2023. "Minimum adjusted Rand index for two clusterings of a given size," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 125-133, March.
    13. Alina Jędrzejczak & Jan Kubacki, 2019. "Estimation Of Income Characteristics For Regions In Poland Using Spatio-Temporal Small Area Models," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 113-134, December.
    14. Matthijs Warrens, 2008. "On Similarity Coefficients for 2×2 Tables and Correction for Chance," Psychometrika, Springer;The Psychometric Society, vol. 73(3), pages 487-502, September.
    15. Valerie Robert & Yann Vasseur & Vincent Brault, 2021. "Comparing High-Dimensional Partitions with the Co-clustering Adjusted Rand Index," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 158-186, April.
    16. Carmela Iorio & Gianluca Frasso & Antonio D’Ambrosio & Roberta Siciliano, 2023. "Boosted-oriented probabilistic smoothing-spline clustering of series," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(4), pages 1123-1140, October.
    17. Johann Kraus & Christoph Müssel & Günther Palm & Hans Kestler, 2011. "Multi-objective selection for collecting cluster alternatives," Computational Statistics, Springer, vol. 26(2), pages 341-353, June.
    18. Ahmed Albatineh & Magdalena Niewiadomska-Bugaj, 2011. "Correcting Jaccard and other similarity indices for chance agreement in cluster analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(3), pages 179-200, October.
    19. Isabella Morlini & Sergio Zani, 2012. "A New Class of Weighted Similarity Indices Using Polytomous Variables," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 199-226, July.
    20. Tiziano Arduini & Eleonora Patacchini & Edoardo Rainone, 2020. "Treatment Effects With Heterogeneous Externalities," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 38(4), pages 826-838, October.
