SillyPutty: Improved clustering by optimizing the silhouette width

Author

Listed:
  • Polina Bombina
  • Dwayne Tally
  • Zachary B Abrams
  • Kevin R Coombes

Abstract

Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.

Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
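
The idea of using per-element silhouette widths to improve a clustering can be illustrated with a small sketch. This is not the SillyPutty package's code or API: it is a pure-Python illustration that assumes Euclidean distance and a greedy rule (move the element with the most negative silhouette width to its nearest neighbouring cluster, then repeat), which the abstract does not spell out. The function names `silhouette_widths` and `reassign_worst` are hypothetical.

```python
from math import dist


def silhouette_widths(points, labels):
    """Return (widths, nearest): the silhouette width s(i) = (b - a) / max(a, b)
    of each point and the label of its closest other cluster."""
    clusters = sorted(set(labels))
    widths, nearest = [], []
    for i, p in enumerate(points):
        # Mean distance from point i to every cluster, excluding i itself.
        mean_d = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                mean_d[c] = sum(dist(p, points[j]) for j in members) / len(members)
        own = labels[i]
        others = {c: d for c, d in mean_d.items() if c != own}
        if own not in mean_d or not others:
            # Singleton cluster, or only one cluster overall: width set to 0.
            widths.append(0.0)
            nearest.append(own)
            continue
        a = mean_d[own]                         # cohesion within own cluster
        b_cluster = min(others, key=others.get)
        b = others[b_cluster]                   # separation from nearest cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        nearest.append(b_cluster)
    return widths, nearest


def reassign_worst(points, labels, max_iter=100):
    """Assumed greedy rule: repeatedly move the worst-scoring element to its
    nearest cluster until no silhouette width is negative."""
    labels = list(labels)  # work on a copy
    for _ in range(max_iter):
        widths, nearest = silhouette_widths(points, labels)
        worst = min(range(len(points)), key=widths.__getitem__)
        if widths[worst] >= 0:
            break
        labels[worst] = nearest[worst]
    return labels


# Two well-separated groups; the third point starts in the wrong cluster.
example_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
example_start = [0, 0, 1, 1, 1, 1]
print(reassign_worst(example_points, example_start))
```

In this toy example the mislabeled point is the only one with a negative silhouette width, so a single reassignment is enough. Because the sketch accepts any starting labels, it could in principle be seeded from another method's output, in the spirit of the hierarchical-clustering-then-SillyPutty combination reported above.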

Suggested Citation

  • Polina Bombina & Dwayne Tally & Zachary B Abrams & Kevin R Coombes, 2024. "SillyPutty: Improved clustering by optimizing the silhouette width," PLOS ONE, Public Library of Science, vol. 19(6), pages 1-17, June.
  • Handle: RePEc:plo:pone00:0300358
    DOI: 10.1371/journal.pone.0300358

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0300358
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0300358&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Sayantan Kumar & Inez Y Oh & Suzanne E Schindler & Nupur Ghoshal & Zachary Abrams & Philip R O Payne, 2024. "Examining heterogeneity in dementia using data-driven unsupervised clustering of cognitive profiles," PLOS ONE, Public Library of Science, vol. 19(11), pages 1-19, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tsukioka, Yasutomo & Yanagi, Junya & Takada, Teruko, 2018. "Investor sentiment extracted from internet stock message boards and IPO puzzles," International Review of Economics & Finance, Elsevier, vol. 56(C), pages 205-217.
    2. Salvatore Barbaro & Anna-Sophie Kurella, 2025. "Dichotomous Preferences: Concepts, Measurement, and Evidence," Working Papers 2506, Gutenberg School of Management and Economics, Johannes Gutenberg-Universität Mainz.
    3. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
    4. Águeda Pacheco Melo Barreto & Cristiano Silva Moura & Maurício José Silva Cunha & Hasheem Mannan & Marcos Flávio Silveira Vasconcelos D´Angelo & Matheus Pereira Libório, 2025. "Mapping of Vulnerability Factors of Brazilian Adolescents on the Internet," Child Indicators Research, Springer;The International Society of Child Indicators (ISCI), vol. 18(4), pages 1657-1683, August.
    5. Jihane El Ouadi & Hanae Errousso & Nicolas Malhene & Siham Benhadou & Hicham Medromi, 2022. "A machine-learning based hybrid algorithm for strategic location of urban bundling hubs to support shared public transport," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(5), pages 3215-3258, October.
    6. Andrea S Martinez-Vernon & James A Covington & Ramesh P Arasaradnam & Siavash Esfahani & Nicola O’Connell & Ioannis Kyrou & Richard S Savage, 2018. "An improved machine learning pipeline for urinary volatiles disease detection: Diagnosing diabetes," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-20, September.
    7. Madhumita Sahoo & Aman Kasot & Anirban Dhar & Amlanjyoti Kar, 2018. "On Predictability of Groundwater Level in Shallow Wells Using Satellite Observations," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 32(4), pages 1225-1244, March.
    8. P. J. Zarco-Tejada & T. Poblete & C. Camino & V. Gonzalez-Dugo & R. Calderon & A. Hornero & R. Hernandez-Clemente & M. Román-Écija & M. P. Velasco-Amo & B. B. Landa & P. S. A. Beck & M. Saponari & D. , 2021. "Divergent abiotic spectral pathways unravel pathogen stress signals across species," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    9. Roy Cerqueti & Antonio Iovanella & Raffaele Mattera, 2024. "Clustering networked funded European research activities through rank-size laws," Annals of Operations Research, Springer, vol. 342(3), pages 1707-1735, November.
    10. Grubinger, Thomas & Zeileis, Achim & Pfeiffer, Karl-Peter, 2014. "evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 61(i01).
    11. Uwe Ligges & Sebastian Krey, 2011. "Feature clustering for instrument classification," Computational Statistics, Springer, vol. 26(2), pages 279-291, June.
    12. Arnout Van Messem & Andreas Christmann, 2010. "A review on consistency and robustness properties of support vector machines for heavy-tailed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 199-220, September.
    13. Ana Patrícia Rocha & Hugo Miguel Pereira Choupina & Maria do Carmo Vilas-Boas & José Maria Fernandes & João Paulo Silva Cunha, 2018. "System for automatic gait analysis based on a single RGB-D camera," PLOS ONE, Public Library of Science, vol. 13(8), pages 1-24, August.
    14. Huisheng Wu & Maogui Hu & Yaping Zhang & Yuan Han, 2021. "An Empirical Mode Decomposition for Establishing Spatiotemporal Air Quality Trends in Shandong Province, China," Sustainability, MDPI, vol. 13(22), pages 1-10, November.
    15. Shaobo Jin & Sebastian Ankargren, 2019. "Frequentist Model Averaging in Structural Equation Modelling," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 84-104, March.
    16. repec:plo:pone00:0176339 is not listed on IDEAS
    17. Tyler C Shimko & Erik C Andersen, 2014. "COPASutils: An R Package for Reading, Processing, and Visualizing Data from COPAS Large-Particle Flow Cytometers," PLOS ONE, Public Library of Science, vol. 9(10), pages 1-5, October.
    18. Zulj, Valentin & Jin, Shaobo, 2024. "Can model averaging improve propensity score based estimation of average treatment effects?," Working Paper Series 2024:1, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    19. Torgunn Aslaug Skjerve & Gunnar Klemetsdal & Bente Aspeholen Åby & Jon Kristian Sommerseth & Ulf Geir Indahl & Hanne Fjerdingby Olsen, 2025. "Using Density and Fuzzy Clustering for Data Cleaning and Segmental Description of Livestock Data," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 30(3), pages 870-885, September.
    20. Narjes Vara & Mahdieh Mirzabeigi & Hajar Sotudeh & Seyed Mostafa Fakhrahmad, 2022. "Application of k-means clustering algorithm to improve effectiveness of the results recommended by journal recommender system," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3237-3252, June.
    21. Tobias Rentschler & Philipp Gries & Thorsten Behrens & Helge Bruelheide & Peter Kühn & Steffen Seitz & Xuezheng Shi & Stefan Trogisch & Thomas Scholten & Karsten Schmidt, 2019. "Comparison of catchment scale 3D and 2.5D modelling of soil organic carbon stocks in Jiangxi Province, PR China," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-23, August.
