Author
Listed:
- Polina Bombina
- Dwayne Tally
- Zachary B Abrams
- Kevin R Coombes
Abstract
Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed. Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
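The abstract's central idea, using per-element silhouette widths to raise the mean silhouette width, can be illustrated with a small sketch. This is not the SillyPutty package's code and the abstract does not specify the update rule. The sketch assumes one plausible rule: repeatedly move the element with the most negative silhouette width to its nearest neighboring cluster until no width is negative. The function names `silhouette_widths` and `reassign_by_silhouette`, the `max_iter` parameter, and the toy data are all hypothetical.

```python
def silhouette_widths(dist, labels):
    """Return (widths, nearest) for every element.

    dist   : full symmetric distance matrix (list of lists)
    labels : current cluster label of each element
    widths[i]  is the silhouette width s(i) = (b - a) / max(a, b)
    nearest[i] is the closest cluster other than the element's own
    """
    n = len(labels)
    clusters = sorted(set(labels))
    widths, nearest = [], []
    for i in range(n):
        # Mean distance from element i to each cluster, excluding i itself.
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        others = {c: d for c, d in mean_to.items() if c != own}
        # Convention: singletons (and the one-cluster case) get width 0.
        if own not in mean_to or not others:
            widths.append(0.0)
            nearest.append(own)
            continue
        a = mean_to[own]
        b_cluster = min(others, key=others.get)
        b = others[b_cluster]
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
        nearest.append(b_cluster)
    return widths, nearest


def reassign_by_silhouette(dist, labels, max_iter=100):
    """Assumed update rule (not confirmed by the abstract): move the
    worst-clustered element to its nearest cluster while its width is < 0."""
    labels = list(labels)
    for _ in range(max_iter):
        widths, nearest = silhouette_widths(dist, labels)
        worst = min(range(len(labels)), key=widths.__getitem__)
        if widths[worst] >= 0:
            break  # no element is closer to another cluster than its own
        labels[worst] = nearest[worst]
    return labels


# Toy one-dimensional data: the point at 2 starts in the wrong cluster.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
start = [0, 0, 1, 1, 1, 1]
final = reassign_by_silhouette(dist, start)
print(final)
```

Because the starting partition can come from any method, the same loop could in principle be run on the output of hierarchical clustering, which is the combination the abstract reports as performing best.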
Suggested Citation
Polina Bombina & Dwayne Tally & Zachary B Abrams & Kevin R Coombes, 2024.
"SillyPutty: Improved clustering by optimizing the silhouette width,"
PLOS ONE, Public Library of Science, vol. 19(6), pages 1-17, June.
Handle:
RePEc:plo:pone00:0300358
DOI: 10.1371/journal.pone.0300358