Authors Listed:
- Roberto Douglas G. de Aquino
(Department of Computer Systems, University of Sao Paulo, Sao Carlos 13566-590, SP, Brazil)
- Filipe A. N. Verri
(Computer Science Division, Aeronautics Institute of Technology, Sao Jose dos Campos 12228-900, SP, Brazil)
- Renato Cordeiro de Amorim
(School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK)
- Vitor V. Curtis
(Computer Science Division, Aeronautics Institute of Technology, Sao Jose dos Campos 12228-900, SP, Brazil)
Abstract
Cluster validation for categorical data remains a critical challenge in unsupervised learning, where traditional distance-based indices often fail to capture meaningful structures. This paper introduces SLEDgeHammer (SLEDgeH), an enhanced internal validation index that addresses these limitations through the optimized weighting of semantic descriptors derived from frequent patterns. Building upon the SLEDge framework, the proposed method systematically combines four indicators—Support, Length, Exclusivity, and Difference—using weight optimization to improve cluster discrimination, particularly in sparse and imbalanced scenarios. Unlike conventional methods, SLEDgeH does not rely on distance metrics; instead, it leverages the statistical prevalence and uniqueness of feature combinations within clusters. Through extensive experiments on 3600 synthetic categorical data sets and 18 real-world data sets, we demonstrate that SLEDgeH achieves significantly higher accuracy in identifying the optimal number of clusters and exhibits greater robustness, with lower standard deviation, compared to existing indices. Additionally, the index provides inherent interpretability by generating semantic cluster descriptions, making it a practical tool for supporting decision making in categorical data analysis.
Suggested Citation
Roberto Douglas G. de Aquino & Filipe A. N. Verri & Renato Cordeiro de Amorim & Vitor V. Curtis, 2025.
"SLEDgeHammer: A Frequent Pattern-Based Cluster Validation Index for Categorical Data,"
Mathematics, MDPI, vol. 13(17), pages 1-31, September.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:17:p:2832-:d:1740875