Authors
- Abdul Muqtasid bin Rushdi
- Mohammad bin Hossin
- Suhaila binti Saee
- Norita binti Md Norwawi
Abstract
Background: The k-nearest neighbours (k-NN) algorithm is a well-established classifier in machine learning. However, its accuracy drops and its computational cost rises on large or redundant datasets. Furthermore, current instance selection (IS) approaches often face scalability problems and are sensitive to parameter settings.

Objective: This study seeks to design a straightforward and efficient IS algorithm that reduces both dataset size and computational demands while preserving or enhancing the accuracy of k-NN classification.

Methods: We propose Euclidean ranking-based instance selection (ERbIS), a novel IS approach that prioritises samples based on their Euclidean distance from a single anchor point. Two anchor points are introduced in this study: the first data point (FD) and the mean of each column (MEC). Both ERbIS models (ERbIS-FD and ERbIS-MEC) are evaluated across 21 KEEL datasets. For performance comparison, the ERbIS models are benchmarked against current and state-of-the-art methods, including the condensed nearest neighbour rule (CNN), the edited nearest neighbour rule (ENN), the adaptive threshold-based instance selection algorithm (ATISA1), the decremental reduction optimization procedure (DROP3) and ranking-based instance selection (RIS1). The evaluation focuses on reduction speed, reduction rate and k-NN classification accuracy.

Results: The ERbIS models reduce dataset size by an average of 35 to 40% without compromising accuracy compared to the original k-NN and state-of-the-art IS models. Both ERbIS models also demonstrate superior computational efficiency in the reduction process relative to ENN and CNN. Notably, the ERbIS-MEC variant, which uses the mean of each column as the anchor point, achieves the highest generalisation accuracy among all current and state-of-the-art models.

Conclusion: ERbIS offers an efficient and scalable approach to instance selection for k-NN classification, achieving significant data reduction and enhanced predictive accuracy with minimal parameter tuning. The model demonstrates strong potential for application to large datasets and may be further improved by investigating alternative distance metrics or integrating hybrid instance selection strategies.
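The ranking step described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies only the two anchor points (FD and MEC) and the ranking by Euclidean distance. The function names, the `keep_ratio` parameter and the rule of keeping evenly spaced ranks are assumptions made here purely for illustration, since the abstract does not say how instances are chosen once they are ranked.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def anchor_point(X: Sequence[Row], kind: str) -> List[float]:
    """Return the anchor: 'FD' is the first data row, 'MEC' the mean of each column."""
    if kind == "FD":
        return list(X[0])
    if kind == "MEC":
        n = len(X)
        return [sum(col) / n for col in zip(*X)]
    raise ValueError(f"unknown anchor kind: {kind!r}")


def rank_by_distance(X: Sequence[Row], kind: str = "MEC") -> List[int]:
    """Row indices sorted by ascending Euclidean distance from the anchor."""
    a = anchor_point(X, kind)
    return sorted(range(len(X)), key=lambda i: math.dist(X[i], a))


def select_instances(X: Sequence[Row], kind: str = "MEC",
                     keep_ratio: float = 0.6) -> List[int]:
    """ASSUMPTION (not from the paper): keep evenly spaced ranks so that the
    retained subset spans the whole near-to-far range of distances."""
    ranked = rank_by_distance(X, kind)
    n_keep = min(len(ranked), max(1, round(keep_ratio * len(ranked))))
    step = len(ranked) / n_keep
    return [ranked[int(j * step)] for j in range(n_keep)]


def reduction_rate(n_original: int, n_selected: int) -> float:
    """Fraction of the original instances that were removed."""
    return 1.0 - n_selected / n_original
```

In this sketch a `keep_ratio` of 0.60 to 0.65 would correspond to the 35 to 40% average reduction reported in the abstract. The selected indices would then be used to build the reduced training set passed to a k-NN classifier.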
Suggested Citation
Abdul Muqtasid bin Rushdi & Mohammad bin Hossin & Suhaila binti Saee & Norita binti Md Norwawi, .
"Similarity Ranking-Based Instance Selection for Enhancing k-NN Classification Performances,"
Acta Informatica Pragensia, Prague University of Economics and Business, vol. 0.
Handle: RePEc:prg:jnlaip:v:preprint:id:310
DOI: 10.18267/j.aip.310