Authors
- Mohd Nazrien Zaraini (Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka)
- Noorrezam Yusop (Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka)
Abstract
The classification of educational websites has become increasingly challenging, as traditional indicators such as domain extensions no longer reliably reflect a site’s purpose. This study investigates the optimal number of TF-IDF-ranked keywords (K) required to balance classification accuracy and computational efficiency in a one-class setting. Using a curated dataset sourced from the DMOZ directory and verified educational websites, multiple Top-K keyword subsets (K = 10–200) were evaluated. A One-Class Support Vector Machine (SVM) was employed, with performance assessed through cross-validation and separate positive/negative test sets. Results indicate that classification accuracy peaks within the range of K = 30–100, with diminished performance beyond this range due to the inclusion of irrelevant or noisy terms. These findings offer a practical and scalable framework for content-based educational website classification, particularly for applications in low-resource environments, and challenge the default reliance on exhaustive keyword feature sets.
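The Top-K keyword selection described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how TF-IDF scores are aggregated across documents, so ranking terms by their mean TF-IDF over the corpus is an assumption here, as are the smoothed IDF formula, the simple alphabetic tokenizer, the function names, and the toy documents. Stop-word removal and the One-Class SVM step are omitted; the resulting K-dimensional vectors would be the input to that classifier.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, keep alphabetic runs only, no stop-word removal.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def mean_tfidf(docs):
    """Return {term: mean TF-IDF over all docs} for a list of raw texts."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    totals = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF (an assumed variant, not taken from the paper).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            totals[term] += tf * idf
    return {term: total / n_docs for term, total in totals.items()}


def top_k_keywords(docs, k):
    """Rank terms by mean TF-IDF (ties broken alphabetically), keep the first k."""
    scores = mean_tfidf(docs)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def to_feature_vector(text, keywords):
    """Project one document onto the selected keywords (relative frequencies)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n_tokens = len(tokens) or 1
    return [counts[kw] / n_tokens for kw in keywords]


# Hypothetical toy corpus standing in for the educational (positive) class.
educational_docs = [
    "university course syllabus and lecture notes",
    "school course enrollment and exam schedule",
    "online course with lecture videos and quizzes",
]

keywords = top_k_keywords(educational_docs, k=5)
vector = to_feature_vector("course exam timetable", keywords)
print(keywords)
print(vector)
```

Repeating the selection for several values of K (the study evaluates K = 10 to 200) and training a one-class model on each reduced feature set is one way to compare accuracy against feature count.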
Suggested Citation
Mohd Nazrien Zaraini & Noorrezam Yusop, 2025.
"How Many Keywords are Enough? Determining the Optimal Top-K for Educational Website Classification,"
International Journal of Research and Innovation in Social Science (IJRISS), vol. 9(6), pages 1574-1586, June.
Handle:
RePEc:bcp:journl:v:9:y:2025:issue-6:p:1574-1586