Authors
- Said El Kafhali
(Computer, Networks, Modeling, and Mobility Laboratory (IR2M), Faculty of Sciences and Techniques, Hassan First University of Settat, Settat 26000, Morocco)
- Zakaria Soufiane Hafdi
(Computer, Networks, Modeling, and Mobility Laboratory (IR2M), Faculty of Sciences and Techniques, Hassan First University of Settat, Settat 26000, Morocco)
Abstract
Early prediction of academic outcomes is vital to enabling timely intervention, supporting at-risk students, and improving educational planning and institutional performance. However, this task becomes particularly challenging when data availability is limited, as in small or graduate-level programs. This study explores the potential of data augmentation techniques, specifically the Synthetic Minority Oversampling Technique (SMOTE), to enhance the performance of machine learning models applied to such constrained educational datasets. We conduct a comparative analysis using four datasets derived from prior research, each representing a distinct educational use case: one focused on predicting academic success in graduate programs, another on student dropout in virtual learning environments, a third on dissertation performance prediction, and a fourth addressing multi-class performance prediction in undergraduate coding courses. By applying the same machine learning methods to the original and augmented datasets, we systematically evaluate the impact of data augmentation on classification performance using accuracy, precision, recall, and the F1 score. The results demonstrate marked improvements, with accuracy increases of up to 21% and precision gains exceeding 25% in some models, notably k-nearest neighbors (KNN) and the multilayer perceptron (MLP). While not all algorithms benefit equally, our findings highlight data augmentation as a practical and impactful strategy for improving early prediction capabilities in Educational Data Mining (EDM). By leveraging multiple datasets and diverse educational contexts, this contribution provides robust evidence supporting the broader goal of enhancing decision-making and personalized support in digital learning environments.
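The oversampling idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which this record does not describe. It assumes the standard SMOTE interpolation step (a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours) and shows how precision, recall, and F1 are computed for one class. The function names `smote` and `precision_recall_f1` and the toy points are hypothetical and are not taken from the study's datasets.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative helper (hypothetical name), not the study's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point
        # (Euclidean distance), excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy minority class (hypothetical data): four points on the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = minority_points + smote(minority_points, n_new=6, k=2, seed=1)
```

In a comparison like the one the abstract describes, oversampling would normally be applied to the training split only, so that the reported metrics are computed on real, unaugmented test samples.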
Suggested Citation
Said El Kafhali & Zakaria Soufiane Hafdi, 2026.
"Enhancing Early Academic Outcome Prediction in Small Educational Datasets Through Data Augmentation Techniques,"
Data, MDPI, vol. 11(7), pages 1-28, July.
Handle:
RePEc:gam:jdataj:v:11:y:2026:i:7:p:161-:d:1980109