
A decision support framework to incorporate textual data for early student dropout prediction in higher education

Authors
  • Minh Phan

    (LEM - Lille économie management - UMR 9221 - UA - Université d'Artois - UCL - Université catholique de Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique)

  • Arno de Caigny

    (LEM - Lille économie management - UMR 9221 - UA - Université d'Artois - UCL - Université catholique de Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique)

  • Kristof Coussement

    (LEM - Lille économie management - UMR 9221 - UA - Université d'Artois - UCL - Université catholique de Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique)

Abstract

Managing student dropout in higher education is critical, considering its substantial impacts on students' lives, academic institutions, and society as a whole. Using predictive modeling can be instrumental for this task, as a means to identify dropouts proactively on the basis of student characteristics and their academic performance. To enhance these predictions, textual student feedback also might be relevant; this article proposes a hybrid decision support framework that combines predictive modeling with student segmentation efforts. A real-life data set from a French higher education institution, containing information on 14,391 students and 62,545 feedback documents, confirms the superior performance of the proposed framework, in terms of the area under the curve and top decile lift, compared with various benchmarks. In contributing to decision support system research, this study (1) proposes a new framework for automatic, data-driven segmentation of students based on textual data; (2) compares multiple text representation methods and confirms that incorporating student textual feedback data improves the predictive performance of student dropout models; and (3) establishes useful insights to help decision-makers anticipate and manage student dropout behaviors.
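
The abstract reports performance as area under the curve (AUC) and top decile lift. The sketch below shows how these two metrics are commonly computed for a binary dropout label. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the 0/1 label encoding (1 = dropout), and the handling of ties are all choices made here.

```python
from typing import Sequence


def top_decile_lift(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Dropout rate among the 10% highest-scored students divided by
    the overall dropout rate. Labels are assumed to be 0/1 (1 = dropout)."""
    n = len(y_true)
    if n == 0 or n != len(scores):
        raise ValueError("y_true and scores must be non-empty and equal in length")
    base_rate = sum(y_true) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no dropouts")
    # Size of the top decile; at least one student for very small samples.
    k = max(1, n // 10)
    # Rank students from highest to lowest predicted dropout score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen dropout is scored above a randomly
    chosen non-dropout; ties count as one half. Quadratic in sample size,
    so intended for illustration rather than large data sets."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one dropout and one non-dropout")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A lift of 1 means the top-scored decile contains dropouts at the same rate as the whole population; values above 1 mean the model concentrates dropouts among the students it flags first. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.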

Suggested Citation

  • Minh Phan & Arno de Caigny & Kristof Coussement, 2023. "A decision support framework to incorporate textual data for early student dropout prediction in higher education," Post-Print hal-04274684, HAL.
  • Handle: RePEc:hal:journl:hal-04274684
    DOI: 10.1016/j.dss.2023.113940
    Note: View the original document on HAL open archive server: https://hal.science/hal-04274684v1

    Download full text from publisher

    File URL: https://hal.science/hal-04274684v1/document
    Download Restriction: no



    Citations

    Cited by:

    1. Badiee, Aghdas & Moshtari, Mohammad & Berenguer, Gemma, 2024. "A systematic review of operations research and management science modeling techniques in the study of higher education institutions," Socio-Economic Planning Sciences, Elsevier, vol. 93(C).
    2. Alaa Marshan & Farah Nasreen Mohamed Nizar & Athina Ioannou & Konstantina Spanaki, 2025. "Comparing Machine Learning and Deep Learning Techniques for Text Analytics: Detecting the Severity of Hate Comments Online," Information Systems Frontiers, Springer, vol. 27(2), pages 487-505, April.
    3. Thuy, Arthur & Benoit, Dries F., 2024. "Explainability through uncertainty: Trustworthy decision-making with neural networks," European Journal of Operational Research, Elsevier, vol. 317(2), pages 330-340.

