Printed from https://ideas.repec.org/a/spr/jcsosc/v6y2023i1d10.1007_s42001-022-00195-3.html

School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach

Authors

  • Hazal Colak Oz (Development Analytics)
  • Çiçek Güven (Tilburg University)
  • Gonzalo Nápoles (Tilburg University)

Abstract

Designing early warning systems through machine learning (ML) models to identify students at risk of dropout can improve targeting mechanisms and lead to efficient social policy interventions in education. School dropout is the culmination of various factors that drive children to leave school, and timely policy responses are needed to address these underlying factors and improve school retention over time. However, applying ML approaches to school dropout prediction is challenging, especially in low-income countries, where data collection and management systems are more prone to financial and technical constraints. For this reason, this study proposes using already collected household panel data to predict the probability of school dropout and to explore feature importance for primary school children in Malawi through ML models. A rich set of variables is obtained from the household data and used to build Random Forest (RF), least absolute shrinkage and selection operator (LASSO), Ridge and multilayer neural network (MNN) models. The study further explores how performance metrics differ when the training samples' weights, which represent frequency in the sampling design, are embedded into the cost function of these ML models, in order to discuss the implications of using household data in computational social science. LASSO and MNN models trained with sample weights stand out for their higher recall rates of 80.6% and 78.8%, respectively. Compared to the baseline model trained with sample weights, the gain in recall is roughly 56 percentage points with LASSO and 54 percentage points with MNN. Also, comparing LASSO and MNN trained with and without sample weights reveals that training with sample weights increases the recall rate by roughly 11 percentage points for LASSO and 12 percentage points for MNN.
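To make the idea of embedding sample weights into a cost function concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a binary dropout label, a weighted binary cross-entropy as the cost function, and a 0.5 decision threshold; the toy labels, probabilities and weights are invented for illustration. The abstract does not state whether recall itself was computed with weights, so the weighted recall shown here is an assumption.

```python
import math

def weighted_log_loss(y_true, p_pred, weights):
    """Weighted binary cross-entropy.

    Each observation's loss is multiplied by its survey weight, so a
    household that represents more of the population counts for more.
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)

def recall(y_true, y_pred, weights=None):
    """Share of actual dropouts (label 1) that the model flags.

    With weights, each child contributes its weight instead of a count of 1.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    tp = sum(w for y, yh, w in zip(y_true, y_pred, weights) if y == 1 and yh == 1)
    fn = sum(w for y, yh, w in zip(y_true, y_pred, weights) if y == 1 and yh == 0)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Invented toy data: 1 = dropped out, 0 = stayed in school.
y_true = [1, 0, 1, 0, 1]
p_pred = [0.9, 0.2, 0.4, 0.1, 0.7]    # hypothetical predicted probabilities
weights = [1.0, 2.0, 3.0, 1.0, 1.0]   # hypothetical sampling weights
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]  # assumed 0.5 threshold

print("unweighted loss:", weighted_log_loss(y_true, p_pred, [1.0] * len(y_true)))
print("weighted loss:  ", weighted_log_loss(y_true, p_pred, weights))
print("unweighted recall:", recall(y_true, y_pred))
print("weighted recall:  ", recall(y_true, y_pred, weights))
```

In this toy case the missed dropout carries the largest weight, so both the loss and the recall change once weights are applied. A model fitted to the weighted loss would therefore be pushed harder to classify that observation correctly.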
Lastly, the paper provides a comprehensive and unified approach to interpreting the models, using a game-theoretic method, SHapley Additive exPlanations (SHAP), to quantify feature importance. The results show that socio-economic characteristics of children, such as working in household farming and father's education level, are among the most important features contributing to the probability of school dropout in the ML models. This study argues that the weighted sample structure of household data and its wide range of variables, explored through the SHAP method for feature importance, can enrich the literature and yield valuable results to harness data science for society.
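SHAP builds on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a single prediction by enumerating feature subsets. It is a toy illustration of the underlying idea, not the `shap` library and not the paper's models. The function `toy_risk`, its coefficients, the feature values and the baseline are all hypothetical, chosen only so the arithmetic is easy to follow. Features outside a subset are set to their baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    A feature's value is its average marginal contribution over all
    subsets of the other features. Features not in a subset take their
    baseline value (an assumed simplification).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            for s in combinations(others, k):
                s = set(s)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_risk(z):
    """Hypothetical linear risk score; coefficients are invented."""
    works_on_farm, father_edu_years, age = z
    return 0.10 + 0.30 * works_on_farm - 0.02 * father_edu_years + 0.01 * age

x = [1, 2, 12]         # hypothetical child: farm work, 2 years of father's education, age 12
baseline = [0, 6, 10]  # hypothetical reference child

phi = shapley_values(toy_risk, x, baseline)
for name, contribution in zip(["works_on_farm", "father_edu_years", "age"], phi):
    print(name, round(contribution, 4))
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that gives SHAP its name. Exact enumeration is exponential in the number of features, so it is only practical for small examples like this one.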

Suggested Citation

  • Hazal Colak Oz & Çiçek Güven & Gonzalo Nápoles, 2023. "School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach," Journal of Computational Social Science, Springer, vol. 6(1), pages 245-287, April.
  • Handle: RePEc:spr:jcsosc:v:6:y:2023:i:1:d:10.1007_s42001-022-00195-3
    DOI: 10.1007/s42001-022-00195-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s42001-022-00195-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



