Printed from https://ideas.repec.org/a/eee/deveco/v172y2025ics0304387824001342.html

Combining survey and census data for improved poverty prediction using semi-supervised deep learning

Author

Listed:
  • Echevin, Damien
  • Fotso, Guy
  • Bouroubi, Yacine
  • Coulombe, Harold
  • Li, Qing

Abstract

This paper presents a methodology for predicting poverty using semi-supervised learning techniques, specifically pseudo-labeling, and deep learning algorithms. Standard poverty prediction models rely on limited household survey data, whereas our approach exploits large amounts of unlabeled census data to improve prediction accuracy. By applying pseudo-labeling, we improve key performance metrics across various African regions, where our models outperform conventional approaches to identifying poor individuals. Deep neural networks (DNNs) trained on pseudo-labeled data exhibited area under the curve (AUC) scores ranging from 0.8 to over 0.9, a notable improvement over previous machine learning survey-based methods. Furthermore, random undersampling was key to refining model performance, balancing higher coverage with some reduction in precision. These findings have significant implications for poverty targeting, enabling more accurate identification of poor individuals and supporting better resource allocation.
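The abstract names three ingredients: pseudo-labeling of unlabeled census records, random undersampling, and evaluation by AUC. The sketch below shows how they could fit together. It is not the authors' implementation. It assumes a plain logistic regression as a stand-in for the deep neural network, purely synthetic data with two made-up household features, and a hypothetical confidence threshold of 0.9 for accepting a pseudo-label. All function names are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=200, lr=0.1):
    # Batch gradient descent on the log-loss (assumed stand-in for the DNN).
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def undersample(X, y, rng):
    # Randomly drop rows of the larger class until both classes are equal in size.
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    k = min(len(pos), len(neg))
    keep = rng.sample(pos, k) + rng.sample(neg, k)
    return [X[i] for i in keep], [y[i] for i in keep]


def auc(y_true, scores):
    # Probability that a random positive scores above a random negative (ties count 0.5).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pseudo_label_fit(X_lab, y_lab, X_unlab, rng, threshold=0.9):
    # 1) Train on labeled survey rows, 2) label confident census rows, 3) retrain.
    base = fit_logistic(*undersample(X_lab, y_lab, rng))
    X_aug, y_aug = list(X_lab), list(y_lab)
    for xi, p in zip(X_unlab, predict_proba(base, X_unlab)):
        if p >= threshold or p <= 1.0 - threshold:
            X_aug.append(xi)
            y_aug.append(1 if p >= 0.5 else 0)
    model = fit_logistic(*undersample(X_aug, y_aug, rng))
    return model, len(y_aug) - len(y_lab)


def make_data(n, rng):
    # Synthetic households: about 25% poor, two features shifted by poverty status.
    X, y = [], []
    for _ in range(n):
        poor = 1 if rng.random() < 0.25 else 0
        X.append([rng.gauss(1.0 if poor else -1.0, 1.0),
                  rng.gauss(0.5 if poor else -0.5, 1.0)])
        y.append(poor)
    return X, y


rng = random.Random(0)
X_survey, y_survey = make_data(300, rng)   # small labeled "survey"
X_census, _ = make_data(2000, rng)         # large "census", labels discarded
X_test, y_test = make_data(500, rng)       # held-out evaluation set

base_model = fit_logistic(*undersample(X_survey, y_survey, rng))
semi_model, n_pseudo = pseudo_label_fit(X_survey, y_survey, X_census, rng)
auc_base = auc(y_test, predict_proba(base_model, X_test))
auc_semi = auc(y_test, predict_proba(semi_model, X_test))
print(f"pseudo-labeled rows: {n_pseudo}")
print(f"AUC survey-only: {auc_base:.3f}  AUC with pseudo-labels: {auc_semi:.3f}")
```

On synthetic data this simple, the two AUC values will typically be close, so the sketch illustrates the mechanics rather than the gains reported in the paper. Undersampling is applied before each fit so that the minority (poor) class is not swamped, which is one plausible reading of the coverage-versus-precision trade-off the abstract mentions.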

Suggested Citation

  • Echevin, Damien & Fotso, Guy & Bouroubi, Yacine & Coulombe, Harold & Li, Qing, 2025. "Combining survey and census data for improved poverty prediction using semi-supervised deep learning," Journal of Development Economics, Elsevier, vol. 172(C).
  • Handle: RePEc:eee:deveco:v:172:y:2025:i:c:s0304387824001342
    DOI: 10.1016/j.jdeveco.2024.103385

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304387824001342
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hai‐Anh H. Dang & Talip Kilic & Kseniya Abanokova & Calogero Carletto, 2025. "Poverty Imputation in Contexts Without Consumption Data: A Revisit With Further Refinements," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 71(1), February.
    2. Dang, Hai-Anh H & Kilic, Talip & Hlasny, Vladimir & Abanokova, Kseniya & Carletto, Calogero, 2024. "Using Survey-to-Survey Imputation to Fill Poverty Data Gaps at a Low Cost: Evidence from a Randomized Survey Experiment," IZA Discussion Papers 16792, Institute of Labor Economics (IZA).
    3. de Blasio, Guido & D'Ignazio, Alessio & Letta, Marco, 2022. "Gotham city. Predicting ‘corrupted’ municipalities with machine learning," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    4. Paolo Verme, 2020. "Which Model for Poverty Predictions?," Working Papers 521, ECINEQ, Society for the Study of Economic Inequality.
    5. Hai‐Anh H. Dang, 2021. "To impute or not to impute, and how? A review of poverty‐estimation methods in the absence of consumption data," Development Policy Review, Overseas Development Institute, vol. 39(6), pages 1008-1030, November.
    6. Dang, Hai-Anh H & Lanjouw, Peter F., 2021. "Data Scarcity and Poverty Measurement," IZA Discussion Papers 14631, Institute of Labor Economics (IZA).
    7. Guido de Blasio & Alessio D'Ignazio & Marco Letta, 2020. "Predicting Corruption Crimes with Machine Learning. A Study for the Italian Municipalities," Working Papers 16/20, Sapienza University of Rome, DISS.
    8. Beltramo, Theresa P. & Calvi, Rossella & De Giorgi, Giacomo & Sarr, Ibrahima, 2023. "Child poverty among refugees," World Development, Elsevier, vol. 171(C).
    9. Hai-Anh H. Dang & Talip Kilic & Ksenia Abanokova & Gero Carletto, 2024. "Imputing Poverty Indicators without Consumption Data : An Exploratory Analysis," Policy Research Working Paper Series 10867, The World Bank.
    10. Paolo Verme, 2023. "Predicting Poverty with Missing Incomes," Working Papers 642, ECINEQ, Society for the Study of Economic Inequality.
    11. Yulin Liu & Luyao Zhang, 2022. "Cryptocurrency Valuation: An Explainable AI Approach," Papers 2201.12893, arXiv.org, revised Jul 2023.
    12. Kristof Lommers & Ouns El Harzli & Jack Kim, 2021. "Confronting Machine Learning With Financial Research," Papers 2103.00366, arXiv.org, revised Mar 2021.
    13. Matthew A. Cole & Robert J R Elliott & Bowen Liu, 2020. "The Impact of the Wuhan Covid-19 Lockdown on Air Pollution and Health: A Machine Learning and Augmented Synthetic Control Approach," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 76(4), pages 553-580, August.
    14. Tatiana de Macedo Nogueira Lima, 2022. "Documento de Trabalho 03/2022 - Aprendizado de máquina e antitruste," Documentos de Trabalho 2022030, Conselho Administrativo de Defesa Econômica (Cade), Departamento de Estudos Econômicos.
    15. Dang,Hai-Anh H., 2018. "To impute or not to impute ? a review of alternative poverty estimation methods in the context of unavailable consumption data," Policy Research Working Paper Series 8403, The World Bank.
    16. Hai-Anh H. Dang, 2019. "To impute or not to impute, and how? A review of alternative poverty estimation methods in the context of unavailable consumption data," Working Papers 507, ECINEQ, Society for the Study of Economic Inequality.
    17. Alessandra Garbero & Marco Letta, 2022. "Predicting household resilience with machine learning: preliminary cross-country tests," Empirical Economics, Springer, vol. 63(4), pages 2057-2070, October.
    18. Hai-Anh H. Dang & Paolo Verme, 2023. "Estimating poverty for refugees in data-scarce contexts: an application of cross-survey imputation," Journal of Population Economics, Springer;European Society for Population Economics, vol. 36(2), pages 653-679, April.
    19. Daniel Wochner, 2020. "Dynamic Factor Trees and Forests – A Theory-led Machine Learning Framework for Non-Linear and State-Dependent Short-Term U.S. GDP Growth Predictions," KOF Working papers 20-472, KOF Swiss Economic Institute, ETH Zurich.
    20. Khudri, Md Mohsan & Hussey, Andrew, 2024. "Breastfeeding and Child Development Outcomes across Early Childhood and Adolescence: Doubly Robust Estimation with Machine Learning," IZA Discussion Papers 17080, Institute of Labor Economics (IZA).

    More about this item

    Keywords

    Poverty prediction; Machine learning; Deep learning; Pseudo-labeling; Semi-supervised learning;

    JEL classification:

    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • I32 - Health, Education, and Welfare - - Welfare, Well-Being, and Poverty - - - Measurement and Analysis of Poverty
    • O1 - Economic Development, Innovation, Technological Change, and Growth - - Economic Development
