IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/129032.html

Generative data modelling for diverse populations in Africa: insights from South Africa

Authors

Listed:
  • Simmons, Sally Sonia
  • Hagan Jr, John Elvis
  • Schack, Thomas

Abstract

Studies on the demography and health of racially diverse African populations are scarce, largely because of lingering data challenges. Generative data modelling has emerged as a valuable solution to this problem. This study therefore examined the efficacy of Conditional Tabular GAN (CTGAN), CopulaGAN, and Tabular Variational Autoencoder (TVAE) for generating synthetic but realistic demographic and health data. It used the South African data (n = 4227) from Wave 1 of the World Health Organisation Study on Global Ageing and Adult Health (SAGE). Information missing from SAGE Wave 1, including demographic (e.g., race, age) and health (e.g., hypertension, blood pressure) indicators, was imputed using Generative Adversarial Imputation Nets (GAIN). CopulaGAN, CTGAN, and TVAE, sourced from the sdv 1.24.1 Python library, generated 104,227 synthetic records based on the SAGE data constituents. The outcomes were assessed with similarity and machine learning (XGBoost) augmentation metrics (sourced from the sdmetrics 0.21.0 Python library), including column shapes, overall scores, and precision ratio scores. Generally, the GAIN imputations produced data with no missing information and with properties comparable to those of the original data. CTGAN's overall quality score (89.20%) was above those of CopulaGAN (88.45%) and TVAE (86.50%). These findings underscore the usefulness of generative data modelling in addressing data quality challenges in diverse populations to enhance actionable health research and policy implementation.
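
As an illustration of the "column shapes" idea mentioned in the abstract, the sketch below scores how closely a synthetic column follows a real one. It is not the authors' code and does not call sdmetrics. It assumes that a column-shape score can be approximated by the complement of the two-sample Kolmogorov-Smirnov statistic for numeric columns and the complement of the total variation distance for categorical columns, averaged across columns. The function names and the toy values are hypothetical and are not SAGE data.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """Return 1 minus the two-sample Kolmogorov-Smirnov statistic (numeric columns)."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # The largest gap between two empirical CDFs occurs at an observed value.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """Return 1 minus the total variation distance (categorical columns)."""
    freq_real, freq_synth = Counter(real), Counter(synthetic)
    distance = 0.5 * sum(
        abs(freq_real[c] / len(real) - freq_synth[c] / len(synthetic))
        for c in set(freq_real) | set(freq_synth)
    )
    return 1.0 - distance


# Hypothetical toy columns, invented for illustration only (not SAGE data).
real_age = [50, 55, 60, 65, 70, 75]
synth_age = [52, 55, 61, 66, 70, 80]
real_htn = ["yes", "no", "no", "yes", "no", "no"]
synth_htn = ["yes", "no", "yes", "yes", "no", "no"]

age_score = ks_complement(real_age, synth_age)
htn_score = tv_complement(real_htn, synth_htn)
# Assumed aggregation: a plain average of the per-column scores.
column_shapes = (age_score + htn_score) / 2
print(f"age={age_score:.3f} hypertension={htn_score:.3f} mean={column_shapes:.3f}")
```

Both scores lie between 0 and 1, with 1 meaning the synthetic column reproduces the real column's marginal distribution exactly. These scores describe single columns only and say nothing about relationships between columns.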

Suggested Citation

  • Simmons, Sally Sonia & Hagan Jr, John Elvis & Schack, Thomas, 2025. "Generative data modelling for diverse populations in Africa: insights from South Africa," LSE Research Online Documents on Economics 129032, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:129032

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/129032/
    File Function: Open access version.
    Download Restriction: no

    References listed on IDEAS

    1. Sally Sonia Simmons & John Elvis Hagan Jr. & Thomas Schack, 2021. "The Influence of Anthropometric Indices and Intermediary Determinants of Hypertension in Bangladesh," IJERPH, MDPI, vol. 18(11), pages 1-12, May.
    2. Otsu, Taisuke & Taniguchi, Go, 2020. "Kolmogorov–Smirnov type test for generated variables," Economics Letters, Elsevier, vol. 195(C).
    3. Simmons, Sally Sonia, 2023. "Strikes and gutters: biomarkers and anthropometric measures for predicting diagnosed diabetes mellitus in adults in low- and middle-income countries," LSE Research Online Documents on Economics 120395, London School of Economics and Political Science, LSE Library.
    4. Otsu, Taisuke & Taniguchi, Go, 2020. "Kolmogorov-Smirnov type test for generated variables," LSE Research Online Documents on Economics 105571, London School of Economics and Political Science, LSE Library.
    5. David Mhlanga & Rufaro Garidzirai, 2020. "The Influence of Racial Differences in the Demand for Healthcare in South Africa: A Case of Public Healthcare," IJERPH, MDPI, vol. 17(14), pages 1-10, July.
    6. Noah Hollmann & Samuel Müller & Lennart Purucker & Arjun Krishnakumar & Max Körfer & Shi Bin Hoo & Robin Tibor Schirrmeister & Frank Hutter, 2025. "Accurate predictions on small data with a tabular foundation model," Nature, Nature, vol. 637(8045), pages 319-326, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carmen Aina & Irene Brunetti & Chiara Mussida & Sergio Scicchitano, 2023. "Distributional effects of COVID-19," Eurasian Business Review, Springer;Eurasia Business and Economics Society, vol. 13(1), pages 221-256, March.
    2. David Mhlanga & Adewale Samuel Hassan, 2022. "Financial Participation Among Smallholder - Farmers in Zimbabwe: What Are the Driving Factors?," Academic Journal of Interdisciplinary Studies, Richtmann Publishing Ltd, vol. 11, July.
    3. Tianyu Li & Yizheng Zhao & Xiang Kong, 2022. "Spatio-Temporal Characteristics and Influencing Factors of Basic Public Service Levels in the Yangtze River Delta Region, China," Land, MDPI, vol. 11(9), pages 1-23, September.
    4. Dai, Shangze & Fan, Fei & Zhang, Keke, 2022. "Creative Destruction and Stock Price Informativeness in Emerging Economies," MPRA Paper 113661, University Library of Munich, Germany.
    5. Simmons, Sally Sonia, 2023. "Strikes and gutters: biomarkers and anthropometric measures for predicting diagnosed diabetes mellitus in adults in low- and middle-income countries," LSE Research Online Documents on Economics 120395, London School of Economics and Political Science, LSE Library.
    6. Abdulaziz Hamid & Aprill Z. Dawson & Yilin Xu & Leonard E. Egede, 2024. "Independent Correlates of Glycemic Control among Adults with Diabetes in South Africa," IJERPH, MDPI, vol. 21(4), pages 1-12, April.
    7. Jill Fortuin & Innocent Karangwa & Nongcebo Mahlalela & Cleeve Robertson, 2022. "A South African Epidemiological Study of Fatal Drownings: 2016–2021," IJERPH, MDPI, vol. 19(22), pages 1-10, November.
    8. Monica Ewomazino Akokuwebe & Erhabor Sunday Idemudia, 2022. "A Comparative Cross-Sectional Study of the Prevalence and Determinants of Health Insurance Coverage in Nigeria and South Africa: A Multi-Country Analysis of Demographic Health Surveys," IJERPH, MDPI, vol. 19(3), pages 1-26, February.
    9. Liu, Runxi & Huang, Runyao & Shen, Ziheng & Wang, Hongtao & Xu, Jin, 2021. "Optimizing the recovery pathway of a net-zero energy wastewater treatment model by balancing energy recovery and eco-efficiency," Applied Energy, Elsevier, vol. 298(C).
    10. David Mhlanga, 2022. "The Role of Artificial Intelligence and Machine Learning Amid the COVID-19 Pandemic: What Lessons Are We Learning on 4IR and the Sustainable Development Goals," IJERPH, MDPI, vol. 19(3), pages 1-22, February.

    More about this item

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
    • I10 - Health, Education, and Welfare - - Health - - - General
    • J11 - Labor and Demographic Economics - - Demographic Economics - - - Demographic Trends, Macroeconomic Effects, and Forecasts
