IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/129032.html
   My bibliography  Save this paper

Generative data modelling for diverse populations in Africa: insights from South Africa

Author

Listed:
  • Simmons, Sally Sonia
  • Hagan Jr, John Elvis
  • Schack, Thomas

Abstract

Studies on the demography and health of racially diverse African populations are scarce, particularly due to lingering data challenges. Generative data modelling has emerged as a valuable solution to this burden. The study, therefore, examined the efficacy of Conditional Tabular GAN (CTGAN), CopulaGAN, and Tabula Variational Autoencoder (TVAE) for generating synthetic but realistic demographic and health data. This study employed the World Health Organisation stigy on global ageing and adult health survey (SAGE) Wave 1 South African data (n = 4227). Information missing from SAGE Wave 1, including demographic (e.g., race, age) and health (e.g., hypertension, blood pressure) indicators, were imputed using Generative Adversarial Imputation Nets (GAIN). CopulaGAN, CTGAN, and TVAE, sourced from the sdv 1.24.1 python library, generated 104,227 synthetic records based on the SAGE data constituents. The outcomes were accessed with similarity and machine learning (XGBoost) augmentation metrics (sourced from the sdmetrics 0.21.0 python library), including column shapes and overall and precision ratio scores. Generally, the GAIN imputations resulted in data with properties that were comparable to original and with no missing information. CTGAN’s (89.20%) overall quality of performance was above that of TVAE (86.50%) and CopulaGAN (88.45%). These findings underscore the usefulness of generative data modelling in addressing data quality challenges in diverse populations to enhance actionable health research and policy implementation.

Suggested Citation

  • Simmons, Sally Sonia & Hagan Jr, John Elvis & Schack, Thomas, 2025. "Generative data modelling for diverse populations in Africa: insights from South Africa," LSE Research Online Documents on Economics 129032, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:129032
    as

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/129032/
    File Function: Open access version.
    Download Restriction: no
    ---><---

    More about this item

    Keywords

    ;
    ;
    ;
    ;
    ;
    ;
    ;
    ;
    ;

    JEL classification:

    • J1 - Labor and Demographic Economics - - Demographic Economics

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:ehl:lserod:129032. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: LSERO Manager (email available below). General contact details of provider: https://edirc.repec.org/data/lsepsuk.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.