Synthetic data in health care: A narrative review

Authors

  • Aldren Gonzales
  • Guruprabha Guruswamy
  • Scott R Smith

Abstract

Data are central to research, public health, and the development of health information technology (IT) systems. Nevertheless, access to most data in health care is tightly controlled, which may limit innovation, development, and efficient implementation of new research, products, services, or systems. Using synthetic data is one of the many innovative ways that organizations can share datasets with a broader set of users. However, only limited literature explores its potential and applications in health care. In this review paper, we examined existing literature to bridge the gap and highlight the utility of synthetic data in health care. We searched PubMed, Scopus, and Google Scholar to identify peer-reviewed articles, conference papers, reports, and theses/dissertations related to the generation and use of synthetic datasets in health care. The review identified seven use cases of synthetic data in health care: a) simulation and prediction research, b) hypothesis, methods, and algorithm testing, c) epidemiology/public health research, d) health IT development, e) education and training, f) public release of datasets, and g) linking data. The review also identified readily and publicly accessible health care datasets, databases, and sandboxes containing synthetic data with varying degrees of utility for research, education, and software development. The review provided evidence that synthetic data are helpful in different aspects of health care and research. While the original real data remain the preferred choice, synthetic data hold possibilities in bridging data access gaps in research and evidence-based policymaking.

Author summary: Synthetic data, or data that are artificially generated, have gained more attention in recent years because of their potential to make timely health care data more accessible for analysis and technology development. In this paper, we explored how synthetic data are being used by reviewing published literature and by looking at known synthetic datasets that are available to the public. Based on the available literature, synthetic data address three challenges in making health care data accessible: they protect the privacy of individuals in datasets, they give researchers increased and faster access to health care research data, and they address the lack of realistic data for software development and testing. Users should also be aware of their limitations, which may include a recognized risk of data leakage, dependency on the imputation model, and the fact that not all synthetic data precisely replicate the content and properties of the original dataset. By explaining the utility and value of synthetic data, we hope that this review helps to improve understanding of synthetic data for different applications in research and software development.

Suggested Citation

  • Aldren Gonzales & Guruprabha Guruswamy & Scott R Smith, 2023. "Synthetic data in health care: A narrative review," PLOS Digital Health, Public Library of Science, vol. 2(1), pages 1-16, January.
  • Handle: RePEc:plo:pdig00:0000082
    DOI: 10.1371/journal.pdig.0000082

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000082
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000082&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pdig.0000082?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mielczarek, Bożena & Zabawa, Jacek, 2021. "Modelling demographic changes using simulation: Supportive analyses for socioeconomic studies," Socio-Economic Planning Sciences, Elsevier, vol. 74(C).
    2. Oliver Mannion & Roy Lay-Yee & Wendy Wrapson & Peter Davis & Janet Pearson, 2012. "JAMSIM: a Microsimulation Modelling Policy Tool," Journal of Artificial Societies and Social Simulation, Journal of Artificial Societies and Social Simulation, vol. 15(1), pages 1-8.
    3. Lay-Yee, Roy & Milne, Barry & Davis, Peter & Pearson, Janet & McLay, Jessica, 2015. "Determinants and disparities: A simulation approach to the case of child health care," Social Science & Medicine, Elsevier, vol. 128(C), pages 202-211.
