Authors
- Aldren Gonzales
- Guruprabha Guruswamy
- Scott R Smith
Abstract
Data are central to research, public health, and the development of health information technology (IT) systems. Nevertheless, access to most data in health care is tightly controlled, which may limit innovation, development, and efficient implementation of new research, products, services, or systems. Using synthetic data is one of many innovative ways that allow organizations to share datasets with a broader range of users. However, only limited literature explores its potential and applications in health care. In this review paper, we examined existing literature to bridge the gap and highlight the utility of synthetic data in health care. We searched PubMed, Scopus, and Google Scholar to identify peer-reviewed articles, conference papers, reports, and theses/dissertations related to the generation and use of synthetic datasets in health care. The review identified seven use cases of synthetic data in health care: a) simulation and prediction research, b) hypothesis, methods, and algorithm testing, c) epidemiology/public health research, d) health IT development, e) education and training, f) public release of datasets, and g) linking data. The review also identified readily and publicly accessible health care datasets, databases, and sandboxes containing synthetic data with varying degrees of utility for research, education, and software development. The review provided evidence that synthetic data are helpful in different aspects of health care and research. While the original real data remain the preferred choice, synthetic data hold possibilities in bridging data access gaps in research and evidence-based policymaking.
Author summary: Synthetic data, or data that are artificially generated, have gained more attention in recent years because of their potential to make timely health care data more accessible for analysis and technology development. In this paper, we explored how synthetic data are being used by reviewing published literature and by looking at known synthetic datasets that are available to the public. Based on the available literature, we identified three challenges that synthetic data address in making health care data accessible: they protect the privacy of individuals in datasets, they give researchers increased and faster access to health care research data, and they address the lack of realistic data for software development and testing. Users should also be aware of their limitations, which may include a recognized risk of data leakage, dependency on the imputation model, and the fact that not all synthetic data precisely replicate the content and properties of the original dataset. By explaining the utility and value of synthetic data, we hope that this review helps to improve understanding of synthetic data for different applications in research and software development.
Suggested Citation
Aldren Gonzales & Guruprabha Guruswamy & Scott R Smith, 2023.
"Synthetic data in health care: A narrative review,"
PLOS Digital Health, Public Library of Science, vol. 2(1), pages 1-16, January.
Handle: RePEc:plo:pdig00:0000082
DOI: 10.1371/journal.pdig.0000082
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Mielczarek, Bożena & Zabawa, Jacek, 2021.
"Modelling demographic changes using simulation: Supportive analyses for socioeconomic studies,"
Socio-Economic Planning Sciences, Elsevier, vol. 74(C).
- Oliver Mannion & Roy Lay-Yee & Wendy Wrapson & Peter Davis & Janet Pearson, 2012.
"JAMSIM: a Microsimulation Modelling Policy Tool,"
Journal of Artificial Societies and Social Simulation, Journal of Artificial Societies and Social Simulation, vol. 15(1), pages 1-8.
- Lay-Yee, Roy & Milne, Barry & Davis, Peter & Pearson, Janet & McLay, Jessica, 2015.
"Determinants and disparities: A simulation approach to the case of child health care,"
Social Science & Medicine, Elsevier, vol. 128(C), pages 202-211.