Authors
- Hiroshi Maruyama
- Kotatsu Bito
- Yuki Saito
- Masanobu Hibi
- Shun Katada
- Aya Kawakami
- Kenta Oono
- Nontawat Charoenphakdee
- Zhengyan Gao
- Hideyoshi Igata
- Masashi Yoshikawa
- Yoshiaki Ota
- Hiroki Okui
- Kei Akita
- Shoichiro Yamaguchi
- Yohei Sugawara
- Shin-ichi Maeda
Abstract
Data for healthcare applications are typically customized for specific purposes but are often difficult to access due to high costs and privacy concerns. Rather than prepare separate datasets for individual applications, we propose a novel approach: building a general-purpose generative model applicable to virtually any type of healthcare application. This generative model encompasses a broad range of human attributes, including age, sex, anthropometric measurements, blood components, physical performance metrics, and numerous healthcare-related questionnaire responses. To achieve this goal, we integrated the results of multiple clinical studies into a unified training dataset and developed a generative model to replicate its characteristics. The model can estimate missing attribute values from known attribute values and generate synthetic datasets for various applications. Our analysis confirmed that the model captures key statistical properties of the training dataset, including univariate distributions and bivariate relationships. We demonstrate the model’s practical utility through multiple real-world applications, illustrating its potential impact on predictive, preventive, and personalized medicine.
Author summary: Digital technologies are expected to revolutionize healthcare, yet digital healthcare has not reached its full potential. A major bottleneck is the poor data availability. Due to concerns regarding privacy and cost, healthcare data is very difficult to access. Here, our aim was to provide a general-purpose statistical model that can be used in place of actual data. Recent advancements in machine-learning technology, especially in generative models, make this challenging goal possible. We built a model that captures complex statistical interactions among more than 2000 human attributes and made it available as a software service on the Internet. The model can be used for estimating unknown attributes from known attributes and generating synthetic data. We believe that this model significantly lowers the barrier to entry into digital healthcare and will stimulate future innovations.
Suggested Citation
Hiroshi Maruyama & Kotatsu Bito & Yuki Saito & Masanobu Hibi & Shun Katada & Aya Kawakami & Kenta Oono & Nontawat Charoenphakdee & Zhengyan Gao & Hideyoshi Igata & Masashi Yoshikawa & Yoshiaki Ota & H, 2025.
"Creating a general-purpose generative model for healthcare data based on multiple clinical studies,"
PLOS Digital Health, Public Library of Science, vol. 4(11), pages 1-25, November.
Handle:
RePEc:plo:pdig00:0001059
DOI: 10.1371/journal.pdig.0001059