IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0314995.html
   My bibliography  Save this article

Enhancing generalization in a Kawasaki Disease prediction model using data augmentation: Cross-validation of patients from two major hospitals in Taiwan

Author

Listed:
  • Chuan-Sheng Hung
  • Chun-Hung Richard Lin
  • Jain-Shing Liu
  • Shi-Huang Chen
  • Tsung-Chi Hung
  • Chih-Min Tsai

Abstract

Kawasaki Disease (KD) is a rare febrile illness affecting infants and young children, potentially leading to coronary artery complications and, in severe cases, mortality if untreated. However, KD is frequently misdiagnosed as a common fever in clinical settings, and the inherent data imbalance further complicates accurate prediction when using traditional machine learning and statistical methods. This paper introduces two advanced approaches to address these challenges, enhancing prediction accuracy and generalizability. The first approach proposes a stacking model termed the Disease Classifier (DC), specifically designed to recognize minority class samples within imbalanced datasets, thereby mitigating the bias commonly observed in traditional models toward the majority class. Secondly, we introduce a combined model, the Disease Classifier with CTGAN (CTGAN-DC), which integrates DC with Conditional Tabular Generative Adversarial Network (CTGAN) technology to improve data balance and predictive performance further. Utilizing CTGAN-based oversampling techniques, this model retains the original data characteristics of KD while expanding data diversity. This effectively balances positive and negative KD samples, significantly reducing model bias toward the majority class and enhancing both predictive accuracy and generalizability. Experimental evaluations indicate substantial performance gains, with the DC and CTGAN-DC models achieving notably higher predictive accuracy than individual machine learning models. Specifically, the DC model achieves sensitivity and specificity rates of 95%, while the CTGAN-DC model achieves 95% sensitivity and 97% specificity, demonstrating superior recognition capability. Furthermore, both models exhibit strong generalizability across diverse KD datasets, particularly the CTGAN-DC model, which surpasses the JAMA model with a 3% increase in sensitivity and a 95% improvement in generalization sensitivity and specificity, effectively resolving the model collapse issue observed in the JAMA model. In sum, the proposed DC and CTGAN-DC architectures demonstrate robust generalizability across multiple KD datasets from various healthcare institutions and significantly outperform other models, including XGBoost. These findings lay a solid foundation for advancing disease prediction in the context of imbalanced medical data.

Suggested Citation

  • Chuan-Sheng Hung & Chun-Hung Richard Lin & Jain-Shing Liu & Shi-Huang Chen & Tsung-Chi Hung & Chih-Min Tsai, 2024. "Enhancing generalization in a Kawasaki Disease prediction model using data augmentation: Cross-validation of patients from two major hospitals in Taiwan," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-29, December.
  • Handle: RePEc:plo:pone00:0314995
    DOI: 10.1371/journal.pone.0314995
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0314995
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0314995&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0314995?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. repec:plo:pone00:0244721 is not listed on IDEAS
    2. Tengyang Wang & Guanghua Liu & Hongye Lin, 2020. "A machine learning approach to predict intravenous immunoglobulin resistance in Kawasaki disease patients: A study based on a Southeast China population," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-15, August.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.

      More about this item

      Statistics

      Access and download statistics

      Corrections

      All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0314995. See general information about how to correct material in RePEc.

      If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

      If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

      If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

      For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

      Please note that corrections may take a couple of weeks to filter through the various RePEc services.

      IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.