Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP

Author

Listed:
  • Winston Wang

    (Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan)

  • Tun-Wen Pai

    (Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan)

Abstract

This study addresses the challenge of training generative adversarial networks (GANs) for data augmentation on small tabular clinical trial datasets, whose limited sample sizes make GAN training difficult. To overcome this obstacle, a hybrid approach is proposed: the synthetic minority oversampling technique (SMOTE) first augments the original data to a more substantial size, and the enlarged dataset is then used to train a Wasserstein conditional generative adversarial network with gradient penalty (WCGAN-GP), selected for its state-of-the-art performance and enhanced stability. The ultimate objective of this research was to demonstrate that, with this hybrid approach, the synthetic tabular data generated by the final WCGAN-GP model maintain the structural integrity and statistical representation of the original small dataset. This focus is particularly relevant for clinical trials, where limited data availability due to privacy concerns and restricted access to subject enrollment is a common challenge. Despite the limited data, the findings demonstrate that the hybrid approach generates synthetic data that closely preserve the characteristics of the original small dataset. By generating faithful synthetic data, the hybrid approach shows clear potential for enhancing data-driven research in drug clinical trials. This includes enabling robust analysis of small datasets, supplementing scarce clinical trial data, facilitating the use of such data in machine learning tasks, and even using the model for anomaly detection to ensure better quality control during clinical trial data collection, all while prioritizing data privacy and implementing strict data protection measures.
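As a rough illustration of the first stage of the hybrid approach, the sketch below shows the interpolation idea behind SMOTE: each synthetic row is placed at a random point on the line segment between a real row and one of its nearest neighbours. It is a minimal sketch under stated assumptions, not the authors' implementation. The function name `smote_like_oversample`, the toy data, and all parameter values are hypothetical; the sketch ignores class labels and categorical features, which a real SMOTE pipeline would have to handle, and the subsequent WCGAN-GP training stage is not shown.

```python
import numpy as np

def smote_like_oversample(X, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a sampled row
    and one of its k nearest neighbours (the core idea of SMOTE)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a row is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base row and one of its neighbours for every synthetic row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Interpolate with a random weight in [0, 1) per synthetic row.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[picked] - X[base])

# Hypothetical toy data: 20 rows with 3 continuous features.
toy_rng = np.random.default_rng(42)
X_small = toy_rng.normal(size=(20, 3))

X_synth = smote_like_oversample(X_small, n_new=200)
# Enlarged dataset that a GAN could then be trained on.
X_augmented = np.vstack([X_small, X_synth])
```

Because every synthetic row is a convex combination of two real rows, each feature stays within the range observed in the original data, which is one reason this kind of oversampling is a conservative way to enlarge a small dataset before GAN training.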

Suggested Citation

  • Winston Wang & Tun-Wen Pai, 2023. "Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP," Data, MDPI, vol. 8(9), pages 1-20, August.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:9:p:135-:d:1223090

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/9/135/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/9/135/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kim, Jongwoo & Kim, Hongil & Geum, Youngjung, 2023. "How to succeed in the market? Predicting startup success using a machine learning approach," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
