Author
Abstract
After more than a century of development, electrocardiogram (ECG) diagnostics has become the preferred tool of healthcare professionals for diagnosing and monitoring cardiovascular disease. As wearable devices and mobile monitoring technologies spread, ECG data are becoming more diverse and are collected over longer periods, so traditional manual annotation can no longer meet the demands of large-scale data analysis. This research addresses three core challenges in ECG signal classification (extremely imbalanced data, significant physiological differences between individuals, and the difficulty of fitting long sequences) by proposing a Principal Component Analysis-based Conditional Generative Adversarial Network (PCA-CGAN). Through in-depth analysis of the principal component distribution of ECG signals, we found that just a few principal components explain over 90% of signal variance, which reveals the inherent inefficiency and limitations of traditional methods that generate complete waveforms. On this basis, we shift the data augmentation paradigm from generating surface waveforms to generating principal component features with high information density, resolving the waveform jitter and heterogeneity issues of traditional methods. We also designed a two-stage conditional encoding-decoding architecture that builds category-independent feature spaces from the early stages of training, breaking the feature space bias caused by the “Matthew effect” and preventing majority classes from compressing minority class features during generation. Using the Transformer’s global attention mechanism, the model captures the key diagnostic features of various arrhythmias, maximizing inter-class differences while maintaining intra-class consistency.
Experiments demonstrate that PCA-CGAN achieves, for the first time, stable convergence on a large-scale heterogeneous dataset comprising 43 patients, and that it resolves the “dilution effect” problem in data augmentation, avoiding the asymmetric outcome in which Precision increases while Recall decreases. After data augmentation, the ResNet model’s average F1 score improved significantly, with particularly strong performance on rare categories such as atrial premature beats, far surpassing traditional methods such as SigCWGAN and TD-GAN. This research redefines the objectives and methods of ECG signal generation from the theoretical perspectives of information entropy and feature manifolds, providing a systematic solution to data imbalance problems in the medical field and establishing a theoretical foundation for applying ECG-assisted diagnostic systems in real clinical environments.
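The abstract evaluates augmentation through Precision, Recall, and the average F1 score. As a reading aid, the minimal sketch below shows how these per-class metrics and a macro-averaged F1 can be computed, and why a rare class can have high Precision but low Recall. It is not the paper's evaluation code: the function names `per_class_scores` and `macro_f1`, the labels "N" (normal) and "A" (atrial premature beat), and the toy predictions are all assumptions made for illustration, and the abstract does not state which averaging scheme the authors used.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy data: 8 normal beats and 2 atrial premature beats.
# The classifier misses one of the two rare beats.
y_true = ["N"] * 8 + ["A"] * 2
y_pred = ["N"] * 8 + ["N", "A"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"macro F1 = {macro_f1(scores):.3f}")
```

In this toy case the rare class "A" has a Precision of 1.0 but a Recall of only 0.5, which is the kind of asymmetry the abstract describes. Macro averaging is used here because it weights the rare class as heavily as the majority class.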
Suggested Citation
Chao Tang, 2025. "Principal component conditional generative adversarial networks for imbalanced ECG classification enhancement," PLOS ONE, Public Library of Science, vol. 20(8), pages 1-38, August.
Handle: RePEc:plo:pone00:0330707
DOI: 10.1371/journal.pone.0330707