Author
Listed:
- Keenan Saleh
- Raaif Hadadi
- Yixiu Liang
- Hong Wong
- Arunashis Sau
- James Howard
- Evan Brittain
- Jeffrey Annis
- Majd El-Harasis
- Matthew Shun-Shin
- Jagdeep Mohal
- Akriti Naraen
- Jack Samways
- Jessica Artico
- James Ware
- Prapa Kanagaratnam
- Fu Siong Ng
- Massoud Zolgharni
- Wenjia Bai
- Amanda Varnava
- Zachary Whinnett
- Ahran Arnold
Abstract
Deep neural networks can classify ECGs with high accuracy when training data is abundant. Rare conditions like Brugada syndrome, an inherited arrhythmia syndrome predisposing to sudden death, pose challenges because data scarcity hinders model training. We evaluated multiple machine learning (ML) approaches to optimise a Brugada ECG classification model using limited training data. The baseline model was trained on a dataset comprising 176 Brugada, 176 right bundle branch block (RBBB) and 352 normal ECGs from Zhongshan Hospital (Zhongshan-baseline dataset), framed as a binary classification task to distinguish Brugada from non-Brugada ECGs. A 25%-75% train-test split was used to exacerbate data scarcity. To enhance training, we incorporated three additional datasets: (i) a different, labelled ECG dataset from Zhongshan Hospital including normal and RBBB ECGs (Zhongshan-pretrain), (ii) an unlabelled ECG dataset from Hammersmith Hospital including Brugada and non-Brugada ECGs (Imperial), (iii) an open-access labelled ECG dataset (PTB-XL). Three strategies were tested: (1) supervised pretraining, (2) self-supervised pretraining with data augmentation, and (3) oversampling using SMOTE (synthetic minority oversampling technique). Each model was evaluated on the unseen internal test set and an external Brugada mimic dataset. The models were re-trained using an 80%-20% train-test split as a secondary analysis. The baseline model achieved 92.2% accuracy, F1-score 0.837, and area under the Receiver Operating Characteristic curve (AUC) 0.962. Supervised pretraining significantly improved performance when training data was scarce, with the best model pretrained on the Zhongshan-pretrain dataset boosting accuracy (+3.2%), F1-score (+0.071) and AUC (+0.019), with consistent cross-validation performance. Self-supervised pretraining produced smaller and more variable gains, although select models better mitigated false positives on the Brugada mimic dataset. SMOTE oversampling showed inconsistent effects on performance. Incorporating pretraining and oversampling may facilitate the development of more accurate AI-ECG models for rare diseases when training data is limited but provides diminishing returns when adequate labelled data is available.
Author summary: AI applied to ECG interpretation (AI-ECG) is an emerging tool in the field of cardiac diagnostics that can rapidly automate ECG analysis, improving clinical resource utilisation and efficiency. However, rare conditions have limited available data to train AI-ECG models, hindering their performance. In this study, we developed a baseline AI-ECG model for Brugada syndrome using a severely restricted dataset and investigated three strategies to address data scarcity: supervised pretraining, self-supervised pretraining and oversampling. Pretraining involves training the model on a broader dataset before refining it for Brugada classification. This can be supervised, where ECG diagnoses or labels are known to the model, or self-supervised, where the model must learn patterns autonomously without labelled ECG data. Oversampling generates synthetic ECGs to supplement model training. Our results indicate that configurations of each approach provided incremental improvements in model performance and could be applied to the development of AI-ECG models for other rare cardiac diseases. Notably, we highlight that the strongest improvements are achieved using supervised pretraining, and we note the potential value of self-supervised pretraining using unlabelled datasets, which reduces reliance on resource-intensive manual labelling. Together, these findings show how data-efficient training strategies can support the development of AI-ECG models for rare cardiac diseases and help ensure that advances in AI-driven healthcare do not exacerbate existing health inequalities.
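As an illustration of the oversampling idea mentioned in the abstract, the following is a minimal sketch of SMOTE-style interpolation, not the authors' implementation. It assumes each ECG has already been reduced to a fixed-length numeric feature vector; the function name `smote_oversample`, its parameters, and the toy data are hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import numpy as np

def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    Hypothetical helper for illustration; assumes `minority` is an array of
    shape (n_samples, n_features) holding only minority-class examples.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between all minority samples.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours per synthetic point.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Interpolate: base + u * (neighbour - base), with u drawn from [0, 1).
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[picked] - minority[base])

# Toy data standing in for minority-class feature vectors (assumed values).
toy_minority = np.random.default_rng(42).normal(size=(44, 8))
synthetic = smote_oversample(toy_minority, n_synthetic=88)
print(synthetic.shape)  # (88, 8)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the range already spanned by the minority class. That may help explain why such oversampling can add little genuinely new information, which is consistent with the inconsistent effects reported in the abstract.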
Suggested Citation
Keenan Saleh & Raaif Hadadi & Yixiu Liang & Hong Wong & Arunashis Sau & James Howard & Evan Brittain & Jeffrey Annis & Majd El-Harasis & Matthew Shun-Shin & Jagdeep Mohal & Akriti Naraen & Jack Samway, 2026.
"AI-ECG classification for Brugada syndrome: A study of machine learning techniques to optimise for limited datasets,"
PLOS Digital Health, Public Library of Science, vol. 5(2), pages 1-20, February.
Handle: RePEc:plo:pdig00:0001222
DOI: 10.1371/journal.pdig.0001222