Authors
- Alemu Kumilachew Tegegnie
- Kibrom Tewolde
Abstract
Cardiovascular diseases (CVDs) are a leading cause of morbidity and mortality globally, with a growing burden in low- and middle-income countries such as Ethiopia. Early detection is limited by resource constraints, low screening uptake, and a lack of predictive tools tailored to local healthcare systems. This study presents an interpretable ensemble machine learning framework for predicting CVD risk from structured electronic medical record (EMR) data collected at public hospitals in Addis Ababa. We trained an XGBoost classifier on 20,960 anonymized records containing demographic, clinical, and physiological attributes. Preprocessing involved handling missing values, outlier capping, one-hot encoding, rare-category grouping, and dimensionality reduction. SHapley Additive exPlanations (SHAP) were used for feature attribution, and a large language model (Gemini) was used to translate SHAP outputs into plain-language narratives to enhance interpretability. The model achieved an accuracy of 0.99, with strong precision (0.99), recall (0.98), and F1-scores across both classes. SHAP analysis identified general_plan, history of present illness (HPI), musculoskeletal system (MSS), and diagnosis as key predictors. The integration of SHAP and an LLM provided transparent, clinician-friendly insights into model outputs, supporting adoption in resource-limited settings. This study demonstrates that combining ensemble learning with explainability techniques can yield highly accurate and interpretable CVD prediction models, offering potential for integration into clinical decision-support systems in Ethiopia.
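The abstract names outlier capping, rare-category grouping, and one-hot encoding among its preprocessing steps without giving details. The following is a minimal standard-library Python sketch of what such steps might look like. It is not the authors' implementation: the function names, the quantile rule, the `min_count` threshold, and the `OTHER` label are all assumptions made for illustration.

```python
from collections import Counter


def cap_outliers(values, lower_q=0.01, upper_q=0.99):
    """Clip numeric values to empirical quantiles (assumed capping rule)."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple index-based quantiles without interpolation.
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def group_rare(categories, min_count=2, other="OTHER"):
    """Replace categories seen fewer than min_count times with one label."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]


def one_hot(categories):
    """Return the sorted category levels and one 0/1 row per input value."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical toy data, not taken from the study.
capped = cap_outliers([1, 2, 3, 4, 100], lower_q=0.0, upper_q=0.75)
grouped = group_rare(["a", "a", "b", "c"], min_count=2)
levels, rows = one_hot(grouped)
```

Grouping rare categories before one-hot encoding keeps the number of indicator columns small, which is one plausible reason the two steps appear together in the abstract.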
Suggested Citation
Alemu Kumilachew Tegegnie & Kibrom Tewolde, 2026.
"Interpretable ensemble machine learning framework for cardiovascular disease prediction using EMR data and large language models in Ethiopia,"
PLOS ONE, Public Library of Science, vol. 21(2), pages 1-13, February.
Handle: RePEc:plo:pone00:0342256
DOI: 10.1371/journal.pone.0342256