Authors listed:
- Liwen Mo
- Hua Lin
- Chengxuan Li
- Lifei Yu
- Decheng Lu
Abstract
Background: The risk of coronary heart disease (CHD) over a specified number of years can be estimated with scores calculated by models such as the pooled cohort equations (PCEs) and the Framingham Risk Score. However, few studies have used score calculation to estimate CHD risk quantitatively on site as an aid to diagnosis. Researchers have introduced new technologies, such as machine learning, as effective CHD risk prediction models, but these models still need to be validated on real clinical data before their use can be promoted in clinical settings.
Objective: This study aims to predict CHD risk in a high-risk population using only clinical data comprising vital traits, laboratory measurements, diagnoses, medical device testing and medications. The prediction model can serve as an on-site quantitative indicator of CHD risk for potential patients before diagnosis by coronary arteriography.
Methods: This is a retrospective study of a hospital-based cohort (The Second Affiliated Hospital of Guangxi Medical University) comprising 20,821 patients with CHD and 9,796 controls from 2017 to 2024. A two-layer machine learning model (TLML) was built on the predictions of a random forest and a gradient boosting decision tree to combine the strengths of both models. The models were trained and validated on clinical data from the cohort.
Results: The TLML achieved good accuracy (0.79, 95% CI 0.79–0.80), sensitivity (0.79, 95% CI 0.79–0.80) and specificity (0.79, 95% CI 0.79–0.79) for on-site CHD prediction. Compared with the PCEs (accuracy = 0.59, sensitivity = 0.58 and specificity = 0.60), the TLML showed markedly better on-site CHD prediction performance.
Predictor importance analysis shows that age, diabetes, antihypertensive medications, total bilirubin, hypertension, obstructive sleep apnea-hypopnea syndrome, red cell count, hemoglobin, cystatin C, retinol-binding protein, gender and low-density lipoprotein cholesterol level are the most important variables for on-site CHD prediction. All of these features have been linked to CHD to some degree in previous studies. To improve practicality, a reduced-complexity model using only 20 predictors is also presented; it still provides decent CHD prediction, with an accuracy of 0.73.
Conclusions: The machine learning models presented in this study have the potential to become auxiliary on-site diagnostic tools for CHD, because they predict accurately and all the required predictor variables are readily available.
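The two-layer design described in the Methods is a form of model stacking: a random forest and a gradient boosting decision tree form the first layer, and a second model learns how to combine their predictions. The sketch below illustrates that idea together with the accuracy, sensitivity and specificity metrics reported in the Results. It is a minimal illustration under stated assumptions, not the authors' implementation: scikit-learn's `StackingClassifier`, the logistic-regression second layer, the synthetic placeholder data, the hyperparameters, and the use of random-forest impurity importance to pick the 20 predictors of the reduced model are all choices made here, not details taken from the study.

```python
# Minimal sketch of a two-layer (stacked) classifier. Assumptions, not from the study:
# - synthetic data from make_classification stands in for the clinical cohort,
# - the second layer is a logistic regression (the abstract does not name it),
# - hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def build_two_layer_model(random_state=0):
    """First layer: RF and GBDT. Second layer: learns to combine their outputs."""
    first_layer = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("gbdt", GradientBoostingClassifier(random_state=random_state)),
    ]
    return StackingClassifier(
        estimators=first_layer,
        final_estimator=LogisticRegression(),
        stack_method="predict_proba",  # feed first-layer class probabilities to layer 2
        cv=5,  # out-of-fold first-layer predictions keep layer 2 from seeing leaked labels
    )


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (recall of CHD = 1) and specificity (recall of control = 0)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Placeholder data: 40 numeric predictors, about one third controls (0) and two
# thirds cases (1), loosely echoing the cohort's class ratio. Not the study's data.
X, y = make_classification(
    n_samples=1000, n_features=40, n_informative=12,
    weights=[0.33, 0.67], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = build_two_layer_model().fit(X_train, y_train)
metrics = evaluate(y_test, model.predict(X_test))
print("full model:", metrics)

# Hypothetical reduced-complexity variant: keep the 20 predictors ranked highest by
# the fitted first-layer random forest (impurity-based importance), then refit.
rf = model.named_estimators_["rf"]
top20 = rf.feature_importances_.argsort()[::-1][:20]
reduced = build_two_layer_model().fit(X_train[:, top20], y_train)
reduced_metrics = evaluate(y_test, reduced.predict(X_test[:, top20]))
print("20-predictor model:", reduced_metrics)
```

Using out-of-fold predictions (`cv=5`) for the second layer is the usual safeguard in stacking: if the second layer were trained on first-layer predictions made on the same rows those models were fitted to, it would learn from overconfident outputs. The 95% confidence intervals quoted in the Results would require an additional resampling or cross-validation step that this sketch does not attempt.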
Suggested Citation
Liwen Mo & Hua Lin & Chengxuan Li & Lifei Yu & Decheng Lu, 2025.
"Development and validation of a machine learning model for on-site prediction of coronary heart disease in high-risk adults using clinical data,"
PLOS ONE, Public Library of Science, vol. 20(11), pages 1-14, November.
Handle: RePEc:plo:pone00:0334881
DOI: 10.1371/journal.pone.0334881