Printed from https://ideas.repec.org/a/gam/jijerp/v17y2020i6p1828-d331462.html

Stroke Prediction with Machine Learning Methods among Older Chinese

Author

Listed:
  • Yafei Wu

    (The State Key Laboratory of Molecular Vaccine and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen 361102, China
    Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen 361102, China
    National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, China)

  • Ya Fang

    (The State Key Laboratory of Molecular Vaccine and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen 361102, China
    Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen 361102, China
    National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, China)

Abstract

Timely diagnosis and intervention are necessary given the high prevalence of stroke. Previous studies have mainly focused on stroke prediction with balanced data. This study therefore aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort, with data collected in 2012 and 2014, that included 1131 participants (56 stroke patients and 1075 non-stroke participants). Data balancing techniques, including random over-sampling (ROS), random under-sampling (RUS), and the synthetic minority over-sampling technique (SMOTE), were used to process the imbalanced data. Three machine learning methods, regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF), were used to predict stroke from demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The overall prevalence of stroke in 2014 was high (4.95%), and much higher in men than in women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set, with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After data balancing, sensitivity and AUC improved considerably, with moderate accuracy and specificity: the maximum sensitivity reached 0.78 (95% CI, 0.73–0.83) for RF, and the maximum AUC reached 0.72 (95% CI, 0.71–0.73) for RLR. Relative to the AUCs of RLR, SVM, and RF in the imbalanced data set, the AUCs of all three machine learning methods improved significantly (p < 0.05) in the balanced data sets.
With RLR in each data set as the reference, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors across all three machine learning methods. Blood glucose level was included in both RLR and RF. In addition, drinking was included in RLR, age and high-sensitivity C-reactive protein level in SVM, and low-density lipoprotein cholesterol level in RF. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
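
The class imbalance described in the abstract, and the three balancing techniques it names, can be illustrated with a minimal Python sketch. This is not the authors' code. Only the class counts (56 stroke, 1075 non-stroke) come from the abstract. The feature values are random placeholders, the function names are hypothetical, and the SMOTE function is a simplified version of the idea (interpolating between a minority sample and one of its nearest minority neighbours) rather than a reference implementation.

```python
import random

# Class counts reported in the abstract: 56 stroke, 1075 non-stroke.
N_STROKE, N_NON_STROKE = 56, 1075


def metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def split_classes(rows, labels):
    """Separate rows into (minority, majority), assuming label 1 is the minority."""
    minority = [r for r, y in zip(rows, labels) if y == 1]
    majority = [r for r, y in zip(rows, labels) if y == 0]
    return minority, majority


def random_over_sample(rows, labels, seed=0):
    """ROS: resample minority rows with replacement until both classes match."""
    rng = random.Random(seed)
    minority, majority = split_classes(rows, labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    positives = minority + extra
    return majority + positives, [0] * len(majority) + [1] * len(positives)


def random_under_sample(rows, labels, seed=0):
    """RUS: keep a random subset of majority rows equal in size to the minority."""
    rng = random.Random(seed)
    minority, majority = split_classes(rows, labels)
    kept = rng.sample(majority, len(minority))
    return kept + minority, [0] * len(kept) + [1] * len(minority)


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority row and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Placeholder features (two made-up numeric columns), NOT the study's data.
rng = random.Random(42)
labels = [1] * N_STROKE + [0] * N_NON_STROKE
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in labels]

# A classifier that never predicts stroke: high accuracy, zero sensitivity.
baseline = metrics(labels, [0] * len(labels))

ros_rows, ros_labels = random_over_sample(rows, labels)
rus_rows, rus_labels = random_under_sample(rows, labels)
minority_rows, _ = split_classes(rows, labels)
synthetic_rows = smote(minority_rows, n_new=N_NON_STROKE - N_STROKE)
```

With these counts, a model that always predicts "no stroke" scores an accuracy of 1075/1131 (about 0.95) but a sensitivity of 0, which is one way to read the near-zero sensitivity reported for the imbalanced data set. ROS and SMOTE both bring the stroke class up to 1075 samples (by duplication and by interpolation, respectively), while RUS reduces the non-stroke class to 56.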

Suggested Citation

  • Yafei Wu & Ya Fang, 2020. "Stroke Prediction with Machine Learning Methods among Older Chinese," IJERPH, MDPI, vol. 17(6), pages 1-11, March.
  • Handle: RePEc:gam:jijerp:v:17:y:2020:i:6:p:1828-:d:331462

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/17/6/1828/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/17/6/1828/
    Download Restriction: no

    Citations


    Cited by:

    1. Oleg E. Karpov & Elena N. Pitsik & Semen A. Kurkin & Vladimir A. Maksimenko & Alexander V. Gusev & Natali N. Shusharina & Alexander E. Hramov, 2023. "Analysis of Publication Activity and Research Trends in the Field of AI Medical Applications: Network Approach," IJERPH, MDPI, vol. 20(7), pages 1-17, March.
