Printed from https://ideas.repec.org/a/plo/pone00/0300785.html

Predictive modeling of multi-class diabetes mellitus using machine learning and filtering Iraqi diabetes data dynamics

Author

Listed:
  • Md Abdus Sahid
  • Mozaddid Ul Hoque Babar
  • Md Palash Uddin

Abstract

Diabetes is a persistent metabolic disorder linked to elevated levels of blood glucose, commonly referred to as blood sugar. Over time, this condition can damage the heart, blood vessels, eyes, kidneys, and nerves. It is a chronic ailment that arises when the body fails to produce enough insulin or cannot effectively use the insulin it produces. When diabetes is not properly managed, it often leads to hyperglycemia, a condition characterized by elevated blood sugar levels or impaired glucose tolerance, which can significantly harm various body systems, including the nerves and blood vessels.

In this paper, we propose a multiclass diabetes mellitus detection and classification approach using the extremely imbalanced Laboratory of Medical City Hospital data dynamics. We also formulate a new, moderately imbalanced dataset based on the same data. To correctly identify multiclass diabetes mellitus, we employ three machine learning classifiers: support vector machine (SVM), logistic regression, and k-nearest neighbor. We also apply dimensionality reduction through feature selection (filter, wrapper, and embedded methods) to prune unnecessary features and improve classification performance. To optimize the classifiers, we tune their hyperparameters using grid search with 10-fold cross-validation.

On the original extremely imbalanced dataset with a 70:30 partition, the SVM classifier achieved a maximum accuracy of 0.964, precision of 0.968, recall of 0.964, F1-score of 0.962, Cohen kappa of 0.835, and AUC of 0.99 using the top 4 features selected by the filter method. Using the top 9 features selected by wrapper-based sequential feature selection, the k-nearest neighbor classifier provides an accuracy of 0.935 and 1.0 for the other performance metrics. On our created moderately imbalanced dataset with an 80:20 partition, the SVM classifier achieves a maximum accuracy of 0.938 and 1.0 for the other performance metrics. For multiclass diabetes mellitus detection and classification, our experiments outperformed previously conducted research based on the Laboratory of Medical City Hospital data dynamics.
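The abstract reports accuracy alongside Cohen kappa, which matters on imbalanced data because kappa discounts the agreement a classifier would reach by chance. The following is a minimal sketch of how these multiclass metrics can be computed from predictions, using only the Python standard library. It is not the authors' code: the function names, the class labels `N`, `P`, and `Y`, and the toy predictions are hypothetical, and the support-weighted averaging is an assumption (the abstract does not state the averaging scheme, although a recall equal to the accuracy, as in the reported 0.964, is what weighted averaging produces).

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def multiclass_metrics(y_true, y_pred, labels):
    """Accuracy, support-weighted precision/recall/F1, and Cohen kappa."""
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(y_true)
    k = len(labels)
    support = [sum(cm[i]) for i in range(k)]                        # true counts
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts

    accuracy = sum(cm[i][i] for i in range(k)) / n

    precision = recall = f1 = 0.0
    for i in range(k):
        p = cm[i][i] / predicted[i] if predicted[i] else 0.0
        r = cm[i][i] / support[i] if support[i] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[i] / n  # assumption: weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(support[i] * predicted[i] for i in range(k)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: a classifier that always predicts "Y".
    y_true = ["Y"] * 8 + ["N"] + ["P"]
    y_pred = ["Y"] * 10
    for name, value in multiclass_metrics(y_true, y_pred, ["N", "P", "Y"]).items():
        print(f"{name}: {value:.3f}")
```

In this toy case the majority-class predictor scores a seemingly reasonable accuracy while its kappa is zero, which illustrates why a kappa well above zero is informative when the class distribution is skewed.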

Suggested Citation

  • Md Abdus Sahid & Mozaddid Ul Hoque Babar & Md Palash Uddin, 2024. "Predictive modeling of multi-class diabetes mellitus using machine learning and filtering Iraqi diabetes data dynamics," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-54, May.
  • Handle: RePEc:plo:pone00:0300785
    DOI: 10.1371/journal.pone.0300785

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0300785
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0300785&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0300785?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Muhammad Mazhar Bukhari & Bader Fahad Alkhamees & Saddam Hussain & Abdu Gumaei & Adel Assiri & Syed Sajid Ullah & Michela Gelfusa, 2021. "An Improved Artificial Neural Network Model for Effective Diabetes Prediction," Complexity, Hindawi, vol. 2021, pages 1-10, April.
    2. Gunjeet Kaur & P V M Lakshmi & Ashu Rastogi & Anil Bhansali & Sanjay Jain & Yot Teerawattananon & Henna Bano & Shankar Prinja, 2020. "Diagnostic accuracy of tests for type 2 diabetes and prediabetes: A systematic review and meta-analysis," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-19, November.
    3. Manal Alghamdi & Mouaz Al-Mallah & Steven Keteyian & Clinton Brawner & Jonathan Ehrman & Sherif Sakr, 2017. "Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-15, July.
    4. Mehrbakhsh Nilashi & Othman Ibrahim & Mohammad Dalvi & Hossein Ahmadi & Leila Shahmoradi, 2017. "Accuracy Improvement for Diabetes Disease Classification: A Case on a Public Medical Dataset," Fuzzy Information and Engineering, Taylor & Francis Journals, vol. 9(3), pages 345-357, September.
    5. Michael Buckland & Fredric Gey, 1994. "The relationship between Recall and Precision," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 45(1), pages 12-19, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ram D. Joshi & Chandra K. Dhakal, 2021. "Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches," IJERPH, MDPI, vol. 18(14), pages 1-17, July.
    2. Ramsés Noguez Imm & Julio Muñoz-Benitez & Diego Medina & Everardo Barcenas & Guillermo Molero-Castillo & Pamela Reyes-Ortega & Jorge Armando Hughes-Cano & Leticia Medrano-Gracia & Manuel Miranda-Anaya, 2023. "Preventable risk factors for type 2 diabetes can be detected using noninvasive spontaneous electroretinogram signals," PLOS ONE, Public Library of Science, vol. 18(1), pages 1-26, January.
    3. Zhang, Fan & Bales, Chris & Fleyeh, Hasan, 2021. "Night setback identification of district heat substations using bidirectional long short term memory with attention mechanism," Energy, Elsevier, vol. 224(C).
    4. Wei-Ming Luo & Jing-Yang Su & Tong Xu & Zhong-Ze Fang, 2023. "Prevalence of Diabetic Retinopathy and Use of Common Oral Hypoglycemic Agents Increase the Risk of Diabetic Nephropathy—A Cross-Sectional Study in Patients with Type 2 Diabetes," IJERPH, MDPI, vol. 20(5), pages 1-13, March.
    5. Massaro, Alessandro & Magaletti, Nicola & Cosoli, Gabriele & Giardinelli, Vito O. M. & Leogrande, Angelo, 2022. "The Prediction of Diabetes," MPRA Paper 113372, University Library of Munich, Germany.
    6. repec:plo:pone00:0195344 is not listed on IDEAS
    7. Matthew J. Jacobson, 2022. "Archaeological Evidence for Community Resilience and Sustainability: A Bibliometric and Quantitative Review," Sustainability, MDPI, vol. 14(24), pages 1-24, December.
    8. Ying-Jen Chang & Kuo-Chuan Hung & Li-Kai Wang & Chia-Hung Yu & Chao-Kun Chen & Hung-Tze Tay & Jhi-Joung Wang & Chung-Feng Liu, 2021. "A Real-Time Artificial Intelligence-Assisted System to Predict Weaning from Ventilator Immediately after Lung Resection Surgery," IJERPH, MDPI, vol. 18(5), pages 1-14, March.
    9. Yueyong Wang & Xuebing Gao & Yu Sun & Yuanyuan Liu & Libin Wang & Mengqi Liu, 2024. "Sh-DeepLabv3+: An Improved Semantic Segmentation Lightweight Network for Corn Straw Cover Form Plot Classification," Agriculture, MDPI, vol. 14(4), pages 1-19, April.
    10. Sanjay K. Arora & Alan L. Porter & Jan Youtie & Philip Shapira, 2013. "Capturing new developments in an emerging technology: an updated search strategy for identifying nanotechnology research outputs," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(1), pages 351-370, April.
    11. Yi-Ching Lynn Ho & Vivian Shu Yi Lee & Moon-Ho Ringo Ho & Gladis Jing Lin & Julian Thumboo, 2021. "Towards a Parsimonious Pathway Model of Modifiable and Mediating Risk Factors Leading to Diabetes Risk," IJERPH, MDPI, vol. 18(20), pages 1-20, October.
    12. Zhang, Yan & Zhu, Degang & Wang, Menglin & Li, Junhan & Zhang, Jie, 2024. "A comparative study of cyber security intrusion detection in healthcare systems," International Journal of Critical Infrastructure Protection, Elsevier, vol. 44(C).
    13. Zeqing Yang & Mingxuan Zhang & Yingshu Chen & Ning Hu & Lingxiao Gao & Libing Liu & Enxu Ping & Jung Il Song, 2024. "Surface defect detection method for air rudder based on positive samples," Journal of Intelligent Manufacturing, Springer, vol. 35(1), pages 95-113, January.
    14. Rajesh Ranjan & Jitender Kumar Chhabra, 2025. "An Effective Crow Search Algorithm and Its Application in Data Clustering," Journal of Classification, Springer;The Classification Society, vol. 42(1), pages 134-162, March.
    15. Song Yingze & Song Yingxu & Zhang Xin & Zhou Jie & Yang Degang, 2024. "Comparative analysis of the TabNet algorithm and traditional machine learning algorithms for landslide susceptibility assessment in the Wanzhou Region of China," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 120(8), pages 7627-7652, June.
    16. Nirajan Budhathoki & Ramesh Bhandari & Suraj Bashyal & Carl Lee, 2023. "Predicting asthma using imbalanced data modeling techniques: Evidence from 2019 Michigan BRFSS data," PLOS ONE, Public Library of Science, vol. 18(12), pages 1-17, December.
    17. Fei Zhu & Quan Liu & Yuchen Fu & Bairong Shen, 2014. "Segmentation of Neuronal Structures Using SARSA (λ)-Based Boundary Amendment with Reinforced Gradient-Descent Curve Shape Fitting," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-19, March.
    18. Martin Wieland & Juan Gorraiz, 2020. "The rivalry between Bernini and Borromini from a scientometric perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1643-1663, November.
    19. Matteo Manca & Ludovico Boratto & Salvatore Carta, 2018. "Behavioral data mining to produce novel and serendipitous friend recommendations in a social bookmarking system," Information Systems Frontiers, Springer, vol. 20(4), pages 825-839, August.
    20. Ngoc Uyen Phuong Nguyen & Martin G. Moehrle, 2019. "Technological Drivers of Urban Innovation: A T-DNA Analysis Based on US Patent Data," Sustainability, MDPI, vol. 11(24), pages 1-26, December.
    21. Sharan Srinivas, 2020. "A Machine Learning-Based Approach for Predicting Patient Punctuality in Ambulatory Care Centers," IJERPH, MDPI, vol. 17(10), pages 1-15, May.
