
Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis

Author

Listed:
  • Abdur Rasool

    (University of Chinese Academy of Sciences, Beijing 101408, China
    Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
    These authors contributed equally to this work.)

  • Chayut Bunterngchit

    (University of Chinese Academy of Sciences, Beijing 101408, China
    State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    These authors contributed equally to this work.)

  • Luo Tiejian

    (University of Chinese Academy of Sciences, Beijing 101408, China)

  • Md. Ruhul Islam

    (Department of Electrical Engineering and Computer Science, University of Stavanger, 4044 Stavanger, Norway)

  • Qiang Qu

    (Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China)

  • Qingshan Jiang

    (Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China)

Abstract

Breast cancer death rates are higher than those of any other cancer in American women. Machine learning-based predictive models promise earlier detection of breast cancer. However, evaluating which models diagnose cancer efficiently remains challenging. In this work, we proposed data exploratory techniques (DET) and developed four different predictive models to improve breast cancer diagnostic accuracy. Before model building, a four-layered DET (feature distribution, correlation, elimination, and hyperparameter optimization) was applied to identify robust features for classifying tumors into malignant and benign classes. The proposed techniques and classifiers were implemented on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and the Breast Cancer Coimbra Dataset (BCCD). Standard performance metrics, including confusion matrices and K-fold cross-validation, were applied to assess each classifier's efficiency and training time. The models' diagnostic capability improved with our DET: on the WDBC dataset, polynomial SVM reached 99.3% accuracy, LR 98.06%, KNN 97.35%, and EC 97.61%. We also compared our results with previous studies in terms of accuracy. The implementation procedure and findings can guide physicians in adopting an effective model for the practical understanding and prognosis of breast cancer tumors.
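
The evaluation tools named in the abstract (confusion matrices, accuracy, and K-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the label encoding (1 = malignant, 0 = benign), and the contiguous, unshuffled fold layout are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn); `positive` is the assumed malignant label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of correct predictions; 0.0 when there are no samples."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits over contiguous, unshuffled folds.

    The first n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

In each fold a classifier would be fitted on the `train` indices and scored on the `test` indices, and the per-fold accuracies would then be averaged. In practice the sample order is usually shuffled, or the folds stratified by class, before splitting.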

Suggested Citation

  • Abdur Rasool & Chayut Bunterngchit & Luo Tiejian & Md. Ruhul Islam & Qiang Qu & Qingshan Jiang, 2022. "Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis," IJERPH, MDPI, vol. 19(6), pages 1-19, March.
  • Handle: RePEc:gam:jijerp:v:19:y:2022:i:6:p:3211-:d:767137

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/19/6/3211/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/19/6/3211/
    Download Restriction: no

    References listed on IDEAS

    1. Diva Cristina Morett Romano Leão & Eliane Ramos Pereira & María Nieves Pérez-Marfil & Rose Mary Costa Rosa Andrade Silva & Angelo Braga Mendonça & Renata Carla Nencetti Pereira Rocha & María Paz Garcí, 2021. "The Importance of Spirituality for Women Facing Breast Cancer Diagnosis: A Qualitative Study," IJERPH, MDPI, vol. 18(12), pages 1-11, June.
    2. Wang, Haifeng & Zheng, Bichen & Yoon, Sang Won & Ko, Hoo Sang, 2018. "A support vector machine-based ensemble algorithm for breast cancer diagnosis," European Journal of Operational Research, Elsevier, vol. 267(2), pages 687-699.
    3. Kwang Ho Park & Erdenebileg Batbaatar & Yongjun Piao & Nipon Theera-Umpon & Keun Ho Ryu, 2021. "Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification," IJERPH, MDPI, vol. 18(4), pages 1-24, February.
    4. Eun Young Park & Myungsun Yi & Hye Sook Kim & Haejin Kim, 2021. "A Decision Tree Model for Breast Reconstruction of Women with Breast Cancer: A Mixed Method Approach," IJERPH, MDPI, vol. 18(7), pages 1-13, March.
    5. Giulia Bicchierai & Federica Di Naro & Diego De Benedetto & Diletta Cozzi & Silvia Pradella & Vittorio Miele & Jacopo Nori, 2021. "A Review of Breast Imaging for Timely Diagnosis of Disease," IJERPH, MDPI, vol. 18(11), pages 1-16, May.

    Citations



    Cited by:

    1. Tim Hulsen, 2022. "Data Science in Healthcare: COVID-19 and Beyond," IJERPH, MDPI, vol. 19(6), pages 1-4, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joanna Błajda & Edyta Barnaś & Anna Kucab, 2022. "Application of Personalized Education in the Mobile Medical App for Breast Self-Examination," IJERPH, MDPI, vol. 19(8), pages 1-21, April.
    2. Astorino, Annabella & Avolio, Matteo & Fuduli, Antonio, 2022. "A maximum-margin multisphere approach for binary Multiple Instance Learning," European Journal of Operational Research, Elsevier, vol. 299(2), pages 642-652.
    3. Meshwa Rameshbhai Savalia & Jaiprakash Vinodkumar Verma, 2023. "Classifying Malignant and Benign Tumors of Breast Cancer: A Comparative Investigation Using Machine Learning Techniques," International Journal of Reliable and Quality E-Healthcare (IJRQEH), IGI Global, vol. 12(1), pages 1-19, January.
    4. Baldomero-Naranjo, Marta & Martínez-Merino, Luisa I. & Rodríguez-Chía, Antonio M., 2020. "Tightening big Ms in integer programming formulations for support vector machines with ramp loss," European Journal of Operational Research, Elsevier, vol. 286(1), pages 84-100.
    5. Onur Demiray & Evrim D. Gunes & Ercan Kulak & Emrah Dogan & Seyma Gorcin Karaketir & Serap Cifcili & Mehmet Akman & Sibel Sakarya, 2023. "Classification of patients with chronic disease by activation level using machine learning methods," Health Care Management Science, Springer, vol. 26(4), pages 626-650, December.
    6. Blanquero, R. & Carrizosa, E. & Jiménez-Cordero, A. & Martín-Barragán, B., 2019. "Functional-bandwidth kernel for Support Vector Machine with Functional Data: An alternating optimization algorithm," European Journal of Operational Research, Elsevier, vol. 275(1), pages 195-207.
    7. P. K. Viswanathan & Sandeep Srivathsan & Wayne L. Winston, 2022. "Multiclass Discriminant Analysis using Ensemble Technique: Case Illustration from the Banking Industry," Journal of Emerging Market Finance, Institute for Financial Management and Research, vol. 21(1), pages 92-115, March.
    8. Golmohammadi, Davood & Zhao, Lingyu & Dreyfus, David, 2023. "Using machine learning techniques to reduce uncertainty for outpatient appointment scheduling practices in outpatient clinics," Omega, Elsevier, vol. 120(C).
    9. Liang, Xijun & Zhang, Zhipeng & Song, Yunquan & Jian, Ling, 2022. "Kernel-based online regression with canal loss," European Journal of Operational Research, Elsevier, vol. 297(1), pages 268-279.
    10. Kamyab Karimi & Ali Ghodratnama & Reza Tavakkoli-Moghaddam, 2023. "Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: a comprehensive analysis," Annals of Operations Research, Springer, vol. 328(1), pages 665-700, September.
    11. Sarah N. Alyami & Sunday O. Olatunji, 2020. "Application of Support Vector Machine for Arabic Sentiment Classification Using Twitter-Based Dataset," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 1-13, April.
    12. Che Xu & Wenjun Chang & Weiyong Liu, 2023. "Data-driven decision model based on local two-stage weighted ensemble learning," Annals of Operations Research, Springer, vol. 325(2), pages 995-1028, June.
    13. Li, Yanying & Che, Jinxing & Yang, Youlong, 2018. "Subsampled support vector regression ensemble for short term electric load forecasting," Energy, Elsevier, vol. 164(C), pages 160-170.
    14. Qifa Xu & Zezhou Wang & Cuixia Jiang & Yezheng Liu, 2023. "Deep learning on mixed frequency data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(8), pages 2099-2120, December.
    15. Chen, Weiyi & Zhang, Limao, 2022. "An automated machine learning approach for earthquake casualty rate and economic loss prediction," Reliability Engineering and System Safety, Elsevier, vol. 225(C).
