
Improving the Accuracy of Misclassified Breast Cancer Data using Machine Learning

Authors

  • Rong-Ho Lin

    (National Taipei University of Technology, Department of Industrial Engineering and Management, 1, Sec. 3, Zhongxiao E. Rd., Taipei 10608, Taiwan, ROC)

  • Benjamin Kofi Kujabi

    (National Taipei University of Technology, Department of Industrial Engineering and Management, 1, Sec. 3, Zhongxiao E. Rd., Taipei 10608, Taiwan, ROC)

  • Chun-Ling Chuang

    (Kainan University, Department of Information Management)

  • Yueh-Chung Chen

    (Division of Cardiology, Department of Internal Medicine, Taipei City Hospital, Renai Branch, Taipei)

  • Chang-Ming Chen

    (Radiation Oncology Department, Tri-Service General Hospital)

Abstract

Background: Breast cancer is the most common cancer among women. Many studies have made significant gains in classifying breast cancer tumors, with much emphasis on finding the best algorithm and the highest classification accuracy but limited interest in correcting misclassified data (Type 1 and Type 2 errors). Objective: This research proposes a novel hybrid system that integrates WEKA (Waikato Environment for Knowledge Analysis) with case-based reasoning (CBR), using the myCBR plugin for Protégé, to classify breast cancer tumors and correct misclassified data (Type 1 and Type 2 errors). Methods: The Wisconsin breast cancer dataset, retrieved from the Wisconsin university repository, was used in this research. The dataset contains 699 instances, 2 classes (malignant and benign), and 9 integer-valued attributes. The J48, IBK, LibSVM, JRip, and Multi-Layer Perceptron (MLP) classifiers were applied to classify the tumors. Next, the myCBR plugin for Protégé was used as an advanced modeling technique to correct the misclassified data and improve accuracy. Results: Model performance was evaluated on sensitivity, specificity, precision, and accuracy. The IBK classifier produced the most misclassified data, and the integrated system improved its classification accuracy from 95.61% to 98.53%. Conclusion: The findings demonstrate that integrating WEKA with the myCBR plugin for Protégé improves the handling of misclassified data, thus providing more accurate diagnostic procedures for distinguishing between benign and malignant tumors.
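
The four evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in Python, not the authors' WEKA/myCBR workflow. It assumes that "malignant" is treated as the positive class, so that a Type 1 error is a benign tumor labeled malignant and a Type 2 error is a malignant tumor labeled benign. The class name `ConfusionCounts`, the function `evaluate`, and the counts used are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix, assuming 'malignant' is the positive class."""
    tp: int  # malignant correctly predicted as malignant
    fp: int  # benign predicted as malignant (Type 1 error)
    fn: int  # malignant predicted as benign (Type 2 error)
    tn: int  # benign correctly predicted as benign


def evaluate(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, precision, and accuracy."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of malignant cases found
        "specificity": c.tn / (c.tn + c.fp),  # share of benign cases found
        "precision": c.tp / (c.tp + c.fp),    # share of malignant calls that are right
        "accuracy": (c.tp + c.tn) / total,    # share of all cases classified correctly
    }


# Hypothetical counts for illustration only (not taken from the paper).
counts = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
metrics = evaluate(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this framing, correcting a misclassified instance moves one count from `fp` or `fn` into `tn` or `tp`, which raises accuracy and at least one of the other three measures.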

Suggested Citation

  • Rong-Ho Lin & Benjamin Kofi Kujabi & Chun-Ling Chuang & Yueh-Chung Chen & Chang-Ming Chen, 2022. "Improving the Accuracy of Misclassified Breast Cancer Data using Machine Learning," Eximia Journal, Plus Communication Consulting SRL, vol. 4(1), pages 19-32, April.
  • Handle: RePEc:tec:eximia:v:4:y:2022:i:1:p:19-32

    Download full text from publisher

    File URL: https://eximiajournal.com/index.php/eximia/article/view/100/53
    Download Restriction: no

    File URL: https://eximiajournal.com/index.php/eximia/article/view/100
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huirong Zhang & Zhenyu Zhang & Lixin Zhou & Shuangsheng Wu, 2021. "Case-Based Reasoning for Hidden Property Analysis of Judgment Debtors," Mathematics, MDPI, vol. 9(13), pages 1-17, July.
    2. Wei Li & Wolfgang Karl Härdle & Stefan Lessmann, 2022. "A Data-driven Case-based Reasoning in Bankruptcy Prediction," Papers 2211.00921, arXiv.org.
    3. Misiunas, Nicholas & Oztekin, Asil & Chen, Yao & Chandra, Kavitha, 2016. "DEANN: A healthcare analytic methodology of data envelopment analysis and artificial neural networks for the prediction of organ recipient functional status," Omega, Elsevier, vol. 58(C), pages 46-54.
    4. Li, Hui & Hong, Lu-Yao & He, Jia-Xun & Xu, Xuan-Guo & Sun, Jie, 2013. "Small sample-oriented case-based kernel predictive modeling and its economic forecasting applications under n-splits-k-times hold-out assessment," Economic Modelling, Elsevier, vol. 33(C), pages 747-761.

    More about this item

    Keywords

    Misclassified data; Classifiers; WEKA; myCBR; protege;

    JEL classification:

    • R00 - Urban, Rural, Regional, Real Estate, and Transportation Economics - - General - - - General
    • Z0 - Other Special Topics - - General
