
A Novel Approach to Decision-Making on Diagnosing Oncological Diseases Using Machine Learning Classifiers Based on Datasets Combining Known and/or New Generated Features of a Different Nature

Author

Listed:
  • Liliya A. Demidova

    (Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia)

Abstract

This paper addresses the problem of diagnosing oncological diseases from blood protein markers. The goal of the study is to develop a novel decision-making approach to such diagnosis by generating datasets that combine features of different kinds: known features corresponding to blood protein markers, and new features generated with mathematical tools, in particular the non-linear dimensionality reduction algorithm UMAP and formulas for various entropies and fractal dimensions. These datasets were used to develop a group of multiclass kNN and SVM classifiers, with oversampling algorithms applied to address class imbalance in the dataset, which is typical of medical diagnostics problems. The experimental results confirmed the feasibility of using the UMAP algorithm, the approximation entropy, and the Katz and Higuchi fractal dimensions to generate new features from blood protein markers. Various combinations of these features can be used to expand the feature set of the original dataset in order to improve the quality of the classification decisions for diagnosing oncological diseases. The best kNN classifier was developed on the original dataset augmented with a feature based on the approximation entropy, and the best SVM classifier on the original dataset augmented with features based on the UMAP algorithm and the approximation entropy. The average values of the Macro F1-score, the metric used to assess classifier quality during cross-validation, increased by 16.138% and 4.219%, respectively, compared with the average values obtained when classifiers of the same type were developed on the original dataset alone.
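
The feature-generation idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that each patient's vector of marker values is treated as a short numeric sequence, uses the standard textbook definitions of approximate entropy (the abstract's "approximation entropy") and the Katz fractal dimension, and computes the Macro F1-score as the unweighted mean of per-class F1 values. The function names, the default parameters `m` and `r`, and the helper `augment_row` are hypothetical; in practice `r` is often scaled to the standard deviation of the sequence.

```python
import math


def katz_fd(x):
    """Katz fractal dimension of a 1-D sequence (common waveform formulation)."""
    if len(x) < 3:
        raise ValueError("need at least 3 values")
    steps = [abs(b - a) for a, b in zip(x, x[1:])]
    total = sum(steps)                          # path length L
    extent = max(abs(v - x[0]) for v in x[1:])  # farthest distance d from the first point
    if total == 0:
        return 0.0  # flat sequence: undefined, 0.0 is a convention chosen for this sketch
    n = len(steps)
    denom = math.log10(n) + math.log10(extent / total)
    # Degenerate case n * d == L: the formula diverges
    return math.inf if denom == 0 else math.log10(n) / denom


def approximate_entropy(x, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) = phi(m) - phi(m + 1), self-matches included."""
    n = len(x)
    if n < m + 2:
        raise ValueError("sequence too short for the chosen m")

    def phi(k):
        windows = [x[i:i + k] for i in range(n - k + 1)]
        acc = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r of w (always >= 1: w matches itself)
            matches = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            acc += math.log(matches / len(windows))
        return acc / len(windows)

    return phi(m) - phi(m + 1)


def augment_row(row, m=2, r=0.2):
    """Return the original features followed by two generated ones (hypothetical helper)."""
    return list(row) + [approximate_entropy(row, m, r), katz_fd(row)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either label list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because the Macro F1-score weights every class equally, it is a natural choice for the imbalanced multiclass setting the abstract describes. The augmented rows would then be passed to a kNN or SVM classifier; UMAP embedding and oversampling are omitted here because they require third-party libraries.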

Suggested Citation

  • Liliya A. Demidova, 2023. "A Novel Approach to Decision-Making on Diagnosing Oncological Diseases Using Machine Learning Classifiers Based on Datasets Combining Known and/or New Generated Features of a Different Nature," Mathematics, MDPI, vol. 11(4), pages 1-39, February.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:4:p:792-:d:1057729

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/4/792/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/4/792/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Feng Liu & Fangfang Gou & Jia Wu, 2022. "An Attention-Preserving Network-Based Method for Assisted Segmentation of Osteosarcoma MRI Images," Mathematics, MDPI, vol. 10(10), pages 1-25, May.
    2. Chung Feng Jeffrey Kuo & Zheng-Xun Yang & Wen-Sen Lai & Shao-Cheng Liu, 2022. "Application of Image Processing and 3D Printing Technique to Development of Computer Tomography System for Automatic Segmentation and Quantitative Analysis of Pulmonary Bronchus," Mathematics, MDPI, vol. 10(18), pages 1-25, September.
    3. Chung-Feng Jeffrey Kuo & Shao-Cheng Liu, 2022. "Fully Automatic Segmentation, Identification and Preoperative Planning for Nasal Surgery of Sinuses Using Semi-Supervised Learning and Volumetric Reconstruction," Mathematics, MDPI, vol. 10(7), pages 1-32, April.
