IDEAS home Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i20p10670-d654096.html

Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction

Author

Listed:
  • Nahúm Cueto López

    (Department of Electrical, Systems and Automatic Engineering, Universidad of León, Campus de Vegazana s/n, 24071 León, Spain)

  • María Teresa García-Ordás

    (Department of Electrical, Systems and Automatic Engineering, Universidad of León, Campus de Vegazana s/n, 24071 León, Spain)

  • Facundo Vitelli-Storelli

    (Centro de Investigación Biomédica en Red (CIBER), Grupo Investigación Interacciones Gen-Ambiente y Salud (GIIGAS), Instituto de Biomedicina (IBIOMED), Universidad de León, 24071 León, Spain)

  • Pablo Fernández-Navarro

    (Cancer and Environmental Epidemiology Unit, National Center for Epidemiology, Carlos III Institute of Health, 28903 Madrid, Spain
    Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), 28029 Madrid, Spain)

  • Camilo Palazuelos

    (Department of Mathematics, Statistics, and Computing, University of Cantabria-IDIVAL, 39005 Santander, Spain)

  • Rocío Alaiz-Rodríguez

    (Department of Electrical, Systems and Automatic Engineering, Universidad of León, Campus de Vegazana s/n, 24071 León, Spain)

Abstract

Breast cancer is a major public health problem. This study evaluates several feature ranking techniques, together with several machine learning classifiers, to identify the factors most relevant to the probability of developing breast cancer and to improve the performance of breast cancer risk prediction models in a healthy population. The dataset, with 919 cases and 946 controls, comes from the MCC-Spain study and includes only environmental and genetic features. Our aim is to analyze which factors in the cancer risk prediction model are the most important for breast cancer prediction. Quantifying the stability of feature selection methods is likewise essential before trying to gain insight into the data. This paper assesses several feature selection algorithms in terms of the performance they yield for a set of predictive models. Their robustness is also quantified, covering both the similarity between the rankings produced by different feature selection methods and the stability of each method on its own. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC). The top 47 ranked features obtained with this approach, fed to a Logistic Regression classifier, achieve an AUC of 0.616, an improvement of 5.8% over the full feature set. Both the SVM-RFE and Random Forest ranking techniques turned out to be highly stable, whereas Relief and the wrapper approaches are quite unstable. This study demonstrates that stability and model performance should be studied together: Random Forest and SVM-RFE were the most stable algorithms, but SVM-RFE outperforms Random Forest in terms of model performance.
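
The two quantities the abstract relies on, ranking stability and AUC, can be illustrated with a minimal sketch. The abstract does not say which stability measure the authors used, so the Spearman correlation and the top-k Jaccard index below are assumptions chosen for illustration. The feature names, toy rankings, and scores are hypothetical and are not data from the study.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman correlation between two full rankings without ties.

    Each argument maps a feature name to its rank position (1 = best).
    """
    n = len(rank_a)
    d2 = sum((rank_a[f] - rank_b[f]) ** 2 for f in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def jaccard_top_k(rank_a, rank_b, k):
    """Jaccard index between the top-k feature subsets of two rankings."""
    top_a = {f for f, r in rank_a.items() if r <= k}
    top_b = {f for f, r in rank_b.items() if r <= k}
    return len(top_a & top_b) / len(top_a | top_b)


def mean_pairwise(rankings, similarity):
    """Average a similarity function over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical rankings of four features from three resampled runs
# of the same feature selection method.
runs = [
    {"f1": 1, "f2": 2, "f3": 3, "f4": 4},
    {"f1": 1, "f2": 3, "f3": 2, "f4": 4},
    {"f1": 2, "f2": 1, "f3": 3, "f4": 4},
]
stability_rho = mean_pairwise(runs, spearman_rho)
stability_top2 = mean_pairwise(runs, lambda a, b: jaccard_top_k(a, b, 2))

# Hypothetical classifier scores for two cases (1) and two controls (0).
toy_auc = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

Applied to rankings from resampled runs of one method, `mean_pairwise` gives a stability estimate for that method; applied to rankings from different methods, it gives the similarity between them. A method can score high on either measure and still give a low AUC, which is why the abstract argues for studying stability and performance together.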

Suggested Citation

  • Nahúm Cueto López & María Teresa García-Ordás & Facundo Vitelli-Storelli & Pablo Fernández-Navarro & Camilo Palazuelos & Rocío Alaiz-Rodríguez, 2021. "Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction," IJERPH, MDPI, vol. 18(20), pages 1-28, October.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:20:p:10670-:d:654096

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/20/10670/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/20/10670/
    Download Restriction: no