IDEAS home Printed from https://ideas.repec.org/a/gam/jdataj/v7y2022i9p127-d908972.html

Are Source Code Metrics “Good Enough” in Predicting Security Vulnerabilities?

Authors
  • Sundarakrishnan Ganesh

    (Department of Computer Science and Media Technology, Linnaeus University, 351 95 Växjö, Sweden)

  • Francis Palma

    (Department of Computer Science and Media Technology, Linnaeus University, 351 95 Växjö, Sweden)

  • Tobias Olsson

    (Department of Computer Science and Media Technology, Linnaeus University, 351 95 Växjö, Sweden)

Abstract

Modern systems produce and handle large volumes of sensitive enterprise data. Security vulnerabilities in software systems must therefore be identified and resolved early to prevent security breaches and failures. Predicting security vulnerabilities is an alternative to identifying them while developers write code. In this study, we investigated the ability of several machine learning algorithms to predict security vulnerabilities. We created two datasets containing security vulnerability information from two open-source systems: (1) Apache Tomcat (versions 4.x) and (2) Apache Struts (five 2.5.x minor versions). We computed source code metrics for these versions of both systems. We examined four classifiers, namely Naive Bayes, Decision Tree, XGBoost Classifier, and Logistic Regression, to assess their ability to predict security vulnerabilities. Moreover, we built an ensemble learner using a stacking classifier to determine whether prediction performance could be improved. We performed cross-version and cross-project predictions to assess the effectiveness of the best-performing model. Our results showed that the XGBoost classifier outperformed the other learners, with an average accuracy of 97% on both datasets. The stacking classifier achieved an average accuracy of 92% on Struts and 71% on Tomcat. In the cross-version setup, our best-performing model, XGBoost, achieved an average accuracy of 87% on Tomcat and 99% on Struts.
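The cross-version setup described in the abstract (train on the metrics of one release, then predict vulnerabilities in a later release) can be illustrated with a minimal, dependency-free sketch. It assumes each file is described by a vector of source code metrics and a binary label (1 = vulnerable). The two metrics used here (lines of code and cyclomatic complexity), the toy data, and every function name are hypothetical and are not taken from the paper's datasets. A from-scratch Logistic Regression stands in for XGBoost so the example runs with the Python standard library alone; it shows the evaluation protocol, not the authors' pipeline.

```python
import math
from typing import List, Sequence, Tuple

# A sample is (metric vector, label); label 1 marks a file as vulnerable.
Sample = Tuple[List[float], int]


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation, computed on the training release only."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return means, stds


def scale(row: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(xs, ys, lr: float = 0.5, epochs: int = 500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    dims, n = len(xs[0]), len(xs)
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dims):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x: Sequence[float]) -> int:
    w, b = model
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


def cross_version_accuracy(train: Sequence[Sample], test: Sequence[Sample]) -> float:
    """Train on one release's metrics and report accuracy on a later release."""
    means, stds = fit_scaler([x for x, _ in train])
    model = train_logistic_regression(
        [scale(x, means, stds) for x, _ in train], [y for _, y in train]
    )
    correct = sum(predict(model, scale(x, means, stds)) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data: [lines of code, cyclomatic complexity] per file.
TRAIN_V1: List[Sample] = [
    ([120, 3], 0), ([80, 2], 0), ([150, 4], 0),
    ([900, 25], 1), ([1100, 30], 1), ([750, 22], 1),
]
TEST_V2: List[Sample] = [
    ([70, 2], 0), ([110, 3], 0), ([1000, 28], 1), ([1150, 31], 1),
]

print(f"cross-version accuracy: {cross_version_accuracy(TRAIN_V1, TEST_V2):.2f}")
```

The scaler is fitted on the training release only, so no statistics from the later release leak into training. Swapping in a different learner (for example, a gradient-boosted tree model) would leave the protocol in `cross_version_accuracy` unchanged apart from the training and prediction calls.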

Suggested Citation

  • Sundarakrishnan Ganesh & Francis Palma & Tobias Olsson, 2022. "Are Source Code Metrics “Good Enough” in Predicting Security Vulnerabilities?," Data, MDPI, vol. 7(9), pages 1-38, September.
  • Handle: RePEc:gam:jdataj:v:7:y:2022:i:9:p:127-:d:908972

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/7/9/127/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/7/9/127/
    Download Restriction: no