Predicting software reuse using machine learning techniques—A case study on open-source Java software systems

Authors

  • Matthew Yit Hang Yeow
  • Chun Yong Chong
  • Mei Kuan Lim
  • Yuen Yee Yen

Abstract

Software reuse is an essential practice for increasing efficiency and reducing costs in software production. Software reuse practices include reusing artifacts, libraries, components, packages, and APIs. Identifying suitable software for reuse requires pinpointing potential candidates. However, no objective methods are in place to measure software reuse, which makes it challenging to identify highly reusable software. Software reuse research mainly addresses two hurdles: 1) identifying reusable candidates effectively and efficiently, and 2) selecting high-quality software components that improve maintainability and extensibility. This paper proposes automating software reuse prediction with machine learning (ML) algorithms, enabling researchers and practitioners to better identify highly reusable software. Our approach uses cross-project code clone detection to establish the ground truth for software reuse, treating code clones found across popular GitHub projects as indicators of potential reuse candidates. Software metrics were extracted from Maven artifacts and used to train classification and regression models that predict and estimate software reuse. The average F1-score of the ML classification models is 77.19%. The best-performing model, Ridge Regression, achieved an F1-score of 79.17%. Additionally, this research aims to assist developers by identifying the key metrics that most strongly affect software reuse. Our findings suggest that the file-level PUA (Public Undocumented API) metric is the most important factor influencing software reuse. We also present suitable value ranges for the top five most important metrics, which developers can follow to create highly reusable software. Furthermore, we developed a tool that uses the trained models to predict the reuse potential of existing GitHub projects and to rank Maven artifacts by domain.
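
As an illustration only, the two evaluation ideas in the abstract (a clone-based ground truth and the F1-score) can be sketched in Python. This is not the authors' pipeline: the Artifact record, its clone_projects count, the label_reuse helper, and its min_projects threshold are hypothetical. The F1-score shown is the standard binary definition, the harmonic mean of precision and recall for the positive ("reused") class.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Hypothetical record for one Maven artifact."""
    name: str
    # Hypothetical count of distinct external GitHub projects in which a
    # cross-project clone detector found code from this artifact.
    clone_projects: int


def label_reuse(artifacts: list[Artifact], min_projects: int = 1) -> list[int]:
    """Ground-truth label: 1 (reused) if the artifact's code is cloned into
    at least `min_projects` external projects, else 0. The threshold is an
    assumption, not a value taken from the paper."""
    return [1 if a.clone_projects >= min_projects else 0 for a in artifacts]


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1-score for the positive class (label 1): the harmonic mean
    of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    artifacts = [Artifact("a", 3), Artifact("b", 0), Artifact("c", 1), Artifact("d", 0)]
    y_true = label_reuse(artifacts)   # [1, 0, 1, 0]
    y_pred = [1, 1, 0, 0]             # hypothetical model predictions
    print(f1_score(y_true, y_pred))   # 0.5 (precision 0.5, recall 0.5)
```

In practice, a library such as scikit-learn provides f1_score and a RidgeClassifier; the hand-written metric is shown here only to make the calculation explicit.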

Suggested Citation

  • Matthew Yit Hang Yeow & Chun Yong Chong & Mei Kuan Lim & Yuen Yee Yen, 2025. "Predicting software reuse using machine learning techniques—A case study on open-source Java software systems," PLOS ONE, Public Library of Science, vol. 20(2), pages 1-30, February.
  • Handle: RePEc:plo:pone00:0314512
    DOI: 10.1371/journal.pone.0314512

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0314512
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0314512&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0314512?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0314512. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone. General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.