
Improved software defect prediction using Pruned Histogram-based isolation forest

Authors
  • Ding, Zhiguo
  • Xing, Liudong

Abstract

Software defect prediction (SDP) is a hot topic in the modern software engineering research community. It has been used to evaluate software quality and reliability and to allocate limited testing resources effectively. Many data mining and machine learning methods have been applied to SDP, based on analyzing the software source code and development process and extracting critical metrics. However, these existing learning methods have difficulty handling the imbalanced data distribution of accumulated training datasets. Isolation forest, an anomaly detection method based on ensemble learning, has been studied as a way to deal with the imbalanced data distribution and obtain high prediction performance. However, the isolation forest method suffers from a major drawback, slow convergence, which is caused by selecting the splitting feature value at random when building isolation trees. To overcome this problem, this paper constructs a histogram over the value set of the selected isolation feature, which helps identify feature values that are preferable for building isolation trees. Motivated by the “many could be better than all” principle in ensemble learning, an ensemble pruning strategy is further employed to optimize the obtained isolation forest, leading to a novel SDP method named PHIForest (Pruned Histogram-based Isolation Forest). The proposed method provides fast convergence through histogram-based selection of splitting feature values, and it reduces the ensemble size and improves prediction performance through ensemble pruning. Comprehensive experiments on ten real datasets demonstrate the effectiveness of the proposed SDP method.
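The abstract does not give PHIForest's exact histogram rule, so the following is only a minimal sketch under an assumed heuristic: at each node, the chosen feature's values are binned into `bins` equal-width bins and the split point is drawn inside the sparsest bin, on the reasoning that outlying (here, defect-prone) modules sit in low-density regions and should be isolated in fewer splits than a uniformly random split would need. The rest follows the standard isolation-forest recipe: subsampling, a depth cap of ⌈log2 m⌉, path-length normalisation by c(m), and the score 2^(−E[h(x)]/c(m)). The function names, the bin count, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points;
    used to normalise isolation-tree path lengths (standard iForest term)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def histogram_split(values, bins, rng):
    """HYPOTHETICAL histogram rule: bin the node's values into equal-width
    bins and draw the split point uniformly inside the sparsest bin."""
    lo, hi = min(values), max(values)  # caller guarantees hi > lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    sparsest = min(counts)
    i = rng.choice([b for b, k in enumerate(counts) if k == sparsest])
    return lo + (i + rng.random()) * width


def build_tree(X, depth, max_depth, bins, rng):
    """Split recursively until a point is isolated or the depth cap is hit."""
    if depth >= max_depth or len(X) <= 1:
        return ("leaf", len(X))
    # Only features that still vary at this node can be split on.
    feats = [j for j in range(len(X[0]))
             if max(r[j] for r in X) > min(r[j] for r in X)]
    if not feats:
        return ("leaf", len(X))
    j = rng.choice(feats)
    s = histogram_split([r[j] for r in X], bins, rng)
    left = [r for r in X if r[j] < s]
    right = [r for r in X if r[j] >= s]
    if not left or not right:  # degenerate split: stop here
        return ("leaf", len(X))
    return ("node", j, s,
            build_tree(left, depth + 1, max_depth, bins, rng),
            build_tree(right, depth + 1, max_depth, bins, rng))


def path_length(x, node, depth=0):
    """Depth of the leaf x falls into, plus c(size) for unsplit leaves."""
    if node[0] == "leaf":
        return depth + c(node[1])
    _, j, s, left, right = node
    return path_length(x, left if x[j] < s else right, depth + 1)


def fit_forest(X, n_trees=100, sample_size=256, bins=10, seed=0):
    """Grow n_trees trees, each on a random subsample of m rows."""
    if len(X) < 2:
        raise ValueError("need at least two training rows")
    rng = random.Random(seed)
    m = min(sample_size, len(X))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(X, m), 0, max_depth, bins, rng)
             for _ in range(n_trees)]
    return trees, m


def anomaly_score(x, trees, m):
    """s(x) = 2^(-E[h(x)] / c(m)); values near 1 suggest an anomaly,
    read here as a likely defect-prone module."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(m))


# Toy usage: 200 clean modules (two metrics) plus 5 outlying "defective" ones.
data_rng = random.Random(1)
clean = [[data_rng.gauss(10, 2), data_rng.gauss(5, 1)] for _ in range(200)]
defective = [[40.0, 25.0], [38.0, 30.0], [45.0, 22.0], [50.0, 28.0], [42.0, 35.0]]
trees, m = fit_forest(clean + defective, n_trees=50, sample_size=128)
print(anomaly_score(defective[0], trees, m), anomaly_score(clean[0], trees, m))
```

Because the split point is placed in the emptiest region rather than anywhere in the value range, the gap between the clean cluster and the outliers is usually cut at the first split that uses either feature, which is the kind of faster isolation the abstract attributes to histogram-based selection.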
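The abstract also does not name PHIForest's pruning criterion. As a clearly labeled stand-in, the sketch below uses generic greedy forward selection: starting from an empty sub-ensemble, it repeatedly adds the member (for example, one isolation tree) whose inclusion gives the highest validation AUC for the averaged score, and stops at `k` members. AUC is assumed as the selection metric because it does not depend on the class ratio, which matters for imbalanced defect data. `greedy_prune`, `auc`, and the synthetic score matrix are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random defective row (label 1) outscores a random
    clean row (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def greedy_prune(member_scores, labels, k):
    """HYPOTHETICAL pruning rule (greedy forward selection).

    member_scores[i][r] is ensemble member i's anomaly score for validation
    row r (higher = more likely defective). Returns the chosen member indices
    and the validation AUC of their averaged score."""
    k = min(k, len(member_scores))
    chosen = []
    totals = [0.0] * len(labels)  # running score sum of the chosen members
    best_auc = 0.0
    for _ in range(k):
        best_i, best_auc = None, -1.0
        for i, row_scores in enumerate(member_scores):
            if i in chosen:
                continue
            size = len(chosen) + 1
            trial = [(t + s) / size for t, s in zip(totals, row_scores)]
            a = auc(trial, labels)
            if a > best_auc:
                best_i, best_auc = i, a
        chosen.append(best_i)
        totals = [t + s for t, s in zip(totals, member_scores[best_i])]
    return chosen, best_auc


# Toy usage: 100 validation modules, 10% defective; members 0-4 are
# informative (low noise), members 5-19 are mostly noise.
rng = random.Random(0)
labels = [1] * 10 + [0] * 90
members = [[y + rng.gauss(0, 0.3 if i < 5 else 3.0) for y in labels]
           for i in range(20)]
subset, val_auc = greedy_prune(members, labels, k=5)
full_auc = auc([sum(col) / len(col) for col in zip(*members)], labels)
print(subset, round(val_auc, 3), round(full_auc, 3))
```

In this toy setting the five-member sub-ensemble would typically reach a higher validation AUC than averaging all 20 members, a small-scale illustration of the "many could be better than all" principle. With real isolation trees, each tree's per-module anomaly scores on a held-out set would form one row of `member_scores`.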

Suggested Citation

  • Ding, Zhiguo & Xing, Liudong, 2020. "Improved software defect prediction using Pruned Histogram-based isolation forest," Reliability Engineering and System Safety, Elsevier, vol. 204(C).
  • Handle: RePEc:eee:reensy:v:204:y:2020:i:c:s0951832020306712
    DOI: 10.1016/j.ress.2020.107170

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0951832020306712
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Wang, Jinyong & Zhang, Ce, 2018. "Software reliability prediction using a deep learning model based on the RNN encoder–decoder," Reliability Engineering and System Safety, Elsevier, vol. 170(C), pages 73-82.
    2. Heydari, Mohammadhossein & Sullivan, Kelly M., 2019. "Robust allocation of testing resources in reliability growth," Reliability Engineering and System Safety, Elsevier, vol. 192(C).
    3. Lee, Sang Hun & Lee, Seung Jun & Shin, Sung Min & Lee, Eun-chan & Kang, Hyun Gook, 2020. "Exhaustive testing of safety-critical software for reactor protection system," Reliability Engineering and System Safety, Elsevier, vol. 193(C).

    Citations

    Cited by:

    1. Gao, Lu & Lu, Pan & Ren, Yihao, 2021. "A deep learning approach for imbalanced crash data in predicting highway-rail grade crossings accidents," Reliability Engineering and System Safety, Elsevier, vol. 216(C).
    2. Yinsheng Fu & Jullius Kumar & Bibhu Prasad Ganthia & Rahul Neware, 2022. "Nonlinear dynamic measurement method of software reliability based on data mining," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(1), pages 273-280, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shin, Sung-Min & Lee, Sang Hun & Shin, Seung Ki, 2022. "A novel approach for quantitative importance analysis of safety DI&C systems in the nuclear field," Reliability Engineering and System Safety, Elsevier, vol. 228(C).
    2. Wang, Peipei & Zheng, Xinqi & Ai, Gang & Liu, Dongya & Zhu, Bangren, 2020. "Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: Case studies in Russia, Peru and Iran," Chaos, Solitons & Fractals, Elsevier, vol. 140(C).
    3. Xu, Zhaoyi & Saleh, Joseph Homer, 2021. "Machine learning for reliability engineering and safety applications: Review of current status and future opportunities," Reliability Engineering and System Safety, Elsevier, vol. 211(C).
    4. Da Hye Lee & In Hong Chang & Hoang Pham, 2020. "Software Reliability Model with Dependent Failures and SPRT," Mathematics, MDPI, vol. 8(8), pages 1-14, August.
    5. Kyawt Kyawt San & Hironori Washizaki & Yoshiaki Fukazawa & Kiyoshi Honda & Masahiro Taga & Akira Matsuzaki, 2021. "Deep Cross-Project Software Reliability Growth Model Using Project Similarity-Based Clustering," Mathematics, MDPI, vol. 9(22), pages 1-22, November.
    6. Modibbo, Umar Muhammad & Arshad, Mohd. & Abdalghani, Omer & Ali, Irfan, 2021. "Optimization and estimation in system reliability allocation problem," Reliability Engineering and System Safety, Elsevier, vol. 212(C).
    7. Dahye Lee & Inhong Chang & Hoang Pham, 2023. "Study of a New Software Reliability Growth Model under Uncertain Operating Environments and Dependent Failures," Mathematics, MDPI, vol. 11(18), pages 1-17, September.
    8. Li, Xingyu & Krivtsov, Vasiliy & Arora, Karunesh, 2022. "Attention-based deep survival model for time series data," Reliability Engineering and System Safety, Elsevier, vol. 217(C).
    9. Murray, Brian & Perera, Lokukaluge Prasad, 2021. "An AIS-based deep learning framework for regional ship behavior prediction," Reliability Engineering and System Safety, Elsevier, vol. 215(C).
    10. Dehghani, Nariman L. & Zamanian, Soroush & Shafieezadeh, Abdollah, 2021. "Adaptive network reliability analysis: Methodology and applications to power grid," Reliability Engineering and System Safety, Elsevier, vol. 216(C).
