Printed from https://ideas.repec.org/a/eee/tefoso/v169y2021ics0040162521002286.html

A combination of clustering-based under-sampling with ensemble methods for solving imbalanced class problem in intelligent systems

Author

Listed:
  • Ebrahimi Shahabadi, Mohammad Saleh
  • Tabrizchi, Hamed
  • Kuchaki Rafsanjani, Marjan
  • Gupta, B.B.
  • Palmieri, Francesco

Abstract

Most real-world datasets suffer from an imbalanced distribution of samples across classes, in which the larger (majority) class contains far more samples than the smaller (minority) class. To address this problem, various undersampling and oversampling techniques have been proposed; they create a dataset with an equal number of samples in each class by reducing the majority class or enlarging the minority class, respectively. Ensemble classifiers use multiple learning algorithms to improve classification accuracy, and combining undersampling or oversampling with ensemble classifiers can yield models with better performance. This study proposes a novel clustering-based undersampling method for creating a balanced dataset. The method uses the k-means algorithm to cluster the data, the Mahalanobis distance to measure each sample's distance to its cluster centroid, and a selection method that preserves the pattern of data distribution in each cluster. In experiments on 44 benchmark datasets from the KEEL repository, the proposed approach outperformed seven state-of-the-art approaches.
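The abstract names the ingredients of the method (k-means clustering, Mahalanobis distance to the centroid, distribution-preserving selection) but not its details. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names `kmeans`, `mahalanobis_to_centroid`, and `cluster_undersample`, the parameter `k`, the proportional per-cluster quotas, and the rule of picking samples at evenly spaced ranks of the sorted distances. The ensemble classification stage is not covered.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per sample."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels) and _ > 0:
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels


def mahalanobis_to_centroid(points):
    """Mahalanobis distance of each point to the mean of `points`."""
    if len(points) < 2:
        return np.zeros(len(points))
    diff = points - points.mean(axis=0)
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors


def cluster_undersample(X_major, n_keep, k=3, seed=0):
    """Return sorted indices of n_keep majority samples (assumed selection rule)."""
    n = len(X_major)
    if n_keep > n:
        raise ValueError("n_keep must not exceed the number of majority samples")
    labels = kmeans(X_major, k, seed=seed)
    sizes = np.bincount(labels, minlength=k)
    # Quota per cluster proportional to its size; quotas sum to n_keep.
    exact = sizes * n_keep / n
    quotas = np.floor(exact).astype(int)
    remainder = n_keep - quotas.sum()
    for j in np.argsort(-(exact - quotas))[:remainder]:
        quotas[j] += 1
    keep = []
    for j in range(k):
        q = quotas[j]
        if q == 0:
            continue
        idx = np.flatnonzero(labels == j)
        order = idx[np.argsort(mahalanobis_to_centroid(X_major[idx]))]
        # Evenly spaced ranks over the sorted distances, so near and far
        # samples are both represented (integer arithmetic avoids duplicates).
        pos = (np.arange(q) * (len(order) - 1)) // max(q - 1, 1)
        keep.extend(order[pos].tolist())
    return np.array(sorted(keep))


# Usage on synthetic data: shrink 300 majority samples to 30.
rng = np.random.default_rng(42)
X_major = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in (0.0, 6.0, 12.0)])
keep = cluster_undersample(X_major, n_keep=30, k=3)
X_major_reduced = X_major[keep]
```

Here `n_keep` would typically be set to the minority-class size so that the reduced majority class and the minority class are balanced. Sampling across the whole range of distances, rather than keeping only the points closest to each centroid, is one simple way to retain the spread of each cluster.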

Suggested Citation

  • Ebrahimi Shahabadi, Mohammad Saleh & Tabrizchi, Hamed & Kuchaki Rafsanjani, Marjan & Gupta, B.B. & Palmieri, Francesco, 2021. "A combination of clustering-based under-sampling with ensemble methods for solving imbalanced class problem in intelligent systems," Technological Forecasting and Social Change, Elsevier, vol. 169(C).
  • Handle: RePEc:eee:tefoso:v:169:y:2021:i:c:s0040162521002286
    DOI: 10.1016/j.techfore.2021.120796

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0040162521002286
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.techfore.2021.120796?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations



    Cited by:

    1. Vibha Pratap & Amit Prakash Singh, 2023. "Novel fuzzy clustering-based undersampling framework for class imbalance problem," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 14(3), pages 967-976, June.
    2. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qayyum, Abdul & Razzak, Imran & Malik, Aamir Saeed & Anwar, Sajid, 2021. "Fusion of CNN and sparse representation for threat estimation near power lines and poles infrastructure using aerial stereo imagery," Technological Forecasting and Social Change, Elsevier, vol. 168(C).
    2. Seol A. Kwon, 2022. "Where Does an Individual’s Willingness to Act on Alleviating the Climate Crisis in Korea Arise from?," Sustainability, MDPI, vol. 14(11), pages 1-17, May.
    3. Yu, Hongxin & Zhao, Yuanjun & Liu, Zheng & Liu, Wei & Zhang, Shuai & Wang, Fatao & Shi, Lihua, 2021. "Research on the financing income of supply chains based on an E-commerce platform," Technological Forecasting and Social Change, Elsevier, vol. 169(C).
    4. Pornpit Wongthongtham & Bilal Abu-Salih & Jeff Huang & Hemixa Patel & Komsun Siripun, 2023. "A Multi-Criteria Analysis Approach to Identify Flood Risk Asset Damage Hotspots in Western Australia," Sustainability, MDPI, vol. 15(7), pages 1-30, March.
    5. Hinge, Gilbert & Surampalli, Rao Y. & Goyal, Manish Kumar & Gupta, Brij B. & Chang, Xiaojun, 2021. "Soil carbon and its associate resilience using big data analytics: For food Security and environmental management," Technological Forecasting and Social Change, Elsevier, vol. 169(C).
    6. Jha, Srinidhi & Goyal, Manish Kumar & Gupta, Brij & Gupta, Anil Kumar, 2021. "A novel analysis of COVID 19 risk in India incorporating climatic and socioeconomic Factors," Technological Forecasting and Social Change, Elsevier, vol. 167(C).
    7. Qingmu Su & Hsueh-Sheng Chang & Shin-En Pai, 2022. "A Comparative Study of the Resilience of Urban and Rural Areas under Climate Change," IJERPH, MDPI, vol. 19(15), pages 1-14, July.
    8. Khan Babar, Abdul Haseeb & Ali, Yousaf, 2022. "Framework construction for augmentation of resilience in critical infrastructure: Developing countries a case in point," Technology in Society, Elsevier, vol. 68(C).
    9. Xu, Min & Li, Guoyuan & Chen, Anthony, 2024. "Resilience-driven post-disaster restoration of interdependent infrastructure systems under different decision-making environments," Reliability Engineering and System Safety, Elsevier, vol. 241(C).
    10. Kumar, Aman & Shankar, Amit & Behl, Abhishek & Arya, Varsha & Gupta, Nakul, 2023. "Should I share it? Factors influencing fake news-sharing behaviour: A behavioural reasoning theory perspective," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
