IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i14p2379-d857076.html

Towards Explainable Machine Learning for Bank Churn Prediction Using Data Balancing and Ensemble-Based Methods

Author

Listed:
  • Stéphane C. K. Tékouabou

    (Center of Urban Systems (CUS), Mohammed VI Polytechnic University (UM6P), Hay Moulay Rachid, Ben Guerir 43150, Morocco
    Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida 24000, Morocco)

  • Ștefan Cristian Gherghina

    (Department of Finance, Bucharest University of Economic Studies, 6 Piata Romana, 010374 Bucharest, Romania)

  • Hamza Toulni

    (EIGSICA, 282 Route of the Oasis, Mâarif, Casablanca 20140, Morocco
    LIMSAD Laboratory, Faculty of Sciences Ain Chock, Hassan II University of Casablanca, Casablanca 20000, Morocco)

  • Pedro Neves Mata

    (ISCAL-Instituto Superior de Contabilidade e Administração de Lisboa, Instituto Politécnico de Lisboa, Avenida Miguel Bombarda 20, 1069-035 Lisboa, Portugal
    Microsoft (CSS-Microsoft Customer Service and Support Department), Rua Do Fogo de Santelmo, Lote 2.07.02, 1990-110 Lisboa, Portugal)

  • José Moleiro Martins

    (ISCAL-Instituto Superior de Contabilidade e Administração de Lisboa, Instituto Politécnico de Lisboa, Avenida Miguel Bombarda 20, 1069-035 Lisboa, Portugal
    Business Research Unit (BRU-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal)

Abstract

The diversity of data collected from social networks and digital interfaces has increased sharply, raising the problem of heterogeneous variables that are often unfavourable to classification algorithms. Despite significant improvements in the efficiency of machine learning (ML) and predictive analysis for classification in customer relationship management (CRM) systems, their performance remains severely limited by heterogeneous data processing, class imbalance, and differing feature scales. This impact is greater for simple ML methods, which also often suffer from over-fitting. This paper proposes a succinct yet detailed ML model-building process that cross-validates the combination of SMOTE, used to balance the data, with ensemble methods for modelling. In the experiments, the random forest (RF) model yielded the best performance, 0.86 in both accuracy and F1-score, using balanced data. This agrees with the literature summary on this topic, which shows that RF is among the most effective algorithms for customer predictive classification problems. The constructed and optimized models were interpreted using Shapley values and feature importance analysis, which show that the “age” feature was the most significant while “HasCrCard” was the least. This process proved effective in bridging previously reported research gaps, and the resulting model should be used to support bank customer loyalty decision-making.
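
The balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE's core idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation, which this record does not show. The function name `smote`, its parameters, and the toy feature values (age, balance in thousands) are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len - 1 neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Made-up toy data: 6 retained customers (majority), 3 churners (minority).
majority = [[35, 20.0], [41, 55.0], [29, 10.0], [38, 80.0], [45, 60.0], [33, 5.0]]
minority = [[52, 120.0], [58, 95.0], [61, 130.0]]

new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # both classes now have 6 rows
```

In a cross-validated workflow, oversampling of this kind is usually applied to the training folds only, so that synthetic points do not leak into the data used for evaluation.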

Suggested Citation

  • Stéphane C. K. Tékouabou & Ștefan Cristian Gherghina & Hamza Toulni & Pedro Neves Mata & José Moleiro Martins, 2022. "Towards Explainable Machine Learning for Bank Churn Prediction Using Data Balancing and Ensemble-Based Methods," Mathematics, MDPI, vol. 10(14), pages 1-16, July.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:14:p:2379-:d:857076

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/14/2379/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/14/2379/
    Download Restriction: no

    References listed on IDEAS

    1. Georgios Marinakos & Sophia Daskalaki, 2017. "Imbalanced customer classification for bank direct marketing," Journal of Marketing Analytics, Palgrave Macmillan, vol. 5(1), pages 14-30, March.
    2. Arjunan, Pandarasamy & Poolla, Kameshwar & Miller, Clayton, 2020. "EnergyStar++: Towards more accurate and explanatory building energy benchmarking," Applied Energy, Elsevier, vol. 276(C).

    Citations



    Cited by:

    1. Siti Nurasyikin Shamsuddin & Noriszura Ismail & R. Nur-Firyal, 2023. "Life Insurance Prediction and Its Sustainability Using Machine Learning Approach," Sustainability, MDPI, vol. 15(13), pages 1-20, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Ao & Xiao, Fu & Zhang, Chong & Fan, Cheng, 2021. "Attention-based interpretable neural network for building cooling load prediction," Applied Energy, Elsevier, vol. 299(C).
    2. Chen, Xia & Geyer, Philipp, 2022. "Machine assistance in energy-efficient building design: A predictive framework toward dynamic interaction with human decision-making under uncertainty," Applied Energy, Elsevier, vol. 307(C).
    3. Stéphane Cédric Koumétio Tékouabou & Ştefan Cristian Gherghina & Hamza Toulni & Pedro Neves Mata & Mário Nuno Mata & José Moleiro Martins, 2022. "A Machine Learning Framework towards Bank Telemarketing Prediction," JRFM, MDPI, vol. 15(6), pages 1-19, June.
    4. Wenninger, Simon & Kaymakci, Can & Wiethe, Christian, 2022. "Explainable long-term building energy consumption prediction using QLattice," Applied Energy, Elsevier, vol. 308(C).
    5. Omar H. Fares & Irfan Butt & Seung Hwan Mark Lee, 2023. "Utilization of artificial intelligence in the banking sector: a systematic literature review," Journal of Financial Services Marketing, Palgrave Macmillan, vol. 28(4), pages 835-852, December.
    6. Luca Gugliermetti & Fabrizio Cumo & Sofia Agostinelli, 2024. "A Future Direction of Machine Learning for Building Energy Management: Interpretable Models," Energies, MDPI, vol. 17(3), pages 1-27, February.
    7. Salah Vaisi & Saleh Mohammadi & Benedetto Nastasi & Kavan Javanroodi, 2020. "A New Generation of Thermal Energy Benchmarks for University Buildings," Energies, MDPI, vol. 13(24), pages 1-18, December.
    8. Ali, Aliyuda & Aliyuda, Kachalla & Elmitwally, Nouh & Muhammad Bello, Abdulwahab, 2022. "Towards more accurate and explainable supervised learning-based prediction of deliverability for underground natural gas storage," Applied Energy, Elsevier, vol. 327(C).
    9. Vaisi, Salah & Varmazyari, Pouya & Esfandiari, Masoud & Sharbaf, Sara A., 2023. "Developing a multi-level energy benchmarking and certification system for office buildings in a cold climate region," Applied Energy, Elsevier, vol. 336(C).
    10. Jaqueline Litardo & Ruben Hidalgo-Leon & Guillermo Soriano, 2021. "Energy Performance and Benchmarking for University Classrooms in Hot and Humid Climates," Energies, MDPI, vol. 14(21), pages 1-17, October.
    11. Marco Vriens & Nathan Bosch & Chad Vidden & Jason Talwar, 2022. "Prediction and profitability in market segmentation typing tools," Journal of Marketing Analytics, Palgrave Macmillan, vol. 10(4), pages 360-389, December.
    12. Andrews, Abigail & Jain, Rishee K., 2022. "Beyond Energy Efficiency: A clustering approach to embed demand flexibility into building energy benchmarking," Applied Energy, Elsevier, vol. 327(C).
    13. Zhang, Chaobo & Li, Junyang & Zhao, Yang & Li, Tingting & Chen, Qi & Zhang, Xuejun & Qiu, Weikang, 2021. "Problem of data imbalance in building energy load prediction: Concept, influence, and solution," Applied Energy, Elsevier, vol. 297(C).
    14. Branden M. Deiss & Mallori Herishko & Lauren Wright & Michelle Maliborska & J. Patrick Abulencia, 2021. "Analysis of Energy Consumption in Commercial and Residential Buildings in New York City before and during the COVID-19 Pandemic," Sustainability, MDPI, vol. 13(21), pages 1-14, October.
    15. Jin, Xiaoyu & Xiao, Fu & Zhang, Chong & Chen, Zhijie, 2022. "Semi-supervised learning based framework for urban level building electricity consumption prediction," Applied Energy, Elsevier, vol. 328(C).
    16. Jinping Hu, 2023. "Customer feature selection from high-dimensional bank direct marketing data for uplift modeling," Journal of Marketing Analytics, Palgrave Macmillan, vol. 11(2), pages 160-171, June.
    17. Pritha Ghosh & Subrata Saha & Shamindra Nath Sanyal & Swati Mukherjee, 2021. "Positioning of private label brands of men’s apparel against national brands," Journal of Marketing Analytics, Palgrave Macmillan, vol. 9(3), pages 210-227, September.
