Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i22p4173-d966292.html

Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering

Author

Listed:
  • Nebojsa Bacanin

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Miodrag Zivkovic

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Catalin Stoean

    (Human Language Technologies Center, Faculty of Mathematics and Computer Science, University of Bucharest, Academiei 14, 010014 Bucharest, Romania;
    Department of Computer Science, Faculty of Sciences, University of Craiova, A.I.Cuza, 13, 200585 Craiova, Romania)

  • Milos Antonijevic

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Stefana Janicijevic

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Marko Sarac

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Ivana Strumberger

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

Abstract

Spam is a genuine irritation for email users, since it often disturbs them during their work or free time. Machine learning approaches are commonly used as the engine of spam detection solutions, as they are efficient and usually exhibit a high degree of classification accuracy. Nevertheless, legitimate messages are sometimes labeled as spam and, more often, some spam emails reach the inbox as legitimate ones. This manuscript proposes a novel email spam detection approach that combines machine learning models with an enhanced sine cosine swarm intelligence algorithm to counter the deficiencies of the existing techniques. The enhanced sine cosine algorithm was used to train logistic regression and to tune XGBoost models as part of the hybrid machine learning-metaheuristics framework. The developed framework has been validated on two public high-dimensional spam benchmark datasets (CSDMC2010 and TurkishEmail), and the extensive experiments conducted have shown that the model successfully deals with high-dimensional data. The comparative analysis with other cutting-edge spam detection models, also based on metaheuristics, has shown that the proposed hybrid method obtains superior performance in terms of accuracy, precision, recall, F1 score, and other relevant classification metrics. Additionally, the empirically established superiority of the proposed method is validated using rigorous statistical tests.
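
As a rough illustration of the idea in the abstract (a sine cosine metaheuristic used to train logistic regression), the sketch below applies the basic sine cosine position update to the weights of a small logistic regression model. It is not the authors' enhanced variant and does not include the XGBoost tuning step. The synthetic two-cluster data stands in for real spam features, and every name and parameter value here (sca_minimize, log_loss, n_agents, bound, and so on) is an assumption made for the example.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme weight values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def log_loss(weights, X, y):
    """Mean cross-entropy of logistic regression; the last weight is the bias."""
    p = sigmoid(X @ weights[:-1] + weights[-1])
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def sca_minimize(objective, dim, n_agents=30, n_iter=200, bound=5.0, a=2.0, seed=0):
    """Basic sine cosine algorithm (illustrative; not the paper's enhanced version)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-bound, bound, size=(n_agents, dim))
    fitness = np.array([objective(p) for p in pop])
    best_idx = int(np.argmin(fitness))
    best, best_fit = pop[best_idx].copy(), float(fitness[best_idx])
    for t in range(n_iter):
        r1 = a - t * a / n_iter                      # step size shrinks linearly to 0
        r2 = rng.uniform(0.0, 2.0 * np.pi, size=pop.shape)
        r3 = rng.uniform(0.0, 2.0, size=pop.shape)
        r4 = rng.uniform(0.0, 1.0, size=pop.shape)
        dist = np.abs(r3 * best - pop)               # distance to the best solution so far
        step = np.where(r4 < 0.5, r1 * np.sin(r2) * dist, r1 * np.cos(r2) * dist)
        pop = np.clip(pop + step, -bound, bound)
        fitness = np.array([objective(p) for p in pop])
        idx = int(np.argmin(fitness))
        if fitness[idx] < best_fit:                  # keep the best solution seen so far
            best, best_fit = pop[idx].copy(), float(fitness[idx])
    return best, best_fit

# Synthetic stand-in data: two Gaussian clusters (label 1 = "spam", 0 = "not spam").
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(-1.5, 1.0, size=(100, 2)),
               data_rng.normal(1.5, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

best_w, best_loss = sca_minimize(lambda w: log_loss(w, X, y), dim=X.shape[1] + 1)
pred = (sigmoid(X @ best_w[:-1] + best_w[-1]) >= 0.5).astype(float)
accuracy = float(np.mean(pred == y))
print(f"loss={best_loss:.4f} accuracy={accuracy:.3f}")
```

The objective here is the training log-loss; a fuller experiment would hold out validation data and, for the tuning use case, replace the weight vector with a vector of model hyperparameters.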

Suggested Citation

  • Nebojsa Bacanin & Miodrag Zivkovic & Catalin Stoean & Milos Antonijevic & Stefana Janicijevic & Marko Sarac & Ivana Strumberger, 2022. "Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering," Mathematics, MDPI, vol. 10(22), pages 1-31, November.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:22:p:4173-:d:966292

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/22/4173/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/22/4173/
    Download Restriction: no

    References listed on IDEAS

    1. Rajendra Akerkar, 2019. "Artificial Intelligence for Business," SpringerBriefs in Business, Springer, number 978-3-319-97436-1, October.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jani Dugonik & Mirjam Sepesy Maučec & Domen Verber & Janez Brest, 2023. "Reduction of Neural Machine Translation Failures by Incorporating Statistical Machine Translation," Mathematics, MDPI, vol. 11(11), pages 1-22, May.
    2. Dušan S. Radivojević & Ivan M. Lazović & Nikola S. Mirkov & Uzahir R. Ramadani & Dušan P. Nikezić, 2023. "A Comparative Evaluation of Self-Attention Mechanism with ConvLSTM Model for Global Aerosol Time Series Forecasting," Mathematics, MDPI, vol. 11(7), pages 1-13, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shrutika Mishra & A. R. Tripathi, 2021. "AI business model: an integrative business approach," Journal of Innovation and Entrepreneurship, Springer, vol. 10(1), pages 1-21, December.
    2. Zhisheng Chen, 2023. "Artificial Intelligence-Virtual Trainer: Innovative Didactics Aimed at Personalized Training Needs," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 14(2), pages 2007-2025, June.
    3. Justyna Łapińska & Iwona Escher & Joanna Górka & Agata Sudolska & Paweł Brzustewicz, 2021. "Employees’ Trust in Artificial Intelligence in Companies: The Case of Energy and Chemical Industries in Poland," Energies, MDPI, vol. 14(7), pages 1-20, April.
    4. Gerda Zigiene & Egidijus Rybakovas & Rimgaile Vaitkiene, 2020. "Challenges in Applying Artificial Intelligence for Supply Chain Risk Management," International Journal of Economics & Business Administration (IJEBA), International Journal of Economics & Business Administration (IJEBA), vol. 0(4), pages 299-318.
    5. Steve J. Bickley & Alison Macintyre & Benno Torgler, 2021. "Artificial Intelligence and Big Data in Sustainable Entrepreneurship," CREMA Working Paper Series 2021-11, Center for Research in Economics, Management and the Arts (CREMA).
    6. Mónica Santana & Mirta Díaz-Fernández, 2023. "Competencies for the artificial intelligence age: visualisation of the state of the art and future perspectives," Review of Managerial Science, Springer, vol. 17(6), pages 1971-2004, August.
    7. Omar H. Fares & Irfan Butt & Seung Hwan Mark Lee, 2023. "Utilization of artificial intelligence in the banking sector: a systematic literature review," Journal of Financial Services Marketing, Palgrave Macmillan, vol. 28(4), pages 835-852, December.
    8. Malik, Ashish & De Silva, M.T. Thedushika & Budhwar, Pawan & Srikanth, N.R., 2021. "Elevating talents' experience through innovative artificial intelligence-mediated knowledge sharing: Evidence from an IT-multinational enterprise," Journal of International Management, Elsevier, vol. 27(4).
    9. Neştian Andrei Ștefan & Tiţă SilviuMihail & Guţă Alexandra Luciana, 2020. "Incorporating artificial intelligence in knowledge creation processes in organizations," Proceedings of the International Conference on Business Excellence, Sciendo, vol. 14(1), pages 597-606, July.
    10. Gerda Žigienė & Egidijus Rybakovas & Rimgailė Vaitkienė & Vaidas Gaidelys, 2022. "Setting the Grounds for the Transition from Business Analytics to Artificial Intelligence in Solving Supply Chain Risk," Sustainability, MDPI, vol. 14(19), pages 1-23, September.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.