
Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models

Authors

  • Ali Alhazmi
  • Rohana Mahmud
  • Norisma Idris
  • Mohamed Elhag Mohamed Abo
  • Christopher Ifeanyi Eke

Abstract

Technological developments over the past few decades have changed the way people communicate, with social media platforms and blogs becoming vital channels for international conversation. Although social media platforms actively suppress hate speech, it remains a concern that requires continuous detection and monitoring. Considerable effort has gone into detecting hate speech in English-language social media content, but Arabic poses particular difficulties because of its many dialects and linguistic nuances. The widespread practice of "code-mixing," in which users blend several languages within the same text, adds a further layer of complexity. To address this research gap, the study examines how well machine learning models with various features can detect hate speech, especially in Arabic tweets that feature code-mixing. The objective is to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic hate speech and code-mixed hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of those models. The analysis showed that the TF-IDF feature, used with the SGD model, attained the highest accuracy, reaching 98.21%. These results were then compared with the outcomes of three existing studies, and the proposed method outperformed them. The study therefore carries practical implications and serves as a foundational exploration of automated hate speech detection in text.
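
As a rough illustration of the pipeline the abstract describes (pre-processing, feature extraction, classification, evaluation), the following Python sketch builds TF-IDF features and fits a linear classifier with stochastic gradient descent (SGD). It is not the authors' implementation: the abstract does not state the tokenization, loss function, or hyperparameters, so whitespace tokenization, the hinge loss, the omitted bias term, and all parameter values are assumptions, as are the function names. The toy data is synthetic, with neutral "abuse_*" placeholder tokens standing in for abusive terms, and Arabic and Latin script mixed only to mimic code-mixing.

```python
import math
import random
from collections import Counter


def fit_idf(token_docs):
    """Smoothed inverse document frequency per term, learned from training docs."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(vec, weights):
    """Dot product between a sparse vector and the weight table."""
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_sgd(vectors, labels, epochs=20, lr=0.1, reg=1e-4, seed=0):
    """Linear classifier fitted by SGD on the hinge loss (labels are +1 / -1).

    Assumption: no bias term is used, to keep the sketch short.
    """
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            violated = y * score(x, weights) < 1.0
            for t in weights:  # L2 shrinkage step
                weights[t] *= 1.0 - lr * reg
            if violated:  # hinge-loss gradient step
                for t, v in x.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * v
    return weights


def predict(tokens, idf, weights):
    """Return +1 (flagged) or -1 (not flagged) for a tokenized text."""
    return 1 if score(tfidf_vector(tokens, idf), weights) > 0 else -1


# Synthetic placeholder data (NOT from the paper).
train = [
    ("abuse_ar_1 انت abuse_en_1", 1),
    ("abuse_en_2 هذا abuse_ar_2", 1),
    ("abuse_ar_1 abuse_en_2 now", 1),
    ("مرحبا friend شكرا", -1),
    ("good morning صباح الخير", -1),
    ("شكرا thanks friend", -1),
]
held_out = [("abuse_en_1 abuse_ar_2", 1), ("مرحبا thanks", -1)]

# Pre-processing here is only whitespace tokenization (an assumption).
train_tokens = [text.split() for text, _ in train]
idf = fit_idf(train_tokens)
weights = train_sgd([tfidf_vector(d, idf) for d in train_tokens],
                    [label for _, label in train])

# Evaluation: accuracy on the held-out toy examples.
correct = sum(predict(text.split(), idf, weights) == label for text, label in held_out)
accuracy = correct / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The accuracy printed here reflects only the toy data and says nothing about the 98.21% reported in the study. In practice, a library implementation of TF-IDF and SGD classification would normally replace these hand-written functions.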

Suggested Citation

  • Ali Alhazmi & Rohana Mahmud & Norisma Idris & Mohamed Elhag Mohamed Abo & Christopher Ifeanyi Eke, 2024. "Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models," PLOS ONE, Public Library of Science, vol. 19(7), pages 1-24, July.
  • Handle: RePEc:plo:pone00:0305657
    DOI: 10.1371/journal.pone.0305657

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305657
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0305657&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jui-Sheng Chou & Dinh-Nhat Truong & Chih-Fong Tsai, 2021. "Solving Regression Problems with Intelligent Machine Learner for Engineering Informatics," Mathematics, MDPI, vol. 9(6), pages 1-25, March.
    2. Sevvandi Kandanaarachchi & Mario A Munoz & Rob J Hyndman & Kate Smith-Miles, 2018. "On normalization and algorithm selection for unsupervised outlier detection," Monash Econometrics and Business Statistics Working Papers 16/18, Monash University, Department of Econometrics and Business Statistics.
    3. Aktaş, Dilay & Lokman, Banu & İnkaya, Tülin & Dejaegere, Gilles, 2024. "Cluster ensemble selection and consensus clustering: A multi-objective optimization approach," European Journal of Operational Research, Elsevier, vol. 314(3), pages 1065-1077.
    4. Kamran Zolfi, 2023. "Gold rush optimizer: A new population-based metaheuristic algorithm," Operations Research and Decisions, Wroclaw University of Science and Technology, Faculty of Management, vol. 33(1), pages 113-150.
    5. William G. Macready & David H. Wolpert, 1995. "What Makes an Optimization Problem Hard?," Working Papers 95-05-046, Santa Fe Institute.
    6. Y.C. Ho & D.L. Pepyne, 2002. "Simple Explanation of the No-Free-Lunch Theorem and Its Implications," Journal of Optimization Theory and Applications, Springer, vol. 115(3), pages 549-570, December.
    7. Murtadha Al-Kaabi & Virgil Dumbrava & Mircea Eremia, 2022. "A Slime Mould Algorithm Programming for Solving Single and Multi-Objective Optimal Power Flow Problems with Pareto Front Approach: A Case Study of the Iraqi Super Grid High Voltage," Energies, MDPI, vol. 15(20), pages 1-33, October.
    8. Galioto, Francesco & Battilani, Adriano, 2021. "Agro-economic simulation for day by day irrigation scheduling optimisation," Agricultural Water Management, Elsevier, vol. 248(C).
    9. Daniyal Alghazzawi & Atika Qazi & Javaria Qazi & Khulla Naseer & Muhammad Zeeshan & Mohamed Elhag Mohamed Abo & Najmul Hasan & Shiza Qazi & Kiran Naz & Samrat Kumar Dey & Shuiqing Yang, 2021. "Prediction of the Infectious Outbreak COVID-19 and Prevalence of Anxiety: Global Evidence," Sustainability, MDPI, vol. 13(20), pages 1-16, October.
    10. Abdel-Rahman Hedar & Emad Mabrouk & Masao Fukushima, 2011. "Tabu Programming: A New Problem Solver Through Adaptive Memory Programming Over Tree Data Structures," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 10(02), pages 373-406.
    11. Murtadha Al-Kaabi & Virgil Dumbrava & Mircea Eremia, 2024. "Multi Criteria Frameworks Using New Meta-Heuristic Optimization Techniques for Solving Multi-Objective Optimal Power Flow Problems," Energies, MDPI, vol. 17(9), pages 1-39, May.
    12. Agarwal, Anurag & Colak, Selcuk & Eryarsoy, Enes, 2006. "Improvement heuristic for the flow-shop scheduling problem: An adaptive-learning approach," European Journal of Operational Research, Elsevier, vol. 169(3), pages 801-815, March.
    13. Murtadha Al-Kaabi & Virgil Dumbrava & Mircea Eremia, 2022. "Single and Multi-Objective Optimal Power Flow Based on Hunger Games Search with Pareto Concept Optimization," Energies, MDPI, vol. 15(22), pages 1-31, November.
    14. Muangkote, Nipotepat & Sunat, Khamron & Chiewchanwattana, Sirapat & Kaiwinit, Sirilak, 2019. "An advanced onlooker-ranking-based adaptive differential evolution to extract the parameters of solar cell models," Renewable Energy, Elsevier, vol. 134(C), pages 1129-1147.
    15. William G. Macready & David H. Wolpert, 1996. "On 2-Armed Gaussian Bandits and Optimization," Working Papers 96-03-009, Santa Fe Institute.
    16. Sharifian, Yeganeh & Abdi, Hamdi, 2023. "Solving multi-area economic dispatch problem using hybrid exchange market algorithm with grasshopper optimization algorithm," Energy, Elsevier, vol. 267(C).
    17. Díaz–Pachón, Daniel Andrés & Sáenz, Juan Pablo & Rao, J. Sunil, 2020. "Hypothesis testing with active information," Statistics & Probability Letters, Elsevier, vol. 161(C).
    18. Wang, Sinan & Zhao, Fuquan & Liu, Zongwei & Hao, Han, 2017. "Heuristic method for automakers' technological strategy making towards fuel economy regulations based on genetic algorithm: A China's case under corporate average fuel consumption regulation," Applied Energy, Elsevier, vol. 204(C), pages 544-559.
    19. Kimbrough, Steven Orla & Koehler, Gary J. & Lu, Ming & Wood, David Harlan, 2008. "On a Feasible-Infeasible Two-Population (FI-2Pop) genetic algorithm for constrained optimization: Distance tracing and no free lunch," European Journal of Operational Research, Elsevier, vol. 190(2), pages 310-327, October.
    20. Schirmer, Andreas & Riesenberg, Sven, 1998. "Class-based control schemes for parameterized project scheduling heuristics," Manuskripte aus den Instituten für Betriebswirtschaftslehre der Universität Kiel 471, Christian-Albrechts-Universität zu Kiel, Institut für Betriebswirtschaftslehre.
