
Machine learning and deep learning-based approach to categorize Bengali comments on social networks using fused dataset

Authors

  • Khandaker Mohammad Mohi Uddin
  • Hasibul Hamim
  • Mst Nishat Tasnim Mim
  • Arnisha Akhter
  • Md Ashraf Uddin

Abstract

With the growth of the modern web and the rapid adoption of social media platforms such as YouTube, Twitter, and Facebook, people increasingly turn to online spaces when dealing with highly personal problems. The far-reaching consequences of online harassment call for immediate preventive measures, based on early detection, to safeguard psychological well-being and academic achievement. This paper aims to eliminate online harassment and create a criticism-free online environment. We evaluate a large number of Bengali comments using a variety of features. For the machine learning (ML) models, the cleaned data are processed with natural language processing techniques, namely term frequency-inverse document frequency (TF-IDF) weighting with a count vectorizer. For the deep learning (DL) models, we use tokenization with padding. Using mathematical visualization and natural language processing, online bullying can be detected quickly. The ML models employed are Multi-layer Perceptron (MLP), K-Nearest Neighbors (K-NN), Extreme Gradient Boosting (XGBoost), Adaptive Boosting Classifier (AdaBoost), Logistic Regression Classifier (LR), Random Forest Classifier (RF), Bagging Classifier, Stochastic Gradient Descent (SGD), Voting Classifier, and Stacking. We extended the investigation to several DL architectures, implementing Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Convolutional-Long Short-Term Memory (C-LSTM), and Bidirectional Long Short-Term Memory (BiLSTM). A large amount of data is required to recognize harassing behavior accurately. To that end, we combined two datasets, producing 94,000 Bengali comments from different points of view.
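The two feature-preparation steps named above, TF-IDF weighting over term counts and tokenization with padding, can be sketched in plain Python. This is only an illustrative sketch, not the authors' pipeline: the whitespace tokenizer, the smoothed IDF variant, the right-padding convention, and the function names `tfidf_matrix`, `encode`, and `pad_sequences` are all assumptions, and the sample strings are made-up placeholders rather than comments from the dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization (an assumption; the abstract names no tokenizer).
    return text.split()


def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # One common smoothed inverse document frequency variant (assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # raw term counts, as a count vectorizer would give
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def pad_sequences(seqs, max_len, pad_id=0):
    """Truncate or right-pad each integer sequence to exactly max_len items."""
    return [s[:max_len] + [pad_id] * (max_len - len(s)) for s in seqs]


def encode(docs, max_len):
    """Map tokens to integer ids (0 is reserved for padding) and pad the sequences."""
    tokenized = [tokenize(d) for d in docs]
    index = {}
    for doc in tokenized:
        for t in doc:
            # Ids are assigned in order of first appearance, starting at 1.
            index.setdefault(t, len(index) + 1)
    return index, pad_sequences([[index[t] for t in doc] for doc in tokenized], max_len)


if __name__ == "__main__":
    sample = ["nice video", "bad bad comment", "nice comment"]  # placeholder text
    vocab, rows = tfidf_matrix(sample)
    print(vocab)
    print(rows)
    print(encode(sample, max_len=4)[1])
```

The TF-IDF rows would feed classical models such as LR or SGD, while the padded integer sequences have the fixed length that embedding-based networks such as a CNN or BiLSTM typically expect.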
Comparing the ML and DL models, we find that a hybrid model (MLP+SGD+LR) outperformed the others: on the multi-label classification task it achieved 99.34% accuracy, 99.34% precision, 99.33% recall, and a 99.34% F1 score. The binary classification model achieved 99.41% accuracy.
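The abstract does not say how the three models in the hybrid (MLP+SGD+LR) are combined, nor how the per-class scores are averaged. As a hedged illustration only, the sketch below assumes hard majority voting over the three models' predicted labels and macro-averaged precision, recall, and F1. The function names `majority_vote` and `macro_scores` and the integer labels are hypothetical, and the toy predictions are invented for the example; they are not results from the paper.

```python
from collections import Counter


def majority_vote(*model_preds):
    """Hard voting over per-model label lists; ties go to the earliest-listed model."""
    combined = []
    for votes in zip(*model_preds):
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first label, in model order, that reaches the top vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy predictions from three hypothetical models over four comments.
    mlp = [0, 1, 2, 1]
    sgd = [0, 1, 1, 1]
    lr = [0, 2, 2, 0]
    y_true = [0, 1, 2, 2]
    hybrid = majority_vote(mlp, sgd, lr)
    print(hybrid)
    print(macro_scores(y_true, hybrid))
```

Hard voting needs only each model's predicted label, which is why it is used in this sketch; a soft-voting or stacked combination would need class probabilities or a meta-learner and may be closer to what the paper actually does.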

Suggested Citation

  • Khandaker Mohammad Mohi Uddin & Hasibul Hamim & Mst Nishat Tasnim Mim & Arnisha Akhter & Md Ashraf Uddin, 2024. "Machine learning and deep learning-based approach to categorize Bengali comments on social networks using fused dataset," PLOS ONE, Public Library of Science, vol. 19(10), pages 1-35, October.
  • Handle: RePEc:plo:pone00:0308862
    DOI: 10.1371/journal.pone.0308862

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0308862
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0308862&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
    2. Purwoko Haryadi Santoso & Edi Istiyono & Haryanto & Wahyu Hidayatulloh, 2022. "Thematic Analysis of Indonesian Physics Education Research Literature Using Machine Learning," Data, MDPI, vol. 7(11), pages 1-41, October.
    3. Noailly, Joëlle & Nowzohour, Laura & van den Heuvel, Matthias & Pla, Ireneu, 2024. "Heard the news? Environmental policy and clean investments," Journal of Public Economics, Elsevier, vol. 238(C).
    4. Camilla Salvatore & Silvia Biffignandi & Annamaria Bianchi, 2022. "Corporate Social Responsibility Activities Through Twitter: From Topic Model Analysis to Indexes Measuring Communication Characteristics," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 164(3), pages 1217-1248, December.
    5. Jason Anastasopoulos & George J. Borjas & Gavin G. Cook & Michael Lachanski, 2018. "Job Vacancies, the Beveridge Curve, and Supply Shocks: The Frequency and Content of Help-Wanted Ads in Pre- and Post-Mariel Miami," NBER Working Papers 24580, National Bureau of Economic Research, Inc.
    6. Lorien Stice-Lawrence, 2022. "Practical issues to consider when working with big data," Review of Accounting Studies, Springer, vol. 27(3), pages 1117-1124, September.
    7. Tyler Andrew Scott & Nicola Ulibarri & Omar Perez Figueroa, 2020. "NEPA and National Trends in Federal Infrastructure Siting in the United States," Review of Policy Research, Policy Studies Organization, vol. 37(5), pages 605-633, September.
    8. Seraphine F. Maerz & Carsten Q. Schneider, 2020. "Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(2), pages 517-545, April.
    9. Iasmin Goes, 2023. "Examining the effect of IMF conditionality on natural resource policy," Economics and Politics, Wiley Blackwell, vol. 35(1), pages 227-285, March.
    10. Erkan Gunes & Christoffer Koch Florczak, 2025. "Replacing or enhancing the human coder? Multiclass classification of policy documents with large language models," Journal of Computational Social Science, Springer, vol. 8(2), pages 1-20, May.
    11. Latifi, Albina & Naboka-Krell, Viktoriia & Tillmann, Peter & Winker, Peter, 2024. "Fiscal policy in the Bundestag: Textual analysis and macroeconomic effects," European Economic Review, Elsevier, vol. 168(C).
    12. Yeomans, Michael, 2021. "A concrete example of construct construction in natural language," Organizational Behavior and Human Decision Processes, Elsevier, vol. 162(C), pages 81-94.
    13. Ai-Hsuan Chiang & Silvana Trimi & Tun-Chih Kou, 2024. "Critical Factors for Implementing Smart Manufacturing: A Supply Chain Perspective," Sustainability, MDPI, vol. 16(22), pages 1-21, November.
    14. Mykola Makhortykh & Ernesto León & Clara Christner & Maryna Sydorova & Aleksandra Urman & Silke Adam & Michaela Maier & Teresa Gil-Lopez, 2025. "Is a single model enough? The systematic comparison of computational approaches for detecting populist radical right content," Quality & Quantity: International Journal of Methodology, Springer, vol. 59(2), pages 1163-1207, April.
    15. Bastiaan Bruinsma & Moa Johansson, 2024. "Finding the structure of parliamentary motions in the Swedish Riksdag 1971–2015," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(4), pages 3275-3301, August.
    16. Justyna Klejdysz & Robin L. Lumsdaine, 2023. "Shifts in ECB Communication: A Textual Analysis of the Press Conference," International Journal of Central Banking, International Journal of Central Banking, vol. 19(2), pages 473-542, June.
    17. Jaehwan LIM & Asei ITO & Hongyong ZHANG, 2023. "Policy Agenda and Trajectory of the Xi Jinping Administration: Textual Evidence from 2012 to 2022," Policy Discussion Papers 23008, Research Institute of Economy, Trade and Industry (RIETI).
    18. Wang, Yan & Luo, Ting, 2023. "Politicizing for the idol: China’s idol fandom nationalism in pandemic," LSE Research Online Documents on Economics 117741, London School of Economics and Political Science, LSE Library.
    19. Camilla Salvatore, 2023. "Inference with non-probability samples and survey data integration: a science mapping study," METRON, Springer;Sapienza Università di Roma, vol. 81(1), pages 83-107, April.
    20. Jaeho Choi & Anoop Menon & Haris Tabakovic, 2021. "Using machine learning to revisit the diversification–performance relationship," Strategic Management Journal, Wiley Blackwell, vol. 42(9), pages 1632-1661, September.
