
Leveraging textual information for social media news categorization and sentiment analysis

Authors

  • Mahmudul Hasan
  • Tanver Ahmed
  • Md Rashedul Islam
  • Md Palash Uddin

Abstract

The rise of social media has changed how people view connections. Machine Learning (ML)-based sentiment analysis and news categorization help people understand emotions and access news. However, most studies focus on complex models that require heavy computational resources and have slow inference times, which makes deployment difficult in resource-limited environments. In this paper, we process both structured and unstructured data and use the TextBlob scheme to determine the polarity, and hence the sentiment, of news headlines. We propose a Stochastic Gradient Descent (SGD)-based Ridge classifier (RC), referred to as SGDR, and blend it with an advanced string processing technique to classify news articles effectively. Additionally, we evaluate existing supervised and unsupervised ML algorithms to gauge the effectiveness of our SGDR classifier. The scalability and generalization capability of SGD, together with the L2 regularization that RCs use to handle overfitting and balance bias and variance, give the proposed SGDR better classification capability. Experimental results show that our string processing pipeline significantly boosts the performance of all ML models. Notably, our ensemble SGDR classifier surpasses all state-of-the-art ML algorithms considered, achieving 98.12% accuracy. McNemar’s significance tests show that the improvement of our SGDR classifier is significant at the 1% level over K-Nearest Neighbor, Decision Tree, and AdaBoost and at the 5% level over the other algorithms. These findings underscore the superior proficiency of linear models in news categorization compared to tree-based and nonlinear counterparts. This study contributes insights into the efficacy of the proposed methodology and its potential for news categorization and sentiment analysis.
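
The idea of a ridge classifier trained with SGD can be illustrated with a minimal sketch. This is not the authors' implementation: the toy headlines, the tokenizer (a stand-in for the paper's string processing pipeline), the two hypothetical categories, and the hyperparameters alpha, lr, and epochs are all assumptions chosen for illustration. The sketch encodes the two classes as +1 and -1, minimizes squared error plus an L2 penalty one sample at a time, and classifies by the sign of the linear score.

```python
import random

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def build_vocab(texts):
    """Map each distinct token to a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_sgd_ridge(X, y, alpha=0.01, lr=0.05, epochs=200, seed=0):
    """SGD on 0.5*(w.x + b - y)^2 + 0.5*alpha*||w||^2, with y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
            # Gradient step: squared-error term plus L2 (ridge) shrinkage on w.
            w = [wj - lr * (err * xj + alpha * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err  # the bias is left unregularized
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy data: +1 = sports, -1 = business.
data = [
    ("Team wins championship final", 1),
    ("Striker scores late goal", 1),
    ("Coach praises team after match", 1),
    ("Stocks fall as markets slide", -1),
    ("Bank raises interest rates", -1),
    ("Markets rally on bank earnings", -1),
]
vocab = build_vocab(text for text, _ in data)
X = [featurize(text, vocab) for text, _ in data]
y = [label for _, label in data]
w, b = train_sgd_ridge(X, y)

for headline in ("Team scores goal", "Bank stocks slide"):
    print(headline, "->", predict(w, b, featurize(headline, vocab)))
```

The L2 term shrinks every weight a little at each step, which is what keeps a linear model from overfitting when there are more vocabulary terms than training headlines, as in this toy setting.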

Suggested Citation

  • Mahmudul Hasan & Tanver Ahmed & Md Rashedul Islam & Md Palash Uddin, 2024. "Leveraging textual information for social media news categorization and sentiment analysis," PLOS ONE, Public Library of Science, vol. 19(7), pages 1-28, July.
  • Handle: RePEc:plo:pone00:0307027
    DOI: 10.1371/journal.pone.0307027

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0307027
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0307027&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mark Lokanan, 2023. "The morality and tax avoidance: A sentiment and position taking analysis," PLOS ONE, Public Library of Science, vol. 18(7), pages 1-33, July.
    2. Shuyue Huang & Lena Jingen Liang & Hwansuk Chris Choi, 2022. "How We Failed in Context: A Text-Mining Approach to Understanding Hotel Service Failures," Sustainability, MDPI, vol. 14(5), pages 1-18, February.
    3. Damiano De Marchi & Rudy Becarelli & Leonardo Di Sarli, 2022. "Tourism Sustainability Index: Measuring Tourism Sustainability Based on the ETIS Toolkit, by Exploring Tourist Satisfaction via Sentiment Analysis," Sustainability, MDPI, vol. 14(13), pages 1-18, July.
    4. Hui Yuan & Wei Xu & Qian Li & Raymond Lau, 2018. "Topic sentiment mining for sales performance prediction in e-commerce," Annals of Operations Research, Springer, vol. 270(1), pages 553-576, November.
    5. Yong Shi & Luyao Zhu & Wei Li & Kun Guo & Yuanchun Zheng, 2019. "Survey on Classic and Latest Textual Sentiment Analysis Articles and Techniques," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(04), pages 1243-1287, July.
    6. Yucel, Ahmet & Dag, Ali & Oztekin, Asil & Carpenter, Mark, 2022. "A novel text analytic methodology for classification of product and service reviews," Journal of Business Research, Elsevier, vol. 151(C), pages 287-297.
    7. Brahami Menaouer & Abdeldjouad Fatma Zahra & Sabri Mohammed, 2022. "Multi-Class Sentiment Classification for Healthcare Tweets Using Supervised Learning Techniques," International Journal of Service Science, Management, Engineering, and Technology (IJSSMET), IGI Global, vol. 13(1), pages 1-23, January.
    8. Shivendra Kumar & C. Ravindranath Chowdary, 2022. "Semantic model to extract tips from hotel reviews," Electronic Commerce Research, Springer, vol. 22(4), pages 1059-1077, December.
    9. F. Schweitzer & D. Garcia, 2010. "An agent-based model of collective emotions in online communities," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 77(4), pages 533-545, October.
    10. Tidor-Vlad Pricope, 2021. "Deep Reinforcement Learning in Quantitative Algorithmic Trading: A Review," Papers 2106.00123, arXiv.org.
    11. Yaxin Bi, 2022. "Sentiment classification in social media data by combining triplet belief functions," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 968-991, July.
    12. Lima, Ana Carolina E.S. & de Castro, Leandro Nunes & Corchado, Juan M., 2015. "A polarity analysis framework for Twitter messages," Applied Mathematics and Computation, Elsevier, vol. 270(C), pages 756-767.
    13. Gang Wang & Daqing Zheng & Shanlin Yang & Jian Ma, 2018. "FCE-SVM: a new cluster based ensemble method for opinion mining from social media," Information Systems and e-Business Management, Springer, vol. 16(4), pages 721-742, November.
    14. Youngseok Choi & Habin Lee, 2017. "Data properties and the performance of sentiment classification for electronic commerce applications," Information Systems Frontiers, Springer, vol. 19(5), pages 993-1012, October.
    15. Nan Jing & Tao Jiang & Juan Du & Vijayan Sugumaran, 2018. "Personalized recommendation based on customer preference mining and sentiment assessment from a Chinese e-commerce website," Electronic Commerce Research, Springer, vol. 18(1), pages 159-179, March.
    16. Xiangfeng Luo & Yawen Yi, 2019. "Topic-Specific Emotion Mining Model for Online Comments," Future Internet, MDPI, vol. 11(3), pages 1-18, March.
    17. Chen, Long-Sheng & Liu, Cheng-Hsiang & Chiu, Hui-Ju, 2011. "A neural network based approach for sentiment classification in the blogosphere," Journal of Informetrics, Elsevier, vol. 5(2), pages 313-322.
    18. Barış-Tüzemen Özge & Tüzemen Samet & Çelik Ali Kemal, 2023. "Sentiment analysis of reviews on cappadocia: The land of beautiful horses in the eyes of tourists," European Journal of Tourism, Hospitality and Recreation, Sciendo, vol. 13(2), pages 188-197, December.
    19. Fan, Zhi-Ping & Che, Yu-Jie & Chen, Zhen-Yu, 2017. "Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis," Journal of Business Research, Elsevier, vol. 74(C), pages 90-100.
    20. A. Geethapriya & S. Valli, 2021. "An Enhanced Approach to Map Domain-Specific Words in Cross-Domain Sentiment Analysis," Information Systems Frontiers, Springer, vol. 23(3), pages 791-805, June.
