
The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

Authors
  • Saqib Alam

    (Dalian University of Technology)

  • Nianmin Yao

    (Dalian University of Technology)

Abstract

Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every minute, much of it unstructured, and such unstructured data is now a major topic of interest for researchers. A great deal of research is currently under way in text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and studied their impact on sentiment classification accuracy using three well-known machine learning classifiers: Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of the three algorithms before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after preprocessing. A slight improvement in the accuracy of the SVM algorithm was observed after preprocessing. Interestingly, no improvement in accuracy was seen for the MaxE algorithm. Our work is a comparative study, and our results showed that, after the preprocessing steps were applied, the accuracy of the NB algorithm was significantly higher than that of the other machine learning algorithms, followed by MaxE and SVM. This research shows that text preprocessing affects the accuracy of machine learning algorithms, and in particular that the accuracy of the NB algorithm improves significantly after text preprocessing.
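The abstract does not list the specific preprocessing steps used. As a rough illustration of the before/after comparison it describes, here is a minimal, hypothetical Python sketch for one of the three classifiers (a multinomial Naïve Bayes with Laplace smoothing), assuming common steps: lowercasing, stripping non-letter characters, and stopword removal. The stopword list, the toy `TRAIN`/`TEST` sentences, and all function names are illustrative assumptions, not the authors' data, steps, or code.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the paper's list is not given.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}

def raw_tokens(text):
    """Baseline tokenizer: plain whitespace split, no preprocessing."""
    return text.split()

def preprocess(text):
    """Assumed pipeline: lowercase, replace non-letters with spaces, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, tokens):
        v = len(self.vocab)

        def log_score(c):
            s = self.log_prior[c]
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=log_score)

def accuracy(model, docs, labels):
    correct = sum(model.predict_one(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)

# Toy, made-up data (not from the paper).
TRAIN = [
    ("I loved this movie!", "pos"),
    ("Great acting, great plot.", "pos"),
    ("This was terrible.", "neg"),
    ("Awful, boring movie.", "neg"),
]
TEST = [("Loved it, GREAT!", "pos"), ("Boring and terrible", "neg")]

def evaluate(tokenize):
    """Train and test the same classifier with a given tokenizer."""
    model = MultinomialNB().fit([tokenize(t) for t, _ in TRAIN], [y for _, y in TRAIN])
    return accuracy(model, [tokenize(t) for t, _ in TEST], [y for _, y in TEST])

raw_acc = evaluate(raw_tokens)  # 0.5 on this toy data
pre_acc = evaluate(preprocess)  # 1.0 on this toy data
print(f"raw: {raw_acc:.2f}  preprocessed: {pre_acc:.2f}")
```

On this toy data, the raw tokens ("Loved", "GREAT!", "terrible.") never match the training vocabulary because of case and punctuation, so the classifier falls back to the class priors. Normalizing the text lets the same words be counted together. A real comparison would hold the data split fixed and vary only the tokenizer, as `evaluate` does here.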

Suggested Citation

  • Saqib Alam & Nianmin Yao, 2019. "The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis," Computational and Mathematical Organization Theory, Springer, vol. 25(3), pages 319-335, September.
  • Handle: RePEc:spr:comaot:v:25:y:2019:i:3:d:10.1007_s10588-018-9266-8
    DOI: 10.1007/s10588-018-9266-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10588-018-9266-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10588-018-9266-8?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.


    Cited by:

    1. Intan Nurma Yulita & Victor Wijaya & Rudi Rosadi & Indra Sarathan & Yusa Djuyandi & Anton Satria Prabuwono, 2023. "Analysis of Government Policy Sentiment Regarding Vacation during the COVID-19 Pandemic Using the Bidirectional Encoder Representation from Transformers (BERT)," Data, MDPI, vol. 8(3), pages 1-17, February.
    2. Ewen Hokijuliandy & Herlina Napitupulu & Firdaniza, 2023. "Application of SVM and Chi-Square Feature Selection for Sentiment Analysis of Indonesia’s National Health Insurance Mobile Application," Mathematics, MDPI, vol. 11(17), pages 1-21, September.
    3. Zaher Salah & Esraa Abu Elsoud, 2023. "Enhancing Network Security: A Machine Learning-Based Approach for Detecting and Mitigating Krack and Kr00k Attacks in IEEE 802.11," Future Internet, MDPI, vol. 15(8), pages 1-21, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:comaot:v:25:y:2019:i:3:d:10.1007_s10588-018-9266-8. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.