Sentiment Analysis for Thai Language in Hotel Domain Using Machine Learning Algorithms

Author

Listed:
  • Nattawat Khamphakdee
  • Pusadee Seresangtakul

Abstract

Sentiment analysis is one of the most widely used applications of Natural Language Processing (NLP); it classifies the polarity of reviews expressed at the aspect, sentence or document level. Many businesses and organizations use this technique to improve production, as well as employee and service efficiency. However, the user reviews in our study were unstructured and contained spelling errors, which makes classification difficult for both humans and machines. To address this problem, supervised Machine Learning (ML) algorithms can be applied to the data, classifying polarity into a positive, negative or neutral class. In this research, we compared nine ML algorithms to determine the most suitable one for sentiment polarity classification of customer reviews in Thai, which is a low-resource language. The dataset was collected manually from two online agencies (Agoda.com and Booking.com) utilizing a special Thai language. We employed 11 preprocessing steps to clean the data and handle the large amount of noise. Next, the Delta TF-IDF, TF-IDF, N-Gram, and Word2Vec techniques were applied to convert the text reviews into vectors, which were then processed with the different ML algorithms to classify sentiment polarity and allow accurate comparison. All ML algorithms were evaluated with ten-fold cross-validation, comparing the values of recall, precision, F1-score and accuracy. The experimental results show that the Support Vector Machine (SVM) using the Delta TF-IDF technique was the best ML algorithm for polarity classification of hotel reviews in the Thai language, with the highest accuracy of 89.96%. The results of this research can be applied as a tool for small and medium-sized enterprises performing sentiment analysis of the Thai language in the hotel domain.
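
As an illustration of the Delta TF-IDF weighting named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes the commonly cited two-class formulation, in which a term's count in a document is multiplied by the difference between its log inverse document frequency in the positive and in the negative training sets. The add-one smoothing, the function names, and the toy English tokens are all assumptions made for this example; real Thai reviews would first need word segmentation and the preprocessing steps mentioned above, and the abstract does not say how the neutral class is handled.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many tokenized documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def delta_tfidf(tokens, pos_df, neg_df, n_pos, n_neg):
    """Return {term: weight} for one tokenized document.

    weight = tf * (log2((|P| + 1) / (P_t + 1)) - log2((|N| + 1) / (N_t + 1)))
    The +1 terms are add-one smoothing, an assumption of this sketch that
    avoids division by zero for terms unseen in one class.
    """
    weights = {}
    for term, count in Counter(tokens).items():
        pos_idf = math.log2((n_pos + 1) / (pos_df[term] + 1))
        neg_idf = math.log2((n_neg + 1) / (neg_df[term] + 1))
        weights[term] = count * (pos_idf - neg_idf)
    return weights


# Toy, pre-tokenized reviews (hypothetical placeholders, not from the dataset).
positive_reviews = [
    ["clean", "room", "friendly", "staff"],
    ["clean", "pool", "great", "staff"],
]
negative_reviews = [
    ["dirty", "room", "rude", "staff"],
    ["dirty", "noisy", "room"],
]

pos_df = document_frequencies(positive_reviews)
neg_df = document_frequencies(negative_reviews)

weights = delta_tfidf(
    ["clean", "room", "dirty", "staff"],
    pos_df,
    neg_df,
    n_pos=len(positive_reviews),
    n_neg=len(negative_reviews),
)

for term, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{term}: {weight:+.3f}")
```

Under this sign convention, terms concentrated in the positive set receive negative weights and terms concentrated in the negative set receive positive weights, while terms spread evenly across both classes stay near zero. The resulting vectors could then be passed to a classifier such as an SVM and scored with ten-fold cross-validation, as the abstract describes.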

Suggested Citation

  • Nattawat Khamphakdee & Pusadee Seresangtakul, 2021. "Sentiment Analysis for Thai Language in Hotel Domain Using Machine Learning Algorithms," Acta Informatica Pragensia, Prague University of Economics and Business, vol. 2021(2), pages 155-171.
  • Handle: RePEc:prg:jnlaip:v:2021:y:2021:i:2:id:155:p:155-171
    DOI: 10.18267/j.aip.155

    Download full text from publisher

    File URL: http://aip.vse.cz/doi/10.18267/j.aip.155.html
    Download Restriction: free of charge

    File URL: http://aip.vse.cz/doi/10.18267/j.aip.155.pdf
    Download Restriction: free of charge

    File URL: https://libkey.io/10.18267/j.aip.155?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

