Authors
- Sawitree Srianan (Prince of Songkla University)
- Aziz Nanthaamornphong (Prince of Songkla University)
- Chayanon Phucharoen (Prince of Songkla University)
Abstract
Tourism sentiment analysis faces substantial challenges due to class imbalance and the complex linguistic features of user-generated content. This study systematically compares eight sentiment classification models, spanning traditional machine learning (naïve Bayes, support vector machines, logistic regression), deep learning (convolutional neural networks, long short-term memory networks [LSTMs], gated recurrent units [GRUs]), and transformer-based architectures (RoBERTa in two configurations: pretrained and fine-tuned), using a dataset of 505,980 TripAdvisor reviews. We evaluate model performance under imbalanced class conditions and examine the effectiveness of three oversampling techniques—SMOTE, ADASYN, and RandomOverSampler—in mitigating class bias. The results reveal significant performance disparities across architectures. Deep learning models, particularly LSTM (91.06% accuracy, Cohen’s kappa = 0.6846) and GRU (90.82% accuracy, Cohen’s kappa = 0.6781), consistently outperform traditional approaches. Fine-tuned RoBERTa achieved the highest performance, with 92.31% accuracy, a 95.34% F1-score, and Cohen’s kappa = 0.7321. Traditional models showed notable limitations; for example, naïve Bayes exhibited strong majority-class bias, despite an accuracy of 82.35% (Cohen’s kappa = 0.0054). Among the oversampling methods, SMOTE was the most effective in improving the fairness of traditional models, while RoBERTa’s fine-tuning process inherently mitigated class imbalance. A computational analysis highlights key trade-offs: traditional models train quickly but require oversampling, deep learning offers a balanced trade-off between performance and efficiency, and transformer models provide state-of-the-art accuracy at the cost of high computational resources. These findings offer evidence-based guidance for selecting appropriate models for tourism sentiment analysis.
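The gap the abstract reports for naïve Bayes (82.35% accuracy but Cohen's kappa = 0.0054) is easier to see with a small worked example. The sketch below is not the authors' code or data. It uses hypothetical toy labels with an assumed 82/18 class split, and the helper names `accuracy` and `cohen_kappa` are illustrative. It computes kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the label marginals.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one class everywhere); returning 0.0 is a convention assumed here.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 82 positive and 18 negative reviews (not the paper's dataset).
y_true = ["pos"] * 82 + ["neg"] * 18

# A classifier that always predicts the majority class.
y_majority = ["pos"] * 100

# A classifier that recovers most of both classes:
# 78 of 82 positives and 12 of 18 negatives are predicted correctly.
y_better = ["pos"] * 78 + ["neg"] * 4 + ["neg"] * 12 + ["pos"] * 6

acc_majority = accuracy(y_true, y_majority)
kappa_majority = cohen_kappa(y_true, y_majority)
acc_better = accuracy(y_true, y_better)
kappa_better = cohen_kappa(y_true, y_better)

print(f"majority-class: accuracy={acc_majority:.2f}, kappa={kappa_majority:.4f}")
print(f"better model:   accuracy={acc_better:.2f}, kappa={kappa_better:.4f}")
```

On these toy labels the majority-class predictor reaches 82% accuracy with a kappa of zero, because all of its agreement is what chance alone would produce given the class split. This is why the abstract reports kappa alongside accuracy when comparing models on imbalanced data.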
Suggested Citation
Sawitree Srianan & Aziz Nanthaamornphong & Chayanon Phucharoen, 2025.
"Advancing tourism sentiment analysis: a comparative evaluation of traditional machine learning, deep learning, and transformer models on imbalanced datasets,"
Information Technology & Tourism, Springer, vol. 27(4), pages 1011-1045, December.
Handle: RePEc:spr:infott:v:27:y:2025:i:4:d:10.1007_s40558-025-00336-0
DOI: 10.1007/s40558-025-00336-0