
Review Helpfulness to Support Business: Identifying Fake Reviews from User-Generated Content Using Random Forest

Authors
  • Syed Imran Abbas Qazmi

    (Lincoln University College, Malaysia)

  • Midhun Chakkaravarthy

    (Faculty of Computer Science and Multimedia, Lincoln University College, Malaysia)

  • Syed Hassan Raza

    (School of Media and Communication, Taylor's University, 47500 Subang Jaya, Selangor, Malaysia)

  • Farrah Aslam

    (Department of Information Sciences, University of Education Lahore, Jauharabad Campus, Pakistan)

  • Shahbaz Aslam

    (Department of Media and Communication Studies, COMSATS University, Lahore 54000, Pakistan)

  • Moneeba Iftikhar

    (Department of Mass Communication, Lahore College for Women University, Lahore 54000, Pakistan)

Abstract

[Purpose] Valid and helpful reviews on an e-commerce platform provide important information about customers' perception of a product, which is crucial to the survival and growth of any business. Fake reviews, fraudulently created as spam to tarnish a product's image, remain a significant challenge for all e-commerce platforms. A further challenge is identifying the helpful review content that can significantly change a customer's opinion of a product. The growing prevalence of fake and unhelpful reviews therefore undermines the credibility of online reviews, causing information overload and misleading consumer decision-making. Motivated by this challenge, this study develops an automated system that retains only relevant and valid reviews to support the identification of customer needs.

[Design/methodology/approach] The study covers three tasks: helpfulness classification, fake review detection, and topic identification across several categories of the Amazon dataset. The model used a feature set comprising the review's sentiment polarity, its word count (indicating feedback length), its word diversity, a part-of-speech analysis reflecting its grammatical structure and complexity, and authenticity metrics. For helpful review classification, the features included review and product metadata, a review-content informativeness score encoded with Sentence Bidirectional Encoder Representations from Transformers (SBERT), and reviewer attributes. A topic extraction model uses Gemini to perform sentiment-based topic analysis over the reviews.
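The textual part of the feature set described above can be illustrated with a minimal sketch. It assumes simple regex tokenization and a tiny hypothetical sentiment lexicon (`POSITIVE`, `NEGATIVE`) standing in for whatever sentiment model the authors used, and it measures word diversity as the type-token ratio, which is an assumption rather than the paper's stated definition. The part-of-speech and authenticity features are omitted because they need a tagger (for example NLTK or spaCy) and details the abstract does not give. `review_features` is a hypothetical helper, not the paper's code.

```python
import re

# Hypothetical toy lexicon (assumption): a real system would use a trained
# sentiment model or a full lexicon instead of these few words.
POSITIVE = {"good", "great", "excellent", "love", "fun", "durable"}
NEGATIVE = {"bad", "poor", "broke", "hate", "boring", "cheap"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def review_features(text: str) -> dict[str, float]:
    """Compute length, diversity, and polarity features for one review."""
    tokens = tokenize(text)
    n = len(tokens)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        # Length of the feedback in words.
        "word_count": float(n),
        # Type-token ratio: distinct words / total words (0 for empty text).
        "word_diversity": len(set(tokens)) / n if n else 0.0,
        # Polarity in [-1, 1]: (pos - neg) / (pos + neg), 0 if no lexicon hits.
        "polarity": (pos - neg) / (pos + neg) if pos + neg else 0.0,
    }


if __name__ == "__main__":
    print(review_features("Great toy, my kids love it. Great value!"))
    # {'word_count': 8.0, 'word_diversity': 0.875, 'polarity': 1.0}
```

Each review becomes a small numeric vector; in a full pipeline such vectors, together with POS and metadata features, would be the input to a classifier such as a Random Forest.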
[Findings] For helpful review classification across six Amazon categories, a Random Forest classifier (RFC) achieved 94% accuracy, precision, and F1-score, 93% recall, and an AUC of 98%. A Gradient Boosting classifier performed comparably, with 94% accuracy, precision, recall, and F1-score and an AUC of 98%. For fake review detection in the Toys and Games category, the RFC achieved 85% accuracy, 86% precision, 97% recall, a 91% F1-score, and a 79% AUC. The findings indicate that combining textual, semantic, reviewer, and product-level features can improve the reliability of review quality assessment. Finally, to support business decision-making, the Gemini-based topic extraction model was applied to the valid and helpful reviews to extract significant topics, grouping them separately into negative and positive reviews and thereby yielding nuanced insights into customer feedback.

[Originality/value] Unlike prior studies that examine review helpfulness or fake review detection in isolation, this study moves beyond single-task, small-sample approaches. The proposed framework offers a comprehensive analysis of review patterns across e-commerce platforms, enhancing brands' ability to integrate customer needs and expectations into future marketing communications and advertising campaigns. This study contributes to Decision Sciences by proposing a data-driven two-stage framework that retains only helpful and valid reviews to improve content quality, thereby supporting better decision-making through content moderation, reduced information overload, and greater consumer trust in reviews.
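The findings are reported as accuracy, precision, recall, F1-score, and AUC. The sketch below computes these standard metrics from binary labels in plain Python, treating 1 as the positive class (which class the paper treats as positive is not stated in the abstract); all function names are hypothetical. It also checks that the reported fake-review precision (86%) and recall (97%) are consistent with the reported 91% F1-score, since F1 is their harmonic mean, 2PR/(P+R).

```python
def confusion(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
    }


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Consistency check on the reported fake-review figures (P=0.86, R=0.97).
    print(round(f1(0.86, 0.97), 2))  # 0.91
```

In practice a library implementation such as scikit-learn's `sklearn.metrics` would usually be preferred; the pairwise AUC here is O(n²) and only illustrates the definition.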

Suggested Citation

  • Syed Imran Abbas Qazmi & Midhun Chakkaravarthy & Syed Hassan Raza & Farrah Aslam & Shahbaz Aslam & Moneeba Iftikhar, 2026. "Review Helpfulness to Support Business: Identifying Fake Reviews from User-Generated Content Using Random Forest," Advances in Decision Sciences, Asia University, Taiwan, vol. 30(2), pages 114-156, June.
  • Handle: RePEc:aag:wpaper:v:30:y:2026:i:2:p:114-156

    Download full text from publisher

    File URL: https://iads.site/review-helpfulness-to-support-business-identifying-fake-reviews-from-user-generated-content-using-random-forest/
    Download Restriction: no

    File URL: https://iads.site/wp-content/uploads/2026/03/Review-Helpfulness-to-Support-Business-Identifying-Fake-Reviews-from-User-Generated-Content-Using-Random-Forest.pdf
    Download Restriction: no

    More about this item


    JEL classification:

    • L81 - Industrial Organization - - Industry Studies: Services - - - Retail and Wholesale Trade; e-Commerce
    • O33 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Technological Change: Choices and Consequences; Diffusion Processes
    • L86 - Industrial Organization - - Industry Studies: Services - - - Information and Internet Services; Computer Software
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:aag:wpaper:v:30:y:2026:i:2:p:114-156. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Vincent Pan. General contact details of the provider: https://edirc.repec.org/data/dfasitw.html

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.