Printed from https://ideas.repec.org/a/spr/stmapp/v32y2023i4d10.1007_s10260-023-00691-4.html

Semi-supervised sentiment clustering on natural language texts

Authors

  • Luca Frigau (University of Cagliari)
  • Maurizio Romano (University of Cagliari)
  • Marco Ortu (University of Cagliari)
  • Giulia Contu (University of Cagliari)

Abstract

In this paper, we propose a semi-supervised method, called semi-supervised sentiment clustering on natural language texts, for clustering unstructured textual data. The aim is to identify clusters that are homogeneous with respect to the overall sentiment of the texts analyzed. The method combines three techniques: Sentiment Analysis, the Threshold-based Naïve Bayes classifier, and Network-based Semi-supervised Clustering. It proceeds in three steps. In the first step, the unstructured text is transformed into structured text and categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naïve Bayes classifier is applied to identify the overall sentiment of the texts and to define a specific sentiment value for the topics. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results show that the proposed algorithm identifies clusters that are distinct, non-overlapping, and homogeneous with respect to the overall sentiment. The results are also easy to interpret, because the network representation of the instances helps clarify the relationships between them.
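To make the second step more concrete, the following is a minimal Python sketch of a threshold-style Naïve Bayes scorer. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_word_scores`, `score`, `classify`), the Laplace smoothing parameter `alpha`, the word-presence (Bernoulli-style) counting, the threshold `tau`, and the toy reviews are all hypothetical and do not come from the paper. The idea shown is that each word receives a log-likelihood-ratio score, a text's overall score is the sum over its words, and the text is labelled positive when that sum exceeds a threshold.

```python
import math
from collections import Counter


def fit_word_scores(docs, labels, alpha=1.0):
    """Return per-word log-likelihood-ratio scores and the log prior odds.

    docs: list of token lists; labels: list of 1 (positive) / 0 (negative).
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count word presence per document, not word frequency.
        (pos_counts if y == 1 else neg_counts).update(set(tokens))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores = {}
    for w in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[w] + alpha) / (n_pos + 2 * alpha)
        p_neg = (neg_counts[w] + alpha) / (n_neg + 2 * alpha)
        # Positive score: the word is more typical of positive texts.
        scores[w] = math.log(p_pos / p_neg)
    prior = math.log((n_pos + alpha) / (n_neg + alpha))
    return scores, prior


def score(tokens, scores, prior=0.0):
    """Sum the scores of the distinct known words in a text."""
    return prior + sum(scores.get(w, 0.0) for w in set(tokens))


def classify(tokens, scores, prior=0.0, tau=0.0):
    """Label a text positive (1) when its score exceeds the threshold tau."""
    return 1 if score(tokens, scores, prior) > tau else 0


# Hypothetical toy reviews, already tokenized and labelled.
docs = [
    ["clean", "room", "friendly", "staff"],
    ["dirty", "room", "rude", "staff"],
    ["friendly", "staff", "great", "location"],
    ["dirty", "bathroom", "noisy"],
]
labels = [1, 0, 1, 0]

scores, prior = fit_word_scores(docs, labels)
print(classify(["friendly", "clean"], scores, prior))
print(classify(["dirty", "noisy"], scores, prior))
```

In this sketch, raising `tau` makes the classifier more reluctant to label a text positive, and the per-word scores can be read as word-level sentiment values. How the paper actually sets the threshold and derives topic-level sentiment values is not specified in the abstract.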

Suggested Citation

  • Luca Frigau & Maurizio Romano & Marco Ortu & Giulia Contu, 2023. "Semi-supervised sentiment clustering on natural language texts," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(4), pages 1239-1257, October.
  • Handle: RePEc:spr:stmapp:v:32:y:2023:i:4:d:10.1007_s10260-023-00691-4
    DOI: 10.1007/s10260-023-00691-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-023-00691-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Patrik Nosil & Zachariah Gompert & Daniel J. Funk, 2024. "Divergent dynamics of sexual and habitat isolation at the transition between stick insect populations and species," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Zeileis, Achim, 2006. "Implementing a class of structural change tests: An econometric computing approach," Computational Statistics & Data Analysis, Elsevier, vol. 50(11), pages 2987-3008, July.
    3. James Nolan & Zoe Laulederkind, 2022. "Plane to See? Empirical Analysis of the 1999–2006 Air Cargo Cartel," Advances in Airline Economics, in: The International Air Cargo Industry, volume 9, pages 241-262, Emerald Group Publishing Limited.
    4. Ashok Chanabasangouda Patil & Shailesh Rastogi, 2020. "Multifractal Analysis of Market Efficiency across Structural Breaks: Implications for the Adaptive Market Hypothesis," JRFM, MDPI, vol. 13(10), pages 1-18, October.
    5. João Sousa Andrade & Adelaide Duarte & Marta Simões, 2011. "Inequality and Growth in Portugal: a time series analysis," GEMF Working Papers 2011-11, GEMF, Faculty of Economics, University of Coimbra.
    6. de Silva, Ashton J & Boymal, Jonathan & Potts, Jason & Thomas, Stuart, 2015. "Does innovation in residential mortgage products explain rising house prices? No," MPRA Paper 62548, University Library of Munich, Germany.
    7. DIMA, Bogdan & DIMA, Ştefana Maria & IOAN, Roxana, 2021. "Remarks on the behaviour of financial market efficiency during the COVID-19 pandemic. The case of VIX," Finance Research Letters, Elsevier, vol. 43(C).
    8. Abhijit Sharma & Kelvin G Balcombe & Iain M Fraser, 2009. "Non-renewable resource prices: Structural breaks and long term trends," Economics Bulletin, AccessEcon, vol. 29(2), pages 805-819.
    9. Fabio Clementi & Marco Gallegati & Mauro Gallegati, 2015. "Growth and Cycles of the Italian Economy Since 1861: The New Evidence," Italian Economic Journal: A Continuation of Rivista Italiana degli Economisti and Giornale degli Economisti, Springer;Società Italiana degli Economisti (Italian Economic Association), vol. 1(1), pages 25-59, March.
    10. Bennett Kleinberg & Isabelle Vegt & Paul Gill, 2021. "The temporal evolution of a far-right forum," Journal of Computational Social Science, Springer, vol. 4(1), pages 1-23, May.
    11. Liu, Guanchun & He, Lei & Yue, Yiding & Wang, Jiying, 2014. "The linkage between insurance activity and banking credit: Some evidence from dynamic analysis," The North American Journal of Economics and Finance, Elsevier, vol. 29(C), pages 239-265.
    12. F. Peters & J. P. Mackenbach & W. J. Nusselder, 2016. "Does the Impact of the Tobacco Epidemic Explain Structural Changes in the Decline of Mortality?," European Journal of Population, Springer;European Association for Population Studies, vol. 32(5), pages 687-702, December.
    13. Pouliot, Sébastien, 2012. "On the Economics of Adulteration in Food Imports: Application to US Fish and Seafood Imports," Working Papers 148596, Structure and Performance of Agriculture and Agri-products Industry (SPAA).
    14. Christos Katris & Manolis G. Kavussanos, 2021. "Time series forecasting methods for the Baltic dry index," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1540-1565, December.
    15. Tighe, Kara & Piggott, Nicholas & Nicholas, Oscar & Mounter, Stuart & Villano, Renato, 2019. "Testing for pre-committed quantities of Australian meat demand," Australian Journal of Agricultural and Resource Economics, Australian Agricultural and Resource Economics Society, vol. 60(2), April.
    16. Kleiber, Christian, 2016. "Structural Change in (Economic) Time Series," Working papers 2016/06, Faculty of Business and Economics - University of Basel.
    17. Sinclair Davidson & Ashton de Silva, 2018. "Did Recent Tobacco Reforms Change the Cigarette Market?," Economic Papers, The Economic Society of Australia, vol. 37(1), pages 55-74, March.
    18. Aurelio Fernández Bariviera & M. Belén Guercio & Lisana B. Martinez, 2014. "Informational Efficiency in Distressed Markets: The Case of European Corporate Bonds," The Economic and Social Review, Economic and Social Studies, vol. 45(3), pages 349-369.
    19. Gary W. Brester & Kole Swanser & Brett Crosby, 2021. "Adding Weight to a Thinning Live Cattle Market," MSU Staff Papers 310364, Montana State University > Department of Agricultural Economics and Economics.
    20. Kufenko, Vadim & Prettner, Klaus & Geloso, Vincent, 2020. "Divergence, convergence, and the history-augmented Solow model," Structural Change and Economic Dynamics, Elsevier, vol. 53(C), pages 62-76.
