Authors listed:
- Litvinenko Alexey
(University of Tartu, School of Economics and Business Administration, Tartu, Estonia)
- Samuli Saarinen
(Estonian Business School, Tallinn, Estonia)
- Litvinenko Anna
(Tallinn University of Technology, Department of Business Administration, Tallinn, Estonia)
Abstract
Research purpose. This study explores whether R programming can transform unstructured qualitative social media data into a quantitative format suitable for econometric modelling. It specifically examines how elements such as text, emojis, and sentiment from Reddit and X (formerly Twitter) can be converted into variables for regression analysis. With the aim of enhancing the predictive power of traditional financial models through alternative data sources, the paper outlines comprehensive guidelines with specific technical steps in R: scripting API calls to extract data from Reddit and X, cleaning and tokenising the text, and incorporating the resulting variables into regression models. The study addresses the growing need in financial economics to incorporate alternative data streams by offering a structured, replicable process for transforming high-volume, unstructured online content into statistically valid variables, thereby bridging the gap between qualitative market sentiment and quantitative modelling.
Design / Methodology / Approach. Focusing on the methodology and R scripts, this research adopts a quantitative approach, transforming qualitative social media data into a format suitable for multiple linear and instrumental variable regression models that assess the effect of social media signals on asset prices, with GameStop (GME) and Best Buy (BBY) as case studies. The process ensures reproducibility and includes open-source code, enhancing transparency and applicability in both academic and professional financial data analysis.
Findings. The findings demonstrate that qualitative social media data can be quantified for financial analysis: the data were extracted, cleaned, and used effectively in regression analysis. Results show that traditional market indicators fail to explain GME’s price shifts, while the frequency of rocket emojis (interpreted as speculative sentiment) was statistically significant. BBY’s returns, however, aligned more closely with market and industry indices, suggesting a lower influence of private sentiment.
Originality / Value / Practical implications. The research provides a replicable method for integrating social media data into econometric models, contributing new tools for analysing market sentiment. By adapting classical financial models to modern data sources, the paper opens new directions for asset pricing research. It also provides technical tools written in R for econometric analysis, useful to both academics and practitioners.
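The conversion step summarised in the abstract, from raw posts to numeric variables such as rocket-emoji frequency and sentiment, can be illustrated with a short sketch. The authors' own scripts are in R and are not reproduced on this page, so what follows is a hypothetical Python analogue, assuming posts have already been downloaded from Reddit or X as (date, text) pairs. The regular expressions, the toy sentiment lexicon `LEXICON`, and all function names are illustrative assumptions, not the paper's method.

```python
import re
from collections import defaultdict

# Rocket emoji (U+1F680), which the abstract reads as a speculative-sentiment signal.
ROCKET = "\U0001F680"

# Hypothetical toy lexicon for illustration only; a real study would use an
# established sentiment dictionary rather than this handful of words.
LEXICON = {"moon": 1, "buy": 1, "hold": 1, "bullish": 1,
           "sell": -1, "crash": -1, "bearish": -1, "dump": -1}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_and_tokenise(text):
    """Lower-case the text, drop URLs, and return its alphanumeric word tokens."""
    return TOKEN_RE.findall(URL_RE.sub(" ", text.lower()))


def post_features(text):
    """Convert one post into numbers: rocket-emoji count and lexicon sentiment score."""
    tokens = clean_and_tokenise(text)
    return {
        "rockets": text.count(ROCKET),  # counted on the raw text, before tokenising
        "sentiment": sum(LEXICON.get(tok, 0) for tok in tokens),
    }


def daily_features(posts):
    """Aggregate (date, text) pairs into per-day totals, sorted by date."""
    totals = defaultdict(lambda: {"posts": 0, "rockets": 0, "sentiment": 0})
    for date, text in posts:
        features = post_features(text)
        day = totals[date]
        day["posts"] += 1
        day["rockets"] += features["rockets"]
        day["sentiment"] += features["sentiment"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    sample = [
        ("2021-01-27", "GME to the moon " + ROCKET * 3 + " https://example.com"),
        ("2021-01-27", "Hold! Don't sell"),
        ("2021-01-28", "Crash incoming, sell now"),
    ]
    for date, row in daily_features(sample).items():
        print(date, row)
    # 2021-01-27 {'posts': 2, 'rockets': 3, 'sentiment': 1}
    # 2021-01-28 {'posts': 1, 'rockets': 0, 'sentiment': -2}
```

Emojis are counted on the raw text rather than on the tokens because the word-token pattern discards non-alphanumeric characters, so a rocket would never survive tokenising. The per-day rows are then ready to be merged with daily returns.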
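The regression stage named in the abstract, multiple linear and instrumental variable (IV) regression, can likewise be sketched. The following hypothetical pure-Python example computes ordinary least squares (OLS) coefficients from the normal equations and two-stage least squares (2SLS) point estimates for a single endogenous regressor. In the paper's setting, `y` might be a stock's daily return, the exogenous columns a constant plus market and industry index returns, and the endogenous regressor the daily rocket-emoji count; the abstract does not state which instruments were used, so the instrument below, like the toy data and function names, is purely an assumption.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, X):
    """OLS coefficients from (X'X) beta = X'y; each row of X should include a 1 for the intercept."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def fitted(X, beta):
    """Fitted values X beta, one per row."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def two_stage_least_squares(y, exog, endog, instruments):
    """2SLS point estimates with one endogenous regressor.

    exog: rows of exogenous regressors (including the constant)
    endog: values of the endogenous regressor (hypothetically, rocket counts)
    instruments: rows of excluded instruments (an assumption; not given in the source)
    Returns coefficients for the exog columns followed by the endogenous variable.
    """
    Z = [e + z for e, z in zip(exog, instruments)]
    endog_hat = fitted(Z, ols(endog, Z))           # stage 1: project endog on all instruments
    X2 = [e + [h] for e, h in zip(exog, endog_hat)]
    return ols(y, X2)                              # stage 2: regress y on exog and fitted endog


if __name__ == "__main__":
    # Toy data built as x = 2 + 3z + u and y = 1 + 0.5x + u, where the error u
    # is correlated with x but uncorrelated with the instrument z.
    z = [0, 1, 2, 3, 4]
    x = [3, 3, 8, 13, 13]
    y = [3.5, 0.5, 5, 9.5, 6.5]
    naive = ols(y, [[1.0, v] for v in x])
    iv = two_stage_least_squares(y, [[1.0] for _ in z], x, [[v] for v in z])
    print("OLS:", [round(b, 6) for b in naive])  # OLS: [0.2, 0.6]
    print("2SLS:", [round(b, 6) for b in iv])    # 2SLS: [1.0, 0.5]
```

The toy run shows why an IV design can matter: OLS overstates the slope (0.6) because the regressor is correlated with the error, while 2SLS recovers the true value (0.5). Only point estimates are returned; standard errors taken naively from the second-stage OLS fit are not valid for 2SLS, so a dedicated IV routine, such as `ivreg` in R's AER package, is the safer choice for inference.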
Suggested Citation
Litvinenko Alexey & Samuli Saarinen & Litvinenko Anna, 2025.
"The Technological Bridge: R Programming’s Utility in Converting Social Media Data for Quantitative Financial Analysis,"
Economics and Culture, Sciendo, vol. 22(1), pages 70-80.
Handle:
RePEc:vrs:ecocul:v:22:y:2025:i:1:p:70-80:n:1006
DOI: 10.2478/jec-2025-0006