FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

Author

Xiao-Yang Liu
Guoxuan Wang
Hongyang Yang
Daochen Zha

Abstract

Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like text, which may revolutionize the finance industry. However, existing LLMs often fall short in the financial field, mainly because of the disparities between general text data and financial text data. Unfortunately, only a limited number of financial text datasets are available, and BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address these challenges, we introduce an open-source, data-centric framework, Financial Generative Pre-trained Transformer (FinGPT), that automates the collection and curation of real-time financial data from 34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their own FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLMs using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt the Low-rank Adaptation (LoRA, QLoRA) method, which enables users to customize their own FinLLMs from general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The code has been open-sourced.
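To make the "low cost" claim about Low-rank Adaptation concrete, the following is a minimal sketch of the general LoRA idea, not the FinGPT implementation. It assumes the standard formulation in which a frozen weight matrix W is augmented with a trainable low-rank product, giving an output of W x + (alpha / r) B A x. The function names, the 4096 x 4096 layer size, and the rank of 8 are illustrative assumptions and do not come from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    if rank <= 0 or rank > min(d_out, d_in):
        raise ValueError("rank must be between 1 and min(d_out, d_in)")
    full = d_out * d_in            # every entry of W is trained
    lora = rank * (d_out + d_in)   # only B (d_out x r) and A (r x d_in) are trained
    return full, lora


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Compute W x + (alpha / rank) * B (A x); W stays frozen."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, delta)]


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4%}")

    # Tiny worked example with rank 1.
    w = [[1.0, 2.0], [3.0, 4.0]]
    a = [[0.5, -0.5]]      # shape (1, 2)
    b = [[1.0], [2.0]]     # shape (2, 1)
    print(lora_forward(w, a, b, [2.0, 1.0], alpha=2.0, rank=1))
```

Under these assumed sizes, the adapter trains 65,536 parameters for a layer that holds 16,777,216, which is under half a percent of the original count. Setting B to all zeros, as is conventional at the start of LoRA training, leaves the base model's output unchanged.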

Suggested Citation

Xiao-Yang Liu & Guoxuan Wang & Hongyang Yang & Daochen Zha, 2023. "FinGPT: Democratizing Internet-scale Data for Financial Large Language Models," Papers 2307.10485, arXiv.org, revised Nov 2023.

Handle: RePEc:arx:papers:2307.10485

References listed on IDEAS

Boyu Zhang & Hongyang Yang & Xiao-Yang Liu, 2023. "Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models," Papers 2306.12659, arXiv.org.
Zheng Tracy Ke & Bryan T. Kelly & Dacheng Xiu, 2019. "Predicting Returns With Text Data," NBER Working Papers 26186, National Bureau of Economic Research, Inc.
Xiao-Yang Liu & Ziyi Xia & Jingyang Rui & Jiechao Gao & Hongyang Yang & Ming Zhu & Christina Dan Wang & Zhaoran Wang & Jian Guo, 2022. "FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning," Papers 2211.03107, arXiv.org.
David Byrd & Antigoni Polychroniadou, 2020. "Differentially Private Secure Multi-Party Computation for Federated Learning in Financial Applications," Papers 2010.05867, arXiv.org.
Pekka Malo & Ankur Sinha & Pekka Korhonen & Jyrki Wallenius & Pyry Takala, 2014. "Good debt or bad debt: Detecting semantic orientations in economic texts," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(4), pages 782-796, April.
- Pekka Malo & Ankur Sinha & Pyry Takala & Pekka Korhonen & Jyrki Wallenius, 2013. "Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts," Papers 1307.5336, arXiv.org, revised Jul 2013.

Citations

Cited by:

Fernando Spadea & Oshani Seneviratne, 2025. "Aligning Language Models with Investor and Market Behavior for Financial Recommendations," Papers 2510.15993, arXiv.org.
Guojun Xiong & Zhiyang Deng & Keyi Wang & Yupeng Cao & Haohang Li & Yangyang Yu & Xueqing Peng & Mingquan Lin & Kaleb E Smith & Xiao-Yang Liu & Jimin Huang & Sophia Ananiadou & Qianqian Xie, 2025. "FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading," Papers 2502.11433, arXiv.org, revised Feb 2025.
Masanori Hirano & Kentaro Imajo, 2024. "The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging," Papers 2409.19854, arXiv.org.
Saber Talazadeh & Dragan Perakovic, 2024. "SARF: Enhancing Stock Market Prediction with Sentiment-Augmented Random Forest," Papers 2410.07143, arXiv.org.
Kefan Chen & Hussain Ahmad & Diksha Goel & Claudia Szabo, 2025. "3S-Trader: A Multi-LLM Framework for Adaptive Stock Scoring, Strategy, and Selection in Portfolio Optimization," Papers 2510.17393, arXiv.org.
Yuan Li & Bingqiao Luo & Qian Wang & Nuo Chen & Xu Liu & Bingsheng He, 2024. "A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading," Papers 2407.09546, arXiv.org.
Junzhe Jiang & Chang Yang & Aixin Cui & Sihan Jin & Ruiyu Wang & Bo Li & Xiao Huang & Dongning Sun & Xinrun Wang, 2025. "FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs," Papers 2505.13533, arXiv.org.
David Kuo Chuen Lee & Chong Guan & Yinghui Yu & Qinxu Ding, 2024. "A Comprehensive Review of Generative AI in Finance," FinTech, MDPI, vol. 3(3), pages 1-19, September.
Yichen Luo & Yebo Feng & Jiahua Xu & Paolo Tasca & Yang Liu, 2025. "LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management," Papers 2501.00826, arXiv.org, revised Jan 2025.
Haohang Li & Yupeng Cao & Yangyang Yu & Shashidhar Reddy Javaji & Zhiyang Deng & Yueru He & Yuechen Jiang & Zining Zhu & Koduvayur Subbalakshmi & Guojun Xiong & Jimin Huang & Lingfei Qian & Xueqing Pe, 2024. "INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent," Papers 2412.18174, arXiv.org.
Thanos Konstantinidis & Giorgos Iacovides & Mingxue Xu & Tony G. Constantinides & Danilo Mandic, 2024. "FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications," Papers 2403.12285, arXiv.org.
Wo Long & Wenxin Zeng & Xiaoyu Zhang & Ziyao Zhou, 2025. "Integrating Large Language Models and Reinforcement Learning for Sentiment-Driven Quantitative Trading," Papers 2510.10526, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Thanos Konstantinidis & Giorgos Iacovides & Mingxue Xu & Tony G. Constantinides & Danilo Mandic, 2024. "FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications," Papers 2403.12285, arXiv.org.
Toby Barter & Zheng Gao & Eva Christodoulaki & Jing Chen & John Cartlidge, 2025. "BondBERT: What we learn when assigning sentiment in the bond market," Papers 2511.01869, arXiv.org, revised Dec 2025.
Giorgos Iacovides & Wuyang Zhou & Danilo Mandic, 2025. "FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs," Papers 2507.18417, arXiv.org.
Yuqi Nie & Yaxuan Kong & Xiaowen Dong & John M. Mulvey & H. Vincent Poor & Qingsong Wen & Stefan Zohren, 2024. "A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges," Papers 2406.11903, arXiv.org.
Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," Finance Research Letters, Elsevier, vol. 62(PB).
- Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," LSE Research Online Documents on Economics 122592, London School of Economics and Political Science, LSE Library.
- Kemal Kirtac & Guido Germano, 2024. "Sentiment trading with large language models," Papers 2412.19245, arXiv.org.
Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2022. "Media-expressed tone, option characteristics, and stock return predictability," Journal of Economic Dynamics and Control, Elsevier, vol. 134(C).
- Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2019. "Media-expressed tone, Option Characteristics, and Stock Return Predictability," IRTG 1792 Discussion Papers 2019-015, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
Eghbal Rahimikia & Stefan Zohren & Ser-Huang Poon, 2021. "Realised Volatility Forecasting: Machine Learning via Financial Word Embedding," Papers 2108.00480, arXiv.org, revised Jan 2026.
Shunyao Wang & Ming Cheng & Christina Dan Wang, 2025. "NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks," Papers 2505.06864, arXiv.org.
Fengbin Zhu & Junfeng Li & Liangming Pan & Wenjie Wang & Fuli Feng & Chao Wang & Huanbo Luan & Tat-Seng Chua, 2025. "Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance," Papers 2503.05185, arXiv.org, revised Aug 2025.
Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
Travis Adams & Andrea Ajello & Diego Silva & Francisco Vazquez-Grande, 2023. "More than Words: Twitter Chatter and Financial Market Sentiment," Papers 2305.16164, arXiv.org.
Chandan Singh & Armin Askari & Rich Caruana & Jianfeng Gao, 2023. "Augmenting interpretable models with large language models during training," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
Yijia Xiao & Edward Sun & Tong Chen & Fang Wu & Di Luo & Wei Wang, 2025. "Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning," Papers 2509.11420, arXiv.org.
Liyuan Chen & Shuoling Liu & Jiangpeng Yan & Xiaoyu Wang & Henglin Liu & Chuang Li & Kecheng Jiao & Jixuan Ying & Yang Veronica Liu & Qiang Yang & Xiu Li, 2025. "Advancing Financial Engineering with Foundation Models: Progress, Applications, and Challenges," Papers 2507.18577, arXiv.org, revised Dec 2025.
Dolaeva, Aishat & Beliaeva, Uliana & Grigoriev, Dmitry & Semenov, Alexander & Rysz, Maciej, 2025. "Analyzing and forecasting P/E ratios using investor sentiment in panel data regression and LSTM models," International Review of Economics & Finance, Elsevier, vol. 98(C).
Borchert, Philipp & Coussement, Kristof & De Weerdt, Jochen & De Caigny, Arno, 2024. "Industry-sensitive language modeling for business," European Journal of Operational Research, Elsevier, vol. 315(2), pages 691-702.
Martina Halousková & Štefan Lyócsa, 2025. "Forecasting U.S. equity market volatility with attention and sentiment to the economy," Papers 2503.19767, arXiv.org.
García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
Priyank Sonkiya & Vikas Bajpai & Anukriti Bansal, 2021. "Stock price prediction using BERT and GAN," Papers 2107.09055, arXiv.org.
Longbing Cao, 2021. "AI in Finance: Challenges, Techniques and Opportunities," Papers 2107.09051, arXiv.org.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2023-08-28 (Big Data)
NEP-CMP-2023-08-28 (Computational Economics)

