FinTextSim: Enhancing Financial Text Analysis with BERTopic
References listed on IDEAS
- Daniel D. Lee & H. Sebastian Seung, 1999. "Learning the parts of objects by non-negative matrix factorization," Nature, Nature, vol. 401(6755), pages 788-791, October.
- Chyi-Kwei Yau & Alan Porter & Nils Newman & Arho Suominen, 2014. "Clustering scientific documents with topic modeling," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(3), pages 767-786, September.
- Teh, Yee Whye & Jordan, Michael I. & Beal, Matthew J. & Blei, David M., 2006. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1566-1581, December.
- Gu, Hanchi & Schreyer, Marco & Moffitt, Kevin & Vasarhelyi, Miklos, 2024. "Artificial intelligence co-piloted auditing," International Journal of Accounting Information Systems, Elsevier, vol. 54(C).
- Gu, Yu & Katz, Steven & Wang, Xinxin & Vasarhelyi, Miklos & Dai, Jun, 2024. "Government ESG reporting in smart cities," International Journal of Accounting Information Systems, Elsevier, vol. 54(C).
- Dong, Mengming Michael & Stratopoulos, Theophanis C. & Wang, Victor Xiaoqi, 2024. "A scoping review of ChatGPT research in accounting and finance," International Journal of Accounting Information Systems, Elsevier, vol. 55(C).
- Mengming Michael Dong & Theophanis C. Stratopoulos & Victor Xiaoqi Wang, 2024. "A Scoping Review of ChatGPT Research in Accounting and Finance," Papers 2412.05731, arXiv.org.
- Jinzhi Lu, 2022. "Limited Attention: Implications for Financial Reporting," Journal of Accounting Research, Wiley Blackwell, vol. 60(5), pages 1991-2027, December.
- Dyer, Travis & Lang, Mark & Stice-Lawrence, Lorien, 2017. "The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation," Journal of Accounting and Economics, Elsevier, vol. 64(2), pages 221-245.
- Michelle Lowry & Roni Michaely & Ekaterina Volkova & Francesca Cornelli, 2020. "Information Revealed through the Regulatory Process: Interactions between the SEC and Companies ahead of Their IPO," The Review of Financial Studies, Society for Financial Studies, vol. 33(12), pages 5510-5554.
- Bhattacharya, Indranil & Mickovic, Ana, 2024. "Accounting fraud detection using contextual language learning," International Journal of Accounting Information Systems, Elsevier, vol. 53(C).
- Gao, Yang & Liu, Siqiang & Yang, Lu, 2025. "Artificial intelligence and innovation capability: A dynamic capabilities perspective," International Review of Economics & Finance, Elsevier, vol. 98(C).
- Yang Bao & Anindya Datta, 2014. "Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures," Management Science, INFORMS, vol. 60(6), pages 1371-1391, June.
- Borrero-Domínguez, Cinta & Cortijo-Gallego, Virginia & Escobar-Rodríguez, Tomás, 2024. "Digital transformation voluntary disclosure: Insights from leading European companies," International Journal of Accounting Information Systems, Elsevier, vol. 55(C).
- Shira Cohen & Igor Kadach & Gaizka Ormazabal & Stefan Reichelstein, 2023. "Executive Compensation Tied to ESG Performance: International Evidence," Journal of Accounting Research, Wiley Blackwell, vol. 61(3), pages 805-853, June.
- Fengler, Matthias & Phan, Minh Tri, 2023. "A Topic Model for 10-K Management Disclosures," Economics Working Paper Series 2307, University of St. Gallen, School of Economics and Political Science.
- Aaryan Gupta & Vinya Dengre & Hamza Abubakar Kheruwala & Manan Shah, 2020. "Comprehensive review of text-mining applications in finance," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 6(1), pages 1-25, December.
- Murphy, Brid & Feeney, Orla & Rosati, Pierangelo & Lynn, Theo, 2024. "Exploring accounting and AI using topic modelling," International Journal of Accounting Information Systems, Elsevier, vol. 55(C).
- Abukari, Kobana & Dutta, Shantanu & Li, Chen & Tang, Songlian & Zhu, Pengcheng, 2024. "Corporate communication and likelihood of data breaches," International Review of Economics & Finance, Elsevier, vol. 94(C).
- Dutta, Shantanu & Fuksa, Michel & Macaulay, Ken, 2019. "Determinants of MD&A sentiment in Canada," International Review of Economics & Finance, Elsevier, vol. 60(C), pages 130-148.
- Patricia M. Dechow & Ryan D. Erhard & Richard G. Sloan & Mark T. Soliman, 2021. "Implied Equity Duration: A Measure of Pandemic Shutdown Risk," Journal of Accounting Research, Wiley Blackwell, vol. 59(1), pages 243-281, March.
- Gustaf Bellstam & Sanjai Bhagat & J. Anthony Cookson, 2021. "A Text-Based Analysis of Corporate Innovation," Management Science, INFORMS, vol. 67(7), pages 4004-4031, July.
- Ferriani, Fabrizio & Gazzani, Andrea, 2023. "The invasion of Ukraine and the energy crisis: Comparative advantages in equity valuations," Finance Research Letters, Elsevier, vol. 58(PD).
- Fabrizio Ferriani & Andrea Gazzani, 2023. "The invasion of Ukraine and the energy crisis: comparative advantages in equity valuations," Questioni di Economia e Finanza (Occasional Papers) 789, Bank of Italy, Economic Research and International Relations Area.
- Lin, Hui & Hwang, Yujong, 2021. "The effects of personal information management capabilities and social-psychological factors on accounting professionals’ knowledge-sharing intentions: Pre and post COVID-19," International Journal of Accounting Information Systems, Elsevier, vol. 42(C).
- Miao Liu, 2022. "Assessing Human Information Processing in Lending Decisions: A Machine Learning Approach," Journal of Accounting Research, Wiley Blackwell, vol. 60(2), pages 607-651, May.
- Li, Tong & Chen, Hui & Liu, Wei & Yu, Guang & Yu, Yongtian, 2023. "Understanding the role of social media sentiment in identifying irrational herding behavior in the stock market," International Review of Economics & Finance, Elsevier, vol. 87(C), pages 163-179.
- Booker, Adam & Chiu, Victoria & Groff, Nathan & Richardson, Vernon J., 2024. "AIS research opportunities utilizing Machine Learning: From a Meta-Theory of accounting literature," International Journal of Accounting Information Systems, Elsevier, vol. 52(C).
- Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
- Nerissa C. Brown & Richard M. Crowley & W. Brooke Elliott, 2020. "What Are You Saying? Using topic to Detect Financial Misreporting," Journal of Accounting Research, Wiley Blackwell, vol. 58(1), pages 237-291, March.
- Canelli, Rosa & Fontana, Giuseppe & Realfonzo, Riccardo & Passarella, Marco Veronese, 2024. "Energy crisis, economic growth and public finance in Italy," Energy Economics, Elsevier, vol. 132(C).
- Feng Li, 2010. "The Information Content of Forward‐Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach," Journal of Accounting Research, Wiley Blackwell, vol. 48(5), pages 1049-1102, December.
- Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Fengler, Matthias & Phan, Minh Tri, 2023. "A Topic Model for 10-K Management Disclosures," Economics Working Paper Series 2307, University of St. Gallen, School of Economics and Political Science.
- Simon Fritzsch & Philipp Scharner & Gregor Weiß, 2021. "Estimating the relation between digitalization and the market value of insurers," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(3), pages 529-567, September.
- Richard Frankel & Jared Jennings & Joshua Lee, 2022. "Disclosure Sentiment: Machine Learning vs. Dictionary Methods," Management Science, INFORMS, vol. 68(7), pages 5514-5532, July.
- James P. Ryans, 2021. "Textual classification of SEC comment letters," Review of Accounting Studies, Springer, vol. 26(1), pages 37-80, March.
- Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
- Tri Minh Phan, 2024. "Sentiment-semantic word vectors: A new method to estimate management sentiment," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 160(1), pages 1-22, December.
- John Donovan & Jared Jennings & Kevin Koharki & Joshua Lee, 2021. "Measuring credit risk using qualitative disclosure," Review of Accounting Studies, Springer, vol. 26(2), pages 815-863, June.
- Yi Yang & Kunpeng Zhang & Yangyang Fan, 2023. "sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics," Information Systems Research, INFORMS, vol. 34(1), pages 137-156, March.
- Everett, Jeff & Shiraz Rahaman, Abu & Neu, Dean & Saxton, Gregory, 2024. "Letters to the editor, institutional experimentation, and the public accounting professional," CRITICAL PERSPECTIVES ON ACCOUNTING, Elsevier, vol. 99(C).
- Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2022. "Media-expressed tone, option characteristics, and stock return predictability," Journal of Economic Dynamics and Control, Elsevier, vol. 134(C).
- Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2019. "Media-expressed tone, Option Characteristics, and Stock Return Predictability," IRTG 1792 Discussion Papers 2019-015, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
- Pastwa, Anna M. & Shrestha, Prabal & Thewissen, James & Torsin, Wouter, 2021. "Unpacking the black box of ICO white papers: a topic modeling approach," LIDAM Discussion Papers LFIN 2021018, Université catholique de Louvain, Louvain Finance (LFIN).
- Pastwa, Anna M. & Shrestha, Prabal & Thewissen, James & Torsin, Wouter, 2022. "Unpacking the black box of ICO white papers: a topic modeling approach," LIDAM Reprints LFIN 2022005, Université catholique de Louvain, Louvain Finance (LFIN).
- Allen H. Huang & Jianghua Shen & Amy Y. Zang, 2022. "The unintended benefit of the risk factor mandate of 2005," Review of Accounting Studies, Springer, vol. 27(4), pages 1319-1355, December.
- Özgür Arslan‐Ayaydin & James Thewissen & Wouter Torsin, 2021. "Disclosure tone management and labor unions," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 48(1-2), pages 102-147, January.
- Borchert, Philipp & Coussement, Kristof & De Weerdt, Jochen & De Caigny, Arno, 2024. "Industry-sensitive language modeling for business," European Journal of Operational Research, Elsevier, vol. 315(2), pages 691-702.
- Berkin, Anil & Aerts, Walter & Van Caneghem, Tom, 2023. "Feasibility analysis of machine learning for performance-related attributional statements," International Journal of Accounting Information Systems, Elsevier, vol. 48(C).
- Likun Cao & Jie Ren, 2022. "Machine learning shows that the Covid-19 pandemic is impacting U.S. public companies unequally by changing risk structures," PLOS ONE, Public Library of Science, vol. 17(6), pages 1-18, June.
- Huijue Kelly Duan & Hanxin Hu & Yangin (Ben) Yoon & Miklos Vasarhelyi, 2022. "Increasing the utility of performance audit reports: Using textual analytics tools to improve government reporting," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 29(4), pages 201-218, October.
- Nerissa C. Brown & Richard M. Crowley & W. Brooke Elliott, 2020. "What Are You Saying? Using topic to Detect Financial Misreporting," Journal of Accounting Research, Wiley Blackwell, vol. 58(1), pages 237-291, March.
- Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
- Blankespoor, Elizabeth & deHaan, Ed & Marinovic, Iván, 2020. "Disclosure processing costs, investors’ information choice, and equity market outcomes: A review," Journal of Accounting and Economics, Elsevier, vol. 70(2).
More about this item
NEP fields
This paper has been announced in the following NEP Reports:
- NEP-BIG-2025-05-12 (Big Data)
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2504.15683. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do so. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators. General contact details of provider: http://arxiv.org/ .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.