Form 10-K Itemization

Author

Listed:

Yanci Zhang
Mengjia Xia
Mingyang Li
Haitao Mao
Yutong Lu
Yupeng Lan
Jinlin Ye
Rui Dai

Abstract

The Form 10-K is a financial report disclosing the annual financial state of a public company. It is important evidence for financial analysis in areas such as asset pricing and corporate finance. Practitioners and researchers are constantly designing algorithms to better analyze the information in Form 10-K reports. The vast majority of previous work focuses on quantitative data. With recent advances in natural language processing (NLP), the textual data in financial filings is attracting more attention. However, incorporating textual data into the analysis requires Form 10-K Itemization as a pre-processing step. Itemization aims to segment the whole document into Item sections, where each Item section focuses on a specific financial aspect of the company. Once the Item sections are segmented, NLP techniques can be applied directly to the Item sections relevant to downstream tasks. In this paper, we develop a Form 10-K Itemization system that automatically segments all the Item sections in 10-K documents. The system is both effective and efficient. It reaches a retrieval rate of 93%.
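
As a rough illustration of what Itemization produces, the Python sketch below splits plain 10-K text on lines that look like Item headings. This is not the system described in the paper, whose design is not given in this record; it is a simple regex baseline. The `ITEM_HEADING` pattern, the `itemize` function, and the sample text are all hypothetical names and data chosen for illustration, and real filings (with HTML markup, tables of contents, and irregular headings) would likely need far more robust handling.

```python
import re

# Assumed heading shape: a line starting with "Item", a 1-2 digit number,
# an optional letter (e.g. 1A, 7A, 9B), then a period, colon, or hyphen.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-c]?)\s*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def itemize(text):
    """Split plain 10-K text into a dict of {item label: section text}."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        label = "Item " + match.group(1).upper()
        # A section runs from the end of its heading marker to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        # Heuristic (an assumption): if a label appears more than once, e.g. in
        # a table of contents, keep the longest candidate as the real section.
        if len(body) > len(sections.get(label, "")):
            sections[label] = body
    return sections


# Hypothetical miniature filing, for illustration only.
sample = """Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand may fall.
Item 7. Management's Discussion and Analysis
Revenue grew.
"""

sections = itemize(sample)
for label, body in sections.items():
    print(label, "->", len(body), "characters")
```

Each value in `sections` can then be passed to a downstream NLP step on its own, which is the use the abstract describes for the segmented Item sections.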

Suggested Citation

Yanci Zhang & Mengjia Xia & Mingyang Li & Haitao Mao & Yutong Lu & Yupeng Lan & Jinlin Ye & Rui Dai, 2023. "Form 10-K Itemization," Papers 2303.04688, arXiv.org.

Handle: RePEc:arx:papers:2303.04688

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2023-04-24 (Big Data)
