Printed from https://ideas.repec.org/p/arx/papers/2303.04688.html

Form 10-K Itemization

Author

Listed:
  • Yanci Zhang
  • Mengjia Xia
  • Mingyang Li
  • Haitao Mao
  • Yutong Lu
  • Yupeng Lan
  • Jinlin Ye
  • Rui Dai

Abstract

The Form 10-K is a financial report disclosing the annual financial state of a public company. It is an important source of evidence for financial analysis, e.g., asset pricing and corporate finance. Practitioners and researchers are constantly designing algorithms to better analyze the information in Form 10-K reports. The vast majority of previous work focuses on quantitative data. With recent advances in natural language processing (NLP), textual data in financial filings is attracting more attention. However, to incorporate textual data into analysis, Form 10-K Itemization is a necessary preprocessing step. It aims to segment the whole document into several Item sections, where each Item section focuses on a specific financial aspect of the company. With the segmented Item sections, NLP techniques can be applied directly to the Item sections relevant to downstream tasks. In this paper, we develop a Form 10-K Itemization system that automatically segments all the Item sections in 10-K documents. The system is both effective and efficient. It reaches a retrieval rate of 93%.
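
To make the itemization task concrete, here is a minimal sketch of a naive regex baseline. It is not the system described in the paper: the heading pattern, the function names `split_items` and `retrieval_rate`, and the sample text are all hypothetical. The retrieval-rate function reflects only one plausible reading of the metric (the fraction of expected Items recovered), since the abstract does not define it.

```python
import re

# Assumed heading pattern: a line starting with "Item", a one- or two-digit
# number, an optional letter, then punctuation (e.g. "Item 1A."). This is an
# illustrative guess, not the rule set used in the paper.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-c]?)\s*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text):
    """Map each item label (e.g. "1A") to the text of its section."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        label = m.group(1).upper()
        # A section runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # A repeated label overwrites the earlier one, so a table-of-contents
        # entry is replaced by the later section with the same label.
        sections[label] = text[m.start():end].strip()
    return sections


def retrieval_rate(found, expected):
    """Fraction of expected item labels recovered (assumed definition)."""
    expected = set(expected)
    return len(expected & set(found)) / len(expected)


# Hypothetical miniature filing, for illustration only.
sample = (
    "Item 1. Business\nWe make widgets.\n"
    "Item 1A. Risk Factors\nDemand may fall.\n"
    "Item 7. Management's Discussion\nRevenue grew.\n"
)
sections = split_items(sample)
print(list(sections))
print(retrieval_rate(sections, ["1", "1A", "7", "8"]))
```

Real filings vary widely in layout (HTML tables, inconsistent headings, cross-references that mention an Item mid-sentence), which is why a simple pattern like this would likely miss or mislabel sections and a dedicated system is needed.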

Suggested Citation

  • Yanci Zhang & Mengjia Xia & Mingyang Li & Haitao Mao & Yutong Lu & Yupeng Lan & Jinlin Ye & Rui Dai, 2023. "Form 10-K Itemization," Papers 2303.04688, arXiv.org.
  • Handle: RePEc:arx:papers:2303.04688

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2303.04688
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Yanci Zhang & Tianming Du & Yujie Sun & Lawrence Donohue & Rui Dai, 2021. "Form 10-Q Itemization," Papers 2104.11783, arXiv.org, revised Oct 2021.
    2. Dyer, Travis & Lang, Mark & Stice-Lawrence, Lorien, 2017. "The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation," Journal of Accounting and Economics, Elsevier, vol. 64(2), pages 221-245.
    3. Feng Li & Russell Lundholm & Michael Minnis, 2013. "A Measure of Competition Based on 10‐K Filings," Journal of Accounting Research, Wiley Blackwell, vol. 51(2), pages 399-436, May.
    4. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    5. Fama, Eugene F. & French, Kenneth R., 2015. "A five-factor asset pricing model," Journal of Financial Economics, Elsevier, vol. 116(1), pages 1-22.
    6. Rong Liu & Jujun Huang & Zhongju Zhang, 2023. "Tracking disclosure change trajectories for financial fraud detection," Production and Operations Management, Production and Operations Management Society, vol. 32(2), pages 584-602, February.
    7. Mushtaq, Rizwan & Gull, Ammar Ali & Shahab, Yasir & Derouiche, Imen, 2022. "Do financial performance indicators predict 10-K text sentiments? An application of artificial intelligence," Research in International Business and Finance, Elsevier, vol. 61(C).
    8. Rong Yang & Yang Yu & Manlu Liu & Kean Wu, 2018. "Corporate Risk Disclosure and Audit Fee: A Text Mining Approach," European Accounting Review, Taylor & Francis Journals, vol. 27(3), pages 583-594, May.
    9. Fama, Eugene F. & French, Kenneth R., 1993. "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, Elsevier, vol. 33(1), pages 3-56, February.
    10. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Obaid, Khaled & Pukthuanthong, Kuntara, 2022. "A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news," Journal of Financial Economics, Elsevier, vol. 144(1), pages 273-297.
    2. Edeling, Alexander & Srinivasan, Shuba & Hanssens, Dominique M., 2021. "The marketing–finance interface: A new integrative review of metrics, methods, and findings and an agenda for future research," International Journal of Research in Marketing, Elsevier, vol. 38(4), pages 857-876.
    3. Christian Fieberg & Daniel Metko & Thorsten Poddig & Thomas Loy, 2023. "Machine learning techniques for cross-sectional equity returns’ prediction," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 289-323, March.
    4. Clarke, Charles, 2022. "The level, slope, and curve factor model for stocks," Journal of Financial Economics, Elsevier, vol. 143(1), pages 159-187.
    5. Yan, Jingda & Yu, Jialin, 2023. "Cross-stock momentum and factor momentum," Journal of Financial Economics, Elsevier, vol. 150(2).
    6. Jiaju Miao & Pawel Polak, 2023. "Online Ensemble of Models for Optimal Predictive Performance with Applications to Sector Rotation Strategy," Papers 2304.09947, arXiv.org.
    7. Alex Kim & Maximilian Muhn & Valeri Nikolaev, 2023. "Bloated Disclosures: Can ChatGPT Help Investors Process Information?," Papers 2306.10224, arXiv.org, revised Feb 2024.
    8. Doron Avramov & Si Cheng & Lior Metzker, 2023. "Machine Learning vs. Economic Restrictions: Evidence from Stock Return Predictability," Management Science, INFORMS, vol. 69(5), pages 2587-2619, May.
    9. Lioui, Abraham & Tarelli, Andrea, 2022. "Chasing the ESG factor," Journal of Banking & Finance, Elsevier, vol. 139(C).
    10. Avramov, Doron & Li, Minwen & Wang, Hao, 2021. "Predicting corporate policies using downside risk: A machine learning approach," Journal of Empirical Finance, Elsevier, vol. 63(C), pages 1-26.
    11. Ni, Xuanming & Zheng, Tiantian & Zhao, Huimin & Zhu, Shushang, 2023. "High-dimensional portfolio optimization based on tree-structured factor model," Pacific-Basin Finance Journal, Elsevier, vol. 81(C).
    12. Guillaume Chevalier & Guillaume Coqueret & Thomas Raffinot, 2022. "Supervised portfolios," Post-Print hal-04144588, HAL.
    13. Cakici, Nusret & Zaremba, Adam, 2021. "Liquidity and the cross-section of international stock returns," Journal of Banking & Finance, Elsevier, vol. 127(C).
    14. Doron Avramov & Guy Kaplanski & Avanidhar Subrahmanyam, 2022. "Postfundamentals Price Drift in Capital Markets: A Regression Regularization Perspective," Management Science, INFORMS, vol. 68(10), pages 7658-7681, October.
    15. Pagano, Marco & Wagner, Christian & Zechner, Josef, 2023. "Disaster resilience and asset prices," Journal of Financial Economics, Elsevier, vol. 150(2).
    16. Ma, Tian & Leong, Wen Jun & Jiang, Fuwei, 2023. "A latent factor model for the Chinese stock market," International Review of Financial Analysis, Elsevier, vol. 87(C).
    17. De Nard, Gianluca & Zhao, Zhao, 2022. "A large-dimensional test for cross-sectional anomalies: Efficient sorting revisited," International Review of Economics & Finance, Elsevier, vol. 80(C), pages 654-676.
    18. Blanco, Ivan & De Jesus, Miguel & Remesal, Alvaro, 2023. "Overlapping momentum portfolios," Journal of Empirical Finance, Elsevier, vol. 72(C), pages 1-22.
    19. Victor DeMiguel & Javier Gil-Bazo & Francisco J. Nogales & André A. P. Santos, 2021. "Can Machine Learning Help to Select Portfolios of Mutual Funds?," Working Papers 1245, Barcelona School of Economics.
    20. Adcock, Christopher & Bessler, Wolfgang & Conlon, Thomas, 2022. "Characteristic-sorted portfolios and macroeconomic risks—An orthogonal decomposition," Journal of Empirical Finance, Elsevier, vol. 65(C), pages 24-50.
