Linking Industry Sectors and Financial Statements: A Hybrid Approach for Company Classification

Authors

  • Guy Stephane Waffo Dzuyo

    (Forvis Mazars, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications - Inria - Institut National de Recherche en Informatique et en Automatique - CentraleSupélec - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique, SYNALP - Natural Language Processing : representations, inference and semantics - LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery - LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications - Inria - Institut National de Recherche en Informatique et en Automatique - CentraleSupélec - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique)

  • Gaël Guibon

    (LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications - Inria - Institut National de Recherche en Informatique et en Automatique - CentraleSupélec - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique, LIPN - Laboratoire d'Informatique de Paris-Nord - CNRS - Centre National de la Recherche Scientifique - Université Sorbonne Paris Nord, SYNALP - Natural Language Processing : representations, inference and semantics - LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery - LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications - Inria - Institut National de Recherche en Informatique et en Automatique - CentraleSupélec - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique)

  • Christophe Cerisara

    (SYNALP - Natural Language Processing : representations, inference and semantics - LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery - LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications - Inria - Institut National de Recherche en Informatique et en Automatique - CentraleSupélec - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique)

  • Luis Belmar-Letelier

    (Forvis Mazars)

Abstract

The identification of the financial characteristics of industry sectors is of great importance in accounting audits, as it allows auditors to prioritize the most important areas during an audit. Existing company classification standards, such as the Standard Industrial Classification (SIC) code, map a company to a category based on its activity and products. In this paper, we explore the potential of machine learning algorithms and language models to analyze the relationship between those categories and companies' financial statements. We propose a supervised company classification methodology and analyze several types of representations for financial statements. Existing work addresses this task using only the numerical information in financial records. Our findings show that, beyond numbers, the textual information occurring in financial records can be leveraged by language models to match the performance of dedicated decision tree-based classifiers, while providing better explainability and more generic accounting representations. We believe this work can serve as a first step towards semi-automatic auditing. Models, code, and a preprocessed dataset are publicly available for further research at https://github.com/WaguyMz/hybrid company classification

Suggested Citation

  • Guy Stephane Waffo Dzuyo & Gaël Guibon & Christophe Cerisara & Luis Belmar-Letelier, 2025. "Linking Industry Sectors and Financial Statements: A Hybrid Approach for Company Classification," Post-Print hal-05031499, HAL.
  • Handle: RePEc:hal:journl:hal-05031499
    Note: View the original document on HAL open archive server: https://hal.science/hal-05031499v1

    Download full text from publisher

    File URL: https://hal.science/hal-05031499v1/document
    Download Restriction: no

    References listed on IDEAS

    1. van der Heijden, Hans, 2022. "Predicting industry sectors from financial statements: An illustration of machine learning in accounting research," The British Accounting Review, Elsevier, vol. 54(5).
    2. Carlos León & José Fernando Moreno & Jorge Cely, 2016. "Whose Balance Sheet is this? Neural Networks for Banks’ Pattern Recognition," Borradores de Economia 959, Banco de la Republica de Colombia.
    3. Van der Meulen, Sofie & Gaeremynck, Ann & Willekens, Marleen, 2007. "Attribute differences between U.S. GAAP and IFRS earnings: An exploratory study," The International Journal of Accounting, Elsevier, vol. 42(2), pages 123-142.
    4. Hrazdil, Karel & Trottier, Kim & Zhang, Ray, 2013. "A comparison of industry classification schemes: A large sample study," Economics Letters, Elsevier, vol. 118(1), pages 77-80.
    5. Dimitrios Vamvourellis & Máté Toth & Snigdha Bhagat & Dhruv Desai & Dhagash Mehta & Stefano Pasquali, 2023. "Company Similarity using Large Language Models," Papers 2308.08031, arXiv.org.
    6. Van der Meulen, Sofie & Gaeremynck, Ann & Willekens, Marleen, 2007. "Attribute differences between U.S. GAAP and IFRS earnings: An exploratory study," The International Journal of Accounting, Elsevier, vol. 42(2), pages 148-152.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Macías, Marta & Muiño, Flora, 2011. "Examining dual accounting systems in Europe," The International Journal of Accounting, Elsevier, vol. 46(1), pages 51-78, March.
    2. Palea, Vera, 2013. "Capital Market Effects of the IFRS Adoption for Separate Financial Statements: Evidence from the Italian Stock Market," Department of Economics and Statistics Cognetti de Martiis. Working Papers 201309, University of Turin.
    3. Ahsina, Khalifa & Taouab, Omar, 2017. "Y’a-t-il vraiment un besoin pour changer de referentiel comptable au Maroc? la prétendue value relevance des normes comptables IFRS [Is there really a need to change accounting references in Morocc," MPRA Paper 81397, University Library of Munich, Germany.
    4. Yuriy, Burykin & Guzaliya, Klychova & Bremmers, Harry J., 2012. "The Development Of Integrated Accounting In Small And Medium-Sized Companies In The Agri- And Foodsector Of The Russian Federation," APSTRACT: Applied Studies in Agribusiness and Commerce, AGRIMBA, vol. 6(01-2), pages 1-6, September.
    5. Palea, Vera, 2013. "IAS/IFRS and Financial Reporting Quality: Lessons from the European Experience," Department of Economics and Statistics Cognetti de Martiis. Working Papers 201330, University of Turin.
    6. Ahmed, Kamran & Chalmers, Keryn & Khlif, Hichem, 2013. "A Meta-analysis of IFRS Adoption Effects," The International Journal of Accounting, Elsevier, vol. 48(2), pages 173-217.
    7. Ernstberger, Jürgen & Vogler, Oliver, 2008. "Analyzing the German accounting triad -- "Accounting Premium" for IAS/IFRS and U.S. GAAP vis-à-vis German GAAP?," The International Journal of Accounting, Elsevier, vol. 43(4), pages 339-386, December.
    8. Habeeb Mohamed Nijam & Athambawa Jahfer, 2018. "IFRS Adoption and Value Relevance of Accounting Information: Evidence from a Developing Country," Global Business Review, International Management Institute, vol. 19(6), pages 1416-1435, December.
    9. Gjerde, Øystein & Knivsflå, Kjell & Sættem, Frode, 2008. "The value-relevance of adopting IFRS: Evidence from 145 NGAAP restatements," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 17(2), pages 92-112.
    10. Legaz Ortiz, Javier & Montoya del Corte, Javier & Rodríguez Ariza, Lázaro, 2015. "Efectos de la reforma contable en el patrimonio neto consolidado a 1 de enero de 2008 de los grupos españoles que no aplican normativa NIIF [Effects of the accounting reform on the consolidated equity at 1 January 2008 of Spanish groups that do not apply IFRS]," Revista de Contabilidad - Spanish Accounting Review, Elsevier, vol. 18(2), pages 217-224.
    11. Gotti, Giorgio & Mastrolia, Stacy, 2012. "The Effect on Financial Reporting Quality of an Exemption from the SEC Reporting Requirements for Foreign Private Issuers," The International Journal of Accounting, Elsevier, vol. 47(1), pages 44-71.
    12. Gotti, Giorgio & Mastrolia, Stacy, 2014. "Cost of Capital for Exempt Foreign Private Issuers: Information Risk Effect or Earnings Quality Effect? It Depends," The International Journal of Accounting, Elsevier, vol. 49(2), pages 190-220.
    13. Nwaobia A. N. & Kwarbai J. D. & Jayeoba, O. O. & Ajibade A. T., 2016. "Financial Reporting Quality on Investors’ Decisions," International Journal of Economics and Financial Research, Academic Research Publishing Group, vol. 2(7), pages 140-147, 07-2016.
    14. Sabur Mollah & Omar Farooque & Asma Mobarek & Philip Molyneux, 2019. "Bank Corporate Governance and Future Earnings Predictability," Journal of Financial Services Research, Springer;Western Finance Association, vol. 56(3), pages 369-394, December.
    15. Wafaa Salah & Abdallah Abdel-Salam, 2019. "The Effects of International Financial Reporting Standards on Financial Reporting Quality," Athens Journal of Business & Economics, Athens Institute for Education and Research (ATINER), vol. 5(3), pages 221-242, July.
    16. Chiu, Tzu-Ting & Lee, Yen-Jung, 2013. "Foreign Private Issuers' Application of IFRS Around the Elimination of the 20-F Reconciliation Requirement," The International Journal of Accounting, Elsevier, vol. 48(1), pages 54-83.
    17. Ellen Cristina Baradel & Fabiano Guasti Lima & Alexandre Assaf Neto, 2016. "Growth Potential of Publicly Traded Brazilian Companies: Accounting and Market Overview," Athens Journal of Business & Economics, Athens Institute for Education and Research (ATINER), vol. 2(2), pages 123-148, April.
    18. Hui-Sung Kao, 2014. "The relationships between IFRS, earnings losses threshold and earnings management," Journal of Chinese Economic and Business Studies, Taylor & Francis Journals, vol. 12(1), pages 81-98, February.
    19. Adhikari, Ajay & Bansal, Manish & Kumar, Ashish, 2021. "IFRS convergence and accounting quality: India a case study," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 45(C).
    20. Karampinis, Nikolaos I. & Hevas, Dimosthenis L., 2011. "Mandating IFRS in an Unfavorable Environment: The Greek Experience," The International Journal of Accounting, Elsevier, vol. 46(3), pages 304-332, September.

    More about this item

    Keywords

    Machine Learning; Industry Sectors; Large Language Models; LLM Applications; Audit; Financial Statement

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hal:journl:hal-05031499. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: CCSD. General contact details of provider: https://hal.archives-ouvertes.fr/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.