
InProC: Industry and Product/Service Code Classification

Author

Listed:
  • Simerjot Kaur
  • Andrea Stefanucci
  • Sameena Shah

Abstract

Determining industry and product/service codes for a company is an important real-world task, and it is typically very expensive because it involves manual curation of company data. An AI agent that predicts these codes automatically can significantly reduce costs and help eliminate human biases and errors. However, the unavailability of labeled datasets, together with the need for high-precision results in the financial domain, makes this a challenging problem. In this work, we propose a hierarchical multi-class industry code classifier with a targeted multi-label product/service code classifier, leveraging advances in unsupervised representation learning. We demonstrate how a high-quality industry and product/service code classification system can be built using an extremely limited labeled dataset. We evaluate our approach on a dataset of more than 20,000 companies and achieve a classification accuracy of more than 92%. We also compare our approach against a dataset of 350 manually labeled product/service codes provided by Subject Matter Experts (SMEs) and obtain an accuracy of more than 96%, resulting in real-life adoption within the financial domain.
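
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of hierarchical multi-class prediction followed by targeted multi-label prediction, and not the authors' implementation. It assumes a hypothetical two-level taxonomy (`TAXONOMY`), uses a toy bag-of-words vector (`embed`) as a stand-in for a learned unsupervised representation, and uses an arbitrary similarity cutoff (`threshold`). All names and values here are illustrative assumptions.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned text embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical taxonomy: sector -> industry -> product/service codes.
# Every label carries a short text description that is embedded and compared.
TAXONOMY = {
    "Technology": {
        "desc": "software hardware computing cloud technology",
        "industries": {
            "Software": {
                "desc": "software cloud applications",
                "products": {
                    "Cloud Hosting": "cloud hosting services",
                    "Accounting Software": "accounting software",
                    "Mobile Games": "mobile games entertainment",
                },
            },
            "Hardware": {
                "desc": "hardware semiconductors chips",
                "products": {"Memory Chips": "memory chips"},
            },
        },
    },
    "Finance": {
        "desc": "banking insurance loans finance deposits",
        "industries": {
            "Banking": {
                "desc": "banking loans deposits",
                "products": {
                    "Consumer Loans": "consumer loans",
                    "Mortgage Loans": "mortgage loans",
                },
            },
            "Insurance": {
                "desc": "insurance policies underwriting",
                "products": {"Life Insurance": "life insurance"},
            },
        },
    },
}


def classify(description, threshold=0.3):
    """Return (sector, industry, product_codes) for a company description."""
    query = embed(description)

    # Level 1 (multi-class): pick the single most similar sector.
    sector = max(TAXONOMY, key=lambda s: cosine(query, embed(TAXONOMY[s]["desc"])))

    # Level 2 (multi-class): pick the best industry among that sector's children only.
    industries = TAXONOMY[sector]["industries"]
    industry = max(industries, key=lambda i: cosine(query, embed(industries[i]["desc"])))

    # Targeted multi-label step: score only the product codes under the chosen
    # industry and keep every code whose similarity clears the threshold.
    products = [
        code
        for code, desc in industries[industry]["products"].items()
        if cosine(query, embed(desc)) >= threshold
    ]
    return sector, industry, products


if __name__ == "__main__":
    print(classify("We sell cloud software applications and hosting services for accounting"))
```

Restricting each step to the children of the previous prediction keeps the candidate set small, which is one plausible reading of "targeted" here. In practice the toy `embed` function would be replaced by a pretrained text encoder.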

Suggested Citation

  • Simerjot Kaur & Andrea Stefanucci & Sameena Shah, 2023. "InProC: Industry and Product/Service Code Classification," Papers 2305.13532, arXiv.org.
  • Handle: RePEc:arx:papers:2305.13532

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2305.13532
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marina Y. Sheresheva & Lilia A. Valitova & Elena R. Sharko & Ekaterina V. Buzulukova, 2022. "Application of Social Network Analysis to Visualization and Description of Industrial Clusters: A Case of the Textile Industry," JRFM, MDPI, vol. 15(3), pages 1-17, March.
    2. Occhini, Giulia & Tranos, Emmanouil & Wolf, Levi John, 2023. "Occupational segregation in the digital economy? A Natural Language Processing approach using UK Web Data," SocArXiv z8xta, Center for Open Science.
    3. Christoph Stich & Emmanouil Tranos & Max Nathan, 2023. "Modeling clusters from the ground up: A web data approach," Environment and Planning B, vol. 50(1), pages 244-267, January.
