Printed from https://ideas.repec.org/p/trn/utwprg/2018-09.html

Classifying Firms with Text Mining

Author

Listed:
  • Giacomo Caterini

Abstract

Statistics on the births, deaths and survival rates of firms are crucial because they feed into the computation of GDP, the identification of each sector’s contribution to the economy, and the assessment of gross job creation and destruction rates. However, official statistics on firm demography become available only several months after the data are collected and stored. Furthermore, unprocessed and untimely administrative data can misrepresent the life-cycle stage of a firm. In this paper we implement an automated version of Eurostat’s algorithm for distinguishing true startups from the resurrection of pre-existing but apparently defunct firms. We illustrate the potential gains from combining machine learning, natural language processing and econometric tools for pre-processing and analyzing granular data, and we propose a machine learning method for predicting reactivations of deceptively dead firms.
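
To make the idea of separating true startups from reactivated firms concrete, the following is a minimal Python sketch of one text-matching step. It is not the paper's implementation and not Eurostat's actual rule set: the record fields (`id`, `name`, `municipality`), the list of legal-form suffixes, the helper names, and the 0.85 similarity threshold are all assumptions chosen for illustration. The sketch flags a newly registered firm as a possible reactivation when its normalized name closely matches that of a ceased firm in the same municipality.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"srl", "spa", "snc", "sas", "ltd", "inc"}


def normalize(name: str) -> str:
    """Lowercase, drop dots and punctuation, and remove legal-form suffixes."""
    cleaned = name.replace(".", "")
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in cleaned)
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized firm names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_possible_reactivations(new_firms, ceased_firms, threshold=0.85):
    """Pair each new registration with ceased firms that look like the same unit.

    The threshold is an illustrative assumption, not a calibrated value.
    """
    flagged = []
    for new in new_firms:
        for old in ceased_firms:
            same_place = new["municipality"] == old["municipality"]
            score = name_similarity(new["name"], old["name"])
            if same_place and score >= threshold:
                flagged.append((new["id"], old["id"], round(score, 2)))
    return flagged


# Toy records, invented for this example only.
new_firms = [
    {"id": "N1", "name": "Rossi Costruzioni S.r.l.", "municipality": "Trento"},
    {"id": "N2", "name": "Bianchi Software", "municipality": "Trento"},
]
ceased_firms = [
    {"id": "C1", "name": "ROSSI COSTRUZIONI SRL", "municipality": "Trento"},
]

print(flag_possible_reactivations(new_firms, ceased_firms))
# → [('N1', 'C1', 1.0)]
```

In practice the flagged pairs would be candidates for further checks rather than final classifications, and a pairwise loop like this would need blocking or indexing to scale to a full business register.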

Suggested Citation

  • Giacomo Caterini, 2018. "Classifying Firms with Text Mining," DEM Working Papers 2018/09, Department of Economics and Management.
  • Handle: RePEc:trn:utwprg:2018/09

    Download full text from publisher

    File URL: https://www.economia.unitn.it/alfresco/download/workspace/SpacesStore/4c06effa-f574-4bf1-92a1-0ef408aecc44/DEM2018_09.pdf
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hou, Lei & Elsworth, Derek & Zhang, Fengshou & Wang, Zhiyuan & Zhang, Jianbo, 2023. "Evaluation of proppant injection based on a data-driven approach integrating numerical and ensemble learning models," Energy, Elsevier, vol. 264(C).
    2. Uwe JIRJAHN & Stephen C. SMITH, 2018. "Nonunion Employee Representation: Theory And The German Experience With Mandated Works Councils," Annals of Public and Cooperative Economics, Wiley Blackwell, vol. 89(1), pages 201-233, March.
    3. Ufuk Akcigit & Sina T. Ates, 2023. "What Happened to US Business Dynamism?," Journal of Political Economy, University of Chicago Press, vol. 131(8), pages 2059-2124.
    4. Lütkenhorst, Wilfried, 2018. "Creating wealth without labour? Emerging contours of a new techno-economic landscape," IDOS Discussion Papers 11/2018, German Institute of Development and Sustainability (IDOS).
    5. Carbonero, Francesco & Ernst, Ekkehard & Weber, Enzo, 2018. "Robots worldwide: the impact of automation on employment and trade," ILO Working Papers 995008793402676, International Labour Organization.
    6. Joshua Greenstein, 2020. "The Precariat Class Structure and Income Inequality among US Workers: 1980–2018," Review of Radical Political Economics, Union for Radical Political Economics, vol. 52(3), pages 447-469, September.
    7. Greg Howard & Carl Liebersohn, 2019. "What Explains U.S. House Prices? Regional Income Divergence," 2019 Meeting Papers 1054, Society for Economic Dynamics.
    8. Ma, Zhikai & Huo, Qian & Wang, Wei & Zhang, Tao, 2023. "Voltage-temperature aware thermal runaway alarming framework for electric vehicles via deep learning with attention mechanism in time-frequency domain," Energy, Elsevier, vol. 278(C).
    9. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    10. Cristiano CODAGNONE & Giovanni LIVA & Egidijus BARCEVICIUS & Gianluca MISURACA & Luka KLIMAVICIUTE & Michele BENEDETTI & Irene VANINI & Giancarlo VECCHI & Emily RYEN GLOINSON & Katherine STEWART & Sti, 2020. "Assessing the impacts of digital government transformation in the EU: Conceptual framework and empirical case studies," JRC Research Reports JRC120865, Joint Research Centre.
    11. Fabian Eckert & Andrés Gvirtz & Jack Liang & Michael Peters, 2020. "A Method to Construct Geographical Crosswalks with an Application to US Counties since 1790," NBER Working Papers 26770, National Bureau of Economic Research, Inc.
    12. Alain Cohn & Tobias Gesche & Michel André Maréchal, 2022. "Honesty in the Digital Age," Management Science, INFORMS, vol. 68(2), pages 827-845, February.
    13. Christian Dippel & Robert Gold & Stephan Heblich & Rodrigo Pinto, 2017. "Instrumental Variables and Causal Mechanisms: Unpacking the Effect of Trade on Workers and Voters," CESifo Working Paper Series 6816, CESifo.
    14. Dario Cords & Klaus Prettner, 2022. "Technological unemployment revisited: automation in a search and matching framework [The future of work: meeting the global challenges of demographic change and automation]," Oxford Economic Papers, Oxford University Press, vol. 74(1), pages 115-135.
    15. Mr. Francesco Grigoli & Zsoka Koczan & Petia Topalova, 2018. "Drivers of Labor Force Participation in Advanced Economies: Macro and Micro Evidence," IMF Working Papers 2018/150, International Monetary Fund.
    16. Jie Shi & Arno P. J. M. Siebes & Siamak Mehrkanoon, 2023. "TransCORALNet: A Two-Stream Transformer CORAL Networks for Supply Chain Credit Assessment Cold Start," Papers 2311.18749, arXiv.org.
    17. Abeliansky, Ana Lucia & Beulmann, Matthias, 2019. "Are they coming for us? Industrial robots and the mental health of workers," University of Göttingen Working Papers in Economics 379, University of Goettingen, Department of Economics.
    18. Maximiliano Dvorkin & Alexander Monge-Naranjo, 2019. "Occupation Mobility, Human Capital and the Aggregate Consequences of Task-Biased Innovations," Working Papers 2019-064, Human Capital and Economic Opportunity Working Group.
    19. Marc Gilbert Joseph Buchholzer, 2022. "Value-Added Automation, a Solution for the Future of Work in Automotive Manufacturing in Romania," REVISTA DE MANAGEMENT COMPARAT INTERNATIONAL/REVIEW OF INTERNATIONAL COMPARATIVE MANAGEMENT, Faculty of Management, Academy of Economic Studies, Bucharest, Romania, vol. 23(1), pages 101-111, March.
    20. Bourdouxhe, Axel & Wibail, Lionel & Claessens, Hugues & Dufrêne, Marc, 2023. "Modeling potential natural vegetation: A new light on an old concept to guide nature conservation in fragmented and degraded landscapes," Ecological Modelling, Elsevier, vol. 481(C).

    More about this item

    Keywords

    Business Demography; Classification; Text Mining;

    JEL classification:

    • C01 - Mathematical and Quantitative Methods - - General - - - Econometrics
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
    • G33 - Financial Economics - - Corporate Finance and Governance - - - Bankruptcy; Liquidation
    • L11 - Industrial Organization - - Market Structure, Firm Strategy, and Market Performance - - - Production, Pricing, and Market Structure; Size Distribution of Firms
    • L25 - Industrial Organization - - Firm Objectives, Organization, and Behavior - - - Firm Performance
    • L26 - Industrial Organization - - Firm Objectives, Organization, and Behavior - - - Entrepreneurship
    • M13 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - New Firms; Startups
    • R11 - Urban, Rural, Regional, Real Estate, and Transportation Economics - - General Regional Economics - - - Regional Economic Activity: Growth, Development, Environmental Issues, and Changes
