Printed from https://ideas.repec.org/p/zbw/zewdip/19001.html

Predicting innovative firms using web mining and deep learning

Author

Listed:
  • Kinne, Jan
  • Lenz, David

Abstract

Innovation is considered a main driver of economic growth. Promoting the development of innovation through STI (science, technology and innovation) policies requires accurate indicators of innovation. Traditional indicators often lack coverage, granularity, and timeliness, and they involve high data collection costs, especially when collected at a large scale. In this paper, we propose a novel approach to creating firm-level innovation indicators at the scale of millions of firms. We use traditional firm-level innovation indicators from the questionnaire-based Community Innovation Survey (CIS) to train an artificial neural network classification model on labelled (innovative/non-innovative) web texts of surveyed firms. Subsequently, we apply this classification model to the web texts of hundreds of thousands of firms in Germany to predict their innovation status. Our results show that this approach produces credible predictions and has the potential to be a valuable and highly cost-efficient addition to the existing set of innovation indicators, especially due to its coverage and regional granularity. The predicted firm-level probabilities can also be interpreted directly as a continuous measure of innovativeness, which offers additional advantages over traditional binary innovation indicators.
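
The workflow described in the abstract (train a classifier on labelled web texts, then score the texts of unlabelled firms) can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a single-layer network (logistic regression on bag-of-words counts) written in plain Python. The toy texts, function names, and hyperparameters are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: (web text, label), 1 = innovative, 0 = non-innovative.
# These texts are invented; they are not taken from the CIS or from the paper.
LABELLED = [
    ("we develop new patented sensor technology and research prototypes", 1),
    ("our research lab launches novel software products every year", 1),
    ("family bakery offering fresh bread and cakes daily", 0),
    ("local plumbing service with fair prices and fast repairs", 0),
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_vocab(texts):
    """Map every distinct token in the training texts to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(samples, vocab, epochs=200, lr=0.5):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text, vocab)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - label  # gradient of the log loss w.r.t. the pre-activation
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    """Predicted probability that a text belongs to an innovative firm."""
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    vocab = build_vocab([text for text, _ in LABELLED])
    weights, bias = train(LABELLED, vocab)
    # Hypothetical unlabelled firm texts, ranked by predicted probability.
    unlabelled = ["novel research technology", "fresh bread repairs"]
    scored = [(predict_proba(t, vocab, weights, bias), t) for t in unlabelled]
    for score, text in sorted(scored, reverse=True):
        print(f"{score:.3f}  {text}")
```

Sorting by the predicted probability, rather than thresholding it, mirrors the abstract's point that the score can serve as a continuous measure of innovativeness instead of a binary label.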

Suggested Citation

  • Kinne, Jan & Lenz, David, 2019. "Predicting innovative firms using web mining and deep learning," ZEW Discussion Papers 19-001, ZEW - Leibniz Centre for European Economic Research.
  • Handle: RePEc:zbw:zewdip:19001

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/191615/1/1047440679.pdf
    Download Restriction: no

    Citations



    Cited by:

    1. Carolina Castaldi & Sandro Mendonca, 2021. "Regions and trademarks. Research opportunities and policy insights from leveraging trademarks in regional innovation studies," Papers in Evolutionary Economic Geography (PEEG) 2138, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Dec 2021.
    2. Andres, Antonio Rodriguez & Otero, Abraham & Amavilah, Voxi Heinrich, 2021. "Using Deep Learning Neural Networks to Predict the Knowledge Economy Index for Developing and Emerging Economies," MPRA Paper 109137, University Library of Munich, Germany.
    3. Janna Axenbeck & Patrick Breithaupt, 2021. "Innovation indicators based on firm websites—Which website characteristics predict firm-level innovation activity?," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-23, April.
    4. Diane Coyle & David Nguyen, 2019. "No plant, no problem? Factoryless manufacturing and economic measurement," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2019-15, Economic Statistics Centre of Excellence (ESCoE).
    5. Axenbeck, Janna & Breithaupt, Patrick, 2019. "Web-based innovation indicators: Which firm website characteristics relate to firm-level innovation activity?," ZEW Discussion Papers 19-063, ZEW - Leibniz Centre for European Economic Research.
    6. Breithaupt, Patrick & Kesler, Reinhold & Niebel, Thomas & Rammer, Christian, 2020. "Intangible capital indicators based on web scraping of social media," ZEW Discussion Papers 20-046, ZEW - Leibniz Centre for European Economic Research.
    7. Julian Schwierzy & Robert Dehghan & Sebastian Schmidt & Elisa Rodepeter & Andreas Stoemmer & Kaan Uctum & Jan Kinne & David Lenz & Hanna Hottenrott, 2022. "Technology Mapping Using WebAI: The Case of 3D Printing," Papers 2201.01125, arXiv.org.
    8. Nathan, Max & Rosso, Anna, 2022. "Innovative events: product launches, innovation and firm performance," Research Policy, Elsevier, vol. 51(1).
    9. Falco J. Bargagli-Stoffi & Jan Niederreiter & Massimo Riccaboni, 2020. "Supervised learning for the prediction of firm dynamics," Papers 2009.06413, arXiv.org.
    10. Böhmecke-Schwafert, Moritz & García Moreno, Eduardo, 2023. "Exploring blockchain-based innovations for economic and sustainable development in the global south: A mixed-method approach based on web mining and topic modeling," Technological Forecasting and Social Change, Elsevier, vol. 191(C).
    11. Abbasiharofteh, Milad & Kinne, Jan & Krüger, Miriam, 2021. "The strength of weak and strong ties in bridging geographic and cognitive distances," ZEW Discussion Papers 21-049, ZEW - Leibniz Centre for European Economic Research.
    12. Rammer, Christian & Es-Sadki, Nordine, 2023. "Using big data for generating firm-level innovation indicators - a literature review," Technological Forecasting and Social Change, Elsevier, vol. 197(C).

    More about this item

    Keywords

    Web Mining; Web Scraping; R&D; R&I; STI; Innovation; Indicators; Text Mining; Natural Language Processing; NLP; Deep Learning;

    JEL classification:

    • O30 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods
