Printed from https://ideas.repec.org/a/plo/pone00/0249071.html

Predicting innovative firms using web mining and deep learning

Author

Listed:
  • Jan Kinne
  • David Lenz

Abstract

Evidence-based STI (science, technology, and innovation) policy making requires accurate indicators of innovation in order to promote economic growth. However, traditional indicators from patents and questionnaire-based surveys often lack coverage, granularity, and timeliness, and may involve high data collection costs, especially when conducted at a large scale. Consequently, they struggle to provide policy makers and scientists with the full picture of the current state of the innovation system. In this paper, we propose a first approach to generating web-based innovation indicators that may have the potential to overcome some of the shortcomings of traditional indicators. Specifically, we develop a method to identify product innovator firms at large scale and very low cost. We use traditional firm-level indicators from a questionnaire-based innovation survey (German Community Innovation Survey) to train an artificial neural network classification model on labelled (product innovator/no product innovator) web texts of surveyed firms. Subsequently, we apply this classification model to the web texts of hundreds of thousands of firms in Germany to predict whether they are product innovators or not. We then compare these predictions to firm-level patent statistics, survey extrapolation benchmark data, and regional innovation indicators. The results show that our approach produces reliable predictions and has the potential to be a valuable and highly cost-efficient addition to the existing set of innovation indicators, especially due to its coverage and regional granularity.
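
The train-then-predict workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a single-layer logistic classifier over term frequencies as a stand-in for the artificial neural network, and the four labelled "web texts" are invented for illustration. All names (`labelled`, `vectorize`, `train`, `predict_proba`) are hypothetical.

```python
import math
from collections import Counter

# Invented toy data (assumption): 1 = product innovator, 0 = no product innovator.
labelled = [
    ("we launched a new product line with patented sensor technology", 1),
    ("our research team developed a novel software platform", 1),
    ("family bakery offering fresh bread and cakes daily", 0),
    ("local plumbing service with fair prices and fast repairs", 0),
]

def tokenize(text):
    return text.lower().split()

def build_vocab(texts):
    # Map every token seen in the training texts to a feature index.
    tokens = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    # Term-frequency vector; tokens outside the vocabulary are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    total = sum(counts.values()) or 1
    vec = [0.0] * len(vocab)
    for tok, c in counts.items():
        vec[vocab[tok]] = c / total
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, epochs=2000, lr=0.5):
    # Stochastic gradient descent on the logistic loss
    # (a stand-in for the paper's neural network, not a reproduction of it).
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    # Estimated probability that the text belongs to a product innovator.
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Step 1: train on the labelled (surveyed) firms.
vocab = build_vocab([t for t, _ in labelled])
w, b = train([vectorize(t, vocab) for t, _ in labelled], [lbl for _, lbl in labelled])

# Step 2: apply the model to web texts of firms without a survey label.
for text in ["new sensor technology developed by our research team",
             "fresh bread and fast repairs"]:
    print(round(predict_proba(text, vocab, w, b), 3), text)
```

In this sketch the survey supplies only the labels, while the features come entirely from web text, which is what would allow the trained model to be applied to firms outside the survey sample.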

Suggested Citation

  • Jan Kinne & David Lenz, 2021. "Predicting innovative firms using web mining and deep learning," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-18, April.
  • Handle: RePEc:plo:pone00:0249071
    DOI: 10.1371/journal.pone.0249071

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0249071
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0249071&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Lele Cao & Vilhelm von Ehrenheim & Sebastian Krakowski & Xiaoxue Li & Alexandra Lutz, 2022. "Using Deep Learning to Find the Next Unicorn: A Practical Synthesis," Papers 2210.14195, arXiv.org.
    2. Carolina Castaldi & Sandro Mendonca, 2021. "Regions and trademarks. Research opportunities and policy insights from leveraging trademarks in regional innovation studies," Papers in Evolutionary Economic Geography (PEEG) 2138, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Dec 2021.
    3. van Meeteren, Michiel & Trincado-Munoz, Francisco & Rubin, Tzameret H. & Vorley, Tim, 2022. "Rethinking the digital transformation in knowledge-intensive services: A technology space analysis," Technological Forecasting and Social Change, Elsevier, vol. 179(C).
    4. Julian Schwierzy & Robert Dehghan & Sebastian Schmidt & Elisa Rodepeter & Andreas Stoemmer & Kaan Uctum & Jan Kinne & David Lenz & Hanna Hottenrott, 2022. "Technology Mapping Using WebAI: The Case of 3D Printing," Papers 2201.01125, arXiv.org.
    5. Dörr, Julian Oliver & Kinne, Jan & Lenz, David & Licht, Georg & Winker, Peter, 2021. "An integrated data framework for policy guidance in times of dynamic economic shocks," ZEW Discussion Papers 21-062, ZEW - Leibniz Centre for European Economic Research.
    6. Axenbeck, Janna & Breithaupt, Patrick, 2022. "Measuring the digitalisation of firms: A novel text mining approach," ZEW Discussion Papers 22-065, ZEW - Leibniz Centre for European Economic Research.
    7. Rammer, Christian & Es-Sadki, Nordine, 2023. "Using big data for generating firm-level innovation indicators - a literature review," Technological Forecasting and Social Change, Elsevier, vol. 197(C).
    8. Lele Cao & Gustaf Halvardsson & Andrew McCornack & Vilhelm von Ehrenheim & Pawel Herman, 2023. "Sourcing Investment Targets for Venture and Growth Capital Using Multivariate Time Series Transformer," Papers 2309.16888, arXiv.org.
    9. Schmidt, Sebastian & Kinne, Jan & Lautenbach, Sven & Blaschke, Thomas & Lenz, David & Resch, Bernd, 2022. "Greenwashing in the US metal industry? A novel approach combining SO2 concentrations from satellite data, a plant-level firm database and web text mining," ZEW Discussion Papers 22-006, ZEW - Leibniz Centre for European Economic Research.
    10. Breithaupt, Patrick & Hottenrott, Hanna & Rammer, Christian & Römer, Konstantin, 2023. "Mapping employee mobility and employer networks using professional network data," ZEW Discussion Papers 23-041, ZEW - Leibniz Centre for European Economic Research.
    11. Dania Eugenidis & Jan Kinne & David Lenz, 2022. "Analysing Gender Equality at the Firm Level," MAGKS Papers on Economics 202214, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jahn, Vera & Berlemann, Michael, 2014. "Governance, Firm Size and Innovative Capacity: Regional Empirical Evidence for Germany," VfS Annual Conference 2014 (Hamburg): Evidence-based Economic Policy 100412, Verein für Socialpolitik / German Economic Association.
    2. Behrens, Vanessa & Berger, Marius & Hud, Martin & Hünermund, Paul & Iferd, Younes & Peters, Bettina & Rammer, Christian & Schubert, Torben, 2017. "Innovation activities of firms in Germany - Results of the German CIS 2012 and 2014: Background report on the surveys of the Mannheim Innovation Panel Conducted in the Years 2013 to 2016," ZEW Dokumentationen 17-04, ZEW - Leibniz Centre for European Economic Research.
    3. Muhammad Athar Nadeem & Zhiying Liu & Haji Suleman Ali & Amna Younis & Muhammad Bilal & Yi Xu, 2020. "Innovation and Sustainable Development: Does Aid and Political Instability Impede Innovation?," SAGE Open, vol. 10(4), pages 21582440209, November.
    4. Mohnen, Pierre, 2019. "R&D, innovation and productivity," MERIT Working Papers 2019-016, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    5. Christian Rammer & Gastón P Fernández & Dirk Czarnitzki, 2021. "Artificial Intelligence and Industrial Innovation: Evidence from Firm-Level Data," Working Papers of Department of Economics, Leuven 674605, KU Leuven, Faculty of Economics and Business (FEB), Department of Economics, Leuven.
    6. Carlino, Gerald & Kerr, William R., 2015. "Agglomeration and Innovation," Handbook of Regional and Urban Economics, in: Gilles Duranton & J. V. Henderson & William C. Strange (ed.), Handbook of Regional and Urban Economics, edition 1, volume 5, chapter 0, pages 349-404, Elsevier.
    7. Jörn Block & Christian Fisch & Kenta Ikeuchi & Masatoshi Kato, 2022. "Trademarks as an indicator of regional innovation: evidence from Japanese prefectures," Regional Studies, Taylor & Francis Journals, vol. 56(2), pages 190-209, February.
    8. Adelheid Holl & Bettina Peters & Christian Rammer, 2023. "Local knowledge spillovers and innovation persistence of firms," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 32(6), pages 826-850, August.
    9. Martin Kalthaus, 2020. "Knowledge recombination along the technology life cycle," Journal of Evolutionary Economics, Springer, vol. 30(3), pages 643-704, July.
    10. Riccardo Crescenzi & Alexander Jaax, 2017. "Innovation in Russia: The Territorial Dimension," Economic Geography, Taylor & Francis Journals, vol. 93(1), pages 66-88, January.
    11. Fritsch, Michael & Wyrwich, Michael, 2021. "Is innovation (increasingly) concentrated in large cities? An international comparison," Research Policy, Elsevier, vol. 50(6).
    12. Marina Flamand, 2016. "Studying strategic choices of carmakers in the development of energy storage solutions: a patent analysis," International Journal of Automotive Technology and Management, Inderscience Enterprises Ltd, vol. 16(2), pages 169-192.
    13. Kang, Byeongwoo, 2014. "The innovation process of a privately-owned enterprise and a state-owned enterprise in China," IDE Discussion Papers 470, Institute of Developing Economies, Japan External Trade Organization(JETRO).
    14. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
    15. Lei Jin & Keran Duan & Xu Tang, 2018. "What Is the Relationship between Technological Innovation and Energy Consumption? Empirical Analysis Based on Provincial Panel Data from China," Sustainability, MDPI, vol. 10(1), pages 1-13, January.
    16. Pfister, Curdin & Koomen, Miriam & Harhoff, Dietmar & Backes-Gellner, Uschi, 2021. "Regional innovation effects of applied research institutions," Research Policy, Elsevier, vol. 50(4).
    17. repec:bof:bofrdp:urn:nbn:fi:bof-201512111472 is not listed on IDEAS
    18. Sam Tavassoli & Nunzia Carbonara, 2014. "The role of knowledge variety and intensity for regional innovation," Small Business Economics, Springer, vol. 43(2), pages 493-509, August.
    19. Stucki, Tobias & Woerter, Martin, 2019. "The private returns to knowledge: A comparison of ICT, biotechnologies, nanotechnologies, and green technologies," Technological Forecasting and Social Change, Elsevier, vol. 145(C), pages 62-81.
    20. Alessandra Colombelli & Francesco Quatraro, 2014. "The persistence of firms' knowledge base: a quantile approach to Italian data," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 23(7), pages 585-610, October.
    21. repec:zbw:bofrdp:urn:nbn:fi:bof-201512111472 is not listed on IDEAS
    22. Rammer, Christian & Es-Sadki, Nordine, 2023. "Using big data for generating firm-level innovation indicators - a literature review," Technological Forecasting and Social Change, Elsevier, vol. 197(C).
