
Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study

Author

Listed:
  • Jan Kinne

    (ZEW – Leibniz Centre for European Economic Research
    University of Salzburg
    Harvard University
    istari.ai)

  • Janna Axenbeck

    (ZEW - Leibniz Centre for European Economic Research
    Justus-Liebig-University)

Abstract

Existing approaches to modeling innovation ecosystems have mostly been restricted to qualitative, small-scale studies or, when relying on traditional innovation indicators such as patents and questionnaire-based surveys, have suffered from a lack of timeliness, granularity, and coverage. Firm websites are a particularly interesting data source for innovation research because firms use them to publish information about potentially innovative products, services, and cooperation with other firms. Analyzing the textual and relational content of these websites and extracting innovation-related information from it could give researchers and policy-makers a cost-effective way to survey millions of businesses and gain insights into their innovation activity, cooperation, and applied technologies. For this purpose, we propose a web mining framework for consistent and reproducible mapping of innovation ecosystems. In a large-scale pilot study, we use a database of 2.4 million German firms to test our framework and explore firm websites as a data source. We put particular emphasis on investigating the bias that may arise when surveying innovation systems through firm websites if only certain firm types can be surveyed with the proposed approach. We find that the availability of a website and the characteristics of the website (number of subpages and hyperlinks, text volume, language used) differ according to firm size, age, location, and sector. We also find that patenting firms will be overrepresented in web mining studies. Web mining as a survey method also has to cope with extremely large and hyper-connected outlier websites, and with the fact that low broadband availability appears to prevent some firms from operating their own website, which excludes them from web mining analysis. We then apply the proposed framework to map an exemplary innovation ecosystem of Berlin-based firms engaged in artificial intelligence. Finally, we outline several approaches for turning firm website content into valuable innovation indicators.
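
The website characteristics named in the abstract (hyperlinks, text volume) can be illustrated with a minimal sketch. This is not the authors' framework or code: the class `PageStats`, the function `page_characteristics`, and the example URLs are hypothetical, and the sketch only uses the Python standard library to count internal and external hyperlinks and visible text characters for a single HTML page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageStats(HTMLParser):
    """Collects hyperlinks and visible text from one HTML page (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []        # absolute URLs found in <a href="...">
        self.text_parts = []   # visible text fragments
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only fragments.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def page_characteristics(html, base_url):
    """Return hyperlink counts and text volume (in characters) for one page."""
    parser = PageStats(base_url)
    parser.feed(html)
    parser.close()
    own_host = urlparse(base_url).netloc
    hosts = [urlparse(u).netloc for u in parser.links]
    text = " ".join(parser.text_parts)
    return {
        "internal_links": sum(1 for h in hosts if h == own_host),
        # Links without a host (e.g. mailto:) are counted as neither.
        "external_links": sum(1 for h in hosts if h not in ("", own_host)),
        "text_chars": len(text),
    }


# Hypothetical example page; the domains are placeholders.
sample_html = (
    "<html><head><style>p{}</style></head><body>"
    "<p>Hello AI</p>"
    '<a href="/about">About</a>'
    '<a href="https://partner.example/x">Partner</a>'
    "<script>var x=1;</script>"
    "</body></html>"
)
stats = page_characteristics(sample_html, "https://firm.example/")
print(stats)
```

In a real study, a crawler would apply such per-page measurements across all subpages of a firm's site; how the authors aggregate them is not stated in the abstract.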

Suggested Citation

  • Jan Kinne & Janna Axenbeck, 2020. "Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2011-2041, December.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03726-9
    DOI: 10.1007/s11192-020-03726-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03726-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    More about this item

    JEL classification:

    • O30 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
