Printed from https://ideas.repec.org/p/mar/magkse/201815.html

Measuring the Diffusion of Innovations with Paragraph Vector Topic Models

Authors
  • David Lenz

    (Justus-Liebig-University Giessen)

  • Peter Winker

    (Justus-Liebig-University Giessen)

Abstract

Measuring the diffusion of innovations from textual data sources other than patent data has not been studied extensively. However, early and accurate indicators of innovation and the recognition of trends in innovation are essential for successfully promoting economic growth through technological progress via evidence-based policy making. In this study, we propose the Paragraph Vector Topic Model (PVTM) and apply it to technology-related news articles to analyze innovation-related topics over time and gain insights into their diffusion process. PVTM represents documents in a semantic space, which has been shown to capture latent variables of the underlying documents, such as their latent topics. Clusters of documents in the semantic space can then be interpreted and transformed into meaningful topics by means of Gaussian mixture modeling. Using PVTM, we identify innovation-related topics from 170 thousand technology news articles published over a span of 20 years and gather insights about their diffusion state by measuring each topic's importance in the corpus over time. We find that PVTM diffusion indicators for certain topics Granger-cause Google Trends indices with matching search terms. Further, our results suggest that PVTM is well suited to discovering latent topics in (technology-related) news articles and that the diffusion of innovations could be assessed using topic importance measures derived from PVTM.
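The last two steps described in the abstract, tracking topic importance over time and testing whether it Granger-causes a search-interest series, can be illustrated with a minimal NumPy sketch. It assumes the document vectors have already been clustered (for example with scikit-learn's `GaussianMixture.predict_proba`), so that every document carries a vector of topic probabilities. The abstract does not define the importance measure or the lag order. The mean-posterior-probability definition, the function names `topic_importance` and `granger_f_stat`, and the synthetic series below are therefore illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def topic_importance(resp, periods):
    """Average topic probability per period (an assumed importance measure).

    resp    : (n_docs, n_topics) posterior topic probabilities, e.g. from a
              Gaussian mixture fitted to document vectors (assumed given).
    periods : (n_docs,) period label for each document (e.g. year).
    Returns the sorted unique periods and a (n_periods, n_topics) array;
    its rows sum to 1 provided each row of resp sums to 1.
    """
    resp = np.asarray(resp, dtype=float)
    periods = np.asarray(periods)
    labels = np.unique(periods)
    importance = np.vstack([resp[periods == t].mean(axis=0) for t in labels])
    return labels, importance


def _rss(y, X):
    # Residual sum of squares of an OLS fit of y on X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f_stat(x, y, p):
    """F statistic for H0: p lags of x add no predictive power for y.

    Restricted:   y_t = c + sum_i a_i y_{t-i} + e_t
    Unrestricted: y_t = c + sum_i a_i y_{t-i} + sum_i b_i x_{t-i} + e_t
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - p  # usable observations after dropping the first p
    target = y[p:]
    ylags = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    xlags = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    const = np.ones((n, 1))
    rss_r = _rss(target, np.hstack([const, ylags]))
    rss_u = _rss(target, np.hstack([const, ylags, xlags]))
    df_num, df_den = p, n - 2 * p - 1
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den), df_num, df_den


# Toy importance example: two documents in 2000, one in 2001.
labels, imp = topic_importance([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]],
                               [2000, 2000, 2001])
# imp is approximately [[0.8, 0.2], [0.2, 0.8]]

# Hypothetical synthetic series: y depends on lagged x, x is i.i.d. noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
F_xy, d1, d2 = granger_f_stat(x, y, p=2)  # large: lagged x predicts y
F_yx, _, _ = granger_f_stat(y, x, p=2)    # small: past y carries no info on x
```

Under the null hypothesis the statistic follows an F(df_num, df_den) distribution, so a p-value can be obtained with `scipy.stats.f.sf(F, df_num, df_den)`. The statsmodels function `grangercausalitytests` runs the same sum-of-squared-residuals F-test across several lag orders.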

Suggested Citation

  • David Lenz & Peter Winker, 2018. "Measuring the Diffusion of Innovations with Paragraph Vector Topic Models," MAGKS Papers on Economics 201815, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
  • Handle: RePEc:mar:magkse:201815

    Download full text from publisher

    File URL: http://www.uni-marburg.de/fb02/makro/forschung/magkspapers/paper_2018/15-2018_lenz.pdf
    File Function: First 201815
    Download Restriction: no

    References listed on IDEAS

    1. repec:bny:wpaper:0034 is not listed on IDEAS
    2. Lüdering Jochen & Winker Peter, 2016. "Forward or Backward Looking? The Economic Discourse and the Observed Reality," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(4), pages 483-515, August.
    3. Won Sang Lee & Hyo Shin Choi & So Young Sohn, 2018. "Forecasting new product diffusion using both patent citation and web search traffic," PLOS ONE, Public Library of Science, vol. 13(4), pages 1-12, April.
    4. Bryan Kelly & Dimitris Papanikolaou & Amit Seru & Matt Taddy, 2021. "Measuring Technological Innovation over the Long Run," American Economic Review: Insights, American Economic Association, vol. 3(3), pages 303-320, September.
    5. Antonin Bergeaud & Yoann Potiron & Juste Raimbault, 2017. "Classifying patents based on their semantic content," PLOS ONE, Public Library of Science, vol. 12(4), pages 1-22, April.
    6. Kilian, Lutz & Lütkepohl, Helmut, 2018. "Structural Vector Autoregressive Analysis," Cambridge Books, Cambridge University Press, number 9781107196575, January.
    7. Stephen Hansen & Michael McMahon & Andrea Prat, 2018. "Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(2), pages 801-870.
    8. Leah G. Nichols, 2014. "A topic model approach to measuring interdisciplinarity at the National Science Foundation," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(3), pages 741-754, September.
    9. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
    10. Hansen, Stephen & McMahon, Michael, 2016. "Shocking language: Understanding the macroeconomic effects of central bank communication," Journal of International Economics, Elsevier, vol. 99(S1), pages 114-133.
    11. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Journal of Historical Economics and Econometric History, Association Française de Cliométrie (AFC), vol. 13(1), pages 83-125, January.
    12. Hyunyoung Choi & Hal Varian, 2012. "Predicting the Present with Google Trends," The Economic Record, The Economic Society of Australia, vol. 88(s1), pages 2-9, June.
    13. Stathoulopoulos, Kostas & Mateos-Garcia, Juan, 2017. "Mapping without a map: Exploring the UK business landscape using unsupervised learning," SocArXiv ryxdk, Center for Open Science.
    14. Choi, Jinho & Hwang, Yong-Sik, 2014. "Patent keyword network analysis for improving technology development efficiency," Technological Forecasting and Social Change, Elsevier, vol. 83(C), pages 170-182.
    15. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    16. David Chavalarias & Jean-Philippe Cointet, 2013. "Phylomemetic Patterns in Science Evolution—The Rise and Fall of Scientific Fields," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-11, February.
    17. Ryohei Hisano & Didier Sornette & Takayuki Mizuno & Takaaki Ohnishi & Tsutomu Watanabe, 2013. "High Quality Topic Extraction from Business News Explains Abnormal Financial Market Volatility," PLOS ONE, Public Library of Science, vol. 8(6), pages 1-12, June.
    18. Lino Wehrheim, 2017. "Economic History Goes Digital: Topic Modeling the Journal of Economic History," Working Papers 177, Bavarian Graduate Program in Economics (BGPE).
    19. Larsen, Vegard H. & Thorsrud, Leif A., 2019. "The value of news for economic developments," Journal of Econometrics, Elsevier, vol. 210(1), pages 203-218.
    20. Ryohei Hisano & Didier Sornette & Takayuki Mizuno & Takaaki Ohnishi & Tsutomu Watanabe, 2012. "High quality topic extraction from business news explains abnormal financial market volatility," Papers 1210.6321, arXiv.org, revised Mar 2013.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Savin, Ivan & Ott, Ingrid & Konop, Chris, 2022. "Tracing the evolution of service robotics: Insights from a topic modeling approach," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    2. Nathan, Max & Rosso, Anna, 2022. "Innovative events: product launches, innovation and firm performance," Research Policy, Elsevier, vol. 51(1).
    3. repec:osf:socarx:t3jrq_v1 is not listed on IDEAS
    4. Jeon, Eunji & Yoon, Naeun & Sohn, So Young, 2023. "Exploring new digital therapeutics technologies for psychiatric disorders using BERTopic and PatentSBERTa," Technological Forecasting and Social Change, Elsevier, vol. 186(PA).
    5. Ballester, Omar & Penner, Orion, 2022. "Robustness, replicability and scalability in topic modelling," Journal of Informetrics, Elsevier, vol. 16(1).
    6. Axenbeck, Janna & Breithaupt, Patrick, 2022. "Measuring the digitalisation of firms: A novel text mining approach," ZEW Discussion Papers 22-065, ZEW - Leibniz Centre for European Economic Research.
    7. Albina Latifi & David Lenz & Peter Winker, 2024. "Identification of Innovation Drivers Based on Technology-Related News Articles," MAGKS Papers on Economics 202401, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    8. Max Nathan & Anna Rosso, 2017. "Innovative events," Development Working Papers 429, Centro Studi Luca d'Agliano, University of Milano, revised 08 Apr 2019.
    9. Winker, Peter, 2023. "Visualizing Topic Uncertainty in Topic Modelling," VfS Annual Conference 2023 (Regensburg): Growth and the "sociale Frage" 277584, Verein für Socialpolitik / German Economic Association.
    10. Janna Axenbeck & Patrick Breithaupt, 2021. "Innovation indicators based on firm websites—Which website characteristics predict firm-level innovation activity?," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-23, April.
    11. Maximilian Maurice Gail & Phil-Adrian Klotz, 2025. "E-book Pricing Under the Agency Model: Lessons from the UK," Journal of Industry, Competition and Trade, Springer, vol. 25(1), pages 1-39, December.
    12. Hongshu Chen & Xinna Song & Qianqian Jin & Ximeng Wang, 2022. "Network dynamics in university-industry collaboration: a collaboration-knowledge dual-layer network perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6637-6660, November.
    13. Axenbeck, Janna & Breithaupt, Patrick, 2019. "Web-based innovation indicators: Which firm website characteristics relate to firm-level innovation activity?," ZEW Discussion Papers 19-063, ZEW - Leibniz Centre for European Economic Research.
    14. Viktoriia Naboka-Krell, 2023. "Construction and Analysis of Uncertainty Indices based on Multilingual Text Representations," MAGKS Papers on Economics 202310, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    15. Qiang Gao & Man Jiang, 2024. "Exploring technology fusion by combining latent Dirichlet allocation with Doc2vec: a case of digital medicine and machine learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4043-4070, July.
    16. Dhar, Suparna & Tarafdar, Pratik & Bose, Indranil, 2022. "Understanding the evolution of an emerging technological paradigm and its impact: The case of Digital Twin," Technological Forecasting and Social Change, Elsevier, vol. 185(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Levy, Daniel & Mayer, Tamir & Raviv, Alon, 2022. "Economists in the 2008 financial crisis: Slow to see, fast to act," Journal of Financial Stability, Elsevier, vol. 60(C).
    2. Jon Ellingsen & Vegard H. Larsen & Leif Anders Thorsrud, 2020. "News Media vs. FRED-MD for Macroeconomic Forecasting," CESifo Working Paper Series 8639, CESifo.
    3. Jon Ellingsen & Vegard H. Larsen & Leif Anders Thorsrud, 2022. "News media versus FRED‐MD for macroeconomic forecasting," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(1), pages 63-81, January.
    4. Guglielmo Maria Caporale & Luis Alberiko Gil-Alana & Carlos Poza & José L. Ruiz-Alba, 2025. "The COVID-19 Shock and Spanish Hotel Activity," CESifo Working Paper Series 11985, CESifo.
    5. Juste Raimbault, 2019. "Exploration of an interdisciplinary scientific landscape," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 617-641, May.
    6. Peter Grajzl & Peter Murrell, 2021. "Characterizing a legal–intellectual culture: Bacon, Coke, and seventeenth-century England," Cliometrica, Journal of Historical Economics and Econometric History, Association Française de Cliométrie (AFC), vol. 15(1), pages 43-88, January.
    7. Moreno-Pérez, Carlos & Minozzo, Marco, 2024. "‘Making text talk’: The minutes of the Central Bank of Brazil and the real economy," Journal of International Money and Finance, Elsevier, vol. 147(C).
    8. Poza, Carlos & Monge, Manuel, 2020. "A real time leading economic indicator based on text mining for the Spanish economy. Fractional cointegration VAR and Continuous Wavelet Transform analysis," International Economics, Elsevier, vol. 163(C), pages 163-175.
    9. repec:bny:wpaper:0046 is not listed on IDEAS
    10. Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
    11. Vegard H. Larsen & Leif Anders Thorsrud, 2026. "Using Transformers and Reinforcement Learning as Narrative Filters in Macroeconomics," CESifo Working Paper Series 12454, CESifo.
    12. Celso Brunetti & Marc Joëts & Valérie Mignon, 2023. "Reasons Behind Words: OPEC Narratives and the Oil Market," Working Papers hal-04196053, HAL.
    13. Leonardo N. Ferreira, 2021. "Forecasting with VAR-teXt and DFM-teXt Models: exploring the predictive power of central bank communication," Working Papers Series 559, Central Bank of Brazil, Research Department.
    14. Gustavo Romero Cardoso & Marcio Issao Nakane, 2024. "What's in a headline? News impact on the Brazilian economy," Working Papers, Department of Economics 2024_12, University of São Paulo (FEA-USP).
    15. Laura Battaglia & Timothy M. Christensen & Stephen Hansen & Szymon Sacher, 2024. "Inference for regression with variables generated from unstructured data," CeMMAP working papers 10/24, Institute for Fiscal Studies.
    16. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
    17. Lenard Lieb & Adam Jassem & Rui Jorge Almeida & Nalan Baştürk & Stephan Smeekes, 2025. "Min(d)ing the President: A Text Analytic Approach to Measuring Tax News," American Economic Journal: Macroeconomics, American Economic Association, vol. 17(2), pages 285-314, April.
    18. Szymon Sacher & Laura Battaglia & Stephen Hansen, 2021. "Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data," Papers 2107.08112, arXiv.org, revised Feb 2024.
    19. repec:bny:wpaper:0075 is not listed on IDEAS
    20. Candelon, Bertrand & Joëts, Marc & Mignon, Valérie, 2024. "What makes econometric ideas popular: The role of connectivity," Research Policy, Elsevier, vol. 53(7).
    21. Laura Battaglia & Timothy Christensen & Stephen Hansen & Szymon Sacher, 2024. "Inference for Regression with Variables Generated by AI or Machine Learning," Papers 2402.15585, arXiv.org, revised Apr 2025.

    More about this item


    JEL classification:

    • O30 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods


    Corrections

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Bernd Hayo. General contact details of provider: https://edirc.repec.org/data/vamarde.html .

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.