Printed from https://ideas.repec.org/a/bla/jemstr/v27y2018i3p535-553.html

Machine learning and natural language processing on the patent corpus: Data, tools, and new measures

Author

Listed:
  • Benjamin Balsmeier
  • Mohamad Assaf
  • Tyler Chesebro
  • Gabe Fierro
  • Kevin Johnson
  • Scott Johnson
  • Guan‐Cheng Li
  • Sonja Lück
  • Doug O'Reagan
  • Bill Yeh
  • Guangzheng Zang
  • Lee Fleming

Abstract

Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
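
The novelty measure mentioned in the abstract, based on the first appearance of a word in the patent corpus, can be illustrated with a minimal sketch. The abstract does not give the tokenization, the ordering rule, or the data layout, so everything below is an assumption for illustration: the function names `tokenize` and `new_word_counts`, the `(patent_id, grant_date, text)` tuple layout, lowercase alphabetic tokens, and ordering by grant date with the patent id as a tie-breaker. It is not the authors' implementation.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in a text (assumed tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def new_word_counts(patents):
    """Count, for each patent, the words that have not appeared in any earlier patent.

    `patents` is an iterable of (patent_id, grant_date, text) tuples.
    Grant dates are assumed to be ISO strings (YYYY-MM-DD), which sort
    chronologically; ties are broken by patent id (an assumption).
    """
    seen = set()   # every word observed so far in the corpus
    counts = {}    # patent_id -> number of first-appearance words
    for patent_id, _, text in sorted(patents, key=lambda p: (p[1], p[0])):
        words = tokenize(text)
        counts[patent_id] = len(words - seen)
        seen |= words
    return counts


# Toy corpus (made-up texts, not real patents).
toy_patents = [
    ("US1", "1976-01-06", "A rotary engine with a piston"),
    ("US2", "1976-02-10", "A piston engine with a turbocharger"),
    ("US3", "1977-03-01", "A rotary engine"),
]
toy_counts = new_word_counts(toy_patents)
```

In this toy corpus the earliest patent is credited with every word it contains, which is why any real application would need a burn-in period or baseline vocabulary before counts become meaningful; how the authors handle this is not stated in the abstract.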

Suggested Citation

  • Benjamin Balsmeier & Mohamad Assaf & Tyler Chesebro & Gabe Fierro & Kevin Johnson & Scott Johnson & Guan‐Cheng Li & Sonja Lück & Doug O'Reagan & Bill Yeh & Guangzheng Zang & Lee Fleming, 2018. "Machine learning and natural language processing on the patent corpus: Data, tools, and new measures," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 535-553, September.
  • Handle: RePEc:bla:jemstr:v:27:y:2018:i:3:p:535-553
    DOI: 10.1111/jems.12259

    Download full text from publisher

    File URL: https://doi.org/10.1111/jems.12259
    Download Restriction: no


    References listed on IDEAS

    1. Balsmeier, Benjamin & Fleming, Lee & Manso, Gustavo, 2017. "Independent boards and innovation," Journal of Financial Economics, Elsevier, vol. 123(3), pages 536-557.
    2. Manuel Trajtenberg & Gil Shiff & Ran Melamed, 2009. "The "Names Game": Harnessing Inventors' Patent Data for Economic Research," Annals of Economics and Statistics, GENES, issue 93-94, pages 67-77.
    3. Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.
    4. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    5. Trajtenberg, Manuel & Shiff, Gil & Melamed, Ran, 2006. "The "Names Game": Harnessing Inventors' Patent Data for Economic Research," Foerder Institute for Economic Research Working Papers 275702, Tel-Aviv University > Foerder Institute for Economic Research.
    6. Bronwyn H. Hall & Adam B. Jaffe & Manuel Trajtenberg, 2001. "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools," NBER Working Papers 8498, National Bureau of Economic Research, Inc.
    7. Bernstein, Shai, 2014. "Does Going Public Affect Innovation?," Research Papers 3011, Stanford University, Graduate School of Business.
    8. Jasjit Singh, 2005. "Collaborative Networks as Determinants of Knowledge Diffusion Patterns," Management Science, INFORMS, vol. 51(5), pages 756-770, May.
    9. Bronwyn H. Hall & Dietmar Harhoff, 2012. "Recent Research on the Economics of Patents," Annual Review of Economics, Annual Reviews, vol. 4(1), pages 541-565, July.

    Citations

    Cited by:

    1. David Autor & David Dorn & Gordon H. Hanson & Gary Pisano & Pian Shu, 2020. "Foreign Competition and Domestic Innovation: Evidence from US Patents," American Economic Review: Insights, American Economic Association, vol. 2(3), pages 357-374, September.
    2. Bryan Kelly & Dimitris Papanikolaou & Amit Seru & Matt Taddy, 2021. "Measuring Technological Innovation over the Long Run," American Economic Review: Insights, American Economic Association, vol. 3(3), pages 303-320, September.
    3. Michela Giorcelli & Nicola Lacetera & Astrid Marinoni, 2022. "How does scientific progress affect cultural changes? A digital text analysis," Journal of Economic Growth, Springer, vol. 27(3), pages 415-452, September.
    4. Stefan Pauly & Fernando Stipanicic, 2022. "The Creation and Diffusion of Knowledge: Evidence from the Jet Age," Working Papers hal-04067326, HAL.
    5. Matthew Clancy & Paul Heisey & Yongjie Ji & GianCarlo Moschini, 2020. "The Roots of Agricultural Innovation: Patent Evidence of Knowledge Spillovers," NBER Chapters, in: Economics of Research and Innovation in Agriculture, pages 21-75, National Bureau of Economic Research, Inc.
    6. Jan-Bart Vervenne & Julie Callaert & Machteld Hoskens & Bart Looy, 2022. "To what extent do SMEs contribute to Europe’s patent stock? A methodological outline for creating a Europe-wide SME technology indicator," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3049-3082, June.
    7. Arts, Sam & Hou, Jianan & Gomez, Juan Carlos, 2021. "Natural language processing to identify the creation and impact of new technologies in patent text: Code, data, and new measures," Research Policy, Elsevier, vol. 50(2).
    8. Jorge Mejia & Shawn Mankad & Anandasivam Gopal, 2019. "A for Effort? Using the Crowd to Identify Moral Hazard in New York City Restaurant Hygiene Inspections," Information Systems Research, INFORMS, vol. 30(4), pages 1363-1386, December.
    9. Chen, Wei & Yan, Yan, 2023. "New components and combinations: The perspective of the internal collaboration networks of scientific teams," Journal of Informetrics, Elsevier, vol. 17(2).
    10. Michela Giorcelli & Nicola Lacetera & Astrid Marinoni, 2019. "Does Scientific Progress Affect Culture? A Digital Text Analysis," CESifo Working Paper Series 7499, CESifo.
    11. Biggi, Gianluca & Giuliani, Elisa & Martinelli, Arianna & Benfenati, Emilio, 2022. "Patent Toxicity," Research Policy, Elsevier, vol. 51(1).
      • Gianluca Biggi & Elisa Giuliani & Arianna Martinelli, 2020. "Patent Toxicity," LEM Papers Series 2020/33, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
    12. Jeffrey L. Furman & Markus Nagler & Martin Watzinger, 2021. "Disclosure and Subsequent Innovation: Evidence from the Patent Depository Library Program," American Economic Journal: Economic Policy, American Economic Association, vol. 13(4), pages 239-270, November.
    13. Joshua R. Bruce & John M. de Figueiredo, 2020. "Innovation in the U.S. Government," NBER Working Papers 27181, National Bureau of Economic Research, Inc.
    14. Shin, Seungryul Ryan & Lee, Jisoo & Jung, Yura Rosemary & Hwang, Junseok, 2022. "The diffusion of scientific discoveries in government laboratories: The role of patents filed by government scientists," Research Policy, Elsevier, vol. 51(5).
    15. Na Zhang & Lu Cheng & Chao Sun & Julie Callaert & Bart Looy, 2023. "The role of inter- and intra-organisational networks in innovation: towards requisite variety," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(7), pages 4117-4136, July.
    16. Pauly, Stefan & Stipanicic, Fernando, 2021. "The creation and diffusion of knowledge: Evidence from the Jet Age," CEPREMAP Working Papers (Docweb) 2112, CEPREMAP.
    17. Q. A. Meertens & C. G. H. Diks & H. J. van den Herik & F. W. Takes, 2020. "A data‐driven supply‐side approach for estimating cross‐border Internet purchases within the European Union," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 61-90, January.
    18. Joshua R. Bruce & John M. de Figueiredo, 2020. "Innovation in the US Government," NBER Chapters, in: The Role of Innovation and Entrepreneurship in Economic Growth, pages 433-464, National Bureau of Economic Research, Inc.
    19. Michael J. Andrews, 2021. "Historical patent data: A practitioner's guide," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 30(2), pages 368-397, May.
    20. Donald E. Bowen & Laurent Frésard & Gerard Hoberg, 2023. "Rapidly Evolving Technologies and Startup Exits," Management Science, INFORMS, vol. 69(2), pages 940-967, February.
    21. Daron Acemoglu & David Autor & Christina Patterson, 2023. "Bottlenecks: Sectoral Imbalances and the US Productivity Slowdown," NBER Chapters, in: NBER Macroeconomics Annual 2023, volume 38, National Bureau of Economic Research, Inc.
    22. Sotaro Shibayama & Deyun Yin & Kuniko Matsumoto, 2021. "Measuring novelty in science with word embedding," PLOS ONE, Public Library of Science, vol. 16(7), pages 1-16, July.
    23. Stefan Pauly & Fernando Stipanicic, 2022. "The Creation and Diffusion of Knowledge: Evidence from the Jet Age," SciencePo Working papers Main hal-04067326, HAL.
    24. Scoresby, Richard B. & Park, Haemin, 2021. "The joint effects of individual and firm level knowledge attributes on inventor mobility to entrepreneurial and established firms," Journal of Business Research, Elsevier, vol. 133(C), pages 218-230.
    25. Carol Corrado & David Martin & Qianfan Wu, 2020. "Innovation α: What Do IP-Intensive Stock Price Indexes Tell Us about Innovation?," AEA Papers and Proceedings, American Economic Association, vol. 110, pages 31-35, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    2. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    3. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    4. YIN Deyun & MOTOHASHI Kazuyuki, 2018. "Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)," Discussion papers 18018, Research Institute of Economy, Trade and Industry (RIETI).
    5. Varshney, Mayank & Jain, Amit, 2023. "Understanding “reverse” knowledge flows following inventor exit in the semiconductor industry," Technovation, Elsevier, vol. 121(C).
    6. Stefano Breschi & Francesco Lissoni & Gianluca Tarasconi, 2014. "Inventor Data for Research on Migration and Innovation: A Survey and a Pilot," WIPO Economic Research Working Papers 17, World Intellectual Property Organization - Economics and Statistics Division.
    7. Li, Xiaogang, 2020. "Innovation, market valuations, policy uncertainty and trade: Theory and evidence," ISU General Staff Papers 202001010800009179, Iowa State University, Department of Economics.
    8. Nakajima, Ryo & Tamura, Ryuichi & Hanaki, Nobuyuki, 2010. "The effect of collaboration network on inventors' job match, productivity and tenure," Labour Economics, Elsevier, vol. 17(4), pages 723-734, August.
    9. Martin Ganco & Rosemarie H. Ziedonis & Rajshree Agarwal, 2015. "More stars stay, but the brightest ones still leave: Job hopping in the shadow of patent enforcement," Strategic Management Journal, Wiley Blackwell, vol. 36(5), pages 659-685, May.
    10. Jeongsik “Jay” Lee, 2010. "Heterogeneity, Brokerage, and Innovative Performance: Endogenous Formation of Collaborative Inventor Networks," Organization Science, INFORMS, vol. 21(4), pages 804-822, August.
    11. Stefano Breschi & Francesco Lissoni & Ernest Miguelez, 2017. "Foreign-origin inventors in the USA: testing for diaspora and brain gain effects," Journal of Economic Geography, Oxford University Press, vol. 17(5), pages 1009-1038.
    12. Amit Jain & Will Mitchell, 2022. "Specialization as a double‐edged sword: The relationship of scientist specialization with R&D productivity and impact following collaborator change," Strategic Management Journal, Wiley Blackwell, vol. 43(5), pages 986-1024, May.
    13. Ernest Miguelez & Carsten Fink, 2013. "Measuring the International Mobility of Inventors: A New Database," WIPO Economic Research Working Papers 08, World Intellectual Property Organization - Economics and Statistics Division, revised May 2013.
    14. Marx, Matt & Singh, Jasjit & Fleming, Lee, 2015. "Regional disadvantage? Employee non-compete agreements and brain drain," Research Policy, Elsevier, vol. 44(2), pages 394-404.
    15. Clément Gorin, 2017. "Accessibility, absorptive capacity and innovation in European urban areas," Working Papers 1722, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    16. Massimiliano Ferrara & Roberto Mavilia & Bruno Antonio Pansera, 2017. "Extracting knowledge patterns with a social network analysis approach: an alternative methodology for assessing the impact of power inventors," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1593-1625, December.
    17. Giuri, Paola & Mariani, Myriam, 2007. "Inventors and invention processes in Europe: Results from the PatVal-EU survey," Research Policy, Elsevier, vol. 36(8), pages 1105-1106, October.
    19. Hanaki, Nobuyuki & Nakajima, Ryo & Ogura, Yoshiaki, 2010. "The dynamics of R&D network in the IT industry," Research Policy, Elsevier, vol. 39(3), pages 386-399, April.
    20. Martha Prevezer & Pietro Panzarasa & Tore Opsahl, 2010. "Geographic clustering and network evolution of innovative activities: Evidence from China’s patents," Working Papers 32, Queen Mary, University of London, School of Business and Management, Centre for Globalisation Research.
    21. Castellani, Davide & Perri, Alessandra & Scalera, Vittoria G., 2022. "Knowledge integration in multinational enterprises: The role of inventors crossing national and organizational boundaries," Journal of World Business, Elsevier, vol. 57(3).
