
Harmonizing and Combining Large Datasets - An Application to Firm-Level Patent and Accounting Data

Authors

  • Grid Thoma
  • Salvatore Torrisi
  • Alfonso Gambardella
  • Dominique Guellec
  • Bronwyn H. Hall
  • Dietmar Harhoff

Abstract

This paper discusses methods for the harmonization and combination of large-scale patent and trademark datasets with each other and other sources of data. Dictionary- and rule-based approaches to the consolidation of applicant names in patent data are presented and shown to have both benefits and drawbacks in isolation. We combine the two methods and develop a set of rules and dictionaries to consolidate European, Patent Cooperation Treaty (PCT) and US patent data with firm accounting data. The resulting data encompass about 131,000 patent applicant names from 46 countries, covering 58.8 percent of EPO applications and 50.6 percent of PCT applications by business organizations during the time period from 1979 to 2008. For US data, the resulting dataset includes around 54,000 assignee names and 51.3 percent of US granted patents during approximately the same time period.
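The combination of rule-based and dictionary-based name consolidation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the legal-form list, the dictionary entries, and the function names `apply_rules` and `harmonize` are hypothetical, and the cleaning assumes plain ASCII names.

```python
import re

# Hypothetical legal-form tokens removed by the rule-based step
# (an illustrative list, not the rule set used in the paper).
LEGAL_FORMS = {
    "INC", "CORP", "CORPORATION", "LTD", "LIMITED", "GMBH",
    "AG", "SA", "SPA", "LLC", "PLC", "BV", "NV",
}

# Hypothetical dictionary mapping cleaned name variants to one
# canonical firm name (the dictionary-based step).
NAME_DICTIONARY = {
    "IBM": "INTERNATIONAL BUSINESS MACHINES",
    "INTL BUSINESS MACHINES": "INTERNATIONAL BUSINESS MACHINES",
}


def apply_rules(raw_name: str) -> str:
    """Rule-based cleaning: uppercase, drop punctuation and legal forms."""
    name = raw_name.upper()
    name = name.replace(".", "")              # "A.G." becomes "AG"
    name = re.sub(r"[^A-Z0-9 ]+", " ", name)  # other punctuation becomes a space
    tokens = [t for t in name.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def harmonize(raw_name: str) -> str:
    """Apply the rules first, then look the cleaned name up in the dictionary."""
    cleaned = apply_rules(raw_name)
    return NAME_DICTIONARY.get(cleaned, cleaned)


if __name__ == "__main__":
    for raw in ["I.B.M. Corp.", "Intl. Business Machines Corporation", "Siemens A.G."]:
        print(raw, "->", harmonize(raw))
```

In this sketch the rules catch spelling and legal-form variation that no dictionary could enumerate, while the dictionary resolves abbreviations that no general rule would link to the full name, which is one way to read the abstract's point that each approach has drawbacks in isolation.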

Suggested Citation

  • Grid Thoma & Salvatore Torrisi & Alfonso Gambardella & Dominique Guellec & Bronwyn H. Hall & Dietmar Harhoff, 2010. "Harmonizing and Combining Large Datasets - An Application to Firm-Level Patent and Accounting Data," NBER Working Papers 15851, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:15851 Note: PR

    Download full text from publisher

    File URL: http://www.nber.org/papers/w15851.pdf
    Download Restriction: no


    More about this item

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • O34 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Intellectual Property and Intellectual Capital
