This file is part of IDEAS, which uses RePEc data



Creating Powerful Indicators for Innovation Studies with Approximate Matching Algorithms. A test based on PATSTAT and Amadeus databases

Author Info
Grid Thoma (Department of Mathematics and Computer Science, University of Camerino and CESPRI - Bocconi University, Milan, Italy.)
Salvatore Torrisi (Department of Management, University of Bologna and CESPRI - Bocconi University, Milan, Italy.)


Abstract

The lack of firm-level data on innovative activities has always constrained the development of empirical studies on innovation. More recently, the availability of large datasets on indicators, such as R&D expenditures and patents, has relaxed these constraints and spurred the growth of a new wave of research. However, measuring innovation remains a difficult task for reasons linked to the quality of available indicators and the difficulty of integrating innovation indicators with other firm-level data. As regards quality, data on R&D expenditures represent a measure of input but do not tell much about the ‘success’ of innovative activities. Moreover, especially in the case of European firms, data on R&D expenditures are often missing because reporting these expenditures is not required by accounting and fiscal regulations in some countries. An increasing number of studies have used patent counts as a measure of inventive output. However, crude patent counts are a biased indicator of inventive output because they do not account for differences in the value of patented inventions. This is why innovation scholars have introduced various patent-related indicators as a measure of the ‘quality’ of the inventive output. Integrating these measures of inventive activity with other firm-level information, such as accounting and financial data, is another challenging task. A major problem in this field is the difficulty of harmonizing information from different data sources. This is a relevant issue since inaccuracy in data merging and integration leads to measurement errors and biased results. An important source of measurement error arises from inaccuracies in matching data on innovators across different datasets. This study reports on a test of company name standardization and matching. Our test is based on two data sources: the PATSTAT patent database and the Amadeus accounting and financial dataset.
Earlier studies have mostly relied on manual, ad-hoc methods. More recently, scholars have started experimenting with automatic matching techniques. This paper contributes to this body of research by comparing two different approaches: the character-to-character match of standardized company names (perfect matching) and the approximate matching based on string similarity functions. Our results show that approximate matching yields substantial gains over perfect matching, in terms of frequency of positive matches, with a limited loss of precision (i.e., low rates of false matches and false negatives).
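
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the legal-form list, the 0.85 threshold, the function names, and the use of the standard library's difflib.SequenceMatcher as the similarity function are all assumptions made for illustration, and the paper's own standardization rules and similarity functions may differ.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to strip (an assumption, not the paper's list).
LEGAL_FORMS = {"inc", "corp", "co", "ltd", "plc", "gmbh", "ag", "sa", "spa", "srl", "bv", "nv"}


def standardize(name: str) -> str:
    """Strip accents, lowercase, drop punctuation, and remove legal-form tokens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Remove periods first so that "S.p.A." collapses to "spa" rather than "s p a".
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_name.lower().replace(".", ""))
    return " ".join(t for t in cleaned.split() if t not in LEGAL_FORMS)


def exact_match(a: str, b: str) -> bool:
    """Perfect matching: character-to-character equality of standardized names."""
    return standardize(a) == standardize(b)


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] computed on standardized names."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def approximate_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Approximate matching: accept a pair whose similarity reaches the threshold."""
    return similarity(a, b) >= threshold


def best_match(applicant: str, firm_names: list, threshold: float = 0.85):
    """Return the most similar firm name, or None if no candidate reaches the threshold."""
    score, firm = max(((similarity(applicant, f), f) for f in firm_names),
                      default=(0.0, None))
    return firm if score >= threshold else None


if __name__ == "__main__":
    patent_applicant = "Pirelli Pneumatic SpA"  # e.g. a name as it might appear in patent data
    firms = ["Siemens AG", "Pirelli Pneumatici S.p.A."]  # e.g. names from accounting data
    print(exact_match(patent_applicant, firms[1]))  # spelling variant defeats exact matching
    print(best_match(patent_applicant, firms))      # approximate matching recovers the firm
```

Lowering the threshold raises the number of positive matches at the cost of more false matches, which is the trade-off the abstract describes; in practice the threshold would be tuned against a manually checked sample.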

Download Info
File URL: ftp://ftp.unibocconi.it/pub/RePEc/cri/papers/WP211ThomaTorrisi.pdf
File Format: application/pdf
Download Restriction: no

Publisher Info
Paper provided by CESPRI, Centre for Research on Innovation and Internationalisation, Universita' Bocconi, Milano, Italy in its series CESPRI Working Papers with number 211.

Length: 24 pages
Date of creation: Dec 2007
Date of revision: Dec 2007
Handle: RePEc:cri:cespri:wp211

Contact details of provider:
Postal: via Sarfatti, 25 - 20136 Milano - Italy
Phone: +39.025836.3397
Fax: +39.025836.3399
Web page: http://www.cespri.unibocconi.it/

Order Information:
Postal: E G E A - via R. Sarfatti, 25 - 20136 Milano - Italy

For technical questions regarding this item, or to correct its listing, contact: (Roberta Ometti).

Related research
Keywords: innovation statistics patents matching company names software.

Find related papers by JEL classification:
C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Microeconomic Data
C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
O31 - Economic Development, Technological Change, and Growth - - Technological Change - - - Innovation and Invention: Processes and Incentives
O34 - Economic Development, Technological Change, and Growth - - Technological Change - - - Intellectual Property Rights

This item is featured on the following reading lists:
  1. Socio-Economics of Innovation
References listed on IDEAS
  1. Dietmar Harhoff & Francis Narin & F. M. Scherer & Katrin Vopel, 1999. "Citation Frequency And The Value Of Patented Inventions," The Review of Economics and Statistics, MIT Press, vol. 81(3), pages 511-515, August.
  2. Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2007. "The market value of patents and R&D: Evidence from European firms," NBER Working Papers 13426, National Bureau of Economic Research, Inc.
  3. Bronwyn H. Hall & Adam Jaffe & Manuel Trajtenberg, 2005. "Market Value and Patent Citations," RAND Journal of Economics, The RAND Corporation, vol. 36(1), pages 16-38, Spring.
  4. Zvi Griliches, 1990. "Patent Statistics as Economic Indicators: A Survey," Journal of Economic Literature, American Economic Association, vol. 28(4), pages 1661-1707, December.
  5. Zvi Griliches, 1981. "Market value, R&D, and patents," Economics Letters, Elsevier, vol. 7(2), pages 183-187.
  6. Jean O. Lanjouw & Mark Schankerman, 2004. "Patent Quality and Research Productivity: Measuring Innovation with Multiple Indicators," Economic Journal, Royal Economic Society, vol. 114(495), pages 441-465, April.
  7. Bronwyn H. Hall & Adam B. Jaffe & Manuel Trajtenberg, 2001. "The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools," CEPR Discussion Papers 3094, C.E.P.R. Discussion Papers.
  8. Richard C. Levin & Alvin K. Klevorick & Richard R. Nelson & Sidney G. Winter, 1987. "Appropriating the Returns from Industrial Research and Development," Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, vol. 18(1987-3), pages 783-832.
  9. repec:fth:harver:1473 is not listed on IDEAS

This page was last updated on 2008-7-29.


This information is provided to you by IDEAS at the Department of Economics, College of Liberal Arts and Sciences, University of Connecticut using RePEc data on a server sponsored by the Society for Economic Dynamics.