Creating Powerful Indicators for Innovation Studies with Approximate Matching Algorithms. A test based on PATSTAT and Amadeus databases
Abstract
The lack of firm-level data on innovative activities has always constrained the development of empirical studies on innovation. More recently, the availability of large datasets on indicators, such as R&D expenditures and patents, has relaxed these constraints and spurred the growth of a new wave of research. However, measuring innovation remains a difficult task for reasons linked to the quality of available indicators and the difficulty of integrating innovation indicators with other firm-level data. As regards quality, data on R&D expenditures represent a measure of input but do not tell much about the ‘success’ of innovative activities. Moreover, especially in the case of European firms, data on R&D expenditures are often missing because reporting these expenditures is not required by accounting and fiscal regulations in some countries. An increasing number of studies have used patent counts as a measure of inventive output. However, crude patent counts are a biased indicator of inventive output because they do not account for differences in the value of patented inventions. This is the reason why innovation scholars have introduced various patent-related indicators as a measure of the ‘quality’ of the inventive output. Integrating these measures of inventive activity with other firm-level information, such as accounting and financial data, is another challenging task. A major problem in this field is the difficulty of harmonizing information from different data sources. This is a relevant issue since inaccuracy in data merging and integration leads to measurement errors and biased results. An important source of measurement error arises from inaccuracies in matching data on innovators across different datasets. This study reports on a test of company name standardization and matching. Our test is based on two data sources: the PATSTAT patent database and the Amadeus accounting and financial dataset.
Earlier studies have mostly relied on manual, ad-hoc methods. More recently, scholars have started experimenting with automatic matching techniques. This paper contributes to this body of research by comparing two different approaches: the character-to-character match of standardized company names (perfect matching) and approximate matching based on string similarity functions. Our results show that approximate matching yields substantial gains over perfect matching, in terms of frequency of positive matches, with a limited loss of precision, i.e., low rates of false matches and false negatives.
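To make the distinction between the two approaches concrete, here is a minimal Python sketch. It is an illustration only: the abstract does not say which similarity function, standardization rules, or threshold the authors used, so the legal-form list, the use of the standard library's `difflib.SequenceMatcher`, and the 0.85 cutoff are all assumptions introduced here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal-form tokens to strip.
# A real standardization routine would need a longer, country-specific list.
LEGAL_FORMS = {"ltd", "limited", "inc", "corp", "corporation", "co",
               "gmbh", "ag", "spa", "srl", "sa", "plc", "bv", "nv"}


def standardize(name: str) -> str:
    """Lowercase, drop punctuation, and remove common legal-form tokens."""
    lowered = name.lower().replace(".", "")        # "S.p.A." -> "spa"
    cleaned = re.sub(r"[^a-z0-9 ]", " ", lowered)  # other punctuation -> space
    tokens = [t for t in cleaned.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def perfect_match(a: str, b: str) -> bool:
    """Character-to-character equality of the standardized names."""
    return standardize(a) == standardize(b)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; SequenceMatcher is a stand-in choice."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def approximate_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Accept a pair when similarity reaches an (assumed) threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    pairs = [
        ("Fiat S.p.A.", "FIAT SPA"),                      # differs only in form
        ("Telecom Italia S.p.A.", "Telecon Italia SPA"),  # one-letter typo
        ("Fiat S.p.A.", "Bayer AG"),                      # unrelated firms
    ]
    for left, right in pairs:
        print(left, "|", right,
              "| perfect:", perfect_match(left, right),
              "| approximate:", approximate_match(left, right))
```

In this sketch the first pair matches under both rules, the misspelled pair is recovered only by the approximate rule, and the unrelated pair is rejected by both. Lowering the threshold recovers more true matches at the cost of more false ones, which is the trade-off the abstract describes.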
Bibliographic Info
Paper provided by KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy in its series KITeS Working Papers with number 211.
Length: 24 pages
Date of creation: Dec 2007
Date of revision: Dec 2007
Contact details of provider:
Postal: via Sarfatti, 25 - 20136 Milano - Italy
Web page: http://www.kites.unibocconi.it/
Postal: E G E A - via R. Sarfatti, 25 - 20136 Milano - Italy
Find related papers by JEL classification:
- C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data
- C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
- O31 - Economic Development, Technological Change, and Growth - - Technological Change; Research and Development; Intellectual Property Rights - - - Innovation and Invention: Processes and Incentives
- O34 - Economic Development, Technological Change, and Growth - - Technological Change; Research and Development; Intellectual Property Rights - - - Intellectual Property Rights
This paper has been announced in the following NEP Reports:
- NEP-ACC-2008-02-02 (Accounting & Auditing)
- NEP-ALL-2008-02-02 (All new papers)
- NEP-INO-2008-02-02 (Innovation)
- NEP-IPR-2008-02-02 (Intellectual Property Rights)