Creating Powerful Indicators for Innovation Studies with Approximate Matching Algorithms. A test based on PATSTAT and Amadeus databases

Author

Listed:

Grid Thoma
(Department of Mathematics and Computer Science, University of Camerino and CESPRI - Bocconi University, Milan, Italy.)
Salvatore Torrisi
(Department of Management, University of Bologna and CESPRI - Bocconi University, Milan, Italy.)

Abstract

The lack of firm-level data on innovative activities has always constrained the development of empirical studies on innovation. More recently, the availability of large datasets on indicators, such as R&D expenditures and patents, has relaxed these constraints and spurred the growth of a new wave of research. However, measuring innovation remains a difficult task for reasons linked to the quality of available indicators and the difficulty of integrating innovation indicators with other firm-level data. As regards quality, data on R&D expenditures represent a measure of input but do not tell much about the ‘success’ of innovative activities. Moreover, especially in the case of European firms, data on R&D expenditures are often missing because reporting these expenditures is not required by accounting and fiscal regulations in some countries. An increasing number of studies have used patent counts as a measure of inventive output. However, crude patent counts are a biased indicator of inventive output because they do not account for differences in the value of patented inventions. This is the reason why innovation scholars have introduced various patent-related indicators as a measure of the ‘quality’ of the inventive output. Integrating these measures of inventive activity with other firm-level information, such as accounting and financial data, is another challenging task. A major problem in this field is the difficulty of harmonizing information from different data sources. This is a relevant issue since inaccuracy in data merging and integration leads to measurement errors and biased results. An important source of measurement error arises from inaccuracies in matching data on innovators across different datasets. This study reports on a test of company-name standardization and matching. Our test is based on two data sources: the PATSTAT patent database and the Amadeus accounting and financial dataset.
Earlier studies have mostly relied on manual, ad-hoc methods. More recently, scholars have started experimenting with automatic matching techniques. This paper contributes to this body of research by comparing two different approaches: the character-to-character match of standardized company names (perfect matching) and the approximate matching based on string similarity functions. Our results show that approximate matching yields substantial gains over perfect matching, in terms of frequency of positive matches, with a limited loss of precision (i.e., low rates of false matches and false negatives).
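
The two approaches compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the standardization rules nor the similarity function, so the legal-form list (`LEGAL_FORMS`), the character-bigram Jaccard similarity, the 0.8 threshold, and the function names `standardize`, `jaccard`, and `match` are all assumptions chosen for illustration.

```python
import re

# Hypothetical legal-form tokens to drop during standardization;
# the rules actually used in the paper are not given in the abstract.
LEGAL_FORMS = {"SPA", "SRL", "GMBH", "AG", "LTD", "LIMITED", "PLC", "SA", "INC", "CORP"}


def standardize(name):
    """Uppercase, remove punctuation, and drop legal-form tokens."""
    text = name.upper().replace(".", "")      # "S.p.A." -> "SPA"
    text = re.sub(r"[^A-Z0-9]+", " ", text)   # other non-alphanumerics -> space
    tokens = [t for t in text.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def bigrams(s):
    """Set of overlapping two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of character bigrams (an assumed similarity function)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)


def match(patent_name, firm_names, threshold=0.8):
    """Try a perfect match on standardized names, then fall back to the
    best approximate match at or above the (assumed) threshold."""
    target = standardize(patent_name)
    best_name, best_score = None, 0.0
    for firm in firm_names:
        candidate = standardize(firm)
        if candidate == target:
            return firm, 1.0, "perfect"
        score = jaccard(target, candidate)
        if score > best_score:
            best_name, best_score = firm, score
    if best_score >= threshold:
        return best_name, best_score, "approximate"
    return None, best_score, "none"


if __name__ == "__main__":
    firms = ["Siemens Aktiengesellschaft", "Siemens A.G.", "Pirelli Pneumatci SpA"]
    print(match("SIEMENS AG", firms))           # perfect match after standardization
    print(match("Pirelli Pneumatici", firms))   # misspelling recovered approximately
```

Lowering the threshold raises the number of positive matches at the cost of more false matches, which is the trade-off the abstract describes. Note that this sketch treats accented characters as separators, so real company names would need a transliteration step first.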

Suggested Citation

Grid Thoma & Salvatore Torrisi, 2007. "Creating Powerful Indicators for Innovation Studies with Approximate Matching Algorithms. A test based on PATSTAT and Amadeus databases," KITeS Working Papers 211, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Dec 2007.

Handle: RePEc:cri:cespri:wp211

References listed on IDEAS

Fosfuri, Andrea & Giarratana, Marco S., 2004. "Product strategies and startups' survival in turbulent industries: evidence from the security software industry," DEE - Working Papers. Business Economics. WB wb044816, Universidad Carlos III de Madrid. Departamento de Economía de la Empresa.
Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2006. "The market value of patents and R&D: Evidence from European firms," KITeS Working Papers 186, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Nov 2006.
- Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2007. "The market value of patents and R&D: Evidence from European firms," NBER Working Papers 13426, National Bureau of Economic Research, Inc.
Paola Giuri & Myriam Mariani & Stefano Brusoni & Gustavo Crespi & Dominique Francoz & Alfonso Gambardella & Walter Garcia-Fontes & Aldo Geuna & Raul Gonzales & Dietmar Harhoff & Karin Hoisl & Christia, 2005. "Everything you Always Wanted to Know about Inventors (but Never Asked): Evidence from the PatVal-EU Survey," LEM Papers Series 2005/20, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
- Harhoff, Dietmar & Hoisl, Karin, 2006. "Everything you Always Wanted to Know About Inventors (But Never Asked): Evidence from the PatVal-EU Survey," Discussion Papers in Business Administration 1261, University of Munich, Munich School of Management.
- Harhoff, Dietmar & Garcia-Fontes, Walter & Gambardella, Alfonso & Giuri, Paola & Mariani, Myriam & Luzzi, Alessandra & Brusoni, Stefano & , & Francoz, Dominique & Geuna, Aldo & Gonzales, Raul & Hoisl,, 2006. "Everything You Always Wanted to Know about Inventors (But Never Asked): Evidence from the PatVal-EU Survey," CEPR Discussion Papers 5752, C.E.P.R. Discussion Papers.
Bronwyn H. Hall & Adam Jaffe & Manuel Trajtenberg, 2005. "Market Value and Patent Citations," RAND Journal of Economics, The RAND Corporation, vol. 36(1), pages 16-38, Spring.
- Hall, Bronwyn H. & Jaffe, A & Trajtenberg, M, 2005. "Market value and patent citations," Department of Economics, Working Paper Series qt0cs6v2w7, Department of Economics, Institute for Business and Economic Research, UC Berkeley.
Zvi Griliches, 1984. "Market Value, R&D, and Patents," NBER Chapters, in: R&D, Patents, and Productivity, pages 249-252, National Bureau of Economic Research, Inc.
- Griliches, Zvi, 1981. "Market value, R&D, and patents," Economics Letters, Elsevier, vol. 7(2), pages 183-187.
Hall, B. & Jaffe, A. & Trajtenberg, M., 2001. "The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools," Papers 2001-29, Tel Aviv.
- Bronwyn H. Hall & Adam B. Jaffe & Manuel Trajtenberg, 2001. "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools," NBER Working Papers 8498, National Bureau of Economic Research, Inc.
- Hall, Bronwyn & Trajtenberg, Manuel & Jaffe, Adam B, 2001. "The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools," CEPR Discussion Papers 3094, C.E.P.R. Discussion Papers.
Zvi Griliches, 1998. "Patent Statistics as Economic Indicators: A Survey," NBER Chapters, in: R&D and Productivity: The Econometric Evidence, pages 287-343, National Bureau of Economic Research, Inc.
- Griliches, Zvi, 1990. "Patent Statistics as Economic Indicators: A Survey," Journal of Economic Literature, American Economic Association, vol. 28(4), pages 1661-1707, December.
- Zvi Griliches, 1990. "Patent Statistics as Economic Indicators: A Survey," NBER Working Papers 3301, National Bureau of Economic Research, Inc.
Zvi Griliches & Bronwyn H. Hall & Ariel Pakes, 1988. "R&D, Patents, and Market Value Revisited: Is There Evidence of A Second Technological Opportunity Related Factor?," NBER Working Papers 2624, National Bureau of Economic Research, Inc.
Dietmar Harhoff & Francis Narin & F. M. Scherer & Katrin Vopel, 1999. "Citation Frequency And The Value Of Patented Inventions," The Review of Economics and Statistics, MIT Press, vol. 81(3), pages 511-515, August.
Richard C. Levin & Alvin K. Klevorick & Richard R. Nelson & Sidney G. Winter, 1987. "Appropriating the Returns from Industrial Research and Development," Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, vol. 18(3, Specia), pages 783-832.
Jean O. Lanjouw & Mark Schankerman, 2004. "Patent Quality and Research Productivity: Measuring Innovation with Multiple Indicators," Economic Journal, Royal Economic Society, vol. 114(495), pages 441-465, April.

Citations

Cited by:

Ernest Miguélez & Rosina Moreno & Jordi Suriñach, 2010. "Inventors on the move: Tracing inventors' mobility and its spatial distribution," Papers in Regional Science, Wiley Blackwell, vol. 89(2), pages 251-274, June.
- Ernest Miguélez & Rosina Moreno & Jordi Suriñach, 2010. "Inventors on the move: Tracing inventors' mobility and its spatial distribution," Post-Print hal-03910248, HAL.
Seung Hwan Kim & Jeong hwan Jeon & Anwar Aridi & Bogang Jun, 2022. "Factors that affect the technological transition of firms toward the industry 4.0 technologies," Papers 2209.02239, arXiv.org.
Tarasconi, Gianluca & Kang, Byeongwoo, 2015. "PATSTAT revisited," IDE Discussion Papers 527, Institute of Developing Economies, Japan External Trade Organization(JETRO).
Roberta Piergiovanni & Enrico Santarelli, 2013. "The more you spend, the more you get? The effects of R&D and capital expenditures on the patenting activities of biotechnology firms," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(2), pages 497-521, February.
- Roberta Piergiovanni & Enrico Santarelli, 2010. "The More You Spend, the More You Get? The Effects of R&D and Capital Expenditures on the Patenting Activities of Biotechnology Firms," JRC Working Papers on Corporate R&D and Innovation 2010-06, Joint Research Centre.
Grid Thoma & Salvatore Torrisi & Alfonso Gambardella & Dominique Guellec & Bronwyn H. Hall & Dietmar Harhoff, 2010. "Harmonizing and Combining Large Datasets - An Application to Firm-Level Patent and Accounting Data," NBER Working Papers 15851, National Bureau of Economic Research, Inc.
Dornbusch, Friedrich & Schmoch, Ulrich & Schulze, Nicole & Bethke, Nadine, 2012. "Identification of university-based patents: A new large-scale approach," Discussion Papers "Innovation Systems and Policy Analysis" 32, Fraunhofer Institute for Systems and Innovation Research (ISI).
Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
- Michele PEZZONI & Francesco LISSONI & Gianluca TARASCONI, 2012. "How To Kill Inventors: Testing The Massacrator© Algorithm For Inventor Disambiguation," Cahiers du GREThA (2007-2019) 2012-29, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
- Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Post-Print halshs-01074536, HAL.
Ernest Miguélez & Ismael Gómez-Miguélez, 2011. "Singling out individual inventors from patent data," IREA Working Papers 201105, University of Barcelona, Research Institute of Applied Economics, revised May 2011.
- Ernest Miguélez & Ismael Gómez-Miguélez, 2011. "Singling out individual inventors from patent data," Working Papers XREAP2011-03, Xarxa de Referència en Economia Aplicada (XREAP), revised May 2011.
Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.
- Julio Raffo & Stéphane Lhuillery, 2009. "How to play the “Names Game”: Patent retrieval comparing different heuristics," CEMI Working Papers cemi-workingpaper-2009-00, Ecole Polytechnique Fédérale de Lausanne, Collège du Management de la Technologie, Management of Technology and Entrepreneurship Institute, Chaire en Economie et Management de l'Innovation.
Kim, Seung Hwan & Jun, Bogang & Lee, Jeong-Dong, 2021. "Technological relatedness: How do firms diversify their technology?," SocArXiv 47ank, Center for Open Science.
Zi‐Lin He & Tony W. Tong & Yuchen Zhang & Wenlong He, 2018. "Constructing a Chinese Patent Database of listed firms in China: Descriptions, lessons, and insights," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 579-606, September.
Kang, Taewon & Baek, Chulwoo & Lee, Jeong-Dong, 2019. "Effects of knowledge accumulation strategies through experience and experimentation on firm growth," Technological Forecasting and Social Change, Elsevier, vol. 144(C), pages 169-181.
Criscuolo, Paola & Verspagen, Bart, 2008. "Does it matter where patent citations come from? Inventor vs. examiner citations in European patents," Research Policy, Elsevier, vol. 37(10), pages 1892-1908, December.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2006. "The market value of patents and R&D: Evidence from European firms," KITeS Working Papers 186, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Nov 2006.
- Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2007. "The market value of patents and R&D: Evidence from European firms," NBER Working Papers 13426, National Bureau of Economic Research, Inc.
Yi Deng, 2005. "The Value of Knowledge Flows: Evidence from Patent Citations Data," Computing in Economics and Finance 2005 374, Society for Computational Economics.
Markus Simeth & Michele Cincera, 2016. "Corporate Science, Innovation, and Firm Value," Management Science, INFORMS, vol. 62(7), pages 1970-1981, July.
- Markus Simeth & Michele Cincera, 2013. "Corporate Science, Innovation and Firm Value," Working Papers TIMES² 2013-006, ULB -- Universite Libre de Bruxelles.
- Markus Simeth & Michele Cincera, 2016. "Corporate science, innovation, and firm value," ULB Institutional Repository 2013/240033, ULB -- Universite Libre de Bruxelles.
Yi Deng, 2005. "The value of knowledge spillovers," Working Paper Series 2005-14, Federal Reserve Bank of San Francisco.
Burak Dindaroglu, 2010. "Intra-Industry Knowledge Spillovers and Scientific Labor Mobility," Discussion Papers 10-01, University at Albany, SUNY, Department of Economics.
- Dindaroğlu, Burak, 2014. "Scientific Labor Mobility, Market Value, and Knowledge Flows," MPRA Paper 88043, University Library of Munich, Germany.
Justus Baron & Henry Delcamp, 2012. "The private and social value of patents in discrete and cumulative innovation," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 581-606, February.
Chang, Hsiu-yun & Liang, Woan-lih & Wang, Yanzhi, 2019. "Do institutional investors still encourage patent-based innovation after the tech bubble period?," Journal of Empirical Finance, Elsevier, vol. 51(C), pages 149-164.
Giuri, Paola & Mariani, Myriam, 2007. "Inventors and invention processes in Europe: Results from the PatVal-EU survey," Research Policy, Elsevier, vol. 36(8), pages 1105-1106, October.
- Giuri, Paola & Mariani, Myriam & Brusoni, Stefano & Crespi, Gustavo & Francoz, Dominique & Gambardella, Alfonso & Garcia-Fontes, Walter & Geuna, Aldo & Gonzales, Raul & Harhoff, Dietmar & Hoisl, Karin, 2007. "Inventors and invention processes in Europe: Results from the PatVal-EU survey," Research Policy, Elsevier, vol. 36(8), pages 1107-1127, October.
Deng, Yi, 2005. "The Value of Knowledge Spillovers in the US Semiconductor Industry," Departmental Working Papers 0516, Southern Methodist University, Department of Economics, revised Nov 2006.
Nicolas van Zeebroeck, 2007. "Patents only live twice: a patent survival analysis in Europe," Working Papers CEB 07-028.RS, ULB -- Universite Libre de Bruxelles.
John Laitner & Dmitriy Stolyarov, 2013. "Derivative Ideas And The Value Of Intangible Assets," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 54(1), pages 59-95, February.
Nagaoka, Sadao & Motohashi, Kazuyuki & Goto, Akira, 2010. "Patent Statistics as an Innovation Indicator," Handbook of the Economics of Innovation, in: Bronwyn H. Hall & Nathan Rosenberg (ed.), Handbook of the Economics of Innovation, edition 1, volume 2, chapter 0, pages 1083-1127, Elsevier.
Guan-Can Yang & Gang Li & Chun-Ya Li & Yun-Hua Zhao & Jing Zhang & Tong Liu & Dar-Zen Chen & Mu-Hsuan Huang, 2015. "Using the comprehensive patent citation network (CPC) to evaluate patent value," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1319-1346, December.
Emanuele Bacchiocchi & Fabio Montobbio, 2010. "International Knowledge Diffusion and Home‐bias Effect: Do USPTO and EPO Patent Citations Tell the Same Story?," Scandinavian Journal of Economics, Wiley Blackwell, vol. 112(3), pages 441-470, September.
- Emanuele Bacchiocchi & Fabio Montobbio, 2009. "International knowledge diffusion and home-bias effect. Do USPTO and EPO patent citations tell the same story?," KITeS Working Papers 015, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Feb 2009.
Dirk Czarnitzki & Katrin Hussinger & Bart Leten, 2020. "How Valuable are Patent Blocking Strategies?," Review of Industrial Organization, Springer;The Industrial Organization Society, vol. 56(3), pages 409-434, May.
Myriam Mariani & Marzia Romanelli, 2006. ""Stacking" or "Picking" Patents? The Inventors' Choice Between Quantity and Quality," LEM Papers Series 2006/06, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
Mohd Shadab Danish & Pritam Ranjan & Ruchi Sharma, 2022. "Assessing the Impact of Patent Attributes on the Value of Discrete and Complex Innovations," Papers 2208.07222, arXiv.org.
Mohd Shadab Danish & Pritam Ranjan & Ruchi Sharma, 2021. "Identification of “Valuable” Technologies via Patent Statistics in India: An Analysis Based on Renewal Information," BASE University Working Papers 13/2021, BASE University, Bengaluru, India.
Nicolas van Zeebroeck, 2011. "The puzzle of patent value indicators," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 20(1), pages 33-62.
- Nicolas van Zeebroeck, 2007. "The puzzle of patent value indicators," Working Papers CEB 07-023.RS, ULB -- Universite Libre de Bruxelles.
- Nicolas van Zeebroeck, 2011. "The Puzzle of Patent Value Indicators," ULB Institutional Repository 2013/60729, ULB -- Universite Libre de Bruxelles.
Chiara Pederzoli & Grid Thoma & Costanza Torricelli, 2011. "Modelling credit risk for innovative firms: the role of innovation measures," Centro Studi di Banca e Finanza (CEFIN) (Center for Studies in Banking and Finance) 0025, Universita di Modena e Reggio Emilia, Dipartimento di Economia "Marco Biagi".

More about this item

Keywords

innovation statistics; patents; matching company names; software

JEL classification:

C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
O31 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Innovation and Invention: Processes and Incentives
O34 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Intellectual Property and Intellectual Capital

NEP fields

This paper has been announced in the following NEP Reports:

NEP-ACC-2008-02-02 (Accounting and Auditing)
NEP-INO-2008-02-02 (Innovation)
NEP-IPR-2008-02-02 (Intellectual Property Rights)

Lists

This item is featured on the following reading lists, Wikipedia, or ReplicationWiki pages:

Socio-Economics of Innovation
