How To Kill Inventors: Testing The Massacrator© Algorithm For Inventor Disambiguation
Abstract
Inventor disambiguation is an increasingly important issue for users of patent data. We propose and test a number of refinements to the Massacrator© algorithm, originally proposed by Lissoni et al. (2006) and now applied to APE-INV, a free access database funded by the European Science Foundation. Following Raffo and Lhuillery (2009), we describe disambiguation as a 3-step process: cleaning & parsing, matching, and filtering. By means of sensitivity analysis, based on Monte Carlo simulations, we show how various filtering criteria can be manipulated in order to obtain optimal combinations of precision and recall (type I and type II errors). We also show how these different combinations generate different results for applications to studies on inventors' productivity, mobility, and networking. The filtering criteria based upon information on inventors' addresses are sensitive to data quality, while those based upon information on co-inventorship networks are always effective. Details on data access and data quality improvement via feedback collection are also discussed.
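The three steps named in the abstract (cleaning & parsing, matching, filtering) and the precision/recall trade-off can be illustrated with a minimal Python sketch. This is not the Massacrator© implementation: the toy records, the field names, the two filtering criteria (same city, shared co-inventor), the scoring rule, and the hand-labelled benchmark `TRUTH` are all assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical toy records; field names are illustrative, not the APE-INV schema.
RECORDS = [
    {"id": 1, "name": "Rossi, Mario ", "city": "Milano", "coinventors": {"Bianchi"}},
    {"id": 2, "name": "ROSSI MARIO", "city": "Milano", "coinventors": {"Bianchi", "Verdi"}},
    {"id": 3, "name": "Rossi, Mario", "city": "Roma", "coinventors": {"Neri"}},
    {"id": 4, "name": "Dupont, Jean", "city": "Paris", "coinventors": set()},
    {"id": 5, "name": "Mario Rossi", "city": "Torino", "coinventors": {"Verdi"}},
]

# Assumed benchmark: records 1, 2 and 5 are the same person; 3 is a homonym.
TRUTH = {(1, 2), (1, 5), (2, 5)}


def clean(name):
    """Step 1 (cleaning & parsing): uppercase, drop punctuation, sort tokens."""
    tokens = "".join(c if c.isalnum() else " " for c in name.upper()).split()
    return " ".join(sorted(tokens))


def match(records):
    """Step 2 (matching): pair records whose cleaned names are identical."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if clean(a["name"]) == clean(b["name"])
    ]


def score(a, b):
    """Count how many of the two illustrative criteria a pair satisfies."""
    same_city = a["city"] == b["city"]
    shared_coinventor = bool(a["coinventors"] & b["coinventors"])
    return int(same_city) + int(shared_coinventor)


def filter_pairs(records, pairs, threshold):
    """Step 3 (filtering): keep matched pairs whose score reaches the threshold."""
    by_id = {r["id"]: r for r in records}
    return {p for p in pairs if score(by_id[p[0]], by_id[p[1]]) >= threshold}


def precision_recall(predicted, truth):
    """Precision and recall of predicted same-person pairs against a benchmark."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


if __name__ == "__main__":
    candidates = match(RECORDS)
    for threshold in (0, 1, 2):
        kept = filter_pairs(RECORDS, candidates, threshold)
        print(threshold, precision_recall(kept, TRUTH))
```

On this toy data, raising the threshold from 0 to 2 moves precision from 0.5 to 1.0 while recall falls from 1.0 to one third, which is the kind of trade-off the sensitivity analysis in the paper explores over many filtering criteria.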
Bibliographic Info
Paper provided by Groupe de Recherche en Economie Théorique et Appliquée in its series Cahiers du GREThA with number 2012-29.
Date of creation: 2012
Contact details of provider:
Postal: Avenue Léon Duguit, 33608 Pessac Cedex
Phone: +33 (0)188.8.131.52.75
Fax: +33 (0)184.108.40.206.47
Web page: http://gretha.u-bordeaux4.fr/
Keywords: patent data; inventors; name disambiguation
JEL classification:
- C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
- C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
- O34 - Economic Development, Technological Change, and Growth - - Technological Change; Research and Development; Intellectual Property Rights - - - Intellectual Property and Intellectual Capital
This paper has been announced in the following NEP Reports:
- NEP-ALL-2013-01-07 (All new papers)
- NEP-CMP-2013-01-07 (Computational Economics)
- NEP-INO-2013-01-07 (Innovation)
- NEP-IPR-2013-01-07 (Intellectual Property Rights)
References:
- Gerald Marschke, 2006. "The influence of university research on industrial innovation," Federal Reserve Bank of Cleveland.
- Jinyoung Kim & Sangjoon John Lee & Gerald Marschke, 2010. "The Influence of University Research on Industrial Innovation," Discussion Paper Series 1006, Institute of Economic Research, Korea University.
- Jinyoung Kim & Sangjoon John Lee & Gerald Marschke, 2005. "The Influence of University Research on Industrial Innovation," NBER Working Papers 11447, National Bureau of Economic Research, Inc.
- Stefano Breschi & Francesco Lissoni & Fabio Montobbio, 2006. "University patenting and scientific productivity. A quantitative study of Italian academic inventors," KITeS Working Papers 189, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Nov 2006.
- Matt Marx & Deborah Strumsky & Lee Fleming, 2009. "Mobility, Skills, and the Michigan Non-Compete Experiment," Management Science, INFORMS, vol. 55(6), pages 875-889, June.
- Julio Raffo & Stéphane Lhuillery, 2009. "How to play the “Names Game”: Patent retrieval comparing different heuristics," CEMI Working Papers cemi-workingpaper-2009-00, Ecole Polytechnique Fédérale de Lausanne, Collège du Management de la Technologie, Management of Technology and Entrepreneurship Institute, Chaire en Economie et Management de l'Innovation.
- Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.
- Grid Thoma & Salvatore Torrisi, 2007. "Creating Powerful Indicators for Innovation Studies with Approximate Matching Algorithms. A test based on PATSTAT and Amadeus databases," KITeS Working Papers 211, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Dec 2007.
- Stefano Breschi & Francesco Lissoni, 2009. "Mobility of skilled workers and co-invention networks: an anatomy of localized knowledge flows," Journal of Economic Geography, Oxford University Press, vol. 9(4), pages 439-468, July.
- Nicolas Carayol & Lorenzo Cassi, 2009. "Who's Who in Patents. A Bayesian approach," Cahiers du GREThA 2009-07, Groupe de Recherche en Economie Théorique et Appliquée.
- Lorenzo Cassi & Nicolas Carayol, 2009. "Who's Who in Patents. A Bayesian approach," Working Papers hal-00631750, HAL.
- Lorenzo Cassi & Nicolas Carayol, 2009. "Who's Who in Patents. A Bayesian approach," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-00631750, HAL.
- Grid Thoma & Salvatore Torrisi & Alfonso Gambardella & Dominique Guellec & Bronwyn H. Hall & Dietmar Harhoff, 2010. "Harmonizing and Combining Large Datasets – An Application to Firm-Level Patent and Accounting Data," NBER Working Papers 15851, National Bureau of Economic Research, Inc.