Grid Thoma (Department of Mathematics and Computer Science, University of Camerino and CESPRI - Bocconi University, Milan, Italy) Salvatore Torrisi (Department of Management, University of Bologna and CESPRI - Bocconi University, Milan, Italy)
The lack of firm-level data on innovative activities has always constrained the development of empirical studies on innovation. More recently, the availability of large datasets on indicators, such as R&D expenditures and patents, has relaxed these constraints and spurred the growth of a new wave of research. However, measuring innovation remains a difficult task for reasons linked to the quality of available indicators and the difficulty of integrating innovation indicators with other firm-level data. As regards quality, data on R&D expenditures represent a measure of input but do not tell much about the ‘success’ of innovative activities. Moreover, especially in the case of European firms, data on R&D expenditures are often missing because reporting these expenditures is not required by accounting and fiscal regulations in some countries. An increasing number of studies have used patent counts as a measure of inventive output. However, crude patent counts are a biased indicator of inventive output because they do not account for differences in the value of patented inventions. This is the reason why innovation scholars have introduced various patent-related indicators as a measure of the ‘quality’ of the inventive output. Integrating these measures of inventive activity with other firm-level information, such as accounting and financial data, is another challenging task. A major problem in this field is the difficulty of harmonizing information from different data sources. This is a relevant issue since inaccuracy in data merging and integration leads to measurement errors and biased results. An important source of measurement error is inaccuracy in matching data on innovators across different datasets. This study reports on a test of company name standardization and matching. Our test is based on two data sources: the PATSTAT patent database and the Amadeus accounting and financial dataset.
Earlier studies have mostly relied on manual, ad-hoc methods. More recently, scholars have started experimenting with automatic matching techniques. This paper contributes to this body of research by comparing two approaches: the character-to-character match of standardized company names (perfect matching) and approximate matching based on string similarity functions. Our results show that approximate matching yields substantial gains over perfect matching, in terms of frequency of positive matches, with a limited loss of precision, i.e., low rates of false matches and false negatives.
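The difference between the two approaches can be illustrated with a minimal sketch. It is not the authors' procedure: the legal-form list, the function names, the choice of Python's `difflib.SequenceMatcher` as the string similarity function, and the 0.9 threshold are all assumptions made for illustration, and the company name variants are made-up examples rather than records from PATSTAT or Amadeus.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to drop during standardization.
# The abstract does not give the paper's actual standardization rules.
LEGAL_FORMS = {"SPA", "SRL", "GMBH", "AG", "LTD", "LIMITED",
               "INC", "CORP", "SA", "NV", "BV", "PLC"}


def standardize(name):
    """Uppercase, remove punctuation, and drop legal-form tokens."""
    upper = name.upper().replace(".", "")          # "S.p.A." -> "SPA"
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", upper)    # other punctuation -> space
    tokens = [t for t in cleaned.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def perfect_match(a, b):
    """Character-to-character equality of the standardized names."""
    return standardize(a) == standardize(b)


def similarity(a, b):
    """Similarity score in [0, 1] between the standardized names."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def approximate_match(a, b, threshold=0.9):
    """Accept a pair when the similarity reaches an assumed threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    pairs = [
        ("Fiat Auto S.p.A.", "FIAT AUTO SPA"),               # differs only in formatting
        ("Pirelli Pneumatici S.p.A.", "PIRELI PNEUMATICI"),  # misspelled variant
        ("Pirelli S.p.A.", "Barilla S.p.A."),                # different companies
    ]
    for a, b in pairs:
        print(a, "|", b, "|", perfect_match(a, b), "|", approximate_match(a, b))
```

In this sketch the misspelled variant is missed by perfect matching but recovered by approximate matching, which is the kind of gain in positive matches the abstract describes. Lowering the threshold would recover more variants at the cost of more false matches, so the threshold governs the trade-off with precision.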
Publisher Info
Paper provided by CESPRI, Centre for Research on Innovation and Internationalisation, Universita' Bocconi, Milano, Italy in its series CESPRI Working Papers with number 211.
Length: 24 pages
Date of creation: Dec 2007
Date of revision: Dec 2007
Handle: RePEc:cri:cespri:wp211
Contact details of provider:
Postal: via Sarfatti, 25 - 20136 Milano - Italy
Phone: +39.025836.3397
Fax: +39.025836.3399
Web page: http://www.cespri.unibocconi.it/
Order Information:
Postal: E G E A - via R. Sarfatti, 25 - 20136 Milano - Italy
For technical questions regarding this item, or to correct its listing, contact: (Roberta Ometti).