This file is part of IDEAS, which uses RePEc data



How to Normalize Co-Occurrence Data? An Analysis of Some Well-Known Similarity Measures

Author Info
Eck, N.J.P. van
Waltman, L.R. (Erasmus Research Institute of Management (ERIM), RSM Erasmus University)
Abstract

In scientometric research, the use of co-occurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this paper, we theoretically analyze the properties of similarity measures for co-occurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that co-occurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research.
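
The abstract names the four measures without giving formulas. As a minimal sketch, assuming the definitions commonly used for them in the scientometric literature, each can be computed from the co-occurrence count of two items and their individual occurrence counts. The function names and the symbols `c_ij`, `s_i`, and `s_j` are illustrative choices, not notation taken from this page.

```python
import math

# Assumed inputs (hypothetical names, not from the source page):
#   c_ij : number of co-occurrences of items i and j
#   s_i  : total number of occurrences of item i
#   s_j  : total number of occurrences of item j


def association_strength(c_ij, s_i, s_j):
    """Probabilistic measure: co-occurrences relative to the product of occurrences."""
    return c_ij / (s_i * s_j)


def cosine(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences relative to the geometric mean of occurrences."""
    return c_ij / math.sqrt(s_i * s_j)


def inclusion_index(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences relative to the smaller occurrence count."""
    return c_ij / min(s_i, s_j)


def jaccard_index(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences relative to the size of the union."""
    return c_ij / (s_i + s_j - c_ij)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    c_ij, s_i, s_j = 10, 20, 50
    for measure in (association_strength, cosine, inclusion_index, jaccard_index):
        print(f"{measure.__name__}: {measure(c_ij, s_i, s_j):.4f}")
```

In this sketch the association strength divides by the product of the occurrence counts, which is proportional to the number of co-occurrences expected if the two items occurred independently. The other three divide by a quantity that measures overlap between the two sets of occurrences, which matches the probabilistic versus set-theoretic distinction drawn in the abstract.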

Download Info

File URL: http://hdl.handle.net/1765/14528
File Format: application/pdf
Download Restriction: no

Publisher Info
Paper provided by the Erasmus Research Institute of Management (ERIM), the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam, in its series Research Paper with number ERS-2009-001-LIS (revision date: 2009-07-29).

Date of creation: 07 Jan 2009
Date of revision: 29 Jul 2009
Handle: RePEc:dgr:eureri:1765014528

Contact details of provider:
Web page: http://www.erim.eur.nl/

For technical questions regarding this item, or to correct its listing, contact: (ERIM Series Handler at the ERIM Office).

Related research
Keywords: similarity measure; association strength; cosine; inclusion index; Jaccard index




This page was last updated on 2009-11-11.

