Document–document similarity approaches and science mapping: Experimental comparison of five approaches

Authors

  • Ahlgren, Per
  • Colliander, Cristian

Abstract

This paper treats document–document similarity approaches in the context of science mapping. Five approaches, involving nine methods, are compared experimentally. We compare text-based approaches, the citation-based bibliographic coupling approach, and approaches that combine text-based approaches and bibliographic coupling. Forty-three articles, published in the journal Information Retrieval, are used as test documents. We investigate how well the approaches agree with a ground truth subject classification of the test documents, when the complete linkage method is used, and under two types of similarities, first-order and second-order. The results show that it is possible to achieve a very good approximation of the classification by means of automatic grouping of articles. One text-only method and one combination method, under second-order similarities in both cases, give rise to cluster solutions that to a large extent agree with the classification.
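To make the terms in the abstract concrete, the following is a minimal sketch of first-order similarities, second-order similarities, and complete linkage clustering. It is not the authors' implementation. It assumes cosine similarity over toy term-count vectors, and it assumes that the second-order similarity of two documents is the cosine between their rows of the first-order similarity matrix. The function names (`cosine`, `first_order`, `second_order`, `complete_linkage`) and the four toy documents are hypothetical. The same functions would apply to bibliographic coupling if each vector marked cited references instead of terms.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def first_order(vectors):
    """First-order similarities: compare the document vectors directly."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def second_order(sim):
    """Second-order similarities (assumed definition): compare the rows of
    the first-order matrix, i.e. the documents' similarity profiles."""
    n = len(sim)
    return [[cosine(sim[i], sim[j]) for j in range(n)] for i in range(n)]


def complete_linkage(sim, k):
    """Naive agglomerative clustering down to k clusters.

    Under complete linkage, the similarity of two clusters is the smallest
    similarity between any pair of their members. Assumes sim values >= 0.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            link = min(sim[i][j] for i in clusters[a] for j in clusters[b])
            if link > best:
                best, pair = link, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical term-count vectors for four toy documents.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 1, 2, 1],
]

s1 = first_order(docs)
s2 = second_order(s1)
clusters_first = complete_linkage(s1, 2)
clusters_second = complete_linkage(s2, 2)
print(clusters_first)
print(clusters_second)
```

On this toy data both similarity types group documents 0 and 1 together and documents 2 and 3 together. A cluster solution of this kind could then be compared with a ground truth classification using a partition agreement measure; the choice of measure is left open here.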

Suggested Citation

  • Ahlgren, Per & Colliander, Cristian, 2009. "Document–document similarity approaches and science mapping: Experimental comparison of five approaches," Journal of Informetrics, Elsevier, vol. 3(1), pages 49-63.
  • Handle: RePEc:eee:infome:v:3:y:2009:i:1:p:49-63
    DOI: 10.1016/j.joi.2008.11.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157708000680
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jarneving, Bo, 2007. "Complete graphs and bibliographic coupling: A test of the applicability of bibliographic coupling for the identification of cognitive cores on the field level," Journal of Informetrics, Elsevier, vol. 1(4), pages 338-356.
    2. Jarneving, Bo, 2007. "Bibliographic coupling and its application to research-front and other core documents," Journal of Informetrics, Elsevier, vol. 1(4), pages 287-307.
    3. Peter Sjögårde & Per Ahlgren, 2024. "Normalization of direct citations for clustering in publication-level networks: evaluation of six approaches," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(3), pages 1949-1968, March.
    4. García-Lillo, Francisco & Seva-Larrosa, Pedro & Sánchez-García, Eduardo, 2024. "On the basis of research on ‘green’ in the disciplines of management and business," Journal of Business Research, Elsevier, vol. 172(C).
    5. Viergutz, Tim & Schulze-Ehlers, Birgit, 2018. "The use of hybrid scientometric clustering for systematic literature reviews in business and economics," DARE Discussion Papers 1804, Georg-August University of Göttingen, Department of Agricultural Economics and Rural Development (DARE).
    6. Christian Sternitzke & Isumo Bergmann, 2009. "Similarity measures for document mapping: A comparative study on the level of an individual scientist," Scientometrics, Springer;Akadémiai Kiadó, vol. 78(1), pages 113-130, January.
    7. García-Lillo, Francisco & Seva-Larrosa, Pedro & Sánchez-García, Eduardo, 2023. "What is going on in entrepreneurship research? A bibliometric and SNA analysis," Journal of Business Research, Elsevier, vol. 158(C).
    8. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    9. van Eck, N.J.P. & Waltman, L., 2009. "How to Normalize Co-Occurrence Data? An Analysis of Some Well-Known Similarity Measures," ERIM Report Series Research in Management ERS-2009-001-LIS, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    10. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    11. Raphaël Maucuer & Alexandre Renaud & Sébastien Ronteau & Laurent Muzellec, 2022. "What can we learn from marketers? A bibliometric analysis of the marketing literature on business model research," Post-Print hal-03718522, HAL.
    12. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    13. Perianes-Rodriguez, Antonio & Waltman, Ludo & van Eck, Nees Jan, 2016. "Constructing bibliometric networks: A comparison between full and fractional counting," Journal of Informetrics, Elsevier, vol. 10(4), pages 1178-1195.
    14. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    15. Chaoqun Ni & Cassidy R. Sugimoto & Jiepu Jiang, 2013. "Venue-author-coupling: A measure for identifying disciplines through author communities," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(2), pages 265-279, February.
    16. Cristian Colliander & Per Ahlgren, 2012. "Experimental comparison of first and second-order similarities in a scientometric context," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 675-685, February.
    17. Kraker, Peter & Schlögl, Christian & Jack, Kris & Lindstaedt, Stefanie, 2015. "Visualization of co-readership patterns from an online reference management system," Journal of Informetrics, Elsevier, vol. 9(1), pages 169-182.
    18. Ali Gazni & Fereshteh Didegah, 2016. "The relationship between authors’ bibliographic coupling and citation exchange: analyzing disciplinary differences," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 609-626, May.
    19. Per Ahlgren & Bo Jarneving, 2008. "Bibliographic coupling, common abstract stems and clustering: A comparison of two document-document similarity approaches in the context of science mapping," Scientometrics, Springer;Akadémiai Kiadó, vol. 76(2), pages 273-290, August.
    20. Ruhao Zhang & Junpeng Yuan, 2022. "Enhanced author bibliographic coupling analysis using semantic and syntactic citation information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7681-7706, December.
