IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v111y2017i3d10.1007_s11192-017-2341-y.html

Use of ResearchGate and Google CSE for author name disambiguation

Author

Listed:
  • Mehmet Ali Abdulhayoglu

    (KU Leuven)

  • Bart Thijs

    (KU Leuven)

Abstract

Author name disambiguation plays a very important role in individual-based bibliometric analysis but suffers from a lack of information. Some studies have therefore successfully leveraged external web sources to obtain additional evidence. The main problem, however, is generally the high cost of extracting data from web pages, owing to their diverse designs. To address this challenge, we employed ResearchGate (RG), a social network platform on which scholars present their publication lists in a structured way. Even though the platform may be imperfect, it can be valuable when used alongside traditional approaches for confirmation. To this end, in our first (retrieval) stage we applied a graph-based machine learning approach, connected components (CC), and formed clusters. In stage 2, the data crawled from RG for the same authors were combined with the CC results. We observed that 76.40% of the clusters formed by CC were confirmed by the RG data, and these clusters accounted for 68.33% of all citations. Next, a subset was drawn from the dataset by retaining only those clusters with at least 10 members, in order to examine the details. For this subset we additionally employed the Google Custom Search Engine (CSE) API to access authors’ web pages as a complementary tool to RG. We observed an F-score of 0.95 when CC results were confirmed by RG and CSE; almost the same performance was observed when only the CC approach was applied. In addition, the publications identified and confirmed through the external sources were cited to a greater extent than those not found in the related external sources. Although the results are promising, issues remain with the use of external sources. Many authors present only a few selected papers on the web, which prevents our procedure from obtaining the entire publication list. Missing publications affect bibliometric analysis adversely, since all citation data is required: if only the data confirmed via external sources is used, bibliometric indicators will be overestimated. On the other hand, our suggested methodology can potentially decrease the manual work required for individual-based bibliometric analysis. The procedure may also yield more reliable results by confirming cluster members derived from unsupervised grouping methods. This approach might be especially beneficial for large datasets, where extensive manual work would otherwise be required.
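
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the helper names `connected_components` and `confirmed_share`, and the rule that a cluster counts as confirmed only when all of its members appear in a single external profile are assumptions made purely for illustration.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into clusters: two nodes share a cluster
    if a chain of pairwise matches (edges) links them."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


def confirmed_share(clusters, profiles, citations):
    """Return (share of clusters confirmed, share of citations they hold).

    Assumed rule: a cluster is confirmed when every member appears
    in one external profile (e.g. one RG publication list).
    """
    confirmed = [c for c in clusters if any(set(c) <= p for p in profiles)]
    total_cites = sum(citations[p] for c in clusters for p in c)
    confirmed_cites = sum(citations[p] for c in confirmed for p in c)
    cluster_share = len(confirmed) / len(clusters) if clusters else 0.0
    cite_share = confirmed_cites / total_cites if total_cites else 0.0
    return cluster_share, cite_share


# Hypothetical toy data: five publications and their pairwise matches.
papers = ["p1", "p2", "p3", "p4", "p5"]
matches = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
# Hypothetical external profiles: publication lists found on the web.
rg_profiles = [{"p1", "p2", "p3"}, {"p4"}]
cites = {"p1": 10, "p2": 5, "p3": 5, "p4": 15, "p5": 5}

clusters = connected_components(papers, matches)
cluster_share, cite_share = confirmed_share(clusters, rg_profiles, cites)
print(clusters, cluster_share, cite_share)
```

In this toy case the second cluster is left unconfirmed because one of its members is missing from the external profile. This mirrors the abstract's caveat that authors often list only selected papers online.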

Suggested Citation

  • Mehmet Ali Abdulhayoglu & Bart Thijs, 2017. "Use of ResearchGate and Google CSE for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1965-1985, June.
  • Handle: RePEc:spr:scient:v:111:y:2017:i:3:d:10.1007_s11192-017-2341-y
    DOI: 10.1007/s11192-017-2341-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-017-2341-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-017-2341-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Ricardo G. Cota & Anderson A. Ferreira & Cristiano Nascimento & Marcos André Gonçalves & Alberto H. F. Laender, 2010. "An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(9), pages 1853-1870, September.
    2. Ciriaco Andrea D'Angelo & Cristiano Giuffrida & Giovanni Abramo, 2011. "A heuristic approach to author name disambiguation in bibliometrics databases for large‐scale research assessments," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(2), pages 257-269, February.
    3. Ortega, José Luis, 2015. "Relationship between altmetric and bibliometric indicators across academic social sites: The case of CSIC's members," Journal of Informetrics, Elsevier, vol. 9(1), pages 39-49.
    4. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
    5. Mehmet Ali Abdulhayoglu & Bart Thijs & Wouter Jeuris, 2016. "Using character n-grams to match a list of publications to references in bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1525-1546, December.
    6. Mike Thelwall & Kayvan Kousha, 2015. "ResearchGate: Disseminating, communicating, and measuring Scholarship?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(5), pages 876-889, May.

    Citations

    Cited by:

    1. Ciriaco Andrea D’Angelo & Nees Jan Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
    2. Rehs, Andreas, 2021. "A supervised machine learning approach to author disambiguation in the Web of Science," Journal of Informetrics, Elsevier, vol. 15(3).
    3. Weiwei Yan & Yin Zhang & Wendy Bromfield, 2018. "Analyzing the follower–followee ratio to determine user characteristics and institutional participation differences among research universities on ResearchGate," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 299-316, April.
    4. Min Song & Keun Young Kang & Tatsawan Timakum & Xinyuan Zhang, 2020. "Examining influential factors for acknowledgements classification using supervised learning," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-21, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ciriaco Andrea D’Angelo & Nees Jan Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
    2. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    3. Jinseok Kim & Jason Owen-Smith, 2021. "ORCID-linked labeled data for evaluating author name disambiguation at scale," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2057-2083, March.
    4. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    5. Jinseok Kim, 2018. "Evaluating author name disambiguation for digital libraries: a case of DBLP," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1867-1886, September.
    6. Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
    7. Jian Wang & Kaspars Berzins & Diana Hicks & Julia Melkers & Fang Xiao & Diogo Pinheiro, 2012. "A boosted-trees method for name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 391-411, November.
    8. Milojević, Staša, 2013. "Accuracy of simple, initials-based methods for author name disambiguation," Journal of Informetrics, Elsevier, vol. 7(4), pages 767-773.
    9. Abramo, Giovanni & D'Angelo, Ciriaco Andrea & Di Costa, Flavia, 2019. "Diversification versus specialization in scientific research: Which strategy pays off?," Technovation, Elsevier, vol. 82, pages 51-57.
    10. Shannon Mason & Yusuke Sakurai, 2021. "A ResearchGate-way to an international academic community?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1149-1171, February.
    11. Alison M. J. Buchan & Eva Jurczyk & Ruth Isserlin & Gary D. Bader, 2016. "Global neuroscience and mental health research: a bibliometrics case study," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(1), pages 515-531, October.
    12. Gianluca Fabiano & Andrea Marcellusi & Giampiero Favato, 2020. "Public–private contribution to biopharmaceutical discoveries: a bibliometric analysis of biomedical research in UK," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 153-168, July.
    13. Giovanni Abramo & Ciriaco Andrea D’Angelo, 2022. "Drivers of academic engagement in public–private research collaboration: an empirical study," The Journal of Technology Transfer, Springer, vol. 47(6), pages 1861-1884, December.
    14. Lutz Bornmann & Werner Marx, 2014. "How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 487-509, January.
    15. Abramo, Giovanni & D'Angelo, Ciriaco Andrea & Grilli, Leonardo, 2021. "The effects of citation-based research evaluation schemes on self-citation behavior," Journal of Informetrics, Elsevier, vol. 15(4).
    16. Sergio Copiello, 2019. "Research Interest: another undisclosed (and redundant) algorithm by ResearchGate," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 351-360, July.
    17. Abramo, Giovanni & D’Angelo, Ciriaco Andrea, 2014. "Assessing national strengths and weaknesses in research fields," Journal of Informetrics, Elsevier, vol. 8(3), pages 766-775.
    18. Yan, Weiwei & Zhang, Yin, 2018. "Research universities on the ResearchGate social networking site: An examination of institutional differences, research activity level, and social networks formed," Journal of Informetrics, Elsevier, vol. 12(1), pages 385-400.
    19. Giovanni Abramo & Ciriaco Andrea D’Angelo & Anastasiia Soldatenkova, 2016. "The dispersion of the citation distribution of top scientists’ publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1711-1724, December.
    20. Abramo, Giovanni & Cicero, Tindaro & D’Angelo, Ciriaco Andrea, 2015. "Should the research performance of scientists be distinguished by gender?," Journal of Informetrics, Elsevier, vol. 9(1), pages 25-38.
