
Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation

Authors

  • Ciriaco Andrea D’Angelo

    (University of Rome “Tor Vergata”)

  • Nees Jan van Eck

    (Leiden University)

Abstract

The disambiguation of author names is an important and challenging task in bibliometrics. We propose an approach that relies on an external source of information for selecting and validating clusters of publications identified through an unsupervised author name disambiguation method. Applying the proposed approach to a random sample of Italian scholars yields encouraging results, with overall precision, recall, and F-measure all above 96%. The proposed approach can serve as a starting point for a large-scale census of publication portfolios for bibliometric analyses at the level of individual researchers.
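
For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, and F-measure could be computed for a single researcher's publication portfolio. It is not the authors' implementation: the function name, the publication identifiers, and the example data are all hypothetical, and the F-measure shown is the standard harmonic mean of precision and recall (F1).

```python
def precision_recall_f1(assigned, true_portfolio):
    """Compare the publications assigned to a researcher by a
    disambiguation method with the researcher's true portfolio.

    Both arguments are iterables of publication identifiers
    (hypothetical IDs, used here for illustration only).
    """
    assigned = set(assigned)
    true_portfolio = set(true_portfolio)

    # Publications that were correctly assigned (true positives).
    correct = len(assigned & true_portfolio)

    # Precision: share of assigned publications that are correct.
    precision = correct / len(assigned) if assigned else 0.0
    # Recall: share of the true portfolio that was retrieved.
    recall = correct / len(true_portfolio) if true_portfolio else 0.0

    # F-measure (F1): harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: four publications assigned, three of them
# correct, and one true publication missed.
assigned = {"pub1", "pub2", "pub3", "pub4"}
truth = {"pub1", "pub2", "pub3", "pub5"}
precision, recall, f1 = precision_recall_f1(assigned, truth)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Overall figures such as those reported in the abstract would typically aggregate such counts over all researchers in the sample; the exact aggregation used in the paper is not stated on this page.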

Suggested Citation

  • Ciriaco Andrea D’Angelo & Nees Jan van Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
  • Handle: RePEc:spr:scient:v:123:y:2020:i:2:d:10.1007_s11192-020-03410-y
    DOI: 10.1007/s11192-020-03410-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03410-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Dag W. Aksnes, 2008. "When different persons have an identical author name. How frequent are homonyms?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(5), pages 838-841, March.
    2. Hirotaka Kawashima & Hiroyuki Tomizawa, 2015. "Accuracy evaluation of Scopus Author ID based on the largest funding database in Japan," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(3), pages 1061-1071, June.
    3. Ricardo G. Cota & Anderson A. Ferreira & Cristiano Nascimento & Marcos André Gonçalves & Alberto H. F. Laender, 2010. "An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(9), pages 1853-1870, September.
    4. Vincent Larivière & Rodrigo Costas, 2016. "How Many Is Too Many? On the Relationship between Research Productivity and Impact," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-10, September.
    5. Wanli Liu & Rezarta Islamaj Doğan & Sun Kim & Donald C. Comeau & Won Kim & Lana Yeganova & Zhiyong Lu & W. John Wilbur, 2014. "Author name disambiguation for PubMed," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(4), pages 765-781, April.
    6. Mehmet Ali Abdulhayoglu & Bart Thijs, 2017. "Use of ResearchGate and Google CSE for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1965-1985, June.
    7. Ciriaco Andrea D'Angelo & Cristiano Giuffrida & Giovanni Abramo, 2011. "A heuristic approach to author name disambiguation in bibliometrics databases for large‐scale research assessments," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(2), pages 257-269, February.
    8. Jan Youtie & Stephen Carley & Alan L. Porter & Philip Shapira, 2017. "Tracking researchers and their outputs: new insights from ORCIDs," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(1), pages 437-453, October.
    9. Zaida Chinchilla-Rodríguez & Yi Bu & Nicolás Robinson-García & Rodrigo Costas & Cassidy R. Sugimoto, 2018. "Travel bans and scientific mobility: utility of asymmetry and affinity indexes to inform science policy," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 569-590, July.
    10. Laurel L. Cornell, 1982. "Duplication of Japanese names: a problem in citations and bibliographies," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 33(2), pages 102-104, March.
    11. Ruiz-Castillo, Javier & Costas, Rodrigo, 2014. "The skewness of scientific productivity," Journal of Informetrics, Elsevier, vol. 8(4), pages 917-934.
    12. Ricardo G. Cota & Anderson A. Ferreira & Cristiano Nascimento & Marcos André Gonçalves & Alberto H. F. Laender, 2010. "An unsupervised heuristic‐based hierarchical method for name disambiguation in bibliographic citations," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(9), pages 1853-1870, September.
    13. Robert Tijssen & Alfredo Yegros, 2017. "UK universities and European industry," Nature, Nature, vol. 544(7648), pages 35-35, April.
    14. Michael Levin & Stefan Krawczyk & Steven Bethard & Dan Jurafsky, 2012. "Citation‐based bootstrapping for large‐scale author disambiguation," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(5), pages 1030-1047, May.
    15. Mark-Christoph Müller & Florian Reitz & Nicolas Roy, 2017. "Data sets for author name disambiguation: an empirical analysis and a new resource," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1467-1500, June.
    16. Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
    17. José M. Soler, 2007. "Separating the articles of authors with the same name," Scientometrics, Springer;Akadémiai Kiadó, vol. 72(2), pages 281-290, August.
    18. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
    19. Andreas Strotmann & Dangzhi Zhao, 2012. "Author name disambiguation: What difference does it make in author-based citation analysis?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(9), pages 1820-1833, September.
    20. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    21. Robinson-Garcia, Nicolás & Sugimoto, Cassidy R. & Murray, Dakota & Yegros-Yegros, Alfredo & Larivière, Vincent & Costas, Rodrigo, 2019. "The many faces of mobility: Using bibliometric data to measure the movement of scientists," Journal of Informetrics, Elsevier, vol. 13(1), pages 50-63.
    22. Ciriaco Andrea D'Angelo & Cristiano Giuffrida & Giovanni Abramo, 2011. "A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(2), pages 257-269, February.
    23. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    24. Andreas Strotmann & Dangzhi Zhao, 2012. "Author name disambiguation: What difference does it make in author‐based citation analysis?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(9), pages 1820-1833, September.
    25. Fernanda Morillo & Ignacio Santabárbara & Javier Aparicio, 2013. "The automatic normalisation challenge: detailed addresses identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 953-966, June.
    26. Jinseok Kim, 2018. "Evaluating author name disambiguation for digital libraries: a case of DBLP," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1867-1886, September.
    27. Michael Levin & Stefan Krawczyk & Steven Bethard & Dan Jurafsky, 2012. "Citation-based bootstrapping for large-scale author disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(5), pages 1030-1047, May.

    Citations

    Cited by:

    1. Yezhu Wang & Yundong Xie & Dong Wang & Lu Guo & Rongting Zhou, 2022. "Do cover papers get better citations and usage counts? An analysis of 42 journals in cell biology," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 3793-3813, July.
    2. Rehs, Andreas, 2021. "A supervised machine learning approach to author disambiguation in the Web of Science," Journal of Informetrics, Elsevier, vol. 15(3).
    3. Paul Sebo & Sylvain de Lucia & Nathalie Vernaz, 2021. "Accuracy of PubMed-based author lists of publications and use of author identifiers to address author name ambiguity: a cross-sectional study," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4121-4135, May.
    4. Domenico A. Maisano & Luca Mastrogiacomo & Fiorenzo Franceschini, 2023. "Empirical evidence on the relationship between research and teaching in academia," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4475-4507, August.
    5. Asli Ebru Şanlitürk & Samin Aref & Emilio Zagheni & Francesco C. Billari, 2022. "Homecoming after Brexit: evidence on academic migration from bibliometric data," MPIDR Working Papers WP-2022-019, Max Planck Institute for Demographic Research, Rostock, Germany.
    6. Tong Li & Yanfen Wang & Lizhen Cui & Ranjay K. Singh & Hongdou Liu & Xiufang Song & Zhihong Xu & Xiaoyong Cui, 2023. "Exploring the evolving landscape of COVID-19 interfaced with livelihoods," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-12, December.
    7. Xinyi Zhao & Samin Aref & Emilio Zagheni & Guy Stecklov, 2022. "Return migration of German-affiliated researchers: analyzing departure and return by gender, cohort, and discipline using Scopus bibliometric data 1996–2020," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7707-7729, December.
    8. Alexander Subbotin & Samin Aref, 2021. "Brain drain and brain gain in Russia: Analyzing international migration of researchers by discipline using Scopus bibliometric data 1996–2020," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7875-7900, September.
    9. Xinyuan Zhang & Qing Xie & Chaemin Song & Min Song, 2022. "Mining the evolutionary process of knowledge through multiple relationships between keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2023-2053, April.
    10. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jinseok Kim & Jason Owen-Smith, 2021. "ORCID-linked labeled data for evaluating author name disambiguation at scale," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2057-2083, March.
    2. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    3. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    4. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    5. Jinseok Kim, 2018. "Evaluating author name disambiguation for digital libraries: a case of DBLP," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1867-1886, September.
    6. Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.
    7. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    8. Humaira Waqas & Abdul Qadir, 2022. "Completing features for author name disambiguation (AND): an empirical analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 1039-1063, February.
    9. Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
    10. Li Zhang & Wei Lu & Jinqing Yang, 2023. "LAGOS‐AND: A large gold standard dataset for scholarly author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 168-185, February.
    11. Jinseok Kim & Jenna Kim, 2018. "The impact of imbalanced training data on machine learning for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 511-526, October.
    12. Humaira Waqas & Muhammad Abdul Qadir, 2021. "Multilayer heuristics based clustering framework (MHCF) for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7637-7678, September.
    13. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
    14. Milojević, Staša, 2013. "Accuracy of simple, initials-based methods for author name disambiguation," Journal of Informetrics, Elsevier, vol. 7(4), pages 767-773.
    15. Liu, Meijun & Hu, Xiao, 2021. "Will collaborators make scientists move? A Generalized Propensity Score analysis," Journal of Informetrics, Elsevier, vol. 15(1).
    16. Lutz Bornmann & Werner Marx, 2014. "How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 487-509, January.
    17. Hao Wu & Bo Li & Yijian Pei & Jun He, 2014. "Unsupervised author disambiguation using Dempster–Shafer theory," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 1955-1972, December.
    18. Maxim Kotsemir & Sergey Shashnov, 2017. "Measuring, analysis and visualization of research capacity of university at the level of departments and staff members," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1659-1689, September.
    19. Mehmet Ali Abdulhayoglu & Bart Thijs, 2017. "Use of ResearchGate and Google CSE for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1965-1985, June.
    20. KM. Pooja & Samrat Mondal & Joydeep Chandra, 2021. "Exploiting similarities across multiple dimensions for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7525-7560, September.
