Printed from https://ideas.repec.org/a/spr/scient/v99y2014i3d10.1007_s11192-013-1214-2.html

Institution name disambiguation for research assessment

Authors
  • Shuiqing Huang

    (Nanjing Agricultural University)

  • Bo Yang

    (Nanjing Agricultural University)

  • Sulan Yan

    (Nanjing Agricultural University)

  • Ronald Rousseau

    (University of Antwerp (UA); KU Leuven)

Abstract

Research evaluation is necessary for the management of academic units (scientists, research groups, departments, institutes, universities) and for government decision-making in science and technology. Yet wrong conclusions may be drawn when authors are incorrectly assigned to institutions. To improve on existing institution name disambiguation (IND) techniques, which rely on word similarity or edit distance, this study proposes a rule-based algorithm. The algorithm recognizes the one-to-many relationship between an institution and the many variant names under which it appears in publication bylines, using statistical methods and specific rules. The performance of the rule-based IND algorithm is evaluated on large datasets from four fields. The experimental results show that the algorithm's precision is high, but its recall still needs improvement.
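The abstract contrasts a rule-based approach with IND techniques built on word similarity or edit distance, and reports the result as precision and recall. The sketch below is illustrative only and is not the authors' algorithm: the abbreviation table, the keyword priority order, and the function names `canonical_key`, `group_variants`, and `pairwise_precision_recall` are hypothetical, and the example addresses are invented. It shows one way to map many byline variants to one institution key with explicit rules, an edit-distance baseline for comparison, and pairwise precision/recall against a hand-labelled gold mapping.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical rule tables for illustration only; NOT the rules of Huang et al.
ABBREVIATIONS = {"univ": "university", "agr": "agricultural",
                 "inst": "institute", "coll": "college"}
# Institution-level keywords, tried in priority order (university before college).
INSTITUTION_WORDS = ("university", "institute", "college")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the string-similarity baseline the abstract contrasts with rules."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def canonical_key(address: str) -> Optional[str]:
    """Reduce a byline address to an institution key, or None if no (hypothetical) rule fires."""
    # Lowercase, keep alphabetic tokens only, expand known abbreviations per segment.
    segments = [
        [ABBREVIATIONS.get(t, t) for t in re.findall(r"[a-z]+", seg.lower())]
        for seg in address.split(",")
    ]
    # Return the first segment containing the highest-priority institution keyword.
    for word in INSTITUTION_WORDS:
        for tokens in segments:
            if word in tokens:
                return " ".join(tokens)
    return None  # unresolved: costs recall, but avoids a wrong merge


def group_variants(addresses: Iterable[str]) -> Dict[str, Set[str]]:
    """One-to-many map from each institution key to the variant names that reduce to it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for address in addresses:
        key = canonical_key(address)
        if key is not None:
            groups[key].add(address)
    return dict(groups)


def pairwise_precision_recall(predicted: Dict[str, str],
                              gold: Dict[str, str]) -> Tuple[float, float]:
    """Pairwise precision/recall; both dicts map the same addresses to cluster labels."""
    def same_cluster_pairs(labels: Dict[str, str]) -> Set[Tuple[str, str]]:
        return {(a, b) for a, b in combinations(sorted(labels), 2) if labels[a] == labels[b]}

    pred, true = same_cluster_pairs(predicted), same_cluster_pairs(gold)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(true) if true else 1.0
    return precision, recall


if __name__ == "__main__":
    # Invented example addresses with hand-assigned gold labels.
    gold = {
        "Nanjing Agr Univ, Coll Informat Sci & Technol, Nanjing 210095, Peoples R China": "NAU",
        "Coll Informat Sci & Technol, Nanjing Agr Univ, Nanjing, China": "NAU",
        "Nanjing Agricultural University, Nanjing, China": "NAU",
        "NAU, Nanjing, China": "NAU",
        "Nanjing Univ, Sch Informat Management, Nanjing, China": "NJU",
    }
    predicted = {a: canonical_key(a) or a for a in gold}  # unresolved -> singleton cluster
    print(group_variants(gold))
    print(pairwise_precision_recall(predicted, gold))  # (1.0, 0.5)
    print(edit_distance("Nanjing Univ", "Nanjing Agr Univ"))  # 4
```

In this toy run, the acronym variant "NAU, Nanjing, China" matches no rule and stays unresolved, so the grouping makes no false merges but recovers only half of the true same-institution pairs. Meanwhile the raw strings "Nanjing Univ" and "Nanjing Agr Univ", which name different institutions, are only four edits apart, which illustrates why a bare distance threshold can merge distinct universities.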

Suggested Citation

  • Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
  • Handle: RePEc:spr:scient:v:99:y:2014:i:3:d:10.1007_s11192-013-1214-2
    DOI: 10.1007/s11192-013-1214-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-013-1214-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Sungwon Kim & Seongyun Cho, 2013. "Characteristics of Korean personal names," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(1), pages 86-95, January.
    2. Carmen Galvez & Félix Moya-Anegón, 2007. "Standardizing formats of corporate source data," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(1), pages 3-26, January.
    3. Ricardo G. Cota & Anderson A. Ferreira & Cristiano Nascimento & Marcos André Gonçalves & Alberto H. F. Laender, 2010. "An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(9), pages 1853-1870, September.
    4. Ciriaco Andrea D'Angelo & Cristiano Giuffrida & Giovanni Abramo, 2011. "A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(2), pages 257-269, February.
    5. Natsuo Onodera & Mariko Iwasawa & Nobuyuki Midorikawa & Fuyuki Yoshikane & Kou Amano & Yutaka Ootani & Tadashi Kodama & Yasuhiko Kiyama & Hiroyuki Tsunoda & Shizuka Yamazaki, 2011. "A method for eliminating articles by homonymous authors from the large number of articles retrieved by author search," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(4), pages 677-690, April.
    6. Carmen Galvez & Félix Moya-Anegón, 2006. "The unification of institutional addresses applying parametrized finite-state graphs (P-FSG)," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(2), pages 323-345, November.
    7. Denilson Alves Pereira & Berthier Ribeiro-Neto & Nivio Ziviani & Alberto H.F. Laender & Marcos André Gonçalves, 2011. "A generic Web-based entity resolution framework," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(5), pages 919-932, May.
    8. Anthony F. J. van Raan, 2005. "Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 62(1), pages 133-143, January.
    9. Andreas Strotmann & Dangzhi Zhao, 2012. "Author name disambiguation: What difference does it make in author-based citation analysis?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(9), pages 1820-1833, September.
    10. Fernanda Morillo & Javier Aparicio & Borja González-Albo & Luz Moreno, 2013. "Towards the automation of address identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 207-224, January.
    11. Yong Jiang & Hai-Tao Zheng & Xinmin Wang & Binggan Lu & Kaihua Wu, 2011. "Affiliation disambiguation for constructing semantic digital libraries," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(6), pages 1029-1041, June.
    12. Michael Levin & Stefan Krawczyk & Steven Bethard & Dan Jurafsky, 2012. "Citation-based bootstrapping for large-scale author disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(5), pages 1030-1047, May.
    13. Vetle I. Torvik & Marc Weeber & Don R. Swanson & Neil R. Smalheiser, 2005. "A probabilistic similarity metric for Medline records: A model for author name disambiguation," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 56(2), pages 140-158, January.
    14. Abramo, Giovanni & Cicero, Tindaro & D’Angelo, Ciriaco Andrea, 2011. "A field-standardized application of DEA to national-scale research assessment of universities," Journal of Informetrics, Elsevier, vol. 5(4), pages 618-628.
    15. Edit Csajbók & Anna Berhidi & Lívia Vasas & András Schubert, 2007. "Hirsch-index for countries based on Essential Science Indicators data," Scientometrics, Springer;Akadémiai Kiadó, vol. 73(1), pages 91-117, October.
    16. James C. French & Allison L. Powell & Eric Schulman, 2000. "Using clustering strategies for creating authority files," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(8), pages 774-786.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Ciriaco Andrea D’Angelo & Nees Jan van Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
    2. Yongwen Huang & Jiao Li & Tan Sun & Guojian Xian, 2020. "Institution information specification and correlation based on institutional PIDs and IND tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 381-396, January.
    3. Fernanda Morillo & Belén Álvarez-Bornstein, 2018. "How to automatically identify major research sponsors selecting keywords from the WoS Funding Agency field," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1755-1770, December.
    4. Andrea Ancona & Roy Cerqueti & Gianluca Vagnani, 2023. "A novel methodology to disambiguate organization names: an application to EU Framework Programmes data," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4447-4474, August.
    5. Maxim Kotsemir & Sergey Shashnov, 2017. "Measuring, analysis and visualization of research capacity of university at the level of departments and staff members," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1659-1689, September.
    6. Si Shen & Ronald Rousseau & Dongbo Wang, 2018. "Do papers with an institutional e-mail address receive more citations than those with a non-institutional one?," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 1039-1050, May.
    7. Osmo Kivinen & Juha Hedman & Kalle Artukka, 2017. "Scientific publishing and global university rankings. How well are top publishing universities recognized?," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 679-695, July.
    8. Tomaz Bartol & Gordana Budimir & Primoz Juznic & Karmen Stopar, 2016. "Mapping and classification of agriculture in Web of Science: other subject categories and research fields may benefit," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 979-996, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    2. Jinseok Kim & Jason Owen-Smith, 2021. "ORCID-linked labeled data for evaluating author name disambiguation at scale," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2057-2083, March.
    3. Ciriaco Andrea D’Angelo & Nees Jan van Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
    4. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    5. Jinseok Kim, 2018. "Evaluating author name disambiguation for digital libraries: a case of DBLP," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1867-1886, September.
    6. Jian Wang & Kaspars Berzins & Diana Hicks & Julia Melkers & Fang Xiao & Diogo Pinheiro, 2012. "A boosted-trees method for name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 391-411, November.
    7. Milojević, Staša, 2013. "Accuracy of simple, initials-based methods for author name disambiguation," Journal of Informetrics, Elsevier, vol. 7(4), pages 767-773.
    8. Yongwen Huang & Jiao Li & Tan Sun & Guojian Xian, 2020. "Institution information specification and correlation based on institutional PIDs and IND tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 381-396, January.
    9. Hao Wu & Bo Li & Yijian Pei & Jun He, 2014. "Unsupervised author disambiguation using Dempster–Shafer theory," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 1955-1972, December.
    10. Rehs, Andreas, 2021. "A supervised machine learning approach to author disambiguation in the Web of Science," Journal of Informetrics, Elsevier, vol. 15(3).
    11. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
    12. Pascal Cuxac & Jean-Charles Lamirel & Valerie Bonvallot, 2013. "Efficient supervised and semi-supervised approaches for affiliations disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 97(1), pages 47-58, October.
    13. Lutz Bornmann & Werner Marx, 2014. "How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 487-509, January.
    14. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    15. Fernanda Morillo & Ignacio Santabárbara & Javier Aparicio, 2013. "The automatic normalisation challenge: detailed addresses identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 953-966, June.
    16. Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.
    17. Maxim Kotsemir & Sergey Shashnov, 2017. "Measuring, analysis and visualization of research capacity of university at the level of departments and staff members," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1659-1689, September.
    18. Kim, Jinseok & Diesner, Jana, 2015. "The effect of data pre-processing on understanding the evolution of collaboration networks," Journal of Informetrics, Elsevier, vol. 9(1), pages 226-236.
    19. Jun-Ping Qiu & Ke Dong & Hou-Qiang Yu, 2014. "Comparative study on structure and correlation among author co-occurrence networks in bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1345-1360, November.
    20. Freeman, Richard B. & Huang, Wei, 2014. "Collaborating With People Like Me: Ethnic Co-authorship within the US," IZA Discussion Papers 8432, Institute of Labor Economics (IZA).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:scient:v:99:y:2014:i:3:d:10.1007_s11192-013-1214-2. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registration allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by submitting the corresponding correction form on IDEAS.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as some citations may be waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of the provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.