
A novel methodology to disambiguate organization names: an application to EU Framework Programmes data

Authors

  • Andrea Ancona (Sapienza University of Rome)
  • Roy Cerqueti (Sapienza University of Rome; University of Angers)
  • Gianluca Vagnani (Sapienza University of Rome)

Abstract

Collaborative R&D has attracted increasing interest among scholars and policy-makers, and collaboration is now a pivotal determinant of innovation. Reliable data are a necessary condition for obtaining valuable results. Specifically, in a collaborative environment, we must avoid mistaken identities among organizations. In many datasets, the same organization appears under several different names, so its information is split across multiple entities. In this work, we propose a novel methodology to disambiguate organization names. In particular, we combine supervised and unsupervised techniques to design a “hybrid” methodology that is neither fully automated nor completely manual and is easy to adapt to many different datasets. The flexibility and potential scalability of the methodology make this paper a worthwhile contribution to different research fields. We provide an empirical application of the methodology to the dataset of participants in projects funded by the first three European Framework Programmes. We chose this dataset because we can test the quality of our procedure by comparing the refined dataset it returns with a well-recognized benchmark (i.e., the EUPRO database) in terms of the connection structure of the collaborative networks. Our results show the advantages of our approach in terms of the quality of the obtained dataset and the efficiency of the designed methodology, leaving room for the integration of affiliation hierarchies in the future.
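
To make the idea of a procedure that is "neither fully automated nor completely manual" concrete, the following is a minimal sketch and not the authors' method, which the abstract does not specify. It assumes a plain string-similarity score with two hypothetical thresholds: pairs above the higher one are merged automatically, pairs between the two are set aside for manual review. The legal-form list, the threshold values, the function names, and the sample organization names are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-form tokens to ignore; not taken from the paper.
LEGAL_FORMS = {"ltd", "gmbh", "spa", "sa", "inc", "plc", "bv", "ag"}


def normalize(name: str) -> str:
    """Lowercase, drop dots and other punctuation, remove legal-form tokens."""
    text = name.lower().replace(".", "")
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(t for t in text.split() if t not in LEGAL_FORMS)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(names, auto=0.95, review=0.80):
    """Split name pairs into auto-merged pairs and pairs left for manual review.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    merged, to_review = [], []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(a, b)
            if score >= auto:
                merged.append((a, b))
            elif score >= review:
                to_review.append((a, b))
    return merged, to_review


# Illustrative names only; not drawn from the Framework Programmes dataset.
names = [
    "Siemens AG",
    "SIEMENS A.G.",
    "Siemens Nixdorf",
    "University of Rome La Sapienza",
]
merged, to_review = triage(names)
```

Here `merged` contains the pair of Siemens variants, which normalize to the same string. The all-pairs loop is quadratic in the number of names, so a real dataset would likely need a blocking step (for example, comparing only names that share a country or a first token) before scoring.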

Suggested Citation

  • Andrea Ancona & Roy Cerqueti & Gianluca Vagnani, 2023. "A novel methodology to disambiguate organization names: an application to EU Framework Programmes data," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4447-4474, August.
  • Handle: RePEc:spr:scient:v:128:y:2023:i:8:d:10.1007_s11192-023-04746-x
    DOI: 10.1007/s11192-023-04746-x


