IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v95y2013i3d10.1007_s11192-013-0965-0.html

The automatic normalisation challenge: detailed addresses identification

Author

Listed:
  • Fernanda Morillo

    (Spanish National Research Council (CSIC))

  • Ignacio Santabárbara

    (Spanish National Research Council (CSIC))

  • Javier Aparicio

    (Spanish National Research Council (CSIC))

Abstract

The correct attribution of scientific publications to their true owners is extremely important, given the detailed evaluation processes and the future investments based on them. This attribution is hard work for bibliometricians because of the growing number of documents and the rise in collaboration. Nevertheless, no published work offers a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Thanks to our long experience in the manual normalisation of addresses, we have created and maintained various master lists. We have already developed an application to detect institutional sectors (presented in a previous paper), and we now analyse the details of particular institutions, taking advantage of our master tables. To test our methodology, we applied it to a Spanish data set that had already been manually codified (95,314 unique addresses included in the Web of Science databases for the year 2008). These data were analysed with a full-text search against our master lists, which gave candidate codes for each address and determined which addresses could be encoded automatically and which should be reviewed manually. Comparing the automatic codes with the manual ones showed that 87 % of the records were codified automatically, with an error rate of 1.9 %; only the remaining 13 % needed manual review. Finally, we applied the Wilcoxon non-parametric test to check the validity of the methodology, comparing the detailed codes of centres already encoded manually with the automatically encoded ones, and concluded that their distributions were similar, with a significance of 0.078.
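
The workflow summarised in the abstract (search each address against master lists, collect the possible codes, encode automatically or send to manual review, then compare with the manual codes) can be illustrated with a minimal Python sketch. This is not the authors' application. The master list entries, the function names, the rule "encode automatically only when exactly one code matches", and the definition of the error rate as the share of wrong codes among automatically encoded records are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical master list: institution code -> name variants (illustrative only).
MASTER_LIST = {
    "CSIC-IFS": ["inst filosofia", "instituto de filosofia"],
    "UCM": ["univ complutense madrid", "universidad complutense"],
    "UAM": ["univ autonoma madrid"],
}


def normalise(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def candidate_codes(address, master=MASTER_LIST):
    """Return the sorted codes whose variants occur as whole-word phrases."""
    padded = f" {normalise(address)} "
    return sorted(
        code
        for code, variants in master.items()
        if any(f" {normalise(v)} " in padded for v in variants)
    )


def classify(address):
    """Assumed rule: encode automatically only when exactly one code matches."""
    codes = candidate_codes(address)
    if len(codes) == 1:
        return codes[0], "automatic"
    return None, "manual review"


def evaluate(records):
    """records: iterable of (address, manual_code) pairs.

    Returns (share encoded automatically, error rate among those records).
    """
    auto = errors = total = 0
    for address, manual_code in records:
        total += 1
        code, status = classify(address)
        if status == "automatic":
            auto += 1
            errors += code != manual_code
    return auto / total, (errors / auto if auto else 0.0)
```

Applied to a manually codified data set, `evaluate` would return two figures of the same kind as the 87 % and 1.9 % reported in the abstract, although the authors' exact matching and decision rules are not described here and may differ.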

Suggested Citation

  • Fernanda Morillo & Ignacio Santabárbara & Javier Aparicio, 2013. "The automatic normalisation challenge: detailed addresses identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 953-966, June.
  • Handle: RePEc:spr:scient:v:95:y:2013:i:3:d:10.1007_s11192-013-0965-0
    DOI: 10.1007/s11192-013-0965-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-013-0965-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Jian Wang & Kaspars Berzins & Diana Hicks & Julia Melkers & Fang Xiao & Diogo Pinheiro, 2012. "A boosted-trees method for name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 391-411, November.
    2. Almeida, J.A.S. & Pais, A.A.C.C. & Formosinho, S.J., 2009. "Science indicators and science patterns in Europe," Journal of Informetrics, Elsevier, vol. 3(2), pages 134-142.
    3. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Journal of Informetrics, Elsevier, vol. 4(4), pages 564-580.
    4. Bart Thijs & Wolfgang Glänzel, 2008. "A structural analysis of publication profiles for the classification of European research institutes," Scientometrics, Springer;Akadémiai Kiadó, vol. 74(2), pages 223-236, February.
    5. Ciriaco Andrea D'Angelo & Cristiano Giuffrida & Giovanni Abramo, 2011. "A heuristic approach to author name disambiguation in bibliometrics databases for large‐scale research assessments," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(2), pages 257-269, February.
    6. Thomas Gurney & Edwin Horlings & Peter van den Besselaar, 2012. "Author disambiguation using multi-aspect similarity indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(2), pages 435-449, May.
    7. Giovanni Abramo & Ciriaco Andrea D’Angelo & Fabio Pugini, 2008. "The measurement of Italian universities’ research productivity by a non parametric-bibliometric methodology," Scientometrics, Springer;Akadémiai Kiadó, vol. 76(2), pages 225-244, August.
    8. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Discussion Papers "Innovation Systems and Policy Analysis" 22, Fraunhofer Institute for Systems and Innovation Research (ISI).
    9. William W. Hood & Concepción S. Wilson, 2003. "Informetric studies using databases: Opportunities and challenges," Scientometrics, Springer;Akadémiai Kiadó, vol. 58(3), pages 587-608, November.
    10. Bornmann, Lutz & Ozimek, Adam, 2012. "Stata commands for importing bibliometric data and processing author address information," Journal of Informetrics, Elsevier, vol. 6(4), pages 505-512.
    11. Ciriaco Andrea D'Angelo & Cristiano Giuffrida & Giovanni Abramo, 2011. "A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(2), pages 257-269, February.
    12. Anthony F. J. van Raan, 2005. "Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 62(1), pages 133-143, January.
    13. Giovanni Abramo & Ciriaco Andrea D’Angelo & Flavia Di Costa, 2011. "National research assessment exercises: the effects of changing the rules of the game during the game," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(1), pages 229-238, July.
    14. Fernanda Morillo & Javier Aparicio & Borja González-Albo & Luz Moreno, 2013. "Towards the automation of address identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 207-224, January.
    15. Perianes-Rodríguez, Antonio & Chinchilla-Rodríguez, Zaida & Vargas-Quesada, Benjamín & Olmeda Gómez, Carlos & Moya-Anegón, Félix, 2009. "Synthetic hybrid indicators based on scientific collaboration to quantify and evaluate individual research results," Journal of Informetrics, Elsevier, vol. 3(2), pages 91-101.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Cova, Tânia F.G.G. & Jarmelo, Susana & Formosinho, Sebastião J. & de Melo, J. Sérgio Seixas & Pais, Alberto A.C.C., 2015. "Unsupervised characterization of research institutions with task-force estimation," Journal of Informetrics, Elsevier, vol. 9(1), pages 59-68.
    2. Ciriaco Andrea D’Angelo & Nees Jan van Eck, 2020. "Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 883-907, May.
    3. Fernanda Morillo & Belén Álvarez-Bornstein, 2018. "How to automatically identify major research sponsors selecting keywords from the WoS Funding Agency field," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1755-1770, December.
    4. Fernanda Morillo & Adrián A. Díaz-Faes & Borja González-Albo & Luz Moreno, 2014. "Do networking centres perform better? An exploratory analysis in Psychiatry and Gastroenterology/Hepatology in Spain," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1401-1416, February.
    5. Fernanda Morillo & Rodrigo Costas & María Bordons, 2015. "How is credit given to networking centres in their publications? A case study of the Spanish CIBER research structures," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(3), pages 923-938, June.
    6. Daniela De Filippo & Fernanda Morillo & Borja González-Albo, 2023. "Measuring the Impact and Influence of Scientific Activity in the Humanities and Social Sciences," Publications, MDPI, vol. 11(2), pages 1-17, June.
    7. Fernanda Morillo & Preiddy Efrain-Garcia, 2015. "A bibliometric analysis of Technology Centres," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 685-713, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cova, Tânia F.G.G. & Jarmelo, Susana & Formosinho, Sebastião J. & de Melo, J. Sérgio Seixas & Pais, Alberto A.C.C., 2015. "Unsupervised characterization of research institutions with task-force estimation," Journal of Informetrics, Elsevier, vol. 9(1), pages 59-68.
    2. Gagolewski, Marek, 2011. "Bibliometric impact assessment with R and the CITAN package," Journal of Informetrics, Elsevier, vol. 5(4), pages 678-692.
    3. Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
    4. Jinseok Kim & Jinmo Kim & Jason Owen-Smith, 2019. "Generating automatically labeled data for author name disambiguation: an iterative clustering method," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 253-280, January.
    5. Lutz Bornmann & Werner Marx, 2014. "How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 487-509, January.
    6. Dominik P. Heinisch & Guido Buenstorf, 2018. "The next generation (plus one): an analysis of doctoral students’ academic fecundity based on a novel approach to advisor identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 351-380, October.
    7. Giovanni Abramo & Corrado Costa & Ciriaco Andrea D’Angelo, 2015. "A multivariate stochastic model to assess research performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1755-1772, February.
    8. Pascal Cuxac & Jean-Charles Lamirel & Valerie Bonvallot, 2013. "Efficient supervised and semi-supervised approaches for affiliations disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 97(1), pages 47-58, October.
    9. Abramo, Giovanni & D’Angelo, Ciriaco Andrea & Soldatenkova, Anastasiia, 2016. "The ratio of top scientists to the academic staff as an indicator of the competitive strength of universities," Journal of Informetrics, Elsevier, vol. 10(2), pages 596-605.
    10. Jan Schulz, 2016. "Using Monte Carlo simulations to assess the impact of author name disambiguation quality on different bibliometric analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1283-1298, June.
    11. Mehdi Rhaiem & Nabil Amara, 2020. "Determinants of research efficiency in Canadian business schools: evidence from scholar-level data," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 53-99, October.
    12. Gianluca Fabiano & Andrea Marcellusi & Giampiero Favato, 2020. "Public–private contribution to biopharmaceutical discoveries: a bibliometric analysis of biomedical research in UK," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 153-168, July.
    13. Omar Hernando Avila-Poveda, 2014. "Technical report: the trend of author compound names and its implications for authorship identity identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 833-846, October.
    14. Abramo, Giovanni & D'Angelo, Ciriaco Andrea & Grilli, Leonardo, 2021. "The effects of citation-based research evaluation schemes on self-citation behavior," Journal of Informetrics, Elsevier, vol. 15(4).
    15. Giovanni Abramo & Ciriaco Andrea D’Angelo & Anastasiia Soldatenkova, 2016. "The dispersion of the citation distribution of top scientists’ publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1711-1724, December.
    16. Giovanni Abramo & Ciriaco Andrea D’Angelo & Flavia Di Costa, 2011. "A national-scale cross-time analysis of university research performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(2), pages 399-413, May.
    17. Abramo, Giovanni & Cicero, Tindaro & D’Angelo, Ciriaco Andrea, 2015. "Should the research performance of scientists be distinguished by gender?," Journal of Informetrics, Elsevier, vol. 9(1), pages 25-38.
    18. Guillaume Cabanac, 2012. "Shaping the landscape of research in information systems from the perspective of editorial boards: A scientometric study of 77 leading journals," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(5), pages 977-996, May.
    19. Giovanni Abramo & Ciriaco D’Angelo, 2015. "An assessment of the first “scientific habilitation” for university appointments in Italy," Economia Politica: Journal of Analytical and Institutional Economics, Springer;Fondazione Edison, vol. 32(3), pages 329-357, December.
    20. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Journal of Informetrics, Elsevier, vol. 4(4), pages 564-580.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.