Printed from https://ideas.repec.org/a/spr/scient/v70y2007i1d10.1007_s11192-007-0101-0.html

Standardizing formats of corporate source data

Author

Listed:
  • Carmen Galvez

    (University of Granada)

  • Félix Moya-Anegón

    (University of Granada)

Abstract

This paper describes an approach for improving the data quality of corporate sources when databases are used for bibliometric purposes. Research management relies on bibliographic databases and citation index systems as analytical tools, yet the raw resources for bibliometric studies are plagued by inconsistent field formatting for institution data. The present contribution puts forth a Natural Language Processing (NLP)-oriented method for identifying the structures that guide corporate data and mapping them into a standardized format. The proposed unification process is based on the definition of address patterns and the subsequent application of Enhanced Finite-State Transducers (E-FST). Our procedure was tested on address formats downloaded from the INSPEC, MEDLINE and CAB Abstracts databases. The results show that the method is useful, provided that errors in the formats to be unified are closely controlled. The computational efficacy of the model is noteworthy because it is firmly guided by the definition of data in the application domain.
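
As a rough illustration of the idea summarized in the abstract (address patterns applied by a finite-state transducer), the following minimal sketch reads a comma-separated address, follows a small transition table, and emits named fields. It is not the authors' E-FST: the segment classes, the transition table, the abbreviation list, and the output field names are all assumptions made for this example, since the abstract does not specify them.

```python
import re

# Hypothetical abbreviation list used to expand tokens on output.
ABBREVIATIONS = {"dept": "Department", "dep": "Department", "univ": "University"}

def classify(segment: str) -> str:
    """Assign a coarse class to one comma-separated segment (assumed classes)."""
    s = segment.lower()
    if re.match(r"(dept|dep|department)\b", s):
        return "DEPT"
    if re.search(r"\b(univ|university)\b", s):
        return "INST"
    return "OTHER"

# Transition table: (state, input class) -> (next state, output field).
# Each path from START to an accepting state is one assumed address pattern.
TRANSITIONS = {
    ("START", "DEPT"): ("GOT_DEPT", "department"),
    ("START", "INST"): ("GOT_INST", "institution"),
    ("GOT_DEPT", "INST"): ("GOT_INST", "institution"),
    ("GOT_INST", "DEPT"): ("GOT_INST", "department"),
    ("GOT_INST", "OTHER"): ("GOT_CITY", "city"),
    ("GOT_CITY", "OTHER"): ("GOT_COUNTRY", "country"),
}
ACCEPTING = {"GOT_INST", "GOT_CITY", "GOT_COUNTRY"}

def expand(segment: str) -> str:
    """Replace known abbreviations (with or without a trailing period)."""
    words = [ABBREVIATIONS.get(w.rstrip(".").lower(), w) for w in segment.split()]
    return " ".join(words)

def transduce(address: str):
    """Map a raw address to a field dictionary, or None if no pattern matches."""
    state, record = "START", {}
    segments = [p.strip() for p in address.split(",") if p.strip()]
    for segment in segments:
        key = (state, classify(segment))
        if key not in TRANSITIONS:
            return None  # unrecognized format: leave for manual review
        state, field = TRANSITIONS[key]
        record[field] = expand(segment)
    return record if state in ACCEPTING else None

a = transduce("Dept. of Library Science, Univ. of Granada, Granada, Spain")
b = transduce("University of Granada, Department of Library Science, Granada, Spain")
```

In this sketch the two differently ordered and abbreviated inputs yield the same field dictionary, while an address that fits no pattern returns `None` rather than a guess, which reflects the abstract's point that errors in the formats to be unified need close control.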

Suggested Citation

  • Carmen Galvez & Félix Moya-Anegón, 2007. "Standardizing formats of corporate source data," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(1), pages 3-26, January.
  • Handle: RePEc:spr:scient:v:70:y:2007:i:1:d:10.1007_s11192-007-0101-0
    DOI: 10.1007/s11192-007-0101-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-007-0101-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-007-0101-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Herman Van den Berghe & Josee A. Houben & Renger E. de Bruin & Henk F. Moed & André Kint & Marc Luwel & Eric H. J. Spruyt, 1998. "Bibliometric indicators of university research performance in Flanders," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 49(1), pages 59-67.
    2. Peter Ingwersen & Finn Hjortgaard Christensen, 1997. "Data set isolation for bibliometric online analyses of research publications: Fundamental methodological issues," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 48(3), pages 205-217, March.
    3. William W. Hood & Concepción S. Wilson, 2003. "Informetric studies using databases: Opportunities and challenges," Scientometrics, Springer;Akadémiai Kiadó, vol. 58(3), pages 587-608, November.
    4. E.C.M. Noyons & H.F. Moed & M. Luwel, 1999. "Combining mapping and citation analysis for evaluative bibliometric purposes: A bibliometric study," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 50(2), pages 115-131.
    5. Paula Mählck & Olle Persson, 2000. "Socio-Bibliometric Mapping of Intra-Departmental Networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 49(1), pages 81-91, August.
    6. Martha E. Williams & Laurence Lannom, 1981. "Lack of standardization of the journal title data element in databases," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 32(3), pages 229-233, May.
    7. Anthony F. J. van Raan, 2005. "Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 62(1), pages 133-143, January.
    8. Herbertz, Heinrich & Muller-Hill, Benno, 1995. "Quality and efficiency of basic research in molecular biology: a bibliometric analysis of thirteen excellent research institutes," Research Policy, Elsevier, vol. 24(6), pages 959-979, November.
    9. Henk F. Moed, 2000. "Bibliometric Indicators Reflect Publication and Management Strategies," Scientometrics, Springer;Akadémiai Kiadó, vol. 47(2), pages 323-346, February.
    10. Félix Moya-Anegón & Benjamín Vargas-Quesada & Victor Herrero-Solana & Zaida Chinchilla-Rodríguez & Elena Corera-Álvarez & Francisco J. Munoz-Fernández, 2004. "A new technique for building maps of large scientific domains based on the cocitation of classes and categories," Scientometrics, Springer;Akadémiai Kiadó, vol. 61(1), pages 129-145, September.
    11. Carmen Galvez & Félix Moya-Anegón, 2006. "The unification of institutional addresses applying parametrized finite-state graphs (P-FSG)," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(2), pages 323-345, November.
    12. Anne B. Piternick, 1982. "Standardization of journal titles in databases," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 33(2), pages 105-105, March.
    13. James C. French & Allison L. Powell & Eric Schulman, 2000. "Using clustering strategies for creating authority files," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(8), pages 774-786.
    14. Bourke, Paul & Butler, Linda, 1998. "Institutions and the map of science: matching university departments and fields of research," Research Policy, Elsevier, vol. 26(6), pages 711-718, February.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Zehra Taşkın & Umut Al, 2014. "Standardization problem of author affiliations in citation indexes," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 347-368, January.
    2. Sjoerd Hardeman, 2013. "Organization level research in scientometrics: a plea for an explicit pragmatic approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 1175-1194, March.
    3. Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
    4. Woo Hyoung Lee, 2008. "How to identify emerging research fields using scientometrics: An example in the field of Information Security," Scientometrics, Springer;Akadémiai Kiadó, vol. 76(3), pages 503-525, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zehra Taşkın & Umut Al, 2014. "Standardization problem of author affiliations in citation indexes," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 347-368, January.
    2. Sjoerd Hardeman, 2013. "Organization level research in scientometrics: a plea for an explicit pragmatic approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 1175-1194, March.
    3. Pascal Cuxac & Jean-Charles Lamirel & Valerie Bonvallot, 2013. "Efficient supervised and semi-supervised approaches for affiliations disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 97(1), pages 47-58, October.
    4. William W. Hood & Concepción S. Wilson, 2003. "Informetric studies using databases: Opportunities and challenges," Scientometrics, Springer;Akadémiai Kiadó, vol. 58(3), pages 587-608, November.
    5. Ismael Rafols & Alan Porter & Loet Leydesdorff, 2009. "Overlay Maps of Science: a New Tool for Research Policy," SPRU Working Paper Series 179, SPRU - Science Policy Research Unit, University of Sussex Business School.
    6. Shuiqing Huang & Bo Yang & Sulan Yan & Ronald Rousseau, 2014. "Institution name disambiguation for research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 823-838, June.
    7. Rabishankar Giri & Sabuj Kumar Chaudhuri, 2021. "Ranking journals through the lens of active visibility," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2189-2208, March.
    8. Feng Li & Yong Yi & Xiaolong Guo & Wei Qi, 2012. "Performance evaluation of research universities in Mainland China, Hong Kong and Taiwan: based on a two-dimensional approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 531-542, February.
    9. Henk F. Moed, 2000. "Bibliometric Indicators Reflect Publication and Management Strategies," Scientometrics, Springer;Akadémiai Kiadó, vol. 47(2), pages 323-346, February.
    10. Mingers, John & Yang, Liying, 2017. "Evaluating journal quality: A review of journal citation indicators and ranking in business and management," European Journal of Operational Research, Elsevier, vol. 257(1), pages 323-337.
    11. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    12. He, Zi-Lin & Geng, Xue-Song & Campbell-Hunt, Colin, 2009. "Research collaboration and research output: A longitudinal study of 65 biomedical scientists in a New Zealand university," Research Policy, Elsevier, vol. 38(2), pages 306-317, March.
    13. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Journal of Informetrics, Elsevier, vol. 4(4), pages 564-580.
    14. Fernanda Morillo & Ignacio Santabárbara & Javier Aparicio, 2013. "The automatic normalisation challenge: detailed addresses identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 953-966, June.
    15. Mingers, John & Leydesdorff, Loet, 2015. "A review of theory and practice in scientometrics," European Journal of Operational Research, Elsevier, vol. 246(1), pages 1-19.
    16. Ismael Rafols & Martin Meyer, 2010. "Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience," Scientometrics, Springer;Akadémiai Kiadó, vol. 82(2), pages 263-287, February.
    17. Yao, Ye & Du, Huibin & Zou, Hongyang & Zhou, Peng & Antunes, Carlos Henggeler & Neumann, Anne & Yeh, Sonia, 2023. "Fifty years of Energy Policy: A bibliometric overview," Energy Policy, Elsevier, vol. 183(C).
    18. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Discussion Papers "Innovation Systems and Policy Analysis" 22, Fraunhofer Institute for Systems and Innovation Research (ISI).
    19. Fernanda Morillo & Javier Aparicio & Borja González-Albo & Luz Moreno, 2013. "Towards the automation of address identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 207-224, January.
    20. Rigby, J. & Edler, J., 2005. "Peering inside research networks: Some observations on the effect of the intensity of collaboration on the variability of research quality," Research Policy, Elsevier, vol. 34(6), pages 784-794, August.

