Ten challenges in modeling bibliographic data for bibliometric analysis


  • Alfio Ferrara

    (Università degli Studi di Milano)

  • Silvia Salini

    (Università degli Studi di Milano)


Abstract

The complexity and variety of bibliographic data are growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this complex scenario, one of the most crucial issues is the quality of data and the capability of bibliometric analysis to cope with multiple data dimensions. Although the problem of enforcing a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for the multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, depending on the goals of the analysis, can help analysis designers and bibliographic experts work with large collections of bibliographic data.

Suggested Citation

  • Alfio Ferrara & Silvia Salini, 2012. "Ten challenges in modeling bibliographic data for bibliometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(3), pages 765-785, December.
  • Handle: RePEc:spr:scient:v:93:y:2012:i:3:d:10.1007_s11192-012-0810-x
    DOI: 10.1007/s11192-012-0810-x


    References listed on IDEAS

    1. Yu, Hairong & Davis, Mari & Wilson, Concepción S. & Cole, Fletcher T.H., 2008. "Object-relational data modelling for informetric databases," Journal of Informetrics, Elsevier, vol. 2(3), pages 240-251.
    2. Teh, Yee Whye & Jordan, Michael I. & Beal, Matthew J. & Blei, David M., 2006. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1566-1581, December.
    3. Jean-Francois Molinari & Alain Molinari, 2008. "A new methodology for ranking scientific institutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 75(1), pages 163-174, April.
    4. Romera Ayllón, María Rosario & Benito Bonito, Mónica, 2011. "Improving quality assessment of composite indicators in university rankings: a case study of French and German universities of excellence," DES - Working Papers. Statistics and Econometrics. WS ws112015, Universidad Carlos III de Madrid. Departamento de Estadística.
    5. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Journal of Informetrics, Elsevier, vol. 4(4), pages 564-580.
    6. Michael Greenacre, 2008. "Correspondence analysis of raw data," Economics Working Papers 1112, Department of Economics and Business, Universitat Pompeu Fabra, revised Jul 2009.
    7. Ron S. Kenett & Silvia Salini, 2011. "Modern analysis of customer satisfaction surveys: comparison of models and integrated analysis," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 27(5), pages 465-475, September.
    8. Wolfgang Glänzel & András Schubert, 2003. "A new classification scheme of science fields and subfields designed for scientometric evaluation purposes," Scientometrics, Springer;Akadémiai Kiadó, vol. 56(3), pages 357-367, March.
    9. J. Hubert, 1977. "Bibliometric models for journal productivity," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 4(1), pages 441-473, January.
    10. Mallig, Nicolai, 2010. "A relational database for bibliometric analysis," Discussion Papers "Innovation Systems and Policy Analysis" 22, Fraunhofer Institute for Systems and Innovation Research (ISI).
    11. Emil Hudomalj & Gaj Vidmar, 2003. "OLAP and bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 58(3), pages 609-622, November.
    12. Marco Geraci & M. Degli Esposti, 2011. "Where do Italian universities stand? An in-depth statistical analysis of national and international rankings," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(3), pages 667-681, June.
    13. M. Benito & R. Romera, 2011. "Improving quality assessment of composite indicators in university rankings: a case study of French and German universities of excellence," Scientometrics, Springer;Akadémiai Kiadó, vol. 89(1), pages 153-176, October.
    14. Dietmar Wolfram, 2006. "Applications of SQL for informetric frequency distribution processing," Scientometrics, Springer;Akadémiai Kiadó, vol. 67(2), pages 301-313, May.


    Citations are extracted by the CitEc Project.

    Cited by:

    1. Jeong, Yujin & Park, Inchae & Yoon, Byungun, 2019. "Identifying emerging Research and Business Development (R&BD) areas based on topic modeling and visualization with intellectual property right data," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 655-672.
    2. Chyi-Kwei Yau & Alan Porter & Nils Newman & Arho Suominen, 2014. "Clustering scientific documents with topic modeling," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(3), pages 767-786, September.
    3. Sabine Loudcher & Wararat Jakawat & Edmundo Pavel Soriano Morales & Cécile Favre, 2015. "Combining OLAP and information networks for bibliographic data analysis: a survey," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 471-487, May.
    4. Massimo Florio & Francesco Giffoni, 2019. "L’impatto sociale della produzione di scienza su larga scala: come governarlo?" [The social impact of large-scale science production: how to govern it?], Departmental Working Papers 2019-05, Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano.
    5. Francesca Battisti & Alfio Ferrara & Silvia Salini, 2015. "A decade of research in statistics: a topic model approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 413-433, May.
    6. Bornmann, Lutz, 2019. "Does the normalized citation impact of universities profit from certain properties of their published documents – such as the number of authors and the impact factor of the publishing journals? A mult," Journal of Informetrics, Elsevier, vol. 13(1), pages 170-184.



    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.