IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v130y2025i8d10.1007_s11192-025-05376-1.html

Comparing representations of a discipline derived through LDA vs. intellectual content analysis: the case of information science

Author

Listed:
  • Kaisa Ylikruuvi

    (Tampere University)

  • Kalervo Järvelin

    (Tampere University)

  • Pertti Vakkari

    (Tampere University)

  • Martti Juhola

    (Tampere University)

Abstract

The paper looks at the methodology of empirical analyses of the content and structure of Information Science (IS). The traditional approach in empirical analysis is intellectual content analysis (ICA) of a representative data set. The high labor cost prohibits the analysis of massive data sets. A recent alternative is based on data mining/machine learning. Its strength is the capability of analyzing massive datasets efficiently. However, a significant issue is the quality of content analysis. The paper compares latent Dirichlet allocation/topic modeling (LDA/TM) based statistical analysis to ICA using the same data set, 1514 scholarly articles from the year 2015 volumes of 30 IS journals. The intellectual analysis provides the mirror for reflecting the TM results. LDA/TM is strong in identifying new directions of a discipline and processing masses of text. Its weaknesses include semantic haziness of topics due to bag-of-words article representation, text pre-processing, tuning of parameters, and being unanalytic in composing topics from words belonging to different categories.
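The bag-of-words weakness named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tokenizer, the function name `bag_of_words`, and the two example titles are assumptions made for illustration only. It shows that two texts with different meanings can reduce to the same word counts, which is one source of the semantic haziness the abstract describes.

```python
from collections import Counter
import re


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep alphabetic tokens, and count them.

    Word order is discarded, as in the bag-of-words representation
    that LDA takes as input. (Hypothetical helper, not from the paper.)
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Two made-up titles: same words, different meaning.
doc_a = "Libraries evaluate search systems"
doc_b = "Search systems evaluate libraries"

bow_a = bag_of_words(doc_a)
bow_b = bag_of_words(doc_b)

# The two representations are indistinguishable once order is dropped.
same_representation = bow_a == bow_b
print(same_representation)
```

A topic model fed these counts cannot tell which entity evaluates which, whereas a human coder doing intellectual content analysis reads the sentence as written.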

Suggested Citation

  • Kaisa Ylikruuvi & Kalervo Järvelin & Pertti Vakkari & Martti Juhola, 2025. "Comparing representations of a discipline derived through LDA vs. intellectual content analysis: the case of information science," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(8), pages 4309-4337, August.
  • Handle: RePEc:spr:scient:v:130:y:2025:i:8:d:10.1007_s11192-025-05376-1
    DOI: 10.1007/s11192-025-05376-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-025-05376-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-025-05376-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. V. Cano, 1999. "Bibliometric overview of Library and Information Science Research in Spain," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 50(8), pages 675-680.
    2. Cassidy R. Sugimoto & Daifeng Li & Terrell G. Russell & S. Craig Finlay & Ying Ding, 2011. "The shifting sands of disciplinary development: Analyzing North American Library and Information Science dissertations using latent Dirichlet allocation," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(1), pages 185-204, January.
    3. Pertti Vakkari & Yu‐Wei Chang & Kalervo Järvelin, 2022. "Disciplinary contributions to research topics and methodology in Library and Information Science—Leading to fragmentation?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(12), pages 1706-1722, December.
    4. Grün, Bettina & Hornik, Kurt, 2011. "topicmodels: An R Package for Fitting Topic Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i13).
    5. Jinxuan Ma & Brady Lund, 2021. "The evolution and shift of research topics and methods in library and information science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 1059-1074, August.
    6. Pertti Vakkari & Yu-Wei Chang & Kalervo Järvelin, 2022. "Largest contribution to LIS by external disciplines as measured by the characteristics of research articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4499-4522, August.
    7. Yu-Wei Chang & Mu-Hsuan Huang & Chiao-Wen Lin, 2015. "Evolution of research subjects in library and information science based on keyword, bibliographical coupling, and co-citation analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2071-2087, December.
    8. Cassidy R. Sugimoto & Daifeng Li & Terrell G. Russell & S. Craig Finlay & Ying Ding, 2011. "The shifting sands of disciplinary development: Analyzing North American Library and Information Science dissertations using latent Dirichlet allocation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(1), pages 185-204, January.
    9. Yingyi Zhang & Chengzhi Zhang, 2024. "Extracting problem and method sentence from scientific papers: a context-enhanced transformer using formulaic expression desensitization," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 3433-3468, June.
    10. Yu-Wei Chang, 2018. "Examining interdisciplinarity of library and information science (LIS) based on LIS articles contributed by non-LIS authors," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1589-1613, September.
    11. Pertti Vakkari & Kalervo Järvelin & Yu‐Wei Chang, 2023. "The association of disciplinary background with the evolution of topics and methods in Library and Information Science research 1995–2015," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 811-827, July.
    12. Cristóbal Urbano & Jordi Ardanuy, 2020. "Cross-disciplinary collaboration versus coexistence in LIS serials: analysis of authorship affiliations in four European countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 575-602, July.
    13. Yosuke Miyata & Emi Ishita & Fang Yang & Michimasa Yamamoto & Azusa Iwase & Keiko Kurata, 2020. "Knowledge structure transition in library and information science: topic modeling and visualization," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 665-687, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Abhijit Thakuria & Dipen Deka, 2024. "A decadal study on identifying latent topics and research trends in open access LIS journals using topic modeling approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 3841-3869, July.
    2. Pertti Vakkari & Kalervo Järvelin & Yu‐Wei Chang, 2023. "The association of disciplinary background with the evolution of topics and methods in Library and Information Science research 1995–2015," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 811-827, July.
    3. Pertti Vakkari & Yu-Wei Chang & Kalervo Järvelin, 2022. "Largest contribution to LIS by external disciplines as measured by the characteristics of research articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4499-4522, August.
    4. Pin Li & Guoli Yang & Chuanqi Wang, 2019. "Visual topical analysis of library and information science," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1753-1791, December.
    5. Yu-Wei Chang, 2018. "Examining interdisciplinarity of library and information science (LIS) based on LIS articles contributed by non-LIS authors," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1589-1613, September.
    6. Yunhan Liu & Xia Xu & Shuqing Li, 2025. "Understanding of evolutionary features in the library and information science with interdisciplinary network analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(2), pages 781-808, February.
    7. Wen-Yau Cathy Lin, 2012. "Research status and characteristics of library and information science in Taiwan: a bibliometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(1), pages 7-21, July.
    8. Pertti Vakkari & Yu‐Wei Chang & Kalervo Järvelin, 2022. "Disciplinary contributions to research topics and methodology in Library and Information Science—Leading to fragmentation?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(12), pages 1706-1722, December.
    9. Yi Bu & Binglu Wang & Win-bin Huang & Shangkun Che & Yong Huang, 2018. "Using the appearance of citations in full text on author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 275-289, July.
    10. John Rigby & Barbara Jones, 2020. "Bringing the doctoral thesis by published papers to the Social Sciences and the Humanities: A quantitative easing? A small study of doctoral thesis submission rules and practice in two disciplines in ," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1387-1409, August.
    11. Beibei Hu & Xianlei Dong & Chenwei Zhang & Timothy D. Bowman & Ying Ding & Staša Milojević & Chaoqun Ni & Erjia Yan & Vincent Larivière, 2015. "A lead-lag analysis of the topic evolution patterns for preprints and publications," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(12), pages 2643-2656, December.
    12. Nikoleta E. Glynatsi & Vincent A. Knight, 2021. "A bibliometric study of research topics, collaboration, and centrality in the iterated prisoner’s dilemma," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 8(1), pages 1-12, December.
    13. Jianbing Ma & Kexin Yang, 2025. "A three-dimensional framework for quantifying knowledge intersection intensity: from a micro perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(1), pages 367-398, January.
    14. Erjia Yan, 2014. "Topic-based Pagerank: toward a topic-level scientific evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(2), pages 407-437, August.
    15. Kim, Min Sung & Kim, Junghwan & Kim, Seongcheol, 2023. "Korea's leadership in 5G and beyond: Footprints and futures," Telecommunications Policy, Elsevier, vol. 47(8).
    16. Sarah Tiba & Frank J. van Rijnsoever & Marko P. Hekkert, 2019. "Firms with benefits: A systematic review of responsible entrepreneurship and corporate social responsibility literature," Corporate Social Responsibility and Environmental Management, John Wiley & Sons, vol. 26(2), pages 265-284, March.
    17. Sabina-Cristiana NECULA & Catalin STRIMBEI, 2019. "Identifying Software Complexity Topics with Latent Dirichlet Allocation on Design Patterns," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 23(4), pages 5-16.
    18. Staša Milojević & Cassidy R. Sugimoto & Erjia Yan & Ying Ding, 2011. "The cognitive structure of Library and Information Science: Analysis of article title words," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(10), pages 1933-1953, October.
    19. Zuo, Zhiya & Zhao, Kang & Ni, Chaoqun, 2019. "Standing on the shoulders of giants?—Faculty hiring in information schools," Journal of Informetrics, Elsevier, vol. 13(1), pages 341-353.
    20. Carlos G. Figuerola & Francisco Javier García Marco & María Pinto, 2017. "Mapping the evolution of library and information science (1978–2014) using topic modeling on LISA," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1507-1535, September.
