IDEAS record: https://ideas.repec.org/a/spr/scient/v126y2021i7d10.1007_s11192-021-03984-1.html

Semantic and relational spaces in science of science: deep learning models for article vectorisation

Authors

  • Diego Kozlowski

    (University of Luxembourg)

  • Jennifer Dusdal

    (University of Luxembourg)

  • Jun Pang

    (University of Luxembourg)

  • Andreas Zilian

    (University of Luxembourg)

Abstract

Over the last century, we have observed a steady and exponential growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models that project observations into a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by those techniques. Our results show that using NLP we can encode a semantic space of articles, while GNNs enable us to build a relational space where the social practices of a research community are also encoded.
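The abstract contrasts a semantic space built from article text (NLP) with a relational space built from the citation graph (GNNs). The toy sketch below illustrates that contrast; it is not the authors' pipeline, and several stand-ins are assumed: TF-IDF bag-of-words vectors replace learned NLP document embeddings, and a single parameter-free propagation step, H' = D^{-1/2}(A + I)D^{-1/2}H (the normalisation used in graph convolutional networks, here without the trained weight matrix and nonlinearity), replaces a trained GNN. The documents, the citation edge, and the helper names `tfidf_vectors`, `propagate` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Semantic-space stand-in: TF-IDF bag-of-words vectors over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many articles each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append([tf[tok] / length * idf[tok] for tok in vocab])
    return vectors


def propagate(features, edges):
    """Relational-space sketch: one parameter-free GCN-style step,
    H' = D^{-1/2} (A + I) D^{-1/2} H, with citations treated as undirected edges."""
    n = len(features)
    neighbours = [{i} for i in range(n)]  # self-loops implement the "+ I"
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    degree = [len(nb) for nb in neighbours]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neighbours[i]:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])  # symmetric normalisation
            for k in range(dim):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus and citation link: article 0 cites article 2.
docs = [
    "graph neural networks for citation graphs",
    "word embeddings for scientific text",
    "citation networks and bibliographic coupling",
]
citations = [(0, 2)]

semantic = tfidf_vectors(docs)
relational = propagate(semantic, citations)
print(round(cosine(semantic[0], semantic[2]), 3),
      round(cosine(relational[0], relational[2]), 3))
```

Because articles 0 and 2 are linked, one propagation step mixes their vectors, so they become more similar in the relational space than in the semantic one, while the unlinked article 1 keeps its semantic vector unchanged. In an actual GNN the weights would be learned and several layers would aggregate information from wider neighbourhoods.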

Suggested Citation

  • Diego Kozlowski & Jennifer Dusdal & Jun Pang & Andreas Zilian, 2021. "Semantic and relational spaces in science of science: deep learning models for article vectorisation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5881-5910, July.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:7:d:10.1007_s11192-021-03984-1
    DOI: 10.1007/s11192-021-03984-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-03984-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Eva Lillquist & Sheldon Green, 2010. "The discipline dependence of citation statistics," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(3), pages 749-762, September.
    2. Milojević, Staša, 2015. "Quantifying the cognitive extent of science," Journal of Informetrics, Elsevier, vol. 9(4), pages 962-973.
    3. M. M. Kessler, 1963. "Bibliographic coupling between scientific papers," American Documentation, Wiley Blackwell, vol. 14(1), pages 10-25, January.
    4. Kevin W. Boyack & Richard Klavans, 2010. "Co‐citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(12), pages 2389-2404, December.
    5. Marton Demeter & Tamas Toth, 2020. "The world-systemic network of global elite sociology: the western male monoculture at faculties of the top one-hundred sociology departments of the world," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2469-2495, September.
    6. Jonathan Adams, 2013. "The fourth age of research," Nature, Nature, vol. 497(7451), pages 557-560, May.
    7. Nikhil Garg & Londa Schiebinger & Dan Jurafsky & James Zou, 2018. "Word embeddings quantify 100 years of gender and ethnic stereotypes," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 115(16), pages 3635-3644, April.
    8. Henry Small, 1973. "Co‐citation in the scientific literature: A new measure of the relationship between two documents," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 24(4), pages 265-269, July.
    9. Karl W. Broman & Kara H. Woo, 2018. "Data Organization in Spreadsheets," The American Statistician, Taylor & Francis Journals, vol. 72(1), pages 2-10, January.
    10. Radhamany Sooryamoorthy, 2009. "Do types of collaboration change citation? Collaboration and citation patterns of South African science publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 81(1), pages 177-193, October.
    11. Yi Zhang & Fen Zhao & Jianguo Lu, 2019. "P2V: large-scale academic paper embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 399-432, October.
    12. David A. King, 2004. "The scientific impact of nations," Nature, Nature, vol. 430(6997), pages 311-316, July.
    13. Jonathan B. Slapin & Sven‐Oliver Proksch, 2008. "A Scaling Model for Estimating Time‐Series Party Positions from Texts," American Journal of Political Science, John Wiley & Sons, vol. 52(3), pages 705-722, July.
    14. Chanwoo Jeong & Sion Jang & Eunjeong Park & Sungchul Choi, 2020. "A context-aware citation recommendation model with BERT and graph convolutional networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1907-1922, September.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Yiqin Lv & Zheng Xie & Xiaojing Zuo & Yiping Song, 2022. "A multi-view method of scientific paper classification via heterogeneous graph embeddings," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4847-4872, August.
    2. Barbara McGillivray & Gard B. Jenset & Khalid Salama & Donna Schut, 2022. "Investigating patterns of change, stability, and interaction among scientific disciplines using embeddings," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-15, December.
    3. Yuan Chih Fu & Marcelo Marques & Yuen-Hsien Tseng & Justin J. W. Powell & David P. Baker, 2022. "An evolving international research collaboration network: spatial and thematic developments in co-authored higher education research, 1998–2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(3), pages 1403-1429, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jun-Ping Qiu & Ke Dong & Hou-Qiang Yu, 2014. "Comparative study on structure and correlation among author co-occurrence networks in bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1345-1360, November.
    2. Chen, Kaihua & Zhang, Yi & Fu, Xiaolan, 2019. "International research collaboration: An emerging domain of innovation studies?," Research Policy, Elsevier, vol. 48(1), pages 149-168.
    3. Rons, Nadine, 2018. "Bibliometric approximation of a scientific specialty by combining key sources, title words, authors and references," Journal of Informetrics, Elsevier, vol. 12(1), pages 113-132.
    4. Piñeiro-Chousa, Juan & López-Cabarcos, M. Ángeles & Romero-Castro, Noelia María & Pérez-Pico, Ada María, 2020. "Innovation, entrepreneurship and knowledge in the business scientific field: Mapping the research front," Journal of Business Research, Elsevier, vol. 115(C), pages 475-485.
    5. Guan-Can Yang & Gang Li & Chun-Ya Li & Yun-Hua Zhao & Jing Zhang & Tong Liu & Dar-Zen Chen & Mu-Hsuan Huang, 2015. "Using the comprehensive patent citation network (CPC) to evaluate patent value," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1319-1346, December.
    6. Rey-Long Liu, 2017. "A new bibliographic coupling measure with descriptive capability," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 915-935, February.
    7. Chris W. Belter, 2013. "A bibliometric analysis of NOAA’s Office of Ocean Exploration and Research," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 629-644, May.
    8. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    9. Ignacio Rodríguez-Rodríguez & José-Víctor Rodríguez & Niloofar Shirvanizadeh & Andrés Ortiz & Domingo-Javier Pardo-Quiles, 2021. "Applications of Artificial Intelligence, Machine Learning, Big Data and the Internet of Things to the COVID-19 Pandemic: A Scientometric Review Using Text Mining," IJERPH, MDPI, vol. 18(16), pages 1-29, August.
    10. Yu-Wei Chang & Mu-Hsuan Huang & Chiao-Wen Lin, 2015. "Evolution of research subjects in library and information science based on keyword, bibliographical coupling, and co-citation analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2071-2087, December.
    11. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    12. Tandon, Anushree & Kaur, Puneet & Mäntymäki, Matti & Dhir, Amandeep, 2021. "Blockchain applications in management: A bibliometric analysis and literature review," Technological Forecasting and Social Change, Elsevier, vol. 166(C).
    13. Liu, Yunmei & Yang, Liu & Chen, Min, 2021. "A new citation concept: Triangular citation in the literature," Journal of Informetrics, Elsevier, vol. 15(2).
    14. Duong, Quang Huy & Zhou, Li & Meng, Meng & Nguyen, Truong Van & Ieromonachou, Petros & Nguyen, Duy Tiep, 2022. "Understanding product returns: A systematic literature review using machine learning and bibliometric analysis," International Journal of Production Economics, Elsevier, vol. 243(C).
    15. Toshiyuki Hasumi & Mei-Shiu Chiu, 2022. "Online mathematics education as bio-eco-techno process: bibliometric analysis using co-authorship and bibliographic coupling," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4631-4654, August.
    16. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    17. Yun, Jinhyuk & Ahn, Sejung & Lee, June Young, 2020. "Return to basics: Clustering of scientific literature using structural information," Journal of Informetrics, Elsevier, vol. 14(4).
    18. Goodell, John W. & Kumar, Satish & Lim, Weng Marc & Pattnaik, Debidutta, 2021. "Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis," Journal of Behavioral and Experimental Finance, Elsevier, vol. 32(C).
    19. Kyebambe, Moses Ntanda & Cheng, Ge & Huang, Yunqing & He, Chunhui & Zhang, Zhenyu, 2017. "Forecasting emerging technologies: A supervised learning approach through patent analysis," Technological Forecasting and Social Change, Elsevier, vol. 125(C), pages 236-244.
    20. Mingchun Cao & Ilan Alon, 2020. "Intellectual Structure of the Belt and Road Initiative Research: A Scientometric Analysis and Suggestions for a Future Research Agenda," Sustainability, MDPI, vol. 12(17), pages 1-40, August.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.