
A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis

Author

Listed:
  • Saeed-Ul Hassan

    (Information Technology University)

  • Iqra Safder

    (Information Technology University)

  • Anam Akram

    (Information Technology University)

  • Faisal Kamiran

    (Information Technology University)

Abstract

We measure the knowledge flows between countries by analysing publication and citation data, arguing that not all citations are equally important. In contrast to existing techniques that use absolute citation counts to quantify knowledge flows between different entities, our model employs a citation context analysis technique, using a machine-learning approach to distinguish between important and non-important citations. We use 14 novel features (including context-based, cue-word-based and text-based features) to train a Support Vector Machine (SVM) and a Random Forest classifier on an annotated dataset of 20,527 publications downloaded from the Association for Computational Linguistics anthology (http://allenai.org/data.html). Our machine-learning models outperform existing state-of-the-art citation context approaches, with the SVM model reaching up to 61% and the Random Forest model up to a very encouraging 90% Precision–Recall Area Under the Curve, with 10-fold cross-validation. Finally, we present a case study applying our deployed method to datasets of PLoS ONE full-text publications in the field of Computer and Information Sciences. Our results show that a significant volume of the knowledge flows from the United States that are based on important citations is consumed by the international scientific community. Of the total knowledge flow from China, a relatively small proportion (only 4.11%) falls into the category of knowledge flow based on important citations, while the Netherlands and Germany show the highest proportions of knowledge flows based on important citations, at 9.06% and 7.35%, respectively. Among institutions, the findings show that more than 10% of the knowledge produced at the University of Malaya falls into the important category. We believe that such analyses help to understand the dynamics of the relevant knowledge flows across nations and institutions.
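
The country-level percentages quoted in the abstract can be read as the share of a country's outgoing knowledge flow that rests on citations classified as important. The following is a minimal sketch of that arithmetic, not the authors' implementation. The `Citation` record, the `important_flow_share` function, the rule that domestic citations are skipped, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Citation(NamedTuple):
    cited_country: str   # country of the cited paper (knowledge source)
    citing_country: str  # country of the citing paper (knowledge consumer)
    important: bool      # label assumed to come from a citation-context classifier


def important_flow_share(citations: Iterable[Citation]) -> Dict[str, float]:
    """Percentage of each source country's outgoing flow based on important citations."""
    total = defaultdict(int)
    important = defaultdict(int)
    for c in citations:
        # Assumption: only cross-country citations count as international flow.
        if c.cited_country == c.citing_country:
            continue
        total[c.cited_country] += 1
        if c.important:
            important[c.cited_country] += 1
    return {country: 100.0 * important[country] / n for country, n in total.items()}


# Toy records invented for illustration only; they are not data from the paper.
toy = [
    Citation("US", "DE", True),
    Citation("US", "CN", False),
    Citation("US", "NL", False),
    Citation("US", "MY", False),
    Citation("US", "US", True),   # domestic citation, skipped
    Citation("NL", "US", True),
    Citation("NL", "DE", False),
]
shares = important_flow_share(toy)
print(shares)
```

Under these assumptions, a country with many outgoing citations but few important ones gets a low share, which is the pattern the abstract reports for China relative to the Netherlands and Germany.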

Suggested Citation

  • Saeed-Ul Hassan & Iqra Safder & Anam Akram & Faisal Kamiran, 2018. "A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 973-996, August.
  • Handle: RePEc:spr:scient:v:116:y:2018:i:2:d:10.1007_s11192-018-2767-x
    DOI: 10.1007/s11192-018-2767-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2767-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Erjia Yan & Cassidy R. Sugimoto, 2011. "Institutional interactions: Exploring social, cognitive, and geographic relationships between institutions as demonstrated through citation networks," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(8), pages 1498-1514, August.
    2. Adam B. Jaffe & Manuel Trajtenberg & Rebecca Henderson, 1993. "Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 108(3), pages 577-598.
    3. Hu, Albert G. Z. & Jaffe, Adam B., 2003. "Patent citations and international knowledge flow: the cases of Korea and Taiwan," International Journal of Industrial Organization, Elsevier, vol. 21(6), pages 849-880, June.
    4. Erjia Yan & Cassidy R. Sugimoto, 2011. "Institutional interactions: Exploring social, cognitive, and geographic relationships between institutions as demonstrated through citation networks," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(8), pages 1498-1514, August.
    5. Susan Bonzi, 1982. "Characteristics of a Literature as Predictors of Relatedness Between Cited and Citing Works," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 33(4), pages 208-216, July.
    6. Hu, Zhigang & Chen, Chaomei & Liu, Zeyuan, 2013. "Where are citations located in the body of scientific articles? A study of the distributions of citation locations," Journal of Informetrics, Elsevier, vol. 7(4), pages 887-896.
    7. Erjia Yan, 2016. "Disciplinary knowledge production and diffusion in science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(9), pages 2223-2245, September.
    8. Saeed-Ul Hassan & Peter Haddawy, 2013. "Measuring international knowledge flows and scholarly impact of scientific research," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 163-179, January.
    9. Yang, Siluo & Wang, Feifei, 2015. "Visualizing information science: Author direct citation analysis in China and around the world," Journal of Informetrics, Elsevier, vol. 9(1), pages 208-225.
    10. Hicks, Diana & Breitzman, Tony & Olivastro, Dominic & Hamilton, Kimberly, 2001. "The changing composition of innovative activity in the US -- a portrait based on patent analysis," Research Policy, Elsevier, vol. 30(4), pages 681-703, April.
    11. Yan, Erjia & Ding, Ying & Cronin, Blaise & Leydesdorff, Loet, 2013. "A bird's-eye view of scientific trading: Dependency relations among fields of science," Journal of Informetrics, Elsevier, vol. 7(2), pages 249-264.
    12. Erjia Yan, 2015. "Research dynamics, impact, and dissemination: A topic-level analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(11), pages 2357-2372, November.
    13. Leonardo Costa Ribeiro & Glenda Kruss & Gustavo Britto & Américo Tristão Bernardes & Eduardo Motta e Albuquerque, 2014. "A methodology for unveiling global innovation networks: patent citations as clues to cross border knowledge flows," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 61-83, October.
    14. Ponomariov, Branco & Toivanen, Hannes, 2014. "Knowledge flows and bases in emerging economy innovation systems: Brazilian research 2005–2009," Research Policy, Elsevier, vol. 43(3), pages 588-596.
    15. Katy Börner & Shashikant Penumarthy & Mark Meiss & Weimao Ke, 2006. "Mapping the diffusion of scholarly knowledge among major U.S. research institutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 68(3), pages 415-426, September.
    16. Christine L. Borgman & Ronald E. Rice, 1992. "The convergence of information science and communication: A bibliometric analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 43(6), pages 397-411, July.
    17. Saeed-Ul Hassan & Peter Haddawy, 2015. "Analyzing knowledge flows of scientific literature through semantic links: a case study in the field of energy," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(1), pages 33-46, April.
    18. Loet Leydesdorff & Carole Probst, 2009. "The delineation of an interdisciplinary specialty in terms of a journal set: The case of communication studies," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(8), pages 1709-1718, August.
    19. Miguel R. Guevara & Dominik Hartmann & Manuel Aristarán & Marcelo Mendoza & César A. Hidalgo, 2016. "The research space: using career paths to predict the evolution of the research output of individuals, institutions, and nations," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1695-1709, December.
    20. Martin Meyer, 2002. "RETRACTED ARTICLE: Tracing Knowledge Flows in Innovation Systems—an Informetric Perspective on Future Research Science-based Innovation," Economic Systems Research, Taylor & Francis Journals, vol. 14(4), pages 323-344, December.
    21. Martin Meyer, 2002. "Tracing knowledge flows in innovation systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 54(2), pages 193-212, June.
    22. Charles Oppenheim & Susan P. Renn, 1978. "Highly cited old papers and the reasons why they continue to be cited," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 29(5), pages 225-231, September.

    Citations



    Cited by:

    1. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    2. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    3. Yang, Jinqing & Liu, Zhifeng, 2022. "The effect of citation behaviour on knowledge diffusion and intellectual structure," Journal of Informetrics, Elsevier, vol. 16(1).
    4. Ioan Ianoş & Alexandru-Ionuţ Petrişor, 2020. "An Overview of the Dynamics of Relative Research Performance in Central-Eastern Europe Using a Ranking-Based Analysis Derived from SCImago Data," Publications, MDPI, vol. 8(3), pages 1-25, July.
    5. Chi, Yuxue & Tang, Xianyi & Liu, Yijun, 2022. "Exploring the “awakening effect” in knowledge diffusion: a case study of publications in the library and information science domain," Journal of Informetrics, Elsevier, vol. 16(4).
    6. Lyu, Haihua & Bu, Yi & Zhao, Zhenyue & Zhang, Jiarong & Li, Jiang, 2022. "Citation bias in measuring knowledge flow: Evidence from the web of science at the discipline level," Journal of Informetrics, Elsevier, vol. 16(4).
    7. Xiaorui Jiang & Junjun Liu, 2023. "Extracting the evolutionary backbone of scientific domains: The semantic main path network analysis approach based on citation context analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(5), pages 546-569, May.
    8. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
    9. Anton A. Romanov & Aleksey A. Filippov & Valeria V. Voronina & Gleb Guskov & Nadezhda G. Yarushkina, 2021. "Modeling the Context of the Problem Domain of Time Series with Type-2 Fuzzy Sets," Mathematics, MDPI, vol. 9(22), pages 1-16, November.
    10. Saarela, Mirka & Kärkkäinen, Tommi, 2020. "Can we automate expert-based journal rankings? Analysis of the Finnish publication indicator," Journal of Informetrics, Elsevier, vol. 14(2).
    11. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    12. Xinyuan Zhang & Qing Xie & Chaemin Song & Min Song, 2022. "Mining the evolutionary process of knowledge through multiple relationships between keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2023-2053, April.
    13. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
    14. Saeed-Ul Hassan & Mubashir Imran & Sehrish Iqbal & Naif Radi Aljohani & Raheel Nawaz, 2018. "Deep context of citations using machine-learning models in scholarly full-text articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1645-1662, December.
    15. Yaniasih Yaniasih & Indra Budi, 2021. "Systematic Design and Evaluation of a Citation Function Classification Scheme in Indonesian Journals," Publications, MDPI, vol. 9(3), pages 1-14, June.
    16. Yu, Dejian & Yan, Zhaoping, 2023. "Main path analysis considering citation structure and content: Case studies in different domains," Journal of Informetrics, Elsevier, vol. 17(1).
    17. Zhang, Chengzhi & Liu, Lifan & Wang, Yuzhuo, 2021. "Characterizing references from different disciplines: A perspective of citation content analysis," Journal of Informetrics, Elsevier, vol. 15(2).
    18. Federica Bologna & Angelo Iorio & Silvio Peroni & Francesco Poggi, 2023. "Do open citations give insights on the qualitative peer-review evaluation in research assessments? An analysis of the Italian National Scientific Qualification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 19-53, January.
    19. Mao, Jin & Liang, Zhentao & Cao, Yujie & Li, Gang, 2020. "Quantifying cross-disciplinary knowledge flow from the perspective of content: Introducing an approach based on knowledge memes," Journal of Informetrics, Elsevier, vol. 14(4).
    20. Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.
    21. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yongjun Zhu & Erjia Yan, 2015. "Dynamic subfield analysis of disciplines: an examination of the trading impact and knowledge diffusion patterns of computer science," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(1), pages 335-359, July.
    2. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    3. Yan, Erjia & Ding, Ying & Cronin, Blaise & Leydesdorff, Loet, 2013. "A bird's-eye view of scientific trading: Dependency relations among fields of science," Journal of Informetrics, Elsevier, vol. 7(2), pages 249-264.
    4. Lyu, Haihua & Bu, Yi & Zhao, Zhenyue & Zhang, Jiarong & Li, Jiang, 2022. "Citation bias in measuring knowledge flow: Evidence from the web of science at the discipline level," Journal of Informetrics, Elsevier, vol. 16(4).
    5. Abramo, Giovanni & D’Angelo, Ciriaco Andrea & Di Costa, Flavia, 2020. "The role of geographical proximity in knowledge diffusion, measured by citations to scientific literature," Journal of Informetrics, Elsevier, vol. 14(1).
    6. Ruimin Ma & Erjia Yan, 2016. "Uncovering inter-specialty knowledge communication using author citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 839-854, November.
    7. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    8. Giovanni Abramo & Ciriaco Andrea D’Angelo & Flavia Costa, 2020. "Does the geographic proximity effect on knowledge spillovers vary across research fields?," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 1021-1036, May.
    9. Wang, Jue & Zhang, Liwei, 2018. "Proximal advantage in knowledge diffusion: The time dimension," Journal of Informetrics, Elsevier, vol. 12(3), pages 858-867.
    10. Pan, Xuelian & Yan, Erjia & Cui, Ming & Hua, Weina, 2018. "Examining the usage, citation, and diffusion patterns of bibliometric mapping software: A comparative study of three tools," Journal of Informetrics, Elsevier, vol. 12(2), pages 481-493.
    11. Önder Nomaler & Bart Verspagen, 2008. "Knowledge Flows, Patent Citations and the Impact of Science on Technology," Economic Systems Research, Taylor & Francis Journals, vol. 20(4), pages 339-366.
    12. Liu, Weiwei & Tao, Yuan & Bi, Kexin, 2022. "Capturing information on global knowledge flows from patent transfers: An empirical study using USPTO patents," Research Policy, Elsevier, vol. 51(5).
    13. Yong-Gil Lee & Jeong-Dong Lee & Yong-Il Song & Se-Jun Lee, 2007. "An in-depth empirical analysis of patent citation counts using zero-inflated count data model: The case of KIST," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(1), pages 27-39, January.
    14. Iman Miremadi & Yadollah Saboohi, 2018. "Planning for Investment in Energy Innovation: Developing an Analytical Tool to Explore the Impact of Knowledge Flow," International Journal of Energy Economics and Policy, Econjournals, vol. 8(2), pages 7-19.
    15. Erjia Yan, 2014. "Topic-based Pagerank: toward a topic-level scientific evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(2), pages 407-437, August.
    16. Meijun Liu & Xiao Hu & Jiang Li, 2018. "Knowledge flow in China’s humanities and social sciences," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(2), pages 607-626, March.
    17. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    18. Pu Han & Jin Shi & Xiaoyan Li & Dongbo Wang & Si Shen & Xinning Su, 2014. "International collaboration in LIS: global trends and networks at the country and institution level," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 53-72, January.
    19. Abramo, Giovanni & D’Angelo, Ciriaco Andrea & Di Costa, Flavia, 2021. "On the relation between the degree of internationalization of cited and citing publications: A field level analysis, including and excluding self-citations," Journal of Informetrics, Elsevier, vol. 15(1).
    20. Balland, Pierre-Alexandre & Boschma, Ron, 2022. "Do scientific capabilities in specific domains matter for technological diversification in European regions?," Research Policy, Elsevier, vol. 51(10).
