Printed from https://ideas.repec.org/a/spr/scient/v125y2020i3d10.1007_s11192-020-03677-1.html

Important citation identification by exploiting the syntactic and contextual information of citations

Author

Listed:
  • Mingyang Wang

    (Northeast Forestry University)

  • Jiaqi Zhang

    (Northeast Forestry University)

  • Shijia Jiao

    (Northeast Forestry University)

  • Xiangrong Zhang

    (Heilongjiang Institute of Technology)

  • Na Zhu

    (Library, Harbin University)

  • Guangsheng Chen

    (Northeast Forestry University)

Abstract

Citations are not equally important. Researchers have proposed various models and techniques to identify important citations. However, the features used in these works are relatively limited, so they cannot achieve good recognition performance. This paper proposes a new machine learning framework that distinguishes important from non-important citations by examining the syntactic and contextual information of citations. Syntactic features reflect the statistical characteristics of citation behavior, such as how often and where the cited article is cited in the citing one. Contextual features reflect the semantic content of citations, such as their intent and polarity. Three feature selection algorithms (Pearson correlation coefficient, Relief-F, and the entropy weight method) were used to calculate the contribution of each index to distinguishing different kinds of citations. On this basis, the key features that better identify important citations were selected. Three classifiers (support vector machine, KNN, and random forest) were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the proposed framework achieves better classification performance than contemporary state-of-the-art research. The syntactic and contextual features of citations are of great value in identifying important citations.
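
The feature-scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`mention_count`, `in_method_section`, `title_length`), the toy data, and the helper functions are hypothetical, and only two of the three named methods (Pearson correlation and the entropy weight method) are shown; Relief-F is omitted. The sketch assumes each citation is a row of numeric features with a binary label (1 = important).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; score it as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def entropy_weights(columns):
    """Entropy weight method: a feature whose values are spread less evenly
    across samples (lower entropy) receives a larger weight.
    Assumes at least two samples per feature."""
    n = len(next(iter(columns.values())))
    divergence = {}
    for name, col in columns.items():
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        scaled = [(v - lo) / span for v in col]        # min-max scaling to [0, 1]
        total = sum(scaled) or 1.0
        probs = [s / total for s in scaled]
        # Treat 0 * log(0) as 0 and normalise entropy to [0, 1] by log(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergence[name] = 1.0 - entropy
    norm = sum(divergence.values()) or 1.0
    return {name: d / norm for name, d in divergence.items()}

# Hypothetical toy data: one entry per citation, label 1 = important.
features = {
    "mention_count": [1, 1, 2, 5, 4, 6],
    "in_method_section": [0, 0, 1, 1, 1, 1],
    "title_length": [9, 12, 8, 11, 9, 12],
}
labels = [0, 0, 0, 1, 1, 1]

# Score each feature by the absolute correlation with the label.
pearson_scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
weights = entropy_weights(features)
ranking = sorted(pearson_scores, key=pearson_scores.get, reverse=True)

print("Pearson ranking:", ranking)
print("Entropy weights:", weights)
```

In a pipeline of this kind, the top-ranked features would then be passed to a classifier such as a support vector machine, KNN, or random forest. Note that the Pearson score uses the labels while the entropy weight does not, so the two methods can rank features differently.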

Suggested Citation

  • Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03677-1
    DOI: 10.1007/s11192-020-03677-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03677-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-020-03677-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    2. Faiza Qayyum & Harun Jamil & Naeem Iqbal & DoHyeun Kim & Muhammad Tanvir Afzal, 2022. "Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6471-6499, November.
    3. Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.
    4. Setio Basuki & Masatoshi Tsuchiya, 2022. "SDCF: semi-automatically structured dataset of citation functions," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4569-4608, August.
    5. Zhongyi Wang & Keying Wang & Jiyue Liu & Jing Huang & Haihua Chen, 2022. "Measuring the innovation of method knowledge elements in scientific literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2803-2827, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
    2. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    3. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Tianyu Zhang, 2019. "Evaluating the impact of citations of articles based on knowledge flow patterns hidden in the citations," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-19, November.
    4. Hamid R. Jamali & Majid Nabavi & Saeid Asadi, 2018. "How video articles are cited, the case of JoVE: Journal of Visualized Experiments," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1821-1839, December.
    5. Shengzhi Huang & Jiajia Qian & Yong Huang & Wei Lu & Yi Bu & Jinqing Yang & Qikai Cheng, 2022. "Disclosing the relationship between citation structure and future impact of a publication," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 1025-1042, July.
    6. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    7. Toluwase Victor Asubiaro & Isola Ajiferuke, 2022. "Semantic similarity-based credit attribution on citation paths: a method for allocating residual citation to and investigating depth of influence of scientific communications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6257-6277, November.
    8. Bikun Chen & Dannan Deng & Zhouyan Zhong & Chengzhi Zhang, 2020. "Exploring linguistic characteristics of highly browsed and downloaded academic articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1769-1790, March.
    9. Ruhao Zhang & Junpeng Yuan, 2022. "Enhanced author bibliographic coupling analysis using semantic and syntactic citation information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7681-7706, December.
    10. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    11. Dangzhi Zhao & Andreas Strotmann, 2020. "Telescopic and panoramic views of library and information science research 2011–2018: a comparison of four weighting schemes for author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 255-270, July.
    12. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    13. Dongqing Lyu & Xuanmin Ruan & Juan Xie & Ying Cheng, 2021. "The classification of citing motivations: a meta-synthesis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 3243-3264, April.
    14. Weibin Wang & Zheng Wang & Tian Yu & CholMyong Pak & Guang Yu, 2020. "Research on citation mention times and contributions using a neural network," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2383-2400, December.
    15. Dangzhi Zhao & Andreas Strotmann, 2020. "Deep and narrow impact: introducing location filtered citation counting," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 503-517, January.
    16. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    17. Muhammad Touseef Ikram & Muhammad Tanvir Afzal, 2019. "Aspect based citation sentiment analysis using linguistic patterns for better comprehension of scientific knowledge," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 73-95, April.
    18. Zhang, Chengzhi & Liu, Lifan & Wang, Yuzhuo, 2021. "Characterizing references from different disciplines: A perspective of citation content analysis," Journal of Informetrics, Elsevier, vol. 15(2).
    19. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    20. Liyue Chen & Jielan Ding & Vincent Larivière, 2022. "Measuring the citation context of national self‐references," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(5), pages 671-686, May.
