Printed from https://ideas.repec.org/a/gam/jsusta/v15y2023i5p3919-d1075774.html

Research on the Automatic Subject-Indexing Method of Academic Papers Based on Climate Change Domain Ontology

Author

Listed:
  • Heng Yang

    (Chinese Academy of Sciences, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China)

  • Nan Wang

    (Chinese Academy of Sciences, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China)

  • Lina Yang

    (Chinese Academy of Sciences, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China)

  • Wei Liu

    (Chinese Academy of Sciences, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China)

  • Sili Wang

    (Chinese Academy of Sciences, Northwest Institute of Eco-Environment and Resources, Lanzhou 730000, China)

Abstract

Classifying academic papers at a fine-grained level helps uncover their deeper implicit themes and semantics, which in turn supports semantic retrieval, paper recommendation, research trend prediction, topic analysis, and related functions. Based on a climate change domain ontology, this study used an unsupervised approach that combines two methods, syntactic structure analysis and semantic modeling, to build a subject-indexing framework for academic papers in the climate change domain. Taking the titles, abstracts, and keywords of papers as input, the framework automatically selects a set of concept terms from the domain ontology as research topics. It relies on natural language processing techniques such as syntactic dependencies, text similarity calculation, pre-trained language models, and semantic similarity calculation, together with weighting factors such as word frequency statistics and graph path calculation. Finally, we evaluated the proposed method against a gold standard of manually annotated articles and demonstrated significant improvements over five alternative methods in terms of precision, recall, and F1-score. Overall, the proposed method identifies the research topics of academic papers more accurately and provides a useful reference for the application of domain ontologies and unsupervised data annotation.
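
The evaluation mentioned in the abstract compares predicted topic terms with manually annotated ones. The following is a minimal sketch of how precision, recall, and F1-score could be computed for a single paper, assuming exact matching of lowercased terms treated as sets. The function name `precision_recall_f1` and the example terms are hypothetical and are not taken from the paper, which may use a different matching or averaging scheme.

```python
from typing import Iterable, Set, Tuple


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Score one paper's predicted topic terms against its gold annotations.

    Assumption: terms match only if they are identical after trimming
    whitespace and lowercasing; duplicates are ignored (set semantics).
    """
    pred: Set[str] = {t.strip().lower() for t in predicted}
    ref: Set[str] = {t.strip().lower() for t in gold}
    hits = len(pred & ref)  # terms that appear in both sets

    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0
    # when there are no correct terms to avoid dividing by zero.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example terms, for illustration only.
predicted = ["sea level rise", "carbon emission", "glacier retreat", "drought"]
gold = ["Sea level rise", "glacier retreat", "permafrost"]

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example two of the four predicted terms are in the gold set of three, so precision is 2/4 and recall is 2/3. Scores for a whole corpus would then be aggregated across papers, for instance by averaging the per-paper values.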

Suggested Citation

  • Heng Yang & Nan Wang & Lina Yang & Wei Liu & Sili Wang, 2023. "Research on the Automatic Subject-Indexing Method of Academic Papers Based on Climate Change Domain Ontology," Sustainability, MDPI, vol. 15(5), pages 1-13, February.
  • Handle: RePEc:gam:jsusta:v:15:y:2023:i:5:p:3919-:d:1075774

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/15/5/3919/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/15/5/3919/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carolina Navarro-Lopez & Salvador Linares-Mustaros & Carles Mulet-Forteza, 2022. "“The Statistical Analysis of Compositional Data” by John Aitchison (1986): A Bibliometric Overview," SAGE Open, vol. 12(2), pages 21582440221, April.
    2. June Young Lee & Sejung Ahn & Dohyun Kim, 2021. "Deep learning-based prediction of future growth potential of technologies," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-16, June.
    3. Minxi Wang & Ping Liu & Zhaoliang Gu & Hong Cheng & Xin Li, 2019. "A Scientometric Review of Resource Recycling Industry," IJERPH, MDPI, vol. 16(23), pages 1-18, November.
    4. Balázs Győrffy & Andrea Magda Nagy & Péter Herman & Ádám Török, 2018. "Factors influencing the scientific performance of Momentum grant holders: an evaluation of the first 117 research groups," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 409-426, October.
    5. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    6. Hong Shi & Mengmeng Cheng & Yi Feng & Chenghui Qiu & Caiyue Song & Nenglin Yuan & Chuanzhi Kang & Kaijie Yang & Jie Yuan & Yonghao Li, 2023. "Thermal Management Techniques for Lithium-Ion Batteries Based on Phase Change Materials: A Systematic Review and Prospective Recommendations," Energies, MDPI, vol. 16(2), pages 1-23, January.
    7. Andrej Kastrin & Dimitar Hristovski, 2021. "Scientometric analysis and knowledge mapping of literature-based discovery (1986–2020)," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1415-1451, February.
    8. Pinho, Celso R.A. & Pinho, Maria Luiza C.A. & Deligonul, Seyda Z. & Tamer Cavusgil, S., 2022. "The agility construct in the literature: Conceptualization and bibliometric assessment," Journal of Business Research, Elsevier, vol. 153(C), pages 517-532.
    9. Boyack, Kevin W. & Klavans, Richard, 2014. "Including cited non-source items in a large-scale map of science: What difference does it make?," Journal of Informetrics, Elsevier, vol. 8(3), pages 569-580.
    10. Ruiz-Castillo, Javier & Costas, Rodrigo, 2014. "The skewness of scientific productivity," Journal of Informetrics, Elsevier, vol. 8(4), pages 917-934.
    11. Qian, Yue & Liu, Yu & Sheng, Quan Z., 2020. "Understanding hierarchical structural evolution in a scientific discipline: A case study of artificial intelligence," Journal of Informetrics, Elsevier, vol. 14(3).
    12. Chanin Yoopetch & Suthep Nimsai & Boonying Kongarchapatara, 2022. "Bibliometric Analysis of Corporate Social Responsibility in Tourism," Sustainability, MDPI, vol. 15(1), pages 1-16, December.
    13. Jingwei Zheng & Ke Zhang & Boya Han & Jiayi Hou, 2023. "Research Interdisciplinarity and Citation Impact: A Network Analysis of Social Networking Sites Research," SAGE Open, vol. 13(3), pages 21582440231, August.
    14. Haizhen Cao & Hongxiang Ou & Weiyi Ju & Mengli Pan & Honglai Xue & Fang Zhu, 2023. "Visual Analysis of International Environmental Security Management Research (1997–2021) Based on VOSviewer and CiteSpace," IJERPH, MDPI, vol. 20(3), pages 1-18, January.
    15. Fabian Meyer-Brötz & Edgar Schiebel & Leo Brecht, 2017. "Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1307-1325, June.
    16. Zhang, Yi & Lu, Jie & Liu, Feng & Liu, Qian & Porter, Alan & Chen, Hongshu & Zhang, Guangquan, 2018. "Does deep learning help topic extraction? A kernel k-means clustering method with word embedding," Journal of Informetrics, Elsevier, vol. 12(4), pages 1099-1117.
    17. Kevin W. Boyack, 2017. "Thesaurus-based methods for mapping contents of publication sets," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1141-1155, May.
    18. Hric, Darko & Kaski, Kimmo & Kivelä, Mikko, 2018. "Stochastic block model reveals maps of citation patterns and their evolution in time," Journal of Informetrics, Elsevier, vol. 12(3), pages 757-783.
    19. Roksana Jahan Tumpa & Samer Skaik & Miriam Ham & Ghulam Chaudhry, 2022. "A Holistic Overview of Studies to Improve Group-Based Assessments in Higher Education: A Systematic Literature Review," Sustainability, MDPI, vol. 14(15), pages 1-23, August.
    20. Tsung-Ming Hsiao & Kuang-hua Chen, 2020. "The dynamics of research subfields for library and information science: an investigation based on word bibliographic coupling," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 717-737, October.
