
Self-supervised scientific document recommendation based on contrastive learning

Author

Listed:
  • Shicheng Tan

    (Anhui University)

  • Tao Zhang

    (University of Illinois at Chicago)

  • Shu Zhao

    (Anhui University)

  • Yanping Zhang

    (Anhui University)

Abstract

Scientific document recommendation aims to recommend scientific documents whose content is similar to that of a given target document (e.g., a paper or patent). With the explosive growth in scientific documents, recommending relevant ones from such a massive collection has become an extremely challenging problem. Existing unsupervised scientific document recommendation methods use generic text representation learning approaches and ignore the relationships between paragraphs within a scientific document, which are important for highly logical scientific documents. This paper proposes a self-supervised learning method, the coupled text pair embedding (CTPE) model, which captures paragraph relations within scientific documents through contrastive learning. First, we divide each scientific document into two parts. Two parts from the same document form a positive sample, and parts from different documents form negative samples. Then, we uncover the paragraph relations by contrasting intra-document and inter-document pairs, so that intra-document pairs reach maximum agreement under a contrastive loss in the document embedding space. Finally, we propose a similarity calculation among document embeddings to produce scientific document recommendations. We perform experiments on three datasets covering one patent and two paper recommendation tasks. The experimental results verify the effectiveness of the proposed model. The proposed model can help researchers efficiently discover relevant literature, foster interdisciplinary connections, and guide their research efforts in the scientometrics community. (The code is available at https://github.com/aitsc/text-representation.)
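
The pipeline described in the abstract (split each document in two, treat same-document halves as positives and cross-document halves as negatives, then recommend by embedding similarity) can be illustrated with a minimal Python sketch. This is not the authors' CTPE implementation: the half-way split rule, the InfoNCE-style loss, the temperature value, and the function names `split_document`, `info_nce_loss`, and `recommend` are all assumptions made for illustration, and the text encoder that would produce the embeddings is left out entirely.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def split_document(paragraphs):
    """Split a list of paragraphs into two parts.

    Assumption: a simple half-way split; the abstract does not say
    where the paper actually divides a document.
    """
    mid = max(1, len(paragraphs) // 2)
    return paragraphs[:mid], paragraphs[mid:]


def info_nce_loss(first, second, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of document halves.

    first[i] and second[i] are embeddings of the two parts of document i
    (the positive pair); first[i] and second[j], j != i, act as negatives.
    Assumption: a generic contrastive loss, not necessarily the paper's.
    """
    a = l2_normalize(np.asarray(first, dtype=float))
    b = l2_normalize(np.asarray(second, dtype=float))
    logits = a @ b.T / temperature            # (n, n), diagonal = intra-document pairs
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def recommend(doc_embeddings, target_index, top_k=3):
    """Rank other documents by cosine similarity to the target document."""
    e = l2_normalize(np.asarray(doc_embeddings, dtype=float))
    scores = e @ e[target_index]
    scores[target_index] = -np.inf            # never recommend the target itself
    return np.argsort(-scores)[:top_k].tolist()


if __name__ == "__main__":
    halves_a = np.eye(3)                          # toy embeddings of first parts
    halves_b_matched = np.eye(3)                  # second parts agree with first parts
    halves_b_mixed = np.roll(np.eye(3), 1, axis=0)  # second parts paired with wrong docs
    print("loss, matched halves:", info_nce_loss(halves_a, halves_b_matched))
    print("loss, mixed halves:  ", info_nce_loss(halves_a, halves_b_mixed))
    docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
    print("recommended for doc 0:", recommend(docs, target_index=0, top_k=2))
```

In this toy setting the loss is low when the two halves of each document embed close together and high when halves are paired with the wrong document, which is the agreement the contrastive objective rewards. A document with a single paragraph yields an empty second part under the assumed split, so a real pipeline would need a rule for such cases.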

Suggested Citation

  • Shicheng Tan & Tao Zhang & Shu Zhao & Yanping Zhang, 2023. "Self-supervised scientific document recommendation based on contrastive learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5027-5049, September.
  • Handle: RePEc:spr:scient:v:128:y:2023:i:9:d:10.1007_s11192-023-04782-7
    DOI: 10.1007/s11192-023-04782-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-023-04782-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-023-04782-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yonghe Lu & Meilu Yuan & Jiaxin Liu & Minghong Chen, 2023. "Research on semantic representation and citation recommendation of scientific papers with multiple semantics fusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1367-1393, February.
    2. Zafar Ali & Guilin Qi & Pavlos Kefalas & Shah Khusro & Inayat Khan & Khan Muhammad, 2022. "SPR-SMN: scientific paper recommendation employing SPECTER with memory network," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6763-6785, November.
    3. Guangtong Li & L. Siddharth & Jianxi Luo, 2023. "Embedding knowledge graph of patent metadata to measure knowledge proximity," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(4), pages 476-490, April.
    4. Chaker Jebari & Enrique Herrera-Viedma & Manuel Jesus Cobo, 2023. "Context-aware citation recommendation of scientific papers: comparative study, gaps and trends," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4243-4268, August.
    5. Jie Chen & Jialin Chen & Shu Zhao & Yanping Zhang & Jie Tang, 2020. "Exploiting word embedding for heterogeneous topic model towards patent recommendation," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2091-2108, December.
    6. Irina Wedel & Michael Palk & Stefan Voß, 2022. "A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter," Information Systems Frontiers, Springer, vol. 24(5), pages 1635-1646, October.
    7. Mohammed Salem Binwahlan, 2023. "Polynomial Networks Model for Arabic Text Summarization," International Journal of Research and Scientific Innovation, International Journal of Research and Scientific Innovation (IJRSI), vol. 10(2), pages 74-84, February.
    8. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    9. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    10. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    11. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    12. Klaus Gugler & Florian Szücs & Ulrich Wohak, 2023. "Start-up Acquisitions, Venture Capital and Innovation: A Comparative Study of Google, Apple, Facebook, Amazon and Microsoft," Department of Economics Working Papers wuwp340, Vienna University of Economics and Business, Department of Economics.
    13. Juan Shi & Kin Keung Lai & Ping Hu & Gang Chen, 2018. "Factors dominating individual information disseminating behavior on social networking sites," Information Technology and Management, Springer, vol. 19(2), pages 121-139, June.
    14. Ganesh Dash & Chetan Sharma & Shamneesh Sharma, 2023. "Sustainable Marketing and the Role of Social Media: An Experimental Study Using Natural Language Processing (NLP)," Sustainability, MDPI, vol. 15(6), pages 1-16, March.
    15. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    16. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.
    17. Gissler, Stefan & Oldfather, Jeremy & Ruffino, Doriana, 2016. "Lending on hold: Regulatory uncertainty and bank lending standards," Journal of Monetary Economics, Elsevier, vol. 81(C), pages 89-101.
    18. Wittek, Peter, 2013. "Two-way incremental seriation in the temporal domain with three-dimensional visualization: Making sense of evolving high-dimensional datasets," Computational Statistics & Data Analysis, Elsevier, vol. 66(C), pages 193-201.
    19. Alina Evstigneeva & Mark Sidorovskiy, 2021. "Assessment of Clarity of Bank of Russia Monetary Policy Communication by Neural Network Approach," Russian Journal of Money and Finance, Bank of Russia, vol. 80(3), pages 3-33, September.
    20. Arno de Caigny & Kristof Coussement & Koen W. de Bock & Stefan Lessmann, 2019. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," Post-Print hal-02275958, HAL.
