IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v125y2020i3d10.1007_s11192-020-03666-4.html

Exploiting word embedding for heterogeneous topic model towards patent recommendation

Author

Listed:
  • Jie Chen

    (Ministry of Education
    Anhui University)

  • Jialin Chen

    (Ministry of Education
    Anhui University)

  • Shu Zhao

    (Ministry of Education
    Anhui University)

  • Yanping Zhang

    (Ministry of Education
    Anhui University)

  • Jie Tang

    (Tsinghua University)

Abstract

Patent recommendation aims to recommend patent documents that have similar content to a given target patent. With the explosive growth in patent applications, recommending relevant patents from a massive collection has become an extremely challenging problem. The main obstacle in patent recommendation is how to distinguish the meanings of the same word in different contexts and how to associate multiple words that express the same meaning. In this paper, we propose a Heterogeneous Topic model exploiting Word embedding to enhance word semantics (HTW). First, we model the relationship among text, inventors, and applicants around the topic to build a heterogeneous topic model and learn the patent feature representation to capture contextual word semantics. Second, a word embedding is constructed to extract the deep semantics for associating multiple words that express the same meaning. Finally, with words as connections, the mapping from patent feature representations to patent embedding is established through a matrix operation, which integrates the information between the word embedding and patent feature representation. HTW considers the heterogeneity of patents and enhances the distinction or association among words simultaneously. The experimental results on real-world datasets show that HTW outperforms typical keyword-based methods, topic models, and embedding models on patent recommendation.
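
The final step in the abstract, mapping patent feature representations to patent embeddings "through a matrix operation" with words as the connection, can be illustrated with a small sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: the operation is taken to be a plain matrix product, and the names `theta` (patent-topic weights), `phi` (topic-word weights), `word_vecs` (word embeddings), `patent_embeddings`, and `recommend` are hypothetical. The toy inputs are random and stand in for the outputs of a topic model and a word-embedding model; this is not the authors' HTW implementation.

```python
import numpy as np

def patent_embeddings(theta, phi, word_vecs):
    """Map patent topic features into the word-embedding space.

    ASSUMPTION: the "matrix operation" is modelled as a plain product.
    theta:     (n_patents, n_topics) patent-topic weights
    phi:       (n_topics, n_words)   topic-word weights
    word_vecs: (n_words, dim)        word embeddings
    """
    patent_word = theta @ phi        # words act as the connection
    return patent_word @ word_vecs   # shape (n_patents, dim)

def recommend(embeddings, target, k=3):
    """Return indices of the k patents most cosine-similar to `target`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    scores = unit @ unit[target]
    scores[target] = -np.inf         # never recommend the target itself
    return np.argsort(-scores)[:k]

# Toy data standing in for learned model outputs (hypothetical sizes).
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(4), size=6)    # 6 patents, 4 topics
phi = rng.dirichlet(np.ones(10), size=4)     # 4 topics, 10 words
word_vecs = rng.normal(size=(10, 5))         # 10 words, 5 dimensions

emb = patent_embeddings(theta, phi, word_vecs)
top = recommend(emb, target=0, k=3)
print(top)
```

Under this assumption, two patents with similar topic mixtures end up close together in the embedding space, and words with similar embeddings contribute similarly even when they are different tokens, which is the association effect the abstract describes.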

Suggested Citation

  • Jie Chen & Jialin Chen & Shu Zhao & Yanping Zhang & Jie Tang, 2020. "Exploiting word embedding for heterogeneous topic model towards patent recommendation," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2091-2108, December.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03666-4
    DOI: 10.1007/s11192-020-03666-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03666-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Sam Arts & Bruno Cassiman & Juan Carlos Gomez, 2018. "Text matching to measure patent similarity," Strategic Management Journal, Wiley Blackwell, vol. 39(1), pages 62-84, January.
    2. Baitong Chen & Ying Ding & Feicheng Ma, 2018. "Semantic word shifts in a scientific domain," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 211-226, October.
    3. Lea Helmers & Franziska Horn & Franziska Biegler & Tim Oppermann & Klaus-Robert Müller, 2019. "Automating the search for a patent’s prior art with a full text similarity search," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-17, March.
    4. Shaobo Li & Jie Hu & Yuxin Cui & Jianjun Hu, 2018. "DeepPatent: patent classification with convolutional neural networks and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 721-744, November.
    5. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    6. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    7. Chen, Lixin, 2017. "Do patent citations indicate knowledge linkage? The evidence from text similarities between patents and their citations," Journal of Informetrics, Elsevier, vol. 11(1), pages 63-79.

    Citations

    Cited by:

    1. Shicheng Tan & Tao Zhang & Shu Zhao & Yanping Zhang, 2023. "Self-supervised scientific document recommendation based on contrastive learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5027-5049, September.
    2. Manuel A. Vázquez & Jorge Pereira-Delgado & Jesús Cid-Sueiro & Jerónimo Arenas-García, 2022. "Validation of scientific topic models using graph analysis and corpus metadata," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5441-5458, September.
    3. Arousha Haghighian Roudsari & Jafar Afshar & Wookey Lee & Suan Lee, 2022. "PatentNet: multi-label classification of patent documents using deep learning based language understanding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(1), pages 207-231, January.
    4. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    5. Qiang Gao & Man Jiang, 2024. "Exploring technology fusion by combining latent Dirichlet allocation with Doc2vec: a case of digital medicine and machine learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4043-4070, July.
    6. Ascione, Grazia Sveva, 2023. "Technological diversity to address complex challenges: the contribution of American universities to sdgs," MPRA Paper 119452, University Library of Munich, Germany.
    7. Ting Xiong & Liang Zhou & Ying Zhao & Xiaojuan Zhang, 2022. "Mining semantic information of co-word network to improve link prediction performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 2981-3004, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cassiman, Bruno & Veugelers, Reinhilde & Arts, Sam, 2018. "Mind the gap: Capturing value from basic research through combining mobile inventors and partnerships," Research Policy, Elsevier, vol. 47(9), pages 1811-1824.
    2. Watzinger, Martin & Schnitzer, Monika, 2019. "Standing on the Shoulders of Science," Rationality and Competition Discussion Paper Series 215, CRC TRR 190 Rationality and Competition.
    3. Higham, Kyle & de Rassenfosse, Gaétan & Jaffe, Adam B., 2021. "Patent Quality: Towards a Systematic Framework for Analysis and Measurement," Research Policy, Elsevier, vol. 50(4).
    4. Choi, Seokkyu & Lee, Hyeonju & Park, Eunjeong & Choi, Sungchul, 2022. "Deep learning for patent landscaping using transformer and graph embedding," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    5. Kong, Nancy & Dulleck, Uwe & Jaffe, Adam B. & Sun, Shupeng & Vajjala, Sowmya, 2023. "Linguistic metrics for patent disclosure: Evidence from university versus corporate patents," Research Policy, Elsevier, vol. 52(2).
    6. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    7. Bekamiri, Hamid & Hain, Daniel S. & Jurowetzki, Roman, 2024. "PatentSBERTa: A deep NLP based hybrid model for patent distance and classification using augmented SBERT," Technological Forecasting and Social Change, Elsevier, vol. 206(C).
    8. Shicheng Tan & Tao Zhang & Shu Zhao & Yanping Zhang, 2023. "Self-supervised scientific document recommendation based on contrastive learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5027-5049, September.
    9. Irina Wedel & Michael Palk & Stefan Voß, 2022. "A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter," Information Systems Frontiers, Springer, vol. 24(5), pages 1635-1646, October.
    10. Li, Mingxiang, 2021. "Exploring novel technologies through board interlocks: Spillover vs. broad exploration," Research Policy, Elsevier, vol. 50(9).
    11. Mohammed Salem Binwahlan, 2023. "Polynomial Networks Model for Arabic Text Summarization," International Journal of Research and Scientific Innovation, International Journal of Research and Scientific Innovation (IJRSI), vol. 10(2), pages 74-84, February.
    12. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    13. Verhoeven, Dennis & Bakker, Jurriën & Veugelers, Reinhilde, 2016. "Measuring technological novelty with patent-based indicators," Research Policy, Elsevier, vol. 45(3), pages 707-723.
    14. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    15. Chattergoon, B. & Kerr, W.R., 2022. "Winner takes all? Tech clusters, population centers, and the spatial transformation of U.S. invention," Research Policy, Elsevier, vol. 51(2).
    16. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    17. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    18. Klaus Gugler & Florian Szücs & Ulrich Wohak, 2023. "Start-up Acquisitions, Venture Capital and Innovation: A Comparative Study of Google, Apple, Facebook, Amazon and Microsoft," Department of Economics Working Papers wuwp340, Vienna University of Economics and Business, Department of Economics.
    19. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    20. Ajay Bhaskarbhatla & Luis Cabral & Deepak Hegde & Thomas (T.L.P.R.) Peeters, 2017. "Human Capital, Firm Capabilities, and Innovation," Tinbergen Institute Discussion Papers 17-115/VII, Tinbergen Institute, revised 03 Mar 2020.
