
Empirical study of constructing a knowledge organization system of patent documents using topic modeling

Author

Listed:
  • Zhengyin Hu

    (Chengdu Document and Information Center, Chinese Academy of Sciences;
    University of Chinese Academy of Sciences)

  • Shu Fang

    (Chengdu Document and Information Center, Chinese Academy of Sciences)

  • Tian Liang

    (Chengdu Document and Information Center, Chinese Academy of Sciences)

Abstract

A knowledge organization system (KOS) can reveal the deep knowledge structure of a patent document set. Compared to classification code systems, a personalized KOS made up of topics can represent technology information in a more agile and detailed manner. This paper presents an approach to automatically construct a KOS of patent documents based on term clumping, the Latent Dirichlet Allocation (LDA) model, K-Means clustering and Principal Components Analysis (PCA). Term clumping is adopted to generate a better bag-of-words for topic modeling, and the LDA model is applied to generate raw topics. Then, by iteratively applying K-Means clustering and PCA to the document set and the topic matrix, we generate new upper topics and compute the relationships between topics to construct a KOS. Finally, documents are mapped to the KOS. The nodes of the KOS are topics, which are represented by terms and their weights, and the leaves are patent documents. We evaluated the approach in an empirical study on a set of Large Aperture Optical Elements (LAOE) patent documents and constructed the LAOE KOS. The method discovered deep semantic relationships between the topics and helped to better describe the technology themes of LAOE. Based on the KOS, two types of applications were implemented: automatic classification of patent documents and categorical refinement of search results.
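The "upper topics" step and the final document mapping described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the raw topics have already been produced by LDA as term-weight vectors, it omits term clumping and PCA entirely, and it uses a plain K-Means with a deterministic farthest-point initialization. The vocabulary, topic weights, document-topic weights, patent IDs (`P1` to `P4`), and all function names are hypothetical toy values chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def farthest_point_init(points, k):
    """Deterministic init: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=50):
    """Plain Lloyd's K-Means; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids

def top_terms(weights, vocab, n=2):
    """The n highest-weighted terms of a topic vector."""
    ranked = sorted(range(len(vocab)), key=lambda i: -weights[i])
    return tuple(vocab[i] for i in ranked[:n])

def build_kos(vocab, topics, doc_topic, k):
    """Cluster raw topics into k upper topics, then attach each document
    as a leaf under the raw topic where it has the highest weight."""
    labels, centroids = kmeans(topics, k)
    kos = {}
    for c in range(k):
        kos[c] = {
            "terms": top_terms(centroids[c], vocab),  # upper topic described by terms
            "topics": {t: [] for t, l in enumerate(labels) if l == c},
        }
    for doc_id, weights in doc_topic.items():
        best = max(range(len(topics)), key=lambda t: weights[t])
        kos[labels[best]]["topics"][best].append(doc_id)
    return kos

# Hypothetical toy inputs standing in for LDA output.
VOCAB = ["polishing", "grinding", "coating", "film", "interferometer", "wavefront"]
TOPICS = [  # one row per raw topic: weight of each vocabulary term
    [0.60, 0.30, 0.05, 0.05, 0.00, 0.00],
    [0.30, 0.60, 0.05, 0.00, 0.05, 0.00],
    [0.05, 0.00, 0.60, 0.30, 0.05, 0.00],
    [0.00, 0.05, 0.30, 0.60, 0.00, 0.05],
    [0.00, 0.05, 0.00, 0.05, 0.60, 0.30],
    [0.05, 0.00, 0.05, 0.00, 0.30, 0.60],
]
DOC_TOPIC = {  # one row per patent: weight of each raw topic
    "P1": [0.70, 0.10, 0.10, 0.05, 0.05, 0.00],
    "P2": [0.00, 0.10, 0.10, 0.70, 0.10, 0.00],
    "P3": [0.05, 0.05, 0.00, 0.00, 0.20, 0.70],
    "P4": [0.10, 0.80, 0.10, 0.00, 0.00, 0.00],
}

kos = build_kos(VOCAB, TOPICS, DOC_TOPIC, k=3)
for node in kos.values():
    print(node["terms"], node["topics"])
```

The resulting nested dictionary mirrors the structure the abstract describes: upper topics and raw topics as nodes characterized by weighted terms, with patent documents as leaves. In a fuller pipeline, the clustering could be applied repeatedly to build more than two levels.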

Suggested Citation

  • Zhengyin Hu & Shu Fang & Tian Liang, 2014. "Empirical study of constructing a knowledge organization system of patent documents using topic modeling," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(3), pages 787-799, September.
  • Handle: RePEc:spr:scient:v:100:y:2014:i:3:d:10.1007_s11192-014-1328-1
    DOI: 10.1007/s11192-014-1328-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-014-1328-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Suominen, Arho & Toivanen, Hannes & Seppänen, Marko, 2017. "Firms' knowledge profiles: Mapping patent data with unsupervised learning," Technological Forecasting and Social Change, Elsevier, vol. 115(C), pages 131-142.
    2. Carlos Vílchez-Román & Arístides Vara-Horna, 2021. "Usage, content and citation in open access publication: any interaction effects?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9457-9476, December.
    3. Mejía, Cristian & Kajikawa, Yuya, 2019. "Technology news and their linkage to production of knowledge in robotics research," Technological Forecasting and Social Change, Elsevier, vol. 143(C), pages 114-124.
    4. Qingqiang Wu & Yichen Kuang & Qingqi Hong & Yingying She, 2019. "Frontier knowledge discovery and visualization in cancer field based on KOS and LDA," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 979-1010, March.
    5. Na Kyeong Lee & Yukyeong Han & Wei Xong & Min Song, 2020. "Two layer-based trajectory analysis of the research trend in automotive fuel industry," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1701-1719, September.
    6. Takano, Yasutomo & Mejia, Cristian & Kajikawa, Yuya, 2016. "Unconnected component inclusion technique for patent network analysis: Case study of Internet of Things-related technologies," Journal of Informetrics, Elsevier, vol. 10(4), pages 967-980.
    7. Righi, Riccardo & Samoili, Sofia & López Cobo, Montserrat & Vázquez-Prada Baillet, Miguel & Cardona, Melisande & De Prato, Giuditta, 2020. "The AI techno-economic complex System: Worldwide landscape, thematic subdomains and technological collaborations," Telecommunications Policy, Elsevier, vol. 44(6).
