Author
Listed:
- Song, Zhenzhen
- Liu, Ziwei
- Li, Hongji
Abstract
To address the challenges of cross-modal feature fusion, low computational efficiency in long patent-text modeling, and insufficient hierarchical semantic coherence in patent text semantic mining, this study proposes a novel deep learning framework termed HGM-Net. The framework integrates Hierarchical Contrastive Learning (HCL), a Multi-modal Graph Attention Network (M-GAT), and Multi-Granularity Sparse Attention (MSA) to achieve robust, efficient, and semantically consistent patent representation learning. Specifically, HCL applies dynamic masking, contrastive learning, and cross-structural similarity constraints across word-, sentence-, and paragraph-level hierarchies, enabling the model to jointly capture fine-grained local semantics and high-level thematic consistency. Contrastive and cross-structural similarity constraints are enforced in particular at the word and paragraph levels, enhancing semantic discrimination and global coherence within complex patent documents. Furthermore, M-GAT models patent classification codes, citation relationships, and textual semantics as heterogeneous graph structures, and employs cross-modal gated attention mechanisms to dynamically fuse multi-source, multi-modal features, thereby improving representation completeness and robustness. To reduce the high computational cost of long-text processing, MSA adopts a hierarchical sparse attention strategy that selectively allocates attention across multiple granularities, including words, phrases, sentences, and paragraphs, significantly reducing computational overhead while preserving critical semantic information. Extensive experiments on patent classification and similarity matching tasks demonstrate that HGM-Net consistently outperforms existing state-of-the-art deep learning approaches.
The results validate the effectiveness and generalization capability of the proposed framework, highlighting its theoretical innovation and practical value in improving patent examination efficiency and enabling large-scale technology relevance mining.
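The hierarchical sparse attention idea behind MSA can be illustrated with a minimal sketch. This pure-Python example shows one common sparse pattern (a local sliding window plus a few global anchor tokens, e.g. sentence-initial tokens); the abstract does not specify MSA's actual granularity-selection rule, and the function names here are hypothetical illustrations, not the authors' implementation.

```python
def sparse_attention_mask(n, window, global_tokens):
    """Build a boolean attention mask for n tokens.

    mask[i][j] is True if token i may attend to token j. A token attends
    to neighbors within `window` positions, and global anchor tokens
    (e.g. sentence-initial positions) attend to and are attended by all.
    """
    anchors = set(global_tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= window or i in anchors or j in anchors:
                mask[i][j] = True
    return mask


def density(mask):
    """Fraction of attention pairs kept, relative to full n*n attention."""
    n = len(mask)
    kept = sum(sum(row) for row in mask)
    return kept / (n * n)
```

For a sequence of n tokens, full attention scores n² pairs, while this pattern keeps roughly O(n·window) pairs plus the rows and columns of the anchor tokens; that gap is the source of the computational savings that sparse attention strategies like MSA aim for on long patent documents.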
Suggested Citation
Song, Zhenzhen & Liu, Ziwei & Li, Hongji, 2026.
"Research on Feature Fusion and Multimodal Patent Text Based on Graph Attention Network,"
Journal of Computer, Signal, and System Research, George Brown Press, vol. 3(1), pages 93-100.
Handle:
RePEc:dbb:jcssra:v:3:y:2026:i:1:p:93-100