Printed from https://ideas.repec.org/a/plo/pone00/0323966.html

Optimizing document management and retrieval with multimodal transformers and knowledge graphs

Author

Listed:
  • Yali Chen
  • Bin Hu
  • Yajuan Liu

Abstract

In the digital age, multimodal archival data is growing explosively, and retrieving information from it efficiently and accurately has become a key challenge. Traditional retrieval methods struggle to handle multi-source, heterogeneous multimodal data, which leads to poor retrieval accuracy and efficiency. To address this issue, this paper proposes the MDKG-RL model, which integrates knowledge graph reasoning, dynamic optimization via deep reinforcement learning, and a multimodal Transformer architecture to achieve deep semantic understanding of multimodal data and intelligent optimization of retrieval strategies. Experiments on the ICDAR 2023 and AIDA Corpus datasets show that MDKG-RL achieves a mean reciprocal rank (MRR) of 0.85, a normalized discounted cumulative gain (NDCG) of 0.88, and an entity linking accuracy of 92.4%. Compared to the baseline model, MRR improves by 13.3%, NDCG increases by 12.8%, and response time is reduced by 38.2%; MDKG-RL also significantly outperforms the other comparison models. Ablation experiments confirm that each module is indispensable. Visual analysis further demonstrates the model’s clear advantages in retrieval accuracy and efficiency, while error analysis reveals shortcomings in handling long-tail entities and cross-modal ambiguity. The MDKG-RL model provides an innovative and effective solution for multimodal archival retrieval that improves retrieval performance and lays a foundation for future research. Future work can further enhance model performance and generalization by expanding the data, optimizing strategies, and extending application scenarios, thereby promoting the development and application of multimodal retrieval technology in information management and knowledge discovery.
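
For readers unfamiliar with the two ranking metrics reported above, the following is a minimal sketch of how MRR and NDCG are commonly computed. It is not the authors' evaluation code. The function names, the linear gain in the DCG sum, and the choice to build the ideal ordering from the retrieved list alone are all assumptions made for illustration; the paper may use a different variant (for example, an exponential gain or a rank cutoff).

```python
import math
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """MRR over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None if no relevant result was retrieved (contributes 0).
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain (an assumption)."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances: Sequence[float]) -> float:
    """NDCG for one ranked list of graded relevance labels.

    The ideal ordering is derived from the same list sorted in descending
    order (an assumption; some setups use all known relevant items).
    """
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    print(mean_reciprocal_rank([1, 2, None]))
    print(ndcg([3, 2, 1]))
```

Both metrics lie in the range 0 to 1, with higher values indicating that relevant items appear nearer the top of the ranked results.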

Suggested Citation

  • Yali Chen & Bin Hu & Yajuan Liu, 2025. "Optimizing document management and retrieval with multimodal transformers and knowledge graphs," PLOS ONE, Public Library of Science, vol. 20(6), pages 1-27, June.
  • Handle: RePEc:plo:pone00:0323966
    DOI: 10.1371/journal.pone.0323966

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0323966
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0323966&type=printable
    Download Restriction: no

