Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-61040-5.html

Efficient GPT-4V level multimodal large language model for deployment on edge devices

Authors

Listed:
  • Yuan Yao

    (Tsinghua University
    Shanghai Qi Zhi Institute
    National University of Singapore)

  • Tianyu Yu

    (Tsinghua University)

  • Ao Zhang

    (National University of Singapore)

  • Chongyi Wang

    (ModelBest Inc.)

  • Junbo Cui

    (ModelBest Inc.)

  • Hongji Zhu

    (ModelBest Inc.)

  • Tianchi Cai

    (ModelBest Inc.)

  • Chi Chen

    (Tsinghua University)

  • Haoyu Li

    (Tsinghua University)

  • Weilin Zhao

    (Tsinghua University)

  • Zhihui He

    (Tsinghua University)

  • Qianyu Chen

    (The Chinese University of Hong Kong)

  • Ronghua Zhou

    (ModelBest Inc.)

  • Zhensheng Zou

    (ModelBest Inc.)

  • Haoye Zhang

    (Tsinghua University)

  • Shengding Hu

    (Tsinghua University)

  • Zhi Zheng

    (ModelBest Inc.)

  • Jie Zhou

    (ModelBest Inc.)

  • Jie Cai

    (ModelBest Inc.)

  • Xu Han

    (Tsinghua University)

  • Guoyang Zeng

    (ModelBest Inc.)

  • Dahai Li

    (ModelBest Inc.)

  • Zhiyuan Liu

    (Tsinghua University)

  • Maosong Sun

    (Tsinghua University)

Abstract

Multimodal large language models have revolutionized AI research and industry, paving the way toward the next milestone. However, their large sizes and high computational costs restrict deployment to cloud servers, limiting use in mobile, offline, energy-sensitive, or privacy-critical scenarios. We present MiniCPM-V, a family of efficient models for edge devices that integrates advancements in architecture, training, and data. The 8B model outperforms GPT-4V, Gemini Pro, and Claude 3 across 11 public benchmarks, processes high-resolution images at any aspect ratio, achieves robust optical character recognition, exhibits low hallucination rates, and supports over 30 languages while running efficiently on mobile phones. This progress reflects a broader trend: the size of high-performing models is rapidly decreasing alongside growing edge computation capacity, enabling advanced multimodal models to operate locally on consumer hardware. Such developments unlock applications across diverse real-world scenarios, from enhanced mobile AI to privacy-preserving solutions, marking a critical step toward democratizing powerful multimodal intelligence.

Suggested Citation

  • Yuan Yao & Tianyu Yu & Ao Zhang & Chongyi Wang & Junbo Cui & Hongji Zhu & Tianchi Cai & Chi Chen & Haoyu Li & Weilin Zhao & Zhihui He & Qianyu Chen & Ronghua Zhou & Zhensheng Zou & Haoye Zhang & Shengding Hu & Zhi Zheng & Jie Zhou & Jie Cai & Xu Han & Guoyang Zeng & Dahai Li & Zhiyuan Liu & Maosong Sun, 2025. "Efficient GPT-4V level multimodal large language model for deployment on edge devices," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61040-5
    DOI: 10.1038/s41467-025-61040-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-61040-5
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-025-61040-5?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61040-5. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.