Printed from https://ideas.repec.org/a/gam/jagris/v15y2025i11p1173-d1667641.html

E-CLIP: An Enhanced CLIP-Based Visual Language Model for Fruit Detection and Recognition

Authors
  • Yi Zhang

    (College of Informatics, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China)

  • Yang Shao

    (College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China)

  • Chen Tang

    (College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China)

  • Zhenqing Liu

    (College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China)

  • Zhengda Li

    (College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China;
    Wuhan X-Agriculture Intelligent Technology Co., Ltd., Wuhan 430070, China)

  • Ruifang Zhai

    (College of Informatics, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China)

  • Hui Peng

    (College of Informatics, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China)

  • Peng Song

    (College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizi Mountain Street, Hongshan District, Wuhan 430070, China;
    Hubei Hongshan Laboratory, Wuhan 430070, China;
    National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China)

Abstract

With the progress of agricultural modernization, intelligent fruit harvesting is gaining importance. Fruit detection and recognition are essential for robotic harvesting, but existing methods generalize poorly: they struggle to adapt to complex environments and to handle new fruit varieties. This limitation stems from their reliance on unimodal visual data, which leaves a semantic gap between image features and contextual understanding. To address it, this study proposes a multi-modal fruit detection and recognition framework based on visual language models (VLMs). By integrating multi-modal information, the proposed model improves robustness and generalization across diverse environmental conditions and fruit types. The framework accepts natural language instructions as input, facilitating effective human–machine interaction. Its core module, Enhanced Contrastive Language–Image Pre-Training (E-CLIP), employs image–image and image–text contrastive learning mechanisms to achieve robust recognition of various fruit types and their maturity levels. In experiments, the model achieves an F1 score of 0.752 and an mAP@0.5 of 0.791. It remains robust under occlusion and varying illumination, and attains a zero-shot mAP@0.5 of 0.626 on unseen fruits. The system runs inference at 54.82 FPS, balancing speed and accuracy. This research provides new insights and methods for the practical application of smart agriculture.
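
The abstract names image–image and image–text contrastive learning but does not give the loss itself. As a rough illustration only, the sketch below uses the standard CLIP-style symmetric InfoNCE objective, which may differ from what E-CLIP actually implements. All function names, the temperature of 0.07, and the 0.5 weighting between the two terms are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_contrastive_loss(a, b, temperature=0.07):
    # Row i of `a` is the positive match for row i of `b`;
    # every other row in the batch acts as a negative.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()    # a -> b direction
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def combined_loss(img, img_aug, txt, weight=0.5):
    # Hypothetical combination: an image-text term plus a weighted
    # image-image term between two views of the same fruit images.
    image_text = symmetric_contrastive_loss(img, txt)
    image_image = symmetric_contrastive_loss(img, img_aug)
    return image_text + weight * image_image

# Toy batch: 8 samples with 16-dimensional embeddings (random stand-ins
# for real encoder outputs).
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
img_aug = img + 0.01 * rng.normal(size=(8, 16))
txt = img + 0.01 * rng.normal(size=(8, 16))

matched = combined_loss(img, img_aug, txt)
mismatched = combined_loss(img, img_aug, np.roll(txt, 1, axis=0))
print(f"matched: {matched:.4f}  mismatched: {mismatched:.4f}")
```

When the text embeddings are aligned with their images the loss is small; shifting the text rows by one position breaks every pairing and the loss rises sharply, which is the behavior a contrastive objective is meant to reward and penalize.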

Suggested Citation

  • Yi Zhang & Yang Shao & Chen Tang & Zhenqing Liu & Zhengda Li & Ruifang Zhai & Hui Peng & Peng Song, 2025. "E-CLIP: An Enhanced CLIP-Based Visual Language Model for Fruit Detection and Recognition," Agriculture, MDPI, vol. 15(11), pages 1-32, May.
  • Handle: RePEc:gam:jagris:v:15:y:2025:i:11:p:1173-:d:1667641

    Download full text from publisher

    File URL: https://www.mdpi.com/2077-0472/15/11/1173/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2077-0472/15/11/1173/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Meftah Salem M. Alfatni & Siti Khairunniza-Bejo & Mohammad Hamiruce B. Marhaban & Osama M. Ben Saaed & Aouache Mustapha & Abdul Rashid Mohamed Shariff, 2022. "Towards a Real-Time Oil Palm Fruit Maturity System Using Supervised Classifiers Based on Feature Analysis," Agriculture, MDPI, vol. 12(9), pages 1-28, September.
    2. Shuo Dai & Tao Bai & Yunjie Zhao, 2025. "Keypoint Detection and 3D Localization Method for Ridge-Cultivated Strawberry Harvesting Robots," Agriculture, MDPI, vol. 15(4), pages 1-20, February.
    3. Yu Zhou & Zhenye Li & Sheng Xue & Min Wu & Tingting Zhu & Chao Ni, 2025. "Lightweight SCD-YOLOv5s: The Detection of Small Defects on Passion Fruit with Improved YOLOv5s," Agriculture, MDPI, vol. 15(10), pages 1-26, May.
    4. Jingmin Shi & Fanhuai Shi & Xixia Huang, 2023. "Prediction of Maturity Date of Leafy Greens Based on Causal Inference and Convolutional Neural Network," Agriculture, MDPI, vol. 13(2), pages 1-16, February.
