
Towards Mapping Images to Text Using Deep-Learning Architectures

Author

Listed:
  • Daniela Onita

    (Department of Computer Science, University of Bucharest, 90 Panduri Street, Sector 5, 050663 Bucharest, Romania
    Department of Computer Science and Engineering, “1 Decembrie 1918” University of Alba Iulia, 5 Gabriel Bethlen, 515900 Alba Iulia, Romania)

  • Adriana Birlutiu

    (Department of Computer Science and Engineering, “1 Decembrie 1918” University of Alba Iulia, 5 Gabriel Bethlen, 515900 Alba Iulia, Romania)

  • Liviu P. Dinu

    (Department of Computer Science, University of Bucharest, 90 Panduri Street, Sector 5, 050663 Bucharest, Romania)

Abstract

Images and text are types of content that are often used together to convey a message. Mapping images to text can provide very useful information and can be included in many applications, from the medical domain and applications for blind people to social networking. In this paper, we investigate an approach for mapping images to text using a Kernel Ridge Regression model. We considered two types of features: simple RGB pixel-value features and image features extracted with deep-learning approaches. We investigated several neural network architectures for image feature extraction: VGG16, Inception V3, ResNet50 and Xception. The experimental evaluation was performed on three data sets from different domains. The texts associated with the images are objective descriptions for two of the three data sets and subjective descriptions for the third. The experimental results show that the more complex deep-learning approaches used for feature extraction perform better than the simple RGB pixel-value approach. Moreover, the ResNet50 architecture performs best in comparison to the other three deep network architectures considered for extracting image features: the model error obtained with ResNet50 features is approximately 0.30 lower than with the other architectures. We extracted natural language descriptors of the images and compared the original and generated descriptive words. Furthermore, we investigated whether performance differs with the type of text associated with the images: subjective or objective. The proposed model generated descriptions more similar to the original ones for the data set containing objective descriptions, whose vocabulary is simpler, larger and clearer.
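
As a rough illustration of the pipeline described in the abstract, the sketch below pairs a pretrained ResNet50 feature extractor with a Kernel Ridge Regression model that maps image features to a text representation. It is not the authors' implementation: the libraries (TensorFlow/Keras, scikit-learn), the placeholder file names, the toy bag-of-words targets and the hyperparameters are assumptions made only for the example.

    # Illustrative sketch only (not the authors' code). Assumes TensorFlow/Keras for the
    # pretrained ResNet50 and scikit-learn for Kernel Ridge Regression; file names and
    # word-count targets are placeholders.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.preprocessing import image
    from sklearn.kernel_ridge import KernelRidge

    # Pretrained ResNet50 used as a fixed feature extractor (2048-d average-pooled features).
    extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

    def image_features(paths):
        """Load images, apply ResNet50 preprocessing and return one feature row per image."""
        batch = np.stack([image.img_to_array(image.load_img(p, target_size=(224, 224)))
                          for p in paths])
        return extractor.predict(preprocess_input(batch))

    # Hypothetical training data: image paths and word-count vectors of their descriptions.
    train_paths = ["img_001.jpg", "img_002.jpg"]        # placeholder image files
    Y_train = np.array([[1, 0, 2, 0], [0, 1, 0, 1]])    # placeholder bag-of-words targets

    # Kernel Ridge Regression mapping image features to the text representation.
    krr = KernelRidge(kernel="rbf", alpha=1.0)
    krr.fit(image_features(train_paths), Y_train)

    # Predicted word scores for a new image; the top-scoring vocabulary entries would be
    # the generated descriptors compared against the original description words.
    y_pred = krr.predict(image_features(["img_new.jpg"]))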

Suggested Citation

  • Daniela Onita & Adriana Birlutiu & Liviu P. Dinu, 2020. "Towards Mapping Images to Text Using Deep-Learning Architectures," Mathematics, MDPI, vol. 8(9), pages 1-18, September.
  • Handle: RePEc:gam:jmathe:v:8:y:2020:i:9:p:1606-:d:415345

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/8/9/1606/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/8/9/1606/
    Download Restriction: no

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

    Cited by:

    1. Antoinette Deborah Martin & Ezat Ahmadzadeh & Inkyu Moon, 2022. "Privacy-Preserving Image Captioning with Deep Learning and Double Random Phase Encoding," Mathematics, MDPI, vol. 10(16), pages 1-14, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:8:y:2020:i:9:p:1606-:d:415345. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.
