Image Captioning: Similarity-Based Image Retrieval
In recent years, the task of automatically generating descriptive sentences from images has attracted considerable interest in natural language processing and computer vision research. Image captioning is a fundamental task that requires semantic understanding of images and the ability to generate description sentences with precise and accurate structure. In this study, the authors propose an algorithm that uses the Xception (Extreme Inception) architecture for image feature extraction and a Long Short-Term Memory (LSTM) network to structure the extracted keywords into meaningful sentences. The algorithm is evaluated on the public Flickr 8K benchmark. Experiments on this dataset show that the proposed system generates meaningful and accurate captions in a majority of cases; to reduce overfitting, hyperparameters such as the dropout rate and the number of LSTM layers were also tuned. The generated captions were then stored in a database, on top of which a similarity-based image retrieval system was developed. This system retrieves images from the database whose captions are similar to a query caption, ranked using a cosine similarity function.
Keywords - Image Captioning, Xception, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Deep Learning, Neural Networks.