Image Caption Generator using Siamese Graph Convolutional Networks and LSTM
Source
ACM International Conference Proceeding Series
Date Issued
2022-01-08
Author(s)
Kumar, Athul
Agrawal, Aarchi
Ashin Shanly, K. S.
Das, Sudip
Harilal, Nidhin
Abstract
Image captions are the short descriptions that appear beneath images; they give the viewer a brief idea of the image's context. To generate an accurate description of a scene, a model needs both a semantic and a spatial understanding of the scene's contents. This paper proposes a novel approach that uses a Siamese Graph Convolutional Network (S-GCN) with a non-parametric Kernel Activation Function (KAF), followed by an LSTM with attention, to generate natural-language captions for an input image. The S-GCN captures deep semantic relations and makes the model more robust to class imbalance. We use an extended kernel activation function and regularize with standard ℓp-norm techniques, which improves overall model performance by a significant margin. The model is trained and tested on the Flickr30K dataset and evaluated using BLEU-4 scores.
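As a rough, hedged illustration of two building blocks named in the abstract, the sketch below implements a standard Gaussian kernel activation function (the original KAF of Scardapane et al., not the authors' unspecified "extended" variant) and uses it as the nonlinearity of a single Kipf and Welling style graph convolution whose weights are shared between two input graphs, which is what makes the layer Siamese. The abstract does not describe the graph construction, layer sizes, bandwidth, pooling, or Siamese loss, so every name and setting here (`KAF`, `SiameseGCNLayer`, `dict_size`, `boundary`, the bandwidth heuristic, mean pooling, Euclidean distance, p = 2) is an assumption. It is a small pure-Python sketch, not the paper's implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class KAF:
    """Gaussian kernel activation: f(s) = sum_i alpha_i * exp(-gamma * (s - d_i)^2).

    The dictionary d_i is fixed and uniformly spaced in [-boundary, boundary];
    only the mixing coefficients alpha_i would be trained. Assumes dict_size >= 2.
    """

    def __init__(self, dict_size: int = 20, boundary: float = 3.0, seed: int = 0):
        step = 2.0 * boundary / (dict_size - 1)
        self.dictionary = [-boundary + i * step for i in range(dict_size)]
        # Assumed bandwidth heuristic tied to the dictionary spacing.
        self.gamma = 1.0 / (6.0 * step ** 2)
        rng = random.Random(seed)
        self.alpha = [rng.gauss(0.0, 0.3) for _ in range(dict_size)]

    def __call__(self, s: float) -> float:
        return sum(a * math.exp(-self.gamma * (s - d) ** 2)
                   for a, d in zip(self.alpha, self.dictionary))

    def lp_penalty(self, p: float = 2.0) -> float:
        """Regularization term sum_i |alpha_i|^p (the p-th power of the lp-norm)."""
        return sum(abs(a) ** p for a in self.alpha)


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


class SiameseGCNLayer:
    """One graph convolution H' = KAF(A_norm H W) with weights shared by both branches."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 0.5) for _ in range(out_dim)] for _ in range(in_dim)]
        self.act = KAF(seed=seed + 1)

    def forward(self, adj: Matrix, features: Matrix) -> Matrix:
        z = matmul(matmul(normalize_adjacency(adj), features), self.weight)
        return [[self.act(v) for v in row] for row in z]

    def __call__(self, graph_a, graph_b):
        # Siamese: the same W and the same KAF coefficients process both graphs.
        return self.forward(*graph_a), self.forward(*graph_b)


def graph_embedding(h: Matrix) -> List[float]:
    """Mean-pool node features into one vector per graph (assumed readout)."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


def euclidean(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]          # 3-node path graph
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # 2 features per node
    layer = SiameseGCNLayer(in_dim=2, out_dim=4)
    ha, hb = layer((adj, feats), (adj, feats))
    print(len(ha), len(ha[0]))  # 3 4
    print(euclidean(graph_embedding(ha), graph_embedding(hb)))  # 0.0
```

Because both branches share every parameter, identical inputs map to identical embeddings, so a distance-based Siamese objective can only separate graphs through differences in their structure or features. In a trained model, `lp_penalty` would be added to the task loss with some weighting coefficient, which the abstract does not report.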
