This repository contains our implementation of the following MIKE 2019 paper:
GlosysIC Framework: Transformer for Image Captioning with Sequential Attention (MIKE 2019)
Over the past decade, the field of image captioning has witnessed intensive research interest. This paper proposes the GlosysIC framework ("GlosysIC Framework: Transformer for Image Captioning with Sequential Attention"), which combines a Convolutional Neural Network (CNN) to encode the image with a transformer to generate sentences. In contrast to existing image captioning approaches, the GlosysIC framework serializes the multi-head attention modules with the image representations. Furthermore, we present an architectural framework that encompasses multiple CNN architectures and an attention-based transformer to generate effective descriptions of images. The proposed system was trained exhaustively on the benchmark MSCOCO image captioning dataset using an RTX 2060 GPU and a V100 GPU on Google Cloud Platform, implemented with the PyTorch deep learning library. Experimental results show that GlosysIC significantly outperforms previous state-of-the-art models.
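For orientation, below is a minimal PyTorch sketch of the general CNN-encoder plus transformer-decoder pattern the framework builds on. It is not the paper's implementation: the paper's sequential multi-head attention is replaced here by a standard `nn.TransformerDecoder` whose layers cross-attend to the image features, and every class name and hyperparameter (`CNNEncoder`, `CaptionDecoder`, `d_model=512`, the ResNet-101 backbone, etc.) is an assumption for illustration only.

```python
# Illustrative sketch only: a CNN encodes the image into region features and a
# transformer decoder attends to them while generating the caption.
# Positional encodings and beam search are omitted for brevity.
import torch
import torch.nn as nn
import torchvision.models as models

class CNNEncoder(nn.Module):
    """Encode an image into a grid of feature vectors (assumed ResNet-101 backbone)."""
    def __init__(self, d_model=512):
        super().__init__()
        resnet = models.resnet101(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # keep the conv feature map
        self.project = nn.Linear(2048, d_model)

    def forward(self, images):                      # images: (B, 3, H, W)
        feats = self.backbone(images)               # (B, 2048, h, w)
        feats = feats.flatten(2).transpose(1, 2)    # (B, h*w, 2048)
        return self.project(feats)                  # (B, h*w, d_model)

class CaptionDecoder(nn.Module):
    """Transformer decoder whose every layer cross-attends to the image features."""
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, image_feats):         # tokens: (B, T), image_feats: (B, N, d_model)
        sz = tokens.size(1)
        # causal mask so position t only attends to positions <= t
        mask = torch.triu(torch.full((sz, sz), float('-inf'), device=tokens.device), diagonal=1)
        tgt = self.embed(tokens).transpose(0, 1)    # (T, B, d_model), as nn.Transformer expects
        memory = image_feats.transpose(0, 1)        # (N, B, d_model)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden.transpose(0, 1))     # (B, T, vocab_size)
```

In this stand-in, training would pass the encoder output and the shifted ground-truth caption tokens through the decoder and apply a cross-entropy loss over the vocabulary logits.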
- torch>=1.2.0
- torchvision>=0.3.0
- Download the preprocessed COCO dataset for training from here and place it in the $(Root)/datasets directory. (The dataset in the link is only a subset of the full dataset; if you need the complete data, drop us a message and we'll provide it to you.)
- (Optional) Edit the training parameters in base_model.py (a hypothetical example of such parameters is sketched after these steps).
- To start the training process, run:
python train.py
- To generate caption on custom test image, run:
python caption.py --image "image.jpg"
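As referenced in the optional step above, the training parameters live in base_model.py. The snippet below is purely a hypothetical illustration of the kind of values you might find and adjust there; the actual names and defaults in this repository may differ.

```python
# Hypothetical example only -- check base_model.py for the real parameter names.
num_epochs = 30                  # number of passes over the training set
batch_size = 32                  # images per mini-batch
learning_rate = 1e-4             # optimizer step size
d_model = 512                    # transformer hidden size
num_heads = 8                    # attention heads per layer
num_layers = 4                   # decoder layers
checkpoint_dir = 'checkpoints/'  # where model snapshots are written
```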
Ground Truth Captions
Generated Captions
Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr | ROUGE-L | METEOR |
---|---|---|---|---|---|---|---|
GlosysIC | 72.5 | 53.4 | 38.7 | 28.15 | 94.0 | 54.0 | 25.8 |
Our model's scores on various evaluation metrics.
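These are the metrics reported by the standard COCO caption evaluation toolkit. As a rough guide (not necessarily how the scores above were produced), captions saved in the COCO results format can be scored with the `pycocoevalcap` package; the file paths below are placeholders.

```python
# Score a COCO-format results file against the reference captions.
# Paths are placeholders; requires the pycocotools and pycocoevalcap packages.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO('annotations/captions_val2014.json')            # ground-truth annotations
coco_res = coco.loadRes('results/generated_captions.json')  # model outputs
coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()         # evaluate only captioned images
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():                # e.g. Bleu_1 ... CIDEr
    print(f'{metric}: {score:.3f}')
```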