Preprocessing images #5

arunikayadav42 · 2020-05-20T09:56:16Z

Hi @peri044, I want to train the STT network on my own data, so I need to preprocess the images. Could you please point me to the Python script that does this?

peri044 · 2020-05-20T10:14:57Z

@arunikayadav42 Check out the extract_image_features.py script in the STT repo. It has the necessary calls for preprocessing input images. The backbone-specific preprocessor implementations are in the preprocessing directory.
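For anyone unsure what a "backbone-specific preprocessor" does, here is a minimal, hypothetical sketch. It is not taken from the STT preprocessing directory. It assumes two conventions common in TF-slim style code: Inception-style scaling of 0-255 pixel values to [-1, 1], and VGG-style subtraction of per-channel RGB means. The function names and the dispatch table are illustrative only.

```python
# Hypothetical sketch of backbone-specific normalization (not the STT code).
# Pixels are (R, G, B) tuples with values in [0, 255].

# Per-channel RGB means used by TF-slim style VGG preprocessing.
VGG_MEANS = (123.68, 116.78, 103.94)


def preprocess_inception(pixels):
    """Scale each channel from [0, 255] to [-1, 1]."""
    return [tuple(c / 127.5 - 1.0 for c in px) for px in pixels]


def preprocess_vgg(pixels):
    """Subtract the per-channel mean from each pixel."""
    return [tuple(c - m for c, m in zip(px, VGG_MEANS)) for px in pixels]


# Pick the preprocessor that matches the backbone used for feature extraction.
PREPROCESSORS = {
    "inception": preprocess_inception,
    "vgg": preprocess_vgg,
}


def preprocess(pixels, backbone):
    try:
        fn = PREPROCESSORS[backbone]
    except KeyError:
        raise ValueError(f"unknown backbone: {backbone}") from None
    return fn(pixels)
```

The point of dispatching on the backbone name is that features must be extracted with the same normalization the backbone was trained with; mixing conventions silently degrades the features.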

arunikayadav42 · 2020-06-10T13:50:19Z

@peri044 I have a question about generating the paraphrases. In the README you mention that we create train_enc.txt and train_dec.txt from the captions_train2014.json file. How are those captions then mapped to the corresponding images in the train.npy features from the SCAN repository?

peri044 · 2020-06-11T08:56:28Z

@arunikayadav42 I don't remember the exact details of the SCAN data structure, as it has been a while. The way I create the paraphrases (train_enc.txt and train_dec.txt) is here. The gist of the process: each image (with an image ID) has 5 captions, which gives 20 combinations of sentences tied to that image ID. Using the same image ID, you can extract the SCAN features (downloaded from their repository) for the corresponding image and tie them to those caption combinations.
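A minimal sketch of the pairing step described above, assuming the "20 combinations" are the ordered pairs of distinct captions (5 × 4 = 20; unordered pairs would give only 10). This is not the script linked above, and the function and file names are hypothetical. Keeping the image ID next to each pair is what lets you look up the matching SCAN feature later.

```python
from itertools import permutations


def make_paraphrase_pairs(captions_by_image):
    """Return (image_id, source, target) triples, one per ordered caption pair.

    captions_by_image maps an image ID to its list of captions (hypothetical
    structure, e.g. built from captions_train2014.json).
    """
    triples = []
    for image_id, captions in captions_by_image.items():
        # permutations(..., 2) yields every ordered pair of distinct captions:
        # 5 captions -> 20 pairs.
        for src, tgt in permutations(captions, 2):
            triples.append((image_id, src, tgt))
    return triples


def write_enc_dec(triples, enc_path, dec_path):
    """Write sources and targets line-aligned; line j of both files comes from triples[j]."""
    with open(enc_path, "w") as enc, open(dec_path, "w") as dec:
        for _, src, tgt in triples:
            enc.write(src + "\n")
            dec.write(tgt + "\n")
```

Because the two text files only hold sentences, the returned triples (or a separate list of image IDs in the same order) are what preserve the link from each line back to its image.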

arunikayadav42 · 2020-06-11T10:09:32Z

@peri044 So my only question is this: when we have the 20 combinations and store them in the TFRecord files, each combination needs its corresponding image feature, and all of them get stored in the TFRecord file, right?

For instance, if the image ID is coco_train_1, then the SCAN feature for this image ID will be paired with each of the 20 caption combinations for this image, right?

So at this line, https://github.com/peri044/STT/blob/master/data/coco_data_loader.py#L105, shouldn't it be (img_idx * 20, img_idx * 20 + 20) instead of (img_idx * 5, img_idx * 5 + 5)?
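The question comes down to how many consecutive rows each image occupies once its feature is replicated. A small sketch of that arithmetic, with hypothetical helper names: if every image contributes the same number of rows k, image i occupies the half-open range [i·k, i·k + k). That gives k = 5 for one row per caption and k = 20 for one row per ordered caption pair. Which value is correct depends on how the data read by that loader was actually written.

```python
def image_row_range(img_idx, rows_per_image):
    """Half-open [start, end) row range for image img_idx, assuming every
    image contributes exactly rows_per_image consecutive rows."""
    start = img_idx * rows_per_image
    return start, start + rows_per_image


def image_of_row(row, rows_per_image):
    """Inverse mapping: which image a given row belongs to."""
    return row // rows_per_image


CAPTIONS_PER_IMAGE = 5
# Ordered pairs of distinct captions: 5 * 4 = 20.
PAIRS_PER_IMAGE = CAPTIONS_PER_IMAGE * (CAPTIONS_PER_IMAGE - 1)

print(image_row_range(3, CAPTIONS_PER_IMAGE))  # (15, 20)
print(image_row_range(3, PAIRS_PER_IMAGE))     # (60, 80)
```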

peri044 · 2020-06-11T22:05:31Z

Yes. The image feature for that image ID is replicated for each of the 20 caption combinations.
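A minimal sketch of that replication, assuming features are looked up by image ID and each image's caption pairs are already available. The names are hypothetical, and this is not the coco_feat_stt.py implementation; it only shows the record layout where one feature is repeated for every pair of its image.

```python
def attach_image_features(features, pairs_by_image):
    """Return one (image_id, feature, source, target) record per caption pair.

    features maps image ID -> feature (e.g. a SCAN feature array).
    pairs_by_image maps image ID -> list of (source, target) caption pairs.
    Records for one image are emitted consecutively, so with 20 pairs per
    image the same feature appears 20 times in a row.
    """
    records = []
    for image_id, pairs in pairs_by_image.items():
        feature = features[image_id]
        for src, tgt in pairs:
            records.append((image_id, feature, src, tgt))
    return records
```

In memory the repeated entries share one feature object, but once the records are serialized (for example into a TFRecord) each copy is written out separately, so feature storage grows roughly in proportion to the number of pairs per image.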

The data loader script you linked is probably not the one I used in my experiments. The data loader scripts are currently scattered across the data folder, and because of quick experimentation I don't remember exactly which ones I used. You can refer to https://github.com/peri044/STT/blob/master/data/coco_extras/coco_feat_stt.py#L50, which writes an image feature for every sentence combination into a TFRecord.

All the modules for data loading and TFRecord generation are in the data directory. They aren't organized by model (e.g., stt, stt-att, scan). However, all the components used in the experiments of the paper can be found, scattered, in the data directory.
