Preprocessing images #5

Open
arunikayadav42 opened this issue May 20, 2020 · 5 comments

Comments

@arunikayadav42
Hi @peri044, I want to train the STT network on my own data and need to preprocess the images. Can you please point me to the Python script that handles this?

@peri044
Owner

peri044 commented May 20, 2020

@arunikayadav42 Check out the extract_image_features.py script in the STT repo. It has the necessary calls for preprocessing input images. The backbone-specific preprocessor implementations are in the preprocessing directory.
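The exact transform depends on which backbone you use, but as a rough illustration, here is a minimal sketch of Inception-style pixel scaling (hypothetical example code, not the repo's actual preprocessor):

```python
import numpy as np

def preprocess_inception_style(image):
    """Scale uint8 pixels from [0, 255] into [-1, 1], the convention
    used by Inception-family backbones. Resizing (e.g. to 299x299)
    would normally happen before this step."""
    return image.astype(np.float32) / 127.5 - 1.0

img = np.zeros((299, 299, 3), dtype=np.uint8)
out = preprocess_inception_style(img)
# an all-zero input maps to -1.0 everywhere
```

Other backbones (e.g. VGG/ResNet-style) instead subtract per-channel means, so check the corresponding file in the preprocessing directory for your backbone.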

@arunikayadav42
Author

@peri044 I had a question about generating the paraphrases. In the README you mention creating train_enc.txt and train_dec.txt from the captions_train2014.json file. How are those captions then mapped to the corresponding image in the train.npy features from the SCAN repository?

@peri044
Owner

peri044 commented Jun 11, 2020

@arunikayadav42 I don't remember the exact data structure details of the SCAN data, as it has been a while. The way I create the paraphrases (train_enc.txt and train_dec.txt) is here. The gist of the process: each image (with an image ID) has 5 captions, giving 20 combinations of sentences tied to that image ID. Using the same image ID, you can extract the SCAN features (downloaded from their repository) for the corresponding image and tie them to the caption combinations.
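As a sketch of that pairing, assuming the 20 combinations are the ordered pairs of the 5 captions (function and variable names here are illustrative, not from the repo):

```python
from itertools import permutations

def make_paraphrase_pairs(captions):
    """All ordered (encoder, decoder) caption pairs for one image.
    With 5 captions per image this yields 5 * 4 = 20 pairs."""
    return list(permutations(captions, 2))

captions = ["cap_a", "cap_b", "cap_c", "cap_d", "cap_e"]
pairs = make_paraphrase_pairs(captions)
# 20 ordered pairs, all tied to the same image ID
```

Each pair would then be written to train_enc.txt / train_dec.txt respectively, keyed by the image ID so the SCAN feature for that image can be looked up later.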

@arunikayadav42
Author

arunikayadav42 commented Jun 11, 2020

@peri044 So my only question is: when we have the 20 combinations and store them in the TFRecord files, each combination needs the corresponding image feature, and all of them get stored in the TFRecord file, right?

For instance, if the image ID is coco_train_1, then the feature from the SCAN data for this image ID will be paired with each of the 20 caption combinations for this image, right?

So at this line https://github.com/peri044/STT/blob/master/data/coco_data_loader.py#L105, should it not be (img_idx * 20, img_idx * 20 + 20) instead of (img_idx * 5, img_idx * 5 + 5)?

@peri044
Owner

peri044 commented Jun 11, 2020

Yes. The image feature (for the image id) is replicated for each of the 20 combinations of the captions.
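A minimal sketch of that replication and the img_idx-based slicing (shapes and names are illustrative, assuming one feature vector per image and 20 caption pairs per image):

```python
import numpy as np

num_images = 3
pairs_per_image = 20   # 5 captions -> 5 * 4 ordered pairs
feat_dim = 4

# One feature vector per image (e.g. loaded from SCAN's train.npy)
image_feats = np.arange(num_images * feat_dim, dtype=np.float32)
image_feats = image_feats.reshape(num_images, feat_dim)

# Replicate each image feature once per caption pair before writing records
replicated = np.repeat(image_feats, pairs_per_image, axis=0)  # (60, 4)

# Slice all records belonging to image img_idx
img_idx = 1
block = replicated[img_idx * pairs_per_image:
                   img_idx * pairs_per_image + pairs_per_image]
```

Every row in `block` is the same vector, image_feats[img_idx], matching the one-feature-per-sentence-combination layout described above.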

The data loader script you linked is probably not the one I used during my experiments. Currently the data loader scripts are scattered across the data folder, and I don't remember the exact ones I used due to quick experimentation. You can refer to https://github.com/peri044/STT/blob/master/data/coco_extras/coco_feat_stt.py#L50, which writes an image feature for every sentence combination into a TFRecord.
All the modules for data loading/TFRecord generation are in the data directory. They aren't organized per model (e.g. stt, stt-att, scan), but all the components used in the paper's experiments can be found (scattered) there.
