Preprocessing images #5
Comments
@arunikayadav42 Check out the extract_image_features.py script in the STT repo. It has the necessary calls for preprocessing input images. The backbone-specific preprocessor implementations are in the preprocessing directory.
@peri044 I have a question about generating the paraphrases. In the README you mention that train_enc.txt and train_dec.txt are created from the captions_train2014.json file. How are those captions then mapped to the corresponding images in the train.npy features from the SCAN repository?
@arunikayadav42 I don't remember the exact data structure of the SCAN data, as it has been a while. The way I create paraphrases (train_enc.txt and train_dec.txt) is here. The gist is that each image (with an image ID) has 5 captions, which gives 20 combinations of sentences tied to that image ID. Using the same image ID, you can extract the SCAN features (downloaded from their repository) for the corresponding image and tie them to the caption combinations.
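For readers following along, here is a minimal sketch of the pairing step described above. It assumes the 20 combinations are the ordered pairs of 5 distinct captions (5 × 4 = 20), with the first caption of each pair going to the encoder file and the second to the decoder file. The function name `caption_pairs` and the toy captions are hypothetical, and the linked script may do this differently.

```python
from itertools import permutations


def caption_pairs(captions_by_image):
    """Yield (image_id, source_caption, target_caption) for every ordered pair.

    Assumption: each image ID maps to a list of distinct captions, and a
    "combination" is an ordered pair of two different captions.
    """
    for image_id, captions in captions_by_image.items():
        for src, tgt in permutations(captions, 2):
            yield image_id, src, tgt


# Toy data: one image ID with five placeholder captions.
example = {"coco_train_1": ["a", "b", "c", "d", "e"]}
pairs = list(caption_pairs(example))
print(len(pairs))
```

Keeping the image ID next to each pair is what later allows the same image feature to be looked up for all of that image's caption pairs.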
@peri044 So my only question is this: when we store the 20 combinations in the TFRecord files, each combination needs the corresponding image feature stored alongside it, doesn't it? For instance, if the image ID is coco_train_1, then the SCAN feature for this image ID is paired with each of the 20 caption combinations for this image, right? So at this line, https://github.com/peri044/STT/blob/master/data/coco_data_loader.py#L105, should it not be (img_idx * 20, img_idx * 20 + 20) instead of (img_idx * 5, img_idx * 5 + 5)?
Yes. The image feature (for the image id) is replicated for each of the 20 combinations of the captions. Probably, the data loader script you linked is not the one I used during my experiments. Currently the data loader scripts are all over the place in |
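To make the indexing discussed above concrete, here is a small sketch, assuming one feature row per image and 20 caption pairs per image stored contiguously in image order. The array shape, `PAIRS_PER_IMAGE`, and `pair_index_range` are illustrative names and toy values, not taken from the STT or SCAN code.

```python
import numpy as np

PAIRS_PER_IMAGE = 20  # assumption: 5 captions -> 20 ordered pairs per image


def pair_index_range(img_idx, pairs_per_image=PAIRS_PER_IMAGE):
    """Return the half-open range of pair indices belonging to one image."""
    start = img_idx * pairs_per_image
    return start, start + pairs_per_image


# Toy features: 3 images with 4-dimensional vectors (real features are larger).
features = np.arange(3 * 4, dtype=np.float32).reshape(3, 4)

# Replicate each image's feature once per caption pair, preserving image order.
replicated = np.repeat(features, PAIRS_PER_IMAGE, axis=0)

start, end = pair_index_range(1)
print(replicated.shape, start, end)
```

With this layout, rows `start:end` of `replicated` all hold the feature of image `img_idx`, which corresponds to the `(img_idx * 20, img_idx * 20 + 20)` range from the question.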
Hi @peri044, I want to train the STT network on my own data, which means preprocessing the images. Could you please point me to the Python script that does this?