Training Setting #5

Open
z-jiaming opened this issue Aug 8, 2023 · 1 comment

@z-jiaming

Thanks for your work!

I want to ask about the training setting. Most previous works train on YouTube-VOS and DAVIS in the main training stage, after image pre-training.

But your paper says "All the models are initialized from their best DAVIS’17 checkpoint (usually pre-trained on large-scale image and/or video collection) and fine-tuned on the training set of VOST, unless stated otherwise." and the DAVIS dataset does not appear among the training datasets in Fig. 5.

Did you remove the DAVIS dataset because you tested that results are better without it, or just because you wanted to use the DAVIS validation set ("initialized from their best DAVIS’17 checkpoint")?

Thanks a lot for your answer!

@raghavgoyal14

raghavgoyal14 commented Oct 19, 2023

I'm also interested in this question. Going by the camera-ready version of the paper:

  • The best reported result on VOST using AOT in Table 2 is $J_{tr}$ 36.4 (presumably using Img + DAVIS)
  • But the pre-trained model provided in the repo is "pre-trained on static images and YouTubeVOS" (link: https://github.com/TRI-ML/VOST/tree/main/aot_plus)
  • And lastly, in Figure 5 (attached), the DAVIS dataset isn't mentioned at all.
    • The interesting part is that the model pre-trained on Img (static) only achieves better performance (close to $J_{tr}$ 38) than the reported $J_{tr}$ 36.4

[Attached image: Figure 5 from the paper]

cc @davidhalladay
