Can all the released Visual Tokenizer weights work with the same Infinity-2B model? #15

Closed
EternalEvan opened this issue Dec 26, 2024 · 4 comments

Comments


EternalEvan commented Dec 26, 2024

Thanks for your excellent work! I noticed that you have released several Visual Tokenizer weights with different codebook sizes. Do all of these tokenizers work well with the Infinity-2B weights? I have tried the recommended infinity_vae_d32reg.pth and it performs well. Thanks!

JeyesHan (Collaborator) commented Dec 27, 2024

@EternalEvan Thanks for your appreciation of Infinity. The Infinity-2B weights were trained with infinity_vae_d32reg.pth and therefore only work with that tokenizer; using other VAE weights will generate bad images. If you want to try other VAE weights, you can fine-tune Infinity-2B with them, and the results will improve very quickly.

@EternalEvan (Author)

OK, I will try fine-tuning Infinity-2B with them. Thanks!

@RealAntonVoronov

Can you explain what _reg means in VAE_d32? I see in your table of metrics (and confirmed it by comparing reconstructions from both VAEs) that d32 without _reg reconstructs better. What is the reason behind choosing d32_reg for the final model?

@JeyesHan (Collaborator)

@RealAntonVoronov We experimentally found that as the vocabulary size increases, the VAE relies more heavily on the last few scales. In the model with '_reg', we added some regularization (more specifically, a reconstruction loss applied to the earlier scales as well). The '_reg' model shows a slight decrease in reconstruction metrics compared to the one without regularization. However, it reduces the dependence on the last few scales, which is beneficial for generation.
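
For intuition, here is a minimal PyTorch sketch of what such a regularization could look like. This is an assumption based on the description above, not the repository's actual code: the `decode_up_to_scale` helper, the downsampled targets, and the `lambda_reg` weight are all hypothetical.

```python
import torch.nn.functional as F

def multiscale_recon_loss(vae, x, num_scales, lambda_reg=0.1):
    """Hypothetical sketch of the '_reg' objective described above.

    Standard training only penalizes the final (full-resolution)
    reconstruction; the '_reg' variant also penalizes reconstructions
    built from only the first k scales, discouraging the VAE from
    relying on the last few scales.
    """
    # Usual loss term: reconstruction from all scales.
    recon_full = vae.decode_up_to_scale(x, num_scales)  # assumed helper
    loss = F.mse_loss(recon_full, x)

    # Regularization: also reconstruct from truncated scale prefixes,
    # comparing against a correspondingly downsampled target image.
    for k in range(1, num_scales):
        recon_k = vae.decode_up_to_scale(x, k)          # assumed helper
        target_k = F.interpolate(x, size=recon_k.shape[-2:], mode='area')
        loss = loss + lambda_reg * F.mse_loss(recon_k, target_k)
    return loss
```

The trade-off this sketch illustrates matches the comment above: the extra loss terms slightly hurt the full-resolution reconstruction metrics, but they spread information more evenly across scales, which helps generation.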
