
Could you share the specific config for training INT8 version of GPT-2? #2

Open
zenghao-zh opened this issue Aug 26, 2024 · 1 comment

Comments

@zenghao-zh

Thank you for your excellent work! I've been trying to reproduce your results using GPT-2-base, following the methods outlined in your paper. However, during training I encountered NaN loss values, and I also noticed that the INT8 model converged more slowly than the float model. Could you please share the specific training configuration you used for the INT8 version of GPT-2? Your assistance would be greatly appreciated.

Thank you in advance!


@lightb0x

I was able to get non-NaN results when pretraining GPT-2 124M, but with much higher loss than the authors reported.
Specifically, I got ~5.17 with INT8 Jetfire vs. ~2.84 with the nanoGPT bf16 baseline.
I used the same learning rate of 6e-4 that nanoGPT provides for GPT-2 124M.
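
For context, the nanoGPT settings I'm referring to look roughly like the sketch below. These values are my reading of nanoGPT's `config/train_gpt2.py` plus the `train.py` defaults, not the Jetfire authors' official INT8 config, so treat them as an approximation:

```python
# Rough sketch of the GPT-2 124M pretraining hyperparameters I used,
# following nanoGPT's config/train_gpt2.py and train.py defaults.
# This is an assumption about the baseline setup, not the authors' INT8 config.

# effective batch size ~0.5M tokens:
# 12 micro-batch * 1024 block size * (5 * 8) grad-accum steps
batch_size = 12
block_size = 1024
gradient_accumulation_steps = 5 * 8   # assumes 8 GPUs

# optimizer / schedule
learning_rate = 6e-4      # the LR mentioned above
max_iters = 600000
lr_decay_iters = 600000
min_lr = 6e-5
warmup_iters = 2000
weight_decay = 1e-1
grad_clip = 1.0           # gradient clipping; possibly relevant to the NaN issue
```

If the INT8 run needs a different warmup, gradient clip, or learning rate than this bf16 baseline, that would be good to know.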
