Thank you for your excellent work! I've been trying to reproduce your results using GPT-2-base, following the methods outlined in your paper. However, during training, I encountered NaN loss values, and I also noticed that the INT8 model converged more slowly compared to the float model. Could you please share the specific training configuration you used for the INT8 version of GPT-2? Your assistance would be greatly appreciated.
Thank you in advance!
I was able to get non-NaN results when pretraining GPT-2 124M, but with much higher loss than the authors reported.
Specifically, I got ~5.17 with INT8 Jetfire vs. ~2.84 with bf16 nanoGPT.
I used the same learning rate of 6e-4 that nanoGPT provides for GPT-2 124M.
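For reference, the schedule around that 6e-4 learning rate followed nanoGPT's `config/train_gpt2.py` defaults. The sketch below shows the warmup + cosine-decay schedule; everything except the 6e-4 peak LR (warmup length, decay horizon, minimum LR) is a nanoGPT default and may not match whatever the authors used for their INT8 run.

```python
import math

# nanoGPT-style GPT-2 124M schedule. Only the 6e-4 peak LR is the value
# discussed above; the other constants are nanoGPT defaults, not settings
# confirmed for the Jetfire INT8 configuration.
learning_rate = 6e-4      # peak learning rate
warmup_iters = 2000       # linear warmup steps
lr_decay_iters = 600000   # cosine decay horizon (equals max_iters)
min_lr = 6e-5             # floor, roughly learning_rate / 10

def get_lr(it: int) -> float:
    """Linear warmup followed by cosine decay down to min_lr."""
    if it < warmup_iters:                       # 1) linear warmup
        return learning_rate * it / warmup_iters
    if it > lr_decay_iters:                     # 2) past the decay horizon
        return min_lr
    # 3) cosine decay from learning_rate down to min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr + coeff * (learning_rate - min_lr)

print(get_lr(0), get_lr(2000), get_lr(600000))  # 0.0, 6e-4, 6e-5
```

If the INT8 run needs a lower peak LR or a longer warmup to stay stable, that would be useful to know, since I reused the bf16 schedule as-is.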