LanguageCrossEntropy logs NaN when running bash pruning.sh #34
Comments
It's weird to me why it happens. Have you tried the original setup with 7b domains? Does it cause problems? Meanwhile, I will try out the two-domain setup once I get some compute ready.
Hi @xiamengzhou, the NaN happens in the first batch when calculating [...]. The environment I use is the same as yours, except that [...].
Could you try the processed data I have here: https://drive.google.com/drive/folders/1WPIRx2NGkNBDswqZZh-hwI1h-QiKVCuN
Hi @xiamengzhou! Thanks for your reply.
Hi @xiamengzhou! However, during normal training (updating L_prune), the NaN still happens for the same reason (missing data from some sub-datasets), but L_prune can still be updated.
Hi! It's normal to get NaN for some batches when the sampled batch does not contain data for a specific domain, usually because the sampling ratio for that domain is low.
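The mechanism behind this can be sketched in a few lines: a per-domain metric averages token losses over the tokens belonging to that domain, so when a batch happens to contain zero tokens for a domain, the average degenerates and the logged value is NaN. This is a minimal illustrative sketch (the function name and numbers are made up for this example, not taken from the repo's implementation):

```python
import math

def per_domain_loss(token_losses, token_domains, domain):
    """Average cross-entropy over the tokens belonging to one domain.

    Mirrors how a per-domain metric averages: sum of losses for the
    domain divided by the token count. When the sampled batch has no
    tokens for the domain, the average is undefined and reported as
    NaN -- a logging artifact, not a training failure.
    """
    losses = [l for l, d in zip(token_losses, token_domains) if d == domain]
    if not losses:
        return float("nan")  # no tokens for this domain in the batch
    return sum(losses) / len(losses)

# A toy batch sampled with a low "github" ratio: no github tokens drawn.
token_losses = [2.1, 1.8, 2.4, 2.0]
token_domains = ["book", "book", "book", "book"]

print(per_domain_loss(token_losses, token_domains, "book"))
print(math.isnan(per_domain_loss(token_losses, token_domains, "github")))  # True
```

Since only the per-domain average is NaN, the total batch loss over the tokens that are present stays finite, which is why L_prune can still be updated.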
I have an issue. I used the two datasets you provided: [book, github].
My mds_sample_redpajama looks like this: [...]
And I fixed pruning.sh: [...]
Then, when I train the model, this still happens: [...]