Loss goes nan after 14 epochs #15

Open

DianCh opened this issue Feb 7, 2022 · 1 comment

DianCh commented Feb 7, 2022

Hi, thank you for releasing such a wonderful piece of work. I tried to replicate the results using the following command:

python -m torch.distributed.launch --nproc_per_node 8 main_simmim.py --cfg configs/swin_base__100ep/simmim_pretrain__swin_base__img192_window6__100ep.yaml --data-path /mnt/fsx/datasets/imagenet/train --accumulation-steps 2

which gave me nan loss after 14 epochs:

[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 22): INFO >>>>>>>>>> Build Optimizer for Pre-training Stage
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 27): INFO No weight decay: {'encoder.mask_token', 'encoder.absolute_pos_embed'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 30): INFO No weight decay keywords: {'encoder.relative_position_bias_table'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 63): INFO No decay params: ['encoder.mask_token', 'encoder.patch_embed.proj.bias', 'encoder.patch_embed.norm.weight', 'encoder.patch_embed.norm.bias', 'encoder.layers.0.blocks.0.norm1.weight', 'encoder.layers.0.blocks.0.norm1.bias', 'encoder.layers.0.blocks.0.attn.qkv.bias', 'encoder.layers.0.blocks.0.attn.proj.bias', 'encoder.layers.0.blocks.0.norm2.weight', 'encoder.layers.0.blocks.0.norm2.bias', 'encoder.layers.0.blocks.0.mlp.fc1.bias', 'encoder.layers.0.blocks.0.mlp.fc2.bias', 'encoder.layers.0.blocks.1.norm1.weight', 'encoder.layers.0.blocks.1.norm1.bias', 'encoder.layers.0.blocks.1.attn.qkv.bias', 'encoder.layers.0.blocks.1.attn.proj.bias', 'encoder.layers.0.blocks.1.norm2.weight', 'encoder.layers.0.blocks.1.norm2.bias', 'encoder.layers.0.blocks.1.mlp.fc1.bias', 'encoder.layers.0.blocks.1.mlp.fc2.bias', 'encoder.layers.0.downsample.norm.weight', 'encoder.layers.0.downsample.norm.bias', 'encoder.layers.1.blocks.0.norm1.weight', 'encoder.layers.1.blocks.0.norm1.bias', 'encoder.layers.1.blocks.0.attn.qkv.bias', 'encoder.layers.1.blocks.0.attn.proj.bias', 'encoder.layers.1.blocks.0.norm2.weight', 'encoder.layers.1.blocks.0.norm2.bias', 'encoder.layers.1.blocks.0.mlp.fc1.bias', 'encoder.layers.1.blocks.0.mlp.fc2.bias', 'encoder.layers.1.blocks.1.norm1.weight', 'encoder.layers.1.blocks.1.norm1.bias', 'encoder.layers.1.blocks.1.attn.qkv.bias', 'encoder.layers.1.blocks.1.attn.proj.bias', 'encoder.layers.1.blocks.1.norm2.weight', 'encoder.layers.1.blocks.1.norm2.bias', 'encoder.layers.1.blocks.1.mlp.fc1.bias', 'encoder.layers.1.blocks.1.mlp.fc2.bias', 'encoder.layers.1.downsample.norm.weight', 'encoder.layers.1.downsample.norm.bias', 'encoder.layers.2.blocks.0.norm1.weight', 'encoder.layers.2.blocks.0.norm1.bias', 'encoder.layers.2.blocks.0.attn.qkv.bias', 'encoder.layers.2.blocks.0.attn.proj.bias', 'encoder.layers.2.blocks.0.norm2.weight', 'encoder.layers.2.blocks.0.norm2.bias', 'encoder.layers.2.blocks.0.mlp.fc1.bias', 'encoder.layers.2.blocks.0.mlp.fc2.bias', 'encoder.layers.2.blocks.1.norm1.weight', 'encoder.layers.2.blocks.1.norm1.bias', 'encoder.layers.2.blocks.1.attn.qkv.bias', 'encoder.layers.2.blocks.1.attn.proj.bias', 'encoder.layers.2.blocks.1.norm2.weight', 'encoder.layers.2.blocks.1.norm2.bias', 'encoder.layers.2.blocks.1.mlp.fc1.bias', 'encoder.layers.2.blocks.1.mlp.fc2.bias', 'encoder.layers.2.blocks.2.norm1.weight', 'encoder.layers.2.blocks.2.norm1.bias', 'encoder.layers.2.blocks.2.attn.qkv.bias', 'encoder.layers.2.blocks.2.attn.proj.bias', 'encoder.layers.2.blocks.2.norm2.weight', 'encoder.layers.2.blocks.2.norm2.bias', 'encoder.layers.2.blocks.2.mlp.fc1.bias', 'encoder.layers.2.blocks.2.mlp.fc2.bias', 'encoder.layers.2.blocks.3.norm1.weight', 'encoder.layers.2.blocks.3.norm1.bias', 'encoder.layers.2.blocks.3.attn.qkv.bias', 'encoder.layers.2.blocks.3.attn.proj.bias', 'encoder.layers.2.blocks.3.norm2.weight', 'encoder.layers.2.blocks.3.norm2.bias', 'encoder.layers.2.blocks.3.mlp.fc1.bias', 'encoder.layers.2.blocks.3.mlp.fc2.bias', 'encoder.layers.2.blocks.4.norm1.weight', 'encoder.layers.2.blocks.4.norm1.bias', 'encoder.layers.2.blocks.4.attn.qkv.bias', 'encoder.layers.2.blocks.4.attn.proj.bias', 'encoder.layers.2.blocks.4.norm2.weight', 'encoder.layers.2.blocks.4.norm2.bias', 'encoder.layers.2.blocks.4.mlp.fc1.bias', 'encoder.layers.2.blocks.4.mlp.fc2.bias', 'encoder.layers.2.blocks.5.norm1.weight', 'encoder.layers.2.blocks.5.norm1.bias', 'encoder.layers.2.blocks.5.attn.qkv.bias', 'encoder.layers.2.blocks.5.attn.proj.bias', 
'encoder.layers.2.blocks.5.norm2.weight', 'encoder.layers.2.blocks.5.norm2.bias', 'encoder.layers.2.blocks.5.mlp.fc1.bias', 'encoder.layers.2.blocks.5.mlp.fc2.bias', 'encoder.layers.2.blocks.6.norm1.weight', 'encoder.layers.2.blocks.6.norm1.bias', 'encoder.layers.2.blocks.6.attn.qkv.bias', 'encoder.layers.2.blocks.6.attn.proj.bias', 'encoder.layers.2.blocks.6.norm2.weight', 'encoder.layers.2.blocks.6.norm2.bias', 'encoder.layers.2.blocks.6.mlp.fc1.bias', 'encoder.layers.2.blocks.6.mlp.fc2.bias', 'encoder.layers.2.blocks.7.norm1.weight', 'encoder.layers.2.blocks.7.norm1.bias', 'encoder.layers.2.blocks.7.attn.qkv.bias', 'encoder.layers.2.blocks.7.attn.proj.bias', 'encoder.layers.2.blocks.7.norm2.weight', 'encoder.layers.2.blocks.7.norm2.bias', 'encoder.layers.2.blocks.7.mlp.fc1.bias', 'encoder.layers.2.blocks.7.mlp.fc2.bias', 'encoder.layers.2.blocks.8.norm1.weight', 'encoder.layers.2.blocks.8.norm1.bias', 'encoder.layers.2.blocks.8.attn.qkv.bias', 'encoder.layers.2.blocks.8.attn.proj.bias', 'encoder.layers.2.blocks.8.norm2.weight', 'encoder.layers.2.blocks.8.norm2.bias', 'encoder.layers.2.blocks.8.mlp.fc1.bias', 'encoder.layers.2.blocks.8.mlp.fc2.bias', 'encoder.layers.2.blocks.9.norm1.weight', 'encoder.layers.2.blocks.9.norm1.bias', 'encoder.layers.2.blocks.9.attn.qkv.bias', 'encoder.layers.2.blocks.9.attn.proj.bias', 'encoder.layers.2.blocks.9.norm2.weight', 'encoder.layers.2.blocks.9.norm2.bias', 'encoder.layers.2.blocks.9.mlp.fc1.bias', 'encoder.layers.2.blocks.9.mlp.fc2.bias', 'encoder.layers.2.blocks.10.norm1.weight', 'encoder.layers.2.blocks.10.norm1.bias', 'encoder.layers.2.blocks.10.attn.qkv.bias', 'encoder.layers.2.blocks.10.attn.proj.bias', 'encoder.layers.2.blocks.10.norm2.weight', 'encoder.layers.2.blocks.10.norm2.bias', 'encoder.layers.2.blocks.10.mlp.fc1.bias', 'encoder.layers.2.blocks.10.mlp.fc2.bias', 'encoder.layers.2.blocks.11.norm1.weight', 'encoder.layers.2.blocks.11.norm1.bias', 'encoder.layers.2.blocks.11.attn.qkv.bias', 'encoder.layers.2.blocks.11.attn.proj.bias', 'encoder.layers.2.blocks.11.norm2.weight', 'encoder.layers.2.blocks.11.norm2.bias', 'encoder.layers.2.blocks.11.mlp.fc1.bias', 'encoder.layers.2.blocks.11.mlp.fc2.bias', 'encoder.layers.2.blocks.12.norm1.weight', 'encoder.layers.2.blocks.12.norm1.bias', 'encoder.layers.2.blocks.12.attn.qkv.bias', 'encoder.layers.2.blocks.12.attn.proj.bias', 'encoder.layers.2.blocks.12.norm2.weight', 'encoder.layers.2.blocks.12.norm2.bias', 'encoder.layers.2.blocks.12.mlp.fc1.bias', 'encoder.layers.2.blocks.12.mlp.fc2.bias', 'encoder.layers.2.blocks.13.norm1.weight', 'encoder.layers.2.blocks.13.norm1.bias', 'encoder.layers.2.blocks.13.attn.qkv.bias', 'encoder.layers.2.blocks.13.attn.proj.bias', 'encoder.layers.2.blocks.13.norm2.weight', 'encoder.layers.2.blocks.13.norm2.bias', 'encoder.layers.2.blocks.13.mlp.fc1.bias', 'encoder.layers.2.blocks.13.mlp.fc2.bias', 'encoder.layers.2.blocks.14.norm1.weight', 'encoder.layers.2.blocks.14.norm1.bias', 'encoder.layers.2.blocks.14.attn.qkv.bias', 'encoder.layers.2.blocks.14.attn.proj.bias', 'encoder.layers.2.blocks.14.norm2.weight', 'encoder.layers.2.blocks.14.norm2.bias', 'encoder.layers.2.blocks.14.mlp.fc1.bias', 'encoder.layers.2.blocks.14.mlp.fc2.bias', 'encoder.layers.2.blocks.15.norm1.weight', 'encoder.layers.2.blocks.15.norm1.bias', 'encoder.layers.2.blocks.15.attn.qkv.bias', 'encoder.layers.2.blocks.15.attn.proj.bias', 'encoder.layers.2.blocks.15.norm2.weight', 'encoder.layers.2.blocks.15.norm2.bias', 'encoder.layers.2.blocks.15.mlp.fc1.bias', 
'encoder.layers.2.blocks.15.mlp.fc2.bias', 'encoder.layers.2.blocks.16.norm1.weight', 'encoder.layers.2.blocks.16.norm1.bias', 'encoder.layers.2.blocks.16.attn.qkv.bias', 'encoder.layers.2.blocks.16.attn.proj.bias', 'encoder.layers.2.blocks.16.norm2.weight', 'encoder.layers.2.blocks.16.norm2.bias', 'encoder.layers.2.blocks.16.mlp.fc1.bias', 'encoder.layers.2.blocks.16.mlp.fc2.bias', 'encoder.layers.2.blocks.17.norm1.weight', 'encoder.layers.2.blocks.17.norm1.bias', 'encoder.layers.2.blocks.17.attn.qkv.bias', 'encoder.layers.2.blocks.17.attn.proj.bias', 'encoder.layers.2.blocks.17.norm2.weight', 'encoder.layers.2.blocks.17.norm2.bias', 'encoder.layers.2.blocks.17.mlp.fc1.bias', 'encoder.layers.2.blocks.17.mlp.fc2.bias', 'encoder.layers.2.downsample.norm.weight', 'encoder.layers.2.downsample.norm.bias', 'encoder.layers.3.blocks.0.norm1.weight', 'encoder.layers.3.blocks.0.norm1.bias', 'encoder.layers.3.blocks.0.attn.qkv.bias', 'encoder.layers.3.blocks.0.attn.proj.bias', 'encoder.layers.3.blocks.0.norm2.weight', 'encoder.layers.3.blocks.0.norm2.bias', 'encoder.layers.3.blocks.0.mlp.fc1.bias', 'encoder.layers.3.blocks.0.mlp.fc2.bias', 'encoder.layers.3.blocks.1.norm1.weight', 'encoder.layers.3.blocks.1.norm1.bias', 'encoder.layers.3.blocks.1.attn.qkv.bias', 'encoder.layers.3.blocks.1.attn.proj.bias', 'encoder.layers.3.blocks.1.norm2.weight', 'encoder.layers.3.blocks.1.norm2.bias', 'encoder.layers.3.blocks.1.mlp.fc1.bias', 'encoder.layers.3.blocks.1.mlp.fc2.bias', 'encoder.norm.weight', 'encoder.norm.bias', 'decoder.0.bias']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 64): INFO Has decay params: ['encoder.patch_embed.proj.weight', 'encoder.layers.0.blocks.0.attn.relative_position_bias_table', 'encoder.layers.0.blocks.0.attn.qkv.weight', 'encoder.layers.0.blocks.0.attn.proj.weight', 'encoder.layers.0.blocks.0.mlp.fc1.weight', 'encoder.layers.0.blocks.0.mlp.fc2.weight', 'encoder.layers.0.blocks.1.attn.relative_position_bias_table', 'encoder.layers.0.blocks.1.attn.qkv.weight', 'encoder.layers.0.blocks.1.attn.proj.weight', 'encoder.layers.0.blocks.1.mlp.fc1.weight', 'encoder.layers.0.blocks.1.mlp.fc2.weight', 'encoder.layers.0.downsample.reduction.weight', 'encoder.layers.1.blocks.0.attn.relative_position_bias_table', 'encoder.layers.1.blocks.0.attn.qkv.weight', 'encoder.layers.1.blocks.0.attn.proj.weight', 'encoder.layers.1.blocks.0.mlp.fc1.weight', 'encoder.layers.1.blocks.0.mlp.fc2.weight', 'encoder.layers.1.blocks.1.attn.relative_position_bias_table', 'encoder.layers.1.blocks.1.attn.qkv.weight', 'encoder.layers.1.blocks.1.attn.proj.weight', 'encoder.layers.1.blocks.1.mlp.fc1.weight', 'encoder.layers.1.blocks.1.mlp.fc2.weight', 'encoder.layers.1.downsample.reduction.weight', 'encoder.layers.2.blocks.0.attn.relative_position_bias_table', 'encoder.layers.2.blocks.0.attn.qkv.weight', 'encoder.layers.2.blocks.0.attn.proj.weight', 'encoder.layers.2.blocks.0.mlp.fc1.weight', 'encoder.layers.2.blocks.0.mlp.fc2.weight', 'encoder.layers.2.blocks.1.attn.relative_position_bias_table', 'encoder.layers.2.blocks.1.attn.qkv.weight', 'encoder.layers.2.blocks.1.attn.proj.weight', 'encoder.layers.2.blocks.1.mlp.fc1.weight', 'encoder.layers.2.blocks.1.mlp.fc2.weight', 'encoder.layers.2.blocks.2.attn.relative_position_bias_table', 'encoder.layers.2.blocks.2.attn.qkv.weight', 'encoder.layers.2.blocks.2.attn.proj.weight', 'encoder.layers.2.blocks.2.mlp.fc1.weight', 'encoder.layers.2.blocks.2.mlp.fc2.weight', 'encoder.layers.2.blocks.3.attn.relative_position_bias_table', 'encoder.layers.2.blocks.3.attn.qkv.weight', 'encoder.layers.2.blocks.3.attn.proj.weight', 'encoder.layers.2.blocks.3.mlp.fc1.weight', 'encoder.layers.2.blocks.3.mlp.fc2.weight', 'encoder.layers.2.blocks.4.attn.relative_position_bias_table', 'encoder.layers.2.blocks.4.attn.qkv.weight', 'encoder.layers.2.blocks.4.attn.proj.weight', 'encoder.layers.2.blocks.4.mlp.fc1.weight', 'encoder.layers.2.blocks.4.mlp.fc2.weight', 'encoder.layers.2.blocks.5.attn.relative_position_bias_table', 'encoder.layers.2.blocks.5.attn.qkv.weight', 'encoder.layers.2.blocks.5.attn.proj.weight', 'encoder.layers.2.blocks.5.mlp.fc1.weight', 'encoder.layers.2.blocks.5.mlp.fc2.weight', 'encoder.layers.2.blocks.6.attn.relative_position_bias_table', 'encoder.layers.2.blocks.6.attn.qkv.weight', 'encoder.layers.2.blocks.6.attn.proj.weight', 'encoder.layers.2.blocks.6.mlp.fc1.weight', 'encoder.layers.2.blocks.6.mlp.fc2.weight', 'encoder.layers.2.blocks.7.attn.relative_position_bias_table', 'encoder.layers.2.blocks.7.attn.qkv.weight', 'encoder.layers.2.blocks.7.attn.proj.weight', 'encoder.layers.2.blocks.7.mlp.fc1.weight', 'encoder.layers.2.blocks.7.mlp.fc2.weight', 'encoder.layers.2.blocks.8.attn.relative_position_bias_table', 'encoder.layers.2.blocks.8.attn.qkv.weight', 'encoder.layers.2.blocks.8.attn.proj.weight', 'encoder.layers.2.blocks.8.mlp.fc1.weight', 'encoder.layers.2.blocks.8.mlp.fc2.weight', 'encoder.layers.2.blocks.9.attn.relative_position_bias_table', 'encoder.layers.2.blocks.9.attn.qkv.weight', 'encoder.layers.2.blocks.9.attn.proj.weight', 
'encoder.layers.2.blocks.9.mlp.fc1.weight', 'encoder.layers.2.blocks.9.mlp.fc2.weight', 'encoder.layers.2.blocks.10.attn.relative_position_bias_table', 'encoder.layers.2.blocks.10.attn.qkv.weight', 'encoder.layers.2.blocks.10.attn.proj.weight', 'encoder.layers.2.blocks.10.mlp.fc1.weight', 'encoder.layers.2.blocks.10.mlp.fc2.weight', 'encoder.layers.2.blocks.11.attn.relative_position_bias_table', 'encoder.layers.2.blocks.11.attn.qkv.weight', 'encoder.layers.2.blocks.11.attn.proj.weight', 'encoder.layers.2.blocks.11.mlp.fc1.weight', 'encoder.layers.2.blocks.11.mlp.fc2.weight', 'encoder.layers.2.blocks.12.attn.relative_position_bias_table', 'encoder.layers.2.blocks.12.attn.qkv.weight', 'encoder.layers.2.blocks.12.attn.proj.weight', 'encoder.layers.2.blocks.12.mlp.fc1.weight', 'encoder.layers.2.blocks.12.mlp.fc2.weight', 'encoder.layers.2.blocks.13.attn.relative_position_bias_table', 'encoder.layers.2.blocks.13.attn.qkv.weight', 'encoder.layers.2.blocks.13.attn.proj.weight', 'encoder.layers.2.blocks.13.mlp.fc1.weight', 'encoder.layers.2.blocks.13.mlp.fc2.weight', 'encoder.layers.2.blocks.14.attn.relative_position_bias_table', 'encoder.layers.2.blocks.14.attn.qkv.weight', 'encoder.layers.2.blocks.14.attn.proj.weight', 'encoder.layers.2.blocks.14.mlp.fc1.weight', 'encoder.layers.2.blocks.14.mlp.fc2.weight', 'encoder.layers.2.blocks.15.attn.relative_position_bias_table', 'encoder.layers.2.blocks.15.attn.qkv.weight', 'encoder.layers.2.blocks.15.attn.proj.weight', 'encoder.layers.2.blocks.15.mlp.fc1.weight', 'encoder.layers.2.blocks.15.mlp.fc2.weight', 'encoder.layers.2.blocks.16.attn.relative_position_bias_table', 'encoder.layers.2.blocks.16.attn.qkv.weight', 'encoder.layers.2.blocks.16.attn.proj.weight', 'encoder.layers.2.blocks.16.mlp.fc1.weight', 'encoder.layers.2.blocks.16.mlp.fc2.weight', 'encoder.layers.2.blocks.17.attn.relative_position_bias_table', 'encoder.layers.2.blocks.17.attn.qkv.weight', 'encoder.layers.2.blocks.17.attn.proj.weight', 'encoder.layers.2.blocks.17.mlp.fc1.weight', 'encoder.layers.2.blocks.17.mlp.fc2.weight', 'encoder.layers.2.downsample.reduction.weight', 'encoder.layers.3.blocks.0.attn.relative_position_bias_table', 'encoder.layers.3.blocks.0.attn.qkv.weight', 'encoder.layers.3.blocks.0.attn.proj.weight', 'encoder.layers.3.blocks.0.mlp.fc1.weight', 'encoder.layers.3.blocks.0.mlp.fc2.weight', 'encoder.layers.3.blocks.1.attn.relative_position_bias_table', 'encoder.layers.3.blocks.1.attn.qkv.weight', 'encoder.layers.3.blocks.1.attn.proj.weight', 'encoder.layers.3.blocks.1.mlp.fc1.weight', 'encoder.layers.3.blocks.1.mlp.fc2.weight', 'decoder.0.weight']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 43): INFO AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.05

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.0
)
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 83): INFO number of params: 89874104
[2022-02-05 09:22:26 simmim_pretrain] (utils.py 81): INFO All checkpoints founded in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep: []
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 100): INFO no checkpoint found in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep, ignoring auto resume
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 105): INFO Start training
[2022-02-05 09:24:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][0/1251]	eta 1 day, 15:53:49 lr 0.000004	time 114.8121 (114.8121)	loss 0.5543 (0.5543)	grad_norm 0.2902 (0.2902)	mem 17192MB
[2022-02-05 09:45:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][100/1251]	eta 4:24:36 lr 0.000010	time 0.3949 (13.7934)	loss 0.4499 (0.4969)	grad_norm 1.0401 (0.2900)	mem 18238MB
[2022-02-05 10:06:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][200/1251]	eta 3:52:30 lr 0.000017	time 75.5072 (13.2732)	loss 0.3752 (0.4565)	grad_norm 2.8639 (1.6425)	mem 18238MB
[2022-02-05 10:28:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][300/1251]	eta 3:27:27 lr 0.000023	time 0.3941 (13.0894)	loss 0.3553 (0.4264)	grad_norm 2.0591 (2.8358)	mem 18238MB
[2022-02-05 10:48:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][400/1251]	eta 3:02:30 lr 0.000029	time 57.4084 (12.8679)	loss 0.3173 (0.4040)	grad_norm 1.1405 (3.6005)	mem 18238MB
[2022-02-05 11:08:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][500/1251]	eta 2:38:59 lr 0.000036	time 0.3942 (12.7019)	loss 0.3129 (0.3879)	grad_norm 4.7302 (4.0156)	mem 18238MB
[2022-02-05 11:29:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][600/1251]	eta 2:17:56 lr 0.000042	time 86.9880 (12.7132)	loss 0.3042 (0.3741)	grad_norm 2.4576 (4.0197)	mem 18238MB
[2022-02-05 11:49:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][700/1251]	eta 1:55:17 lr 0.000048	time 0.3943 (12.5542)	loss 0.2920 (0.3630)	grad_norm 4.6089 (4.0017)	mem 18239MB
[2022-02-05 12:09:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][800/1251]	eta 1:34:10 lr 0.000055	time 73.9639 (12.5290)	loss 0.2979 (0.3536)	grad_norm 3.4510 (3.9055)	mem 18239MB
[2022-02-05 12:29:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][900/1251]	eta 1:13:00 lr 0.000061	time 0.3981 (12.4787)	loss 0.2693 (0.3459)	grad_norm 1.5775 (3.8091)	mem 18239MB
[2022-02-05 12:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1000/1251]	eta 0:52:00 lr 0.000068	time 18.3918 (12.4334)	loss 0.2786 (0.3394)	grad_norm 1.2491 (3.7356)	mem 18239MB
[2022-02-05 13:10:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1100/1251]	eta 0:31:18 lr 0.000074	time 0.4033 (12.4426)	loss 0.2725 (0.3335)	grad_norm 2.2311 (3.6312)	mem 18239MB
[2022-02-05 13:30:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1200/1251]	eta 0:10:32 lr 0.000080	time 31.6500 (12.4020)	loss 0.2715 (0.3286)	grad_norm 1.2720 (3.5534)	mem 18239MB
[2022-02-05 13:39:44 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 0 training takes 4:17:18
[2022-02-05 13:39:44 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saving......
[2022-02-05 13:39:46 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saved !!!
[2022-02-05 13:39:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][0/1251]	eta 1:01:34 lr 0.000083	time 2.9530 (2.9530)	loss 0.2705 (0.2705)	grad_norm 0.8280 (0.8280)	mem 18239MB
[2022-02-05 13:41:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][100/1251]	eta 0:14:17 lr 0.000090	time 0.6114 (0.7453)	loss 0.2802 (0.2693)	grad_norm 3.6450 (2.3059)	mem 18239MB
[2022-02-05 13:42:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][200/1251]	eta 0:14:40 lr 0.000096	time 0.7879 (0.8375)	loss 0.2727 (0.2691)	grad_norm 2.2279 (2.2994)	mem 18239MB
[2022-02-05 13:44:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][300/1251]	eta 0:13:41 lr 0.000103	time 0.4401 (0.8638)	loss 0.2757 (0.2682)	grad_norm 1.1539 (2.2752)	mem 18239MB
[2022-02-05 13:45:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][400/1251]	eta 0:11:34 lr 0.000109	time 0.4306 (0.8162)	loss 0.2588 (0.2672)	grad_norm 1.2593 (2.2458)	mem 18239MB
[2022-02-05 13:46:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][500/1251]	eta 0:10:38 lr 0.000115	time 0.5900 (0.8503)	loss 0.2552 (0.2668)	grad_norm 1.4727 (2.2056)	mem 18240MB
[2022-02-05 13:47:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][600/1251]	eta 0:08:45 lr 0.000122	time 0.4254 (0.8066)	loss 0.2584 (0.2662)	grad_norm 1.1834 (2.1712)	mem 18240MB
[2022-02-05 13:48:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][700/1251]	eta 0:06:56 lr 0.000128	time 0.4058 (0.7558)	loss 0.2641 (0.2653)	grad_norm 1.1315 (2.1186)	mem 18240MB
[2022-02-05 13:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][800/1251]	eta 0:05:41 lr 0.000134	time 0.4352 (0.7570)	loss 0.2742 (0.2649)	grad_norm 0.7488 (2.0964)	mem 18240MB
[2022-02-05 13:51:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][900/1251]	eta 0:04:35 lr 0.000141	time 0.4130 (0.7842)	loss 0.2476 (0.2644)	grad_norm 0.6401 (2.0539)	mem 18240MB
[2022-02-05 13:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1000/1251]	eta 0:03:08 lr 0.000147	time 0.4153 (0.7508)	loss 0.2717 (0.2639)	grad_norm 2.2334 (2.0098)	mem 18240MB
[2022-02-05 13:53:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1100/1251]	eta 0:01:51 lr 0.000154	time 0.4521 (0.7393)	loss 0.2551 (0.2633)	grad_norm 1.4980 (1.9817)	mem 18240MB
[2022-02-05 13:55:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1200/1251]	eta 0:00:39 lr 0.000160	time 0.4667 (0.7788)	loss 0.2664 (0.2627)	grad_norm 0.7340 (1.9572)	mem 18240MB
[2022-02-05 13:56:06 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 1 training takes 0:16:20
[2022-02-05 13:56:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][0/1251]	eta 1:16:36 lr 0.000163	time 3.6739 (3.6739)	loss 0.2620 (0.2620)	grad_norm 0.9611 (0.9611)	mem 18240MB
[2022-02-05 13:56:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][100/1251]	eta 0:09:22 lr 0.000169	time 0.4276 (0.4883)	loss 0.2562 (0.2552)	grad_norm 0.5311 (1.6903)	mem 18240MB
[2022-02-05 13:58:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][200/1251]	eta 0:11:20 lr 0.000176	time 0.4207 (0.6473)	loss 0.2618 (0.2542)	grad_norm 0.6081 (1.6235)	mem 18240MB
[2022-02-05 13:59:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][300/1251]	eta 0:09:36 lr 0.000182	time 0.4451 (0.6061)	loss 0.2528 (0.2531)	grad_norm 0.4520 (1.6033)	mem 18240MB
[2022-02-05 14:00:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][400/1251]	eta 0:09:29 lr 0.000189	time 0.4445 (0.6689)	loss 0.2413 (0.2525)	grad_norm 0.6562 (1.5654)	mem 18240MB
[2022-02-05 14:01:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][500/1251]	eta 0:08:08 lr 0.000195	time 2.1151 (0.6503)	loss 0.2539 (0.2520)	grad_norm 1.8790 (1.5394)	mem 18240MB
[2022-02-05 14:03:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][600/1251]	eta 0:07:55 lr 0.000201	time 0.5791 (0.7308)	loss 0.2295 (0.2516)	grad_norm 1.3565 (1.5373)	mem 18240MB
[2022-02-05 14:04:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][700/1251]	eta 0:06:53 lr 0.000208	time 0.4240 (0.7508)	loss 0.2464 (0.2511)	grad_norm 0.5189 (1.5236)	mem 18240MB
[2022-02-05 14:06:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][800/1251]	eta 0:05:48 lr 0.000214	time 2.2608 (0.7731)	loss 0.2481 (0.2507)	grad_norm 0.4695 (1.4909)	mem 18240MB
[2022-02-05 14:08:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][900/1251]	eta 0:04:40 lr 0.000220	time 0.4068 (0.7993)	loss 0.2637 (0.2503)	grad_norm 1.5514 (1.4829)	mem 18240MB
[2022-02-05 14:09:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1000/1251]	eta 0:03:23 lr 0.000227	time 1.3535 (0.8115)	loss 0.2443 (0.2498)	grad_norm 0.9744 (1.4653)	mem 18240MB
[2022-02-05 14:11:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1100/1251]	eta 0:02:04 lr 0.000233	time 0.4410 (0.8271)	loss 0.2425 (0.2491)	grad_norm 1.9947 (1.4427)	mem 18240MB
[2022-02-05 14:12:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1200/1251]	eta 0:00:42 lr 0.000239	time 0.4275 (0.8318)	loss 0.2465 (0.2486)	grad_norm 0.6265 (1.4366)	mem 18240MB
[2022-02-05 14:13:21 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 2 training takes 0:17:15
[2022-02-05 14:13:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][0/1251]	eta 1:24:40 lr 0.000243	time 4.0614 (4.0614)	loss 0.2433 (0.2433)	grad_norm 0.7067 (0.7067)	mem 18240MB
[2022-02-05 14:14:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][100/1251]	eta 0:09:27 lr 0.000249	time 0.4092 (0.4935)	loss 0.2476 (0.2430)	grad_norm 1.1159 (1.3134)	mem 18240MB
[2022-02-05 14:15:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][200/1251]	eta 0:10:07 lr 0.000255	time 0.3964 (0.5784)	loss 0.2400 (0.2425)	grad_norm 0.3384 (1.2386)	mem 18240MB
[2022-02-05 14:16:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][300/1251]	eta 0:08:53 lr 0.000262	time 0.4966 (0.5605)	loss 0.2404 (0.2416)	grad_norm 0.3401 (1.1964)	mem 18240MB
[2022-02-05 14:17:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][400/1251]	eta 0:09:07 lr 0.000268	time 0.7116 (0.6430)	loss 0.2314 (0.2411)	grad_norm 1.4900 (1.2040)	mem 18240MB
[2022-02-05 14:19:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][500/1251]	eta 0:09:34 lr 0.000275	time 0.4066 (0.7646)	loss 0.2282 (0.2405)	grad_norm 0.5011 (1.2036)	mem 18240MB
[2022-02-05 14:21:30 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][600/1251]	eta 0:08:49 lr 0.000281	time 0.4160 (0.8126)	loss 0.2414 (0.2404)	grad_norm 0.9795 (1.1974)	mem 18240MB
[2022-02-05 14:23:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][700/1251]	eta 0:07:41 lr 0.000287	time 0.4862 (0.8377)	loss 0.2334 (0.2401)	grad_norm 0.4512 (1.1759)	mem 18240MB
[2022-02-05 14:24:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][800/1251]	eta 0:06:27 lr 0.000294	time 0.5067 (0.8583)	loss 0.2418 (0.2398)	grad_norm 1.2394 (1.1746)	mem 18240MB
[2022-02-05 14:26:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][900/1251]	eta 0:05:03 lr 0.000300	time 0.4366 (0.8635)	loss 0.2361 (0.2394)	grad_norm 0.5397 (1.1654)	mem 18240MB
[2022-02-05 14:27:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1000/1251]	eta 0:03:37 lr 0.000306	time 1.1073 (0.8658)	loss 0.2352 (0.2390)	grad_norm 0.6021 (1.1541)	mem 18241MB
[2022-02-05 14:29:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1100/1251]	eta 0:02:08 lr 0.000313	time 0.5436 (0.8526)	loss 0.2295 (0.2387)	grad_norm 1.1236 (1.1382)	mem 18241MB
[2022-02-05 14:30:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1200/1251]	eta 0:00:44 lr 0.000319	time 0.4830 (0.8667)	loss 0.2486 (0.2385)	grad_norm 0.4466 (1.1277)	mem 18241MB
[2022-02-05 14:31:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 3 training takes 0:18:11
[2022-02-05 14:31:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][0/1251]	eta 1:30:19 lr 0.000322	time 4.3320 (4.3320)	loss 0.2309 (0.2309)	grad_norm 0.3473 (0.3473)	mem 18241MB
[2022-02-05 14:32:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][100/1251]	eta 0:09:42 lr 0.000329	time 0.4199 (0.5059)	loss 0.2313 (0.2347)	grad_norm 0.9780 (0.9537)	mem 18241MB
[2022-02-05 14:33:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][200/1251]	eta 0:10:42 lr 0.000335	time 0.4042 (0.6115)	loss 0.2380 (0.2338)	grad_norm 0.4685 (0.9641)	mem 18241MB
[2022-02-05 14:35:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][300/1251]	eta 0:12:57 lr 0.000341	time 0.4448 (0.8171)	loss 0.2274 (0.2339)	grad_norm 0.5854 (0.9808)	mem 18241MB
[2022-02-05 14:37:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][400/1251]	eta 0:12:15 lr 0.000348	time 0.9284 (0.8642)	loss 0.2300 (0.2342)	grad_norm 0.5273 (0.9884)	mem 18241MB
[2022-02-05 14:38:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][500/1251]	eta 0:10:11 lr 0.000354	time 0.4123 (0.8136)	loss 0.2346 (0.2337)	grad_norm 0.7111 (0.9791)	mem 18241MB
[2022-02-05 14:39:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][600/1251]	eta 0:09:01 lr 0.000361	time 0.4479 (0.8323)	loss 0.2305 (0.2336)	grad_norm 0.7723 (0.9726)	mem 18241MB
[2022-02-05 14:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][700/1251]	eta 0:07:37 lr 0.000367	time 1.2561 (0.8298)	loss 0.2416 (0.2333)	grad_norm 0.7113 (0.9652)	mem 18241MB
[2022-02-05 14:43:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][800/1251]	eta 0:06:29 lr 0.000373	time 2.0054 (0.8637)	loss 0.2229 (0.2332)	grad_norm 0.3053 (0.9582)	mem 18241MB
[2022-02-05 14:44:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][900/1251]	eta 0:05:07 lr 0.000380	time 2.2077 (0.8764)	loss 0.2203 (0.2330)	grad_norm 0.9912 (0.9536)	mem 18241MB
[2022-02-05 14:46:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1000/1251]	eta 0:03:41 lr 0.000386	time 0.4317 (0.8842)	loss 0.2330 (0.2327)	grad_norm 0.4332 (0.9454)	mem 18241MB
[2022-02-05 14:47:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1100/1251]	eta 0:02:09 lr 0.000392	time 1.9930 (0.8594)	loss 0.2376 (0.2325)	grad_norm 0.3494 (0.9425)	mem 18241MB
[2022-02-05 14:49:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1200/1251]	eta 0:00:44 lr 0.000399	time 3.0251 (0.8816)	loss 0.2229 (0.2322)	grad_norm 0.3280 (0.9404)	mem 18241MB
[2022-02-05 14:49:56 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 4 training takes 0:18:23
[2022-02-05 14:50:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][0/1251]	eta 1:20:39 lr 0.000402	time 3.8685 (3.8685)	loss 0.2361 (0.2361)	grad_norm 0.2441 (0.2441)	mem 18241MB
[2022-02-05 14:50:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][100/1251]	eta 0:09:10 lr 0.000408	time 0.4087 (0.4786)	loss 0.2268 (0.2297)	grad_norm 0.4077 (0.9384)	mem 18241MB
[2022-02-05 14:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][200/1251]	eta 0:11:06 lr 0.000415	time 0.4741 (0.6344)	loss 0.2332 (0.2293)	grad_norm 0.6293 (0.8776)	mem 18241MB
[2022-02-05 14:53:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][300/1251]	eta 0:09:43 lr 0.000421	time 0.4483 (0.6141)	loss 0.4291 (0.2770)	grad_norm 0.1236 (nan)	mem 18241MB
[2022-02-05 14:54:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][400/1251]	eta 0:09:25 lr 0.000427	time 0.4158 (0.6646)	loss 0.4575 (0.3163)	grad_norm 0.7630 (nan)	mem 18241MB
[2022-02-05 14:55:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][500/1251]	eta 0:07:59 lr 0.000434	time 0.4047 (0.6385)	loss 0.4399 (0.3400)	grad_norm 1.2759 (nan)	mem 18241MB
[2022-02-05 14:57:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][600/1251]	eta 0:07:54 lr 0.000440	time 0.3981 (0.7282)	loss 0.5130 (0.3663)	grad_norm 0.0916 (nan)	mem 18241MB
[2022-02-05 14:58:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][700/1251]	eta 0:06:42 lr 0.000446	time 0.4456 (0.7310)	loss 0.4537 (0.3812)	grad_norm 0.5641 (nan)	mem 18241MB
[2022-02-05 14:59:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][800/1251]	eta 0:05:14 lr 0.000453	time 0.4507 (0.6973)	loss 0.5098 (0.3919)	grad_norm 0.3081 (nan)	mem 18241MB
[2022-02-05 15:01:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][900/1251]	eta 0:04:22 lr 0.000459	time 0.4445 (0.7472)	loss 0.4913 (0.4046)	grad_norm 0.0084 (nan)	mem 18241MB
[2022-02-05 15:03:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1000/1251]	eta 0:03:16 lr 0.000466	time 3.5568 (0.7843)	loss 0.5040 (0.4146)	grad_norm 0.0242 (nan)	mem 18241MB
[2022-02-05 15:04:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1100/1251]	eta 0:01:58 lr 0.000472	time 0.4314 (0.7874)	loss 0.4915 (0.4226)	grad_norm 0.2421 (nan)	mem 18241MB
[2022-02-05 15:05:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1200/1251]	eta 0:00:38 lr 0.000478	time 0.4317 (0.7606)	loss 0.5508 (0.4266)	grad_norm 0.0683 (nan)	mem 18241MB
[2022-02-05 15:05:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 5 training takes 0:15:37
[2022-02-05 15:05:33 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saving......
[2022-02-05 15:05:36 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saved !!!
[2022-02-05 15:05:40 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][0/1251]	eta 1:23:58 lr 0.000481	time 4.0272 (4.0272)	loss 0.4738 (0.4738)	grad_norm 1.3644 (1.3644)	mem 18241MB
[2022-02-05 15:07:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][100/1251]	eta 0:16:57 lr 0.000488	time 0.4414 (0.8841)	loss 0.4598 (0.4497)	grad_norm 0.1361 (nan)	mem 18241MB
[2022-02-05 15:08:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][200/1251]	eta 0:16:22 lr 0.000494	time 0.4016 (0.9352)	loss 0.5048 (0.4722)	grad_norm 0.0122 (nan)	mem 18241MB
[2022-02-05 15:10:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][300/1251]	eta 0:14:59 lr 0.000501	time 1.8795 (0.9461)	loss 0.4727 (0.4830)	grad_norm 0.0052 (nan)	mem 18241MB
[2022-02-05 15:12:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][400/1251]	eta 0:13:44 lr 0.000507	time 1.0088 (0.9684)	loss 0.4677 (0.4865)	grad_norm 0.0896 (nan)	mem 18241MB
[2022-02-05 15:13:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][500/1251]	eta 0:11:58 lr 0.000513	time 0.4878 (0.9563)	loss 0.5154 (0.4816)	grad_norm 24.0200 (nan)	mem 18241MB
[2022-02-05 15:15:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][600/1251]	eta 0:10:28 lr 0.000520	time 0.4229 (0.9660)	loss 0.4594 (0.4808)	grad_norm 0.9102 (nan)	mem 18243MB
[2022-02-05 15:16:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][700/1251]	eta 0:08:53 lr 0.000526	time 6.5413 (0.9676)	loss 0.4411 (0.4790)	grad_norm 0.7869 (nan)	mem 18243MB
[2022-02-05 15:18:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][800/1251]	eta 0:07:16 lr 0.000532	time 0.4438 (0.9674)	loss 0.4367 (0.4746)	grad_norm 1.4051 (nan)	mem 18243MB
[2022-02-05 15:20:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][900/1251]	eta 0:05:41 lr 0.000539	time 3.8164 (0.9736)	loss 0.4383 (0.4707)	grad_norm 0.0261 (nan)	mem 18243MB
[2022-02-05 15:21:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1000/1251]	eta 0:04:03 lr 0.000545	time 1.6960 (0.9705)	loss 0.4484 (0.4665)	grad_norm 21.2195 (nan)	mem 18243MB
[2022-02-05 15:23:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1100/1251]	eta 0:02:26 lr 0.000552	time 0.4055 (0.9699)	loss 0.4562 (0.4642)	grad_norm 1.7039 (nan)	mem 18243MB
[2022-02-05 15:24:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1200/1251]	eta 0:00:49 lr 0.000558	time 0.6191 (0.9622)	loss 0.4597 (0.4641)	grad_norm 1.6285 (nan)	mem 18243MB
[2022-02-05 15:25:37 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 6 training takes 0:20:01
[2022-02-05 15:25:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][0/1251]	eta 1:24:51 lr 0.000561	time 4.0702 (4.0702)	loss 0.4520 (0.4520)	grad_norm 0.2361 (0.2361)	mem 18243MB
[2022-02-05 15:26:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][100/1251]	eta 0:09:22 lr 0.000567	time 0.4117 (0.4889)	loss 0.4651 (0.4644)	grad_norm 0.0263 (1.5361)	mem 18243MB
[2022-02-05 15:27:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][200/1251]	eta 0:08:14 lr 0.000574	time 0.5824 (0.4702)	loss 0.4427 (0.4608)	grad_norm 0.4953 (2.5894)	mem 18243MB
[2022-02-05 15:27:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][300/1251]	eta 0:07:20 lr 0.000580	time 0.4171 (0.4631)	loss 0.4863 (0.4698)	grad_norm 0.0398 (2.0401)	mem 18243MB
[2022-02-05 15:30:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][400/1251]	eta 0:09:53 lr 0.000587	time 0.4244 (0.6975)	loss 0.4536 (0.4673)	grad_norm 0.3069 (1.8147)	mem 18243MB
[2022-02-05 15:32:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][500/1251]	eta 0:09:52 lr 0.000593	time 0.4264 (0.7884)	loss 0.4325 (0.4624)	grad_norm 0.1347 (2.3873)	mem 18243MB
[2022-02-05 15:33:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][600/1251]	eta 0:08:55 lr 0.000599	time 1.4852 (0.8225)	loss 0.4847 (0.4582)	grad_norm 9.4087 (2.4801)	mem 18243MB
[2022-02-05 15:35:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][700/1251]	eta 0:07:38 lr 0.000606	time 0.4803 (0.8325)	loss 0.4958 (0.4663)	grad_norm 0.0721 (2.2051)	mem 18243MB
[2022-02-05 15:37:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][800/1251]	eta 0:06:24 lr 0.000612	time 0.4407 (0.8519)	loss 0.5036 (0.4695)	grad_norm 0.0279 (2.1894)	mem 18243MB
[2022-02-05 15:38:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][900/1251]	eta 0:05:05 lr 0.000618	time 1.0429 (0.8691)	loss 0.4598 (0.4725)	grad_norm 0.3491 (1.9502)	mem 18243MB
[2022-02-05 15:40:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1000/1251]	eta 0:03:39 lr 0.000625	time 0.4146 (0.8731)	loss 0.4447 (0.4727)	grad_norm 0.0666 (1.8049)	mem 18243MB
[2022-02-05 15:41:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1100/1251]	eta 0:02:13 lr 0.000631	time 0.4089 (0.8846)	loss 0.5773 (0.4706)	grad_norm 533.4438 (2.1825)	mem 18243MB
[2022-02-05 15:43:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1200/1251]	eta 0:00:45 lr 0.000637	time 0.4819 (0.8905)	loss 0.4459 (0.4708)	grad_norm 0.2206 (inf)	mem 18243MB
[2022-02-05 15:44:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 7 training takes 0:18:32
[2022-02-05 15:44:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][0/1251]	eta 1:22:31 lr 0.000641	time 3.9582 (3.9582)	loss 0.4379 (0.4379)	grad_norm 0.1034 (0.1034)	mem 18243MB
[2022-02-05 15:44:59 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][100/1251]	eta 0:09:21 lr 0.000647	time 0.4268 (0.4878)	loss 0.4240 (0.4471)	grad_norm 0.1163 (0.5972)	mem 18243MB
[2022-02-05 15:46:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][200/1251]	eta 0:12:24 lr 0.000653	time 0.4756 (0.7080)	loss 0.5335 (0.4479)	grad_norm 0.6204 (5.4569)	mem 18243MB
[2022-02-05 15:47:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][300/1251]	eta 0:09:50 lr 0.000660	time 0.4158 (0.6213)	loss 0.5053 (0.4720)	grad_norm 0.0163 (3.8024)	mem 18243MB
[2022-02-05 15:48:02 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][400/1251]	eta 0:08:11 lr 0.000666	time 0.4401 (0.5773)	loss 0.4971 (0.4803)	grad_norm 0.0055 (2.8562)	mem 18243MB
[2022-02-05 15:49:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][500/1251]	eta 0:07:22 lr 0.000673	time 1.8764 (0.5890)	loss 0.5002 (0.4848)	grad_norm 0.0067 (2.2872)	mem 18243MB
[2022-02-05 15:51:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][600/1251]	eta 0:07:31 lr 0.000679	time 0.4090 (0.6942)	loss 0.4947 (0.4882)	grad_norm 0.0027 (1.9076)	mem 18243MB
[2022-02-05 15:52:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][700/1251]	eta 0:06:50 lr 0.000685	time 2.9712 (0.7456)	loss 0.5094 (0.4906)	grad_norm 0.0018 (1.6364)	mem 18243MB
[2022-02-05 15:53:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][800/1251]	eta 0:05:29 lr 0.000692	time 0.5231 (0.7305)	loss 0.5050 (0.4927)	grad_norm 0.0023 (1.4328)	mem 18243MB
[2022-02-05 15:55:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][900/1251]	eta 0:04:29 lr 0.000698	time 0.4867 (0.7679)	loss 0.5158 (0.4942)	grad_norm 0.0031 (1.2744)	mem 18243MB
[2022-02-05 15:57:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1000/1251]	eta 0:03:18 lr 0.000704	time 2.5024 (0.7920)	loss 0.5137 (0.4952)	grad_norm 0.0069 (1.1477)	mem 18243MB
[2022-02-05 15:58:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1100/1251]	eta 0:01:56 lr 0.000711	time 0.4010 (0.7713)	loss 0.5179 (0.4962)	grad_norm 0.0027 (1.0440)	mem 18243MB
[2022-02-05 16:00:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1200/1251]	eta 0:00:40 lr 0.000717	time 0.3946 (0.7944)	loss 0.5119 (0.4969)	grad_norm 0.0025 (0.9576)	mem 18243MB
[2022-02-05 16:00:52 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 8 training takes 0:16:41
[2022-02-05 16:00:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][0/1251]	eta 1:20:40 lr 0.000720	time 3.8696 (3.8696)	loss 0.4855 (0.4855)	grad_norm 0.0029 (0.0029)	mem 18243MB
[2022-02-05 16:01:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][100/1251]	eta 0:09:21 lr 0.000727	time 0.4418 (0.4877)	loss 0.5036 (0.5047)	grad_norm 0.0077 (0.0063)	mem 18243MB
[2022-02-05 16:02:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][200/1251]	eta 0:08:13 lr 0.000733	time 0.4427 (0.4691)	loss 0.5000 (0.5045)	grad_norm 0.0043 (0.0063)	mem 18243MB
[2022-02-05 16:03:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][300/1251]	eta 0:07:20 lr 0.000739	time 0.4323 (0.4635)	loss 0.5210 (0.5047)	grad_norm 0.0036 (0.0064)	mem 18243MB
[2022-02-05 16:05:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][400/1251]	eta 0:08:49 lr 0.000746	time 0.4176 (0.6220)	loss 0.4839 (0.5052)	grad_norm 0.0049 (0.0073)	mem 18243MB
[2022-02-05 16:06:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][500/1251]	eta 0:09:07 lr 0.000752	time 0.4086 (0.7284)	loss 0.4946 (0.5054)	grad_norm 0.0034 (0.0072)	mem 18243MB
[2022-02-05 16:08:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][600/1251]	eta 0:08:17 lr 0.000759	time 0.4523 (0.7643)	loss 0.5037 (0.5055)	grad_norm 0.0185 (0.0070)	mem 18243MB
[2022-02-05 16:10:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][700/1251]	eta 0:07:18 lr 0.000765	time 0.4846 (0.7965)	loss 0.5141 (0.5057)	grad_norm 0.0029 (0.0071)	mem 18243MB
[2022-02-05 16:11:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][800/1251]	eta 0:06:11 lr 0.000771	time 0.5237 (0.8228)	loss 0.4947 (0.5055)	grad_norm 0.0037 (0.0071)	mem 18243MB
[2022-02-05 16:13:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][900/1251]	eta 0:04:50 lr 0.000778	time 0.4529 (0.8269)	loss 0.5303 (0.5055)	grad_norm 0.0031 (0.0073)	mem 18243MB
[2022-02-05 16:14:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1000/1251]	eta 0:03:28 lr 0.000784	time 5.7999 (0.8313)	loss 0.5151 (0.5056)	grad_norm 0.0050 (0.0074)	mem 18243MB
[2022-02-05 16:16:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1100/1251]	eta 0:02:07 lr 0.000790	time 1.1566 (0.8422)	loss 0.4930 (0.5055)	grad_norm 0.0044 (0.0074)	mem 18243MB
[2022-02-05 16:17:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1200/1251]	eta 0:00:43 lr 0.000797	time 0.5183 (0.8531)	loss 0.4922 (0.5056)	grad_norm 0.0028 (0.0076)	mem 18243MB
[2022-02-05 16:18:39 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 9 training takes 0:17:46
[2022-02-05 16:18:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][0/1251]	eta 1:24:03 lr 0.000800	time 4.0314 (4.0314)	loss 0.5028 (0.5028)	grad_norm 0.0046 (0.0046)	mem 18243MB
[2022-02-05 16:19:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][100/1251]	eta 0:09:18 lr 0.000781	time 0.4616 (0.4852)	loss 0.5053 (0.5051)	grad_norm 0.0029 (0.0082)	mem 18243MB
[2022-02-05 16:20:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][200/1251]	eta 0:10:45 lr 0.000781	time 1.2564 (0.6146)	loss 0.5208 (0.5047)	grad_norm 0.0030 (0.0077)	mem 18243MB
[2022-02-05 16:22:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][300/1251]	eta 0:13:19 lr 0.000781	time 0.4131 (0.8408)	loss 0.5163 (0.5054)	grad_norm 0.0067 (0.0078)	mem 18243MB
[2022-02-05 16:24:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][400/1251]	eta 0:12:00 lr 0.000780	time 0.4386 (0.8464)	loss 0.5159 (0.5057)	grad_norm 0.0075 (0.0083)	mem 18243MB
[2022-02-05 16:25:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][500/1251]	eta 0:09:37 lr 0.000780	time 0.4158 (0.7694)	loss 0.5114 (0.5056)	grad_norm 0.0055 (0.0083)	mem 18243MB
[2022-02-05 16:26:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][600/1251]	eta 0:08:21 lr 0.000780	time 0.4583 (0.7696)	loss 0.5191 (0.5058)	grad_norm 0.0064 (0.0083)	mem 18243MB
[2022-02-05 16:27:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][700/1251]	eta 0:06:53 lr 0.000779	time 0.4195 (0.7505)	loss 0.4864 (0.5056)	grad_norm 0.0081 (0.0085)	mem 18243MB
[2022-02-05 16:29:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][800/1251]	eta 0:06:11 lr 0.000779	time 1.3727 (0.8233)	loss 0.4949 (0.5058)	grad_norm 0.0031 (0.0089)	mem 18243MB
[2022-02-05 16:31:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][900/1251]	eta 0:04:56 lr 0.000779	time 8.4577 (0.8454)	loss 0.5168 (0.5056)	grad_norm 0.0051 (0.0087)	mem 18243MB
[2022-02-05 16:32:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1000/1251]	eta 0:03:30 lr 0.000778	time 0.4084 (0.8387)	loss 0.5202 (0.5056)	grad_norm 0.0031 (0.0088)	mem 18243MB
[2022-02-05 16:34:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1100/1251]	eta 0:02:08 lr 0.000778	time 0.4177 (0.8530)	loss 0.5111 (0.5056)	grad_norm 0.0056 (0.0087)	mem 18243MB
[2022-02-05 16:35:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1200/1251]	eta 0:00:44 lr 0.000778	time 0.4621 (0.8640)	loss 0.4990 (0.5055)	grad_norm 0.0050 (0.0088)	mem 18243MB
[2022-02-05 16:36:41 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 10 training takes 0:18:02
[2022-02-05 16:36:41 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saving......
[2022-02-05 16:36:44 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saved !!!
[2022-02-05 16:36:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][0/1251]	eta 1:13:03 lr 0.000778	time 3.5042 (3.5042)	loss 0.5118 (0.5118)	grad_norm 0.0109 (0.0109)	mem 18243MB
[2022-02-05 16:37:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][100/1251]	eta 0:12:09 lr 0.000777	time 0.5750 (0.6336)	loss 0.5325 (0.5074)	grad_norm 0.0052 (0.0076)	mem 18243MB
[2022-02-05 16:39:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][200/1251]	eta 0:14:14 lr 0.000777	time 0.4904 (0.8130)	loss 0.5064 (0.5059)	grad_norm 0.0036 (0.0104)	mem 18243MB
[2022-02-05 16:41:03 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][300/1251]	eta 0:13:38 lr 0.000777	time 0.5833 (0.8607)	loss 0.4996 (0.5055)	grad_norm 0.0044 (0.0099)	mem 18243MB
[2022-02-05 16:42:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][400/1251]	eta 0:12:40 lr 0.000776	time 1.4644 (0.8932)	loss 0.5054 (0.5057)	grad_norm 0.0049 (0.0093)	mem 18243MB
[2022-02-05 16:43:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][500/1251]	eta 0:10:45 lr 0.000776	time 0.4532 (0.8590)	loss 0.4886 (0.5055)	grad_norm 0.0048 (0.0096)	mem 18243MB
[2022-02-05 16:45:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][600/1251]	eta 0:09:28 lr 0.000776	time 0.4424 (0.8730)	loss 0.5031 (0.5053)	grad_norm 0.0035 (0.0096)	mem 18243MB
[2022-02-05 16:46:58 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][700/1251]	eta 0:08:02 lr 0.000775	time 0.4489 (0.8760)	loss 0.5361 (0.5057)	grad_norm 0.0055 (0.0094)	mem 18243MB
[2022-02-05 16:47:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][800/1251]	eta 0:06:11 lr 0.000775	time 0.4341 (0.8229)	loss 0.4948 (0.5057)	grad_norm 0.0069 (0.0095)	mem 18243MB
[2022-02-05 16:49:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][900/1251]	eta 0:04:55 lr 0.000775	time 0.4356 (0.8426)	loss 0.5100 (0.5056)	grad_norm 0.0119 (0.0096)	mem 18243MB
[2022-02-05 16:50:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1000/1251]	eta 0:03:23 lr 0.000774	time 0.6975 (0.8099)	loss 0.5050 (0.5058)	grad_norm 0.0047 (0.0097)	mem 18243MB
[2022-02-05 16:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1100/1251]	eta 0:02:06 lr 0.000774	time 1.0458 (0.8359)	loss 0.5302 (0.5060)	grad_norm 0.0044 (0.0095)	mem 18243MB
[2022-02-05 16:53:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1200/1251]	eta 0:00:43 lr 0.000773	time 0.4221 (0.8547)	loss 0.5000 (0.5061)	grad_norm 0.0074 (0.0096)	mem 18243MB
[2022-02-05 16:54:34 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 11 training takes 0:17:50
[2022-02-05 16:54:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][0/1251]	eta 1:16:28 lr 0.000773	time 3.6676 (3.6676)	loss 0.5215 (0.5215)	grad_norm 0.0116 (0.0116)	mem 18243MB
[2022-02-05 16:55:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][100/1251]	eta 0:14:50 lr 0.000773	time 0.7810 (0.7737)	loss 0.5164 (0.5068)	grad_norm 0.0069 (0.0073)	mem 18243MB
[2022-02-05 16:57:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][200/1251]	eta 0:14:47 lr 0.000773	time 1.7244 (0.8448)	loss 0.5187 (0.5058)	grad_norm 0.0096 (0.0076)	mem 18243MB
[2022-02-05 16:58:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][300/1251]	eta 0:13:41 lr 0.000772	time 0.4478 (0.8634)	loss 0.4969 (0.5061)	grad_norm 0.0054 (0.0086)	mem 18243MB
[2022-02-05 17:00:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][400/1251]	eta 0:11:49 lr 0.000772	time 0.6186 (0.8340)	loss 0.5156 (0.5064)	grad_norm 0.0082 (0.0085)	mem 18243MB
[2022-02-05 17:01:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][500/1251]	eta 0:11:01 lr 0.000772	time 0.4288 (0.8806)	loss 0.5079 (0.5062)	grad_norm 0.0048 (0.0085)	mem 18243MB
[2022-02-05 17:03:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][600/1251]	eta 0:09:45 lr 0.000771	time 0.4818 (0.8992)	loss 0.4947 (0.5059)	grad_norm 0.0039 (0.0084)	mem 18243MB
[2022-02-05 17:05:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][700/1251]	eta 0:08:20 lr 0.000771	time 0.4087 (0.9092)	loss 0.4817 (0.5060)	grad_norm 0.0042 (0.0083)	mem 18243MB
[2022-02-05 17:06:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][800/1251]	eta 0:06:54 lr 0.000770	time 3.8632 (0.9187)	loss 0.5162 (0.5057)	grad_norm 0.0056 (0.0085)	mem 18243MB
[2022-02-05 17:08:22 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][900/1251]	eta 0:05:22 lr 0.000770	time 0.6304 (0.9192)	loss 0.5097 (0.5058)	grad_norm 0.0882 (0.0122)	mem 18243MB
[2022-02-05 17:09:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1000/1251]	eta 0:03:50 lr 0.000770	time 0.4901 (0.9200)	loss 0.5016 (0.5059)	grad_norm 0.0458 (0.0284)	mem 18243MB
[2022-02-05 17:11:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1100/1251]	eta 0:02:20 lr 0.000769	time 4.5322 (0.9288)	loss 0.4981 (0.5058)	grad_norm 0.0095 (0.0943)	mem 18243MB
[2022-02-05 17:13:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1200/1251]	eta 0:00:47 lr 0.000769	time 0.6195 (0.9275)	loss 0.5148 (0.5059)	grad_norm 0.0085 (0.0929)	mem 18243MB
[2022-02-05 17:13:54 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 12 training takes 0:19:19
[2022-02-05 17:13:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][0/1251]	eta 1:16:38 lr 0.000769	time 3.6756 (3.6756)	loss 0.5141 (0.5141)	grad_norm 0.0134 (0.0134)	mem 18243MB
[2022-02-05 17:14:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][100/1251]	eta 0:09:08 lr 0.000768	time 0.4265 (0.4769)	loss 0.4834 (0.5059)	grad_norm 0.0057 (0.0109)	mem 18243MB
[2022-02-05 17:16:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][200/1251]	eta 0:11:38 lr 0.000768	time 0.4268 (0.6645)	loss 0.5068 (0.5065)	grad_norm 0.0053 (0.0105)	mem 18243MB
[2022-02-05 17:17:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][300/1251]	eta 0:12:06 lr 0.000768	time 0.5157 (0.7636)	loss 0.4987 (0.5058)	grad_norm 0.0068 (0.0111)	mem 18243MB
[2022-02-05 17:19:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][400/1251]	eta 0:11:13 lr 0.000767	time 0.4169 (0.7910)	loss 0.5224 (0.5060)	grad_norm 0.0052 (0.0109)	mem 18243MB
[2022-02-05 17:20:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][500/1251]	eta 0:09:47 lr 0.000767	time 0.5883 (0.7817)	loss 0.4828 (0.5060)	grad_norm 0.0175 (0.0106)	mem 18243MB
[2022-02-05 17:21:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][600/1251]	eta 0:08:12 lr 0.000766	time 3.2032 (0.7566)	loss 0.5117 (0.5061)	grad_norm 0.0134 (0.0104)	mem 18243MB
[2022-02-05 17:23:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][700/1251]	eta 0:07:20 lr 0.000766	time 0.4028 (0.8003)	loss 0.5239 (0.5060)	grad_norm 0.0044 (0.0108)	mem 18243MB
[2022-02-05 17:25:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][800/1251]	eta 0:06:15 lr 0.000766	time 0.3985 (0.8326)	loss 0.5014 (0.5062)	grad_norm 0.0114 (0.0110)	mem 18243MB
[2022-02-05 17:26:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][900/1251]	eta 0:04:58 lr 0.000765	time 0.6148 (0.8492)	loss 0.5003 (0.5062)	grad_norm 0.0027 (0.0109)	mem 18243MB
[2022-02-05 17:28:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1000/1251]	eta 0:03:36 lr 0.000765	time 0.4273 (0.8616)	loss 0.4993 (0.5062)	grad_norm 0.0140 (0.0108)	mem 18243MB
[2022-02-05 17:30:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1100/1251]	eta 0:02:12 lr 0.000764	time 0.4532 (0.8779)	loss 0.5004 (0.5061)	grad_norm 0.0107 (0.0109)	mem 18243MB
[2022-02-05 17:31:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1200/1251]	eta 0:00:44 lr 0.000764	time 0.5378 (0.8783)	loss 0.4941 (0.5061)	grad_norm 0.0027 (0.0106)	mem 18243MB
[2022-02-05 17:32:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 13 training takes 0:18:16
[2022-02-05 17:32:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][0/1251]	eta 1:28:50 lr 0.000764	time 4.2612 (4.2612)	loss 0.4906 (0.4906)	grad_norm 0.0057 (0.0057)	mem 18243MB
[2022-02-05 17:33:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][100/1251]	eta 0:09:30 lr 0.000763	time 0.4356 (0.4958)	loss 0.4973 (0.5057)	grad_norm 0.0031 (0.0087)	mem 18243MB
[2022-02-05 17:34:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][200/1251]	eta 0:11:46 lr 0.000763	time 0.4247 (0.6724)	loss 0.5087 (0.5059)	grad_norm 0.0055 (0.0088)	mem 18243MB
[2022-02-05 17:36:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][300/1251]	eta 0:13:02 lr 0.000763	time 0.5249 (0.8233)	loss 0.5212 (0.5071)	grad_norm 0.0065 (0.0095)	mem 18243MB
[2022-02-05 17:37:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][400/1251]	eta 0:12:15 lr 0.000762	time 0.6521 (0.8638)	loss 0.4970 (0.5065)	grad_norm 0.0145 (0.0094)	mem 18243MB
[2022-02-05 17:39:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][500/1251]	eta 0:11:08 lr 0.000762	time 1.7114 (0.8902)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][600/1251]	eta 0:09:49 lr 0.000761	time 3.6215 (0.9061)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:42:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][700/1251]	eta 0:08:23 lr 0.000761	time 0.5390 (0.9145)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:44:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][800/1251]	eta 0:06:54 lr 0.000761	time 0.8262 (0.9181)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:46:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][900/1251]	eta 0:05:25 lr 0.000760	time 0.7292 (0.9266)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:47:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1000/1251]	eta 0:03:53 lr 0.000760	time 0.5330 (0.9303)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:49:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1100/1251]	eta 0:02:20 lr 0.000759	time 1.6054 (0.9295)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:50:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1200/1251]	eta 0:00:47 lr 0.000759	time 0.4148 (0.9302)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:51:29 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 14 training takes 0:19:19
[2022-02-05 17:51:33 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][0/1251]	eta 1:25:26 lr 0.000759	time 4.0980 (4.0980)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][100/1251]	eta 0:09:03 lr 0.000758	time 0.4545 (0.4721)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:53:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][200/1251]	eta 0:11:56 lr 0.000758	time 0.4286 (0.6820)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:54:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][300/1251]	eta 0:10:23 lr 0.000757	time 0.4350 (0.6558)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
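
Looking at the log, grad_norm already reports nan during epoch 5 even though the loss itself stays finite until epoch 14. To localize where things first diverge, I'm thinking of wrapping the optimizer step with a simple guard along these lines (just a sketch of a generic torch.cuda.amp training step, not the repo's actual code; the guarded_step name, the (img, mask) signature, and the clip value are my own placeholders):

import torch

def guarded_step(model, img, mask, optimizer, scaler):
    # Forward pass under autocast, as in a typical AMP pre-training loop;
    # the model is assumed to return the reconstruction loss directly.
    with torch.cuda.amp.autocast():
        loss = model(img, mask)

    # Skip the update the first time the loss stops being finite, instead of
    # letting nan propagate into the weights and the optimizer state.
    if not torch.isfinite(loss):
        print("non-finite loss detected, skipping this step")
        optimizer.zero_grad(set_to_none=True)
        return loss

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)
    # The clip value below is a placeholder, not taken from the config.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return loss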

I did not modify any of the configs except for passing --accumulation-steps 2 on the command line so the run fits in memory on an 8-GPU machine. I'm using CUDA 11.1, cuDNN 8, and PyTorch 1.9.0 (which should be new enough). Could you help take a look at what went wrong and how to fix it?

Thank you!
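
For what it's worth, the effective batch size per optimizer update under these flags is just the product below (the per-GPU batch size is a placeholder to be substituted with DATA.BATCH_SIZE from the yaml; I have not changed it):

# Rough arithmetic for the effective batch size of the run above.
n_gpus = 8              # --nproc_per_node 8
accumulation_steps = 2  # --accumulation-steps 2
batch_per_gpu = 128     # placeholder: DATA.BATCH_SIZE from the yaml config

print(n_gpus * accumulation_steps * batch_per_gpu)  # samples per optimizer update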

@xiaofei05

I encountered the same problem. How did you solve it?
