Hi, thank you for releasing such a wonderful work. I tried to replicate the results using the following command, which gave me `nan` loss after 14 epochs:
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 22): INFO >>>>>>>>>> Build Optimizer for Pre-training Stage
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 27): INFO No weight decay: {'encoder.mask_token', 'encoder.absolute_pos_embed'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 30): INFO No weight decay keywords: {'encoder.relative_position_bias_table'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 63): INFO No decay params: ['encoder.mask_token', 'encoder.patch_embed.proj.bias', 'encoder.patch_embed.norm.weight', 'encoder.patch_embed.norm.bias', 'encoder.layers.0.blocks.0.norm1.weight', 'encoder.layers.0.blocks.0.norm1.bias', 'encoder.layers.0.blocks.0.attn.qkv.bias', 'encoder.layers.0.blocks.0.attn.proj.bias', 'encoder.layers.0.blocks.0.norm2.weight', 'encoder.layers.0.blocks.0.norm2.bias', 'encoder.layers.0.blocks.0.mlp.fc1.bias', 'encoder.layers.0.blocks.0.mlp.fc2.bias', 'encoder.layers.0.blocks.1.norm1.weight', 'encoder.layers.0.blocks.1.norm1.bias', 'encoder.layers.0.blocks.1.attn.qkv.bias', 'encoder.layers.0.blocks.1.attn.proj.bias', 'encoder.layers.0.blocks.1.norm2.weight', 'encoder.layers.0.blocks.1.norm2.bias', 'encoder.layers.0.blocks.1.mlp.fc1.bias', 'encoder.layers.0.blocks.1.mlp.fc2.bias', 'encoder.layers.0.downsample.norm.weight', 'encoder.layers.0.downsample.norm.bias', 'encoder.layers.1.blocks.0.norm1.weight', 'encoder.layers.1.blocks.0.norm1.bias', 'encoder.layers.1.blocks.0.attn.qkv.bias', 'encoder.layers.1.blocks.0.attn.proj.bias', 'encoder.layers.1.blocks.0.norm2.weight', 'encoder.layers.1.blocks.0.norm2.bias', 'encoder.layers.1.blocks.0.mlp.fc1.bias', 'encoder.layers.1.blocks.0.mlp.fc2.bias', 'encoder.layers.1.blocks.1.norm1.weight', 'encoder.layers.1.blocks.1.norm1.bias', 'encoder.layers.1.blocks.1.attn.qkv.bias', 'encoder.layers.1.blocks.1.attn.proj.bias', 'encoder.layers.1.blocks.1.norm2.weight', 'encoder.layers.1.blocks.1.norm2.bias', 'encoder.layers.1.blocks.1.mlp.fc1.bias', 'encoder.layers.1.blocks.1.mlp.fc2.bias', 'encoder.layers.1.downsample.norm.weight', 'encoder.layers.1.downsample.norm.bias', 'encoder.layers.2.blocks.0.norm1.weight', 'encoder.layers.2.blocks.0.norm1.bias', 'encoder.layers.2.blocks.0.attn.qkv.bias', 'encoder.layers.2.blocks.0.attn.proj.bias', 'encoder.layers.2.blocks.0.norm2.weight', 'encoder.layers.2.blocks.0.norm2.bias', 'encoder.layers.2.blocks.0.mlp.fc1.bias', 'encoder.layers.2.blocks.0.mlp.fc2.bias', 'encoder.layers.2.blocks.1.norm1.weight', 'encoder.layers.2.blocks.1.norm1.bias', 'encoder.layers.2.blocks.1.attn.qkv.bias', 'encoder.layers.2.blocks.1.attn.proj.bias', 'encoder.layers.2.blocks.1.norm2.weight', 'encoder.layers.2.blocks.1.norm2.bias', 'encoder.layers.2.blocks.1.mlp.fc1.bias', 'encoder.layers.2.blocks.1.mlp.fc2.bias', 'encoder.layers.2.blocks.2.norm1.weight', 'encoder.layers.2.blocks.2.norm1.bias', 'encoder.layers.2.blocks.2.attn.qkv.bias', 'encoder.layers.2.blocks.2.attn.proj.bias', 'encoder.layers.2.blocks.2.norm2.weight', 'encoder.layers.2.blocks.2.norm2.bias', 'encoder.layers.2.blocks.2.mlp.fc1.bias', 'encoder.layers.2.blocks.2.mlp.fc2.bias', 'encoder.layers.2.blocks.3.norm1.weight', 'encoder.layers.2.blocks.3.norm1.bias', 'encoder.layers.2.blocks.3.attn.qkv.bias', 'encoder.layers.2.blocks.3.attn.proj.bias', 'encoder.layers.2.blocks.3.norm2.weight', 'encoder.layers.2.blocks.3.norm2.bias', 'encoder.layers.2.blocks.3.mlp.fc1.bias', 'encoder.layers.2.blocks.3.mlp.fc2.bias', 'encoder.layers.2.blocks.4.norm1.weight', 'encoder.layers.2.blocks.4.norm1.bias', 'encoder.layers.2.blocks.4.attn.qkv.bias', 'encoder.layers.2.blocks.4.attn.proj.bias', 'encoder.layers.2.blocks.4.norm2.weight', 'encoder.layers.2.blocks.4.norm2.bias', 'encoder.layers.2.blocks.4.mlp.fc1.bias', 'encoder.layers.2.blocks.4.mlp.fc2.bias', 'encoder.layers.2.blocks.5.norm1.weight', 'encoder.layers.2.blocks.5.norm1.bias', 'encoder.layers.2.blocks.5.attn.qkv.bias', 'encoder.layers.2.blocks.5.attn.proj.bias', 
'encoder.layers.2.blocks.5.norm2.weight', 'encoder.layers.2.blocks.5.norm2.bias', 'encoder.layers.2.blocks.5.mlp.fc1.bias', 'encoder.layers.2.blocks.5.mlp.fc2.bias', 'encoder.layers.2.blocks.6.norm1.weight', 'encoder.layers.2.blocks.6.norm1.bias', 'encoder.layers.2.blocks.6.attn.qkv.bias', 'encoder.layers.2.blocks.6.attn.proj.bias', 'encoder.layers.2.blocks.6.norm2.weight', 'encoder.layers.2.blocks.6.norm2.bias', 'encoder.layers.2.blocks.6.mlp.fc1.bias', 'encoder.layers.2.blocks.6.mlp.fc2.bias', 'encoder.layers.2.blocks.7.norm1.weight', 'encoder.layers.2.blocks.7.norm1.bias', 'encoder.layers.2.blocks.7.attn.qkv.bias', 'encoder.layers.2.blocks.7.attn.proj.bias', 'encoder.layers.2.blocks.7.norm2.weight', 'encoder.layers.2.blocks.7.norm2.bias', 'encoder.layers.2.blocks.7.mlp.fc1.bias', 'encoder.layers.2.blocks.7.mlp.fc2.bias', 'encoder.layers.2.blocks.8.norm1.weight', 'encoder.layers.2.blocks.8.norm1.bias', 'encoder.layers.2.blocks.8.attn.qkv.bias', 'encoder.layers.2.blocks.8.attn.proj.bias', 'encoder.layers.2.blocks.8.norm2.weight', 'encoder.layers.2.blocks.8.norm2.bias', 'encoder.layers.2.blocks.8.mlp.fc1.bias', 'encoder.layers.2.blocks.8.mlp.fc2.bias', 'encoder.layers.2.blocks.9.norm1.weight', 'encoder.layers.2.blocks.9.norm1.bias', 'encoder.layers.2.blocks.9.attn.qkv.bias', 'encoder.layers.2.blocks.9.attn.proj.bias', 'encoder.layers.2.blocks.9.norm2.weight', 'encoder.layers.2.blocks.9.norm2.bias', 'encoder.layers.2.blocks.9.mlp.fc1.bias', 'encoder.layers.2.blocks.9.mlp.fc2.bias', 'encoder.layers.2.blocks.10.norm1.weight', 'encoder.layers.2.blocks.10.norm1.bias', 'encoder.layers.2.blocks.10.attn.qkv.bias', 'encoder.layers.2.blocks.10.attn.proj.bias', 'encoder.layers.2.blocks.10.norm2.weight', 'encoder.layers.2.blocks.10.norm2.bias', 'encoder.layers.2.blocks.10.mlp.fc1.bias', 'encoder.layers.2.blocks.10.mlp.fc2.bias', 'encoder.layers.2.blocks.11.norm1.weight', 'encoder.layers.2.blocks.11.norm1.bias', 'encoder.layers.2.blocks.11.attn.qkv.bias', 'encoder.layers.2.blocks.11.attn.proj.bias', 'encoder.layers.2.blocks.11.norm2.weight', 'encoder.layers.2.blocks.11.norm2.bias', 'encoder.layers.2.blocks.11.mlp.fc1.bias', 'encoder.layers.2.blocks.11.mlp.fc2.bias', 'encoder.layers.2.blocks.12.norm1.weight', 'encoder.layers.2.blocks.12.norm1.bias', 'encoder.layers.2.blocks.12.attn.qkv.bias', 'encoder.layers.2.blocks.12.attn.proj.bias', 'encoder.layers.2.blocks.12.norm2.weight', 'encoder.layers.2.blocks.12.norm2.bias', 'encoder.layers.2.blocks.12.mlp.fc1.bias', 'encoder.layers.2.blocks.12.mlp.fc2.bias', 'encoder.layers.2.blocks.13.norm1.weight', 'encoder.layers.2.blocks.13.norm1.bias', 'encoder.layers.2.blocks.13.attn.qkv.bias', 'encoder.layers.2.blocks.13.attn.proj.bias', 'encoder.layers.2.blocks.13.norm2.weight', 'encoder.layers.2.blocks.13.norm2.bias', 'encoder.layers.2.blocks.13.mlp.fc1.bias', 'encoder.layers.2.blocks.13.mlp.fc2.bias', 'encoder.layers.2.blocks.14.norm1.weight', 'encoder.layers.2.blocks.14.norm1.bias', 'encoder.layers.2.blocks.14.attn.qkv.bias', 'encoder.layers.2.blocks.14.attn.proj.bias', 'encoder.layers.2.blocks.14.norm2.weight', 'encoder.layers.2.blocks.14.norm2.bias', 'encoder.layers.2.blocks.14.mlp.fc1.bias', 'encoder.layers.2.blocks.14.mlp.fc2.bias', 'encoder.layers.2.blocks.15.norm1.weight', 'encoder.layers.2.blocks.15.norm1.bias', 'encoder.layers.2.blocks.15.attn.qkv.bias', 'encoder.layers.2.blocks.15.attn.proj.bias', 'encoder.layers.2.blocks.15.norm2.weight', 'encoder.layers.2.blocks.15.norm2.bias', 'encoder.layers.2.blocks.15.mlp.fc1.bias', 
'encoder.layers.2.blocks.15.mlp.fc2.bias', 'encoder.layers.2.blocks.16.norm1.weight', 'encoder.layers.2.blocks.16.norm1.bias', 'encoder.layers.2.blocks.16.attn.qkv.bias', 'encoder.layers.2.blocks.16.attn.proj.bias', 'encoder.layers.2.blocks.16.norm2.weight', 'encoder.layers.2.blocks.16.norm2.bias', 'encoder.layers.2.blocks.16.mlp.fc1.bias', 'encoder.layers.2.blocks.16.mlp.fc2.bias', 'encoder.layers.2.blocks.17.norm1.weight', 'encoder.layers.2.blocks.17.norm1.bias', 'encoder.layers.2.blocks.17.attn.qkv.bias', 'encoder.layers.2.blocks.17.attn.proj.bias', 'encoder.layers.2.blocks.17.norm2.weight', 'encoder.layers.2.blocks.17.norm2.bias', 'encoder.layers.2.blocks.17.mlp.fc1.bias', 'encoder.layers.2.blocks.17.mlp.fc2.bias', 'encoder.layers.2.downsample.norm.weight', 'encoder.layers.2.downsample.norm.bias', 'encoder.layers.3.blocks.0.norm1.weight', 'encoder.layers.3.blocks.0.norm1.bias', 'encoder.layers.3.blocks.0.attn.qkv.bias', 'encoder.layers.3.blocks.0.attn.proj.bias', 'encoder.layers.3.blocks.0.norm2.weight', 'encoder.layers.3.blocks.0.norm2.bias', 'encoder.layers.3.blocks.0.mlp.fc1.bias', 'encoder.layers.3.blocks.0.mlp.fc2.bias', 'encoder.layers.3.blocks.1.norm1.weight', 'encoder.layers.3.blocks.1.norm1.bias', 'encoder.layers.3.blocks.1.attn.qkv.bias', 'encoder.layers.3.blocks.1.attn.proj.bias', 'encoder.layers.3.blocks.1.norm2.weight', 'encoder.layers.3.blocks.1.norm2.bias', 'encoder.layers.3.blocks.1.mlp.fc1.bias', 'encoder.layers.3.blocks.1.mlp.fc2.bias', 'encoder.norm.weight', 'encoder.norm.bias', 'decoder.0.bias']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 64): INFO Has decay params: ['encoder.patch_embed.proj.weight', 'encoder.layers.0.blocks.0.attn.relative_position_bias_table', 'encoder.layers.0.blocks.0.attn.qkv.weight', 'encoder.layers.0.blocks.0.attn.proj.weight', 'encoder.layers.0.blocks.0.mlp.fc1.weight', 'encoder.layers.0.blocks.0.mlp.fc2.weight', 'encoder.layers.0.blocks.1.attn.relative_position_bias_table', 'encoder.layers.0.blocks.1.attn.qkv.weight', 'encoder.layers.0.blocks.1.attn.proj.weight', 'encoder.layers.0.blocks.1.mlp.fc1.weight', 'encoder.layers.0.blocks.1.mlp.fc2.weight', 'encoder.layers.0.downsample.reduction.weight', 'encoder.layers.1.blocks.0.attn.relative_position_bias_table', 'encoder.layers.1.blocks.0.attn.qkv.weight', 'encoder.layers.1.blocks.0.attn.proj.weight', 'encoder.layers.1.blocks.0.mlp.fc1.weight', 'encoder.layers.1.blocks.0.mlp.fc2.weight', 'encoder.layers.1.blocks.1.attn.relative_position_bias_table', 'encoder.layers.1.blocks.1.attn.qkv.weight', 'encoder.layers.1.blocks.1.attn.proj.weight', 'encoder.layers.1.blocks.1.mlp.fc1.weight', 'encoder.layers.1.blocks.1.mlp.fc2.weight', 'encoder.layers.1.downsample.reduction.weight', 'encoder.layers.2.blocks.0.attn.relative_position_bias_table', 'encoder.layers.2.blocks.0.attn.qkv.weight', 'encoder.layers.2.blocks.0.attn.proj.weight', 'encoder.layers.2.blocks.0.mlp.fc1.weight', 'encoder.layers.2.blocks.0.mlp.fc2.weight', 'encoder.layers.2.blocks.1.attn.relative_position_bias_table', 'encoder.layers.2.blocks.1.attn.qkv.weight', 'encoder.layers.2.blocks.1.attn.proj.weight', 'encoder.layers.2.blocks.1.mlp.fc1.weight', 'encoder.layers.2.blocks.1.mlp.fc2.weight', 'encoder.layers.2.blocks.2.attn.relative_position_bias_table', 'encoder.layers.2.blocks.2.attn.qkv.weight', 'encoder.layers.2.blocks.2.attn.proj.weight', 'encoder.layers.2.blocks.2.mlp.fc1.weight', 'encoder.layers.2.blocks.2.mlp.fc2.weight', 'encoder.layers.2.blocks.3.attn.relative_position_bias_table', 'encoder.layers.2.blocks.3.attn.qkv.weight', 'encoder.layers.2.blocks.3.attn.proj.weight', 'encoder.layers.2.blocks.3.mlp.fc1.weight', 'encoder.layers.2.blocks.3.mlp.fc2.weight', 'encoder.layers.2.blocks.4.attn.relative_position_bias_table', 'encoder.layers.2.blocks.4.attn.qkv.weight', 'encoder.layers.2.blocks.4.attn.proj.weight', 'encoder.layers.2.blocks.4.mlp.fc1.weight', 'encoder.layers.2.blocks.4.mlp.fc2.weight', 'encoder.layers.2.blocks.5.attn.relative_position_bias_table', 'encoder.layers.2.blocks.5.attn.qkv.weight', 'encoder.layers.2.blocks.5.attn.proj.weight', 'encoder.layers.2.blocks.5.mlp.fc1.weight', 'encoder.layers.2.blocks.5.mlp.fc2.weight', 'encoder.layers.2.blocks.6.attn.relative_position_bias_table', 'encoder.layers.2.blocks.6.attn.qkv.weight', 'encoder.layers.2.blocks.6.attn.proj.weight', 'encoder.layers.2.blocks.6.mlp.fc1.weight', 'encoder.layers.2.blocks.6.mlp.fc2.weight', 'encoder.layers.2.blocks.7.attn.relative_position_bias_table', 'encoder.layers.2.blocks.7.attn.qkv.weight', 'encoder.layers.2.blocks.7.attn.proj.weight', 'encoder.layers.2.blocks.7.mlp.fc1.weight', 'encoder.layers.2.blocks.7.mlp.fc2.weight', 'encoder.layers.2.blocks.8.attn.relative_position_bias_table', 'encoder.layers.2.blocks.8.attn.qkv.weight', 'encoder.layers.2.blocks.8.attn.proj.weight', 'encoder.layers.2.blocks.8.mlp.fc1.weight', 'encoder.layers.2.blocks.8.mlp.fc2.weight', 'encoder.layers.2.blocks.9.attn.relative_position_bias_table', 'encoder.layers.2.blocks.9.attn.qkv.weight', 'encoder.layers.2.blocks.9.attn.proj.weight', 
'encoder.layers.2.blocks.9.mlp.fc1.weight', 'encoder.layers.2.blocks.9.mlp.fc2.weight', 'encoder.layers.2.blocks.10.attn.relative_position_bias_table', 'encoder.layers.2.blocks.10.attn.qkv.weight', 'encoder.layers.2.blocks.10.attn.proj.weight', 'encoder.layers.2.blocks.10.mlp.fc1.weight', 'encoder.layers.2.blocks.10.mlp.fc2.weight', 'encoder.layers.2.blocks.11.attn.relative_position_bias_table', 'encoder.layers.2.blocks.11.attn.qkv.weight', 'encoder.layers.2.blocks.11.attn.proj.weight', 'encoder.layers.2.blocks.11.mlp.fc1.weight', 'encoder.layers.2.blocks.11.mlp.fc2.weight', 'encoder.layers.2.blocks.12.attn.relative_position_bias_table', 'encoder.layers.2.blocks.12.attn.qkv.weight', 'encoder.layers.2.blocks.12.attn.proj.weight', 'encoder.layers.2.blocks.12.mlp.fc1.weight', 'encoder.layers.2.blocks.12.mlp.fc2.weight', 'encoder.layers.2.blocks.13.attn.relative_position_bias_table', 'encoder.layers.2.blocks.13.attn.qkv.weight', 'encoder.layers.2.blocks.13.attn.proj.weight', 'encoder.layers.2.blocks.13.mlp.fc1.weight', 'encoder.layers.2.blocks.13.mlp.fc2.weight', 'encoder.layers.2.blocks.14.attn.relative_position_bias_table', 'encoder.layers.2.blocks.14.attn.qkv.weight', 'encoder.layers.2.blocks.14.attn.proj.weight', 'encoder.layers.2.blocks.14.mlp.fc1.weight', 'encoder.layers.2.blocks.14.mlp.fc2.weight', 'encoder.layers.2.blocks.15.attn.relative_position_bias_table', 'encoder.layers.2.blocks.15.attn.qkv.weight', 'encoder.layers.2.blocks.15.attn.proj.weight', 'encoder.layers.2.blocks.15.mlp.fc1.weight', 'encoder.layers.2.blocks.15.mlp.fc2.weight', 'encoder.layers.2.blocks.16.attn.relative_position_bias_table', 'encoder.layers.2.blocks.16.attn.qkv.weight', 'encoder.layers.2.blocks.16.attn.proj.weight', 'encoder.layers.2.blocks.16.mlp.fc1.weight', 'encoder.layers.2.blocks.16.mlp.fc2.weight', 'encoder.layers.2.blocks.17.attn.relative_position_bias_table', 'encoder.layers.2.blocks.17.attn.qkv.weight', 'encoder.layers.2.blocks.17.attn.proj.weight', 'encoder.layers.2.blocks.17.mlp.fc1.weight', 'encoder.layers.2.blocks.17.mlp.fc2.weight', 'encoder.layers.2.downsample.reduction.weight', 'encoder.layers.3.blocks.0.attn.relative_position_bias_table', 'encoder.layers.3.blocks.0.attn.qkv.weight', 'encoder.layers.3.blocks.0.attn.proj.weight', 'encoder.layers.3.blocks.0.mlp.fc1.weight', 'encoder.layers.3.blocks.0.mlp.fc2.weight', 'encoder.layers.3.blocks.1.attn.relative_position_bias_table', 'encoder.layers.3.blocks.1.attn.qkv.weight', 'encoder.layers.3.blocks.1.attn.proj.weight', 'encoder.layers.3.blocks.1.mlp.fc1.weight', 'encoder.layers.3.blocks.1.mlp.fc2.weight', 'decoder.0.weight']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 43): INFO AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.05

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.0
)
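(Aside on the two groups in the AdamW printout: they follow the usual Swin convention of exempting 1-D parameters such as biases and norm weights, plus the explicitly listed no-decay names and keywords, from weight decay. A minimal sketch of that split, assuming the standard `named_parameters()` pattern rather than the exact code in optimizer.py:)

```python
import torch

def build_param_groups(model, skip_names=(), skip_keywords=()):
    """Split parameters into decay / no-decay groups for AdamW.

    Sketch of the common Swin/SimMIM convention; the real optimizer.py
    may differ in details. 1-D tensors (biases, norm scales) and any
    explicitly skipped names/keywords get weight_decay=0.
    """
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if (param.ndim == 1 or name.endswith(".bias")
                or name in skip_names
                or any(kw in name for kw in skip_keywords)):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay},                       # inherits weight_decay=0.05
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Usage with the names/keywords printed in the log above:
# groups = build_param_groups(
#     model,
#     skip_names={"encoder.mask_token", "encoder.absolute_pos_embed"},
#     skip_keywords={"encoder.relative_position_bias_table"},
# )
# optimizer = torch.optim.AdamW(groups, lr=8e-4, weight_decay=0.05)
```

Note that the keyword `encoder.relative_position_bias_table` is not a substring of the per-block `encoder.layers.*.attn.relative_position_bias_table` names, which is why those tables appear in the "Has decay params" list above.)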
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 83): INFO number of params: 89874104
[2022-02-05 09:22:26 simmim_pretrain] (utils.py 81): INFO All checkpoints founded in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep: []
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 100): INFO no checkpoint found in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep, ignoring auto resume
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 105): INFO Start training
[2022-02-05 09:24:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][0/1251] eta 1 day, 15:53:49 lr 0.000004 time 114.8121 (114.8121) loss 0.5543 (0.5543) grad_norm 0.2902 (0.2902) mem 17192MB
[2022-02-05 09:45:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][100/1251] eta 4:24:36 lr 0.000010 time 0.3949 (13.7934) loss 0.4499 (0.4969) grad_norm 1.0401 (0.2900) mem 18238MB
[2022-02-05 10:06:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][200/1251] eta 3:52:30 lr 0.000017 time 75.5072 (13.2732) loss 0.3752 (0.4565) grad_norm 2.8639 (1.6425) mem 18238MB
[2022-02-05 10:28:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][300/1251] eta 3:27:27 lr 0.000023 time 0.3941 (13.0894) loss 0.3553 (0.4264) grad_norm 2.0591 (2.8358) mem 18238MB
[2022-02-05 10:48:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][400/1251] eta 3:02:30 lr 0.000029 time 57.4084 (12.8679) loss 0.3173 (0.4040) grad_norm 1.1405 (3.6005) mem 18238MB
[2022-02-05 11:08:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][500/1251] eta 2:38:59 lr 0.000036 time 0.3942 (12.7019) loss 0.3129 (0.3879) grad_norm 4.7302 (4.0156) mem 18238MB
[2022-02-05 11:29:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][600/1251] eta 2:17:56 lr 0.000042 time 86.9880 (12.7132) loss 0.3042 (0.3741) grad_norm 2.4576 (4.0197) mem 18238MB
[2022-02-05 11:49:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][700/1251] eta 1:55:17 lr 0.000048 time 0.3943 (12.5542) loss 0.2920 (0.3630) grad_norm 4.6089 (4.0017) mem 18239MB
[2022-02-05 12:09:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][800/1251] eta 1:34:10 lr 0.000055 time 73.9639 (12.5290) loss 0.2979 (0.3536) grad_norm 3.4510 (3.9055) mem 18239MB
[2022-02-05 12:29:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][900/1251] eta 1:13:00 lr 0.000061 time 0.3981 (12.4787) loss 0.2693 (0.3459) grad_norm 1.5775 (3.8091) mem 18239MB
[2022-02-05 12:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1000/1251] eta 0:52:00 lr 0.000068 time 18.3918 (12.4334) loss 0.2786 (0.3394) grad_norm 1.2491 (3.7356) mem 18239MB
[2022-02-05 13:10:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1100/1251] eta 0:31:18 lr 0.000074 time 0.4033 (12.4426) loss 0.2725 (0.3335) grad_norm 2.2311 (3.6312) mem 18239MB
[2022-02-05 13:30:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1200/1251] eta 0:10:32 lr 0.000080 time 31.6500 (12.4020) loss 0.2715 (0.3286) grad_norm 1.2720 (3.5534) mem 18239MB
[2022-02-05 13:39:44 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 0 training takes 4:17:18
[2022-02-05 13:39:44 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saving......
[2022-02-05 13:39:46 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saved !!!
[2022-02-05 13:39:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][0/1251] eta 1:01:34 lr 0.000083 time 2.9530 (2.9530) loss 0.2705 (0.2705) grad_norm 0.8280 (0.8280) mem 18239MB
[2022-02-05 13:41:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][100/1251] eta 0:14:17 lr 0.000090 time 0.6114 (0.7453) loss 0.2802 (0.2693) grad_norm 3.6450 (2.3059) mem 18239MB
[2022-02-05 13:42:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][200/1251] eta 0:14:40 lr 0.000096 time 0.7879 (0.8375) loss 0.2727 (0.2691) grad_norm 2.2279 (2.2994) mem 18239MB
[2022-02-05 13:44:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][300/1251] eta 0:13:41 lr 0.000103 time 0.4401 (0.8638) loss 0.2757 (0.2682) grad_norm 1.1539 (2.2752) mem 18239MB
[2022-02-05 13:45:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][400/1251] eta 0:11:34 lr 0.000109 time 0.4306 (0.8162) loss 0.2588 (0.2672) grad_norm 1.2593 (2.2458) mem 18239MB
[2022-02-05 13:46:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][500/1251] eta 0:10:38 lr 0.000115 time 0.5900 (0.8503) loss 0.2552 (0.2668) grad_norm 1.4727 (2.2056) mem 18240MB
[2022-02-05 13:47:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][600/1251] eta 0:08:45 lr 0.000122 time 0.4254 (0.8066) loss 0.2584 (0.2662) grad_norm 1.1834 (2.1712) mem 18240MB
[2022-02-05 13:48:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][700/1251] eta 0:06:56 lr 0.000128 time 0.4058 (0.7558) loss 0.2641 (0.2653) grad_norm 1.1315 (2.1186) mem 18240MB
[2022-02-05 13:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][800/1251] eta 0:05:41 lr 0.000134 time 0.4352 (0.7570) loss 0.2742 (0.2649) grad_norm 0.7488 (2.0964) mem 18240MB
[2022-02-05 13:51:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][900/1251] eta 0:04:35 lr 0.000141 time 0.4130 (0.7842) loss 0.2476 (0.2644) grad_norm 0.6401 (2.0539) mem 18240MB
[2022-02-05 13:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1000/1251] eta 0:03:08 lr 0.000147 time 0.4153 (0.7508) loss 0.2717 (0.2639) grad_norm 2.2334 (2.0098) mem 18240MB
[2022-02-05 13:53:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1100/1251] eta 0:01:51 lr 0.000154 time 0.4521 (0.7393) loss 0.2551 (0.2633) grad_norm 1.4980 (1.9817) mem 18240MB
[2022-02-05 13:55:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1200/1251] eta 0:00:39 lr 0.000160 time 0.4667 (0.7788) loss 0.2664 (0.2627) grad_norm 0.7340 (1.9572) mem 18240MB
[2022-02-05 13:56:06 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 1 training takes 0:16:20
[2022-02-05 13:56:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][0/1251] eta 1:16:36 lr 0.000163 time 3.6739 (3.6739) loss 0.2620 (0.2620) grad_norm 0.9611 (0.9611) mem 18240MB
[2022-02-05 13:56:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][100/1251] eta 0:09:22 lr 0.000169 time 0.4276 (0.4883) loss 0.2562 (0.2552) grad_norm 0.5311 (1.6903) mem 18240MB
[2022-02-05 13:58:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][200/1251] eta 0:11:20 lr 0.000176 time 0.4207 (0.6473) loss 0.2618 (0.2542) grad_norm 0.6081 (1.6235) mem 18240MB
[2022-02-05 13:59:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][300/1251] eta 0:09:36 lr 0.000182 time 0.4451 (0.6061) loss 0.2528 (0.2531) grad_norm 0.4520 (1.6033) mem 18240MB
[2022-02-05 14:00:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][400/1251] eta 0:09:29 lr 0.000189 time 0.4445 (0.6689) loss 0.2413 (0.2525) grad_norm 0.6562 (1.5654) mem 18240MB
[2022-02-05 14:01:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][500/1251] eta 0:08:08 lr 0.000195 time 2.1151 (0.6503) loss 0.2539 (0.2520) grad_norm 1.8790 (1.5394) mem 18240MB
[2022-02-05 14:03:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][600/1251] eta 0:07:55 lr 0.000201 time 0.5791 (0.7308) loss 0.2295 (0.2516) grad_norm 1.3565 (1.5373) mem 18240MB
[2022-02-05 14:04:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][700/1251] eta 0:06:53 lr 0.000208 time 0.4240 (0.7508) loss 0.2464 (0.2511) grad_norm 0.5189 (1.5236) mem 18240MB
[2022-02-05 14:06:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][800/1251] eta 0:05:48 lr 0.000214 time 2.2608 (0.7731) loss 0.2481 (0.2507) grad_norm 0.4695 (1.4909) mem 18240MB
[2022-02-05 14:08:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][900/1251] eta 0:04:40 lr 0.000220 time 0.4068 (0.7993) loss 0.2637 (0.2503) grad_norm 1.5514 (1.4829) mem 18240MB
[2022-02-05 14:09:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1000/1251] eta 0:03:23 lr 0.000227 time 1.3535 (0.8115) loss 0.2443 (0.2498) grad_norm 0.9744 (1.4653) mem 18240MB
[2022-02-05 14:11:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1100/1251] eta 0:02:04 lr 0.000233 time 0.4410 (0.8271) loss 0.2425 (0.2491) grad_norm 1.9947 (1.4427) mem 18240MB
[2022-02-05 14:12:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1200/1251] eta 0:00:42 lr 0.000239 time 0.4275 (0.8318) loss 0.2465 (0.2486) grad_norm 0.6265 (1.4366) mem 18240MB
[2022-02-05 14:13:21 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 2 training takes 0:17:15
[2022-02-05 14:13:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][0/1251] eta 1:24:40 lr 0.000243 time 4.0614 (4.0614) loss 0.2433 (0.2433) grad_norm 0.7067 (0.7067) mem 18240MB
[2022-02-05 14:14:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][100/1251] eta 0:09:27 lr 0.000249 time 0.4092 (0.4935) loss 0.2476 (0.2430) grad_norm 1.1159 (1.3134) mem 18240MB
[2022-02-05 14:15:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][200/1251] eta 0:10:07 lr 0.000255 time 0.3964 (0.5784) loss 0.2400 (0.2425) grad_norm 0.3384 (1.2386) mem 18240MB
[2022-02-05 14:16:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][300/1251] eta 0:08:53 lr 0.000262 time 0.4966 (0.5605) loss 0.2404 (0.2416) grad_norm 0.3401 (1.1964) mem 18240MB
[2022-02-05 14:17:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][400/1251] eta 0:09:07 lr 0.000268 time 0.7116 (0.6430) loss 0.2314 (0.2411) grad_norm 1.4900 (1.2040) mem 18240MB
[2022-02-05 14:19:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][500/1251] eta 0:09:34 lr 0.000275 time 0.4066 (0.7646) loss 0.2282 (0.2405) grad_norm 0.5011 (1.2036) mem 18240MB
[2022-02-05 14:21:30 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][600/1251] eta 0:08:49 lr 0.000281 time 0.4160 (0.8126) loss 0.2414 (0.2404) grad_norm 0.9795 (1.1974) mem 18240MB
[2022-02-05 14:23:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][700/1251] eta 0:07:41 lr 0.000287 time 0.4862 (0.8377) loss 0.2334 (0.2401) grad_norm 0.4512 (1.1759) mem 18240MB
[2022-02-05 14:24:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][800/1251] eta 0:06:27 lr 0.000294 time 0.5067 (0.8583) loss 0.2418 (0.2398) grad_norm 1.2394 (1.1746) mem 18240MB
[2022-02-05 14:26:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][900/1251] eta 0:05:03 lr 0.000300 time 0.4366 (0.8635) loss 0.2361 (0.2394) grad_norm 0.5397 (1.1654) mem 18240MB
[2022-02-05 14:27:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1000/1251] eta 0:03:37 lr 0.000306 time 1.1073 (0.8658) loss 0.2352 (0.2390) grad_norm 0.6021 (1.1541) mem 18241MB
[2022-02-05 14:29:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1100/1251] eta 0:02:08 lr 0.000313 time 0.5436 (0.8526) loss 0.2295 (0.2387) grad_norm 1.1236 (1.1382) mem 18241MB
[2022-02-05 14:30:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1200/1251] eta 0:00:44 lr 0.000319 time 0.4830 (0.8667) loss 0.2486 (0.2385) grad_norm 0.4466 (1.1277) mem 18241MB
[2022-02-05 14:31:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 3 training takes 0:18:11
[2022-02-05 14:31:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][0/1251] eta 1:30:19 lr 0.000322 time 4.3320 (4.3320) loss 0.2309 (0.2309) grad_norm 0.3473 (0.3473) mem 18241MB
[2022-02-05 14:32:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][100/1251] eta 0:09:42 lr 0.000329 time 0.4199 (0.5059) loss 0.2313 (0.2347) grad_norm 0.9780 (0.9537) mem 18241MB
[2022-02-05 14:33:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][200/1251] eta 0:10:42 lr 0.000335 time 0.4042 (0.6115) loss 0.2380 (0.2338) grad_norm 0.4685 (0.9641) mem 18241MB
[2022-02-05 14:35:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][300/1251] eta 0:12:57 lr 0.000341 time 0.4448 (0.8171) loss 0.2274 (0.2339) grad_norm 0.5854 (0.9808) mem 18241MB
[2022-02-05 14:37:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][400/1251] eta 0:12:15 lr 0.000348 time 0.9284 (0.8642) loss 0.2300 (0.2342) grad_norm 0.5273 (0.9884) mem 18241MB
[2022-02-05 14:38:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][500/1251] eta 0:10:11 lr 0.000354 time 0.4123 (0.8136) loss 0.2346 (0.2337) grad_norm 0.7111 (0.9791) mem 18241MB
[2022-02-05 14:39:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][600/1251] eta 0:09:01 lr 0.000361 time 0.4479 (0.8323) loss 0.2305 (0.2336) grad_norm 0.7723 (0.9726) mem 18241MB
[2022-02-05 14:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][700/1251] eta 0:07:37 lr 0.000367 time 1.2561 (0.8298) loss 0.2416 (0.2333) grad_norm 0.7113 (0.9652) mem 18241MB
[2022-02-05 14:43:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][800/1251] eta 0:06:29 lr 0.000373 time 2.0054 (0.8637) loss 0.2229 (0.2332) grad_norm 0.3053 (0.9582) mem 18241MB
[2022-02-05 14:44:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][900/1251] eta 0:05:07 lr 0.000380 time 2.2077 (0.8764) loss 0.2203 (0.2330) grad_norm 0.9912 (0.9536) mem 18241MB
[2022-02-05 14:46:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1000/1251] eta 0:03:41 lr 0.000386 time 0.4317 (0.8842) loss 0.2330 (0.2327) grad_norm 0.4332 (0.9454) mem 18241MB
[2022-02-05 14:47:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1100/1251] eta 0:02:09 lr 0.000392 time 1.9930 (0.8594) loss 0.2376 (0.2325) grad_norm 0.3494 (0.9425) mem 18241MB
[2022-02-05 14:49:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1200/1251] eta 0:00:44 lr 0.000399 time 3.0251 (0.8816) loss 0.2229 (0.2322) grad_norm 0.3280 (0.9404) mem 18241MB
[2022-02-05 14:49:56 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 4 training takes 0:18:23
[2022-02-05 14:50:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][0/1251] eta 1:20:39 lr 0.000402 time 3.8685 (3.8685) loss 0.2361 (0.2361) grad_norm 0.2441 (0.2441) mem 18241MB
[2022-02-05 14:50:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][100/1251] eta 0:09:10 lr 0.000408 time 0.4087 (0.4786) loss 0.2268 (0.2297) grad_norm 0.4077 (0.9384) mem 18241MB
[2022-02-05 14:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][200/1251] eta 0:11:06 lr 0.000415 time 0.4741 (0.6344) loss 0.2332 (0.2293) grad_norm 0.6293 (0.8776) mem 18241MB
[2022-02-05 14:53:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][300/1251] eta 0:09:43 lr 0.000421 time 0.4483 (0.6141) loss 0.4291 (0.2770) grad_norm 0.1236 (nan) mem 18241MB
[2022-02-05 14:54:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][400/1251] eta 0:09:25 lr 0.000427 time 0.4158 (0.6646) loss 0.4575 (0.3163) grad_norm 0.7630 (nan) mem 18241MB
[2022-02-05 14:55:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][500/1251] eta 0:07:59 lr 0.000434 time 0.4047 (0.6385) loss 0.4399 (0.3400) grad_norm 1.2759 (nan) mem 18241MB
[2022-02-05 14:57:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][600/1251] eta 0:07:54 lr 0.000440 time 0.3981 (0.7282) loss 0.5130 (0.3663) grad_norm 0.0916 (nan) mem 18241MB
[2022-02-05 14:58:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][700/1251] eta 0:06:42 lr 0.000446 time 0.4456 (0.7310) loss 0.4537 (0.3812) grad_norm 0.5641 (nan) mem 18241MB
[2022-02-05 14:59:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][800/1251] eta 0:05:14 lr 0.000453 time 0.4507 (0.6973) loss 0.5098 (0.3919) grad_norm 0.3081 (nan) mem 18241MB
[2022-02-05 15:01:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][900/1251] eta 0:04:22 lr 0.000459 time 0.4445 (0.7472) loss 0.4913 (0.4046) grad_norm 0.0084 (nan) mem 18241MB
[2022-02-05 15:03:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1000/1251] eta 0:03:16 lr 0.000466 time 3.5568 (0.7843) loss 0.5040 (0.4146) grad_norm 0.0242 (nan) mem 18241MB
[2022-02-05 15:04:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1100/1251] eta 0:01:58 lr 0.000472 time 0.4314 (0.7874) loss 0.4915 (0.4226) grad_norm 0.2421 (nan) mem 18241MB
[2022-02-05 15:05:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1200/1251] eta 0:00:38 lr 0.000478 time 0.4317 (0.7606) loss 0.5508 (0.4266) grad_norm 0.0683 (nan) mem 18241MB
[2022-02-05 15:05:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 5 training takes 0:15:37
[2022-02-05 15:05:33 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saving......
[2022-02-05 15:05:36 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saved !!!
[2022-02-05 15:05:40 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][0/1251] eta 1:23:58 lr 0.000481 time 4.0272 (4.0272) loss 0.4738 (0.4738) grad_norm 1.3644 (1.3644) mem 18241MB
[2022-02-05 15:07:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][100/1251] eta 0:16:57 lr 0.000488 time 0.4414 (0.8841) loss 0.4598 (0.4497) grad_norm 0.1361 (nan) mem 18241MB
[2022-02-05 15:08:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][200/1251] eta 0:16:22 lr 0.000494 time 0.4016 (0.9352) loss 0.5048 (0.4722) grad_norm 0.0122 (nan) mem 18241MB
[2022-02-05 15:10:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][300/1251] eta 0:14:59 lr 0.000501 time 1.8795 (0.9461) loss 0.4727 (0.4830) grad_norm 0.0052 (nan) mem 18241MB
[2022-02-05 15:12:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][400/1251] eta 0:13:44 lr 0.000507 time 1.0088 (0.9684) loss 0.4677 (0.4865) grad_norm 0.0896 (nan) mem 18241MB
[2022-02-05 15:13:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][500/1251] eta 0:11:58 lr 0.000513 time 0.4878 (0.9563) loss 0.5154 (0.4816) grad_norm 24.0200 (nan) mem 18241MB
[2022-02-05 15:15:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][600/1251] eta 0:10:28 lr 0.000520 time 0.4229 (0.9660) loss 0.4594 (0.4808) grad_norm 0.9102 (nan) mem 18243MB
[2022-02-05 15:16:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][700/1251] eta 0:08:53 lr 0.000526 time 6.5413 (0.9676) loss 0.4411 (0.4790) grad_norm 0.7869 (nan) mem 18243MB
[2022-02-05 15:18:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][800/1251] eta 0:07:16 lr 0.000532 time 0.4438 (0.9674) loss 0.4367 (0.4746) grad_norm 1.4051 (nan) mem 18243MB
[2022-02-05 15:20:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][900/1251] eta 0:05:41 lr 0.000539 time 3.8164 (0.9736) loss 0.4383 (0.4707) grad_norm 0.0261 (nan) mem 18243MB
[2022-02-05 15:21:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1000/1251] eta 0:04:03 lr 0.000545 time 1.6960 (0.9705) loss 0.4484 (0.4665) grad_norm 21.2195 (nan) mem 18243MB
[2022-02-05 15:23:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1100/1251] eta 0:02:26 lr 0.000552 time 0.4055 (0.9699) loss 0.4562 (0.4642) grad_norm 1.7039 (nan) mem 18243MB
[2022-02-05 15:24:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1200/1251] eta 0:00:49 lr 0.000558 time 0.6191 (0.9622) loss 0.4597 (0.4641) grad_norm 1.6285 (nan) mem 18243MB
[2022-02-05 15:25:37 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 6 training takes 0:20:01
[2022-02-05 15:25:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][0/1251] eta 1:24:51 lr 0.000561 time 4.0702 (4.0702) loss 0.4520 (0.4520) grad_norm 0.2361 (0.2361) mem 18243MB
[2022-02-05 15:26:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][100/1251] eta 0:09:22 lr 0.000567 time 0.4117 (0.4889) loss 0.4651 (0.4644) grad_norm 0.0263 (1.5361) mem 18243MB
[2022-02-05 15:27:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][200/1251] eta 0:08:14 lr 0.000574 time 0.5824 (0.4702) loss 0.4427 (0.4608) grad_norm 0.4953 (2.5894) mem 18243MB
[2022-02-05 15:27:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][300/1251] eta 0:07:20 lr 0.000580 time 0.4171 (0.4631) loss 0.4863 (0.4698) grad_norm 0.0398 (2.0401) mem 18243MB
[2022-02-05 15:30:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][400/1251] eta 0:09:53 lr 0.000587 time 0.4244 (0.6975) loss 0.4536 (0.4673) grad_norm 0.3069 (1.8147) mem 18243MB
[2022-02-05 15:32:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][500/1251] eta 0:09:52 lr 0.000593 time 0.4264 (0.7884) loss 0.4325 (0.4624) grad_norm 0.1347 (2.3873) mem 18243MB
[2022-02-05 15:33:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][600/1251] eta 0:08:55 lr 0.000599 time 1.4852 (0.8225) loss 0.4847 (0.4582) grad_norm 9.4087 (2.4801) mem 18243MB
[2022-02-05 15:35:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][700/1251] eta 0:07:38 lr 0.000606 time 0.4803 (0.8325) loss 0.4958 (0.4663) grad_norm 0.0721 (2.2051) mem 18243MB
[2022-02-05 15:37:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][800/1251] eta 0:06:24 lr 0.000612 time 0.4407 (0.8519) loss 0.5036 (0.4695) grad_norm 0.0279 (2.1894) mem 18243MB
[2022-02-05 15:38:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][900/1251] eta 0:05:05 lr 0.000618 time 1.0429 (0.8691) loss 0.4598 (0.4725) grad_norm 0.3491 (1.9502) mem 18243MB
[2022-02-05 15:40:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1000/1251] eta 0:03:39 lr 0.000625 time 0.4146 (0.8731) loss 0.4447 (0.4727) grad_norm 0.0666 (1.8049) mem 18243MB
[2022-02-05 15:41:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1100/1251] eta 0:02:13 lr 0.000631 time 0.4089 (0.8846) loss 0.5773 (0.4706) grad_norm 533.4438 (2.1825) mem 18243MB
[2022-02-05 15:43:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1200/1251] eta 0:00:45 lr 0.000637 time 0.4819 (0.8905) loss 0.4459 (0.4708) grad_norm 0.2206 (inf) mem 18243MB
[2022-02-05 15:44:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 7 training takes 0:18:32
[2022-02-05 15:44:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][0/1251] eta 1:22:31 lr 0.000641 time 3.9582 (3.9582) loss 0.4379 (0.4379) grad_norm 0.1034 (0.1034) mem 18243MB
[2022-02-05 15:44:59 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][100/1251] eta 0:09:21 lr 0.000647 time 0.4268 (0.4878) loss 0.4240 (0.4471) grad_norm 0.1163 (0.5972) mem 18243MB
[2022-02-05 15:46:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][200/1251] eta 0:12:24 lr 0.000653 time 0.4756 (0.7080) loss 0.5335 (0.4479) grad_norm 0.6204 (5.4569) mem 18243MB
[2022-02-05 15:47:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][300/1251] eta 0:09:50 lr 0.000660 time 0.4158 (0.6213) loss 0.5053 (0.4720) grad_norm 0.0163 (3.8024) mem 18243MB
[2022-02-05 15:48:02 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][400/1251] eta 0:08:11 lr 0.000666 time 0.4401 (0.5773) loss 0.4971 (0.4803) grad_norm 0.0055 (2.8562) mem 18243MB
[2022-02-05 15:49:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][500/1251] eta 0:07:22 lr 0.000673 time 1.8764 (0.5890) loss 0.5002 (0.4848) grad_norm 0.0067 (2.2872) mem 18243MB
[2022-02-05 15:51:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][600/1251] eta 0:07:31 lr 0.000679 time 0.4090 (0.6942) loss 0.4947 (0.4882) grad_norm 0.0027 (1.9076) mem 18243MB
[2022-02-05 15:52:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][700/1251] eta 0:06:50 lr 0.000685 time 2.9712 (0.7456) loss 0.5094 (0.4906) grad_norm 0.0018 (1.6364) mem 18243MB
[2022-02-05 15:53:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][800/1251] eta 0:05:29 lr 0.000692 time 0.5231 (0.7305) loss 0.5050 (0.4927) grad_norm 0.0023 (1.4328) mem 18243MB
[2022-02-05 15:55:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][900/1251] eta 0:04:29 lr 0.000698 time 0.4867 (0.7679) loss 0.5158 (0.4942) grad_norm 0.0031 (1.2744) mem 18243MB
[2022-02-05 15:57:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1000/1251] eta 0:03:18 lr 0.000704 time 2.5024 (0.7920) loss 0.5137 (0.4952) grad_norm 0.0069 (1.1477) mem 18243MB
[2022-02-05 15:58:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1100/1251] eta 0:01:56 lr 0.000711 time 0.4010 (0.7713) loss 0.5179 (0.4962) grad_norm 0.0027 (1.0440) mem 18243MB
[2022-02-05 16:00:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1200/1251] eta 0:00:40 lr 0.000717 time 0.3946 (0.7944) loss 0.5119 (0.4969) grad_norm 0.0025 (0.9576) mem 18243MB
[2022-02-05 16:00:52 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 8 training takes 0:16:41
[2022-02-05 16:00:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][0/1251] eta 1:20:40 lr 0.000720 time 3.8696 (3.8696) loss 0.4855 (0.4855) grad_norm 0.0029 (0.0029) mem 18243MB
[2022-02-05 16:01:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][100/1251] eta 0:09:21 lr 0.000727 time 0.4418 (0.4877) loss 0.5036 (0.5047) grad_norm 0.0077 (0.0063) mem 18243MB
[2022-02-05 16:02:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][200/1251] eta 0:08:13 lr 0.000733 time 0.4427 (0.4691) loss 0.5000 (0.5045) grad_norm 0.0043 (0.0063) mem 18243MB
[2022-02-05 16:03:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][300/1251] eta 0:07:20 lr 0.000739 time 0.4323 (0.4635) loss 0.5210 (0.5047) grad_norm 0.0036 (0.0064) mem 18243MB
[2022-02-05 16:05:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][400/1251] eta 0:08:49 lr 0.000746 time 0.4176 (0.6220) loss 0.4839 (0.5052) grad_norm 0.0049 (0.0073) mem 18243MB
[2022-02-05 16:06:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][500/1251] eta 0:09:07 lr 0.000752 time 0.4086 (0.7284) loss 0.4946 (0.5054) grad_norm 0.0034 (0.0072) mem 18243MB
[2022-02-05 16:08:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][600/1251] eta 0:08:17 lr 0.000759 time 0.4523 (0.7643) loss 0.5037 (0.5055) grad_norm 0.0185 (0.0070) mem 18243MB
[2022-02-05 16:10:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][700/1251] eta 0:07:18 lr 0.000765 time 0.4846 (0.7965) loss 0.5141 (0.5057) grad_norm 0.0029 (0.0071) mem 18243MB
[2022-02-05 16:11:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][800/1251] eta 0:06:11 lr 0.000771 time 0.5237 (0.8228) loss 0.4947 (0.5055) grad_norm 0.0037 (0.0071) mem 18243MB
[2022-02-05 16:13:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][900/1251] eta 0:04:50 lr 0.000778 time 0.4529 (0.8269) loss 0.5303 (0.5055) grad_norm 0.0031 (0.0073) mem 18243MB
[2022-02-05 16:14:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1000/1251] eta 0:03:28 lr 0.000784 time 5.7999 (0.8313) loss 0.5151 (0.5056) grad_norm 0.0050 (0.0074) mem 18243MB
[2022-02-05 16:16:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1100/1251] eta 0:02:07 lr 0.000790 time 1.1566 (0.8422) loss 0.4930 (0.5055) grad_norm 0.0044 (0.0074) mem 18243MB
[2022-02-05 16:17:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1200/1251] eta 0:00:43 lr 0.000797 time 0.5183 (0.8531) loss 0.4922 (0.5056) grad_norm 0.0028 (0.0076) mem 18243MB
[2022-02-05 16:18:39 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 9 training takes 0:17:46
[2022-02-05 16:18:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][0/1251] eta 1:24:03 lr 0.000800 time 4.0314 (4.0314) loss 0.5028 (0.5028) grad_norm 0.0046 (0.0046) mem 18243MB
[2022-02-05 16:19:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][100/1251] eta 0:09:18 lr 0.000781 time 0.4616 (0.4852) loss 0.5053 (0.5051) grad_norm 0.0029 (0.0082) mem 18243MB
[2022-02-05 16:20:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][200/1251] eta 0:10:45 lr 0.000781 time 1.2564 (0.6146) loss 0.5208 (0.5047) grad_norm 0.0030 (0.0077) mem 18243MB
[2022-02-05 16:22:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][300/1251] eta 0:13:19 lr 0.000781 time 0.4131 (0.8408) loss 0.5163 (0.5054) grad_norm 0.0067 (0.0078) mem 18243MB
[2022-02-05 16:24:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][400/1251] eta 0:12:00 lr 0.000780 time 0.4386 (0.8464) loss 0.5159 (0.5057) grad_norm 0.0075 (0.0083) mem 18243MB
[2022-02-05 16:25:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][500/1251] eta 0:09:37 lr 0.000780 time 0.4158 (0.7694) loss 0.5114 (0.5056) grad_norm 0.0055 (0.0083) mem 18243MB
[2022-02-05 16:26:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][600/1251] eta 0:08:21 lr 0.000780 time 0.4583 (0.7696) loss 0.5191 (0.5058) grad_norm 0.0064 (0.0083) mem 18243MB
[2022-02-05 16:27:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][700/1251] eta 0:06:53 lr 0.000779 time 0.4195 (0.7505) loss 0.4864 (0.5056) grad_norm 0.0081 (0.0085) mem 18243MB
[2022-02-05 16:29:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][800/1251] eta 0:06:11 lr 0.000779 time 1.3727 (0.8233) loss 0.4949 (0.5058) grad_norm 0.0031 (0.0089) mem 18243MB
[2022-02-05 16:31:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][900/1251] eta 0:04:56 lr 0.000779 time 8.4577 (0.8454) loss 0.5168 (0.5056) grad_norm 0.0051 (0.0087) mem 18243MB
[2022-02-05 16:32:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1000/1251] eta 0:03:30 lr 0.000778 time 0.4084 (0.8387) loss 0.5202 (0.5056) grad_norm 0.0031 (0.0088) mem 18243MB
[2022-02-05 16:34:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1100/1251] eta 0:02:08 lr 0.000778 time 0.4177 (0.8530) loss 0.5111 (0.5056) grad_norm 0.0056 (0.0087) mem 18243MB
[2022-02-05 16:35:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1200/1251] eta 0:00:44 lr 0.000778 time 0.4621 (0.8640) loss 0.4990 (0.5055) grad_norm 0.0050 (0.0088) mem 18243MB
[2022-02-05 16:36:41 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 10 training takes 0:18:02
[2022-02-05 16:36:41 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saving......
[2022-02-05 16:36:44 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saved !!!
[2022-02-05 16:36:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][0/1251] eta 1:13:03 lr 0.000778 time 3.5042 (3.5042) loss 0.5118 (0.5118) grad_norm 0.0109 (0.0109) mem 18243MB
[2022-02-05 16:37:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][100/1251] eta 0:12:09 lr 0.000777 time 0.5750 (0.6336) loss 0.5325 (0.5074) grad_norm 0.0052 (0.0076) mem 18243MB
[2022-02-05 16:39:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][200/1251] eta 0:14:14 lr 0.000777 time 0.4904 (0.8130) loss 0.5064 (0.5059) grad_norm 0.0036 (0.0104) mem 18243MB
[2022-02-05 16:41:03 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][300/1251] eta 0:13:38 lr 0.000777 time 0.5833 (0.8607) loss 0.4996 (0.5055) grad_norm 0.0044 (0.0099) mem 18243MB
[2022-02-05 16:42:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][400/1251] eta 0:12:40 lr 0.000776 time 1.4644 (0.8932) loss 0.5054 (0.5057) grad_norm 0.0049 (0.0093) mem 18243MB
[2022-02-05 16:43:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][500/1251] eta 0:10:45 lr 0.000776 time 0.4532 (0.8590) loss 0.4886 (0.5055) grad_norm 0.0048 (0.0096) mem 18243MB
[2022-02-05 16:45:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][600/1251] eta 0:09:28 lr 0.000776 time 0.4424 (0.8730) loss 0.5031 (0.5053) grad_norm 0.0035 (0.0096) mem 18243MB
[2022-02-05 16:46:58 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][700/1251] eta 0:08:02 lr 0.000775 time 0.4489 (0.8760) loss 0.5361 (0.5057) grad_norm 0.0055 (0.0094) mem 18243MB
[2022-02-05 16:47:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][800/1251] eta 0:06:11 lr 0.000775 time 0.4341 (0.8229) loss 0.4948 (0.5057) grad_norm 0.0069 (0.0095) mem 18243MB
[2022-02-05 16:49:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][900/1251] eta 0:04:55 lr 0.000775 time 0.4356 (0.8426) loss 0.5100 (0.5056) grad_norm 0.0119 (0.0096) mem 18243MB
[2022-02-05 16:50:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1000/1251] eta 0:03:23 lr 0.000774 time 0.6975 (0.8099) loss 0.5050 (0.5058) grad_norm 0.0047 (0.0097) mem 18243MB
[2022-02-05 16:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1100/1251] eta 0:02:06 lr 0.000774 time 1.0458 (0.8359) loss 0.5302 (0.5060) grad_norm 0.0044 (0.0095) mem 18243MB
[2022-02-05 16:53:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1200/1251] eta 0:00:43 lr 0.000773 time 0.4221 (0.8547) loss 0.5000 (0.5061) grad_norm 0.0074 (0.0096) mem 18243MB
[2022-02-05 16:54:34 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 11 training takes 0:17:50
[2022-02-05 16:54:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][0/1251] eta 1:16:28 lr 0.000773 time 3.6676 (3.6676) loss 0.5215 (0.5215) grad_norm 0.0116 (0.0116) mem 18243MB
[2022-02-05 16:55:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][100/1251] eta 0:14:50 lr 0.000773 time 0.7810 (0.7737) loss 0.5164 (0.5068) grad_norm 0.0069 (0.0073) mem 18243MB
[2022-02-05 16:57:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][200/1251] eta 0:14:47 lr 0.000773 time 1.7244 (0.8448) loss 0.5187 (0.5058) grad_norm 0.0096 (0.0076) mem 18243MB
[2022-02-05 16:58:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][300/1251] eta 0:13:41 lr 0.000772 time 0.4478 (0.8634) loss 0.4969 (0.5061) grad_norm 0.0054 (0.0086) mem 18243MB
[2022-02-05 17:00:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][400/1251] eta 0:11:49 lr 0.000772 time 0.6186 (0.8340) loss 0.5156 (0.5064) grad_norm 0.0082 (0.0085) mem 18243MB
[2022-02-05 17:01:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][500/1251] eta 0:11:01 lr 0.000772 time 0.4288 (0.8806) loss 0.5079 (0.5062) grad_norm 0.0048 (0.0085) mem 18243MB
[2022-02-05 17:03:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][600/1251] eta 0:09:45 lr 0.000771 time 0.4818 (0.8992) loss 0.4947 (0.5059) grad_norm 0.0039 (0.0084) mem 18243MB
[2022-02-05 17:05:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][700/1251] eta 0:08:20 lr 0.000771 time 0.4087 (0.9092) loss 0.4817 (0.5060) grad_norm 0.0042 (0.0083) mem 18243MB
[2022-02-05 17:06:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][800/1251] eta 0:06:54 lr 0.000770 time 3.8632 (0.9187) loss 0.5162 (0.5057) grad_norm 0.0056 (0.0085) mem 18243MB
[2022-02-05 17:08:22 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][900/1251] eta 0:05:22 lr 0.000770 time 0.6304 (0.9192) loss 0.5097 (0.5058) grad_norm 0.0882 (0.0122) mem 18243MB
[2022-02-05 17:09:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1000/1251] eta 0:03:50 lr 0.000770 time 0.4901 (0.9200) loss 0.5016 (0.5059) grad_norm 0.0458 (0.0284) mem 18243MB
[2022-02-05 17:11:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1100/1251] eta 0:02:20 lr 0.000769 time 4.5322 (0.9288) loss 0.4981 (0.5058) grad_norm 0.0095 (0.0943) mem 18243MB
[2022-02-05 17:13:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1200/1251] eta 0:00:47 lr 0.000769 time 0.6195 (0.9275) loss 0.5148 (0.5059) grad_norm 0.0085 (0.0929) mem 18243MB
[2022-02-05 17:13:54 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 12 training takes 0:19:19
[2022-02-05 17:13:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][0/1251] eta 1:16:38 lr 0.000769 time 3.6756 (3.6756) loss 0.5141 (0.5141) grad_norm 0.0134 (0.0134) mem 18243MB
[2022-02-05 17:14:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][100/1251] eta 0:09:08 lr 0.000768 time 0.4265 (0.4769) loss 0.4834 (0.5059) grad_norm 0.0057 (0.0109) mem 18243MB
[2022-02-05 17:16:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][200/1251] eta 0:11:38 lr 0.000768 time 0.4268 (0.6645) loss 0.5068 (0.5065) grad_norm 0.0053 (0.0105) mem 18243MB
[2022-02-05 17:17:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][300/1251] eta 0:12:06 lr 0.000768 time 0.5157 (0.7636) loss 0.4987 (0.5058) grad_norm 0.0068 (0.0111) mem 18243MB
[2022-02-05 17:19:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][400/1251] eta 0:11:13 lr 0.000767 time 0.4169 (0.7910) loss 0.5224 (0.5060) grad_norm 0.0052 (0.0109) mem 18243MB
[2022-02-05 17:20:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][500/1251] eta 0:09:47 lr 0.000767 time 0.5883 (0.7817) loss 0.4828 (0.5060) grad_norm 0.0175 (0.0106) mem 18243MB
[2022-02-05 17:21:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][600/1251] eta 0:08:12 lr 0.000766 time 3.2032 (0.7566) loss 0.5117 (0.5061) grad_norm 0.0134 (0.0104) mem 18243MB
[2022-02-05 17:23:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][700/1251] eta 0:07:20 lr 0.000766 time 0.4028 (0.8003) loss 0.5239 (0.5060) grad_norm 0.0044 (0.0108) mem 18243MB
[2022-02-05 17:25:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][800/1251] eta 0:06:15 lr 0.000766 time 0.3985 (0.8326) loss 0.5014 (0.5062) grad_norm 0.0114 (0.0110) mem 18243MB
[2022-02-05 17:26:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][900/1251] eta 0:04:58 lr 0.000765 time 0.6148 (0.8492) loss 0.5003 (0.5062) grad_norm 0.0027 (0.0109) mem 18243MB
[2022-02-05 17:28:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1000/1251] eta 0:03:36 lr 0.000765 time 0.4273 (0.8616) loss 0.4993 (0.5062) grad_norm 0.0140 (0.0108) mem 18243MB
[2022-02-05 17:30:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1100/1251] eta 0:02:12 lr 0.000764 time 0.4532 (0.8779) loss 0.5004 (0.5061) grad_norm 0.0107 (0.0109) mem 18243MB
[2022-02-05 17:31:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1200/1251] eta 0:00:44 lr 0.000764 time 0.5378 (0.8783) loss 0.4941 (0.5061) grad_norm 0.0027 (0.0106) mem 18243MB
[2022-02-05 17:32:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 13 training takes 0:18:16
[2022-02-05 17:32:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][0/1251] eta 1:28:50 lr 0.000764 time 4.2612 (4.2612) loss 0.4906 (0.4906) grad_norm 0.0057 (0.0057) mem 18243MB
[2022-02-05 17:33:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][100/1251] eta 0:09:30 lr 0.000763 time 0.4356 (0.4958) loss 0.4973 (0.5057) grad_norm 0.0031 (0.0087) mem 18243MB
[2022-02-05 17:34:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][200/1251] eta 0:11:46 lr 0.000763 time 0.4247 (0.6724) loss 0.5087 (0.5059) grad_norm 0.0055 (0.0088) mem 18243MB
[2022-02-05 17:36:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][300/1251] eta 0:13:02 lr 0.000763 time 0.5249 (0.8233) loss 0.5212 (0.5071) grad_norm 0.0065 (0.0095) mem 18243MB
[2022-02-05 17:37:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][400/1251] eta 0:12:15 lr 0.000762 time 0.6521 (0.8638) loss 0.4970 (0.5065) grad_norm 0.0145 (0.0094) mem 18243MB
[2022-02-05 17:39:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][500/1251] eta 0:11:08 lr 0.000762 time 1.7114 (0.8902) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][600/1251] eta 0:09:49 lr 0.000761 time 3.6215 (0.9061) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:42:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][700/1251] eta 0:08:23 lr 0.000761 time 0.5390 (0.9145) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:44:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][800/1251] eta 0:06:54 lr 0.000761 time 0.8262 (0.9181) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:46:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][900/1251] eta 0:05:25 lr 0.000760 time 0.7292 (0.9266) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:47:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1000/1251] eta 0:03:53 lr 0.000760 time 0.5330 (0.9303) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:49:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1100/1251] eta 0:02:20 lr 0.000759 time 1.6054 (0.9295) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:50:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1200/1251] eta 0:00:47 lr 0.000759 time 0.4148 (0.9302) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:51:29 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 14 training takes 0:19:19
[2022-02-05 17:51:33 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][0/1251] eta 1:25:26 lr 0.000759 time 4.0980 (4.0980) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][100/1251] eta 0:09:03 lr 0.000758 time 0.4545 (0.4721) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:53:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][200/1251] eta 0:11:56 lr 0.000758 time 0.4286 (0.6820) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:54:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][300/1251] eta 0:10:23 lr 0.000757 time 0.4350 (0.6558) loss nan (nan) grad_norm nan (nan) mem 18243MB
I did not modify any of the configs except for specifying `--accumulation-steps 2` from the command line to fit in memory on an 8-GPU machine. I'm using CUDA 11.1, cuDNN 8, and PyTorch 1.9.0 (which is sufficiently new). Could you help take a look at what went wrong and how to fix it?
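For reference, the running grad_norm first turns nan in epoch 5, spikes to 533.44 and then inf in epoch 7, and the loss plateaus around 0.50 before going fully nan in epoch 14, so it looks like a loss spike poisons the optimizer state well before the nan appears. As a stopgap I was considering a guard around the update step, along these lines (a generic sketch, not the actual main_simmim.py loop; `max_grad_norm` is a placeholder, not the repo's configured clip value):

```python
import torch

def guarded_step(loss, model, optimizer, max_grad_norm=5.0):
    """Skip the update when the loss is non-finite; clip gradients otherwise.

    Generic sketch only: the real loop also handles AMP loss scaling and
    my --accumulation-steps 2 setting, both omitted here.
    """
    if not torch.isfinite(loss):      # loss is a scalar tensor here
        optimizer.zero_grad()         # drop the poisoned gradients entirely
        return False                  # caller can log and skip this iteration
    loss.backward()
    # Clip the total gradient norm so one bad batch (e.g. the 533.44 / inf
    # grad_norm spikes in the log above) cannot blow up AdamW's moment estimates.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad()
    return True
```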
Thank you!