Reproduce Gridnet's SOTA agent with Trueskill Evaluation #36

vwxyzjn · 2022-01-18T02:11:17Z

Now that we are trying to get the self-play agent working, it's important to set baselines that we want to match and exceed. Our best past experiment is this (which I just now realized Chris had run with --num-bot-envs 48), which achieves a TrueSkill of 35.55 (source).

I am going to try to reproduce it with --num-bot-envs 24 instead of 48 and see if we can get the same level of performance:

python ppo_gridnet.py \
    --num-bot-envs 24 --num-selfplay-envs 0 \
    --total-timesteps 100000000 --num-models 300 \
    --capture-video --prod-mode

After this, I am going to check whether I can reproduce the same results with the new vecenv implementation in #34.

vwxyzjn · 2022-01-19T15:32:39Z

Unfortunately, I wasn't able to reproduce the best past results in https://wandb.ai/costa-huang/gym-microrts/runs/17moy8qp. Maybe I need to run with the default parameters used in https://wandb.ai/vwxyzjn/gym-microrts-paper/runs/asrpz468 (--num-bot-envs 48).

vwxyzjn · 2022-01-19T19:42:43Z

Actually, I am going to try using #34 to run the following. I'd rather not wait 2 weeks again to reproduce the original results.

python ppo_gridnet.py \
    --num-bot-envs 48 --num-selfplay-envs 0 \
    --total-timesteps 300000000 --num-models 300 \
    --capture-video --prod-mode

Turns out I don't have that much memory, so I had to run with --num-bot-envs 24:

python ppo_gridnet.py \
    --num-bot-envs 24 --num-selfplay-envs 0 \
    --total-timesteps 300000000 --num-models 300 \
    --capture-video --prod-mode
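The commands in this thread differ only in --num-bot-envs and --total-timesteps. As a minimal sketch (the helper name `build_command` is hypothetical and not part of the repo), the variants could be generated from one function so the shared flags stay in sync. It only builds the command string and does not launch training:

```python
import shlex


def build_command(num_bot_envs: int, total_timesteps: int) -> str:
    """Build a ppo_gridnet.py command line using only flags quoted in this thread.

    Hypothetical helper for illustration; it does not run anything.
    """
    args = [
        "python", "ppo_gridnet.py",
        "--num-bot-envs", str(num_bot_envs),
        "--num-selfplay-envs", "0",
        "--total-timesteps", str(total_timesteps),
        "--num-models", "300",
        "--capture-video", "--prod-mode",
    ]
    # shlex.join quotes arguments safely for a POSIX shell (Python 3.8+).
    return shlex.join(args)


# The memory-constrained variant from this comment.
low_memory_cmd = build_command(num_bot_envs=24, total_timesteps=300_000_000)
print(low_memory_cmd)
```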

vwxyzjn · 2022-01-19T20:09:00Z

https://wandb.ai/costa-huang/gym-microrts/reports/MicroRTSGridModeSharedMemVecEnv---VmlldzoxNDYwNDE0 tracks this progress

vwxyzjn · 2022-01-22T18:25:23Z

A new run https://wandb.ai/costa-huang/gym-microrts/runs/2v658xqx/logs?workspace=user-costa-huang seems successful, although the TrueSkill evaluation is a bit buggy: see #41.

vwxyzjn · 2022-01-24T16:04:32Z

This run successfully reproduced the best past results. Closing the issue now.

vwxyzjn · 2022-01-24T20:40:22Z

Now trying to reproduce the same results with MicroRTSGridModeSharedMemVecEnv from #34 in https://wandb.ai/gym-microrts/gym-microrts/runs/39stn3xh

vwxyzjn · 2022-01-28T21:11:49Z

I was able to reproduce the same results with MicroRTSGridModeSharedMemVecEnv.

Also, SPS (steps per second) is about 10% higher! If we could make the NN faster, SPS would be even higher.
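For reference, here is a rough sketch of how an SPS comparison like this can be computed and what it means for wall-clock time. All numbers below are made up for illustration and are not taken from the runs above; the function names are hypothetical too.

```python
def steps_per_second(global_step: int, elapsed_seconds: float) -> float:
    """Throughput: environment steps collected per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return global_step / elapsed_seconds


def relative_speedup(baseline_sps: float, new_sps: float) -> float:
    """Fractional SPS gain of the new run over the baseline (0.10 means 10%)."""
    return new_sps / baseline_sps - 1.0


def wall_clock_hours(total_timesteps: int, sps: float) -> float:
    """Hours needed to collect total_timesteps at a constant SPS."""
    return total_timesteps / sps / 3600.0


# Hypothetical measurements, NOT from the wandb runs in this issue.
baseline_sps = steps_per_second(global_step=1_000_000, elapsed_seconds=1000.0)
shared_mem_sps = steps_per_second(global_step=1_100_000, elapsed_seconds=1000.0)

speedup = relative_speedup(baseline_sps, shared_mem_sps)
hours_saved = (wall_clock_hours(300_000_000, baseline_sps)
               - wall_clock_hours(300_000_000, shared_mem_sps))
print(f"speedup: {speedup:.1%}, hours saved over 300M steps: {hours_saved:.1f}")
```

Note that a 10% higher SPS shortens a fixed-timestep run by about 9% (1 - 1/1.1), not 10%.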

vwxyzjn self-assigned this Jan 18, 2022

vwxyzjn closed this as completed Jan 24, 2022

vwxyzjn reopened this Jan 24, 2022

vwxyzjn mentioned this issue Jan 27, 2022

Faster Convergence #51

Open
