Reproduce Gridnet's SOTA agent with Trueskill Evaluation #36

Open
vwxyzjn opened this issue Jan 18, 2022 · 7 comments

@vwxyzjn
Collaborator

vwxyzjn commented Jan 18, 2022

Now that we are trying to get the self-play agent working, it's important to set baselines that we want to match and exceed. Our best past experiment is this one (which, I only just realized, Chris ran with --num-bot-envs 48), and it achieves a TrueSkill of 35.55 (source).


I am going to try to reproduce it with --num-bot-envs 24 and see if we can get the same level of performance:

python ppo_gridnet.py \
    --num-bot-envs 24 --num-selfplay-envs 0 \
    --total-timesteps 100000000 --num-models 300 \
    --capture-video --prod-mode

After this, I am going to check whether I can reproduce the same results with the new vecenv implementation in #34.

@vwxyzjn vwxyzjn self-assigned this Jan 18, 2022
@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 19, 2022

Unfortunately, I wasn't able to reproduce the best past results in https://wandb.ai/costa-huang/gym-microrts/runs/17moy8qp. Maybe I need to run with the default parameters in https://wandb.ai/vwxyzjn/gym-microrts-paper/runs/asrpz468 (--num-bot-envs 48).

@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 19, 2022

Actually, I am going to use #34 to run the following. I'd rather not wait two weeks again to reproduce the original results.

python ppo_gridnet.py \
    --num-bot-envs 48 --num-selfplay-envs 0 \
    --total-timesteps 300000000 --num-models 300 \
    --capture-video --prod-mode

It turns out I don't have enough memory, so I had to run with --num-bot-envs 24:

python ppo_gridnet.py \
    --num-bot-envs 24 --num-selfplay-envs 0 \
    --total-timesteps 300000000 --num-models 300 \
    --capture-video --prod-mode

@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 19, 2022

@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 22, 2022

A new run, https://wandb.ai/costa-huang/gym-microrts/runs/2v658xqx/logs?workspace=user-costa-huang, seems successful, although the TrueSkill evaluation is a bit buggy (see #41).

@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 24, 2022

This run successfully reproduced the best past results. Closing the issue now.


@vwxyzjn vwxyzjn closed this as completed Jan 24, 2022
@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 24, 2022

Now trying to reproduce the same results with MicroRTSGridModeSharedMemVecEnv from #34 in https://wandb.ai/gym-microrts/gym-microrts/runs/39stn3xh.

@vwxyzjn
Collaborator Author

vwxyzjn commented Jan 28, 2022

I was able to reproduce the same results with MicroRTSGridModeSharedMemVecEnv.


Also, SPS (steps per second) is about 10% higher! If we can make the neural network faster, SPS will increase even further.

