[rllib] Provide atari results across all algorithms (as applicable) #2663

ericl · 2018-08-15T22:18:46Z

Describe the problem

We should publish results for at least a few of the standard Atari games on all applicable algorithms, and fix any discrepancies, e.g. #2654

Results uploaded to this repo: https://github.com/ray-project/rl-experiments

Envs to run: PongNoFrameskip-v4, BreakoutNoFrameskip-v4, BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, SpaceInvadersNoFrameskip-v4

(Chosen such that all but pong can run concurrently on a g3.16xl machine).

Some references:
https://github.com/btaba/yarlp
openai/baselines#176

richardliaw · 2018-08-15T22:23:41Z

Also relevant reference: https://github.com/hill-a/stable-baselines

ericl · 2018-08-16T02:54:02Z

Just ran a "30% full speed" IMPALA across a couple environments. The results are pretty reasonable at 40M frames, with Qbert / Space invaders about inline with results from the A3C paper, and Breakout / Beamrider a bit below. Note that the episode max reward for Breakout and Beamrider are pretty good, but the mean is not quite up there.

I'm guessing we can improve on this with some tuning.

# Runs on a single g3.16xl node
atari-impala:
    env:
        grid_search:
            - BreakoutNoFrameskip-v4
            - BeamRiderNoFrameskip-v4
            - QbertNoFrameskip-v4
            - SpaceInvadersNoFrameskip-v4 
    run: IMPALA
    config:
        sample_batch_size: 250  # 50 * num_envs_per_worker
        train_batch_size: 500
        num_workers: 12
        num_envs_per_worker: 5

robertnishihara · 2018-08-16T06:11:52Z

In what format does it make sense to publish the results? E.g., a collection of full learning curves (e.g., as CSV)? Or actual visualizations like you have above? Or something else?

ericl · 2018-08-16T06:41:24Z

If we have a public ray perf dashboard, that would be a good place to put these.

Otherwise, I think posting some summary visualizations on github or the docs would do (for example, just having the tuned example yamls with pointers to this issue). The full learning curve data probably isn't that interesting, but we could also upload that to S3 pretty easily.

luochao1024 · 2018-08-19T06:00:46Z

Do you have any result about A3C or A3C-LSTM?

ericl · 2018-08-19T06:06:58Z

I did an initial run with A3C, however the results were much worse than the Impala ones. I didn't try tuning the learning rate though as mentioned in the A3C paper.

…

On Sat, Aug 18, 2018, 11:00 PM luochao1024 ***@***.***> wrote: Do you have any result about A3C or A3C-LSTM? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#2663 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAA6Shy688eDmu__FkhmWz28hA3ISZ4Bks5uSP8TgaJpZM4V-5mK> .

luochao1024 · 2018-08-19T06:19:00Z

A3C is very sensitive with learning rate as the staleness of gradients increases with learning rate

ericl · 2018-08-19T06:24:31Z

For reference, here is the run and params (with the default lr=0.0001, and grad_clip=40.0). Note that the gradient magnitude scales with the lr * batch size = 20.

This is also on this branch: #2679

# Runs on a single m4.16xl node
atari-a3c:
    env:
        grid_search:
            - BreakoutNoFrameskip-v4
            - BeamRiderNoFrameskip-v4
            - QbertNoFrameskip-v4
            - SpaceInvadersNoFrameskip-v4 
    run: A3C
    config:
        num_workers: 11
        sample_batch_size: 20
        optimizer:
            grads_per_step: 1000

ericl · 2018-08-19T06:32:49Z

That PR also adds A2C. Since A2C is deterministic, it should be easy to copy hyperparameters from another A2C implementation to compare results (I'm doing some runs right now, but it might take a while).

luochao1024 · 2018-08-20T04:27:30Z

you are using 11 workers for experiment. I would recommend 16 workers.

ericl · 2018-08-20T05:23:41Z

One discovery: we're handling EpisodicLifeEnv resets incorrectly. For example, for BeamRider you get three lives, which we are treating as three episodes, but you're supposed to count as one.

This kind of explains why BeamRider's starting score is about 3x too low.

ericl · 2018-08-21T06:26:35Z

@luochao1024 this PR reproduces standard Atari results for IMPALA and A2C: #2700

I'm still having trouble finding the right hyperparams for A3C (vf_explained_var tends to dive to <0 with A3C whereas it is always close to 1 with A2C / IMPALA), but since it works in A2C it's probably just a matter of tweaking the lr / batch size / grad clipping.

luochao1024 · 2018-08-25T17:37:45Z

Do you have some right hyperparams that work for a3c now?

ericl · 2018-08-25T18:03:36Z

I don't have the bandwidth to tune A3C right now, but if you want to give it a shot perhaps starting from the A2C hyperparams with some lr adjustment could work?

…

On Sat, Aug 25, 2018, 10:37 AM luochao1024 ***@***.***> wrote: Do you have some right hyperparams that work for a3c now? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#2663 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAA6Srxb0zgTda_k0Mv-fcU5NnWXg4Zvks5uUYtugaJpZM4V-5mK> .

luochao1024 · 2018-08-29T16:50:22Z

@ericl Can you give it a try for BreakoutNoFrameskip-v4? I try a grid search for the lr, but I still get some really bad results. Here is the configs I use:

atari-a3c:
    env: BreakoutNoFrameskip-v4
    run: A3C
    config:
        num_workers: 8
        sample_batch_size: 20
        use_pytorch: false
        vf_loss_coeff: 0.5
        entropy_coeff: -0.01
        gamma: 0.99
        grad_clip: 40.0
        lambda: 1.0
        lr:
            grid_search:
                - 0.000005
                - 0.00001
                - 0.00005
                - 0.0001
        observation_filter: NoFilter
        preprocessor_pref: rllib
        num_envs_per_workers: 5
        optimizer:
            grads_per_step: 1000

ericl · 2018-08-29T16:58:00Z

You'll definitely need to use the deepmind preprocessors, since the rllib knees don't have the right episodic life wrappers. Perhaps we should remove those. Also, maybe don't use LSTM and start from the A2C config.

…

On Wed, Aug 29, 2018, 9:50 AM luochao1024 ***@***.***> wrote: @ericl <https://github.com/ericl> Can you give it a try for BreakoutNoFrameskip-v4? I try a grid search for the lr, but I still get some really bad results. Here is the configs I use: atari-a3c: env: BreakoutNoFrameskip-v4 run: A3C config: num_workers: 8 sample_batch_size: 20 use_pytorch: false vf_loss_coeff: 0.5 entropy_coeff: -0.01 gamma: 0.99 grad_clip: 40.0 lambda: 1.0 lr: grid_search: - 0.000005 - 0.00001 - 0.00005 - 0.0001 observation_filter: NoFilter preprocessor_pref: rllib num_envs_per_workers: 5 optimizer: grads_per_step: 1000 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2663 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAA6Sm2sAAl2Kk3Y5wpeyEY7lc7XYONrks5uVsZlgaJpZM4V-5mK> .

luochao1024 · 2018-08-29T17:21:04Z

Now I am running A3C with the following config:

atari-a3c:
    env:
        BreakoutNoFrameskip-v4
    run: A3C
    config:
        num_workers: 5
        sample_batch_size: 20
        preprocessor_pref: deepmind
        lr:
           grid_search:
               - 0.000005
               - 0.00001
               - 0.00005
               - 0.0001
               - 0.0005
               - 0.001
        num_envs_per_worker: 5
        optimizer:
            grads_per_step: 1000

Do you think the configs are reasonable now? I am also running BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, SpaceInvadersNoFrameskip-v4 at then same time. I will report it when I finish the training.

ericl · 2018-08-29T18:40:01Z

There's this one weird thing where num_envs_per_worker will reduce your effective unroll length per env (so 20 / 5 = unroll length of 4). So just watch out for that and you might consider trying 1 env per worker instead, or setting sample_batch_size=50 for a longer unroll. Beyond that the config looks fine. Note that I found a lr schedule is important for some envs (but it's probably too much to try right now).

…

On Wed, Aug 29, 2018 at 10:21 AM luochao1024 ***@***.***> wrote: Now I am running A3C with the following config: atari-a3c: env: BreakoutNoFrameskip-v4 run: A3C config: num_workers: 5 sample_batch_size: 20 preprocessor_pref: deepmind lr: grid_search: - 0.000005 - 0.00001 - 0.00005 - 0.0001 - 0.0005 - 0.001 num_envs_per_worker: 5 optimizer: grads_per_step: 1000 Do you think the configs are reasonable now? I am also running BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, SpaceInvadersNoFrameskip-v4 at then same time. I will report it when I finish the training. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2663 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAA6SoyKSWaPu8oF1F4Qk6AHp-Tq7SNNks5uVs2GgaJpZM4V-5mK> .

luochao1024 · 2018-08-29T19:48:08Z

The result seems normal now with num_workers=5

BreakoutNoFrameskip-v4:

SpaceInvadersNoFrameskip-v4:

QbertNoFrameskip-v4:

I will set the num_envs_per_worker=1 later

ericl · 2018-09-15T07:03:08Z

Closing this in favor of individual tickets. Main TODOs are the DQN family.

ericl mentioned this issue Aug 21, 2018

[rllib] Fix atari reward calculations, add LR annealing, explained var stat for A2C / impala #2700

Merged

ericl closed this as completed Sep 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[rllib] Provide atari results across all algorithms (as applicable) #2663

[rllib] Provide atari results across all algorithms (as applicable) #2663

ericl commented Aug 15, 2018 •

edited

Loading

richardliaw commented Aug 15, 2018

ericl commented Aug 16, 2018

robertnishihara commented Aug 16, 2018

ericl commented Aug 16, 2018 •

edited

Loading

luochao1024 commented Aug 19, 2018

ericl commented Aug 19, 2018 via email •

edited

Loading

luochao1024 commented Aug 19, 2018

ericl commented Aug 19, 2018 •

edited

Loading

ericl commented Aug 19, 2018 •

edited

Loading

luochao1024 commented Aug 20, 2018

ericl commented Aug 20, 2018

ericl commented Aug 21, 2018 •

edited

Loading

luochao1024 commented Aug 25, 2018

ericl commented Aug 25, 2018 via email

luochao1024 commented Aug 29, 2018

ericl commented Aug 29, 2018 via email

luochao1024 commented Aug 29, 2018

ericl commented Aug 29, 2018 via email

luochao1024 commented Aug 29, 2018 •

edited

Loading

ericl commented Sep 15, 2018

[rllib] Provide atari results across all algorithms (as applicable) #2663

[rllib] Provide atari results across all algorithms (as applicable) #2663

Comments

ericl commented Aug 15, 2018 • edited Loading

Describe the problem

richardliaw commented Aug 15, 2018

ericl commented Aug 16, 2018

robertnishihara commented Aug 16, 2018

ericl commented Aug 16, 2018 • edited Loading

luochao1024 commented Aug 19, 2018

ericl commented Aug 19, 2018 via email • edited Loading

luochao1024 commented Aug 19, 2018

ericl commented Aug 19, 2018 • edited Loading

ericl commented Aug 19, 2018 • edited Loading

luochao1024 commented Aug 20, 2018

ericl commented Aug 20, 2018

ericl commented Aug 21, 2018 • edited Loading

luochao1024 commented Aug 25, 2018

ericl commented Aug 25, 2018 via email

luochao1024 commented Aug 29, 2018

ericl commented Aug 29, 2018 via email

luochao1024 commented Aug 29, 2018

ericl commented Aug 29, 2018 via email

luochao1024 commented Aug 29, 2018 • edited Loading

ericl commented Sep 15, 2018

ericl commented Aug 15, 2018 •

edited

Loading

ericl commented Aug 16, 2018 •

edited

Loading

ericl commented Aug 19, 2018 via email •

edited

Loading

ericl commented Aug 19, 2018 •

edited

Loading

ericl commented Aug 19, 2018 •

edited

Loading

ericl commented Aug 21, 2018 •

edited

Loading

luochao1024 commented Aug 29, 2018 •

edited

Loading