[rllib] Provide atari results across all algorithms (as applicable) #2663
Also relevant reference: https://github.com/hill-a/stable-baselines
Just ran a "30% full speed" IMPALA across a couple of environments. The results are pretty reasonable at 40M frames, with Qbert / Space Invaders about in line with the results from the A3C paper, and Breakout / BeamRider a bit below. Note that the max episode reward for Breakout and BeamRider is pretty good, but the mean is not quite there yet. I'm guessing we can improve on this with some tuning.
In what format does it make sense to publish the results? E.g., a collection of full learning curves (e.g., as CSV)? Or actual visualizations like you have above? Or something else?
If we have a public Ray perf dashboard, that would be a good place to put these. Otherwise, I think posting some summary visualizations on GitHub or in the docs would do (for example, just having the tuned example YAMLs with pointers to this issue). The full learning curve data probably isn't that interesting, but we could also upload it to S3 pretty easily.
Do you have any results for A3C or A3C-LSTM?
I did an initial run with A3C; however, the results were much worse than the IMPALA ones. I didn't try tuning the learning rate, though, as is done in the A3C paper.
A3C is very sensitive to the learning rate, since the effect of gradient staleness grows with the learning rate.
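To illustrate why stale gradients make the learning rate more sensitive, here is a toy sketch (not A3C itself): gradient descent on f(x) = x²/2 where every update uses the gradient from `delay` steps earlier, a crude stand-in for asynchronous workers applying old gradients. The quadratic objective and the `delayed_gradient_descent` name are illustrative assumptions. Without delay, any lr below 2 converges; with a 4-step delay the stable range shrinks to roughly lr < 0.35, so an lr that is safe without staleness can diverge.

```python
def delayed_gradient_descent(lr, delay, steps=200, x0=1.0):
    """Minimize f(x) = x**2 / 2 using gradients that are `delay` steps stale.

    Toy model only: assumes a fixed, known delay, unlike real async workers.
    Returns the full trajectory so callers can inspect convergence.
    """
    # Pre-fill the history so the first updates have a stale value to read.
    trajectory = [x0] * (delay + 1)
    for _ in range(steps):
        stale_x = trajectory[-(delay + 1)]  # parameter value `delay` steps ago
        grad = stale_x                      # f'(x) = x, evaluated at the stale point
        trajectory.append(trajectory[-1] - lr * grad)
    return trajectory


if __name__ == "__main__":
    for lr, delay in [(0.5, 0), (0.5, 4), (0.1, 4)]:
        tail = delayed_gradient_descent(lr, delay)[-20:]
        peak = max(abs(x) for x in tail)
        if peak < 1e-3:
            status = "converges"
        elif peak > 1.0:
            status = "diverges"
        else:
            status = "unclear"
        print(f"lr={lr}, delay={delay}: max |x| over last 20 steps = {peak:.3g} ({status})")
```

The same lr of 0.5 converges with fresh gradients and diverges with a 4-step delay, while a smaller lr stays stable under the delay.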
For reference, here is the run and params (with the default lr=0.0001 and grad_clip=40.0). Note that the gradient magnitude scales with lr * batch size, where the batch size is 20. These changes are also on this branch: #2679
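The note about gradient magnitude can be made concrete with a toy example. Assuming the loss is summed (rather than averaged) over the sample batch, which is what the scaling comment implies, every per-sample gradient contributes in full, so a step with lr = 0.0001 on a batch of 20 moves the weights as far as a mean-reduced step with lr = 0.002. The one-parameter model and the `sgd_step` helper are hypothetical, not RLlib code.

```python
def sgd_step(w, batch, lr, reduction="sum"):
    """One SGD step for the scalar model y_hat = w * x with squared-error loss.

    reduction="sum" adds per-sample gradients; reduction="mean" averages them.
    """
    # d/dw of (w * x - y)**2 for each sample
    grads = [2.0 * (w * x - y) * x for x, y in batch]
    g = sum(grads)
    if reduction == "mean":
        g /= len(grads)
    return w - lr * g


if __name__ == "__main__":
    batch = [(1.0, 0.0)] * 20  # 20 identical samples, matching batch size = 20
    lr = 0.0001
    step_sum = 1.0 - sgd_step(1.0, batch, lr, "sum")
    step_mean = 1.0 - sgd_step(1.0, batch, lr, "mean")
    print(f"sum-reduced step:  {step_sum:.6f}")
    print(f"mean-reduced step: {step_mean:.6f}")
    print(f"ratio: {step_sum / step_mean:.1f}")  # approximately the batch size
```

Under sum reduction the effective learning rate is lr * batch_size, which is why changing the batch size can silently change how aggressive the updates are.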
That PR also adds A2C. Since A2C is deterministic, it should be easy to copy hyperparameters from another A2C implementation to compare results (I'm doing some runs right now, but it might take a while).
You are using 11 workers for the experiment. I would recommend 16 workers.
One discovery: we're handling EpisodicLifeEnv resets incorrectly. For example, in BeamRider you get three lives, which we are treating as three episodes, but they are supposed to count as one. This probably explains why BeamRider's starting score is about 3x too low.
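The life-versus-episode distinction can be sketched as follows. The `(reward, life_lost, game_over)` step tuples and the `episode_returns` helper are hypothetical, not the baselines `EpisodicLifeEnv` code: a life-loss wrapper signals `done` on every lost life to help training, but reported scores should only close an episode when the game is actually over. With three lives scoring 100 points each (illustrative numbers), per-life accounting reports 100 where the true episode score is 300, the same 3x gap.

```python
def episode_returns(steps, split_on_life_loss):
    """Sum rewards into episode returns.

    steps: iterable of (reward, life_lost, game_over) tuples.
    split_on_life_loss=True mimics (incorrectly) reporting each life as an episode.
    """
    returns, current = [], 0.0
    for reward, life_lost, game_over in steps:
        current += reward
        if game_over or (split_on_life_loss and life_lost):
            returns.append(current)
            current = 0.0
    return returns


if __name__ == "__main__":
    # Illustrative data: three lives, each scoring 100, game over on the last.
    steps = [(100.0, True, False), (100.0, True, False), (100.0, True, True)]
    per_life = episode_returns(steps, split_on_life_loss=True)
    per_game = episode_returns(steps, split_on_life_loss=False)
    print("per-life mean:", sum(per_life) / len(per_life))
    print("per-game mean:", sum(per_game) / len(per_game))
```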
@luochao1024 this PR reproduces standard Atari results for IMPALA and A2C: #2700. I'm still having trouble finding the right hyperparams for A3C (
Do you have hyperparams that work for A3C now?
I don't have the bandwidth to tune A3C right now, but if you want to give it a shot, starting from the A2C hyperparams with some lr adjustment might work.
@ericl Can you give it a try for BreakoutNoFrameskip-v4? I tried a grid search over the lr, but I still get really bad results. Here are the configs I use:

```yaml
atari-a3c:
    env: BreakoutNoFrameskip-v4
    run: A3C
    config:
        num_workers: 8
        sample_batch_size: 20
        use_pytorch: false
        vf_loss_coeff: 0.5
        entropy_coeff: -0.01
        gamma: 0.99
        grad_clip: 40.0
        lambda: 1.0
        lr:
            grid_search:
                - 0.000005
                - 0.00001
                - 0.00005
                - 0.0001
        observation_filter: NoFilter
        preprocessor_pref: rllib
        num_envs_per_workers: 5
        optimizer:
            grads_per_step: 1000
```

You'll definitely need to use the deepmind preprocessors, since the rllib ones don't have the right episodic life wrappers. Perhaps we should remove those. Also, maybe don't use LSTM and start from the A2C config.
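For reference, Tune turns each `grid_search` list into separate trials, so the first config above launches four independent A3C runs, one per learning rate. The hypothetical `expand_grid` helper below mimics that expansion for top-level keys only; it is a simplified sketch, not Tune's implementation, and it assumes the nested `config.lr` key has been flattened to the top level.

```python
import itertools


def expand_grid(config):
    """Expand top-level {"grid_search": [...]} values into one config per combination."""
    grid_keys = [k for k, v in config.items()
                 if isinstance(v, dict) and "grid_search" in v]
    grids = [config[k]["grid_search"] for k in grid_keys]
    trials = []
    for combo in itertools.product(*grids):  # Cartesian product across grid keys
        trial = dict(config)
        trial.update(zip(grid_keys, combo))
        trials.append(trial)
    return trials


if __name__ == "__main__":
    # Flattened, partial version of the A3C config (hypothetical layout).
    config = {
        "num_workers": 8,
        "sample_batch_size": 20,
        "lr": {"grid_search": [0.000005, 0.00001, 0.00005, 0.0001]},
    }
    for trial in expand_grid(config):
        print(trial)
```

Each trial gets its own full set of workers, so the resources a grid search needs grow with the number of grid values.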
Now I am running A3C with the following config:

```yaml
atari-a3c:
    env: BreakoutNoFrameskip-v4
    run: A3C
    config:
        num_workers: 5
        sample_batch_size: 20
        preprocessor_pref: deepmind
        lr:
            grid_search:
                - 0.000005
                - 0.00001
                - 0.00005
                - 0.0001
                - 0.0005
                - 0.001
        num_envs_per_worker: 5
        optimizer:
            grads_per_step: 1000
```

Do you think the config is reasonable now? I am also running BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, and SpaceInvadersNoFrameskip-v4 at the same time, and will report back when training finishes.

There's one quirk: num_envs_per_worker reduces your effective unroll length per env (so 20 / 5 = unroll length of 4). Watch out for that; you might consider trying 1 env per worker instead, or setting sample_batch_size=50 for a longer unroll.
Beyond that, the config looks fine. Note that I found an lr schedule is important for some envs (but it's probably too much to try right now).
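The unroll-length arithmetic can be written out as a small helper. It assumes, as described in the reply, that each sample batch of `sample_batch_size` steps is split evenly across a worker's vectorized envs; `per_env_unroll_length` is a hypothetical name, not an RLlib function.

```python
def per_env_unroll_length(sample_batch_size, num_envs_per_worker):
    """Consecutive steps each env contributes to one sample batch,
    assuming the batch is split evenly across the worker's envs."""
    if num_envs_per_worker < 1:
        raise ValueError("num_envs_per_worker must be at least 1")
    return sample_batch_size // num_envs_per_worker


if __name__ == "__main__":
    # The config in this thread, plus the two suggested alternatives.
    for batch, envs in [(20, 5), (20, 1), (50, 5)]:
        print(f"sample_batch_size={batch}, num_envs_per_worker={envs} -> "
              f"unroll length {per_env_unroll_length(batch, envs)}")
```

Both suggested fixes lengthen the per-env unroll (to 20 and 10 steps respectively), which gives the value bootstrapping more real rewards to work with.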
Closing this in favor of individual tickets. The main TODOs are the DQN family.
Describe the problem
We should publish results for at least a few of the standard Atari games on all applicable algorithms, and fix any discrepancies, e.g. #2654
Results uploaded to this repo: https://github.com/ray-project/rl-experiments
Envs to run: PongNoFrameskip-v4, BreakoutNoFrameskip-v4, BeamRiderNoFrameskip-v4, QbertNoFrameskip-v4, SpaceInvadersNoFrameskip-v4
(Chosen such that all but Pong can run concurrently on a g3.16xl machine.)
Some references:
https://github.com/btaba/yarlp
openai/baselines#176