Perform fire on reset after doing no-ops, make fire_on_reset configurable #158
Codecov Report
@@ Coverage Diff @@
## master #158 +/- ##
==========================================
- Coverage 22.53% 22.51% -0.03%
==========================================
Files 128 128
Lines 8002 8005 +3
==========================================
- Hits 1803 1802 -1
- Misses 6199 6203 +4
See openai/baselines#240: I confirmed with DeepMind that they don't use fire on reset. |
Thanks a lot Kai, my own experiments with DER suggest not using fire on reset is better as well! While we have you here, could you clarify the following settings as well?
Thanks, super helpful! For 1, I was just looking at the DER paper, so maybe it only helps there. |
Oh - that specifies that it was used in both. I found the DER paper was useful for specifying hyperparameters, so I'll add it in to my repo... |
Good catch! If you want to be super consistent, neither version seems to use dueling either. |
Dueling is one of the main components of Rainbow (both)? |
I don't think so. From the Rainbow paper:
And, the DER paper mentions for both variants:
which I assume implies not using dueling |
Seems like that means there's no significant difference, but they do keep it. As for the update rule, distributional changes the form, and double Q-learning changes the form, but dueling is an architectural change that is agnostic to the update rule. |
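To make the "architectural change" point concrete, here is a minimal sketch of the standard dueling aggregation for the scalar (non-distributional) case. The function name `dueling_q_values` is hypothetical and is not taken from this repository; the sketch only shows that dueling changes how Q-values are produced, while whatever update rule consumes them is unaffected.

```python
def dueling_q_values(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted form Q = V + A - mean(A). This is a scalar
    illustration only; a distributional agent would apply the same idea
    per atom rather than to a single number.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


# The output is an ordinary list of Q-values, so a double-Q or any other
# update rule can use it exactly as it would use a non-dueling head.
q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
greedy_action = max(range(len(q)), key=lambda a: q[a])
```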
I see, that might be the case! |
OK looks good, thanks for digging into the details! I thought fire was needed in some games to start the action after a lost life? I guess you're right though on Breakout: if you press fire and then wait up to 30 steps, the ball might have already gone too low to reach... could be why I'm getting low scores in that game sometimes, hmm... |
@astooke and everyone else here, I am just curious, but are there benchmarks on Breakout with and without this change, to see if reward is better? I also notice (at least anecdotally) that before this pull request, my Breakout scores are OK sometimes but can be lower than expected. I have not tried with this change. |
…able (astooke#158) * Perform fire on reset after doing no-ops, make fire_on_reset configurable * Fix typo * Set fire_on_reset to be False by default
@Kaixhin
Can you please share the source of "2. Evaluation is ε-greedy with ε = 0.001, but no noise in the noisy layers". I thought the Rainbow Paper (https://arxiv.org/pdf/1710.02298.pdf) on page 4 says to set ε = 0. |
@rfali the part you highlighted from the Rainbow paper seems to refer to training settings, rather than evaluation settings. Unfortunately this bit of info was a bit hard to find, and I'm afraid I can't remember where I picked it up (it's been several years since I worked on Atari), so if you do find a primary source that says otherwise you should trust that instead. |
Thanks @Kaixhin, the only other source I have found that does something similar is the RLlib benchmark experiments, which used ε = 0.01: https://github.com/ray-project/rl-experiments#dqn--rainbow |
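For reference, the evaluation setting under discussion (ε-greedy with a small ε) can be sketched as below. This is an illustrative helper, not code from this repository or from Dopamine; the name `epsilon_greedy` and its signature are assumptions, and the default ε = 0.001 is simply the value quoted in this thread.

```python
import random


def epsilon_greedy(q_values, epsilon=0.001, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    `rng` is any object with `random()` and `randrange()`, e.g. the
    `random` module or a seeded `random.Random` instance.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy action: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon=0.0 the choice is always greedy (the training-time reading
# of the Rainbow paper mentioned above).
action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0)
```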
Update: I did find that Dopamine also sets epsilon=0.001 during evaluation. See the Full Rainbow config here. However, the above config also sets |
Fire on reset should probably happen after doing the no-ops, not before. Firing and then doing no-ops for 30 steps is probably harmful in some cases. OpenAI baselines follow this order as well: https://github.com/openai/baselines/blob/8c2aea2addc9f3ba36d4a0c937e6a2d09830afc7/baselines/ppo1/run_atari.py
I have also made fire_on_reset configurable since repositories that reproduce DeepMind numbers don't use it (dopamine and @Kaixhin's rainbow). Setting it to False is probably a better default but I didn't change that yet.
Edit: Switched to False by default at Kai's suggestion, my own experiments with Data efficient rainbow also suggest that using FireOnReset hurts.
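The ordering described above can be sketched roughly as follows. This is not the actual diff from this pull request: `ToyEnv`, `reset_episode`, and `max_noops` are hypothetical names used for illustration, and the toy environment only records which actions were taken. The 30-step no-op limit is the figure mentioned in the discussion.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an Atari emulator; it only logs actions."""

    NOOP, FIRE = 0, 1

    def __init__(self):
        self.history = []

    def reset(self):
        self.history = []
        return 0  # dummy observation

    def step(self, action):
        self.history.append(action)
        return len(self.history)  # dummy observation


def reset_episode(env, max_noops=30, fire_on_reset=False, rng=random):
    """Reset, take a random number of no-ops, then optionally press FIRE.

    FIRE comes last so the ball is not already in flight while the agent
    is still stuck in the no-op phase.
    """
    obs = env.reset()
    for _ in range(rng.randint(0, max_noops)):
        obs = env.step(env.NOOP)
    if fire_on_reset:  # off by default, matching the default chosen here
        obs = env.step(env.FIRE)
    return obs
```

With `fire_on_reset=True`, the last recorded action is always FIRE and everything before it is a no-op; with the default, FIRE is never sent during reset.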