Reference for DeepMind using FireResetEnv wrapper's functionality #240
I'm also curious whether the DeepMind papers use the same trick. I was unable to find it in their code (dqn, alewrap and xitari).
@muupan I've continued looking into this since writing the issue, but I have not found a definitive confirmation anywhere. I emailed John Schulman, who made the initial commit that added the comment claiming this setup is DeepMind-style (although he did not originally implement that wrapper), and he has no idea where the claim came from. Additionally, the intended functionality (automatically playing the actions required to start the game on reset) appears to be already implemented in the Arcade Learning Environment itself, in this line of code: https://github.com/mgbellemare/Arcade-Learning-Environment/blob/master/src/environment/stella_environment.cpp#L88 So it looks to me like anyone who uses the Arcade Learning Environment (which is basically everyone: DeepMind, but also everyone who runs Atari games through OpenAI Gym, etc.) already gets this functionality out of the box, even without FireResetEnv. I suspect FireResetEnv may be completely useless. I have never tested how it affects performance, which would still be worth doing just to make sure. I did very briefly test AirRaid and Asterix (which, as far as I've been able to find, are two games that supposedly require pressing FIRE to start), and they appeared to play just fine both with and without FireResetEnv.
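To make "automatically playing the actions required to start the game on reset" concrete, here is a minimal sketch of the idea. It is not the baselines FireResetEnv code: `ToyServeEnv`, `FireOnReset`, and the action index `FIRE = 1` are hypothetical, and the toy environment simply idles until FIRE is pressed. If ALE already does this internally, as suspected above, a wrapper like this would be redundant for real games.

```python
FIRE = 1  # hypothetical action index for FIRE in this toy setup


class ToyServeEnv:
    """Hypothetical stand-in for a game that does nothing until FIRE is pressed."""

    def __init__(self):
        self.started = False
        self.t = 0

    def reset(self):
        self.started = False
        self.t = 0
        return self._obs()

    def step(self, action):
        if action == FIRE:
            self.started = True
        if self.started:  # time only advances once the game has been started
            self.t += 1
        done = self.t >= 10  # toy episode length once the game is running
        return self._obs(), 0.0, done, {}

    def _obs(self):
        return (self.started, self.t)


class FireOnReset:
    """Sketch of FireResetEnv-style behaviour: press FIRE as part of reset()."""

    def __init__(self, env, fire_action=FIRE):
        self.env = env
        self.fire_action = fire_action

    def reset(self):
        self.env.reset()
        obs, _, done, _ = self.env.step(self.fire_action)
        if done:  # the game ended immediately; hand back a fresh state instead
            obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)
```

Without the wrapper, an agent that never picks FIRE stays at the starting state forever; with it, the first observation the agent sees is already "in play".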
@DennisSoemers Thank you for the information. At least,
Ah, I see, thanks for letting me know about that one. I guess the only way to tell for sure is to contact DeepMind and ask them if they're using anything like this. Right now I'm inclined to bet on "no" though.
I sent an email to the author of the DQN paper. I hope he will answer it.
@muupan Just wondering if you ever got a reply?
Unfortunately not.
After doing some experiments, I suspect that they don't use this wrapper, and like @muupan I was unable to find it in their code. It seems that, for the DQN-based agents at least, DeepMind evaluates using an ɛ-greedy policy with ɛ = 0.05. So during evaluation in the early stages of training, the ball does end up getting released quickly in Breakout (whereas when I used ɛ = 0.001, evaluation basically got stuck). I'm training a Rainbow agent now and will try to remember to report back with results once it is done. By the way, if you're interested in reproducibility, DeepMind's code shows that they use bilinear interpolation for downsampling, whereas the wrapper here uses pixel area relation (area interpolation).
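A back-of-the-envelope sketch of why the evaluation ɛ matters for Breakout, under two assumptions: the greedy policy never chooses FIRE (so the ball is only released when exploration happens to pick it), and the game's minimal action set has 4 actions. Exploration then picks FIRE with probability ɛ/|A| per step, so the wait is geometric with mean |A|/ɛ: roughly 80 steps for ɛ = 0.05 but roughly 4,000 steps for ɛ = 0.001. The function names below are illustrative, not taken from any DeepMind or baselines code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def expected_wait(epsilon, num_actions):
    """Mean steps until a non-greedy target action is chosen by exploration alone.

    Each step selects the target with probability epsilon / num_actions, so the
    waiting time is geometric with mean num_actions / epsilon.
    """
    return num_actions / epsilon


def simulated_wait(epsilon, num_actions, target, greedy, rng, max_steps=1_000_000):
    """Empirically count steps until `target` is picked, given a fixed greedy action."""
    q = [0.0] * num_actions
    q[greedy] = 1.0  # the greedy policy always prefers `greedy`, never `target`
    for t in range(1, max_steps + 1):
        if epsilon_greedy(q, epsilon, rng) == target:
            return t
    return max_steps
```

For example, `expected_wait(0.05, 4)` and `expected_wait(0.001, 4)` give the 80 and 4,000 step figures above, and averaging `simulated_wait` over many seeded runs should land close to the first of these.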
Got confirmation from Charles Beattie at DeepMind that they do not use anything like the
Which specific papers do you mean by the DQN-based agents? As far as I know, at least the PER paper seems to use 0.01 for up-to-30-noop evaluation (from Table 5 of http://arxiv.org/abs/1511.05952), while the QR-DQN paper seems to use 0.001 for up-to-30-noop evaluation (from the "Best agent performance" subsection of https://arxiv.org/abs/1710.10044). Correct me if I'm wrong.
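For readers unfamiliar with the protocol, "up-to-30-noop evaluation" means each evaluation episode starts with a random number of NOOP actions before the agent takes control. A minimal sketch follows; `noop_reset`, `CounterEnv`, the action index `NOOP = 0`, and sampling the count uniformly from 1 to 30 inclusive are assumptions for illustration, and individual papers may differ in these details.

```python
import random

NOOP = 0  # hypothetical action index for NOOP


class CounterEnv:
    """Hypothetical toy environment whose observation is the step count."""

    def __init__(self, episode_len=1000):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 0.0, done, {}


def noop_reset(env, rng, max_noops=30):
    """Reset, then play a random number of NOOPs (assumed 1..max_noops inclusive)."""
    obs = env.reset()
    num_noops = rng.randint(1, max_noops)  # randint includes both endpoints
    for _ in range(num_noops):
        obs, _, done, _ = env.step(NOOP)
        if done:  # very short episode: start over rather than return a finished game
            obs = env.reset()
    return obs, num_noops
```

The point of the random start is to vary the initial state between evaluation episodes; the evaluation ɛ discussed in this thread then adds further randomness on top of it.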
Sorry, yes, they may differ from paper to paper. I got ɛ = 0.05 from the (Nature) DQN paper, but recently they seem to have been using ɛ = 0.001. Unfortunately they are still changing settings: the new Pop-Art + IMPALA paper drops the termination-on-loss-of-life wrapper. I hope they'll settle on the setup in the Revisiting ALE paper, but as long as DM is focused on improving upon their own results, that seems unlikely.
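For reference, "termination on loss of life" typically means the learner sees an episode end whenever a life is lost, while the underlying game is only reset on a real game over. Below is a minimal sketch with hypothetical names (`ToyLivesEnv`, `TerminalOnLifeLoss`) and a toy environment that loses one life every 5 steps; it is not the baselines wrapper.

```python
class ToyLivesEnv:
    """Hypothetical game with a few lives; one life is lost every few steps."""

    def __init__(self, lives=3, steps_per_life=5):
        self.start_lives = lives
        self.steps_per_life = steps_per_life
        self.lives = lives
        self.t = 0

    def reset(self):
        self.lives = self.start_lives
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        if self.t % self.steps_per_life == 0:
            self.lives -= 1
        done = self.lives == 0  # real game over
        return self.t, 0.0, done, {"lives": self.lives}


class TerminalOnLifeLoss:
    """Signal done on every life loss, but only reset the game on a real game over."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.game_over = True
        self.last_obs = None

    def reset(self):
        if self.game_over:
            self.last_obs = self.env.reset()
            self.lives = self.env.lives
            self.game_over = False
        # Otherwise only a life was lost: keep playing from the current state.
        return self.last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lost_life = info["lives"] < self.lives
        self.lives = info["lives"]
        self.last_obs = obs
        self.game_over = done
        return obs, reward, done or lost_life, info
```

Dropping this wrapper, as the Pop-Art + IMPALA setup reportedly does, means episodes only end on a real game over.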
Thank you very much for clarifying it. I hope so, too.
Hi @Kaixhin, Best,
@steffenvan I assume that if you don't use the
In atari_wrappers.py (specifically, the wrap_deepmind function all the way at the bottom), the docstring says that the function configures the environment for "DeepMind-style Atari".
I was wondering whether anyone can point me to a DeepMind paper stating that they do in fact use the functionality implemented by the FireResetEnv wrapper. I was able to find the functionality implemented by all the other wrappers applied in that function (and in the make_atari function above it) described in various DeepMind papers (such as the Mnih et al. (2015) DQN Nature paper), but I could not find any text resembling the functionality of FireResetEnv.
It may just be a minor detail, but I do think it's important to be precise with this kind of stuff for the sake of reproducibility.