
Cannot reproduce the benchmark results of DQN on Breakout #672

Closed
bywbilly opened this issue Oct 23, 2018 · 9 comments


@bywbilly

I use the command below to train DQN on the Breakout environment:

python3 -m baselines.run --alg=deepq --env=BreakoutNoFrameskip-v4 --num_timesteps=10000000

At the end of training I only get a mean 100-episode reward of 14-15. How can I reproduce the benchmark results?

@DanielTakeshi

DanielTakeshi commented Oct 23, 2018

@bywbilly

To get DQN to work you need to adjust several hyperparameters away from the Baselines defaults.

I got Breakout to work several times with different random seeds, all within the past week from master. Here is one example of a training curve from a code base I'm testing (the top-left plot is probably what you want: the past-100-episode reward):

[Figure: deepq Breakout training curves (2018-10-20); the top-left panel shows the past-100-episode reward and the lower-left panel shows the exploration schedule.]

Off the top of my head (a rough sketch pulling these settings together follows after the list):

  • Use the exploration schedule above or something closer to what OpenAI was using before they changed DQN around Oct 2017
  • Replay buffer size: 1e6
  • Learning starts: 80k
  • Update target net: 40k
  • Adam lr 1e-4, adam epsilon 1e-4

edit: this is PDD-DQN, just to be clear. I ran for 2.5e7 steps.
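
Roughly, those settings map onto deepq.learn like this; treat it as a sketch only, since the argument names assume the learn() signature on master around that time, and neither the Adam epsilon nor a piecewise exploration schedule is exposed as a kwarg (those two still need edits inside baselines/deepq/deepq.py):

    # Sketch only: double-check these names against deepq.learn at your commit.
    from baselines import deepq
    from baselines.common.atari_wrappers import make_atari, wrap_deepmind

    env = make_atari('BreakoutNoFrameskip-v4')
    env = wrap_deepmind(env, frame_stack=True, scale=True)

    deepq.learn(
        env,
        network='conv_only',
        lr=1e-4,                          # Adam learning rate
        total_timesteps=int(2.5e7),
        buffer_size=int(1e6),             # replay buffer size
        learning_starts=80000,
        target_network_update_freq=40000,
        train_freq=4,
        gamma=0.99,
        prioritized_replay=True,          # prioritized replay, the "P" in PDD
        dueling=True,                     # dueling heads, forwarded to the Q-network builder
    )
    env.close()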

@bywbilly

@DanielTakeshi
Thank you very much! I am now running experiments to check the performance of vanilla DQN, and I am going to close this issue.

@bywbilly

@DanielTakeshi

I copied your hyperparameters except for the exploration schedule and the Adam epsilon, and I still cannot get the benchmark results; I only get around 17.0 after 1e7 training steps. I don't think the Adam epsilon matters here, so maybe the problem is the default exploration schedule?

Do you have any ideas about that, or could you point out which exploration schedule I should use?

Thanks!

@DanielTakeshi

@bywbilly Did you use the exploration schedule I mentioned earlier? I probably should have made it clearer: the exploration schedule is shown in the lower-left plot of my figure above. Here it is in my actual code:

    from baselines.common.schedules import PiecewiseSchedule

    exploration = PiecewiseSchedule([
            (0,        1.0),
            (int(1e6), 0.1),
            (int(1e7), 0.01)
    ], outside_value=0.01)
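
If you swap this in for the default LinearSchedule, note that the schedule object is simply queried by step count inside the training loop. A self-contained sketch of the usual epsilon-greedy usage (greedy_action is a stand-in for the Q-network argmax, not a Baselines function):

    import random

    def choose_action(env, obs, t, exploration, greedy_action):
        # Epsilon at step t follows the schedule above: 1.0 -> 0.1 over the
        # first 1e6 steps, then down to 0.01 by 1e7 (and 0.01 afterwards).
        eps = exploration.value(t)
        if random.random() < eps:
            return env.action_space.sample()
        return greedy_action(obs)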

@bywbilly

@DanielTakeshi Oh, I didn't notice that! Thanks for pointing it out; I'm going to try it.

@asiddharth

asiddharth commented Aug 9, 2019

Hi @DanielTakeshi ,
I am facing the same issue where the vanilla DQN and the PDD DQN agents are not learning as expected on BreakoutNoFrameskip-v4.

I copied over the hyperparameters and the exploration schedule mentioned above. I am running the experiments with this baselines commit.

Here is the list of hyperparameters I am using for PDD-DQN (I modified defaults.py):
    network='conv_only',
    lr=1e-4,
    buffer_size=int(1e6),
    exploration_fraction=0.1,
    exploration_final_eps=0.01,
    train_freq=4,
    learning_starts=80000,
    target_network_update_freq=40000,
    gamma=0.99,
    prioritized_replay=True,
    prioritized_replay_alpha=0.6,
    checkpoint_freq=10000,
    checkpoint_path=None,
    dueling=True

For the vanilla DQN agent I used the same hyperparameters, but set dueling=False and prioritized_replay=False in defaults.py, and set double_q=False in build_graph.py.
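
Spelled out as a sketch, the vanilla-DQN run uses the same block above with only these overrides (double_q is not exposed through defaults.py, hence the separate edit in build_graph.py):

    # Vanilla DQN overrides relative to the PDD-DQN block above.
    # double_q=False is changed separately at its definition in build_graph.py.
    vanilla_dqn_overrides = dict(
        prioritized_replay=False,
        dueling=False,
    )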

As mentioned in the README, I also tried to reproduce the results with commit 7bfbcf1, without changing the hyperparameters, but I was not able to reproduce them.

It would be really helpful if you could let me know whether I am doing anything wrong, and whether any other hyperparameter combination works better.

Thanks!

Some results with the changed hyperparameters and code commit:
Results for PDD-DQN :

| % time spent exploring | 2 |
| episodes | 5.1e+04 |
| mean 100 episode reward | 22.1 |
| steps | 8.34e+06 |

Saving model due to mean reward increase: 21.5 -> 22.4

Results for vanilla DQN :

| % time spent exploring | 1 |
| episodes | 5.17e+04 |
| mean 100 episode reward | 23.2 |
| steps | 1.05e+07 |

(Highest score for vanilla DQN is 28 at this point)

@sytelus

sytelus commented Nov 2, 2019

Not sure why this is closed. I cannot reproduce DQN on Breakout either, and there is no resolution proposed in this thread. I also tried the new parameters proposed by @DanielTakeshi, including the new exploration schedule. None of these runs converge.

Related issues:
#176
#983

@DanielTakeshi

@sytelus I don't remember who closed it, but maybe @bywbilly got it working?
With these issues it can also be hard to say whether differences come from the version of gym or the version of atari-py. Tons of things can affect results. :(

@sytelus

sytelus commented Nov 2, 2019

@DanielTakeshi - yes, but the whole point of a baseline is that it should be reproducible by anyone :). We just need a simple script that runs all the baselines periodically, writes the results to a Markdown file, and pushes it to the repo.
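
For what it's worth, a rough sketch of the kind of script I mean (everything here is illustrative: the env list, timestep counts, and paths are made up, and it assumes the csv logger is enabled so baselines writes a progress.csv with a 'mean 100 episode reward' column):

    import csv
    import os
    import subprocess

    RUNS = [
        ('deepq', 'BreakoutNoFrameskip-v4', '1e7'),
        ('deepq', 'PongNoFrameskip-v4', '1e7'),
    ]

    rows = []
    for alg, env_id, steps in RUNS:
        logdir = os.path.join('/tmp/benchmarks', f'{alg}_{env_id}')
        subprocess.run(
            ['python', '-m', 'baselines.run', f'--alg={alg}',
             f'--env={env_id}', f'--num_timesteps={steps}'],
            env={**os.environ,
                 'OPENAI_LOGDIR': logdir,
                 'OPENAI_LOG_FORMAT': 'stdout,csv'},
            check=True)
        # Read the last logged row and pull out the headline metric.
        with open(os.path.join(logdir, 'progress.csv')) as f:
            final = list(csv.DictReader(f))[-1]
        rows.append((alg, env_id, final.get('mean 100 episode reward', 'n/a')))

    # Dump a Markdown table that could be committed back to the repo.
    with open('benchmark_results.md', 'w') as f:
        f.write('| alg | env | mean 100 episode reward |\n')
        f.write('| --- | --- | --- |\n')
        for alg, env_id, score in rows:
            f.write(f'| {alg} | {env_id} | {score} |\n')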
