Thanks for your implementation, Daniel!
I have a question about online data collection. In your README, you say: "This implementation seems to converge after ~60 exploration rounds instead of ~250 as shown in the paper." Where did you find that the paper's version converges after ~250 rounds?
From the caption of Table 5.1, I can only tell that they collected 200k online samples. How much data did they collect in each round?
They do not mention this directly, but I roughly estimated from the figures how many rounds Hopper should take. From Figure 4.1, the peak reward is first reached at ~250 rounds. In Figure 5.1, which shows a different run, that point is reached at roughly 180k-200k online samples, which suggests a rough equivalence between samples and rounds.
During each round, a single trajectory is added to the replay buffer. A trajectory has a maximum length of 1000 timesteps, but it can end earlier if the robot falls, so at most 1000 timesteps are added per round. If my implementation converges in, say, 70 rounds, that translates to at most 70,000 timesteps, and probably closer to ~55-60k in practice.
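For concreteness, here is a minimal sketch of that accounting in Python. `DummyHopperEnv`, `collect_trajectory`, and the placeholder policy are hypothetical stand-ins (not names from this repo or the paper), assuming a gym-style step interface that returns `(obs, reward, done, info)`:

```python
import random

MAX_EPISODE_STEPS = 1000  # Hopper's per-episode time limit

class DummyHopperEnv:
    """Hypothetical stand-in for the real Hopper environment;
    episodes terminate randomly to mimic the robot falling early."""
    def reset(self):
        return 0.0  # dummy observation

    def step(self, action):
        done = random.random() < 0.001  # occasional early fall
        return 0.0, 1.0, done, {}  # obs, reward, done, info

def collect_trajectory(env, policy):
    """Roll out one episode; returns a list of transitions."""
    obs, trajectory = env.reset(), []
    for _ in range(MAX_EPISODE_STEPS):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:  # early termination => fewer than 1000 steps
            break
    return trajectory

env = DummyHopperEnv()
policy = lambda obs: 0.0  # placeholder policy

replay_buffer, total_timesteps = [], 0
for round_idx in range(70):  # ~70 exploration rounds
    traj = collect_trajectory(env, policy)
    replay_buffer.extend(traj)    # one trajectory appended per round
    total_timesteps += len(traj)  # <= 1000 timesteps per round

# 70 rounds => at most 70,000 timesteps; fewer whenever an
# episode ends early, which is why the real count lands below the cap.
print(f"rounds: 70, timesteps collected: {total_timesteps}")
```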