Question about online exploration rounds #1

Open
hukz18 opened this issue Jul 21, 2022 · 1 comment

hukz18 commented Jul 21, 2022

Thanks for your implementation, Daniel!
I have a question about online data collection. Your README says, "This implementation seems to converge after ~60 exploration rounds instead of ~250 as shown in the paper." Where did you get the information that the paper's version converges after ~250 rounds?
The only number I could find is the 200k online samples mentioned in the caption of Table 5.1. How much data did they collect in each round?

daniellawson9999 (Owner) commented Jul 25, 2022

Hi,

They do not state this directly; I roughly estimated the number of rounds for hopper from the figures. In Figure 4.1, the peak reward is first reached after ~250 rounds. In Figure 5.1, which shows a different run, that point seems to be reached after about 180k-200k online samples, which suggests a rough correspondence between samples and rounds.

During each round, a single trajectory is added to the replay buffer. Each trajectory has a maximum length of 1000 timesteps, but it can end earlier if the robot falls, so <= 1000 timesteps are added each round. If my implementation converges in, say, 70 rounds, this translates to <= 70,000 timesteps, and probably closer to ~55-60k.
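
To make the arithmetic above explicit, here is a small Python sketch. It uses only the numbers quoted in this thread (1000-step cap, ~250 rounds, 180k-200k samples, 70 rounds, ~55-60k timesteps); the function names are hypothetical and not part of the repository. Dividing the Figure 5.1 sample count by the Figure 4.1 round count is an assumption, since the two figures show different runs and both values are read off plots by eye.

```python
MAX_EPISODE_LEN = 1000  # per-trajectory cap mentioned in the comment above


def timestep_upper_bound(rounds, max_len=MAX_EPISODE_LEN):
    """Upper bound on timesteps collected when one trajectory is added per round."""
    return rounds * max_len


def implied_steps_per_round(total_samples, rounds):
    """Average trajectory length implied by a sample count and a round count."""
    return total_samples / rounds


# Upper bound for 70 rounds of this implementation.
print(timestep_upper_bound(70))

# Paper estimate (assumption: ~250 rounds corresponds to 180k-200k samples).
print(implied_steps_per_round(180_000, 250), implied_steps_per_round(200_000, 250))

# This implementation: ~55-60k timesteps over 70 rounds.
print(round(implied_steps_per_round(55_000, 70)), round(implied_steps_per_round(60_000, 70)))
```

Under these rough readings, both estimates give an average trajectory length somewhat below the 1000-step cap, which is consistent with some episodes ending early.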
