Is the range of output of PPO controlled by the action_space setting? #710
Comments
This is a question about (deep) RL fundamentals and implementation, not really about garage. You may find my explanation unsatisfying, because our time is quite limited and this is not really the proper forum for these types of questions.
Please take a look at the code if you want an in-depth understanding. You might also find the slides for this course helpful; Lecture 5 gives a nice overview of policy gradients. I am going to close this question as off-topic. The scope of this project is creating and sustaining great implementations of deep RL algorithms, not teaching people deep RL fundamentals. You are welcome to post questions about how to achieve things with the software, how to enhance it, or how to solve bugs, but please refrain from using this issue tracker as a resource for learning about deep RL itself. Unfortunately, we can't be all things to all people, and our resources are limited.
@ryanjulian Though I think the issue should not have been closed so early, what I want to know is whether garage has considered the possible inconsistency between the env's action range and the model's Gaussian output. As far as I know, OpenAI Baselines did not handle this well for most policy gradient algorithms using a Gaussian distribution until last year; you can see the discussion on this issue. When the model's output exceeds the hard limits of the env's actions, the output is meaningless in the env. If we simply clip it into range, that does not help the model learn how to generate a reasonable action within a reasonable range. That is why I asked this question. Thanks!
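To make the clipping-versus-squashing distinction above concrete, here is a minimal sketch (not garage's actual implementation; the bounds and the network outputs are made-up values) contrasting the two common ways of mapping an unbounded Gaussian sample into a bounded action space:

```python
import numpy as np

low, high = 5.0, 10.0        # hypothetical Box(low=5, high=10) action bounds
mean, std = 0.3, 1.0         # stand-ins for a policy network's outputs

raw_action = np.random.normal(mean, std)  # unbounded Gaussian sample

# Strategy 1: hard clipping. Valid actions are guaranteed, but the policy
# gets no gradient signal pushing its mean back inside the valid range.
clipped = np.clip(raw_action, low, high)

# Strategy 2: tanh squashing. The sample is mapped smoothly into
# (low, high), so every action is valid and the policy can learn where
# in the range to place its probability mass.
squashed = low + 0.5 * (np.tanh(raw_action) + 1.0) * (high - low)
```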
Hi @kashif @jbn @bichengcao @zhongwen @ViktorM,
I have a question about the action_space setting in the Env class.
If I set the low and high limits of action_space (say, [5, 10]), is the range of the actions output by the PPO algorithm controlled by my action_space setting? Since PPO's output includes the mean of a Gaussian distribution, is the mean always zero, or does it change with the action_space range?
Thanks!
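For concreteness, here is a hypothetical sketch of the setup being asked about (gym's Box space is used for illustration; this is not garage's code, and the zero mean is just a typical initialization, not something garage guarantees):

```python
import numpy as np
import gym

# Declare an action space with bounds [5, 10], as in the question.
# The space only describes the valid range; it does not by itself
# constrain what a Gaussian policy's network outputs.
action_space = gym.spaces.Box(low=5.0, high=10.0, shape=(1,), dtype=np.float32)

# A Gaussian policy's mean is produced by the network and is commonly
# initialized near zero, independent of the declared bounds.
policy_mean = np.zeros(1, dtype=np.float32)
action = np.random.normal(policy_mean, 1.0).astype(np.float32)

# A raw sample near 0 almost always falls outside [5, 10], which is
# exactly the mismatch the follow-up comment describes.
print(action_space.contains(action))  # usually False
```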