Confusion with continuous action space (DiagGaussian) #109
Comments
I am not very familiar with non-discrete action spaces, but in general your policy model is not aware of the range specified in the action space. In the case of a continuous action space, our policy model implements a Gaussian distribution to sample values as actions. As far as I know, the mapping to your specific range is then usually done in the …
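That mapping-in-the-environment idea could look something like the sketch below, assuming a gym-style `step` API. `RescaleActionWrapper` and its linear mapping are illustrative names, not part of this repo:

```python
import numpy as np

class RescaleActionWrapper:
    """Illustrative wrapper: maps raw policy outputs in [-1, 1] onto the
    environment's Box bounds, since the Gaussian policy itself is
    unaware of those bounds."""

    def __init__(self, env, low, high):
        self.env = env
        self.low = np.asarray(low, dtype=np.float32)
        self.high = np.asarray(high, dtype=np.float32)

    def step(self, action):
        # Clip the unbounded Gaussian sample to [-1, 1], then rescale
        # linearly so -1 maps to `low` and +1 maps to `high`.
        action = np.clip(action, -1.0, 1.0)
        scaled = self.low + 0.5 * (action + 1.0) * (self.high - self.low)
        return self.env.step(scaled)
```

Note the downside discussed below: clipping a zero-centred Gaussian piles probability mass onto the bounds.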
So I have two different environments, one with the action space as above, and another with …
So I'm curious as to how best to approach the clipping/augmentation. Seeing as it's Gaussian and 0-centred (I assume?), clipping negatives already throws away half of the distribution, which will greatly bias the agent.
I'm not sure if you've noticed, but there is a similar and more extensive conversation on this clip/scale-in-the-environment vs. handle-in-the-model topic at openai/baselines#121.
I would say the right way to handle this is to apply tanh and handle the probabilities properly: see Appendix C.
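A minimal PyTorch-style sketch of that tanh squashing, including the change-of-variables correction to the log-probability. `sample_squashed` and the default bounds are illustrative assumptions, not this repo's code:

```python
import torch
from torch.distributions import Normal

def sample_squashed(mean, log_std, low=0.0, high=0.1):
    """Sample from a tanh-squashed Gaussian and return the corrected log-prob.

    The pre-squash sample u ~ N(mean, std) is pushed through tanh into
    (-1, 1), and the log-density is corrected by the log-determinant of
    that transform: log pi(a) = log mu(u) - log(1 - tanh(u)^2).
    """
    dist = Normal(mean, log_std.exp())
    u = dist.rsample()                  # unbounded pre-squash sample
    a = torch.tanh(u)                   # squashed into (-1, 1)
    # Jacobian correction; the small epsilon guards against log(0).
    log_prob = dist.log_prob(u) - torch.log(1.0 - a.pow(2) + 1e-6)
    log_prob = log_prob.sum(dim=-1)
    # Affine rescale into [low, high]; this only adds a constant to the
    # log-determinant, which is omitted here.
    action = low + 0.5 * (a + 1.0) * (high - low)
    return action, log_prob
```

With `low=0.0, high=0.1` this keeps every sampled action inside the Box range while still giving a valid (differentiable) log-probability for the policy gradient.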
Tanh sounds good, but I really like what somebody mentioned on the thread @rwightman referenced: use the beta distribution, i.e. return not a mean/stddev pair for a normal distribution but the two parameters of a beta distribution (then transform the 0..1 range to between the Box limits). I guess the best way to return the params would be in the -inf..inf range, and then apply softplus to constrain them to be positive. 1 is a special value for the beta parameters, so that's why I would go with softplus and not exp; but I may be wrong.
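A rough sketch of that idea, assuming a network head that emits two unconstrained values per action dimension. The `+ 1.0` offset, which keeps both concentrations above 1 so the density stays unimodal, is a common variant and not something specified in the comment:

```python
import torch
from torch.distributions import Beta

def beta_action(head_out, low=0.0, high=0.1):
    """Illustrative Beta policy head.

    `head_out` holds two unconstrained values per action dimension;
    softplus maps them to positive concentration parameters, and the
    Beta sample in (0, 1) is rescaled linearly onto the Box limits.
    """
    raw_alpha, raw_beta = head_out.chunk(2, dim=-1)
    # softplus keeps the parameters positive; the +1 keeps them above 1
    # so the density has no spikes at the boundaries (an assumption
    # beyond the comment above, which only asked for positivity).
    alpha = torch.nn.functional.softplus(raw_alpha) + 1.0
    beta = torch.nn.functional.softplus(raw_beta) + 1.0
    dist = Beta(alpha, beta)
    x = dist.rsample()                   # sample in (0, 1)
    action = low + x * (high - low)      # rescale onto the Box range
    return action, dist.log_prob(x).sum(dim=-1)
```

Because the Beta distribution has bounded support, no probability mass is ever clipped away, which is the advantage over the clipped Gaussian discussed above.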
My action space is as such:
Box(np.array([0]), np.array([0.1]))
Using a Box automatically makes the agent model the action space using the DiagGaussian distribution. However, the actions sampled from this distribution do not lie within [0, 0.1]. Can you please explain how to interpret this, and how to effectively use the correct action from the sampled value?