
Confusion with continuous action space (DiagGaussian) #109

Closed
KeirSimmons opened this issue Aug 14, 2018 · 5 comments

Comments

@KeirSimmons

My action space is defined as follows: Box(np.array([0]), np.array([0.1]))

Using a Box automatically makes the agent model the action space with the DiagGaussian distribution. However, the actions sampled from this distribution do not lie within [0, 0.1]. Can you please explain how to interpret this, and how to map the sampled values onto valid actions?
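
For illustration, a toy reproduction of the mismatch (plain NumPy, names are illustrative): the Gaussian has unbounded support, so its samples cannot respect the Box limits by themselves.

```python
import numpy as np

low, high = np.array([0.0]), np.array([0.1])                # the Box bounds from above
raw_action = np.random.normal(loc=0.0, scale=1.0, size=1)   # what a DiagGaussian head samples
in_bounds = np.all((raw_action >= low) & (raw_action <= high))
print(raw_action, in_bounds)                                # usually out of bounds
```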

@timmeinhardt
Contributor

I am not very familiar with non-discrete action spaces, but in general your policy model is not aware of the range specified in the action space. In the case of a continuous action space, our policy model uses a Gaussian distribution to sample values as actions. As far as I know, the mapping to your specific range is then usually done in the step method of your environment. See, for example, the continuous mountain car environment, where the action value is clipped.
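
For reference, a minimal sketch of that pattern as a gym.ActionWrapper (the wrapper name is illustrative, not something from this repo):

```python
import gym
import numpy as np

class ClipAction(gym.ActionWrapper):
    """Clip the raw policy output to the Box bounds before the env sees it."""
    def action(self, action):
        return np.clip(action, self.action_space.low, self.action_space.high)

# env = ClipAction(gym.make("MountainCarContinuous-v0"))
```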

@KeirSimmons
Author

So I have two different environments, one with the action space as above, and another with [0, 1000] (both continuous). I have tried two different approaches:

  1. Take the action value as is, with no clipping or rescaling. This produces poor results (unsurprisingly), but the sampled values do seem to drift slowly towards the given range because the reward signal pushes them there. For the [0, 1000] space the values started out around [-1.0, 1.0], I assume, and gradually expanded to roughly [-10.0, 10.0] (the range is of course actually infinite; my point is that the variance increased).

  2. Rescale the action by adding 1 and multiplying by 500, mapping [-1.0, 1.0] to [0, 1000] (a general form of this mapping is sketched below). This seemed to do well during training, but at 'enjoy' time (now using the mode rather than a sample) the values output by the Gaussian were much larger in magnitude than during training, so the chosen action was always 1000 after rescaling.

So I'm curious how best to approach the clipping/rescaling. Since the distribution is Gaussian and (I assume) zero-centred, clipping away negative values already throws out half of the distribution, which will heavily bias the agent.
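
For reference, the affine mapping from approach 2 written for an arbitrary Box range (the function name is illustrative). Note that it only stays inside the range if the raw value is bounded first, e.g. with tanh as suggested further down:

```python
import numpy as np

def rescale_action(squashed, low, high):
    """Affinely map a value in [-1, 1] onto the Box range [low, high]."""
    return low + 0.5 * (squashed + 1.0) * (high - low)

# e.g. rescale_action(np.tanh(raw_value), 0.0, 1000.0) lands in [0, 1000];
# without the tanh the raw Gaussian output can overshoot, which matches the
# always-1000 behaviour described in approach 2.
```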

@rwightman
Contributor

I'm not sure if you've noticed, but there is a similar and more extensive conversation on this topic (clip/scale in the environment vs. handle it in the model) at openai/baselines#121.

@ikostrikov
Owner

I would say the right way to handle this is to apply tanh and handle probabilities properly:

See Appendix C:
https://arxiv.org/pdf/1801.01290.pdf
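
For concreteness, a minimal sketch of that tanh squashing with the corrected log-probability (the paper is Soft Actor-Critic, Haarnoja et al. 2018; the function and variable names here are illustrative, not code from this repo):

```python
import torch

def tanh_squash(normal, raw, eps=1e-6):
    """Squash a Gaussian sample with tanh and apply the change-of-variables
    correction from Appendix C: log pi(a) = log mu(u) - sum_i log(1 - tanh(u_i)^2)."""
    action = torch.tanh(raw)                                   # action now lies in (-1, 1)
    log_prob = normal.log_prob(raw).sum(-1)                    # log-prob of the pre-squash sample
    log_prob -= torch.log(1.0 - action.pow(2) + eps).sum(-1)   # Jacobian correction
    return action, log_prob

# usage sketch:
# normal = torch.distributions.Normal(mean, std)
# raw = normal.rsample()
# action, log_prob = tanh_squash(normal, raw)
# env_action = low + 0.5 * (action + 1.0) * (high - low)      # then rescale onto the Box bounds
```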

@wranai

wranai commented Aug 31, 2018

Tanh sounds good, but I really like what somebody mentioned in the thread @rwightman referenced: use the Beta distribution, i.e. return not a mean/stddev pair for a normal distribution but the two parameters of a Beta distribution, and then transform the 0..1 range onto the Box limits. I guess the best way would be to have the network output the parameters in the -inf..inf range and then apply softplus to constrain them to be positive. 1 is a special value for the Beta parameters, which is why I would go with softplus rather than exp; but I may be wrong.
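
A rough sketch of that idea (class and parameter names are illustrative; the +1 offset is one common way to keep the Beta unimodal and is an assumption here, not necessarily what was meant above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaPolicyHead(nn.Module):
    """Illustrative Beta-distribution head: softplus keeps alpha/beta positive,
    the +1 offset keeps them >= 1, and the (0, 1) sample is rescaled onto the
    Box limits [low, high]."""
    def __init__(self, num_inputs, num_outputs, low, high):
        super().__init__()
        self.fc = nn.Linear(num_inputs, 2 * num_outputs)
        self.low, self.high = low, high

    def forward(self, x):
        alpha, beta = self.fc(x).chunk(2, dim=-1)
        dist = torch.distributions.Beta(F.softplus(alpha) + 1.0, F.softplus(beta) + 1.0)
        sample = dist.rsample()                              # in (0, 1)
        action = self.low + sample * (self.high - self.low)  # mapped onto the Box range
        return action, dist.log_prob(sample).sum(-1)
```

(The affine rescale only adds a constant log-Jacobian term, so it does not affect the policy gradient.)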

idobrusin added a commit to lukashermann/pytorch-a2c-ppo-acktr that referenced this issue Oct 14, 2020