ActorDistributionNetwork with bounded array_specs #216
Comments
What NormalProjectionNetwork does is squash actions with tanh, so it shouldn't go out of bounds if `scale_distribution` is set to `True`.
Here's a minimal example: I make a very simple custom environment with one observation and one action, both 10-element vectors with each entry in [0, 1]. The dynamics of the environment are such that no matter what action one takes, it terminates with reward 0 after a single step (this is intentionally a very dumb environment).
I then wrap this environment in a `TFPyEnvironment` and build an `ActorDistributionNetwork` from its specs.
This yields random outputs which are not always constrained to [0, 1]; for example, on one run the sampled actions fell outside those bounds.
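A minimal sketch of such a reproduction, assuming TF-Agents' standard `PyEnvironment`/`TFPyEnvironment` workflow; the environment class name `OneStepEnv` and the layer sizes are made up for illustration:

```python
import numpy as np
from tf_agents.environments import py_environment, tf_py_environment
from tf_agents.networks import actor_distribution_network
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts


class OneStepEnv(py_environment.PyEnvironment):
  """10-element observations and actions in [0, 1]; always ends after one step."""

  def __init__(self):
    self._action_spec = array_spec.BoundedArraySpec(
        (10,), np.float32, minimum=0.0, maximum=1.0, name='action')
    self._observation_spec = array_spec.BoundedArraySpec(
        (10,), np.float32, minimum=0.0, maximum=1.0, name='observation')

  def action_spec(self):
    return self._action_spec

  def observation_spec(self):
    return self._observation_spec

  def _reset(self):
    return ts.restart(np.zeros(10, dtype=np.float32))

  def _step(self, action):
    # No matter what action is taken, terminate with reward 0.
    return ts.termination(np.zeros(10, dtype=np.float32), reward=0.0)


env = tf_py_environment.TFPyEnvironment(OneStepEnv())
actor_net = actor_distribution_network.ActorDistributionNetwork(
    env.observation_spec(), env.action_spec(), fc_layer_params=(64,))

# With the default scale_distribution=False, the sampled actions come from an
# unsquashed normal distribution and can land outside [0, 1].
dist, _ = actor_net(env.reset().observation, step_type=())
print(dist.sample())
```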
Is there any update on this issue? I have an environment that defines actions as a bounded spec, and the boundaries of the action spec are not respected by the policy.
PPO does not respect action boundaries: openai/baselines#121. The environment is expected to clip action values. If you set `scale_distribution` to `True`, it will do tanh-squashing. We are adding action clipping to an environment wrapper; until that lands, you can handle it in your own environment or environment wrapper.
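For instance, a minimal sketch of a do-it-yourself clipping wrapper, assuming a NumPy-based `PyEnvironment` with a flat action spec; the class name `ClipActionWrapper` is hypothetical:

```python
import numpy as np
from tf_agents.environments import wrappers


class ClipActionWrapper(wrappers.PyEnvironmentBaseWrapper):
  """Hypothetical wrapper that clips incoming actions to the action spec's bounds."""

  def _step(self, action):
    spec = self._env.action_spec()
    # Clip element-wise against the spec's minimum/maximum before stepping.
    clipped = np.clip(action, spec.minimum, spec.maximum)
    return self._env.step(clipped)
```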
TF-Agents DDPG does clipping in the policy. If you are using TF-Agents PPO, you should use the ActionClipWrapper that @oars mentioned above.
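Usage is a one-line wrap of the Python environment before converting it to TensorFlow; here reusing the `OneStepEnv` sketch from above as a stand-in for your own environment:

```python
from tf_agents.environments import tf_py_environment, wrappers

py_env = OneStepEnv()  # your own PyEnvironment subclass
env = tf_py_environment.TFPyEnvironment(wrappers.ActionClipWrapper(py_env))
# Actions passed to env.step() are now clipped to the action spec's bounds
# before they reach the underlying environment.
```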
I have encountered the following problem; how can I solve it? `TypeError: __init__() got an unexpected keyword argument 'outer_rank'`
When building an ActorDistributionNetwork with bounded array_specs, the network occasionally produces actions that violate the bounds. This seems to be a result of `scale_distribution=False` on line 48 of `actor_distribution_network.py`. I was able to work around the problem by copying this function, changing `scale_distribution` to `True`, and passing it as an argument to the initializer for `ActorDistributionNetwork`, but perhaps we can consider changing the default to `True`.
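A sketch of that workaround, assuming the `continuous_projection_net` constructor argument of `ActorDistributionNetwork` and an `env` built as in the reproduction above; the helper name is made up:

```python
from tf_agents.networks import actor_distribution_network, normal_projection_network


def _scaled_normal_projection_net(action_spec):
  # Same role as the module-level default projection net, but with
  # scale_distribution=True so samples are tanh-squashed into the
  # action spec's bounds.
  return normal_projection_network.NormalProjectionNetwork(
      action_spec, scale_distribution=True)


actor_net = actor_distribution_network.ActorDistributionNetwork(
    env.observation_spec(),  # `env` is a TFPyEnvironment as above
    env.action_spec(),
    fc_layer_params=(64,),
    continuous_projection_net=_scaled_normal_projection_net)
```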