ActorDistributionNetwork with bounded array_specs #216

Open · basvanopheusden opened this issue Oct 9, 2019 · 7 comments
Labels: type:bug (Something isn't working)

basvanopheusden commented Oct 9, 2019

When building an ActorDistributionNetwork with bounded array_specs, the network occasionally produces actions that violate the bounds. This appears to be caused by scale_distribution=False on line 48 of actor_distribution_network.py:

  return normal_projection_network.NormalProjectionNetwork(
      action_spec,
      init_means_output_factor=init_means_output_factor,
      std_bias_initializer_value=std_bias_initializer_value,
      scale_distribution=False)

I was able to work around the problem by copying this function, changing scale_distribution to True, and passing it as an argument to the ActorDistributionNetwork initializer; perhaps we should consider changing the default to True.
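
For reference, a minimal sketch of that workaround (it assumes ActorDistributionNetwork still accepts the continuous_projection_net argument, which is where the default factory above gets plugged in):

from tf_agents.networks import actor_distribution_network
from tf_agents.networks import normal_projection_network

def _scaled_normal_projection_net(action_spec):
    # Like the default factory quoted above, but with scale_distribution=True
    # so sampled actions are squashed into the spec's [minimum, maximum] bounds.
    return normal_projection_network.NormalProjectionNetwork(
        action_spec,
        scale_distribution=True)

actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_spec,  # your environment's specs go here
    action_spec,
    continuous_projection_net=_scaled_normal_projection_net)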

kuanghuei (Contributor)

NormalProjectionNetwork squashes actions with tanh, so they shouldn't go out of bounds even with scale_distribution=False. I'm not sure why this can happen. Could you provide more context?

basvanopheusden (Author)

Here's a minimal example. I build a very simple custom environment with one observation and one action, both 10-element vectors with each entry in [0, 1]. The dynamics are such that no matter what action is taken, the environment terminates with reward 0 after a single step (this is intentionally a very dumb environment):

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np

from tf_agents.environments import py_environment
from tf_agents.environments import tf_py_environment
from tf_agents.networks import actor_distribution_network
from tf_agents.trajectories import time_step as ts
from tf_agents.specs import array_spec

tf.compat.v1.enable_v2_behavior()

class TestEnv(py_environment.PyEnvironment):

    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
                shape=(10,), dtype=np.float32, minimum=0, maximum=1, name='action')
        self._observation_spec = array_spec.BoundedArraySpec(
                shape=(10,), dtype=np.float32, minimum=0, maximum=1, name='observation')
        self._reset()

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        self._state = np.zeros((10,), dtype=np.float32)
        # The observation must match the (10,)-shaped spec, so return the
        # state directly rather than wrapping it in an extra list.
        return ts.restart(self._state)

    def _step(self, action):
        # Terminate immediately with reward 0, regardless of the action.
        return ts.termination(self._state, reward=0.0)

I then wrap this environment in a TFPyEnvironment, create an ActorDistributionNetwork, and sample an action (without any training, just the initial network weights):

env = tf_py_environment.TFPyEnvironment(TestEnv())

actor_net = actor_distribution_network.ActorDistributionNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(10,10,10)
)

time_step = env.reset()
action_dist, network_state = actor_net(time_step.observation, time_step.step_type, ())
print(action_dist.sample().numpy().flatten())

This yields random outputs that are not always constrained to [0, 1]; for example, one run yields:

[-0.00887287  0.10404605  0.70652753  0.684862   -0.00132495  0.1625691
  1.0059502   0.6492924   1.1910927   0.29230005]

kuanghuei added the type:bug label on Oct 10, 2019
kuanghuei self-assigned this on Oct 10, 2019
LucCADORET

Is there any update on this issue? I have an environment that defines its actions as

        self._action_spec = (
            array_spec.BoundedArraySpec(
                shape=(1,), dtype=np.int32, minimum=0, maximum=2, name='action'),
            array_spec.BoundedArraySpec(
                shape=(1,), dtype=np.float32, minimum=0, maximum=1, name='action_pct')
        )

The boundaries of action_pct are also violated (mostly going negative) in the TFPyEnvironment, even though the original PyEnvironment passes validate_py_environment. Is setting scale_distribution to True a valid workaround?

kuanghuei (Contributor) commented Mar 2, 2020

PPO does not respect action boundaries (openai/baselines#121); the environment is expected to clip action values.
DDPG/D4PG clip action values in their policies.
SAC handles this nicely with a tanh-squashed action distribution.

If you set scale_distribution to True, it will do tanh squashing. We are adding action clipping to an environment wrapper; until that lands, you can handle it in your own environment or environment wrapper, as sketched below.
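
For example, a minimal sketch of clipping in your own wrapper could look like this (ClipActionWrapper is a hypothetical name, and the sketch assumes a single, non-nested BoundedArraySpec and the PyEnvironmentBaseWrapper base class):

import numpy as np
from tf_agents.environments import wrappers

class ClipActionWrapper(wrappers.PyEnvironmentBaseWrapper):
    """Sketch: clips actions to the bounds of a single BoundedArraySpec."""

    def _step(self, action):
        spec = self._env.action_spec()
        # Clip element-wise to the spec's bounds before stepping the env.
        clipped = np.clip(action, spec.minimum, spec.maximum)
        return self._env.step(clipped)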

oars (Contributor) commented Mar 2, 2020

See:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/wrappers.py#L442
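
Usage would be a one-line wrap of the py environment before converting it to TF (a sketch using the TestEnv from the repro above):

from tf_agents.environments import tf_py_environment
from tf_agents.environments import wrappers

# Out-of-bounds actions are clipped to the action spec before reaching TestEnv.
env = tf_py_environment.TFPyEnvironment(wrappers.ActionClipWrapper(TestEnv()))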

kuanghuei (Contributor)

TF-Agents DDPG does clipping in its policy:
https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ddpg/ddpg_agent.py#L166
If you are using DDPG, you should be good.

If you are using TF-Agents PPO, you should use the ActionClipWrapper that @oars mentioned above.

lqchl commented Apr 8, 2024

I have encountered the following problem; how can I solve it?

TypeError: __init__() got an unexpected keyword argument 'outer_rank'
  In call to configurable 'NormalProjectionNetwork' (<class 'tf_agents.networks.normal_projection_network.NormalProjectionNetwork'>)
