ActorDistributionNetwork with bounded array_specs #216

Open · basvanopheusden opened this issue Oct 9, 2019 · 7 comments
Labels: type:bug (Something isn't working)

basvanopheusden commented Oct 9, 2019

When building an ActorDistributionNetwork with bounded array_specs, the network occasionally produces actions that violate the bounds. This appears to be caused by scale_distribution=False on line 48 of actor_distribution_network.py:

  return normal_projection_network.NormalProjectionNetwork(
      action_spec,
      init_means_output_factor=init_means_output_factor,
      std_bias_initializer_value=std_bias_initializer_value,
      scale_distribution=False)

I was able to work around the problem by copying this function, changing scale_distribution to True, and passing it as an argument to the ActorDistributionNetwork initializer; perhaps we should consider changing the default to True.
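
For reference, a minimal sketch of that workaround (it assumes ActorDistributionNetwork still accepts the continuous_projection_net argument, which is where the default factory above gets plugged in):

from tf_agents.networks import actor_distribution_network
from tf_agents.networks import normal_projection_network

def _scaled_normal_projection_net(action_spec):
    # Like the default factory quoted above, but with scale_distribution=True
    # so sampled actions are squashed into the spec's [minimum, maximum] bounds.
    return normal_projection_network.NormalProjectionNetwork(
        action_spec,
        scale_distribution=True)

actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_spec,  # your environment's specs go here
    action_spec,
    continuous_projection_net=_scaled_normal_projection_net)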

kuanghuei (Contributor)

NormalProjectionNetwork squashes actions with tanh, so they shouldn't go out of bounds even with scale_distribution=False. I'm not sure why this can happen. Could you provide more context?

basvanopheusden (Author)

Here's a minimal example. I build a very simple custom environment with one observation and one action, both 10-element vectors with each entry in [0, 1]. The dynamics are such that no matter what action is taken, the environment terminates with reward 0 after a single step (this is intentionally a very dumb environment):

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np

from tf_agents.environments import py_environment
from tf_agents.environments import tf_py_environment
from tf_agents.networks import actor_distribution_network
from tf_agents.trajectories import time_step as ts
from tf_agents.specs import array_spec

tf.compat.v1.enable_v2_behavior()

class TestEnv(py_environment.PyEnvironment):

    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
                shape=(10,), dtype=np.float32, minimum=0, maximum=1, name='action')
        self._observation_spec = array_spec.BoundedArraySpec(
                shape=(10,), dtype=np.float32, minimum=0, maximum=1, name='observation')
        self._reset()

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        self._state = np.zeros((10,), dtype=np.float32)
        # The observation must match the (10,)-shaped spec, so return the
        # state directly rather than wrapping it in an extra list.
        return ts.restart(self._state)

    def _step(self, action):
        # Terminate immediately with reward 0, regardless of the action.
        return ts.termination(self._state, reward=0.0)

I then wrap this environment in a TFPyEnvironment, create an ActorDistributionNetwork, and sample an action (without any training, just the initial network weights):

env = tf_py_environment.TFPyEnvironment(TestEnv())

actor_net = actor_distribution_network.ActorDistributionNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(10,10,10)
)

time_step = env.reset()
action_dist, network_state = actor_net(time_step.observation, time_step.step_type, ())
print(action_dist.sample().numpy().flatten())

This yields random outputs that are not always constrained to [0, 1]; for example, one run yields:

[-0.00887287  0.10404605  0.70652753  0.684862   -0.00132495  0.1625691
  1.0059502   0.6492924   1.1910927   0.29230005]

kuanghuei added the type:bug label on Oct 10, 2019
kuanghuei self-assigned this on Oct 10, 2019
LucCADORET

Is there any update on this issue? I have an environment that defines its actions as

        self._action_spec = (
            array_spec.BoundedArraySpec(
                shape=(1,), dtype=np.int32, minimum=0, maximum=2, name='action'),
            array_spec.BoundedArraySpec(
                shape=(1,), dtype=np.float32, minimum=0, maximum=1, name='action_pct')
        )

The boundaries of action_pct are also violated (mostly going negative) in the TFPyEnvironment, even though the original PyEnvironment passes validate_py_environment. Is setting scale_distribution to True a valid workaround?

kuanghuei (Contributor) commented Mar 2, 2020

PPO does not respect action boundaries (openai/baselines#121); the environment is expected to clip action values.
DDPG/D4PG clip action values in their policies.
SAC handles this nicely with a tanh-squashed action distribution.

If you set scale_distribution to True, it will do tanh squashing. We are adding action clipping to an environment wrapper; until that lands, you can handle it in your own environment or environment wrapper, as sketched below.
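
For example, a minimal sketch of clipping in your own wrapper could look like this (ClipActionWrapper is a hypothetical name, and the sketch assumes a single, non-nested BoundedArraySpec and the PyEnvironmentBaseWrapper base class):

import numpy as np
from tf_agents.environments import wrappers

class ClipActionWrapper(wrappers.PyEnvironmentBaseWrapper):
    """Sketch: clips actions to the bounds of a single BoundedArraySpec."""

    def _step(self, action):
        spec = self._env.action_spec()
        # Clip element-wise to the spec's bounds before stepping the env.
        clipped = np.clip(action, spec.minimum, spec.maximum)
        return self._env.step(clipped)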

oars (Contributor) commented Mar 2, 2020

See:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/wrappers.py#L442
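
Usage would be a one-line wrap of the py environment before converting it to TF (a sketch using the TestEnv from the repro above):

from tf_agents.environments import tf_py_environment
from tf_agents.environments import wrappers

# Out-of-bounds actions are clipped to the action spec before reaching TestEnv.
env = tf_py_environment.TFPyEnvironment(wrappers.ActionClipWrapper(TestEnv()))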

kuanghuei (Contributor)

TF-Agents DDPG does clipping in its policy:
https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ddpg/ddpg_agent.py#L166
If you are using DDPG, you should be good.

If you are using TF-Agents PPO, you should use the ActionClipWrapper that @oars mentioned above.

lqchl commented Apr 8, 2024

I have encountered the following problem; how can I solve it?

TypeError: __init__() got an unexpected keyword argument 'outer_rank'
  In call to configurable 'NormalProjectionNetwork' (<class 'tf_agents.networks.normal_projection_network.NormalProjectionNetwork'>)
