It seems that the action selected by the PPO algorithm is not confined to the limits defined in the environment.

For example, the action space for the testcase below is `self.action_space = spaces.Box(np.array([0, 0]).astype(np.float32), np.array([1, 1]).astype(np.float32))`, but the historical actions from PPO fall outside the [0, 1] boundary.

mpc-drl-tl/testcases/gym-environments/five-zones-air/test_v2/test_ppo_tianshou.py

This is strange, since Tianshou's PPO has the attributes `action_scaling` and `action_bound_method`. Both have been activated in the above testcase, but somehow they do not work.
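For reference, here is a minimal sketch of how those two switches are typically enabled when building the policy, assuming Tianshou's 0.4-era API; the state shape, network sizes, and the stand-in `Box` space are hypothetical, not taken from the testcase:

```python
import numpy as np
import torch
from gym import spaces
from torch.distributions import Independent, Normal
from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import ActorProb, Critic

# hypothetical stand-in for the five-zones-air action space
action_space = spaces.Box(np.array([0, 0]).astype(np.float32),
                          np.array([1, 1]).astype(np.float32))
state_shape = (4,)  # hypothetical observation size

actor = ActorProb(Net(state_shape, hidden_sizes=[64, 64]), action_space.shape)
critic = Critic(Net(state_shape, hidden_sizes=[64, 64]))
optim = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()),
                         lr=3e-4)

def dist_fn(*logits):  # ActorProb outputs (mu, sigma)
    return Independent(Normal(*logits), 1)

policy = PPOPolicy(
    actor, critic, optim, dist_fn,
    action_space=action_space,
    action_scaling=True,         # scale [-1, 1] onto [low, high]
    action_bound_method="clip",  # bound raw outputs into [-1, 1] first
)
```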
terrancelu92 changed the title from "The environmental action limits are not considered in Tianshou PPO" to "The environmental action limits do not work in Tianshou PPO" on Dec 12, 2021
From the docstring of `map_action` in Tianshou:

> This function is called in :meth:`~tianshou.data.Collector.collect` and only affects action sending to env. Remapped action will not be stored in buffer and thus can be viewed as a part of env (a black box action transformation). Action mapping includes 2 standard procedures: bounding and scaling. Bounding procedure expects original action range is (-inf, inf) and maps it to [-1, 1], while scaling procedure expects original action range is (-1, 1) and maps it to [action_space.low, action_space.high]. Bounding procedure is applied first.
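In code, the two procedures amount to roughly the following; this is a paraphrased sketch of `BasePolicy.map_action`, assuming a `Box` action space, not the verbatim library source:

```python
import numpy as np
from gym import spaces

def map_action(act, action_space, action_bound_method="clip",
               action_scaling=True):
    """Sketch of Tianshou's bound-then-scale action remapping."""
    if isinstance(action_space, spaces.Box):
        # 1) bounding: squash the raw network output into [-1, 1]
        if action_bound_method == "clip":
            act = np.clip(act, -1.0, 1.0)
        elif action_bound_method == "tanh":
            act = np.tanh(act)
        # 2) scaling: map [-1, 1] linearly onto [low, high]
        if action_scaling:
            low, high = action_space.low, action_space.high
            act = low + (high - low) * (act + 1.0) / 2.0
    return act

space = spaces.Box(np.array([0, 0], dtype=np.float32),
                   np.array([1, 1], dtype=np.float32))
print(map_action(np.array([-3.0, 0.5]), space))  # -> [0.   0.75]
```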
terrancelu92 changed the title from "The environmental action limits do not work in Tianshou PPO" to "The environmental action limits seems not work in Tianshou PPO" on Dec 12, 2021
Since the remapped action is not stored in the buffer and is viewed as a black-box action transformation, actions read back out of the buffer should also go through the same transformation: first bounding to [-1, 1], then scaling to [action_space.low, action_space.high]. See the sketch below.
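A hedged sketch of that post-processing, assuming the buffer holds the raw (unmapped) policy outputs; `buffer` and `env` are hypothetical handles for the replay buffer and the environment:

```python
import numpy as np

# the Collector stores the raw policy outputs, so apply the same
# bound-then-scale remapping before interpreting the action history
raw_acts = buffer.act[:len(buffer)]     # unmapped actions from the buffer
bounded = np.clip(raw_acts, -1.0, 1.0)  # first: bounding to [-1, 1]
low, high = env.action_space.low, env.action_space.high
env_acts = low + (high - low) * (bounded + 1.0) / 2.0  # then: scaling
```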
The `map_action` function is here in Tianshou. Similar issues are reported in other RL libraries:
openai/baselines#121
rlworkgroup/garage#710