
The environmental action limits seems not work in Tianshou PPO #92

Closed
terrancelu92 opened this issue Dec 12, 2021 · 2 comments
terrancelu92 commented Dec 12, 2021

It seems that the actions selected by the PPO algorithm are not confined to the limits defined in the environment.
For example, the action space for the test case below is `self.action_space = spaces.Box(np.array([0, 0]).astype(np.float32), np.array([1, 1]).astype(np.float32))`, yet the actions recorded in the history fall outside the [0, 1] bounds.
mpc-drl-tl/testcases/gym-environments/five-zones-air/test_v2/test_ppo_tianshou.py

This is strange, since Tianshou's PPO has the attributes `action_scaling` and `action_bound_method`. Both are enabled in the above test case, but somehow the bounds are still violated.

Tianshou's `map_action` implementation is here.

Similar issues are reported in other RL libraries.
openai/baselines#121
rlworkgroup/garage#710
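To confirm the symptom, the recorded action history can be scanned against the Box bounds. The helper below is hypothetical (not part of Tianshou or the test case), written in plain Python for illustration:

```python
# Hypothetical helper: scan a history of recorded actions and return
# any action that violates the Box bounds in at least one dimension.

def out_of_bounds(actions, low, high):
    """Return the actions that fall outside [low, high] in any dimension."""
    return [
        a for a in actions
        if any(x < lo or x > hi for x, lo, hi in zip(a, low, high))
    ]

# Example with the [0, 1] x [0, 1] Box from the test case.
history = [[0.2, 0.9], [1.3, 0.5], [-0.1, 0.4]]
print(out_of_bounds(history, [0.0, 0.0], [1.0, 1.0]))
# -> [[1.3, 0.5], [-0.1, 0.4]]
```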

@terrancelu92 terrancelu92 self-assigned this Dec 12, 2021
@terrancelu92 terrancelu92 changed the title The environmental action limits are not considered in Tianshou PPO The environmental action limits do not work in Tianshou PPO Dec 12, 2021
@terrancelu92
Copy link
Collaborator Author

terrancelu92 commented Dec 12, 2021

In Tianshou, `map_action` is documented as follows:

    This function is called in :meth:`~tianshou.data.Collector.collect` and only
    affects action sending to env. Remapped action will not be stored in buffer
    and thus can be viewed as a part of env (a black box action transformation).
    Action mapping includes 2 standard procedures: bounding and scaling. Bounding
    procedure expects original action range is (-inf, inf) and maps it to [-1, 1],
    while scaling procedure expects original action range is (-1, 1) and maps it
    to [action_space.low, action_space.high]. Bounding procedure is applied first.
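The two procedures in the docstring can be sketched in plain Python as below. This is an illustrative sketch, not Tianshou's actual implementation; the tanh variant is shown only as the common smooth alternative to clipping for the bounding step:

```python
import math

# Illustrative sketch of the two procedures described above.
# Bounding maps (-inf, inf) into [-1, 1] (here via clipping, with a
# tanh squash as the smooth alternative); scaling then maps [-1, 1]
# onto [low, high].

def bound(act, method="clip"):
    if method == "clip":
        return [max(-1.0, min(1.0, x)) for x in act]
    return [math.tanh(x) for x in act]  # smooth alternative to clipping

def scale(act, low, high):
    return [lo + (hi - lo) * (x + 1.0) / 2.0
            for x, lo, hi in zip(act, low, high)]

# Bounding is applied first, then scaling, per the docstring.
raw = [2.7, -1.0]
print(scale(bound(raw), [0.0, 0.0], [1.0, 1.0]))
# -> [1.0, 0.0]
```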

@terrancelu92 terrancelu92 changed the title The environmental action limits do not work in Tianshou PPO The environmental action limits seems not work in Tianshou PPO Dec 12, 2021

terrancelu92 commented Dec 12, 2021

Since the remapped action is not stored in the buffer and is treated as part of the environment (a black-box action transformation), actions read back from the buffer must undergo the same transformation.

First, bound to [-1, 1]:

    act = np.clip(act, -1.0, 1.0)

Then, scale to the environment's action range:

    low, high = self.action_space.low, self.action_space.high
    act = low + (high - low) * (act + 1.0) / 2.0

Closing this issue.
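For post-processing a whole batch of buffered actions, the two snippets above can be bundled into one helper. This is a plain-Python version for illustration (the originals use NumPy; the arithmetic is identical):

```python
# Bundle the bounding and scaling steps into one helper for
# transforming actions read back from the replay buffer.

def transform_buffer_action(act, low, high):
    # Bounding: clip the raw policy output into [-1, 1].
    act = [max(-1.0, min(1.0, x)) for x in act]
    # Scaling: map [-1, 1] onto [action_space.low, action_space.high].
    return [lo + (hi - lo) * (x + 1.0) / 2.0
            for x, lo, hi in zip(act, low, high)]

# Raw actions as they might come out of the buffer, mapped into the
# [0, 1] x [0, 1] Box used by the test case.
for raw in ([0.0, 1.0], [3.0, -3.0]):
    print(transform_buffer_action(raw, [0.0, 0.0], [1.0, 1.0]))
# [0.0, 1.0] -> [0.5, 1.0]; [3.0, -3.0] -> [1.0, 0.0]
```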
