
Giving actions to environments outside the limits still works and may mislead users #1442

Closed
moguzozcan opened this issue Apr 14, 2019 · 2 comments



moguzozcan commented Apr 14, 2019

The action space of the "MountainCarContinuous" environment accepts actions in [-1, 1]. However, there is no check for this in the code. If the user takes an action such as 3, the environment still returns the next state and reward without complaint. This can cause subtle problems for the user, for example if he/she made a mistake during discretization. I think it would be good to add this check to the code base and warn the user.

reward = 0
if done:
    reward = 100.0
reward -= math.pow(action[0], 2) * 0.1
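To illustrate the point, here is a minimal sketch reproducing the reward computation above as a standalone function (the `reward_for` helper is hypothetical, not part of Gym): an out-of-range action like 3 is silently accepted and yields a reward instead of an error.

```python
import math

# Hypothetical helper mirroring the env's reward computation above.
def reward_for(action, done=False):
    reward = 100.0 if done else 0.0
    # Quadratic action-cost penalty, applied regardless of bounds.
    reward -= math.pow(action[0], 2) * 0.1
    return reward

print(reward_for([3.0]))   # action outside [-1, 1], but no error is raised
print(reward_for([0.5]))   # in-range action, small penalty
```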
@abhinavsagar (Contributor)

@moguzozcan Sounds good. It would be much better if you open a PR so that we can test it properly.

@christopherhesse (Contributor)

I believe we have avoided checking the validity of observations, rewards, and actions in existing environments. It should be possible to construct a wrapper that checks these and throws an exception if they are outside the expected range.

If someone wants to make that wrapper we can link it from the wrappers page: https://github.com/openai/gym/blob/master/docs/wrappers.md

There's some more discussion around this topic over on baselines: openai/baselines#121

3 participants