[Bug Report] check_step_determinism obscure working #1111
Comments
The only thing I can think of is that perhaps your environment does not properly reset its internal state after the second reset. Does this pass?

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
env = gym.make("PandaPickAndPlace-v3").unwrapped
check_env(env, skip_render_check=True)  # does this fail?
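To make the hypothesis above concrete, here is a minimal sketch of how state leaking across `reset` would break a reset/step/reset/step comparison. `LeakyEnv`, `CleanEnv` and `step_is_deterministic` are hypothetical toy names, not panda_gym or Gymnasium code, and the stdlib `random.Random` stands in for the environment's RNG.

```python
import random


class LeakyEnv:
    """Hypothetical toy environment whose reset forgets to clear the velocity."""

    def __init__(self):
        self._rng = random.Random()
        self._position = 0.0
        self._velocity = 0.0

    def reset(self, seed=None):
        if seed is not None:
            self._rng.seed(seed)
        self._position = self._rng.uniform(-1.0, 1.0)
        # Bug (deliberate): self._velocity is not cleared, so it leaks
        # from the previous episode into the next one.
        return self._position

    def step(self, action):
        self._velocity += action
        self._position += self._velocity
        return self._position


class CleanEnv(LeakyEnv):
    """Same toy environment, but reset clears all internal state."""

    def reset(self, seed=None):
        self._velocity = 0.0
        return super().reset(seed)


def step_is_deterministic(env, seed=123, action=0.5):
    """Reset, step, reset, step with the same seed and action, then compare."""
    env.reset(seed=seed)
    obs_0 = env.step(action)
    env.reset(seed=seed)
    obs_1 = env.step(action)
    return obs_0 == obs_1


print(step_is_deterministic(LeakyEnv()))
print(step_is_deterministic(CleanEnv()))
```

If the real environment behaved like `LeakyEnv`, the two post-reset observations would differ even with identical seeds and actions.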
It fails. It doesn't make any sense 😅

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
import traceback
env = gym.make("PandaPickAndPlace-v3").unwrapped
try:
    check_env(env, skip_render_check=True)  # Fails
except Exception as exc:
    traceback.print_exception(exc)
seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)
assert data_equivalence(obs_0, obs_1) # Passes
I have no idea how this is failing.
Gymnasium/gymnasium/utils/env_checker.py Lines 188 to 218 in a09dcfd
you could try adding a
I can reproduce the error, but it seems to require a strange setup: you must reset, step, reset, step for the second step to fail equivalence. If I change the error to an assert, then I can see the AssertionError:

AssertionError: data_1 - data_2=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)

Looking at all the environments, this is a problem for most of them, except Reach and Slide.

import gymnasium as gym
from gymnasium.utils.env_checker import data_equivalence
import panda_gym
from panda_gym.envs import PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv
gym.register_envs(panda_gym)
for env_cls in [PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv]:
    env = env_cls()
    print(f'{env}')
    seed = 123
    env.action_space.seed(seed)
    action_0 = env.action_space.sample()
    action_1 = env.action_space.sample()
    obs_0, _ = env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action_0)
    obs_2, _ = env.reset()
    obs_3, _, _, _, _ = env.step(action_1)
    obs_4, _ = env.reset(seed=seed)
    obs_5, _, _, _, _ = env.step(action_0)
    obs_6, _ = env.reset()
    obs_7, _, _, _, _ = env.step(action_1)
    data_equivalence(obs_0, obs_4)
    data_equivalence(obs_1, obs_5)
    print(f'{obs_1["observation"] - obs_5["observation"]=}')
    data_equivalence(obs_2, obs_6)
    data_equivalence(obs_3, obs_7)

All of these differences exist only in the task (not the robot).
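One way to see which part of the observation drifts is to list the indices where the reported difference is non-zero. The sketch below hardcodes the difference array printed in this thread. `differing_indices` is a hypothetical helper, and reading the leading entries as the robot part and the rest as the task part is an assumption about the observation layout.

```python
# obs_1["observation"] - obs_5["observation"], copied from the thread.
DIFF = [
    0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0, -4.6938658e-07,
    4.4703484e-07, -3.2596290e-07, 1.1971366e-05, -1.0663000e-05,
    3.4053983e-06, 2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
    1.1928683e-03, 1.7360186e-03, 2.0111194e-04,
]


def differing_indices(diff, tol=0.0):
    """Return the indices whose absolute difference exceeds `tol`."""
    return [i for i, d in enumerate(diff) if abs(d) > tol]


def largest_difference(diff):
    """Return (index, value) of the entry with the largest magnitude."""
    index = max(range(len(diff)), key=lambda i: abs(diff[i]))
    return index, diff[index]


print(differing_indices(DIFF))
print(largest_difference(DIFF))
```

In this data the first seven entries are exactly zero and every later entry is non-zero, with the largest drift in the last three values.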
I've found a sort of source for the noise in

noise=array([0.04380345, 0.00589226, 0. ])
<PandaPickAndPlaceEnv instance>
noise=array([-0.09722823, 0.09362835, 0. ])
noise=array([-0.07651062, 0.09727248, 0. ])
noise=array([-0.09722823, 0.09362835, 0. ])
noise=array([-0.07651062, 0.09727248, 0. ])
obs_1["observation"] - obs_5["observation"]=array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, -4.6938658e-07,
4.4703484e-07, -3.2596290e-07, 1.1971366e-05, -1.0663000e-05,
3.4053983e-06, 2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
1.1928683e-03, 1.7360186e-03, 2.0111194e-04], dtype=float32)

I can't figure out why adding this noise is causing the output to change.

def _sample_object(self) -> np.ndarray:
    """Randomize start position of object."""
    object_position = np.array([0.0, 0.0, self.object_size / 2])
    # The RNG draw is kept, but a hardcoded offset is added instead of `noise`.
    noise = self.np_random.uniform(self.obj_range_low, self.obj_range_high)
    object_position += np.array([-0.09722823, 0.09362835, 0. ])
    return object_position

The problem persists even if we are still adding the noise.

EDIT: The next day, I can't replicate the last point.
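The printed noise values repeat across the two seeded passes, which is what a correctly re-seeded RNG should produce. As a rough sketch of that property, the snippet below uses the stdlib `random.Random` as a stand-in for the environment's NumPy generator. `sample_noise` and the ±0.1 range are assumptions for illustration, not panda_gym's values.

```python
import random


def sample_noise(rng, low=-0.1, high=0.1):
    """Draw x/y noise from `rng`; z stays at 0.0, as in the printed values."""
    return (rng.uniform(low, high), rng.uniform(low, high), 0.0)


rng = random.Random()

# First pass: seeded reset followed by an unseeded reset.
rng.seed(123)
first_pass = [sample_noise(rng), sample_noise(rng)]

# Second pass: the same seed again, then another unseeded draw.
rng.seed(123)
second_pass = [sample_noise(rng), sample_noise(rng)]

print(first_pass == second_pass)
```

If the sampled noise is identical in both passes, as it is in the output above, the drift in the observation has to come from somewhere other than the sampled values themselves.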
Looking at it the next day, I can't replicate the problem I noted at the end. I tested the minimal example:

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()
obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1) # This line is necessary
obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1) # This line isn't necessary for the issue
print(f'{obs_1["observation"] - obs_5["observation"]=}')

Another test I made was to add another reset case to compare the 3 observations:

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()
obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1) # necessary
obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1) # unnecessary
obs_8, _ = env.reset(seed=seed)
obs_9, _, _, _, _ = env.step(action_0)
obs_10, _ = env.reset()
# obs_11, _, _, _, _ = env.step(action_1) # unnecessary
print(f'{obs_1["observation"] - obs_5["observation"]=}')
print(f'{obs_5["observation"] - obs_9["observation"]=}')
print(f'{obs_1["observation"] - obs_9["observation"]=}')

Checking the seeding by separating the action-space seeding from the reset seeding, only the reset seeding affects the observation, i.e., the actual action taken doesn't matter.

The last check I've made is related to the
Describe the bug
It's the kind of issue that's hard to name or explain, or even reduce to simple code. But here's what I've observed since check_step_determinism was added: when I do the check myself, it passes. When the checker does it, it doesn't. For the moment the code depends on panda_gym, sorry for that. I'll reduce it in the future, but I wanted to post it as soon as possible.
Code example
What's even weirder, is that it only happens in two environments. I'll keep digging and let you know.
System info
Gymnasium 1.0.0a2
Panda-gym 10c4d8a
Additional context
No response
Checklist