[gym.vector] Add BatchedVectorEnv, (chunking + flexible n_envs) #2072
Conversation
Resolves #1795
@lebrice can you please get this to pass tests? This is something of note to me.
@lebrice this appears to still be failing tests?
Hey @justinkterry, yes, the tests for Python versions below 3.6 are failing; I think it might have to do with the type hints. I'll take a quick look, but if it comes down to the type hints, I'm not currently planning on removing them, to be honest, and this PR probably won't get accepted even if I did, given that the people at OpenAI are probably working on other things...
Hey @justinkterry, @pzhokhov, @tristandeleu, the issues above have been fixed. Would you mind taking a look over the code and letting me know what you think?
The main strength of BatchedVectorEnv is to allow the batch size to not be a multiple of the number of workers. In my opinion this is a somewhat niche use case. Even the chunking option in SubprocVecEnv, which prompted #1795, only handles cases where the overall number of environments is a multiple of the number of workers.
I would rather see a fix to AsyncVectorEnv to allow it to run SyncVectorEnv environments first, possibly with a VectorEnvWrapper to flatten the first two batch dimensions (similar to your unroll function), before going for a new VectorEnv subclass. And if there is major interest in having instances where the batch size is not a multiple of the number of workers, then we can think about adding BatchedVectorEnv.
That being said, @pzhokhov & @christopherhesse have the final say on it, and this is a good PR!
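For reference, here is a rough sketch (not code from this PR) of the nesting this alternative implies: SyncVectorEnv chunks running inside an AsyncVectorEnv. As noted above, AsyncVectorEnv would likely need fixes before it cleanly accepts vector envs as its sub-environments, and the flattening wrapper is only hinted at in the comments.

```python
import gym
from gym.vector import AsyncVectorEnv, SyncVectorEnv

n_workers = 4
envs_per_worker = 2  # "chunk" size: each worker steps this many envs sequentially

def make_env():
    return gym.make("CartPole-v1")

def make_chunk():
    # One SyncVectorEnv per worker process, stepping `envs_per_worker` envs in sequence.
    return SyncVectorEnv([make_env for _ in range(envs_per_worker)])

# Hypothetical end-to-end usage, assuming AsyncVectorEnv is fixed to accept vector
# envs as its sub-environments (the change discussed above):
env = AsyncVectorEnv([make_chunk for _ in range(n_workers)])
observations = env.reset()
# Observations then carry a leading (n_workers, envs_per_worker, ...) structure,
# which a VectorEnvWrapper could flatten to (n_workers * envs_per_worker, ...).
```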
from gym.vector.utils.spaces import _BaseGymSpaces

@singledispatch
That's really cool! It could be worth updating some of the utility functions in gym.vector with singledispatch (in another PR of course, this would be outside the scope of this one).
EDIT: I just saw #2093 that keeps track of that, thanks!
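As a rough illustration of the idea (not code from this PR; the utility below is hypothetical), a gym.vector helper dispatched on the space type with functools.singledispatch could look something like this:

```python
from functools import singledispatch

import numpy as np
from gym.spaces import Box, Discrete, Space, Tuple as TupleSpace

@singledispatch
def create_empty_sample(space: Space):
    """Hypothetical utility: build a zero-filled sample for a given space."""
    raise NotImplementedError(f"No handler registered for space type {type(space)}")

@create_empty_sample.register(Box)
def _(space: Box):
    return np.zeros(space.shape, dtype=space.dtype)

@create_empty_sample.register(Discrete)
def _(space: Discrete):
    return 0

@create_empty_sample.register(TupleSpace)
def _(space: TupleSpace):
    # Recurse into each sub-space; dispatch picks the right handler per type.
    return tuple(create_empty_sample(sub_space) for sub_space in space.spaces)
```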
# return self.viewer.isopen

def distribute(values: Sequence[T], n_groups: int) -> List[Sequence[T]]:
Utility functions could possibly be moved to somewhere in gym.vector.utils if you think this makes sense.
They also require tests (for distribute, chunk, unroll, fuse_and_batch, n_consecutive and zip_dicts).
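For example, a test for distribute might look something like the sketch below, assuming distribute splits a sequence into n_groups of as-equal-as-possible size (the [3, 3, 3, 3, 3, 2] behaviour described in the PR); the import path is a guess.

```python
import pytest

# Hypothetical import path; adjust to wherever `distribute` ends up living.
from gym.vector.batched_vector_env import distribute

@pytest.mark.parametrize("n_items,n_groups", [(17, 6), (8, 4), (5, 5), (3, 1)])
def test_distribute_spreads_items_evenly(n_items, n_groups):
    values = list(range(n_items))
    groups = distribute(values, n_groups)
    # The requested number of groups comes back.
    assert len(groups) == n_groups
    # Every item is placed exactly once.
    assert sorted(v for group in groups for v in group) == values
    # Group sizes differ by at most one.
    sizes = [len(group) for group in groups]
    assert max(sizes) - min(sizes) <= 1
```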
env = BatchedVectorEnv(env_fns, n_workers=n_workers)
env.seed(123)

assert env.single_observation_space[0].shape == (4,)
It would be more explicit to compare env.single_observation_space to its Tuple space directly, instead of checking individual elements. Maybe something like
env.single_observation_space == Tuple(Box(-high, high, dtype=np.float32), Discrete(1))
And same thing for env.observation_space below. I understand that the observation space for CartPole is a bit verbose in its definition of high, which might not be relevant in this test, but maybe this calls for a simpler environment (e.g. one of gym.envs.unittest).
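Concretely, a one-shot comparison along these lines might look like the snippet below (illustrative only: `env` is the one built in the test above, and the bounds are stand-ins for the test's real values).

```python
import numpy as np
from gym.spaces import Box, Discrete, Tuple

# CartPole-like bounds, used here purely as placeholders.
high = np.array([4.8, np.finfo(np.float32).max, 0.42, np.finfo(np.float32).max],
                dtype=np.float32)
expected = Tuple((Box(-high, high, dtype=np.float32), Discrete(1)))
assert env.single_observation_space == expected  # one assertion over the whole space
```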
I removed the hard-coded "CartPole" environment reference, and now compare against the single observation spaces and their batched counterparts.
gym/vector/batched_vector_env.py (outdated diff)
return groups

def unroll(chunks: Sequence[Sequence[T]], item_space: Space = None) -> List[T]:
This breaks if chunks is a dict (i.e. if the space is a Dict space). Here is a snippet:
import gym
import numpy as np
from collections import OrderedDict
from gym.spaces import Dict, Box, Discrete
from gym.vector import BatchedVectorEnv

class DictEnv(gym.Env):
    def __init__(self):
        super(DictEnv, self).__init__()
        self.observation_space = Dict(OrderedDict([
            ('position', Box(-2., 2., shape=(2,), dtype=np.float32)),
            ('velocity', Box(0., 1., shape=(2,), dtype=np.float32))
        ]))
        self.action_space = Discrete(2)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        observation = self.observation_space.sample()
        reward, done = 0., False
        return (observation, reward, done, {})

def make_env(seed):
    def _make_env():
        return DictEnv()
    return _make_env

env = BatchedVectorEnv([make_env(i) for i in range(10)], n_workers=4)
observations = env.reset()
Here obs_a (after the call to unroll) in reset is ['p', 'o', 's', 'i', 't', 'i', 'o', 'n', 'v', 'e', 'l', 'o', 'c', 'i', 't', 'y'].
EDIT: For reference, with env = AsyncVectorEnv([make_env(i) for i in range(4)]), observations is a dict with two arrays of size (4, 2).
Good catch! Thanks for pointing this out. I was indeed missing a test case for this, so I reused your snippet directly as a test, and this is now behaving correctly.
I'll probably also add other tests to cover cases where the actions are Tuples/Dicts/other unusual spaces as well.
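For what it's worth, here is a sketch of how that snippet could become an automated regression test (it reuses the DictEnv, make_env and BatchedVectorEnv definitions from the snippet above, and the expected shapes follow the AsyncVectorEnv reference behaviour quoted earlier, scaled to 10 environments):

```python
def test_batched_vector_env_dict_observations():
    # Assumes DictEnv, make_env and BatchedVectorEnv from the snippet above.
    env = BatchedVectorEnv([make_env(i) for i in range(10)], n_workers=4)
    observations = env.reset()
    # Mirrors AsyncVectorEnv's behaviour: a dict of batched arrays, one per key.
    assert isinstance(observations, dict)
    assert set(observations.keys()) == {"position", "velocity"}
    assert observations["position"].shape == (10, 2)
    assert observations["velocity"].shape == (10, 2)
```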
@lebrice I'm a maintainer now, so I can actually merge this. If you could please add detailed automated testing of this, I'd be happy to, pending the secondary approval of @benblack769.
Reviewer: @benblack769
Adds the following features, compared to using the vectorized Async and Sync VectorEnvs:
- Chunking: Running more than one environment per worker. This is done by passing `SyncVectorEnv`s as the env_fns to the `AsyncVectorEnv`.
- Flexible batch size: Supports any number of environments, irrespective of the number of workers or of CPUs. The number of environments will be spread out as equally as possible between the workers. For example, if you want to have a batch_size of 17 and n_workers is 6, then the number of environments per worker will be [3, 3, 3, 3, 3, 2].

Internally, this works by creating up to two AsyncVectorEnvs, env_a and env_b. If the number of envs (batch_size) isn't a multiple of the number of workers, then we create the second AsyncVectorEnv (env_b). In the first environment (env_a), each env will contain ceil(n_envs / n_workers) environments; if env_b is needed, then each of its envs will contain floor(n_envs / n_workers) environments. The observations/actions/rewards are reshaped to be (n_envs, *shape), i.e. they don't have an extra 'chunk' dimension.

Additionally, this adds the following change to the AsyncVectorEnv and SyncVectorEnv classes:
- When some environments have `done=True` while stepping, those environments are reset, as was done previously. Additionally, the final observation for those environments is placed in the info dict at key FINAL_STATE_KEY (currently 'final_state').

Signed-off-by: Fabrice Normandin <[email protected]>
@lebrice Do you have any plans for this PR?
Hey @pseudo-rnd-thoughts, yes I do, but I'm waiting on #2104 before I work more on this.
Adds the following features, compared to using the vectorized Async and Sync VectorEnvs:
- Chunking: Running more than one environment per worker. This is done by passing `SyncVectorEnv`s as the env_fns to the `AsyncVectorEnv`.
- Flexible batch size: Supports any number of environments, irrespective of the number of workers or of CPUs. The number of environments will be spread out as equally as possible between the workers. For example, if you want to have a batch_size of 17 and n_workers is 6, then the number of environments per worker will be [3, 3, 3, 3, 3, 2].

Internally, this works by creating up to two AsyncVectorEnvs, env_a and env_b. If the number of envs (batch_size) isn't a multiple of the number of workers, then we create the second AsyncVectorEnv (env_b). In the first environment (env_a), each env will contain ceil(n_envs / n_workers) environments; if env_b is needed, then each of its envs will contain floor(n_envs / n_workers) environments (see the sanity-check sketch below).

The observations/actions/rewards are reshaped to be (n_envs, *shape), i.e. they don't have an extra 'chunk' dimension.

Signed-off-by: Fabrice Normandin [email protected]
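To make the worker-distribution arithmetic above concrete, here is a quick sanity check of the 17-environments / 6-workers example (a sketch of the arithmetic only, not code from the PR):

```python
import math

n_envs, n_workers = 17, 6
chunk_a = math.ceil(n_envs / n_workers)   # envs per worker in env_a -> 3
chunk_b = math.floor(n_envs / n_workers)  # envs per worker in env_b -> 2
# Number of workers taking the larger chunk, chosen so the totals add up to n_envs:
n_workers_a = n_envs - n_workers * chunk_b  # -> 5
n_workers_b = n_workers - n_workers_a       # -> 1
sizes = [chunk_a] * n_workers_a + [chunk_b] * n_workers_b
assert sizes == [3, 3, 3, 3, 3, 2]
assert sum(sizes) == n_envs
```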