Optional gym wrapper #1007
Conversation
Hi, @Sohojoe, I know you put together your own gym wrapper for your MujocoUnity benchmarking work. Would you be willing to take a look at this one (or share yours) so we can get a sense of whether or not we might be missing certain relevant functions?
python/unityagents/unity_gym_env.py
Outdated
"if it is wrapped in a gym.")

class UnityEnv(gym.Env):
Suggest calling this class something more specific; out of context, it is very similar to UnityEnvironment.
Thanks for the wrapper, this is going to be very useful! Could you talk a little bit about the 'random actions for all other agents' bit? Is that something many users will want to happen during training?
@iandanforth I am not sure it will stay that way. The question is how to handle cases where people want to launch multi-agent environments. I see a few possible options. The first is that we only allow the wrapper to be used in single-agent environments, and provide an error otherwise. In that case we could perhaps provide two wrappers, a single-agent and a multi-agent one. Right now I am splitting the difference, and allowing multi-agent environments, but only enabling control for the first agent. Since the agents have to take some action, they take a random one. Not really ideal though.
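A rough sketch of that behavior, with illustrative names rather than the wrapper's actual internals: the caller drives agent 0 and every other agent samples a random action.

```
import numpy as np

class FirstAgentController:
    """Sketch only: expose a multi-agent Unity scene as a single-agent gym env
    by controlling agent 0 and letting the remaining agents act randomly."""

    def __init__(self, env, action_space, n_agents):
        self.env = env                  # underlying env expecting a list of actions
        self.action_space = action_space
        self.n_agents = n_agents

    def step(self, action):
        # First entry is the caller's action; the rest are random placeholders.
        actions = [np.asarray(action)]
        actions += [self.action_space.sample() for _ in range(self.n_agents - 1)]
        observations, rewards, dones, info = self.env.step(actions)
        # Only the first agent's experience is surfaced through the gym API.
        return observations[0], rewards[0], dones[0], info
```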
Current Multi-Agent Gym environments use a list of numpy arrays as input to step(). A list of numpy arrays is used for Multi-Agent environments because different agents aren't guaranteed to have action arrays of the same dimension as each other. Single-Agent syntax is a numpy array; Multi-Agent syntax is a list of numpy arrays. Use this difference in syntax to automatically distinguish between when the user intends to have a single agent that takes a multi-dimensional action (e.g. controlling multiple limbs of an ant) vs when the user intends to pass in action(s) for each agent. The only edge case I can think of would be when a single agent takes continuous & discrete actions simultaneously, in which case a list of numpy arrays would be the input of a single agent (unless the discrete actions are just converted to floats so that they can be passed along with the continuous actions in a single float32 numpy array).
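A minimal illustration of the syntax distinction proposed here (a hypothetical helper, not part of the PR): step() could branch on the action's type.

```
import numpy as np

def normalize_action(action):
    """A single numpy array means one agent with a (possibly multi-dimensional)
    action; a list of numpy arrays means one action per agent."""
    if isinstance(action, np.ndarray):
        return [action]                            # single-agent call
    if isinstance(action, (list, tuple)):
        return [np.asarray(a) for a in action]     # multi-agent call, one entry per agent
    raise TypeError("Expected a numpy array or a list of numpy arrays.")
```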
@awjuliani were you able to get the trained DQN model back into Unity? I got DDQN to train (see below) but then remembered that it doesn't save the model and that I had to extend their code, which was a pain to maintain. This issue discusses modifying DDPG to add saving / loading: openai/baselines#162 - I'm not sure how to make it save in the .bytes format. So it looks like to get DDPG support we will need to pretty much make a copy of the baselines files to handle those two changes. Regarding multi-agent - if by this we mean multiple instances of the same brain... In baselines, this is done via MPI, whereby it spins up multiple gym environments and multiple python workers. I've not tried MPI as it doesn't natively support Windows. Steps I used to train an ml-agent with baselines DDPG:
from unityagents import UnityEnvironment
from gym_wrapper import GymWrapper

# Create envs. -- changed from:
#   env = gym.make(env_id)
# to:
raw_env = UnityEnvironment(env_id)
env = GymWrapper(raw_env)
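For context, a quick smoke test of the wrapped environment might look like the following; the GymWrapper name and constructor follow the snippet above and are illustrative rather than the final API:

```
from unityagents import UnityEnvironment
from gym_wrapper import GymWrapper  # illustrative module/class names from the snippet above

# Path to a built Unity environment binary (example).
raw_env = UnityEnvironment("3DBall")
env = GymWrapper(raw_env)

obs = env.reset()
for _ in range(100):
    # Random actions, just to confirm the gym-style step loop works end to end.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```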
Do you think you'd support the GoalEnv spec: https://github.com/openai/gym/blob/master/gym/core.py#L154
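For reference, the GoalEnv spec linked above expects dictionary observations with observation, achieved_goal, and desired_goal keys, plus a compute_reward method; a rough sketch of that contract, with illustrative sizes and threshold:

```
import numpy as np
from gym import spaces

# Shape of the observation space that gym.GoalEnv expects (example sizes).
goal_observation_space = spaces.Dict({
    "observation":   spaces.Box(-np.inf, np.inf, shape=(8,), dtype=np.float32),
    "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    "desired_goal":  spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
})

def compute_reward(achieved_goal, desired_goal, info=None):
    # Sparse goal-reaching reward, the usual pattern for GoalEnv / HER setups.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -(distance > 0.05).astype(np.float32)
```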
Great to see that this is in the works! OpenAI Gym has tuple and dictionary spaces that allow for nested observations and actions. It would be great if multiple modalities (coordinates, image, etc.) would come as a dictionary. I've worked with both Gym and dm_control, and having the modalities separated in most environments is super useful for research. There are also env wrappers in Gym to flatten dict observations.

For multi-agent envs, the tuple space also seems worth considering. After all, the idea of Gym is to provide a single interface, and I think it's capable of supporting (synchronous) multi-agent envs. Structured like this, people could even train normal algorithms on multi-agent envs without having to change anything. They just need to flatten and concatenate the observation elements to feed them into their network, which is needed anyway.
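As an illustration of the Dict/Tuple spaces mentioned here (shapes are made up and not tied to any particular ML-Agents environment):

```
import numpy as np
from gym import spaces

# Separate modalities exposed as a dictionary observation.
multimodal_obs_space = spaces.Dict({
    "vector": spaces.Box(-np.inf, np.inf, shape=(17,), dtype=np.float32),
    "image":  spaces.Box(0, 255, shape=(84, 84, 3), dtype=np.uint8),
})

# A synchronous two-agent environment expressed with Tuple spaces.
multiagent_obs_space = spaces.Tuple((multimodal_obs_space, multimodal_obs_space))
multiagent_act_space = spaces.Tuple((
    spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32),   # continuous-control agent
    spaces.Discrete(5),                                     # discrete-action agent
))
```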
Thanks for the feedback, everyone.

@Sohojoe Thanks for checking this out. I tried it myself today, and can verify that 3DBall trains using DDPG. Going to go through the other main baselines in the next couple of days.

@machinaut I definitely would love to add this interface too. We have on our roadmap the ability to expose a goal vector via the API, so once we add that, we can also make it compatible with GoalEnv.

@danijar Thanks for pointing this out. In the couple of weeks I have spent diving into the gym ecosystem, I have been happily surprised by how much work has been done to extend it in all sorts of interesting ways. I am very open to the possibility of getting to the point that our own internal work takes advantage of some of these wrappers. One thing I want to be conscious of is ensuring compatibility (partially why I asked for feedback on Twitter). If we decide to go with a specific way of doing multi-agent (where agents have different observation/action spaces), for example, we want to ensure that it will be something people can plug into other algorithms easily. If there are standards, though, we are happy to use them 😃
It might also be worth taking a look at OpenAI's multi-agent envs for the competitive self-play paper. It seems like they also take in a tuple of actions and return a tuple of observations. They do deviate a bit from the standard Gym interface in that the rewards, done flags, and info dicts become tuples as well.
@@ -0,0 +1,259 @@
{
Since the getting-started-gym.ipynb is specific to the gym-unity folder, does it make more sense to move it under the gym-unity folder? Another option would be to move the gym-unity folder under the python folder.
Also, within getting-started-gym.ipynb and getting-started.ipynb the default env_name points to "../envs/3DBall" and "../envs/GridWorld", but the parent of the python/notebooks/ folder currently doesn't contain an envs folder, which makes it confusing.
I think in the re-org we will be doing, we will move the notebooks to the top level.
I will be more specific about the need for an envs folder.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"env_name = \"../envs/GridWorld\" # Name of the Unity environment binary to launch\n", |
Maybe mention here that for the "Gym Wrapper Basics", only environments with a single agent are supported.
# Conflicts:
#   python/tests/mock_communicator.py
from setuptools import setup, Command, find_packages

setup(name='gym_unity',
      version='0.1.0',
How is this version number determined?
python/gym/README.md
Outdated
A common way in which researchers interact with simulation environments is via wrapper provided by OpenAI called `gym`. Here we provide a gym wrapper, and instructions for using it with existing research projects which utilize gyms.

## `unity_gym.py`
First draft on a gym wrapper for ML-Agents. To launch an environmnent use :
s/enviromnent/environment
gym-unity/Readme.md
Outdated
The gym wrapper can be installed using:

```
pip install gym-unity
```
s/gym-unity/gym_unity/
Fixed.
gym-unity/Readme.md
Outdated
```
pip install gym-unity
```

or by running the following from the `/gym-unity` directory of the repository:
s/gym-unity/gym_unity/
This is the correct folder name.
Adds an optional gym wrapper, UnityEnv, to use as a python interface to Unity environments. Current thinking is that our main python interface, UnityEnvironment, will remain fully featured, and the focus of the gym wrapper will be on maintaining compatibility with pre-existing algorithms written around OpenAI gym. As such, complex observation spaces which combine multiple visual observations, or visual and vector observations, will not be available out of the box. Multi-agent is supported through the use of lists, as done in: https://github.com/openai/multiagent-particle-envs
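A minimal usage sketch under these assumptions (the gym_unity.envs module path is inferred from the setup.py above, and the constructor argument is illustrative; see the wrapper source for the final signature):

```
from gym_unity.envs import UnityEnv  # module path assumed from this PR's package layout

# Wrap a built Unity environment binary in the gym interface (example path).
env = UnityEnv("envs/3DBall")

obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```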
Limitations:
To-Do: