Optional gym wrapper #1007
Conversation
Hi, @Sohojoe, I know you put together your own gym wrapper for your MujocoUnity benchmarking work. Would you be willing to take a look at this one (or share yours) so we can get a sense of whether or not we might be missing certain relevant functions?
python/unityagents/unity_gym_env.py
Outdated
"if it is wrapped in a gym.")

class UnityEnv(gym.Env):
Suggest calling this class something more specific; out of context, it is very similar to UnityEnvironment.
Thanks for the wrapper, this is going to be very useful! Could you talk a little bit about the 'random actions for all other agents' bit? Is that something many users will want to happen during training?
@iandanforth I am not sure it will stay that way. The question is how to handle cases where people want to launch multi-agent environments. I see a few possible options. The first is that we only allow the wrapper to be used in single-agent environments, and provide an error otherwise. In that case we could perhaps provide two wrappers, a single-agent and a multi-agent one. Right now I am splitting the difference, and allowing multi-agent environments, but only enabling control for the first agent. Since the agents have to take some action, they take a random one. Not really ideal though.
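A rough sketch of that behavior, with illustrative names rather than the wrapper's actual internals: the caller drives agent 0 and every other agent samples a random action.

```
import numpy as np

class FirstAgentController:
    """Sketch only: expose a multi-agent Unity scene as a single-agent gym env
    by controlling agent 0 and letting the remaining agents act randomly."""

    def __init__(self, env, action_space, n_agents):
        self.env = env                  # underlying env expecting a list of actions
        self.action_space = action_space
        self.n_agents = n_agents

    def step(self, action):
        # First entry is the caller's action; the rest are random placeholders.
        actions = [np.asarray(action)]
        actions += [self.action_space.sample() for _ in range(self.n_agents - 1)]
        observations, rewards, dones, info = self.env.step(actions)
        # Only the first agent's experience is surfaced through the gym API.
        return observations[0], rewards[0], dones[0], info
```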
Current Multi-Agent Gym environments use a list of numpy arrays as input to step(). A list of numpy arrays is used for Multi-Agent environments because different agents aren't guaranteed to have action arrays of the same dimension as each other. Single-Agent syntax is a numpy array; Multi-Agent syntax is a list of numpy arrays. Use this difference in syntax to automatically distinguish between when the user intends to have a single agent that takes a multi-dimensional action (e.g. controlling multiple limbs of an ant) vs when the user intends to pass in action(s) for each agent. The only edge case I can think of would be when a single agent takes continuous & discrete actions simultaneously, in which case a list of numpy arrays would be the input of a single agent (unless the discrete actions are just converted to floats so that they can be passed along with the continuous actions in a single float32 numpy array).
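A minimal illustration of the syntax distinction proposed here (a hypothetical helper, not part of the PR): step() could branch on the action's type.

```
import numpy as np

def normalize_action(action):
    """A single numpy array means one agent with a (possibly multi-dimensional)
    action; a list of numpy arrays means one action per agent."""
    if isinstance(action, np.ndarray):
        return [action]                            # single-agent call
    if isinstance(action, (list, tuple)):
        return [np.asarray(a) for a in action]     # multi-agent call, one entry per agent
    raise TypeError("Expected a numpy array or a list of numpy arrays.")
```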
@awjuliani were you able to get the trained DQN model back into Unity? I got DDQN to train (see below) but then remembered that it doesn't save the model and that I had to extend their code, which was a pain to maintain. This issue discusses modifying DDPG to add saving / loading: openai/baselines#162 - I'm not sure how to make it save in the .bytes format. So it looks like to get DDPG support we will need to pretty much make a copy of the baselines files to handle those two changes. Regarding multi-agent - if by this we mean multiple instances of the same brain... In baselines, this is done via MPI, whereby it spins up multiple gym environments and multiple python workers. I've not tried MPI as it doesn't natively support Windows. Steps I used to train an ml-agent with baselines DDPG:
from unityagents import UnityEnvironment
from gym_wrapper import GymWrapper

# Create envs. -- changed from:
#   env = gym.make(env_id)
# to:
raw_env = UnityEnvironment(env_id)
env = GymWrapper(raw_env)
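For context, a quick smoke test of the wrapped environment might look like the following; the GymWrapper name and constructor follow the snippet above and are illustrative rather than the final API:

```
from unityagents import UnityEnvironment
from gym_wrapper import GymWrapper  # illustrative module/class names from the snippet above

# Path to a built Unity environment binary (example).
raw_env = UnityEnvironment("3DBall")
env = GymWrapper(raw_env)

obs = env.reset()
for _ in range(100):
    # Random actions, just to confirm the gym-style step loop works end to end.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```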
Do you think you'd support the GoalEnv spec: https://github.com/openai/gym/blob/master/gym/core.py#L154
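For reference, the GoalEnv spec linked above expects dictionary observations with observation, achieved_goal, and desired_goal keys, plus a compute_reward method; a rough sketch of that contract, with illustrative sizes and threshold:

```
import numpy as np
from gym import spaces

# Shape of the observation space that gym.GoalEnv expects (example sizes).
goal_observation_space = spaces.Dict({
    "observation":   spaces.Box(-np.inf, np.inf, shape=(8,), dtype=np.float32),
    "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    "desired_goal":  spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
})

def compute_reward(achieved_goal, desired_goal, info=None):
    # Sparse goal-reaching reward, the usual pattern for GoalEnv / HER setups.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -(distance > 0.05).astype(np.float32)
```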
Great to see that this is in the works! OpenAI Gym has tuple and dictionary spaces that allow for nested observations and actions. It would be great if multiple modalities (coordinates, image, etc.) would come as a dictionary. I've worked with both Gym and dm_control, and having the modalities separated in most environments is super useful for research. There are also env wrappers in Gym to flatten dict observations.

For multi-agent envs, the tuple space also seems worth considering. After all, the idea of Gym is to provide a single interface, and I think it's capable of supporting (synchronous) multi-agent envs. Structured like this, people could even train normal algorithms on multi-agent envs without having to change anything. They just need to flatten and concatenate the observation elements to feed them into their network, which is needed anyway.
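As an illustration of the Dict/Tuple spaces mentioned here (shapes are made up and not tied to any particular ML-Agents environment):

```
import numpy as np
from gym import spaces

# Separate modalities exposed as a dictionary observation.
multimodal_obs_space = spaces.Dict({
    "vector": spaces.Box(-np.inf, np.inf, shape=(17,), dtype=np.float32),
    "image":  spaces.Box(0, 255, shape=(84, 84, 3), dtype=np.uint8),
})

# A synchronous two-agent environment expressed with Tuple spaces.
multiagent_obs_space = spaces.Tuple((multimodal_obs_space, multimodal_obs_space))
multiagent_act_space = spaces.Tuple((
    spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32),   # continuous-control agent
    spaces.Discrete(5),                                     # discrete-action agent
))
```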
Thanks for the feedback, everyone.

@Sohojoe Thanks for checking this out. I tried it myself today, and can verify that 3DBall trains using DDPG. Going to go through the other main baselines in the next couple of days.

@machinaut I definitely would love to add this interface too. We have on our roadmap the ability to expose a goal vector via the API, so once we add that, we can also make it compatible with GoalEnv.

@danijar Thanks for pointing this out. In the couple of weeks I have spent diving into the gym ecosystem, I have been happily surprised by how much work has been done to extend it in all sorts of interesting ways. I am very open to the possibility of getting to the point that our own internal work takes advantage of some of these wrappers. One thing I want to be conscious of is ensuring compatibility (partially why I asked for feedback on Twitter). If we decide to go with a specific way of doing multi-agent (where agents have different observation/action spaces), for example, we want to ensure that it will be something people can plug into other algorithms easily. If there are standards, though, we are happy to use them 😃
It might also be worth taking a look at OpenAI's multi-agent envs for the competitive self-play paper. It seems like they also take in a tuple of actions and return a tuple of observations. They do deviate a bit from the standard Gym interface in that the rewards, done flags, and info dicts become tuples as well.
@@ -0,0 +1,259 @@
{
Since the getting-started-gym.ipynb is specific to the gym-unity folder, does it make more sense to move it under the gym-unity folder? Another option would be to move the gym-unity folder under the python folder.
Also, within getting-started-gym.ipynb and getting-started.ipynb the default env_name points to "../envs/3DBall" and "../envs/GridWorld", but the parent of the python/notebooks/ folder currently doesn't contain an envs folder, which makes it confusing.
I think in the re-org we will be doing, we will move the notebooks to the top level.
I will be more specific about the need for an envs folder.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"env_name = \"../envs/GridWorld\" # Name of the Unity environment binary to launch\n", |
Maybe mention here that for the "Gym Wrapper Basics", only environments with a single agent are supported.
# Conflicts:
#   python/tests/mock_communicator.py
from setuptools import setup, Command, find_packages

setup(name='gym_unity',
      version='0.1.0',
How is this version number determined?
python/gym/README.md
Outdated
A common way in which researchers interact with simulation environments is via wrapper provided by OpenAI called `gym`. Here we provide a gym wrapper, and instructions for using it with existing research projects which utilize gyms.

## `unity_gym.py`
First draft on a gym wrapper for ML-Agents. To launch an environmnent use :
s/enviromnent/environment
gym-unity/Readme.md
Outdated
The gym wrapper can be installed using:

```
pip install gym-unity
```
s/gym-unity/gym_unity/
Fixed.
gym-unity/Readme.md
Outdated
```
pip install gym-unity
```

or by running the following from the `/gym-unity` directory of the repository:
s/gym-unity/gym_unity/
This is the correct folder name.
Adds an optional gym wrapper, UnityEnv, to use as a python interface to Unity environments. Current thinking is that our main python interface, UnityEnvironment, will remain fully featured, and the focus of the gym wrapper will be on maintaining compatibility with pre-existing algorithms written around OpenAI gym. As such, complex observation spaces which combine multiple visual observations, or visual and vector observations, will not be available out of the box. Multi-agent is supported through the use of lists, as done in: https://github.com/openai/multiagent-particle-envs
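A minimal usage sketch under these assumptions (the gym_unity.envs module path is inferred from the setup.py above, and the constructor argument is illustrative; see the wrapper source for the final signature):

```
from gym_unity.envs import UnityEnv  # module path assumed from this PR's package layout

# Wrap a built Unity environment binary in the gym interface (example path).
env = UnityEnv("envs/3DBall")

obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```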
Limitations:
To-Do: