
Optional gym wrapper #1007

Merged
awjuliani merged 43 commits into develop from develop-gym on Aug 7, 2018
Conversation

@awjuliani (Contributor) commented Jul 24, 2018

Adds an optional gym wrapper, UnityEnv, to use as a python interface to Unity environments.

Current thinking is that our main python interface UnityEnvironment will remain fully featured, and the focus of the gym wrapper will be on maintaining compatibility with pre-existing algorithms written around OpenAI gym. As such, complex observation spaces that combine multiple visual observations, or visual and vector observations, will not be available out of the box.

Multi-agent is supported through the use of lists, as done in: https://github.com/openai/multiagent-particle-envs
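
A rough usage sketch of what the wrapper described above could look like from Python (the module path, constructor arguments, and environment names here are illustrative assumptions, not the final API):

```
from gym_unity.envs import UnityEnv  # module path is an assumption

# Single-agent: observations and actions are plain numpy arrays.
env = UnityEnv("envs/3DBall", worker_id=0)  # argument names are illustrative
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())

# Multi-agent (following the particle-envs convention linked above):
# observations, rewards, and dones would come back as lists, one per agent.
multi_env = UnityEnv("envs/3DBallHard", worker_id=1, multiagent=True)
obs_list = multi_env.reset()
actions = [multi_env.action_space.sample() for _ in range(len(obs_list))]
obs_list, rewards, dones, info = multi_env.step(actions)
```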

Limitations:

  • Multi-brain environments are not supported; opening an environment with multiple brains raises an error.
  • Multiple visual observations or combined visual/vector observations not supported.
  • Gym Environment registration not supported. This is because it is difficult to pre-register (and enforce) a global environment binary location for each separate environment.

To-Do:

  • Test against DQN Baseline
  • Test against DDPG Baselines
  • Test against PPO Baseline
  • Test against TRPO Baseline
  • Test against A2C Baseline
  • Write Unit Tests
  • Add documentation on using with Baselines repo

@awjuliani (Contributor, Author) commented:

Hi, @Sohojoe, I know you put together your own gym wrapper for your MujocoUnity benchmarking work. Would you be willing to take a look at this one (or share yours) so we can get a sense of whether or not we might be missing certain relevant functions?

"if it is wrapped in a gym.")


class UnityEnv(gym.Env):

Contributor:

Suggest calling this class something more specific; out of context, it is very similar to UnityEnvironment.

@awjuliani changed the title from "Add optional gym interface" to "Optional gym interface" on Jul 26, 2018
@iandanforth commented:

Thanks for the wrapper; this is going to be very useful! Could you talk a little bit about the 'random actions for all other agents' bit? Is that something many users will want to happen during step()?

@awjuliani (Contributor, Author) commented:

@iandanforth I am not sure it will stay that way. The question is how to handle cases where people want to launch multi-agent environments. I see a few possible options. The first is to only allow the wrapper to be used in single-agent environments, and raise an error otherwise. In that case we could perhaps provide two wrappers, a single-agent one and a multi-agent one. Right now I am splitting the difference: allowing multi-agent environments, but only enabling control of the first agent. Since the other agents have to take some action, they take a random one. Not really ideal though.
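
To illustrate the "control the first agent, random actions for the rest" behavior described above, a minimal sketch (this is not the wrapper's actual code; the helper and its arguments are made up for illustration):

```
import numpy as np

def build_actions(controlled_action, n_agents, action_size):
    # Random continuous actions for every agent except the first, which
    # receives the action supplied by the caller of step().
    actions = np.random.randn(n_agents, action_size)
    actions[0] = controlled_action
    return actions
```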

@ethancaballero commented Jul 26, 2018

Current Multi-Agent Gym environments use a list of numpy arrays as input to step():
https://github.com/openai/maddpg/blob/master/experiments/train.py#L112-L114

A list of numpy arrays is used for multi-agent environments because different agents aren't guaranteed to have action arrays of the same dimension as each other.

Single-agent syntax is a numpy array; multi-agent syntax is a list of numpy arrays.

This difference in syntax can be used to automatically distinguish between the case where the user intends a single agent with a multidimensional action (e.g. controlling multiple limbs of an ant) and the case where the user intends to pass in an action for each agent.

The only edge case I can think of is when a single agent takes continuous and discrete actions simultaneously, in which case a list of numpy arrays would be the input for a single agent (unless the discrete actions are just converted to floats so that they can be passed along with the continuous actions in a single float32 numpy array).
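
A minimal sketch of the syntax-based dispatch suggested above, assuming step() receives either a single numpy array or a list of them (the helper name is made up):

```
import numpy as np

def classify_action(action):
    # A bare numpy array means a single agent (possibly with a
    # multidimensional action); a list of numpy arrays means one action
    # per agent.
    if isinstance(action, np.ndarray):
        return "single-agent", [action]
    if isinstance(action, (list, tuple)) and all(isinstance(a, np.ndarray) for a in action):
        return "multi-agent", list(action)
    raise ValueError("action must be a numpy array or a list of numpy arrays")
```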

@Sohojoe (Contributor) commented Jul 27, 2018

@awjuliani were you able to get the trained DQN model back into Unity?

I got DDQN to train (see below) but then remembered that it doesn't save the model and that I had to extend their code which was a pain to maintain.

This issue talks about modifying DDPG to add saving/loading: openai/baselines#162 - I'm not sure how to make it save in the .bytes format.

So it looks like, to get DDPG support, we will pretty much need to make a copy of the baselines files to handle those two changes.

Regarding multi-agent - if by this we mean multiple instances of the same brain... In baselines, this is done via MPI, whereby it spins up multiple gym environments and multiple python workers. I've not tried MPI as it doesn't natively support Windows.


Steps I used to train an ml-agent with baselines DDPG:

  1. Copy gym_wrapper.py to the python folder.
  2. Create a copy of baselines.ddpg.main.py called run_ddpg.py in the python folder.
  3. In run_ddpg.py, add:

```
from unityagents import UnityEnvironment
from gym_wrapper import GymWrapper
```

  4. In run_ddpg.py, change:

```
# Create envs.
env = gym.make(env_id)
```

to:

```
# Create envs.
raw_env = UnityEnvironment(env_id)
env = GymWrapper(raw_env)
```

  5. Run:

```
python run_ddpg.py MyMlAgentsEnv -params
```

@machinaut commented:

Do you think you'd support the GoalEnv interface as well? This would make it compatible with the Hindsight Experience Replay baseline implementation. We've found that it helps with sample efficiency in simulated robotics tasks, so hopefully worth considering!

GoalEnv spec: https://github.com/openai/gym/blob/master/gym/core.py#L154
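
For context, the GoalEnv spec linked above expects a dictionary observation with "observation", "achieved_goal", and "desired_goal" keys plus a compute_reward method. The sketch below only illustrates that layout; ML-Agents does not currently expose a goal vector, so the goal entries and sizes here are placeholders:

```
import numpy as np
import gym
from gym import spaces

class UnityGoalEnvSketch(gym.GoalEnv):
    # Illustration of the observation layout GoalEnv expects; not real wrapper code.
    def __init__(self, obs_size=8, goal_size=3):
        self.observation_space = spaces.Dict({
            "observation": spaces.Box(-np.inf, np.inf, shape=(obs_size,), dtype=np.float32),
            "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(goal_size,), dtype=np.float32),
            "desired_goal": spaces.Box(-np.inf, np.inf, shape=(goal_size,), dtype=np.float32),
        })
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse goal-distance reward, in the style of the HER baselines.
        distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
        return -(distance > 0.05).astype(np.float32)
```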

@danijar commented Jul 27, 2018

Great to see that this is in the works! OpenAI Gym has tuple and dictionary spaces that allow for nested observations and actions. It would be great if multiple modalities (coordinates, image, etc.) came as a dictionary. I've worked with both Gym and dm_control, and having the modalities separated in most environments is super useful for research. There are also env wrappers in Gym to flatten dict observations.

For multi-agent envs, the tuple space also seems worth considering. After all, the idea of Gym is to provide a single interface, and I think it's capable of supporting (synchronous) multi-agent envs. Structured like this, people could even train normal algorithms on multi-agent envs without having to change anything. They just need to flatten and concatenate the observation elements to feed them into their network, which is needed anyway.
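
As a concrete illustration of the above (shapes are made up), a dictionary space for mixed modalities and a tuple space over it for synchronous multi-agent observations could look like this; standard algorithms would then flatten and concatenate these before feeding their networks:

```
import numpy as np
from gym import spaces

# One agent's observation: a vector modality plus a visual modality.
per_agent_obs = spaces.Dict({
    "vector": spaces.Box(low=-np.inf, high=np.inf, shape=(14,), dtype=np.float32),
    "visual": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
})

# Synchronous multi-agent: one entry per agent in a tuple space.
n_agents = 2
multi_agent_obs = spaces.Tuple(tuple(per_agent_obs for _ in range(n_agents)))
```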

@awjuliani (Contributor, Author) commented:

Thanks for the feedback, everyone.

@Sohojoe Thanks for checking this out. I tried it myself today, and can verify that 3DBall trains using DDPG. Going to go through the other main baselines in the next couple days.

@machinaut I definitely would love to add this interface too. We have it on our roadmap to add the ability to expose a goal vector via the API, so once we add that, we can also make it compatible with GoalEnv. I can imagine a lot of nice, simple example environments we could build to show it off as well.

@danijar Thanks for pointing this out. In the couple of weeks I have spent diving into the gym ecosystem, I have been happily surprised by how much work has been done to extend it in all sorts of interesting ways. I am very open to the possibility of getting to the point where our own internal work takes advantage of some of these wrappers. One thing I want to be conscious of is ensuring compatibility (partly why I asked for feedback on Twitter). If we decide to go with a specific way of doing multi-agent (where agents have different observation/action spaces), for example, we want to ensure that it will be something people can plug into other algorithms easily. If there are standards, though, we are happy to use them 😃

@danijar commented Jul 28, 2018

It might also be worth taking a look at OpenAI's multi envs for the competitive self-play paper. It seems like they also take in a tuple of actions and return a tuple of observations. They do deviate a bit from the standard Gym interface in that the rewards, done flags, and info dicts become tuples as well.

@awjuliani changed the title from "Optional gym interface" to "Optional gym wrapper" on Jul 30, 2018
@awjuliani requested a review from mmattar on July 31, 2018 00:43
@@ -0,0 +1,259 @@
{

Contributor:

Since the getting-started-gym.ipynb is specific to the gym-unity folder, does it make more sense to move it under the gym-unity folder?

Another option is to move the gym-unity folder under the python folder.

Also, within getting-started-gym.ipynb and getting-started.ipynb, the default env_name points to "../envs/3DBall" and "../envs/GridWorld", but the parent of the python/notebooks/ folder currently doesn't contain an envs folder, which makes this confusing.

Contributor Author:

I think in the re-org we will be doing, we will move the notebooks to the top level.

I will be more specific about the need for an envs folder.

"metadata": {},
"outputs": [],
"source": [
"env_name = \"../envs/GridWorld\" # Name of the Unity environment binary to launch\n",

Contributor:

Maybe mention here that for the "Gym Wrapper Basics" section, only environments with 1 agent are supported.

from setuptools import setup, Command, find_packages

setup(name='gym_unity',
version='0.1.0',

Contributor:

How is this version number determined?

A common way in which researchers interact with simulation environments is via wrapper provided by OpenAI called `gym`. Here we provide a gym wrapper, and instructions for using it with existing research projects which utilize gyms.

## `unity_gym.py`
First draft on a gym wrapper for ML-Agents. To launch an environmnent use :

Contributor:

s/environmnent/environment

The gym wrapper can be installed using:

```
pip install gym-unity
```

Contributor:

s/gym-unity to s/gym_unity

Contributor Author:

Fixed.


or by running the following from the `/gym-unity` directory of the repository:

Contributor:

s/gym-unity to s/gym_unity

Contributor Author:

This is the correct folder name.

@awjuliani merged commit 71b0085 into develop on Aug 7, 2018
@awjuliani deleted the develop-gym branch on August 7, 2018 23:01
github-actions bot locked as resolved and limited conversation to collaborators on May 19, 2021