aklein1995/heterogeneous_agents_curiosity_vizdoom

Collaborative Training of Heterogeneous Reinforcement Learning Agents in Environments with Sparse Rewards: What and When to Share?

In this paper we analyze how two independent agents deployed in the same environment (but in separate instances of it) can learn faster by training collaboratively. To that end, both agents share knowledge online, without any prior expertise.

The key point of this work is that the agents are heterogeneous, that is, they have different action spaces, which gives one of the agents access to a larger state space. More importantly, that larger state space hides better optimal solutions. Consequently, negative transfer may arise when the agents share knowledge.

The study is carried out in a very sparse-reward scenario, a modification of the My Way Home scenario, where the agent is spawned in the bottom room and has to reach the goal, where the vest is located:

Environment

A single reward of +1 is given upon reaching the goal; the reward is 0 otherwise.

The state space consists of first-person-view images, which we convert to 42x42 grayscale observations.
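The preprocessing step above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `preprocess`, the RGB input layout (height x width x 3), the luminance weights, the nearest-neighbour downsampling, and the scaling to [0, 1] are all choices made here and are not taken from the repository, which may use a different resizing method.

```python
import numpy as np


def preprocess(frame, size=42):
    """Convert a raw frame to a size x size grayscale array in [0, 1].

    Hypothetical helper: assumes `frame` is either HxWx3 RGB or HxW gray,
    with pixel values in the 0..255 range.
    """
    frame = np.asarray(frame, dtype=np.float32)
    if frame.ndim == 3:
        # Standard luminance weights (an assumption, not the repo's choice).
        gray = frame @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    else:
        gray = frame
    height, width = gray.shape
    # Nearest-neighbour sampling: pick `size` evenly spaced rows and columns.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    small = gray[np.ix_(rows, cols)]
    return small / 255.0


# Example with a synthetic 120x160 RGB frame.
dummy = np.zeros((120, 160, 3), dtype=np.uint8)
obs = preprocess(dummy)
print(obs.shape)
```

Nearest-neighbour sampling keeps the example dependency-free apart from NumPy; an image library with area interpolation would usually give smoother observations.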

In this modification, a corridor (which is in fact a shortcut) has been added, although it is blocked by a closed door (marked with a white circle on the map).

Given this information, the agent has to learn how to best select actions. Four discrete actions are available for one of the agents, W1, corresponding to:

  • 0 - do nothing.
  • 1 - move forward.
  • 2 - turn left.
  • 3 - turn right.

Additionally, the other agent (W0) has an extra action that allows it to open doors (although, when no door needs opening, it offers no advantage over doing nothing):

  • 4 - open door.
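The two action spaces listed above can be sketched as follows. This is a minimal illustration rather than the repository's code: the names `Action`, `ACTIONS_W0`, `ACTIONS_W1`, and `is_executable_by` are hypothetical, and the check shown is just one simple way to flag shared trajectories that the four-action agent could not reproduce, not the sharing criterion studied in the paper.

```python
from enum import IntEnum


class Action(IntEnum):
    NOTHING = 0
    FORWARD = 1
    TURN_LEFT = 2
    TURN_RIGHT = 3
    OPEN_DOOR = 4  # only available to W0


# W1 has the four basic actions; W0 additionally has OPEN_DOOR.
ACTIONS_W1 = frozenset(
    {Action.NOTHING, Action.FORWARD, Action.TURN_LEFT, Action.TURN_RIGHT}
)
ACTIONS_W0 = ACTIONS_W1 | {Action.OPEN_DOOR}


def is_executable_by(trajectory, action_space):
    """Return True if every action in the trajectory is in action_space."""
    return all(Action(a) in action_space for a in trajectory)


# Made-up action sequences, for illustration only.
w0_shortcut = [1, 1, 4, 1]      # uses the door, so W1 cannot replay it
w1_long_path = [1, 2, 1, 3, 1]  # uses only shared actions

print(is_executable_by(w0_shortcut, ACTIONS_W1))   # False
print(is_executable_by(w1_long_path, ACTIONS_W0))  # True
```

The asymmetry is the point: everything W1 does is valid for W0, but not the other way around, which is one reason sharing can lead to negative transfer.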

Therefore, this work addresses how to accelerate the training of both agents, taking into account that they have different optimal solutions and that some shared trajectories may degrade performance.

Requirements

  • Python 3.6
  • ViZDoom (check its dependencies)

Dependencies

To set up the Python environment and make sure all required packages are installed, install the dependencies listed in requirements.txt:

pip3 install -r requirements.txt

Basic Usage

All parameters are set in the config.conf file.

By default, only the analyzed environment is referenced. To use other .wad files, place them in the wads folder and reference them in the config.conf file.

Note that the .wad is loaded from the .cfg file, where you can also modify other environment-related parameters (e.g. the available actions).
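For orientation, the relationship between the .cfg file and the .wad can be sketched with the ViZDoom Python API. This is a hedged example and not the repository's loading code: the helper name `make_game` and the paths in the usage comment are hypothetical placeholders, and overriding the scenario path from Python is optional, since the .cfg normally names the .wad itself.

```python
def make_game(cfg_path, wad_path=None):
    """Create and initialise a ViZDoom game from a .cfg file.

    Hypothetical helper. `wad_path` is only needed when overriding the
    scenario that the .cfg file already points to.
    """
    # Imported lazily so this module can be loaded without ViZDoom installed.
    import vizdoom as vzd

    game = vzd.DoomGame()
    game.load_config(cfg_path)  # reads the scenario path, buttons, rewards, ...
    if wad_path is not None:
        game.set_doom_scenario_path(wad_path)
    game.set_window_visible(False)  # headless, as is typical for training
    game.init()
    return game


# Usage (placeholder paths, adjust to your own files):
# game = make_game("path/to/scenario.cfg", "wads/your_scenario.wad")
# game.new_episode()
# game.close()
```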

Results

The final policy of each worker, following its respective optimal path, is shown below:

Agent capable of opening the door (W0)

w0 gif

Normal Agent (W1)

w1 gif
