LPG-FTW

This is the source code used for Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting (Mendez et al., 2020).

This package contains implementations of two variants of LPG-FTW, with REINFORCE and NPG as the base learners. Code for running the experiments is provided in the experiments/ directory. Before running any experiment on the MuJoCo tasks, you must create the tasks by running the relevant script from the experiments/mtl_*_tasks/create_tasks directory. The main files containing the code for LPG-FTW are:

  • mjrl/utils/algorithms/
    • batch_reinforce_ftw.py
    • npg_cg_ftw.py
  • mjrl/utils/policies/
    • gaussian_linear_lpg_ftw.py
    • gaussian_mlp_lpg_ftw.py

The main dependencies are python=3.7, gym, mujoco-py=1.50, gym-extensions-mod, pytorch=1.3, and metaworld. Of these, gym, mujoco-py, gym-extensions-mod, and metaworld are included in this package for ease of installation. gym-extensions-mod is a modified version of the gym-extensions package, as described in the paper.

Installation instructions (Linux)

  • Download MuJoCo binaries from the official website and also obtain the license key.
  • Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place your license key (mjkey.txt) at ~/.mujoco/mjkey.txt
  • Install osmesa related dependencies:
$ sudo apt-get install libgl1-mesa-dev libgl1-mesa-glx libglew-dev libosmesa6-dev build-essential libglfw3
  • Add the following lines to your ~/.bashrc, then source it:
export LD_LIBRARY_PATH="<path/to/.mujoco>/mjpro150/bin:$LD_LIBRARY_PATH"
export MUJOCO_PY_FORCE_CPU=True
alias MJPL='LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so'
  • Install this package using
$ conda update conda
$ cd path/to/lpg_ftw
$ conda env create -f env.yml
$ conda activate lpg-ftw
$ cd gym/mujoco-py
$ pip install -e .
$ cd ../
$ pip install -e .
$ cd ../gym-extensions-mod
$ pip install -e .
$ cd ../metaworld
$ pip install -e .
$ cd ..
$ pip install -e .
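
After installing, a quick sanity check can catch a missing MuJoCo file or package before an experiment fails midway. The sketch below is not part of this package: the function name check_setup is hypothetical, and the import names in MODULES (gym, mujoco_py, torch, metaworld, mjrl) are assumptions about how the installed packages are exposed. It only looks for files and module specs and does not import or run MuJoCo.

```python
import importlib.util
import os
from pathlib import Path

# Assumed import names for the dependencies; adjust if a package uses another name.
MODULES = ["gym", "mujoco_py", "torch", "metaworld", "mjrl"]


def check_setup(mujoco_root=Path.home() / ".mujoco"):
    """Return a dict mapping each check to True/False without importing the packages."""
    mujoco_root = Path(mujoco_root)
    library_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = {
        "mjpro150 directory": (mujoco_root / "mjpro150").is_dir(),
        "mjkey.txt": (mujoco_root / "mjkey.txt").is_file(),
        "LD_LIBRARY_PATH has mjpro150/bin": any(
            part.rstrip("/").endswith("mjpro150/bin")
            for part in library_path.split(":")
        ),
    }
    for name in MODULES:
        # find_spec returns None when a top-level module cannot be located.
        report[f"module {name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_setup().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {check}")
```

Run it inside the activated lpg-ftw environment; any line marked MISSING points to the installation step that may need repeating.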

Reproducing results

Tables 1 and 2 contain the main results in the paper.

To reproduce the results, you must first train policies for each of the algorithms as below:

$ python experiments/metaworld_tasks/metaworld_[algorithm].py
$ python experiments/metaworld_tasks/metaworldMT50_[algorithm].py

For the MuJoCo multi-task experiments, you must first create the tasks before running the experiments:

$ python experiments/mtl_bodypart_tasks/create_tasks/[env]_create.py

Note that PG-ELLA must be trained after STL, since it uses the STL pre-trained policies. Then, to evaluate the policy at different stages of training (start, tune, and update), execute:

$ python experiments/[stage]/metaworld_[algorithm].py
$ python experiments/[stage]/metaworldMT50_[algorithm].py

This will save all the files needed for creating the tables. Then simply execute the following command to generate the tables:

$ python mjrl/utils/make_results_tables_metaworld.py

Instructions for recreating results on the OpenAI domains are almost identical: replace the metaworld_tasks/ directory with mtl_bodypart_tasks/ or mtl_gravity_tasks/ where appropriate, and use make_results_tables_openai.py to create the tables.
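
The training and evaluation steps above can be scripted so that PG-ELLA always runs after STL. The following is a minimal sketch, not code from this package: the helper names training_order and build_commands are hypothetical, and the algorithm tags "stl" and "pgella" are assumed placeholders for whatever replaces [algorithm] in the script names. Because not every algorithm necessarily has a script for every stage, the sketch keeps only commands whose script file exists.

```python
from pathlib import Path

STAGES = ("start", "tune", "update")          # stage names from this README
BENCHMARKS = ("metaworld", "metaworldMT50")   # script prefixes from this README


def training_order(algorithms, prerequisite="stl", dependent="pgella"):
    """Move `dependent` to the end so that it runs after `prerequisite`."""
    algorithms = list(algorithms)
    if dependent not in algorithms:
        return algorithms
    if prerequisite not in algorithms:
        raise ValueError(f"{dependent} needs {prerequisite} to be trained first")
    return [a for a in algorithms if a != dependent] + [dependent]


def build_commands(algorithms, repo_root="."):
    """List training commands, then evaluation commands, for scripts that exist."""
    root = Path(repo_root)
    ordered = training_order(algorithms)
    candidates = [
        f"experiments/metaworld_tasks/{bench}_{alg}.py"
        for alg in ordered
        for bench in BENCHMARKS
    ]
    candidates += [
        f"experiments/{stage}/{bench}_{alg}.py"
        for stage in STAGES
        for alg in ordered
        for bench in BENCHMARKS
    ]
    # Skip scripts that are not present instead of guessing that they exist.
    return [f"python {path}" for path in candidates if (root / path).is_file()]


if __name__ == "__main__":
    # The tags below are illustrative; use the suffixes of the actual script names.
    for command in build_commands(["pgella", "stl"]):
        print(command)
```

Printing the commands rather than executing them keeps the sketch side-effect free; the output can be reviewed and then pasted into a shell.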

Table 1: Results of LPG-FTW on MT10 (equivalent to Figure 3.a (bottom))

Algorithm   Start      Tune           Update         Final
LPG-FTW     4523±508   160553±2847    161142±3415    154873±5415
STL         5217±100   135328±8534    -              -
EWC         5009±407   145060±12859   -              22202±5065
ER          3679±650   48495±7141     -              5083±1710
PG-ELLA     -          -              44796±4606     12546±5448

Table 2: Results of LPG-FTW on MT50 (equivalent to Figure 3.b (bottom))

Algorithm   Start      Tune           Update         Final
LPG-FTW     2505±166   161047±3497    161060±3892    160739±3933
STL         4308±128   136837±2888    -              -
EWC         1001±150   71168±16915    -              567±120
ER          3373±310   33323±2229     -              25389±1910
PG-ELLA     10292±1113 125±130        -              -
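
One rough way to read the LPG-FTW rows is to compare the Final mean against the Tune mean, which suggests how much performance is kept after training on all tasks. The sketch below only does that arithmetic on the values printed in Tables 1 and 2. The ratio is an ad hoc illustration, not a metric defined in the paper, and the helper names parse_mean and final_over_tune are hypothetical.

```python
def parse_mean(cell):
    """Return the mean from a 'mean±spread' table cell as a float."""
    mean, _, _spread = cell.partition("±")
    return float(mean)


# LPG-FTW rows copied from Tables 1 and 2, in the order Start, Tune, Update, Final.
LPG_FTW = {
    "MT10": ["4523±508", "160553±2847", "161142±3415", "154873±5415"],
    "MT50": ["2505±166", "161047±3497", "161060±3892", "160739±3933"],
}


def final_over_tune(row):
    """Ratio of the Final mean to the Tune mean for one four-column row."""
    _start, tune, _update, final = (parse_mean(cell) for cell in row)
    return final / tune


if __name__ == "__main__":
    for benchmark, row in LPG_FTW.items():
        print(f"{benchmark}: final/tune = {final_over_tune(row):.3f}")
```

With the table values above this prints 0.965 for MT10 and 0.998 for MT50. The ± spreads are ignored here, so the ratios say nothing about statistical significance.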

Video Results

Please navigate to the videos/ directory to visualize the training process of LPG-FTW on the MuJoCo and Meta-World domains.

Citing LPG-FTW

If you use LPG-FTW, please cite our paper:

@article{mendez2020lifelong,
  title={Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting},
  author={Mendez, Jorge A. and Wang, Boyu and Eaton, Eric},
  journal={arXiv preprint arXiv:2007.07011},
  year={2020}
}

This package was built on the mjrl package. The README for the original package can be found here