Deobfuscation of the code base + pep8 and fixes #481

Closed
wants to merge 307 commits into from
0fd3ff2
Merge branch 'fixes_cleanup' of https://github.com/hill-a/stable-base…
hill-a Jul 12, 2018
eb4cdb2
partial documentation of gail
hill-a Jul 12, 2018
ae3aaad
finished gail + fixed some argument calling in gail
hill-a Jul 13, 2018
1500a3b
quick cleanup
hill-a Jul 13, 2018
94cd63f
Start cleaning up trpo_mpi
araffin Jul 13, 2018
72599ff
Merge branch 'fixes_cleanup' of github.com:hill-a/stable-baselines in…
araffin Jul 13, 2018
873ebf7
Style fixes
araffin Jul 13, 2018
78f3d95
Refactor ppo1 policies
araffin Jul 13, 2018
b8863d8
Clean up a2c utils
araffin Jul 13, 2018
8277eee
Clean up A2C policies + utils again
araffin Jul 13, 2018
42be3e8
Continue clean up of a2c/acer/acktr/common
araffin Jul 13, 2018
256a459
formatting & few comments on ppo2
kalifou Jul 13, 2018
6a8bbfa
Merge branch 'fixes_cleanup' of https://github.com/hill-a/stable-base…
kalifou Jul 13, 2018
1fad26f
Clean up gail/ddpg
araffin Jul 13, 2018
0fc0a10
Merge branch 'fixes_cleanup' of https://github.com/hill-a/stable-base…
kalifou Jul 13, 2018
8a2a667
commenting ppo1/2
kalifou Jul 13, 2018
5410573
Update README.md
hill-a Jul 13, 2018
c5f4658
fixing few style issues
kalifou Jul 13, 2018
30f4264
Typos
araffin Jul 13, 2018
fdc9ca4
Continue clean up of mainly ddpg/her/deepq
araffin Jul 14, 2018
f47e311
Various clean up
araffin Jul 14, 2018
4e89d92
Fix tests
araffin Jul 14, 2018
d2e1dbd
Clean up HER + ACER + ...
araffin Jul 16, 2018
576a770
Move test folder
araffin Jul 16, 2018
afb8ba6
Various cleaning
araffin Jul 16, 2018
e0253f6
Fix tests
araffin Jul 16, 2018
cf7765d
issues fix in acer/
kalifou Jul 16, 2018
f61a264
few fixes for issues in baselines/
kalifou Jul 16, 2018
8828097
Various renaming
araffin Jul 17, 2018
d58af64
Fixes for ddpg
araffin Jul 17, 2018
cc7943b
finished documentation
hill-a Jul 17, 2018
7d05725
Merge branch 'fixes_cleanup' of https://github.com/hill-a/stable-base…
hill-a Jul 17, 2018
ffedecf
clean placeholder issues
hill-a Jul 17, 2018
bcdbf50
hotfix
hill-a Jul 17, 2018
f6a6f1a
fixed placeholder overlap
hill-a Jul 17, 2018
b2683f7
fixes + renaming
hill-a Jul 17, 2018
f2f56da
moved to stable_baselines
hill-a Jul 17, 2018
3d1e2a2
fix rename
hill-a Jul 17, 2018
1114a83
Fix various codestyle issues
araffin Jul 17, 2018
b74120a
Fix tests
araffin Jul 17, 2018
64bc3c8
Fix deepq test + remove wildcards
araffin Jul 17, 2018
c6314ef
fixed a2c and acktr vf weight
hill-a Jul 18, 2018
7c95b24
Massive renaming (nsteps -> n_steps, ...)
araffin Jul 18, 2018
f18a90c
Merge branch 'fixes_cleanup' of github.com:hill-a/stable-baselines in…
araffin Jul 18, 2018
61cc751
Fixed a2c and acktr vf weight (erased by previous commit)
araffin Jul 18, 2018
2b1bf48
Minor edit: rename a variable in trpo
araffin Jul 18, 2018
edd9bf7
added more policies to A2C and PPO2
hill-a Jul 18, 2018
b39cbbd
parameter hotfix
hill-a Jul 18, 2018
b908c8a
Fix line too long
araffin Jul 18, 2018
1f6d3a9
removed useless continuous arguement from a2c policies
hill-a Jul 18, 2018
6c50d71
fixed tests
hill-a Jul 18, 2018
d52f0d7
fixed docker
hill-a Jul 18, 2018
b359e7e
refactored a2c
hill-a Jul 18, 2018
5d91a66
added base class for RL models
hill-a Jul 18, 2018
f9f5e9e
refactored acer + fixes
hill-a Jul 18, 2018
5e33e14
partial refactor of acktr
hill-a Jul 18, 2018
d43624d
refactored DDPG + added DDPG CNN policies
hill-a Jul 18, 2018
2cbb22d
refactored deepq + ddpg alterations
hill-a Jul 19, 2018
5e6c1ba
Merge branch 'fixes_cleanup' into stable
araffin Jul 20, 2018
d04cb28
Update coveragerc
araffin Jul 20, 2018
6497b64
refactored TRPO
hill-a Jul 20, 2018
5b65087
Fix import error
araffin Jul 20, 2018
c374e57
fixed a2c test
hill-a Jul 20, 2018
d62fdc8
fixed TRPO argument issue
hill-a Jul 20, 2018
3672a90
Rename nstack -> n_stack
araffin Jul 20, 2018
610c335
Merge branch 'fixes_cleanup' into stable
araffin Jul 20, 2018
745e160
Fix renaming side effect
araffin Jul 21, 2018
29d1245
Merge branch 'fixes_cleanup' into stable
araffin Jul 21, 2018
99ea253
refactored PPO2 + corrected loading calls
hill-a Jul 23, 2018
7281158
refactored PPO1
hill-a Jul 23, 2018
233ead8
refactored PPO1
hill-a Jul 23, 2018
d86303f
changed total timestep management + bugfixes
hill-a Jul 24, 2018
9fef74f
refactored environment management + improved all the saving methods
hill-a Jul 24, 2018
ba1ee44
fixed test issues
hill-a Jul 24, 2018
247c6b4
finished action prediction and action proba
hill-a Jul 24, 2018
4f7abdd
Rename variables (action_0 -> action)
araffin Jul 25, 2018
9eb98e9
Fix shadowing methods
araffin Jul 25, 2018
bde688c
Merge branch 'fixes_cleanup' into stable
araffin Jul 25, 2018
afa1223
Various renaming + cleanup
araffin Jul 25, 2018
75fa4b0
Merge branch 'fixes_cleanup' into stable
araffin Jul 25, 2018
a3746c7
fixed graphs + added more identity tests + acer upgrades
hill-a Jul 25, 2018
ad7e14f
restored name and acktr bias + incremented version number due to inco…
hill-a Jul 25, 2018
e7e3dca
fixed models not supporting VecEnvs
hill-a Jul 25, 2018
4c1ad6e
fixed seed issues + fixed training issues + added saving and loading …
hill-a Jul 26, 2018
7c8200f
fixed tests + verbosity fixed
hill-a Jul 26, 2018
1de90b2
added continuous tests + ddpg cleaned up
hill-a Jul 27, 2018
978e116
Merge pull request #4 from openai/master
hill-a Jul 27, 2018
a187fe6
merged with master
hill-a Jul 27, 2018
3a4dcbd
cleaning GAIL + doc fix
hill-a Jul 27, 2018
5f11927
Merge pull request #1 from hill-a/fixes_cleanup
hill-a Jul 27, 2018
21772de
fixed continuous actions for A2C
hill-a Jul 27, 2018
17e0134
fadded spaces.MultiDiscrete and spaces.MultiBinary to A2C policies
hill-a Jul 28, 2018
64713a8
changed acer policies
hill-a Jul 31, 2018
3779b0c
fixed a2c issues
hill-a Jul 31, 2018
7b9e978
fixed ACER performance
hill-a Jul 31, 2018
d97f21d
Readme update
hill-a Jul 31, 2018
88e7eab
cleared up table in Readme
hill-a Jul 31, 2018
ffe5add
refactored DDPG policies + cleanup verbosity
hill-a Aug 1, 2018
c7cce73
refactored PPO1 and TRPO policy + fixed performance issues
hill-a Aug 2, 2018
ff6f5eb
refactored GAIL
hill-a Aug 2, 2018
f8e8f09
cleanup + readme examples + begin HER refactor
hill-a Aug 2, 2018
96f3d1a
merged with Master + added action_space tests
hill-a Aug 3, 2018
6142744
began ACER continuous implementation
hill-a Aug 3, 2018
12e9c3f
Fix warnings
araffin Aug 8, 2018
e6352d7
Update TRPO test
araffin Aug 8, 2018
ebda2a6
Remove unused imports
araffin Aug 8, 2018
437594f
Disable subprocess call in test_continuous (trying to fix CI build)
araffin Aug 8, 2018
379b7c9
Revert "Disable subprocess call in test_continuous (trying to fix CI …
araffin Aug 8, 2018
360e629
Remove subprocess calls for deepq tests
araffin Aug 9, 2018
5caacd8
Trying to fix CI memory issue
araffin Aug 9, 2018
d7a3225
Switch back to subprocess call for deepq tests
araffin Aug 9, 2018
e00c5bb
Revert "Switch back to subprocess call for deepq tests"
araffin Aug 9, 2018
b3dace8
Add comment
araffin Aug 9, 2018
d39282a
Trying to fix CI memory error (bis)
araffin Aug 9, 2018
1f4100d
Try skipping test for CI
araffin Aug 10, 2018
f890bbe
Attempt to fix memory issue
araffin Aug 10, 2018
80f9465
Reduce number of timesteps for tests
araffin Aug 10, 2018
47bd786
Remove subprocess call for testing logger
araffin Aug 10, 2018
d0e3919
Typo
araffin Aug 10, 2018
35ba53f
Update README
araffin Aug 10, 2018
5d733f7
Update README.md
hill-a Aug 13, 2018
1d03f65
fixed multi env issues
hill-a Aug 13, 2018
1e711a8
[ci skip] Update README.md
hill-a Aug 13, 2018
d6fc91c
[ci skip] Update README.md
hill-a Aug 13, 2018
e683b91
fixed readme + fixed atari wrapper + added saveable normalize wrapper
hill-a Aug 13, 2018
48597ca
fixed default norm_rewards
hill-a Aug 13, 2018
5baedb3
Merge branch 'master' into stable
hill-a Aug 14, 2018
a2dd972
merged with refactoring
hill-a Aug 14, 2018
203a279
hotfix
hill-a Aug 14, 2018
2c9fcb2
merged with refactoring continued
hill-a Aug 14, 2018
8eee9ad
Merge branch 'refactoring' into stable
hill-a Aug 14, 2018
788a35f
updated readme for stable baselines
hill-a Aug 14, 2018
9f0144d
fixed readme example
hill-a Aug 14, 2018
f5fc57a
Merge branch 'refactoring' into stable
hill-a Aug 14, 2018
b9c592a
Fix codestyle
araffin Aug 14, 2018
d15cd28
Merge branch 'refactoring' of github.com:hill-a/stable-baselines into…
araffin Aug 14, 2018
23c119f
updated install readme
hill-a Aug 14, 2018
c289e44
Merge branch 'refactoring' into stable
araffin Aug 14, 2018
8df0bf2
Fix import issues
araffin Aug 14, 2018
7c60e78
[ci skip] Update README.md
hill-a Aug 14, 2018
d560153
Fix tf deprecation warnings
araffin Aug 15, 2018
0edc955
Merge branch 'refactoring' into stable
araffin Aug 15, 2018
a9e79ea
fixed DDPG issue + continous out of bounds fix + predict fix
hill-a Aug 16, 2018
8c828df
Merge branch 'refactoring' of https://github.com/hill-a/stable-baseli…
hill-a Aug 16, 2018
b98def8
Merge branch 'refactoring' into stable
hill-a Aug 16, 2018
0bc655b
fixed box bounds for policies + fixed ACKTR leaning + fixed predict f…
hill-a Aug 16, 2018
85609dc
changed tensorflow verions
hill-a Aug 16, 2018
bc8cf70
Merge branch 'refactoring' into stable
hill-a Aug 16, 2018
ac4f940
fixed memorized vectorized if environment
hill-a Aug 16, 2018
6d1fb69
Merge branch 'refactoring' into stable
hill-a Aug 16, 2018
60fb980
fixed action probability + changed import positions for cleaner imports
hill-a Aug 16, 2018
e4e79e2
hotfix
hill-a Aug 16, 2018
0e63b15
merged with refactoring branch
hill-a Aug 16, 2018
0e32f48
fixed lstm policies + made DDPG deterministic again + clipping fix
hill-a Aug 16, 2018
35e4894
[ci skip] Comment backend selection for plotting
araffin Aug 17, 2018
5d4c9cc
improved policies + fixed LSTM issues
hill-a Aug 17, 2018
e89aa89
Merge branch 'refactoring' of https://github.com/hill-a/stable-baseli…
hill-a Aug 17, 2018
40442fc
merged with refactoring branch
hill-a Aug 17, 2018
5215bcc
removed assumed value n_stack=4 in the acer buffer
hill-a Aug 17, 2018
8ed7a1b
Merge branch 'refactoring' into stable
hill-a Aug 17, 2018
02dfa5d
Fix warnings + reorder imports
araffin Aug 17, 2018
48374a8
Merge branch 'refactoring' into stable
araffin Aug 17, 2018
1a23578
Try fixing circular import
araffin Aug 17, 2018
02e7875
Revert "Try fixing circular import"
araffin Aug 17, 2018
8e38e87
fixed initial state issues with the policies + added environment crea…
hill-a Aug 17, 2018
1c398a9
Merge branch 'refactoring' of https://github.com/hill-a/stable-baseli…
hill-a Aug 17, 2018
873429b
merged wit refactoring branch
hill-a Aug 17, 2018
d11dc58
Update README.md
hill-a Aug 17, 2018
455b4ad
Update README.md
hill-a Aug 17, 2018
c681570
[ci skip] Update README
araffin Aug 18, 2018
d756880
[ci skip] Change default verbosity level for grad_add
araffin Aug 18, 2018
511ccfd
Merge branch 'refactoring' into stable
araffin Aug 18, 2018
5b4e33e
removed stacking from ACER
hill-a Aug 18, 2018
49fd8cc
fixed ACER runner issues
hill-a Aug 18, 2018
d451bca
fixed ACER with discrete observation space
hill-a Aug 18, 2018
e8a038e
Merge branch 'refactoring' into stable
hill-a Aug 18, 2018
bfcf7a5
Update README
araffin Aug 19, 2018
66aae4a
Update ACER hyperparams for test_identity + minor edits
araffin Aug 19, 2018
73b6c27
Merge branch 'refactoring' into stable
araffin Aug 19, 2018
93e5bc1
Restore test identity
araffin Aug 19, 2018
5ca1052
Merge branch 'refactoring' into stable
araffin Aug 19, 2018
47c877b
Update ACER hyperparams
araffin Aug 19, 2018
75fa2dc
Trying to fix CI
araffin Aug 19, 2018
2d29ece
Revert to default ACER hyperparams
araffin Aug 19, 2018
bc001f3
Update ACER test
araffin Aug 19, 2018
0f8232e
Merge branch 'refactoring' into stable
araffin Aug 19, 2018
ff3665a
improved vectorized environment changing
hill-a Aug 20, 2018
dae6d3d
merged with refactoring branch
hill-a Aug 20, 2018
f1c47c1
forced ACER to only use environment with the same n_envs
hill-a Aug 20, 2018
daad2cb
merged with refactoring
hill-a Aug 20, 2018
5a0669c
fixes and cleanup
hill-a Aug 20, 2018
8b6c8e2
Update test identity
araffin Aug 20, 2018
fbc10ff
Merge branch 'refactoring' of github.com:hill-a/stable-baselines into…
araffin Aug 20, 2018
66e088b
Update PPO1 hyperparams for test
araffin Aug 20, 2018
7e5a666
Merge branch 'refactoring' into stable
araffin Aug 20, 2018
2b82f46
fixed early reset issues with monitor + extra error messages
hill-a Aug 20, 2018
16acde7
Merge branch 'refactoring' of https://github.com/hill-a/stable-baseli…
hill-a Aug 20, 2018
4eee032
merged with refactoring branch
hill-a Aug 20, 2018
f98ad5d
Fix test identity hyperparams
araffin Aug 20, 2018
62061c1
Merge branch 'refactoring' of github.com:hill-a/stable-baselines into…
araffin Aug 20, 2018
bbfb154
Merge branch 'refactoring' of github.com:hill-a/stable-baselines into…
araffin Aug 20, 2018
16dc1b0
hotfix
hill-a Aug 20, 2018
b6b063d
Merge branch 'stable' of https://github.com/hill-a/stable-baselines i…
hill-a Aug 20, 2018
8bb4186
hotfix
hill-a Aug 20, 2018
c9d0725
Merge branch 'refactoring' of https://github.com/hill-a/stable-baseli…
hill-a Aug 20, 2018
afd31eb
Bump version
araffin Aug 20, 2018
25540bc
Merge branch 'refactoring' into stable
araffin Aug 20, 2018
07e2393
First Stable version
araffin Aug 20, 2018
282e2ec
fixed ACER + added setup files
hill-a Aug 20, 2018
9fd2123
merged with refactoring
hill-a Aug 20, 2018
68dd857
remove forgotten conflict
hill-a Aug 20, 2018
50ed8c0
updated setup
hill-a Aug 20, 2018
d43814f
updated pypi description
hill-a Aug 20, 2018
21de76d
updated setup.py
hill-a Aug 20, 2018
5ad0b11
updated setup.py
hill-a Aug 20, 2018
1eeff3d
updated setup.py
hill-a Aug 20, 2018
e1ca5a7
[ci skip] update README
hill-a Aug 20, 2018
5bd4ffa
[ci skip] Update README
araffin Aug 21, 2018
74995e6
improved README.md in each algorithm + fix deepq and ddpg normalizati…
hill-a Aug 22, 2018
e4fb0db
Merge branch 'stable' of https://github.com/hill-a/stable-baselines i…
hill-a Aug 22, 2018
59b2139
fixed READMEs + fixed Box observation space issues
hill-a Aug 22, 2018
e6cc787
fixed flake issues
hill-a Aug 22, 2018
58b7a67
fixed Identity issue + only auto scale images
hill-a Aug 22, 2018
e59b814
[ci skip] Update README
araffin Aug 24, 2018
22daf9a
Begin html doc
araffin Aug 25, 2018
eafb07a
Update docs
araffin Aug 25, 2018
a6463fd
[ci skip] Fix doc build
araffin Aug 25, 2018
22c7a4e
[ci skip] Remove import in docs
araffin Aug 25, 2018
1583098
[ci skip] Fix version number
araffin Aug 25, 2018
301791e
[ci skip] Add docs badge
araffin Aug 25, 2018
25b7273
[ci skip] Update badge branch
araffin Aug 25, 2018
1c780d5
[ci skip] Update documentation
araffin Aug 26, 2018
ee945aa
[ci skip] Fix doc build
araffin Aug 26, 2018
45267d4
[ci skip] Update docs
araffin Aug 26, 2018
b943aea
[ci skip] Update docs
araffin Aug 26, 2018
5c330e7
[ci skip] Remove HER from table
araffin Aug 26, 2018
54e6079
[ci skip] Fix table issue
araffin Aug 26, 2018
14bc8aa
[ci skip] Revert "[ci skip] Fix table issue"
araffin Aug 26, 2018
93c2dca
[ci skip] Add section name
araffin Aug 26, 2018
0626918
[ci skip] Autodoc test
araffin Aug 27, 2018
3598140
Move docstrings
araffin Aug 27, 2018
d56cc77
[ci skip] Add ddpg and dqn to docs
araffin Aug 27, 2018
147a809
[ci skip] Remove requirements.txt to build docs
araffin Aug 27, 2018
302ae52
[ci skip] Update doc: installation + getting started
araffin Aug 27, 2018
37d5cd3
[ci skip] Add doc on vec env
araffin Aug 27, 2018
2a565b1
[ci skip] Add examples
araffin Aug 28, 2018
06c9da5
[ci skip] Add missing algorithms + update README
araffin Aug 28, 2018
39e5ea2
[ci skip] Add changelog, remove READMEs
araffin Aug 29, 2018
bfe5f9e
Merge pull request #12 from hill-a/docs
hill-a Aug 29, 2018
3a57007
[ci skip] Bump version
araffin Aug 29, 2018
15 changes: 15 additions & 0 deletions .coveragerc
@@ -0,0 +1,15 @@
[run]
branch = False
omit =
    # Mujoco requires a licence
    stable_baselines/*/run_mujoco.py
    stable_baselines/ppo1/run_humanoid.py
    stable_baselines/ppo1/run_robotics.py
    # HER requires mpi and Mujoco
    stable_baselines/her/experiment/

[report]
exclude_lines =
    pragma: no cover
    raise NotImplementedError()
    if KFAC_DEBUG:
7 changes: 5 additions & 2 deletions .gitignore
@@ -2,9 +2,14 @@
*.pyc
*.pkl
*.py~
*.bak
.pytest_cache
.DS_Store
.idea
.coverage
.coverage.*
__pycache__/
_build/

# Setuptools distribution and build folders.
/dist/
@@ -34,5 +39,3 @@ src
.cache

MUJOCO_LOG.TXT


7 changes: 5 additions & 2 deletions .travis.yml
@@ -2,6 +2,9 @@ language: python
python:
- "3.6"

notifications:
email: false

services:
- docker

@@ -10,5 +13,5 @@ install:
- docker build . -t baselines-test

script:
- flake8 --select=F baselines/common
- docker run baselines-test pytest
- flake8 --select=F stable_baselines/common
- docker run --env CODACY_PROJECT_TOKEN=$CODACY_PROJECT_TOKEN baselines-test sh -c 'pytest --cov-config .coveragerc --cov-report term --cov-report xml --cov=. && python-codacy-coverage -r coverage.xml --token=$CODACY_PROJECT_TOKEN'
35 changes: 30 additions & 5 deletions Dockerfile
@@ -1,20 +1,45 @@
FROM ubuntu:16.04

RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake
RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake libglib2.0-0 libsm6 libxext6 libfontconfig1 libxrender1
ENV CODE_DIR /root/code
ENV VENV /root/venv

COPY . $CODE_DIR/baselines
RUN \
pip install virtualenv && \
virtualenv $VENV --python=python3 && \
. $VENV/bin/activate && \
mkdir $CODE_DIR && \
cd $CODE_DIR && \
pip install --upgrade pip && \
pip install -e baselines && \
pip install pytest
pip install pytest && \
pip install pytest-cov && \
pip install codacy-coverage && \
pip install scipy && \
pip install tqdm && \
pip install joblib && \
pip install zmq && \
pip install dill && \
pip install progressbar2 && \
pip install mpi4py && \
pip install cloudpickle && \
pip install "tensorflow>=1.5.0" && \
pip install click && \
pip install opencv-python && \
pip install numpy && \
pip install pandas && \
pip install pytest && \
pip install matplotlib && \
pip install seaborn && \
pip install glob2 && \
pip install gym[mujoco,atari,classic_control,robotics]

COPY . $CODE_DIR/stable_baselines
RUN \
. $VENV/bin/activate && \
cd $CODE_DIR && \
pip install -e stable_baselines

ENV PATH=$VENV/bin:$PATH
WORKDIR $CODE_DIR/baselines
WORKDIR $CODE_DIR/stable_baselines

CMD /bin/bash
188 changes: 132 additions & 56 deletions README.md
@@ -1,87 +1,163 @@
<img src="data/logo.jpg" width=25% align="right" /> [![Build status](https://travis-ci.org/openai/baselines.svg?branch=master)](https://travis-ci.org/openai/baselines)
[![Build Status](https://travis-ci.com/hill-a/stable-baselines.svg?branch=stable)](https://travis-ci.com/hill-a/stable-baselines) [![Documentation Status](https://readthedocs.org/projects/stable-baselines/badge/?version=docs)](https://stable-baselines.readthedocs.io/en/docs/?badge=master) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/3bcb4cd6d76a4270acb16b5fe6dd9efa)](https://www.codacy.com/app/baselines_janitors/stable-baselines?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=hill-a/stable-baselines&amp;utm_campaign=Badge_Grade) [![Codacy Badge](https://api.codacy.com/project/badge/Coverage/3bcb4cd6d76a4270acb16b5fe6dd9efa)](https://www.codacy.com/app/baselines_janitors/stable-baselines?utm_source=github.com&utm_medium=referral&utm_content=hill-a/stable-baselines&utm_campaign=Badge_Coverage)

# Baselines
# Stable Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.
Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI [Baselines](https://github.com/openai/baselines/).

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.
You can read a detailed presentation of Stable Baselines in the [Medium article](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-reinforcement-learning-made-easy-df87c4b2fc82).

## Prerequisites

These algorithms will make it easier for the research community and industry to replicate, refine, and identify new ideas, and will create good baselines to build projects on top of. We expect these tools will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones. We also hope that the simplicity of these tools will allow beginners to experiment with a more advanced toolset, without being buried in implementation details.

## Main differences with OpenAI Baselines

This toolset is a fork of OpenAI Baselines, with a major structural refactoring, and code cleanups:
- Unified structure for all algorithms
- PEP8 compliant (unified code style)
- Documented functions and classes
- More tests & more code coverage

## Documentation

Documentation is available online: [http://stable-baselines.readthedocs.io/](http://stable-baselines.readthedocs.io/)

## Installation

### Prerequisites
Baselines requires Python 3 (>=3.5) with the development headers. You'll also need the system packages CMake, OpenMPI and zlib. They can be installed as follows:
### Ubuntu


#### Ubuntu

```bash
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
```
### Mac OS X

#### Mac OS X
Installation of system packages on Mac requires [Homebrew](https://brew.sh). With Homebrew installed, run the following:
```bash
brew install cmake openmpi
```

## Virtual environment
From the general python package sanity perspective, it is a good idea to use virtual environments (virtualenvs) to make sure packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via
```bash
pip install virtualenv
```
Virtualenvs are essentially folders that have copies of python executable and all python packages.
To create a virtualenv called venv with python3, one runs
```bash
virtualenv /path/to/venv --python=python3
```
To activate a virtualenv:

### Install using pip
Install the Stable Baselines package

Using pip from pypi:
```
. /path/to/venv/bin/activate
pip install stable-baselines
```
More thorough tutorial on virtualenvs and options can be found [here](https://virtualenv.pypa.io/en/stable/)

Please read the [documentation](http://stable-baselines.readthedocs.io/) for more details and alternatives.

## Installation
Clone the repo and cd into it:
```bash
git clone https://github.com/openai/baselines.git
cd baselines
```
If using virtualenv, create a new virtualenv and activate it
```bash
virtualenv env --python=python3
. env/bin/activate

## Example

Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.

Here is a quick example of how to train and run PPO2 on a cartpole environment:
```python
import gym

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2

env = gym.make('CartPole-v1')
env = DummyVecEnv([lambda: env]) # The algorithms require a vectorized environment to run

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
```
Install baselines package
```bash
pip install -e .

Or just train a model with a one-liner if [the environment is registered in Gym](https://github.com/openai/gym/wiki/Environments):

```python

from stable_baselines.common.policies import MlpPolicy
from stable_baselines import PPO2

model = PPO2(MlpPolicy, 'CartPole-v1').learn(10000)

```
### MuJoCo

Please read the [documentation](http://stable-baselines.readthedocs.io/) for more examples.


## Try it online with Colab Notebooks !

All the following examples can be executed online using Google colab notebooks:

- [Getting Started](https://colab.research.google.com/drive/1_1H5bjWKYBVKbbs-Kj83dsfuZieDNcFU)
- [Training, Saving, Loading](https://colab.research.google.com/drive/1KoAQ1C_BNtGV3sVvZCnNZaER9rstmy0s)
- [Multiprocessing](https://colab.research.google.com/drive/1ZzNFMUUi923foaVsYb4YjPy4mjKtnOxb)
- [Monitor Training and Plotting](https://colab.research.google.com/drive/1L_IMo6v0a0ALK8nefZm6PqPSy0vZIWBT)
- [Atari Games](https://colab.research.google.com/drive/1iYK11yDzOOqnrXi1Sfjm1iekZr4cxLaN)


## Implemented Algorithms

| **Name** | **Refactored**<sup>(1)</sup> | **Recurrent** | ```Box``` | ```Discrete``` | ```MultiDiscrete``` | ```MultiBinary``` | **Multi Processing** |
| ------------------- | ---------------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | --------------------------------- |
| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| ACER | :heavy_check_mark: | :heavy_check_mark: | :x: <sup>(5)</sup> | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| ACKTR | :heavy_check_mark: | :heavy_check_mark: | :x: <sup>(5)</sup> | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| DDPG | :heavy_check_mark: | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| DeepQ | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| GAIL <sup>(2)</sup> | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: <sup>(4)</sup> |
| HER <sup>(3)</sup> | :x: <sup>(5)</sup> | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| PPO1 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: <sup>(4)</sup> |
| PPO2 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| TRPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: <sup>(4)</sup> |

<sup><sup>(1): Whether or not the algorithm has been refactored to fit the ```BaseRLModel``` class.</sup></sup><br>
<sup><sup>(2): Only implemented for TRPO.</sup></sup><br>
<sup><sup>(3): Only implemented for DDPG.</sup></sup><br>
<sup><sup>(4): Multi Processing with [MPI](https://mpi4py.readthedocs.io/en/stable/).</sup></sup><br>
<sup><sup>(5): TODO, in project scope.</sup></sup>

Actions ```gym.spaces```:
* ```Box```: An N-dimensional box that contains every point in the action space.
* ```Discrete```: A list of possible actions, where each timestep only one of the actions can be used.
* ```MultiDiscrete```: A list of possible actions, where each timestep only one action of each discrete set can be used.
* ```MultiBinary```: A list of possible actions, where each timestep any of the actions can be used in any combination.
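
To make the four action-space types above concrete, here is a minimal, library-free sketch of what one sampled action looks like for each of them. The helper names (`sample_box`, `sample_discrete`, `sample_multi_discrete`, `sample_multi_binary`) are hypothetical and exist only for illustration; they are not part of `gym` or Stable Baselines, and the real `gym.spaces` classes return NumPy arrays rather than Python lists.

```python
import random


# Hypothetical helpers: they only mimic the shape of an action drawn from
# each gym.spaces type and are not the gym API.
def sample_box(low, high):
    """Box: one real value per dimension, each within [low[i], high[i]]."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]


def sample_discrete(n):
    """Discrete: exactly one action index in {0, ..., n - 1} per timestep."""
    return random.randrange(n)


def sample_multi_discrete(nvec):
    """MultiDiscrete: one index per discrete set; set i has nvec[i] choices."""
    return [random.randrange(n) for n in nvec]


def sample_multi_binary(n):
    """MultiBinary: n independent on/off actions, any combination allowed."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    print("Box:          ", sample_box([-1.0, 0.0], [1.0, 2.0]))
    print("Discrete:     ", sample_discrete(4))
    print("MultiDiscrete:", sample_multi_discrete([3, 2]))
    print("MultiBinary:  ", sample_multi_binary(5))
```

The difference between the last two is what matters when reading the table: a `MultiDiscrete` action picks exactly one option from each set, whereas a `MultiBinary` action may switch any subset of its components on at once.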


## MuJoCo
Some of the baselines examples use [MuJoCo](http://www.mujoco.org) (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (temporary 30-day license can be obtained from [www.mujoco.org](http://www.mujoco.org)). Instructions on setting up MuJoCo can be found [here](https://github.com/openai/mujoco-py)

## Testing the installation
All unit tests in baselines can be run using pytest runner:
```
pip install pytest
pytest
pip install pytest pytest-cov
pytest --cov-config .coveragerc --cov-report html --cov-report term --cov=.
```

## Subpackages

- [A2C](baselines/a2c)
- [ACER](baselines/acer)
- [ACKTR](baselines/acktr)
- [DDPG](baselines/ddpg)
- [DQN](baselines/deepq)
- [GAIL](baselines/gail)
- [HER](baselines/her)
- [PPO1](baselines/ppo1) (Multi-CPU using MPI)
- [PPO2](baselines/ppo2) (Optimized for GPU)
- [TRPO](baselines/trpo_mpi)
## Citing the Project

To cite this repository in publications:

@misc{baselines,
author = {Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai},
title = {OpenAI Baselines},
year = {2017},
```
@misc{stable-baselines,
author = {Hill, Ashley and Raffin, Antonin and Traore, Rene and Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai},
title = {Stable Baselines},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/openai/baselines}},
howpublished = {\url{https://github.com/hill-a/stable-baselines}},
}
```

## How To Contribute

For anyone interested in making the baselines better, some documentation still needs to be written.
If you want to contribute, please open an issue first and then propose your pull request.

Nice to have (for the future):
- [ ] Continuous actions support for ACER
- [ ] Continuous actions support for ACKTR
- [ ] Tensorboard integration (see branch `Tensorboard`)
5 changes: 0 additions & 5 deletions baselines/a2c/README.md

This file was deleted.
