Commit

PyTorch update going live.
jachiam committed Jan 30, 2020
1 parent 2ce0ee9 commit 2092113
Showing 119 changed files with 172,854 additions and 848 deletions.
31 changes: 28 additions & 3 deletions docs/algorithms/ddpg.rst
@@ -173,10 +173,35 @@ Pseudocode
Documentation
=============

.. autofunction:: spinup.ddpg
.. admonition:: You Should Know

    In what follows, we give documentation for the PyTorch and TensorFlow implementations of DDPG in Spinning Up. They have nearly identical function calls and docstrings, except for details relating to model construction. However, we include both full docstrings for completeness.


Documentation: PyTorch Version
------------------------------

.. autofunction:: spinup.ddpg_pytorch

Saved Model Contents: PyTorch Version
-------------------------------------

The PyTorch saved model can be loaded with ``ac = torch.load('path/to/model.pt')``, yielding an actor-critic object (``ac``) that has the properties described in the docstring for ``ddpg_pytorch``.

You can get actions from this model with

.. code-block:: python

    actions = ac.act(torch.as_tensor(obs, dtype=torch.float32))

Documentation: TensorFlow Version
---------------------------------

.. autofunction:: spinup.ddpg_tf1

Saved Model Contents
--------------------
Saved Model Contents: TensorFlow Version
----------------------------------------

The computation graph saved by the logger includes:

33 changes: 30 additions & 3 deletions docs/algorithms/ppo.rst
@@ -147,13 +147,40 @@ Pseudocode
\end{algorithm}
Documentation
=============

.. autofunction:: spinup.ppo
.. admonition:: You Should Know

    In what follows, we give documentation for the PyTorch and TensorFlow implementations of PPO in Spinning Up. They have nearly identical function calls and docstrings, except for details relating to model construction. However, we include both full docstrings for completeness.


Documentation: PyTorch Version
------------------------------

.. autofunction:: spinup.ppo_pytorch

Saved Model Contents: PyTorch Version
-------------------------------------

The PyTorch saved model can be loaded with ``ac = torch.load('path/to/model.pt')``, yielding an actor-critic object (``ac``) that has the properties described in the docstring for ``ppo_pytorch``.

You can get actions from this model with

.. code-block:: python

    actions = ac.act(torch.as_tensor(obs, dtype=torch.float32))

Documentation: TensorFlow Version
---------------------------------

.. autofunction:: spinup.ppo_tf1

Saved Model Contents
--------------------
Saved Model Contents: TensorFlow Version
----------------------------------------

The computation graph saved by the logger includes:

139 changes: 98 additions & 41 deletions docs/algorithms/sac.rst

Large diffs are not rendered by default.

42 changes: 33 additions & 9 deletions docs/algorithms/td3.rst
@@ -11,7 +11,7 @@ Background

.. _`Background for DDPG`: ../algorithms/ddpg.html#background

-While DDPG can achieve great performance sometimes, it is frequently brittle with respect to hyperparameters and other kinds of tuning. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then leads to the policy breaking, because it exploits the errors in the Q-function. Twin Delayed DDPG (TD3) is an algorithm which addresses this issue by introducing three critical tricks:
+While DDPG can achieve great performance sometimes, it is frequently brittle with respect to hyperparameters and other kinds of tuning. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then leads to the policy breaking, because it exploits the errors in the Q-function. Twin Delayed DDPG (TD3) is an algorithm that addresses this issue by introducing three critical tricks:

**Trick One: Clipped Double-Q Learning.** TD3 learns *two* Q-functions instead of one (hence "twin"), and uses the smaller of the two Q-values to form the targets in the Bellman error loss functions.
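
The target described in Trick One can be sketched for a single transition as follows. This is a minimal plain-Python illustration, not code from the Spinning Up source: the function name, the scalar inputs, and the discount factor ``gamma`` are assumptions made for the example, and ``q1_next`` and ``q2_next`` stand for the two target Q-values evaluated at the next state.

.. code-block:: python

    def clipped_double_q_target(r, d, q1_next, q2_next, gamma=0.99):
        """Bellman target built from the smaller of two Q-value estimates.

        All names here are hypothetical; gamma=0.99 is an illustrative default.
        """
        # Clipped double-Q step: keep the more pessimistic of the two estimates.
        q_min = min(q1_next, q2_next)
        # d is 1.0 when the next state is terminal, which removes the bootstrap term.
        return r + gamma * (1.0 - d) * q_min

    # With r=1, a non-terminal next state, and estimates 10 and 8,
    # the smaller estimate (8) is the one that enters the target.
    target = clipped_double_q_target(1.0, 0.0, 10.0, 8.0, gamma=0.5)

Both Q-functions would then be regressed toward this same target, so an overestimate in one of them cannot inflate the other.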

@@ -116,16 +116,16 @@ Pseudocode
\end{equation*}
\STATE Update Q-functions by one step of gradient descent using
\begin{align*}
-& \nabla_{\phi_i} \frac{1}{|B|}\sum_{(s,a,r,s',d) \in B} \left( Q_{\phi,i}(s,a) - y(r,s',d) \right)^2 && \text{for } i=1,2
+& \nabla_{\phi_i} \frac{1}{|B|}\sum_{(s,a,r,s',d) \in B} \left( Q_{\phi_i}(s,a) - y(r,s',d) \right)^2 && \text{for } i=1,2
\end{align*}
\IF{ $j \mod$ \texttt{policy\_delay} $ = 0$}
\STATE Update policy by one step of gradient ascent using
\begin{equation*}
-\nabla_{\theta} \frac{1}{|B|}\sum_{s \in B}Q_{\phi,1}(s, \mu_{\theta}(s))
+\nabla_{\theta} \frac{1}{|B|}\sum_{s \in B}Q_{\phi_1}(s, \mu_{\theta}(s))
\end{equation*}
\STATE Update target networks with
\begin{align*}
-\phi_{\text{targ},i} &\leftarrow \rho \phi_{\text{targ},i} + (1-\rho) \phi_i && \text{for } i=1,2\\
+\phi_{\text{targ},i} &\leftarrow \rho \phi_{\text{targ}, i} + (1-\rho) \phi_i && \text{for } i=1,2\\
\theta_{\text{targ}} &\leftarrow \rho \theta_{\text{targ}} + (1-\rho) \theta
\end{align*}
\ENDIF
@@ -136,15 +136,39 @@ Pseudocode
\end{algorithm}
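
The delayed policy update and the target-network update in the pseudocode above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the Spinning Up implementation: parameters are flat lists of floats, the function names are hypothetical, and the defaults ``rho=0.995`` and ``policy_delay=2`` are chosen only for the example.

.. code-block:: python

    def polyak_update(target_params, params, rho=0.995):
        """Return rho * target + (1 - rho) * current, element by element."""
        return [rho * t + (1.0 - rho) * p for t, p in zip(target_params, params)]

    def is_policy_update_step(j, policy_delay=2):
        """True on the steps where the policy and target networks are updated."""
        return j % policy_delay == 0

    # The Q-functions are updated on every step j, while the policy and the
    # target networks are updated only when j mod policy_delay equals 0.
    policy_steps = [j for j in range(6) if is_policy_update_step(j, policy_delay=2)]
    new_targets = polyak_update([2.0, 4.0], [0.0, 0.0], rho=0.5)

A value of ``rho`` close to 1 makes the target parameters trail the current parameters slowly, which keeps the regression targets stable between updates.
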
Documentation
=============

.. autofunction:: spinup.td3
.. admonition:: You Should Know

    In what follows, we give documentation for the PyTorch and TensorFlow implementations of TD3 in Spinning Up. They have nearly identical function calls and docstrings, except for details relating to model construction. However, we include both full docstrings for completeness.



Documentation: PyTorch Version
------------------------------

.. autofunction:: spinup.td3_pytorch

Saved Model Contents: PyTorch Version
-------------------------------------

The PyTorch saved model can be loaded with ``ac = torch.load('path/to/model.pt')``, yielding an actor-critic object (``ac``) that has the properties described in the docstring for ``td3_pytorch``.

You can get actions from this model with

.. code-block:: python

    actions = ac.act(torch.as_tensor(obs, dtype=torch.float32))

Documentation: TensorFlow Version
---------------------------------

.. autofunction:: spinup.td3_tf1

Saved Model Contents
--------------------
Saved Model Contents: TensorFlow Version
----------------------------------------

The computation graph saved by the logger includes:

6 changes: 5 additions & 1 deletion docs/algorithms/trpo.rst
@@ -154,7 +154,11 @@ Pseudocode
Documentation
=============

.. autofunction:: spinup.trpo
.. admonition:: You Should Know

    Spinning Up currently has only a TensorFlow implementation of TRPO.

.. autofunction:: spinup.trpo_tf1


Saved Model Contents
31 changes: 28 additions & 3 deletions docs/algorithms/vpg.rst
@@ -85,10 +85,35 @@ Pseudocode
Documentation
=============

.. autofunction:: spinup.vpg
.. admonition:: You Should Know

Saved Model Contents
--------------------
    In what follows, we give documentation for the PyTorch and TensorFlow implementations of VPG in Spinning Up. They have nearly identical function calls and docstrings, except for details relating to model construction. However, we include both full docstrings for completeness.


Documentation: PyTorch Version
------------------------------

.. autofunction:: spinup.vpg_pytorch

Saved Model Contents: PyTorch Version
-------------------------------------

The PyTorch saved model can be loaded with ``ac = torch.load('path/to/model.pt')``, yielding an actor-critic object (``ac``) that has the properties described in the docstring for ``vpg_pytorch``.

You can get actions from this model with

.. code-block:: python

    actions = ac.act(torch.as_tensor(obs, dtype=torch.float32))

Documentation: TensorFlow Version
---------------------------------

.. autofunction:: spinup.vpg_tf1

Saved Model Contents: TensorFlow Version
----------------------------------------

The computation graph saved by the logger includes:

Binary file added docs/images/ex2-2_ddpg_bug_pytorch.png