Add documentation on exporting models (openai#475)

* Add documentation on exporting models
* Update changelog
* Update doc (and maintainers list)
* Rework exporting doc
zhuyifengzju · Sep 16, 2019 · 19ed2ca
1 parent 153ce70
commit 19ed2ca
Showing 5 changed files with 108 additions and 51 deletions.
diff --git a/README.md b/README.md
@@ -200,7 +200,7 @@ To cite this repository in publications:
 
 ```
 @misc{stable-baselines,
-  author = {Hill, Ashley and Raffin, Antonin and Ernestus, Maximilian and Gleave, Adam and Traore, Rene and Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai},
+  author = {Hill, Ashley and Raffin, Antonin and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Traore, Rene and Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai},
   title = {Stable Baselines},
   year = {2018},
   publisher = {GitHub},
@@ -211,7 +211,7 @@ To cite this repository in publications:
 
 ## Maintainers
 
-Stable-Baselines is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/erniejunior) (aka @erniejunior) and [Adam Gleave](https://github.com/adamgleave) (@AdamGleave).
+Stable-Baselines is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/erniejunior) (aka @erniejunior), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave) and [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli).
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
 

diff --git a/docs/guide/export.rst b/docs/guide/export.rst
@@ -0,0 +1,76 @@
+.. _export:
+
+
+Exporting models
+================
+
+After training an agent, you may want to deploy/use it in another language
+or framework, like PyTorch or `tensorflowjs <https://github.com/tensorflow/tfjs>`_.
+Stable Baselines does not include tools to export models to other frameworks, but
+this document aims to cover parts that are required for exporting along with
+more detailed stories from users of Stable Baselines.
+
+
+Background
+----------
+
+In Stable Baselines, the controller is stored inside :ref:`policies <policies>` which convert
+observations into actions. Each learning algorithm (e.g. DQN, A2C, SAC) contains
+one or more policies, some of which are only used for training. An easy way to find
+the policy is to check the code for the ``predict`` function of the agent:
+this function should only call one policy with simple arguments.
+
+Policies hold the necessary Tensorflow placeholders and tensors to do the
+inference (i.e. predict actions), so it is enough to export these policies
+to do inference in another framework.
+
+.. note::
+  Learning algorithms may also contain other Tensorflow placeholders that are used for training only and are
+  not required for inference.
+
+
+.. warning::
+  When using CNN policies, the observation is normalized internally (dividing by 255 to have values in [0, 1]).
+
+
+Export to PyTorch
+-----------------
+
+A known working solution is to use the :func:`get_parameters <stable_baselines.common.base_class.BaseRLModel.get_parameters>`
+function to obtain the model parameters, construct the network manually in PyTorch and assign the parameters correctly.
+
+.. warning::
+  PyTorch and Tensorflow have internal differences with e.g. 2D convolutions (see discussion linked below).
+
+
+See `discussion #372 <https://github.com/hill-a/stable-baselines/issues/372>`_ for details.
+
+
+Export to tensorflowjs / tfjs
+-----------------------------
+
+This can be done via Tensorflow's `simple_save <https://www.tensorflow.org/api_docs/python/tf/saved_model/simple_save>`_ function
+and `tensorflowjs_converter <https://www.tensorflow.org/js/tutorials/conversion/import_saved_model>`_.
+
+See `discussion #474 <https://github.com/hill-a/stable-baselines/issues/474>`_ for details.
+
+
+Export to Java
+---------------
+
+This can be done via Tensorflow's `simple_save <https://www.tensorflow.org/api_docs/python/tf/saved_model/simple_save>`_ function.
+
+See `this discussion <https://github.com/hill-a/stable-baselines/issues/329>`_ for details.
+
+
+Manual export
+-------------
+
+You can also manually export required parameters (weights) and construct the
+network in your desired framework, as done with the PyTorch example above.
+
+You can access the parameters of the model via the agent's
+:func:`get_parameters <stable_baselines.common.base_class.BaseRLModel.get_parameters>`
+function. If you use default policies, you can find the architecture of the networks in
+the source for :ref:`policies <policies>`. Otherwise, for DQN/SAC/DDPG or TD3 you need to check the ``policies.py`` file located
+in their respective folders.
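The manual export route described in the new `export.rst` could be sketched as follows. This is only an illustration under stated assumptions, not the procedure from the linked discussions: the dictionary below is hand-written in the shape the docs describe (Tensorflow variable name mapped to an array), the variable names are hypothetical, and nested lists stand in for NumPy arrays so the snippet runs without any dependencies. It shows one layout difference that matters when assigning weights: a Tensorflow dense kernel is stored as `(inputs, outputs)`, while a PyTorch `Linear` weight is stored as `(outputs, inputs)`, so the kernel has to be transposed.

```python
def transpose_dense_kernel(kernel):
    """Turn an (inputs, outputs) kernel into an (outputs, inputs) one."""
    return [list(column) for column in zip(*kernel)]


def convert_for_torch(parameters):
    """Transpose every 2D entry and copy 1D entries (biases) unchanged."""
    converted = {}
    for name, value in parameters.items():
        is_matrix = bool(value) and isinstance(value[0], list)
        converted[name] = transpose_dense_kernel(value) if is_matrix else list(value)
    return converted


# Hypothetical stand-in for the output of get_parameters():
# one dense layer with 2 inputs and 3 outputs, plus its bias.
params = {
    "model/pi_fc0/w:0": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "model/pi_fc0/b:0": [0.1, 0.2, 0.3],
}

torch_params = convert_for_torch(params)
print(torch_params["model/pi_fc0/w:0"])  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

With real NumPy arrays the transpose is simply `array.T`. Convolution kernels need a different axis permutation, and CNN policies additionally expect the observation to be divided by 255 before it enters the network.
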
diff --git a/docs/guide/save_format.rst b/docs/guide/save_format.rst
@@ -4,24 +4,24 @@
 On saving and loading
 =====================
 
-Stable baselines stores both neural network parameters and algorithm-related parameters such as 
-exploration schedule, number of environments and observation/action space. This allows continual learning and easy 
+Stable baselines stores both neural network parameters and algorithm-related parameters such as
+exploration schedule, number of environments and observation/action space. This allows continual learning and easy
 use of trained agents without training, but it is not without its issues. Following describes two formats
 used to save agents in stable baselines, their pros and shortcomings.
 
 Terminology used in this page:
 
--  *parameters* refer to neural network parameters (also called "weights"). This is a dictionary 
-   mapping Tensorflow variable name to a NumPy array. 
--  *data* refers to RL algorithm parameters, e.g. learning rate, exploration schedule, action/observation space. 
+-  *parameters* refer to neural network parameters (also called "weights"). This is a dictionary
+   mapping Tensorflow variable name to a NumPy array.
+-  *data* refers to RL algorithm parameters, e.g. learning rate, exploration schedule, action/observation space.
    These depend on the algorithm used. This is a dictionary mapping class variable names to their values.
 
 
 Cloudpickle (stable-baselines<=2.7.0)
 -------------------------------------
 
 Original stable baselines save format. Data and parameters are bundled up into a tuple ``(data, parameters)`` 
-and then serialized with ``cloudpickle`` library (essentially the same as ``pickle``). 
+and then serialized with ``cloudpickle`` library (essentially the same as ``pickle``).
 
 This save format is still available via an argument in model save function in stable-baselines versions above
 v2.7.0 for backwards compatibility reasons, but its usage is discouraged.
@@ -32,31 +32,31 @@ Pros:
 -  Works with almost any type of Python object, including functions.
 
 
-Cons: 
+Cons:
 
 -  Pickle/Cloudpickle is not designed for long-term storage or sharing between Python versions.
 -  If one object in file is not readable (e.g. wrong library version), then reading the rest of the
    file is difficult.
 -  Python-specific format, hard to read stored files from other languages.
 
 
-If part of a saved model becomes unreadable for any reason (e.g. different Tensorflow versions), then 
+If part of a saved model becomes unreadable for any reason (e.g. different Tensorflow versions), then
 it may be tricky to restore any of the model. For this reason another save format was designed.
 
 
 Zip-archive (stable-baselines>2.7.0)
 -------------------------------------
 
-A zip-archived JSON dump and NumPy zip archive of the arrays. The data dictionary (class parameters) 
+A zip-archived JSON dump and NumPy zip archive of the arrays. The data dictionary (class parameters)
 is stored as a JSON file, model parameters are serialized with ``numpy.savez`` function and these two files
-are stored under a single .zip archive. 
+are stored under a single .zip archive.
 
 Any objects that are not JSON serializable are serialized with cloudpickle and stored as base64-encoded
 string in the JSON file, along with some information that was stored in the serialization. This allows
 inspecting stored objects without deserializing the object itself.
 
 This format allows skipping elements in the file, i.e. we can skip deserializing objects that are
-broken/non-serializable. This can be done via ``custom_objects`` argument to load functions. 
+broken/non-serializable. This can be done via ``custom_objects`` argument to load functions.
 
 This is the default save format in stable baselines versions after v2.7.0.
 
@@ -69,7 +69,7 @@ File structure:
   ├── parameter_list    JSON file of model parameters and their ordering (list)
   ├── parameters        Bytes from numpy.savez (a zip file of the numpy arrays). ...
       ├── ...           Being a zip-archive itself, this object can also be opened ...
-          ├── ...       as a zip-archive and browsed. 
+          ├── ...       as a zip-archive and browsed.
 
 
 Pros:
@@ -80,7 +80,7 @@ Pros:
    languages.
 
 
-Cons: 
+Cons:
 
 -  More complex implementation.
--  Still relies partly on cloudpickle for complex objects (e.g. custom functions).
+-  Still relies partly on cloudpickle for complex objects (e.g. custom functions).
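Because the zip-archive format keeps its metadata as JSON inside an ordinary zip file, those parts can be read with nothing but a standard library. The sketch below builds a toy archive that mimics the layout described in `save_format.rst` and reads it back. It is not produced by stable-baselines: the entry name `data` and all stored values are assumptions made for illustration, and the `parameters` entry is a placeholder for the bytes that `numpy.savez` would write.

```python
import io
import json
import zipfile


def write_toy_archive(data, parameter_names):
    """Build an in-memory zip that mimics the described layout (toy stand-in)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data", json.dumps(data))
        archive.writestr("parameter_list", json.dumps(parameter_names))
        # Placeholder for the binary numpy.savez payload.
        archive.writestr("parameters", b"\x00\xff binary placeholder")
    buffer.seek(0)
    return buffer


def read_json_entries(file_obj):
    """Return every archive entry that parses as JSON, keyed by entry name."""
    entries = {}
    with zipfile.ZipFile(file_obj, "r") as archive:
        for name in archive.namelist():
            try:
                entries[name] = json.loads(archive.read(name).decode("utf-8"))
            except (UnicodeDecodeError, ValueError):
                # Binary entries are skipped instead of aborting the whole read.
                continue
    return entries


toy = write_toy_archive(
    {"gamma": 0.99, "n_envs": 1},
    ["model/pi/w:0", "model/pi/b:0"],
)
entries = read_json_entries(toy)
print(sorted(entries))
```

Skipping an unreadable entry while still returning the rest mirrors the property the docs highlight for this format: one broken element does not make the remainder of the file inaccessible.
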
diff --git a/docs/index.rst b/docs/index.rst
@@ -50,6 +50,7 @@ This toolset is a fork of OpenAI Baselines, with a major structural refactoring,
    guide/pretrain
    guide/checking_nan
    guide/save_format
+   guide/export
 
 
 .. toctree::
@@ -96,7 +97,7 @@ To cite this project in publications:
 .. code-block:: bibtex
 
     @misc{stable-baselines,
-      author = {Hill, Ashley and Raffin, Antonin and Ernestus, Maximilian and Gleave, Adam and Traore, Rene and Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai},
+      author = {Hill, Ashley and Raffin, Antonin and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Traore, Rene and Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai},
       title = {Stable Baselines},
       year = {2018},
       publisher = {GitHub},

diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
@@ -15,27 +15,31 @@ Breaking Changes:
   extra. When `mpi4py` is not available, stable-baselines skips imports of
   OpenMPI-dependent algorithms.
   See :ref:`installation notes <openmpi>` and
-  `Issue #430 <https://github.com/hill-a/stable-baselines/issues/430>`.
+  `Issue #430 <https://github.com/hill-a/stable-baselines/issues/430>`_.
 - SubprocVecEnv now defaults to a thread-safe start method, `forkserver` when
   available and otherwise `spawn`. This may require application code be
   wrapped in `if __name__ == '__main__'`. You can restore previous behavior
   by explicitly setting `start_method = 'fork'`. See
   `PR #428 <https://github.com/hill-a/stable-baselines/pull/428>`_.
+- updated dependencies: tensorflow v1.8.0 is now required
 
 New Features:
 ^^^^^^^^^^^^^
+- **important change** Switch to using zip-archived JSON and Numpy `savez` for
+  storing models for better support across library/Python versions. (@Miffyli)
 
 Bug Fixes:
 ^^^^^^^^^^
 - Skip automatic imports of OpenMPI-dependent algorithms to avoid an issue
   where OpenMPI would cause stable-baselines to hang on Ubuntu installs.
   See :ref:`installation notes <openmpi>` and
-  `Issue #430 <https://github.com/hill-a/stable-baselines/issues/430>`.
+  `Issue #430 <https://github.com/hill-a/stable-baselines/issues/430>`_.
 - Fix a bug when calling `logger.configure()` with MPI enabled (@keshaviyengar)
+- set `allow_pickle=True` for numpy>=1.17.0 when loading expert dataset
 
 Deprecations:
 ^^^^^^^^^^^^^
-- Models saved with cloudpickle format (stable-baselines<=2.7.0) are now 
+- Models saved with cloudpickle format (stable-baselines<=2.7.0) are now
   deprecated in favor of zip-archive format for better support across
   Python/Tensorflow versions. (@Miffyli)
 
@@ -46,42 +50,15 @@ Others:
   to `stable_baselines.common.noise`. The API remains backward-compatible;
   for example `from stable_baselines.ddpg.noise import NormalActionNoise` is still
   okay. (@shwang)
-- **important change** Switch to using zip-archived JSON and Numpy `savez` for 
-  storing models for better support across library/Python verions. (@Miffyli)
+- docker images were updated
 
 Documentation:
 ^^^^^^^^^^^^^^
 - Add WaveRL project (@jaberkow)
 - Add Fenics-DRL project (@DonsetPG)
 - Fix and rename custom policy names (@eavelardev)
-
-
-
-Pre-Release 2.7.1a0 (WIP)
---------------------------
-
-
-Breaking Changes:
-^^^^^^^^^^^^^^^^^
-- updated dependencies: tensorflow v1.8.0 is now required
-
-New Features:
-^^^^^^^^^^^^^
-
-Bug Fixes:
-^^^^^^^^^^
-- set `allow_pickle=True` for numpy>=1.17.0 when loading expert dataset
-
-Deprecations:
-^^^^^^^^^^^^^
-
-Others:
-^^^^^^^
-- docker images were updated
-
-Documentation:
-^^^^^^^^^^^^^^
-
+- Add documentation on exporting models.
+- Update maintainers list (Welcome to @Miffyli)
 
 
 Release 2.7.0 (2019-07-31)
@@ -476,14 +453,17 @@ Maintainers
 -----------
 
 Stable-Baselines is currently maintained by `Ashley Hill`_ (aka @hill-a), `Antonin Raffin`_ (aka `@araffin`_),
-`Maximilian Ernestus`_ (aka @erniejunior) and `Adam Gleave`_ (`@AdamGleave`_).
+`Maximilian Ernestus`_ (aka @erniejunior), `Adam Gleave`_ (`@AdamGleave`_) and `Anssi Kanervisto`_ (aka `@Miffyli`_).
 
 .. _Ashley Hill: https://github.com/hill-a
 .. _Antonin Raffin: https://araffin.github.io/
 .. _Maximilian Ernestus: https://github.com/erniejunior
 .. _Adam Gleave: https://gleave.me/
 .. _@araffin: https://github.com/araffin
 .. _@AdamGleave: https://github.com/adamgleave
+.. _Anssi Kanervisto: https://github.com/Miffyli
+.. _@Miffyli: https://github.com/Miffyli
+
 
 Contributors (since v2.0.0):
 ----------------------------