PPO cpp project and C++ export (openai#585)
* Adding PPO_CPP project description.

* Changing PPO_CPP project description.

* Changelog addition on new dependent project.

* Update changelog.rst

* Added section on C++ portability of Tensorflow models
Antymon authored and araffin committed Nov 28, 2019
1 parent b461adb commit 04c35e1
Showing 3 changed files with 40 additions and 1 deletion.
27 changes: 27 additions & 0 deletions docs/guide/export.rst
@@ -46,6 +46,33 @@ function to obtain model parameters, construct the network manually in PyTorch a
See `discussion #372 <https://github.com/hill-a/stable-baselines/issues/372>`_ for details.


Export to C++
-----------------

Tensorflow, the backbone of Stable Baselines, is fundamentally a C/C++ library, even though it is most commonly accessed
through its Python frontend. As a result, models created at the Python level should generally be
compatible with the corresponding C++ version of Tensorflow.

.. warning::
  It is advisable not to mix different versions of the Tensorflow libraries, particularly where the saved state (weights) is concerned.
  Moving computational graphs between versions is generally more forgiving. In fact, the `PPO_CPP <https://github.com/Antymon/ppo_cpp>`_ project
  mentioned below uses graphs generated with Python Tensorflow 1.x in the C++ version of Tensorflow 2.

Stable Baselines comes in handy when you want to migrate a computational graph and/or a state (weights), because
the existing algorithms already define most of the necessary computations, so you do not need to recreate the core of the algorithms.
This is exactly the idea behind the `PPO_CPP <https://github.com/Antymon/ppo_cpp>`_ project, which runs training at the C++ level for the sake of
computational efficiency. The graphs are exported from Stable Baselines' PPO2 implementation through the ``tf.train.export_meta_graph``
function. Alternatively, and perhaps more commonly, you could use the C++ layer only for inference. That could be useful
as a deployment step for server backends or as an optimization for more resource-constrained devices.
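
As a rough illustration of the export step, the sketch below serializes a graph's structure with ``export_meta_graph``. It is a minimal example under stated assumptions, not the PPO_CPP export code: the toy graph, the helper names ``build_toy_graph`` and ``export_graph``, and the file name are hypothetical, and the call goes through ``tf.compat.v1`` so that it also runs on Tensorflow 2.x. With Stable Baselines you would presumably pass the graph held by a PPO2 model instead of the toy graph.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style API, also available under TF 2.x


def build_toy_graph():
    """Build a tiny stand-in graph (hypothetical, not a real PPO2 graph)."""
    graph = tf.Graph()
    with graph.as_default():
        obs = tf1.placeholder(tf.float32, shape=(None, 4), name="obs")
        weights = tf1.get_variable("weights", shape=(4, 2), dtype=tf.float32)
        tf.matmul(obs, weights, name="action_logits")
    return graph


def export_graph(graph, path):
    """Write the graph structure (not the trained weights) to a meta graph file."""
    # Entering the graph context keeps the call in graph mode under TF 2.x,
    # where meta graph export is not supported during eager execution.
    with graph.as_default():
        tf1.train.export_meta_graph(filename=path, graph=graph)
    return path


export_dir = tempfile.mkdtemp()
meta_path = export_graph(build_toy_graph(), os.path.join(export_dir, "toy_graph.meta"))
```

The resulting file describes only the computation. Trained weights would have to be saved and restored separately, for example through a checkpoint.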

.. warning::
  As a word of caution, the C++ APIs are more imperative than their Python counterparts or, put plainly, cruder.
  This is particularly apparent in Tensorflow 2.0, where the declarative style enabled by AutoGraph exists only at the Python level. The
  C++ counterpart still relies on Session objects, which are known from earlier versions of Tensorflow. In our use case,
  the availability of graphs for a Session depends on the use of ``tf.function`` decorators. However, as of November 2019, Stable Baselines still
  uses Tensorflow 1.x in its main version, which is slightly easier to use in the context of C++ portability.
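
To make the role of ``tf.function`` more concrete, here is a small sketch, assuming Tensorflow 2.x, of how a decorated Python function can be traced into a graph that could then be serialized. The function ``policy_step``, its constant weights and the input shape are hypothetical placeholders rather than anything taken from Stable Baselines or PPO_CPP.

```python
import tensorflow as tf


@tf.function
def policy_step(obs):
    # Hypothetical stand-in for a policy forward pass.
    weights = tf.ones((4, 2))
    return tf.matmul(obs, weights, name="action_logits")


# Tracing with a fixed input signature yields a concrete function that owns a graph.
concrete = policy_step.get_concrete_function(
    tf.TensorSpec(shape=(None, 4), dtype=tf.float32)
)

# The GraphDef lists the operations that a graph-based runtime would execute.
graph_def = concrete.graph.as_graph_def()
op_names = [node.name for node in graph_def.node]
```

Without the decorator the function would run eagerly and no such graph would be available for export.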


Export to tensorflowjs / tfjs
-----------------------------

4 changes: 3 additions & 1 deletion docs/misc/changelog.rst
@@ -64,6 +64,8 @@ Documentation:
- Fix `result_plotter` example
- Fix typo in algos.rst, "containes" to "contains" (@SyllogismRXS)
- Fix outdated source documentation for load_results
- Add PPO_CPP project (@Antymon)
- Add section on C++ portability of Tensorflow models (@Antymon)

Release 2.8.0 (2019-09-29)
--------------------------
@@ -544,4 +546,4 @@ Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk
@EliasHasle @mrakgr @Bleyddyn @antoine-galataud @junhyeokahn @AdamGleave @keshaviyengar @tperol
@XMaster96 @kantneel @Pastafarianist @GerardMaggiolino @PatrickWalter214 @yutingsz @sc420 @Aaahh @billtubbs
@Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
@MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow
@MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon
10 changes: 10 additions & 0 deletions docs/misc/projects.rst
@@ -168,3 +168,13 @@ this study are from stable-baselines.
| Email: [email protected]
| Github: https://github.com/harvard-edge/quarl
| Paper: https://arxiv.org/pdf/1910.01055.pdf

PPO_CPP: C++ version of a Deep Reinforcement Learning algorithm PPO
-------------------------------------------------------------------
Executes PPO at the C++ level, yielding notable execution speedups.
Uses Stable Baselines to create a computational graph, which a natively compiled binary then uses for training with custom environments.

| Authors: Szymon Brych
| Email: [email protected]
| GitHub: https://github.com/Antymon/ppo_cpp
