Welcome to the Tianshou Chinese Documentation
=============================================

Tianshou is a deep reinforcement learning platform based on PyTorch. The algorithms implemented so far are:

* :class:`~tianshou.policy.DQNPolicy` Deep Q-Network (DQN)
* :class:`~tianshou.policy.DQNPolicy` Double DQN
* :class:`~tianshou.policy.C51Policy` Categorical DQN (C51)
* :class:`~tianshou.policy.QRDQNPolicy` Quantile Regression DQN (QR-DQN)
* :class:`~tianshou.policy.RainbowPolicy` Rainbow DQN
* :class:`~tianshou.policy.IQNPolicy` Implicit Quantile Network (IQN)
* :class:`~tianshou.policy.FQFPolicy` Fully-parameterized Quantile Function (FQF)
* :class:`~tianshou.policy.PGPolicy` Policy Gradient
* :class:`~tianshou.policy.NPGPolicy` Natural Policy Gradient
* :class:`~tianshou.policy.A2CPolicy` Advantage Actor-Critic (A2C)
* :class:`~tianshou.policy.TRPOPolicy` Trust Region Policy Optimization (TRPO)
* :class:`~tianshou.policy.PPOPolicy` Proximal Policy Optimization (PPO)
* :class:`~tianshou.policy.DDPGPolicy` Deep Deterministic Policy Gradient (DDPG)
* :class:`~tianshou.policy.TD3Policy` Twin Delayed DDPG (TD3)
* :class:`~tianshou.policy.SACPolicy` Soft Actor-Critic (SAC)
* :class:`~tianshou.policy.DiscreteSACPolicy` Discrete Soft Actor-Critic
* :class:`~tianshou.policy.ImitationPolicy` Imitation Learning
* :class:`~tianshou.policy.DiscreteBCQPolicy` Discrete Batch-Constrained deep Q-Learning (BCQ)
* :class:`~tianshou.policy.DiscreteCQLPolicy` Discrete Conservative Q-Learning (CQL)
* :class:`~tianshou.policy.DiscreteCRRPolicy` Critic Regularized Regression (CRR)
* :class:`~tianshou.policy.PSRLPolicy` Posterior Sampling Reinforcement Learning (PSRL)
* :class:`~tianshou.data.PrioritizedReplayBuffer` Prioritized Experience Replay (PER)
* :meth:`~tianshou.policy.BasePolicy.compute_episodic_return` Generalized Advantage Estimator (GAE)
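
As a rough illustration of what a generalized advantage estimator computes, the sketch below implements the standard backward recursion in plain Python. It is not Tianshou's implementation of ``compute_episodic_return``; the function name ``gae_advantages`` and all of its parameters are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> List[float]:
    """Illustrative GAE over one trajectory segment (hypothetical helper).

    values[t] is V(s_t) and next_values[t] is V(s_{t+1}).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is dropped at episode end.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
    return advantages
```

With ``gae_lambda=0`` the result reduces to the one-step TD errors, and with ``gae_lambda=1`` it becomes the discounted return minus the value baseline.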

Tianshou also has the following features:

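* Elegant implementation: everything listed above is implemented in just over 4,000 lines of code
* The best-performing MuJoCo benchmark implementation to date
* Supports parallel sampling from multiple environments (synchronous or asynchronous) for any algorithm; see :ref:`parallel_sampling`
* Supports recurrent neural networks (RNNs) for the actor and critic networks; see :ref:`rnn_training`
* Supports custom environments with observations and actions of any type (for example a dict or a user-defined class); see :ref:`self_defined_env`
* Supports custom training strategies; see :ref:`customize_training`
* Supports N-step bootstrap returns (:meth:`~tianshou.policy.BasePolicy.compute_nstep_return`) and prioritized experience replay (:class:`~tianshou.data.PrioritizedReplayBuffer`) for any Q-learning-based algorithm; numba JIT optimization makes GAE, n-step returns, and PER run very fast
* Supports multi-agent reinforcement learning; see :ref:`marl_example`
* Comprehensive unit tests, covering functional tests, full training-pipeline tests, documentation tests, code style tests, and type checks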
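
To make the N-step bootstrap idea from the list above concrete, here is a minimal plain-Python sketch of an n-step target for a single transition. It is an assumption-laden illustration, not the code behind ``compute_nstep_return``; the name ``nstep_target`` and its arguments are hypothetical.

```python
from typing import Sequence


def nstep_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Illustrative n-step target (hypothetical helper).

    rewards and dones cover the next n steps; bootstrap_value is the
    estimated value of the state reached after those n steps.
    """
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # The episode ended inside the window, so there is nothing to bootstrap from.
            return target
    # No termination within n steps: add the discounted value estimate.
    return target + discount * bootstrap_value
```

For a Q-learning-style algorithm, ``bootstrap_value`` would typically be the target network's maximum Q-value at the n-th next state.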

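Unlike the English documentation, the Chinese documentation gives a high-level overview of the Tianshou platform. (In fact, it is all adapted from a graduation thesis.)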

Installation
------------

Tianshou is currently published on PyPI and conda-forge and requires Python 3.6 or later.

Install from PyPI:

.. code-block:: bash

    $ pip install tianshou

Install with conda:

.. code-block:: bash

    $ conda install -c conda-forge tianshou

You can also install the latest version directly from the GitHub source:

.. code-block:: bash

    $ pip install git+https://github.com/thu-ml/tianshou.git@master --upgrade

After installation, open a Python console and type:

.. code-block:: python

    import tianshou
    print(tianshou.__version__)

If no exception is raised, Tianshou has been installed successfully.
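
The same check can be scripted, for example in a setup or CI step. The sketch below uses only the Python standard library; the helper name ``check_install`` and its return strings are hypothetical and not part of Tianshou.

```python
import importlib.util
import sys
from typing import Tuple


def check_install(package: str = "tianshou", min_python: Tuple[int, int] = (3, 6)) -> str:
    """Report whether `package` looks importable (hypothetical helper)."""
    # The minimum Python version stated in this section is 3.6.
    if sys.version_info < min_python:
        return "python-too-old"
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        return "not-installed"
    return "ok"
```

Calling ``check_install()`` returns ``"ok"`` only when the interpreter is recent enough and the package can be located; it does not import the package or verify its version.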

.. toctree::
   :maxdepth: 1
   :caption: Tutorials

   slide
   tutorials
   concepts
   benchmark
   cheatsheet

.. toctree::
   :maxdepth: 2
   :caption: Documentation

   docs/toc

.. toctree::
   :maxdepth: 1
   :caption: Contributing

   contributing

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
