The name rl_toolbox comes from the many implementations of RL algorithms that I tested before settling on the PPO implementation from Stable Baselines3.
This repo contains the environment as well as the helper algorithms needed to train a quadruped in simulation and deploy the resulting neural networks on a real machine.
- Gym environment to simulate a quadruped (specifically Idef'X) using the Erquy simulator.
- Helper functions to compute inverse kinematics (IK) and build a small library of motions for the RL policy to build on.
- Small reimplementation of Stable Baselines3's PPO that enforces a symmetrical gait and avoids very large KL divergence between updates.
- Transfer algorithm to train a student policy that only has access to measurable physical quantities against a teacher policy trained with RL on the full observation space. Inspiration heavily drawn from this paper.
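The symmetry constraint above can be expressed as an auxiliary loss added to the PPO objective: the mirror of the action taken in a mirrored state should match the action taken in the original state. A minimal sketch, assuming linear mirroring operators `M_obs` and `M_act` (illustrative names, not the repo's API), with a toy 2-D space where mirroring swaps the two components:

```python
import numpy as np

def symmetry_loss(policy_fn, obs, M_obs, M_act):
    """Mean squared difference between the action taken in the
    mirrored state (mapped back through the action mirror) and the
    action taken in the original state. Added to the PPO surrogate
    loss, this term drives the policy toward a symmetric gait."""
    a = policy_fn(obs)                  # actions in original states
    a_mirror = policy_fn(obs @ M_obs.T) # actions in mirrored states
    return float(np.mean((a_mirror @ M_act.T - a) ** 2))

# Toy mirroring: swap left/right components of a 2-D obs/action space.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])

obs = np.random.default_rng(0).normal(size=(8, 2))
sym_policy = lambda o: o                      # identity: fully symmetric
asym_policy = lambda o: o @ np.diag([1.0, 2.0])  # treats legs differently

print(symmetry_loss(sym_policy, obs, M, M))   # ~0 for the symmetric policy
print(symmetry_loss(asym_policy, obs, M, M))  # > 0 for the asymmetric one
```

A symmetric policy commutes with the mirroring operators, so its loss vanishes; any left/right bias shows up as a positive penalty.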
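The teacher-student transfer can be sketched as a regression: the teacher acts from privileged observations (e.g. true contact states), while the student, restricted to measurable quantities, is trained to imitate the teacher's actions. A toy sketch with a linear student (all names illustrative, not the repo's code):

```python
import numpy as np

def distill_step(student, teacher_actions, student_obs, lr=0.1):
    """One gradient step of least-squares regression pulling the
    (toy, linear) student policy toward the teacher's actions on the
    observations the student can actually measure."""
    pred = student_obs @ student["W"]            # student's actions
    err = pred - teacher_actions                 # imitation error
    grad = student_obs.T @ err / len(student_obs)
    student["W"] -= lr * grad
    return float(np.mean(err ** 2))

# Toy data: pretend the teacher's behaviour is recoverable from the
# measurable observations alone.
rng = np.random.default_rng(0)
student_obs = rng.normal(size=(64, 3))
teacher_actions = student_obs @ rng.normal(size=(3, 2))

student = {"W": np.zeros((3, 2))}
losses = [distill_step(student, teacher_actions, student_obs)
          for _ in range(200)]
print(losses[0], losses[-1])  # imitation error shrinks over training
```

In practice the student is a neural network and the regression targets come from rolling out the student's own trajectories (DAgger-style), but the structure of the update is the same.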
The baseline policy obtained this way is a simple walking policy, robust to small pushes on the robot.