Reinforcement Learning Agents
Implemented for TensorFlow 2.0+
DDPG with prioritized replay
Primal-Dual DDPG for CMDP
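For readers unfamiliar with prioritized replay, here is a minimal sketch of the proportional variant, where transitions are sampled with probability proportional to their priority raised to `alpha` and corrected with importance-sampling weights. The class name `PrioritizedReplayBuffer`, the `Transition` tuple, and the default values of `alpha`, `beta`, and `eps` are assumptions made for illustration; they are not taken from this repo's code, which may be organized differently.

```python
import random
from collections import namedtuple

# Hypothetical transition container, used only for this illustration.
Transition = namedtuple("Transition", "state action reward next_state done")


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the repo's class)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def __len__(self):
        return len(self.data)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be replayed soon after insertion.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^-beta, normalized by the batch maximum.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG training step, the returned `weights` would typically scale the per-sample critic loss, and `update_priorities` would be called with the new TD errors afterwards. This list-based version samples in O(N); a sum-tree is the usual choice for large buffers.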
Install the dependencies imported by the scripts (my tf2 conda env is included as a reference)
Each file contains example code that runs training on CartPole env
Training: python3 TF2_DDPG_LSTM.py
Tensorboard: tensorboard --logdir=DDPG/logs
Agents tested using CartPole env.
| Name | On/off policy | Model | Action space support |
| --- | --- | --- | --- |
| DQN | off-policy | Dense, LSTM | discrete |
| DDPG | off-policy | Dense, LSTM | discrete, continuous |
| AE-DDPG | off-policy | Dense | discrete, continuous |
| SAC :bug: | off-policy | Dense | continuous |
| PPO | on-policy | Dense | discrete, continuous |
| Name | On/off policy | Model | Action space support |
| --- | --- | --- | --- |
| Primal-Dual DDPG | off-policy | Dense | discrete, continuous |
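As background on the primal-dual approach to a CMDP (constrained MDP), the sketch below shows the generic Lagrangian scheme: the policy maximizes reward minus a multiplier times cost, while the multiplier grows whenever the average cost exceeds its limit. The function names `dual_update` and `lagrangian_reward`, the learning rate, and the cost limit are assumptions for illustration only; the implementation in this repo may handle the cost signal differently (for example, with a separate cost critic).

```python
def dual_update(lam, avg_cost, cost_limit, lr=0.01):
    """Projected gradient ascent on the Lagrange multiplier.

    lam grows while the constraint avg_cost <= cost_limit is violated
    and shrinks otherwise; it is clipped so it never becomes negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def lagrangian_reward(reward, cost, lam):
    """Penalized reward that the primal (actor-critic) step would maximize."""
    return reward - lam * cost


# Hypothetical per-episode average costs, for illustration only.
lam = 0.0
for avg_cost in [3.0, 2.5, 1.5, 0.8]:
    lam = dual_update(lam, avg_cost, cost_limit=1.0, lr=0.1)
```

Alternating a few DDPG updates on the penalized reward with one dual update is a common way to organize such a training loop.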
The models used to generate the demos are included in the repo, along with Q-value, reward, and/or loss graphs.
DQN Basic, time step = 4, 500 reward
DQN LSTM, time step = 4, 500 reward
DDPG Basic, 500 reward
DDPG LSTM, time step = 5, 500 reward
AE-DDPG Basic, 500 reward
PPO Basic, 500 reward