Trust-Region-Policy-Optimization

My attempt at a TRPO implementation in PyTorch. :)

The implementation is inspired by the assignments from UC Berkeley's Deep RL Bootcamp, the TRPO implementations by ikostrikov and mjacar, and the original implementation by John Schulman.

python main.py

All parameters are defined in trpo_agent.py.
