TRPO and PPO Models don't train #84
Hi @yashviagrawal
@antoine-galataud My expectation is for the RL agent to learn properly rather than stabilising almost immediately. I have attached an image of the reward curve for the PPO model trained for 200 epochs, where the RL agent has been unable to achieve the goal:
@yashviagrawal thanks for sharing some results. Did you try to run an experiment from master sources (using TRPO, and without any changes)?
@antoine-galataud
@antoine-galataud @takaomoriyama
Hello, I've been working on this project for a long time. I trained the model with the TRPO policy for over 2000 epochs, but the reward stabilised early on. I then switched to the PPO policy, which showed good progress for the first 250 epochs, with the reward improving from roughly -200,000 (-2 lakh) to -20,000; however, despite running it for another 300 epochs, the reward did not improve further and plateaued.
Could you please help me figure out why that is?
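Since the question hinges on whether the reward curve has genuinely plateaued or is just improving too slowly to see through the noise, here is a minimal sketch (not part of this repository) for smoothing the per-epoch mean rewards and measuring the recent improvement. The file name `rewards.csv`, the window sizes, and the log format are assumptions; adapt them to however your training logs record the mean episode reward.

```python
# Hypothetical helper for judging a reward plateau from logged per-epoch
# mean rewards. Not taken from the project; a sketch under assumed log format.
import numpy as np

def moving_average(values, window=20):
    """Smooth noisy per-epoch rewards with a simple moving average."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def relative_improvement(values, window=20, span=50):
    """Relative change of the smoothed reward over the last `span` epochs."""
    smoothed = moving_average(values, window)
    if len(smoothed) <= span:
        return None  # not enough data to judge yet
    old, new = smoothed[-span], smoothed[-1]
    return (new - old) / (abs(old) + 1e-8)

if __name__ == "__main__":
    # Assumed log file: one mean episode reward per line, one line per epoch.
    rewards = np.loadtxt("rewards.csv")
    improvement = relative_improvement(rewards)
    if improvement is None:
        print("not enough epochs to judge a plateau yet")
    else:
        print(f"relative improvement over the last 50 epochs: {improvement:.3%}")
```

A curve that flattens on this smoothed view after a large early gain (e.g. -200,000 to -20,000) may simply mean the policy has reached a local optimum for the current hyperparameters, which is worth distinguishing from a curve that is still creeping up slowly.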