
TRPO and PPO Models don't train #84

Open
yashviagrawal opened this issue Mar 29, 2022 · 4 comments

@yashviagrawal

@antoine-galataud @takaomoriyama
Hello, I've been working on this project for a long time. I trained the model with the TRPO policy for over 2000 epochs, but the reward stabilised early on and never improved after that. I then switched to the PPO policy, which showed great progress for the first 250 epochs: the reward rose from about -200,000 to -20,000. But in spite of running it for another 300 epochs after that, the reward didn't improve any further and stabilised again.
Could you please help me figure out why that is?
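
For reference, a quick way to check whether a reward curve has genuinely flattened is to plot a smoothed version of the per-epoch mean reward alongside the raw values. A minimal sketch, not taken from this repo: it assumes the rewards have been exported to a plain-text file with one value per epoch (the file name is a placeholder).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical log: one mean-episode-reward value per training epoch.
rewards = np.loadtxt("episode_rewards.txt")

# Moving average over a 20-epoch window to suppress per-epoch noise.
window = 20
smoothed = np.convolve(rewards, np.ones(window) / window, mode="valid")

plt.plot(rewards, alpha=0.3, label="raw mean episode reward")
plt.plot(np.arange(window - 1, len(rewards)), smoothed,
         label=f"{window}-epoch moving average")
plt.xlabel("epoch")
plt.ylabel("mean episode reward")
plt.legend()
plt.show()
```

If the smoothed curve is still creeping upward, training has not actually converged yet; if it stays flat for hundreds of epochs, the plateau is real.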

@antoine-galataud
Collaborator

Hi @yashviagrawal
It's hard to tell what's going wrong without more details. At first sight it looks like the model is converging, since the mean episode reward increases and then stabilizes. What are your expectations?

@yashviagrawal
Author

@antoine-galataud
In the graph shown in the repository, the RL agent reaches its goal of optimising power consumption in just 300 epochs.
But despite my running the code for 2000 epochs with the TRPO policy, it still didn't optimise it, and the same happened with the PPO policy.

My expectation is for the RL agent to keep learning properly instead of stabilising almost immediately.

I have attached a graph of the PPO model trained for 200 epochs, where the RL agent has been unable to achieve the goal:
[attached image: PPO training reward curve, 200 epochs]

@antoine-galataud
Collaborator

@yashviagrawal thanks for sharing some results. Did you try to run an experiment from master sources (using TRPO, and without any changes)?

@yashviagrawal
Author

@antoine-galataud
Hello, yes, I did try to run the original code with the TRPO policy.
I also wanted to ask: how can I find out what the action space is?
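
For reference: assuming the EnergyPlus environment follows the standard Gym interface, you can inspect the action space directly on the env object. A minimal sketch; the registered environment id below is an assumption, so adapt the construction to however this repo's training script builds the env.

```python
import gym

# Hypothetical id: build the env the same way the repo's training script does.
env = gym.make("EnergyPlus-v0")

print(env.action_space)       # e.g. Box(...) for continuous control
print(env.observation_space)  # the state vector the agent observes

# For a continuous (Box) action space, the per-dimension bounds are:
if isinstance(env.action_space, gym.spaces.Box):
    print(env.action_space.low)
    print(env.action_space.high)
```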
