
TRPO and PPO Models don't train #84

Open
yashviagrawal opened this issue Mar 29, 2022 · 4 comments

@yashviagrawal

@antoine-galataud @takaomoriyama
Hello, I've been working on this project for a long time. I trained the model with the TRPO policy for over 2000 epochs, but the reward stabilised early on and never improved after that. I then switched to the PPO policy, which showed great progress for the first 250 epochs: the reward rose from about -200,000 to -20,000. But in spite of running it for another 300 epochs after that, the reward didn't improve any further and stabilised again.
Could you please help me figure out why that is?
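
For reference, a quick way to check whether a reward curve has genuinely flattened is to plot a smoothed version of the per-epoch mean reward alongside the raw values. A minimal sketch, not taken from this repo: it assumes the rewards have been exported to a plain-text file with one value per epoch (the file name is a placeholder).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical log: one mean-episode-reward value per training epoch.
rewards = np.loadtxt("episode_rewards.txt")

# Moving average over a 20-epoch window to suppress per-epoch noise.
window = 20
smoothed = np.convolve(rewards, np.ones(window) / window, mode="valid")

plt.plot(rewards, alpha=0.3, label="raw mean episode reward")
plt.plot(np.arange(window - 1, len(rewards)), smoothed,
         label=f"{window}-epoch moving average")
plt.xlabel("epoch")
plt.ylabel("mean episode reward")
plt.legend()
plt.show()
```

If the smoothed curve is still creeping upward, training has not actually converged yet; if it stays flat for hundreds of epochs, the plateau is real.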

@antoine-galataud
Collaborator

Hi @yashviagrawal
It's hard to tell what's going wrong without more details. At first sight it looks like the model is converging, since the mean episode reward increases and then stabilizes. What are your expectations?

@yashviagrawal
Author

@antoine-galataud
In the graph shown in the repository, the RL agent reaches its goal of optimising power consumption in just 300 epochs.
But despite my running the code for 2000 epochs with the TRPO policy, it still didn't optimise it, and the same happened with the PPO policy.

My expectation is for the RL agent to keep learning properly instead of stabilising almost immediately.

I have attached a graph of the PPO model trained for 200 epochs, where the RL agent has been unable to achieve the goal:
[attached image: PPO training reward curve, 200 epochs]

@antoine-galataud
Collaborator

@yashviagrawal thanks for sharing some results. Did you try to run an experiment from master sources (using TRPO, and without any changes)?

@yashviagrawal
Author

@antoine-galataud
Hello, yes, I did try to run the original code with the TRPO policy.
I also wanted to ask: how can I find out what the action space is?
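
For reference: assuming the EnergyPlus environment follows the standard Gym interface, you can inspect the action space directly on the env object. A minimal sketch; the registered environment id below is an assumption, so adapt the construction to however this repo's training script builds the env.

```python
import gym

# Hypothetical id: build the env the same way the repo's training script does.
env = gym.make("EnergyPlus-v0")

print(env.action_space)       # e.g. Box(...) for continuous control
print(env.observation_space)  # the state vector the agent observes

# For a continuous (Box) action space, the per-dimension bounds are:
if isinstance(env.action_space, gym.spaces.Box):
    print(env.action_space.low)
    print(env.action_space.high)
```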
