[Classic Control Problem] Training baselines branch TF2-PPO2 to solve Pendulum-v0 is extremely slow and unstable #1103

Closed
MrForExample opened this issue May 5, 2020 · 2 comments

Comments

@MrForExample

MrForExample commented May 5, 2020

Command:
python -m baselines.run --alg=ppo2 --env=Pendulum-v0 --num_timesteps=1e6 --reward_scale=0.001 --save_path=./models/Pendulm_ppo2

The hyperparameters used, from baselines/ppo2/defaults.py:

def classic_control():
    return dict(
        nsteps=128,
        nminibatches=4,
        lam=0.95,
        gamma=0.99,
        noptepochs=4,
        log_interval=1,
        ent_coef=.01,
        lr=lambda f: f * 2e-4,
        cliprange=0.1,
        num_layers=2, 
        num_hidden=64, 
        activation='relu'
    ) 
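For readers unfamiliar with the callable `lr` entry above, here is a minimal sketch of how such a schedule is commonly consumed: the training loop passes a fraction that decays from 1 toward 0, so the step size shrinks linearly over the run. The helper name `annealed_lr`, the 1-indexed update convention, and `nenvs = 1` are assumptions made for illustration, not code taken from the repository.

```python
# Hypothetical helper: linear annealing of a callable learning rate.
# The update indexing (1-based) is an assumption for this sketch.

def annealed_lr(update, nupdates, lr_fn=lambda f: f * 2e-4):
    """Return the learning rate for a given 1-indexed update."""
    # frac is 1.0 on the first update and approaches 0 on the last one.
    frac = 1.0 - (update - 1.0) / nupdates
    return lr_fn(frac)


# Values from the classic_control() defaults; nenvs = 1 is an assumption.
nsteps, nminibatches, nenvs = 128, 4, 1

# Each update collects nenvs * nsteps samples and splits them into minibatches.
minibatch_size = nenvs * nsteps // nminibatches

print("first update lr:", annealed_lr(1, 7812))
print("last update lr: ", annealed_lr(7812, 7812))
print("minibatch size: ", minibatch_size)
```

Under these assumptions each gradient step sees only 32 samples, and the learning rate is already very small well before the end of a run, which is worth keeping in mind when reading the logs.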

Example result:

Beginning of training:
-------------------------------------------
| eplenmean               | 200           |
| eprewmean               | -1.43e+03     |
| fps                     | 836           |
| loss/approxkl           | 1.0275808e-13 |
| loss/clipfrac           | 0.0           |
| loss/policy_entropy     | 1.4534587     |
| loss/policy_loss        | 1.0652002e-07 |
| loss/value_loss         | 0.006119203   |
| misc/explained_variance | -1.88         |
| misc/nupdates           | 11            |
| misc/serial_timesteps   | 1408          |
| misc/time_elapsed       | 2.96          |
| misc/total_timesteps    | 1408          |
-------------------------------------------
After 1M steps:
-------------------------------------------
| eplenmean               | 200           |
| eprewmean               | -1.43e+03     |
| fps                     | 799           |
| loss/approxkl           | 0.0           |
| loss/clipfrac           | 0.0           |
| loss/policy_entropy     | 13.628587     |
| loss/policy_loss        | -5.378388e-08 |
| loss/value_loss         | 0.0008752448  |
| misc/explained_variance | 0.266         |
| misc/nupdates           | 7812          |
| misc/serial_timesteps   | 999936        |
| misc/time_elapsed       | 1.24e+03      |
| misc/total_timesteps    | 999936        |
-------------------------------------------
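As a sanity check on the two log tables, the counters can be reproduced from the command-line arguments. This is a small sketch, assuming a single environment (consistent with `misc/serial_timesteps` equalling `misc/total_timesteps` in both tables); the function name `rollout_counters` is hypothetical.

```python
# Hypothetical helper: reproduce the update and timestep counters in the logs.

def rollout_counters(num_timesteps, nsteps, nenvs=1):
    """Return (number of updates, timesteps actually consumed)."""
    nbatch = nsteps * nenvs                    # samples gathered per update
    nupdates = int(num_timesteps) // nbatch    # whole updates that fit in the budget
    return nupdates, nupdates * nbatch


nupdates, total = rollout_counters(1e6, nsteps=128)
print(nupdates, total)

# The first table was printed at update 11, i.e. after 11 * 128 timesteps.
print(11 * 128)
```

The results match `misc/nupdates` and `misc/total_timesteps` in the final table and the 1408 timesteps in the first one, so the run completed its full budget and the flat `eprewmean` is not an artifact of early termination.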

The baselines and PPO source code was downloaded with git and left untouched. I spent quite some time adjusting the hyperparameters, but this had little effect on the result, if any.
Does anyone have an idea what could be going wrong?

@MrForExample changed the title from "[Classic Control Problem] Training baselines TF2-PPO2 to solve Pendulum-v0 extremely slow and unstable" to "[Classic Control Problem] Training baselines branch TF2-PPO2 to solve Pendulum-v0 extremely slow and unstable" on May 5, 2020
@DanielTakeshi

I would recommend using stable-baselines. This repository is not currently being maintained.

FWIW I have gotten PPO2 to work on MuJoCo (well, last time I tried many months ago).
#938

@MrForExample
Author

@DanielTakeshi Thanks for the information. I wanted a PPO implementation that uses TF2, and stable-baselines uses TF1.
But never mind: I spent 2 days writing my own PPO implementation in TF2, and it now seems to work fine!
