
Improving code quality #400

Open

david-woelfle opened this issue May 9, 2018 · 6 comments

Comments

@david-woelfle
Hi everybody,

first of all, many thanks for putting this repo online. When I found it, I thought something like: "High quality implementations of RL algorithms? That is pretty cool." However, after taking a more detailed look into the repository, I was honestly quite disappointed: what I found does not appear to be anything close to high quality code.

Here are some examples of what I mean:

Looking at baselines/baselines/a2c/a2c.py, there are nearly no comments explaining what is done and why. I get that large parts of the code can be understood by reading the corresponding paper, and that is fair enough for me. Nevertheless, not writing any docstrings at all seems very poor to me. See e.g. lines 18-31 of a2c.py:

def __init__(self, policy, ob_space, ac_space, nenvs, nsteps,
             ent_coef=0.01, vf_coef=0.5, max_grad_norm=0.5, lr=7e-4,
             alpha=0.99, epsilon=1e-5, total_timesteps=int(80e6), lrschedule='linear'):

    sess = tf_util.make_session()
    nbatch = nenvs*nsteps

    A = tf.placeholder(tf.int32, [nbatch])
    ADV = tf.placeholder(tf.float32, [nbatch])
    R = tf.placeholder(tf.float32, [nbatch])
    LR = tf.placeholder(tf.float32, [])

    step_model = policy(sess, ob_space, ac_space, nenvs, 1, reuse=False)
    train_model = policy(sess, ob_space, ac_space, nenvs*nsteps, nsteps, reuse=True)

While reading this code I ask myself many questions, e.g.:

  • What is ob_space, and what kind of object is expected for it?
  • Why were these particular default values chosen?
  • Why is reuse=False for step_model but reuse=True for train_model?
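For illustration, a docstring in the following style would answer most of these questions. To be clear, this is my own sketch, and the parameter descriptions are guesses from reading a2c.py rather than authoritative documentation:

# Hypothetical sketch (mine, not from the repo); parameter descriptions
# are inferred from how a2c.py uses each argument.
class Model:
    def __init__(self, policy, ob_space, ac_space, nenvs, nsteps,
                 ent_coef=0.01, vf_coef=0.5, max_grad_norm=0.5, lr=7e-4,
                 alpha=0.99, epsilon=1e-5, total_timesteps=int(80e6),
                 lrschedule='linear'):
        """Build the A2C step and train models over shared weights.

        Args:
            policy: callable that builds a policy network; called once with
                batch size nenvs for acting and once with nenvs*nsteps for
                training (hence the two different reuse flags).
            ob_space: observation space of the environment (a gym.Space).
            ac_space: action space of the environment (a gym.Space).
            nenvs: number of environments run in parallel.
            nsteps: steps per environment per parameter update.
            ent_coef: weight of the entropy bonus in the total loss.
            vf_coef: weight of the value-function loss in the total loss.
            max_grad_norm: global-norm threshold for gradient clipping.
            lr: initial learning rate for RMSProp.
            alpha: RMSProp decay.
            epsilon: RMSProp numerical-stability constant.
            total_timesteps: horizon used to anneal lr when
                lrschedule='linear'.
            lrschedule: learning-rate schedule, 'linear' or 'constant'.
        """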

Going on, some more points I would expect from high quality code:

  • At least basic test coverage. Currently there are no tests implemented, apart from (gratefully acknowledged) some parts of the common module; even a small smoke test like the sketch after this list would help.
  • That common code conventions are obeyed; for Python that is usually PEP 8. The a2c.py file, for example, has 32 PEP 8 violations in 160 lines of code.
  • That there is some documentation on the structure of the repository. E.g.: How is the common folder organised? Which files belong in there, and what are the existing files in common for? What can I expect from the mpi_moments.py file (there are also no docstrings in that file where I could find an explanation)?
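On the test coverage point, even something this small would already catch import and shape regressions. This is only a sketch of the kind of test I mean; it assumes gym is installed and uses DummyVecEnv as I understand its interface:

import gym
import numpy as np

from baselines.common.vec_env.dummy_vec_env import DummyVecEnv


def test_dummy_vec_env_shapes():
    # Build two in-process CartPole copies behind the VecEnv interface.
    nenvs = 2
    env = DummyVecEnv([lambda: gym.make('CartPole-v1') for _ in range(nenvs)])

    obs = env.reset()
    assert obs.shape == (nenvs,) + env.observation_space.shape

    actions = np.array([env.action_space.sample() for _ in range(nenvs)])
    obs, rewards, dones, _infos = env.step(actions)
    assert obs.shape == (nenvs,) + env.observation_space.shape
    assert rewards.shape == (nenvs,)
    assert dones.shape == (nenvs,)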

That's a lot of criticism, I know. So why should you care at all? Well, the idea of this repo is actually brilliant, and I am sure it works fine as long as I don't have to touch anything. But as soon as I want to change something, I would rather throw your code away and rewrite it myself, because that is less effort than understanding poorly documented code.

Finally, to leave you with a question: is there any chance that the above points will be improved in the future?

Kind regards,

David

@pzhokhov
Collaborator

pzhokhov commented May 9, 2018

Hi David! Thank you for pointing that out! All your concerns are perfectly valid; in fact, we have tripped over many of these problems internally as well. Recently we started an effort to improve the code quality of baselines, specifically test and benchmark coverage and pep8 compliance (albeit with somewhat relaxed standards, at least at first, around line length, star imports and maybe variable names, because researchers love to name variables "Q" or "Vf" :) ). Code comments and docs will follow, with a bit of a fuzzy timeline though. If you have any particular algorithm you'd like to receive high-priority love, please let us know; also, we are open to pull requests with either new algorithms or improvements to the existing ones.
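For reference, the relaxed standard described above could be encoded in a flake8 config roughly like this; the exact codes and the 120-character limit are my illustration, not a project decision:

[flake8]
# Allow longer lines instead of the strict 79-character default.
max-line-length = 120
# E501: line too long; F403/F405: star imports and the names they hide;
# E741: the ambiguous single-letter names l, O and I.
ignore = E501, F403, F405, E741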

@david-woelfle
Author

Hi @pzhokhov! Thank you for your quick reply, and good to know you are working on this. I think the concept of this repository is so great that it's worth spending the effort on it. I will see if I can spend some time within my next projects to add some documentation too. Have you chosen a docstring convention (i.e. layout) yet?

One more thing: is there a common API layout? Like in scikit-learn, where e.g. every classifier/regressor provides a fit and a predict method.

@jacky-liang

Adding onto this: the Gym library provides a nice and clean interface for interacting with RL environments. It would be nice if the same consistency could be applied to the algorithms and policies in baselines.

For example, it seems like many algorithms implement their own versions of neural nets and policies, when a modular approach could be taken, where the implementation details of policies are abstracted away from the training algorithms, and vice versa.

Ideally there would just be one train.py that takes a config file specifying the policy architecture and the algorithm, and loads the corresponding components.
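Something in this spirit, as a purely hypothetical sketch; the config keys, the ALGOS registry and the learn() contract here are all made up for illustration:

import argparse
import json

import gym


def train_a2c(env, policy, **hyperparams):
    """Stand-in for an algorithm's learn() entry point."""
    print('would train an', policy, 'policy on', env, 'with', hyperparams)


# Registry mapping config names to learn() functions; real code would
# register every algorithm in the repo here.
ALGOS = {'a2c': train_a2c}


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('config', help='path to a JSON experiment config')
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = json.load(f)
    # e.g. {"algo": "a2c", "env": "CartPole-v1", "policy": "mlp",
    #       "hyperparams": {"lr": 7e-4}}

    env = gym.make(cfg['env'])
    learn = ALGOS[cfg['algo']]
    learn(env, cfg['policy'], **cfg.get('hyperparams', {}))


if __name__ == '__main__':
    main()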

It would also be nice to create a standard policy saving/loading interface, so it would be possible to load and collect data from policies separately from the learning algorithms that were used for training.

@pzhokhov
Collaborator

@david-woelfle API layout: there is no solid common API just yet (that's another part of the code quality effort). So far the most stable part is that each algorithm provides a "learn" method that takes the environment, the policy, and algorithm hyperparameters, and returns the trained model.
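Spelled out, that contract is roughly the following; parameter names vary between algorithms, so treat this as a sketch rather than a stable signature:

def learn(policy, env, seed, total_timesteps, **algo_hyperparams):
    """Train `policy` on the (usually vectorized) `env`.

    Returns the trained model, which is expected to expose at least a
    step()-like method for generating actions; beyond that, nothing is
    standardized yet.
    """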
Docstrings: I personally like multi-line ones delimited by triple single quotes, with a description of parameters and return values for "public" methods, but again, no solid convention exists across the repo at the moment.
@jacky-liang both are good points. A bare-bones version of the saving/loading interface is in internal review now, so it should be out soon; a modular API for training, algorithms, policies and environments is something we are also currently working on.
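For the saving/loading part, even a bare-bones version along these lines would cover the use case of loading a policy without its trainer; the class name and the pickle format are assumptions for illustration, not the interface under review:

import pickle


class TrainedModel:
    """Hypothetical minimal container decoupling a policy from its trainer."""

    def __init__(self, params):
        self.params = params  # e.g. network weights as numpy arrays

    def save(self, path):
        with open(path, 'wb') as f:
            pickle.dump(self.params, f)

    @classmethod
    def load(cls, path):
        with open(path, 'rb') as f:
            return cls(pickle.load(f))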

@mt-

mt- commented Jul 24, 2018

Hi all,

Not to pile on here, but I have a few more points I think are worth highlighting.

  • The code bases for ppo1 and ppo2 are very different. This is true both in terms of arguments (batching and such) and in how they deal with the old policy (old_pi), saving/restoring, sampling vs. taking the mean, and lots more details I'm happy to expand on.
  • The actual implementations of ppo1 and ppo2 also differ with no explanation. Specifically, ppo1 no longer applies clipping to the value function; it used to in an older implementation, and ppo2 still does (see the sketch after this list). There is no documentation as to why this change was made.
  • The vectorization library is great, but it could really use some documentation. I'm still not 100% clear on why the dummy implementation supports multiple environments (I'm using SubprocVecEnv to parallelize during training and DummyVecEnv for evaluation, which I believe is correct; also sketched below).
  • Backwards compatibility seems like a low priority (looking at you, tf_util.py).
  • The save system has already been highlighted, but I should add that using logger.get_dir() probably isn't the best choice for deciding where to save the model.
  • Single letter variable names are everywhere, as are acronyms.
  • People seem to hate spaces....
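On the value-function point: for readers comparing the two, ppo2's clipped value loss looks roughly like the snippet below (paraphrased TF1-style graph code, not a verbatim quote), while ppo1's current loss is just the unclipped squared error:

import tensorflow as tf

# Paraphrase of ppo2-style value clipping: the new prediction is kept
# within `cliprange` of the prediction made at rollout time, and the
# pessimistic (max) of the two squared errors is minimized.
nbatch, cliprange = 256, 0.2
vpred = tf.placeholder(tf.float32, [nbatch])     # new value predictions
OLDVPRED = tf.placeholder(tf.float32, [nbatch])  # predictions at rollout time
R = tf.placeholder(tf.float32, [nbatch])         # empirical returns

vpredclipped = OLDVPRED + tf.clip_by_value(vpred - OLDVPRED,
                                           -cliprange, cliprange)
vf_losses1 = tf.square(vpred - R)
vf_losses2 = tf.square(vpredclipped - R)
vf_loss = 0.5 * tf.reduce_mean(tf.maximum(vf_losses1, vf_losses2))

And on the vectorization point, the training/evaluation split mentioned above would look like this. DummyVecEnv accepts several env functions too; it just steps them sequentially in one process, which is why it mirrors SubprocVecEnv's interface:

import gym

from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv


def make_env():
    return gym.make('CartPole-v1')


# Eight copies stepped in worker processes for training throughput ...
train_env = SubprocVecEnv([make_env for _ in range(8)])
# ... and one in-process copy for lightweight evaluation.
eval_env = DummyVecEnv([make_env])

obs = train_env.reset()  # batched: shape (8,) + observation_space.shape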

While I massively appreciate the code, cleaning things up a bit would be awesome.

@jperl

jperl commented Oct 15, 2018

The stable-baselines fork has done a lot of heavy lifting to refactor these baselines into a common API and to add documentation. Would it be possible to merge their efforts upstream?

Edit: this discussion is happening here 🙏.

AdamGleave pushed a commit to HumanCompatibleAI/baselines that referenced this issue Jul 24, 2019
* Add 'terminal_observation' info key to vecenvs

Fixes openai#400

* Review changes to terminal_observation fix

* Fix typo