Saving and restoring DDPG agent #162

Open

Sumsamkhan opened this issue Oct 9, 2017 · 26 comments

@Sumsamkhan

Can someone please tell me how to save and load a model in the DDPG implementation?

@watts4speed

I have the same issue

@xmanatee

Same here :(

@hfurkanbozkurt

Hey, you can use tf.train.Saver as described here: https://www.tensorflow.org/programmers_guide/saved_model
Or you can return the agent from the train function. Keep in mind that before you return, you need to finish the episode; otherwise you cannot use the environment directly afterwards.
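
For reference, a minimal sketch of the tf.train.Saver pattern that guide describes (TF1 API; the checkpoint path is only a placeholder):

import tensorflow as tf

# Build the graph first (actor/critic networks, etc.), then create the saver.
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... train ...
    saver.save(sess, '/tmp/ddpg_model/model.ckpt')     # write a checkpoint

# Later, in a new process that has rebuilt the *same* graph:
with tf.Session() as sess:
    saver.restore(sess, '/tmp/ddpg_model/model.ckpt')  # restores all saved variables
    # ... run the policy ...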

@haudren

haudren commented May 29, 2018

To achieve this I had to modify the code to actually use the provided tf.train.Saver as follows:

diff --git a/baselines/ddpg/training.py b/baselines/ddpg/training.py
index 74a9b8f..103010d 100644
--- a/baselines/ddpg/training.py
+++ b/baselines/ddpg/training.py
@@ -182,6 +182,10 @@ def train(env, nb_epochs, nb_epoch_cycles, render_eval, reward_scale, render, pa
             logger.dump_tabular()
             logger.info('')
             logdir = logger.get_dir()
+
+            if saver is not None:
+                saver.save(sess, os.path.join(logdir, 'checkpoint', '{}_reach.ckpt'.format(epoch_episodes)))
+
             if rank == 0 and logdir:
                 if hasattr(env, 'get_state'):
                     with open(os.path.join(logdir, 'env_state.pkl'), 'wb') as f:

This saves a checkpoint of the current TensorFlow session to /tmp/openai-xxx/checkpoint at every epoch.

@keithmgould

keithmgould commented Jun 14, 2018

I've managed to save the model and load it in a later session. However, upon loading, the returns do not jump back to where they were when the model was trained and saved. For example, untrained the returns were 40; after training they were 100. I can load the saved model, and even inspect it to verify it is the newly saved model, yet the returns are back at 40. Thoughts? Here is the code change:

diff --git a/baselines/ddpg/training.py b/baselines/ddpg/training.py
index 74a9b8f..39ec84d 100644
--- a/baselines/ddpg/training.py
+++ b/baselines/ddpg/training.py
@@ -31,7 +32,7 @@ def train(env, nb_epochs, nb_epoch_cycles, render_eval, reward_scale, render, pa

     # Set up logging stuff only for a single worker.
     if rank == 0:
-        saver = tf.train.Saver()
+        saver = tf.train.Saver(max_to_keep=100)
     else:
         saver = None

@@ -41,9 +42,20 @@ def train(env, nb_epochs, nb_epoch_cycles, render_eval, reward_scale, render, pa
     episode_rewards_history = deque(maxlen=100)
     with U.single_threaded_session() as sess:
         # Prepare everything.
+
+        if restore == True:
+            logger.info("Restoring from saved model")
+            saver.restore(sess, tf.train.latest_checkpoint('./models/'))
+        else:
+            logger.info("Starting from scratch!")
+            sess.run(tf.global_variables_initializer()) # this should happen here and not in the agent right?
+
         agent.initialize(sess)

         sess.graph.finalize()

         agent.reset()
         obs = env.reset()
         if eval_env is not None:
@@ -182,6 +194,11 @@ def train(env, nb_epochs, nb_epoch_cycles, render_eval, reward_scale, render, pa
             logger.dump_tabular()
             logger.info('')
             logdir = logger.get_dir()
+
+            logger.info('saving model...')
+            saver.save(sess, './models/my_model', global_step=epoch, write_meta_graph=False)
+            logger.info('done saving model!')
+
             if rank == 0 and logdir:
                 if hasattr(env, 'get_state'):
diff --git a/baselines/ddpg/ddpg.py b/baselines/ddpg/ddpg.py
index e2d4950..9e8a2ad 100644
--- a/baselines/ddpg/ddpg.py
+++ b/baselines/ddpg/ddpg.py
@@ -323,7 +323,7 @@ class DDPG(object):

     def initialize(self, sess):
         self.sess = sess
-        self.sess.run(tf.global_variables_initializer())
+        # self.sess.run(tf.global_variables_initializer()) // why does this happen here and not in trainer?
         self.actor_optimizer.sync()
         self.critic_optimizer.sync()
         self.sess.run(self.target_init_updates)

If it's not clear, I moved the global variable initialization into the trainer since we only want to run it on a fresh model.

@freeze888

agent.reset() re-randomizes the perturbed policy parameters (when parameter noise is used).
I made a new function agent.reset_test() and use it instead of agent.reset():

# in the train() function:
saver.restore(sess, path)

agent.initialize_test(sess)

sess.graph.finalize()

agent.reset_test()
obs = env.reset()
# if eval_env is not None:
#     eval_obs = eval_env.reset()
done = False
episode_reward = 0.
episode_step = 0
episodes = 0
t = 0

# in ddpg.py:
def reset_test(self):
    # Reset internal state after an episode is complete,
    # but skip the parameter-noise perturbation that reset() applies.
    if self.action_noise is not None:
        self.action_noise.reset()
    # if self.param_noise is not None:
    #     self.sess.run(self.perturb_policy_ops, feed_dict={
    #         self.param_noise_stddev: self.param_noise.current_stddev,
    #     })

@keithmgould

@freeze888 Not sure I follow - agent.reset() is called after every episode; I don't understand how this method (or modifying it) could be the issue.

Also, it looks like this method only adjusts action/parameter noise in the DDPG class, which should not affect a restore?

@zhehuazhou

zhehuazhou commented Jul 4, 2018

If you select parameter noise as the noise type, then the agent has an attribute agent.param_noise.current_stddev, which is not a tensor, so when you save the trained agent and restore it, this value will not be recovered.

To restore your agent exactly, you need to either add this value to the saved state or manually initialize your parameter-noise stddev to its latest value.
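
One way to carry that value across runs is to dump it next to the TensorFlow checkpoint and push it back into the agent after restoring. A rough sketch, assuming agent is the DDPG instance from training.py and logdir is the checkpoint directory (the file name is just an example):

import os
import pickle

def save_param_noise_stddev(agent, logdir):
    # current_stddev is a plain Python float on AdaptiveParamNoiseSpec,
    # so tf.train.Saver will not include it in the checkpoint.
    if agent.param_noise is not None:
        with open(os.path.join(logdir, 'param_noise_stddev.pkl'), 'wb') as f:
            pickle.dump(agent.param_noise.current_stddev, f)

def restore_param_noise_stddev(agent, logdir):
    path = os.path.join(logdir, 'param_noise_stddev.pkl')
    if agent.param_noise is not None and os.path.exists(path):
        with open(path, 'rb') as f:
            agent.param_noise.current_stddev = pickle.load(f)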

@Daniel451

I am still confused: is editing the baselines code directly really the way to do this? In my opinion, that would be a flaw in the general design. Shouldn't there at least be something like a hook that gets called every n steps, so that one could just add saving there?
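
Something along these lines would probably be enough; note that save_callback is a hypothetical argument here, not part of the current train() signature:

# Hypothetical sketch: train() accepts an optional callback and calls it once per
# epoch, so checkpointing lives in user code instead of a patched training.py.
def train(nb_epochs, run_epoch, save_callback=None):
    for epoch in range(nb_epochs):
        run_epoch(epoch)              # the existing rollout/training work
        if save_callback is not None:
            save_callback(epoch)      # e.g. saver.save(sess, path, global_step=epoch)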

@joellutz

@keithmgould I tried your solution, but somehow it didn't really work, as you mentioned. Did you (or anyone else) manage to correctly save & restore the DDPG agent? After all, training an RL algorithm without being able to load the trained model afterwards is not very useful in my opinion.

@iSaran

iSaran commented Jul 19, 2018

@joellutz I agree. I believe it should be like the HER implementation, which supports training (saving the best policy, the latest policy, etc.) and playing a policy. However, HER has its own DDPG implementation. It would probably be a good idea to standardize the contributed algorithms in this repo so that they share a similar structure and support for basic features (like saving and restoring policies).

@keithmgould

Hey all,

Have not solved the problem, so as a workaround I used a different (non-baselines) implementation of DDPG. If that's an option for you, check out Patrick Emami's implementation here. He also wrote a nice intro to the algorithm, here. I've found it easy to save and restore, and it trains just fine. Not sure what's going on with the baselines version.

@joellutz

joellutz commented Jul 20, 2018

@keithmgould Yeah, I've already tried Emami's implementation. The code is much cleaner & shorter, but the techniques differ (e.g. no parameter noise, observation & layer normalization, or critic L2 regularization). The facts that the baselines implementation worked quite quickly for my environment & has by far better action-space exploration (possibly due to parameter noise) are strong arguments for baselines. But saving & restoring the model just doesn't want to work with baselines...

I tried extending Emami's implementation with the missing techniques, but I couldn't add the adaptive parameter noise & observation normalization there, as I'm relatively new to TensorFlow, TFLearn & RL in general. Does anyone know how to do parameter noise & observation normalization in TFLearn (in Emami's implementation)? To be honest, I don't really know what's going on with the TensorFlow code in the baselines implementation.

@jramak

jramak commented Jul 21, 2018

@chow0214 I saved the noise parameter in a pkl file, and the neural-net weights with TensorFlow's Saver. However, it still does not resume exactly where it left off.

@watts4speed

watts4speed commented Jul 23, 2018

Not having the ability to save/restore is a really big issue, I agree. Is there any way we can get an indication from the OpenAI people about whether this is something they intend to fix? It's also a big deal to switch away from baselines DDPG given the other stuff they have implemented, and I don't want to do so if they are going to fix the problem.

@joellutz

joellutz commented Jul 27, 2018

I did some test runs with my environment, trying to save & restore the model. I captured the last state right before training & saving the model, as well as the action selected right after that (in the new epoch cycle). So I captured an action and the state that action was based on. Then I terminated the training and started it again, this time restoring the model from the previously saved file. At the env.reset() (right at the beginning of the episode) I injected the previously captured state. Then I captured the action selected right at the beginning, which should be exactly the same as the previously captured action, because the state each of them is based on is exactly the same.

The two actions were exactly the same when using no parameter or action noise at all, so I think the saving & restoring worked. With parameter noise, the two actions weren't exactly the same, but this may be due to the randomness of the parameter noise, which leads to a slightly different action. (Even though everything is seeded, the random generators are in a different state right at the beginning vs. right after the first epoch cycle.)

This is how I save & restore the model (all in baselines/ddpg/training.py, slightly modified compared to @keithmgould's solution):

# ...
with U.single_threaded_session() as sess:
    # Prepare everything.
    
    if restore == True:
        logger.info("Restoring from saved model")
        saver = tf.train.import_meta_graph(savingModelPath + "ddpg_test_model.meta")
        saver.restore(sess, tf.train.latest_checkpoint(savingModelPath))
    else:
        logger.info("Starting from scratch!")
        sess.run(tf.global_variables_initializer()) # this should happen here and not in the agent right?


    agent.initialize(sess)
    sess.graph.finalize()

    # ...

            # in the epoch_cycles loop (after the training of the model)
            # Saving the trained model
            if saver is not None:
                logger.info("saving the trained model")
                start_time_save = time.time()
                saver.save(sess, savingModelPath + "ddpg_test_model")
                logger.info('runtime saving: {}s'.format(time.time() - start_time_save))

            logger.info('runtime epoch-cycle {0}: {1}s'.format(cycle, time.time() - start_time_cycle))

        mpi_size = MPI.COMM_WORLD.Get_size()
        # ...

The logs of my test run without any parameter or action noise:

# (last rollout step)
selected (unscaled) action: [-0.012 0.005 0.007 -0.002]
Training the Agent
saving the trained model
runtime saving: 2.47790503502s
runtime epoch-cycle 0: 722.832417965s
selected (unscaled) action: [ 0.001 -0.001 0.009 -0.035]
# (run aborted)

# (new run with restore=True and injected state)
Restoring from saved model
INFO:tensorflow:Restoring parameters from ~/Documents/saved_models_OpenAI_gym/ddpg_test_model
selected (unscaled) action: [ 0.001 -0.001 0.009 -0.035]
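
The same check expressed in code, assuming the captured observation/action pair was pickled to a file (probe.pkl is just an example name) and that agent is the DDPG instance with its pi(obs, apply_noise=..., compute_Q=...) method:

import pickle
import numpy as np

# First run, right before aborting: pickle the observation and the chosen action.
# with open('probe.pkl', 'wb') as f:
#     pickle.dump({'obs': obs, 'action': action}, f)

# Second run, right after restoring the model:
with open('probe.pkl', 'rb') as f:
    probe = pickle.load(f)

restored_action, _ = agent.pi(probe['obs'], apply_noise=False, compute_Q=False)
print('actions match:', np.allclose(restored_action, probe['action']))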

@jramak

jramak commented Jul 27, 2018

@joellutz thanks, it seems like you included writing the meta graph in the save, compared to @keithmgould's solution. Did it make a difference when restoring from the meta graph? It says here it's not necessary: https://stackoverflow.com/questions/36195454/what-is-the-tensorflow-checkpoint-meta-file

@joellutz

@jramak I don't know whether writing the meta graph is really necessary or not; I followed this tutorial, where they include it. I haven't tried whether it still works without, as the process of capturing the state etc. is quite tedious for my environment.

@LaTinta

LaTinta commented Aug 2, 2018

@joellutz Thanks~ Your method is very effective.
If someone wants to save and load a DDPG model, these are the steps:

  1. use saver.save() to save the tf session;
  2. move sess.run(tf.global_variables_initializer()) from initialize() in DDPG to wherever you want to initialize the model parameters;
  3. use saver.restore() to load the tf session, and use initialize() to load the session into the DDPG model.

If you need to use the model in a service, I suggest using tf.InteractiveSession() instead of U.single_threaded_session().
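
A rough sketch of step 3 in that service setting; build_agent() is a placeholder for whatever code rebuilds the same DDPG graph as in training, and the checkpoint path is an example:

import tensorflow as tf

sess = tf.InteractiveSession()
agent = build_agent()            # must recreate the exact actor/critic graph used in training
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('./models/'))
agent.initialize(sess)           # with the global_variables_initializer() call removed (step 2)

# action, _ = agent.pi(obs, apply_noise=False, compute_Q=False)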

@brendenpetersen

brendenpetersen commented Sep 27, 2018

@LaTinta This method will not save the information from the RunningMeanStd object, correct? So if normalize_observations=True (the default value), the agent used at evaluation will not be the same as the one in training.

For my own use cases, I've set normalize_observations=False to avoid this issue (and because my observation space was already normalized, so it ended up hurting performance anyway). But with PPO1, for example, the RunningMeanStd object is always created (there is no setting to turn it off), so I don't think offline evaluation is possible without changing the code.

EDIT: Just saw this at the bottom of the README. Looks like they've added a TfRunningMeanStd class that saves the necessary state as part of the compute graph. You still have to change their code, but it should be trivial:

NOTE: At the moment Mujoco training uses VecNormalize wrapper for the environment which is not being saved correctly; so loading the models trained on Mujoco will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd by TfRunningMeanStd in baselines/common/vec_env/vec_normalize.py. This way, mean and std of environment normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way - hence not including it by default

EDIT 2: Looks like DDPG and PPO1 don't use VecNormalize, but rather mpi_running_mean_std.RunningMeanStd, which has no TensorFlow analog. So I still currently see no way of saving an observation-normalizing DDPG/PPO1 policy without more significant code changes.
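
For the mpi_running_mean_std case, the underlying idea of the README workaround can still be applied by hand: keep the normalization statistics in TensorFlow variables so the Saver checkpoints them. A bare-bones illustration of that idea (not the library's TfRunningMeanStd class, just the technique):

import numpy as np
import tensorflow as tf

class TfBackedRunningMeanStd(object):
    """Running mean/std kept in TF variables so tf.train.Saver checkpoints them."""

    def __init__(self, shape, epsilon=1e-4, scope='obs_rms'):
        with tf.variable_scope(scope):
            self.mean = tf.get_variable('mean', shape, initializer=tf.zeros_initializer(), trainable=False)
            self.var = tf.get_variable('var', shape, initializer=tf.ones_initializer(), trainable=False)
            self.count = tf.get_variable('count', (), initializer=tf.constant_initializer(epsilon), trainable=False)
        self._new_mean = tf.placeholder(tf.float32, shape)
        self._new_var = tf.placeholder(tf.float32, shape)
        self._new_count = tf.placeholder(tf.float32, ())
        self._assign = tf.group(tf.assign(self.mean, self._new_mean),
                                tf.assign(self.var, self._new_var),
                                tf.assign(self.count, self._new_count))

    def update(self, sess, batch):
        # Standard parallel-moments update (same math as RunningMeanStd),
        # but the result is written back into the TF variables above.
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), float(batch.shape[0])
        mean, var, count = sess.run([self.mean, self.var, self.count])
        delta = b_mean - mean
        total = count + b_count
        new_mean = mean + delta * b_count / total
        m2 = var * count + b_var * b_count + np.square(delta) * count * b_count / total
        sess.run(self._assign, {self._new_mean: new_mean,
                                self._new_var: m2 / total,
                                self._new_count: total})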

@r7vme

r7vme commented Oct 1, 2018

It seems the stable-baselines DDPG implementation provides the ability to save/load:

https://stable-baselines.readthedocs.io/en/master/modules/ddpg.html#example

@jrjbertram

FYI, as of commit 858afa8 these approaches no longer work: ddpg/training.py was removed. I'm looking at adapting them for the refactored codebase.

@jrjbertram

I have it working (I think) in the latest codebase... except that my model performs poorly after loading from a checkpoint. I used a similar approach to the ones described above. I can't tell whether that's due to an error in my code or just a bad training result. Does anyone have any ideas on how to verify that the model was loaded correctly?

Code changes in this commit (I have a copy of openai baselines embedded in my repo for now):
https://github.com/jrjbertram/jsbsim_rl/commit/6825c0c277e94d24e3ecb1450eef82dda8b5793d

For my testing I'm using the approach of running a 0-length training session followed by a --play.

https://github.com/jrjbertram/jsbsim_rl/blob/master/replay.sh

I verified that the code to reload the model is being executed.

Where I'm confused / suspicious is that during training my rollout return and return_history curves look pretty good (a nice logarithmic shape), but when I replay, my actions seem fairly random and the agent doesn't act in a way that would collect any reward.
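
One sanity check that doesn't depend on rollouts: snapshot the trainable variables right after saving, then again right after restoring, and compare them. A quick sketch (the snapshot path is just an example):

import pickle
import numpy as np
import tensorflow as tf

def dump_trainables(sess, path):
    # Snapshot every trainable variable's value so it can be compared later.
    variables = tf.trainable_variables()
    values = sess.run(variables)
    with open(path, 'wb') as f:
        pickle.dump({v.name: val for v, val in zip(variables, values)}, f)

def compare_trainables(sess, path):
    with open(path, 'rb') as f:
        snapshot = pickle.load(f)
    for v in tf.trainable_variables():
        ok = np.allclose(snapshot[v.name], sess.run(v))
        print(('ok      ' if ok else 'MISMATCH'), v.name)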

@Sohojoe

Sohojoe commented Nov 23, 2018

@jrjbertram are you normalizing your environment? If so, there is a known issue:

NOTE: At the moment Mujoco training uses VecNormalize wrapper for the environment which is not being saved correctly; so loading the models trained on Mujoco will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd by TfRunningMeanStd in baselines/common/vec_env/vec_normalize.py. This way, mean and std of environment normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way - hence not including it by default

source = https://github.com/openai/baselines#saving-loading-and-visualizing-models

@joyce-fang

joyce-fang commented Mar 29, 2019

@r7vme The stable-baselines library doesn't seem to solve the RunningMeanStd issue. It changes the normalize_observations default value to False so that RunningMeanStd is not used. When I enable normalize_observations, the model does not restore correctly.
EDIT:
Seems like it was fixed last week: hill-a@06f5843

@DanielTakeshi

Has there been an update on how to properly save DDPG models?
