How to use trained A2C agent #326

Open
ahsteven opened this issue Mar 12, 2018 · 0 comments

ahsteven commented Mar 12, 2018

Note: There is a small bug on line 7 of run_atari.py for a2c. It reads:
from baselines.ppo2.policies import CnnPolicy, LstmPolicy, LnLstmPolicy
but should be:
from baselines.a2c.policies import CnnPolicy, LstmPolicy, LnLstmPolicy

I trained the A2C agent for several hours on Breakout, but I cannot get good results when I try to play with the trained agent.

The CNN is set up to take a batch of observations as input, but when playing the game I want to feed it a single observation. So I changed ob_shape in a2c/policies from
ob_shape = (nbatch, nh, nw, nc)
to
ob_shape = (None, nh, nw, nc)

However, I am afraid this may cause a problem with the weights being reused in train_model.

I wrote the following function to play the game, but I have never scored more than one point on Breakout.

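import gym
import numpy as np
from baselines.common.atari_wrappers import wrap_deepmind

def play_episode(env_name, model, seed):
    env = gym.make(env_name)
    env.seed(seed)  # the seed argument was previously unused
    env = wrap_deepmind(env, frame_stack=True, scale=True)
    obs, states, done = env.reset(), None, False
    episode_rew = 0
    while not done:
        # Add a batch dimension of 1 to the stacked frames.
        obs = np.reshape(obs, (1, 84, 84, 4))
        # states is used by the LSTM policies only.
        action, value, states, _ = model.step(obs, states, done)
        # model.step returns a batch of actions; take the single entry.
        obs, rew, done, _ = env.step(action[0])
        episode_rew += rew
    env.close()  # was `env.close`, which never called the method
    return episode_rew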
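
To check the control flow of the loop above without a trained model or an Atari install, here is a minimal sketch that runs the same loop against stand-ins. StubEnv and StubModel are hypothetical classes invented for illustration and are not part of baselines or gym. They only mimic the reset/step/close and step(obs, states, dones) call shapes used in the function above, so this exercises the batching and cleanup logic, not the agent's performance.

```python
class StubEnv:
    """Hypothetical stand-in for a wrapped env with gym-style reset/step/close."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0
        self.closed = False

    def reset(self):
        self.t = 0
        return [0.0]  # placeholder observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_len
        return [float(self.t)], reward, done, {}

    def close(self):
        self.closed = True


class StubModel:
    """Hypothetical stand-in that returns one action and value per batch row."""

    def step(self, obs_batch, states, dones):
        actions = [1 for _ in obs_batch]
        values = [0.0 for _ in obs_batch]
        return actions, values, states, None


def play_episode(env, model):
    obs, states, done = env.reset(), None, False
    episode_rew = 0.0
    try:
        while not done:
            # Wrap the single observation and done flag in a batch of size 1.
            actions, _values, states, _ = model.step([obs], states, [done])
            # Unwrap the batched action before passing it to the environment.
            obs, rew, done, _ = env.step(actions[0])
            episode_rew += rew
    finally:
        # Call the method; a bare `env.close` only references it.
        env.close()
    return episode_rew


env = StubEnv(episode_len=5)
total = play_episode(env, StubModel())
print(total, env.closed)
```

If the loop is wired correctly, the stub episode ends after episode_len steps and the environment is marked closed. Swapping in the real environment and model is then a matter of replacing the two stubs.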

What is the proper way to load the trained agent and play the game to reproduce the reported scores?
I used step_model to play the game. Is this correct, or should I be using train_model?

Also, how are the weights of the step_model updated?

kosii pushed a commit to kosii/baselines that referenced this issue May 26, 2019
* Issue openai#317 [feature request] filter_size can be a array instead of one value

* Issues openai#326 [Feature] filter_size can be a array

* Issue openai#326 [Feature] filter_size can be a array

* Issues openai#326 [Feature] filter_size can be a array: Line too long

* Update changelog.rst

* Issue openai#326 [Feature] filter_size can be a array, the added test code is test_a2c_conv.py

* Issues openai#326 [Feature] filter_size can be a array, remove the unused variables

* Issues openai#326 [Feature] filter_size can be a array, remove the unused library

* Issue openai#326, [Feature] filter_size can be a array. Clean up the test code