Note: There is a small bug on line 7 of run_atari.py for a2c. It reads:
from baselines.ppo2.policies import CnnPolicy, LstmPolicy, LnLstmPolicy
but should be:
from baselines.a2c.policies import CnnPolicy, LstmPolicy, LnLstmPolicy
I have trained the A2C agent for several hours on Breakout, but I cannot get good results when I try to use the trained agent.
The CNN is set up to take a batch of observations, but when playing the game I only want to feed a single observation, so I changed ob_shape in a2c/policies.py from
ob_shape = (nbatch, nh, nw, nc)
to
ob_shape = (None, nh, nw, nc)
However, I am afraid this change may cause a problem with the weights being reused by the train_model.
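For what it's worth, here is a minimal sketch of how I understand the weight sharing (TF1-style graph; the layer sizes and scope name are made up, not the actual baselines policy): with a None batch dimension the same network definition accepts either a single frame or a full batch, and reuse=True on the variable scope is what makes the two heads share weights.

import tensorflow as tf

def build_pi(obs_ph, reuse):
    # toy convolutional policy head, only to illustrate variable sharing
    with tf.variable_scope("model", reuse=reuse):
        h = tf.layers.conv2d(obs_ph, 32, 8, strides=4, activation=tf.nn.relu)
        h = tf.layers.flatten(h)
        return tf.layers.dense(h, 4)  # Breakout has 4 discrete actions

step_obs = tf.placeholder(tf.float32, (None, 84, 84, 4))   # one frame at play time
train_obs = tf.placeholder(tf.float32, (None, 84, 84, 4))  # nbatch frames at train time
step_pi = build_pi(step_obs, reuse=False)
train_pi = build_pi(train_obs, reuse=True)  # same variables as step_pi

If that picture is right, using None for the batch dimension should not affect which variables are shared, since sharing is controlled by the variable scope rather than by the placeholder shape.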
I wrote the following function to play the game, but I have never scored more than one point on Breakout.
import gym
import numpy as np
from baselines.common.atari_wrappers import wrap_deepmind

def play_episode(env_name, model, seed):
    env = gym.make(env_name)
    env.seed(seed)
    env = wrap_deepmind(env, frame_stack=True, scale=True)
    obs, states, done = env.reset(), None, False
    episode_rew = 0
    while not done:
        obs = np.reshape(obs, (1, 84, 84, 4))
        # states are only used by the LSTM policies
        action, value, states, _ = model.step(obs, states, done)
        obs, rew, done, _ = env.step(action)
        episode_rew += rew
    env.close()
    return episode_rew
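For reference, this is roughly how I call it after training (assuming model is a trained A2C Model instance; the env id is just the one I trained on):

# assuming `model` is a trained A2C Model and the env id matches training
score = play_episode('BreakoutNoFrameskip-v4', model, seed=0)
print('episode reward:', score)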
What is the proper way to load and play the game to get the reported scores?
I used the step_model to play the game. Is this correct? Or should I be using the train_model?
Also, how are the weights of the step_model updated?