EOFError with multiprocessing #16

Closed
yding5 opened this issue Jul 12, 2019 · 6 comments

@yding5

yding5 commented Jul 12, 2019

Thanks for your great work. I find this environment very useful for our research. When I tried the training example, I got an EOFError whenever --num-processes is greater than 1. The error occurs when training finishes and seems to be related to multiprocessing. The same error also seems to happen on a MacBook.

I found some similar issues through search (listed below), but none of them solved the problem. Could you please take a look? Thanks.

duckietown/gym-duckietown#75
openai/baselines#640

Linux

Environment info:
Ubuntu 18.04.2 LTS
python 3.7.3
gym 0.13.0
pyglet 1.2.4

Error message:
(tensorflow_yukun) akash@a1:~/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr$ python main.py --algo ppo --num-frames 4000 --num-processes 2 --num-steps 80 --lr 0.00005 --env-name MiniWorld-Hallway-v0
Falling back to num_samples=1
Falling back to num_samples=1
Falling back to num_samples=1
Falling back to num_samples=1
Creating frame stacking wrapper
Saving model

Updates 10, num timesteps 1760, FPS 61
Last 7 training episodes: mean/median reward 0.14/0.00, min/max reward 0.00/0.96, success rate 0.14

Updates 20, num timesteps 3360, FPS 61
Last 14 training episodes: mean/median reward 0.14/0.00, min/max reward 0.00/0.98, success rate 0.14

Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/akash/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
cmd, data = remote.recv()
File "/home/akash/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
cmd, data = remote.recv()
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
EOFError

Mac

macOS 10.14.5 (18F132)
python 3.6.8
gym 0.13.1
pyglet 1.4.1

(miniWorld1) yukuns-mbp:pytorch-a2c-ppo-acktr [email protected]$ python main.py --algo ppo --num-frames 2000 --num-processes 2 --num-steps 80 --lr 0.00005 --env-name MiniWorld-Hallway-v0
Falling back to non-multisampled frame buffer
Falling back to num_samples=8
Falling back to non-multisampled frame buffer
Falling back to non-multisampled frame buffer
Falling back to num_samples=8
Falling back to non-multisampled frame buffer
Creating frame stacking wrapper
Saving model

Updates 0, num timesteps 160, FPS 53
Last 4 training episodes: mean/median reward 0.98/0.98, min/max reward 0.97/1.00, success rate 1.00

Updates 10, num timesteps 1760, FPS 56
Last 12 training episodes: mean/median reward 0.64/0.94, min/max reward 0.00/1.00, success rate 0.67

Process Process-2:
Process Process-1:
Traceback (most recent call last):
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/[email protected]/Documents/Code/yding5/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
cmd, data = remote.recv()
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
Traceback (most recent call last):
EOFError
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/[email protected]/Documents/Code/yding5/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
cmd, data = remote.recv()
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
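
One reading of the tracebacks above, consistent with the stack frames but not confirmed in this thread: each worker is blocked in remote.recv() when the main process exits, the parent's end of the pipe closes, and recv() raises EOFError. The mechanism can be sketched with the standard library alone, in a single process. The names parent_end and worker_end are illustrative and do not come from the repository.

```python
from multiprocessing import Pipe

# A duplex pipe: one end for the parent, one for the worker.
parent_end, worker_end = Pipe()

# Simulate the parent going away without sending a "close" command.
parent_end.close()

try:
    # Nothing is left to receive and the other end is closed,
    # so recv() raises EOFError instead of blocking.
    worker_end.recv()
except EOFError:
    outcome = "EOFError"
else:
    outcome = "received data"
finally:
    worker_end.close()

print(outcome)
```

A worker that should exit quietly in this situation could catch EOFError around its recv() call, but telling the workers to stop before the parent exits avoids the condition altogether.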

@maximecb
Contributor

maximecb commented Jul 15, 2019

Hello there!

I added a call to envs.close() in main.py. Please let me know if that makes the error message disappear.
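
As a rough illustration of why an explicit close helps, here is a minimal sketch of a pipe-driven worker that shuts down on a "close" command. It is not the code from vec_env/subproc_vec_env.py: TinySubprocEnv, worker, and run_demo are hypothetical names, the "step" handler is a stand-in for a real environment, and the "fork" start method is an assumption that limits the sketch to POSIX systems.

```python
import multiprocessing as mp


def worker(remote, parent_remote):
    """Serve commands from the parent until told to close."""
    parent_remote.close()  # the child must not keep the parent's end open
    try:
        while True:
            cmd, data = remote.recv()
            if cmd == "step":
                remote.send(data + 1)  # stand-in for env.step(action)
            elif cmd == "close":
                break  # clean exit: no EOFError, exit code 0
            else:
                raise NotImplementedError(cmd)
    finally:
        remote.close()


class TinySubprocEnv:
    """Hypothetical one-worker stand-in for a subprocess vector env."""

    def __init__(self):
        ctx = mp.get_context("fork")  # assumption: POSIX-only start method
        self.remote, work_remote = ctx.Pipe()
        self.process = ctx.Process(target=worker, args=(work_remote, self.remote))
        self.process.start()
        work_remote.close()  # the parent keeps only its own end

    def step(self, action):
        self.remote.send(("step", action))
        return self.remote.recv()

    def close(self):
        # Ask the worker to stop, then wait for it before closing the pipe.
        self.remote.send(("close", None))
        self.process.join()
        self.remote.close()
        return self.process.exitcode


def run_demo():
    env = TinySubprocEnv()
    result = env.step(41)
    exitcode = env.close()
    return result, exitcode
```

Calling run_demo() returns (42, 0): the worker answers one step and then exits normally because close() was sent before the parent dropped its end of the pipe.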

Also, I'm curious to know what kind of research you're doing, and if there are environments/features you would like to see added to MiniWorld :)

@maximecb added the bug label Jul 15, 2019
@maximecb self-assigned this Jul 15, 2019

@yding5
Author

yding5 commented Jul 15, 2019

Hello, adding envs.close() solves the problem nicely. Thanks!

We are currently exploring some agent-based tasks with reinforcement learning and world models. I found this environment very handy and easy to modify. One thing that might make it even better is having more objects available, or perhaps some info on how to import more mesh objects. We are still at an early stage, and I will certainly post an update here when we find something important or get a paper written.

BTW, I have a minor question here. I consistently get "Falling back to num_samples=1". This seems to be related to the rendering. Does it cause any problems, or can I just ignore it?

@maximecb
Contributor

> Maybe some info on how to import more mesh objects.

Are you looking for info on where to find meshes, or just how to load them into the world? I should indeed document that. Let me know if there's anything else I can help with :)

> BTW, I have a minor question here. I consistently get "Falling back to num_samples=1". This seems to be related to the rendering. Does it cause any problems, or can I just ignore it?

You can mostly just ignore it. The renderer is trying to enable anti-aliasing, but that only seems to work if you're running directly on a machine with an NVIDIA GPU and not using Xvfb.

@yding5
Author

yding5 commented Jul 15, 2019

For the mesh objects, I mean some info about whether we can import new meshes and how to do it. It could be something like "put a mesh file in such-and-such format into the mesh folder, then modify such-and-such line in the code to import it". Thanks for your help!

@maximecb
Contributor

@yding5
Author

yding5 commented Jul 19, 2019

Great, thanks!
