EOFError with multiprocessing #16

yding5 · 2019-07-12T01:28:58Z

Thanks for your great work. I find this environment very useful for our research. When I tried the training example, I got an EOFError whenever num-processes > 1. The error happens when training finishes and seems to be related to multiprocessing. The same error also happens on a MacBook.

I found some similar problems by searching (listed below), but none of them solved the problem. Could you please take a look? Thanks.

duckietown/gym-duckietown#75
openai/baselines#640

Linux

Environment info:
Ubuntu 18.04.2 LTS
python 3.7.3
gym 0.13.0
pyglet 1.2.4

Error message:
(tensorflow_yukun) akash@a1:~/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr$ python main.py --algo ppo --num-frames 4000 --num-processes 2 --num-steps 80 --lr 0.00005 --env-name MiniWorld-Hallway-v0
Falling back to num_samples=1
Falling back to num_samples=1
Falling back to num_samples=1
Falling back to num_samples=1
Creating frame stacking wrapper
Saving model

Updates 10, num timesteps 1760, FPS 61
Last 7 training episodes: mean/median reward 0.14/0.00, min/max reward 0.00/0.96, success rate 0.14

Updates 20, num timesteps 3360, FPS 61
Last 14 training episodes: mean/median reward 0.14/0.00, min/max reward 0.00/0.98, success rate 0.14

Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/akash/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
    cmd, data = remote.recv()
  File "/home/akash/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
    cmd, data = remote.recv()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
EOFError

Mac

macOS 10.14.5 (18F132)
python 3.6.8
gym 0.13.1
pyglet 1.4.1

(miniWorld1) yukuns-mbp:pytorch-a2c-ppo-acktr [email protected]$ python main.py --algo ppo --num-frames 2000 --num-processes 2 --num-steps 80 --lr 0.00005 --env-name MiniWorld-Hallway-v0
Falling back to non-multisampled frame buffer
Falling back to num_samples=8
Falling back to non-multisampled frame buffer
Falling back to non-multisampled frame buffer
Falling back to num_samples=8
Falling back to non-multisampled frame buffer
Creating frame stacking wrapper
Saving model

Updates 0, num timesteps 160, FPS 53
Last 4 training episodes: mean/median reward 0.98/0.98, min/max reward 0.97/1.00, success rate 1.00

Updates 10, num timesteps 1760, FPS 56
Last 12 training episodes: mean/median reward 0.64/0.94, min/max reward 0.00/1.00, success rate 0.67

Process Process-2:
Process Process-1:
Traceback (most recent call last):
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/[email protected]/Documents/Code/yding5/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
    cmd, data = remote.recv()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
Traceback (most recent call last):
EOFError
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/[email protected]/Documents/Code/yding5/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
    cmd, data = remote.recv()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
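Both tracebacks end the same way: each worker in vec_env/subproc_vec_env.py is blocked in `remote.recv()` waiting for its next command, and `recv()` raises `EOFError` when the other end of the pipe has been closed with nothing left to read. That most likely happens here because the main process shuts down without telling the workers to stop. The sketch below reproduces the mechanism with a standard `multiprocessing.Pipe`. It is a simplified illustration, not the repository's code: the worker loop is a hypothetical stand-in for the one in the traceback, and a thread replaces the worker process (an assumption made so the example is self-contained and portable; the pipe raises the same exception either way).

```python
import threading
from multiprocessing import Pipe


def worker(remote, outcome):
    # Hypothetical, simplified stand-in for the worker loop in
    # vec_env/subproc_vec_env.py: block on recv(), answer, repeat.
    try:
        while True:
            cmd, data = remote.recv()
            remote.send((cmd, data))
    except EOFError:
        # recv() found the other end of the pipe closed with no pending data.
        outcome.append("EOFError")


parent_end, worker_end = Pipe()
outcome = []
t = threading.Thread(target=worker, args=(worker_end, outcome))
t.start()

parent_end.send(("step", 0))
print(parent_end.recv())  # ('step', 0)

# Training is over and the parent's end disappears without any shutdown
# command, similar to the main script simply exiting.
parent_end.close()
t.join()
worker_end.close()
print(outcome)  # ['EOFError']
```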

maximecb · 2019-07-15T14:32:20Z

Hello there!

I added a call to envs.close() in main.py. Please let me know if that makes the error message disappear.
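A minimal sketch of what such a `close()` typically does, modeled loosely on the pattern used by OpenAI Baselines' `SubprocVecEnv`: send every worker an explicit `"close"` command so it leaves its `recv()` loop on its own, wait for it to finish, and only then close the parent's ends of the pipes. `TinyVecEnv` and its `worker` are hypothetical illustrations, not MiniWorld's or the repository's classes, and threads again stand in for worker processes.

```python
import threading
from multiprocessing import Pipe


def worker(remote):
    # Hypothetical worker: echo each action back in place of env.step(action).
    while True:
        cmd, data = remote.recv()
        if cmd == "close":
            remote.close()
            break  # leave the loop cleanly instead of hitting EOFError
        remote.send(data)


class TinyVecEnv:
    """Hypothetical, thread-based stand-in for SubprocVecEnv."""

    def __init__(self, n):
        self.remotes, self.workers = [], []
        for _ in range(n):
            parent_end, worker_end = Pipe()
            t = threading.Thread(target=worker, args=(worker_end,))
            t.start()
            self.remotes.append(parent_end)
            self.workers.append(t)
        self.closed = False

    def step(self, actions):
        for remote, action in zip(self.remotes, actions):
            remote.send(("step", action))
        return [remote.recv() for remote in self.remotes]

    def close(self):
        if self.closed:
            return  # safe to call more than once
        for remote in self.remotes:
            remote.send(("close", None))
        for t in self.workers:
            t.join()  # each worker has exited its loop on its own
        for remote in self.remotes:
            remote.close()
        self.closed = True


envs = TinyVecEnv(2)
try:
    print(envs.step([1, 2]))  # [1, 2]
finally:
    envs.close()
```

Putting `close()` in a `finally` clause means the workers are shut down cleanly even if training raises partway through.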

Also, I'm curious to know what kind of research you're doing, and if there are environments/features you would like to see added to MiniWorld :)

yding5 · 2019-07-15T18:44:56Z

Hello, adding envs.close() solved the problem nicely. Thanks!

We are currently exploring some agent-based tasks with reinforcement learning and the world model. I found this environment very handy and easy to modify. One thing that might make it even better is having more objects available, or maybe some info on how to import more mesh objects. We are still at an early stage, and I will definitely post an update here when we find something important or get a paper written.

BTW, I have a minor question. I consistently get "Falling back to num_samples=1". This seems to be related to rendering. Does it cause any problems, or can I just ignore it?

maximecb · 2019-07-15T20:46:51Z

> Maybe some info on how to import more mesh objects.

Are you looking for info on where to find meshes, or just how to load them into the world? I should indeed document that. Let me know if there's anything else I can help with :)

> BTW, I have a minor question. I consistently get "Falling back to num_samples=1". This seems to be related to rendering. Does it cause any problems, or can I just ignore it?

You can mostly just ignore it. It's trying to do anti-aliasing, but that only seems to work if you're running directly on a machine with an NVIDIA GPU and not using xvfb.

yding5 · 2019-07-15T21:24:29Z

For the mesh objects, I mean some info about whether we can import new meshes and how to do it. It might be something like "put a mesh file in format X into the mesh folder, and modify line Y of the code to import it". Thanks for your help!

maximecb · 2019-07-16T15:01:53Z

There it is: https://github.com/maximecb/gym-miniworld/blob/master/docs/design.md#loading-3d-models

yding5 · 2019-07-19T14:40:18Z

Great, thanks!

maximecb added the bug Something isn't working label Jul 15, 2019

maximecb self-assigned this Jul 15, 2019

maximecb closed this as completed Jul 15, 2019

maximecb mentioned this issue Apr 12, 2020 in ERROR: "Connection reset by peer" #22 (Closed)
