Improve Utilization of GPU #10
Well, I believe you could swap out some of the […] Just some ideas!
I have a 32-thread dual-Xeon machine with dual 1080 Tis. On the CPU, only one thread is at 100%, two others are around 50%, two more are under 20%, and the rest are under 10%, so the CPU load is very low. The GPUs are even worse: one is constantly at 0% and the other peaks at 1%. :-)
Same here. How can we fully utilize multiple GPUs to train the agents?
@archenroot have you tried increasing the parallelism? Increase n_jobs in optimize.py to a value equal to the number of cores you have, and it should increase utilization.
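A minimal sketch of that knob, assuming optimize.py drives an optuna study (the objective body here is a placeholder, not the repo's actual code):

```python
import multiprocessing

import optuna

def objective(trial):
    # Placeholder: the real objective builds the env, trains PPO2,
    # and returns the metric optuna should optimize.
    return 0.0

study = optuna.create_study()
# n_jobs controls how many trials run concurrently; matching it to the
# core count is the suggestion above.
study.optimize(objective, n_trials=100, n_jobs=multiprocessing.cpu_count())
```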
@TalhaAsmal - well, it doesn't work, at least not from n_jobs=10; I even tried 64 :-) As reported in the other issue, there is a problem with concurrent access to SQLite. I replaced SQLite with a Postgres engine, but optuna doesn't currently support custom connection parameters (pool_size, etc.; there is a PR for this waiting to be merged), so SQLAlchemy fails on the default config with Postgres as well. Once optuna merges that PR we can achieve this heavy parallelism, but it doesn't work as of now... After 2 days, around 400 trials have finished with 4 threads :D (a brutal race), and a lot of those are PRUNED, marked at an early stage as unpromising... We'll see what config I end up with... I think another 2-3 days...
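For reference, roughly what that pending PR should enable; the DSN and pool numbers are placeholders, and the engine_kwargs parameter is the unreleased piece being discussed, so this will not run on the optuna version used in this thread:

```python
import optuna

# Hypothetical once the PR lands: pass custom SQLAlchemy engine
# parameters (pool_size etc.) through to the Postgres backend.
storage = optuna.storages.RDBStorage(
    url="postgresql://user:password@localhost/optuna",  # placeholder DSN
    engine_kwargs={"pool_size": 20, "max_overflow": 10},
)
study = optuna.create_study(
    storage=storage, study_name="ppo2_search", load_if_exists=True
)
```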
@TalhaAsmal - I will try this evening to install the custom optuna branch with the requested fix for custom driver params with Postgres.
@archenroot did you manage to try it with the custom optuna branch? I also ran into concurrency issues with SQLite, but since I have a very old CPU (a 2600K) I just reduced the parallelism to 2, with obvious negative consequences for speed.
Here's why: in my own experiments with Atari games, using SubprocVecEnv improved performance by 200-250%.
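For anyone trying this, a self-contained sketch of the swap, assuming stable-baselines 2.x (the env name and worker count are arbitrary examples):

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make("CartPole-v1")  # placeholder env

if __name__ == "__main__":
    # SubprocVecEnv runs each env in its own process, unlike DummyVecEnv,
    # so environment stepping no longer serializes on a single core.
    env = SubprocVecEnv([make_env for _ in range(8)])
    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100_000)
```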
@dennywangtenk - sure, I used SubprocVecEnv, but did you test it yourself before writing here? Some other things break, and SQLite is not a storage backend built for concurrent access; other databases don't work yet either, as optuna doesn't support custom config (there is a PR, but it's not released; you can build it yourself...).
@archenroot, I got a similar error. It seems optuna and baselines' SubprocVecEnv both use multiprocessing, and they conflict. Set n_jobs = 1 to force optuna to run sequentially.
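Concretely, that pattern would look something like the following sketch, reusing make_env from the sketch above; evaluate is a hypothetical helper and the learning-rate range is just an example:

```python
import optuna

from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def objective(trial):
    lr = trial.suggest_loguniform("learning_rate", 1e-5, 1e-2)
    # Parallelism comes from the env workers inside each trial...
    env = SubprocVecEnv([make_env for _ in range(8)])
    model = PPO2("MlpPolicy", env, learning_rate=lr)
    model.learn(total_timesteps=50_000)
    score = evaluate(model, env)  # hypothetical evaluation helper
    env.close()
    return score

study = optuna.create_study(direction="maximize")
# ...while optuna itself runs trials one at a time, avoiding the clash
# between its multiprocessing and SubprocVecEnv's.
study.optimize(objective, n_trials=50, n_jobs=1)
```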
@dennywangtenk does it actually make sense to set n_jobs to 1 and switch to SubprocVecEnv?
I am now at […]
This library achieves very high success rates, though it takes a very long time to optimize and train. This could be improved if we could figure out a way to utilize the GPU more during optimization/training, so the CPU can be less of a bottleneck. Currently, the CPU is being used for most of the intermediate environment calculations, while the GPU is used within the PPO2 algorithm during policy optimization.
I am currently optimizing/training on the following hardware: […]
The bottleneck on my system is definitely the CPU, which is surprising as this library takes advantage of the multi-threaded benefits of the Threadripper, and my GPU is staying around 1-10% utilization. I have some ideas on how this could be improved, but would like to start a conversation.
Increase the size of the policy network (e.g. add hidden layers or increase the number of nodes in each layer), so each GPU call does more work; a sketch follows this list.
Do less work in each training loop, so the GPU loop is called more often.
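A minimal sketch of the first idea, assuming stable-baselines' PPO2 (the layer sizes and env are arbitrary examples, not a recommended configuration):

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])  # placeholder env
# Wider/deeper separate networks for the policy (pi) and value (vf)
# heads, so each forward/backward pass pushes more work onto the GPU.
policy_kwargs = dict(net_arch=[dict(pi=[256, 256, 256], vf=[256, 256, 256])])
model = PPO2("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=100_000)
```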
I would love to hear what you guys think. Any ideas or knowledge is welcome to be shared here.