rewritten parallelism to BiocParallel package #104
base: master
- Parallel processing is now controlled by the user through a `BiocParallel::BiocParallelParam` object. This gives more freedom in choosing a parallel backend, and it improves performance when a computational cluster is started at the beginning and reused multiple times. The `sc3()` function benefits from this too, because it now reuses the same `BPPARAM` for each of the pipeline's subfunctions. BiocParallel is also a robust wrapper around the `parallel` and `snow` packages. It provides a unified interface to parallel backends and comes bundled with powerful logging capabilities (error tracing) and handy features such as a native progress bar.
- The vignette, `DESCRIPTION`, and `NAMESPACE` have been updated accordingly.
- Some other files changed only in formatting. This is caused by the use of the devtools package, which tries to unify the code and documentation.
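The first bullet's point about starting a backend once and reusing it can be illustrated with plain BiocParallel. This is a minimal sketch, assuming BiocParallel is installed. `run_step()` is a hypothetical helper standing in for a pipeline subfunction and is not part of SC3 or of this pull request. Only `SnowParam()`, `bpstart()`, `bplapply()`, and `bpstop()` come from BiocParallel itself.

```r
library(BiocParallel)

# Hypothetical helper standing in for one pipeline subfunction (not an SC3 function):
# it applies `f` to every element of `x` on whatever backend BPPARAM describes.
run_step <- function(x, f, BPPARAM) {
  unlist(bplapply(x, f, BPPARAM = BPPARAM))
}

# Create the backend once. SnowParam is used here only because it works on
# all platforms; another BiocParallelParam (e.g. SerialParam) could be swapped in.
param <- SnowParam(workers = 2, progressbar = FALSE)

bpstart(param)  # start the worker processes a single time

# Both steps reuse the already-running workers instead of spawning new ones.
squares <- run_step(1:10, function(v) v^2, BPPARAM = param)
roots   <- run_step(squares, sqrt, BPPARAM = param)

bpstop(param)   # shut the workers down once the whole pipeline is finished

print(sum(squares))  # [1] 385
```

With this PR, the same kind of object would be handed to `sc3()` as its `BPPARAM`, so that all subfunctions share one set of workers. Setting `progressbar = TRUE` enables the progress bar mentioned above.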
Hi Jiri, many thanks for your pull request! It all looks awesome and I will try to merge it ASAP, depending on my availability. Thanks again!
Hi Jiri, I just fetched your changes onto my laptop and tried to run the vignette in a terminal. When I run the first parallelised step, it feels super slow, gets stuck at 67%, and does not progress further:
It may be a problem with my laptop, but here is my
Do you know what the problem could be?
Hello Vladimir, I tested my changes on an HPC server (~80 cores, 1 TB RAM) without any problems, so could your laptop's performance be the cause? Even on that server with 32 workers, the whole calculation takes a considerable amount of time. Computing the distances between cells takes a lot of memory, and during parallel processing the memory usage is even higher (roughly three times as much). Could you first try sequential processing with
Just out of curiosity: if you try the original implementation of parallelism with the same number of workers, does it finish flawlessly?
Hi Jiri, many thanks for your reply! I am on vacation until Friday and will try your suggestions then. Thanks!
Hi, are there any updates on this? I will soon be publishing a package to Bioconductor, which doesn't allow remote sources (so I can't put my fork to
Very sorry! I've changed jobs and have absolutely no time to work on this right now. It is probably best not to rely on me updating it soon. I hope to get to this at some point. Very sorry again.
Hello,
I have been looking at #83 and decided to rewrite the parallelism using the BiocParallel package; the reasons are summarized in the first bullet of the description above. I have successfully tested these modifications on a relatively small dataset (1k PBMC) and also ran devtools::check(), which reported zero errors and zero warnings.