Support accelerator directive for local executor #5850
base: master
Conversation
✅ Deploy Preview for nextflow-docs-staging canceled.
modules/nextflow/src/main/groovy/nextflow/processor/LocalPollingMonitor.groovy
@bentsherman The feature implemented in this PR would really help us use all the local GPUs without having to schedule tasks on them manually. Do you know when this feature will be released? Or is it even planned?
Right now I've just put it out so that people can try it, so I encourage you to test it with a local build of this PR. In principle we do want to have this, we just haven't decided whether it should be part of …
(force-pushed from b4b321e to 069653d)
Hi @bentsherman or @pditommaso, I finally got some time to try this out. However, I was not able to compile Nextflow from source. I used the following steps:
The error I get is as follows:
I'm not really sure why I receive the 403 status code during the download. Do you have any ideas on how to fix this? I would really like to try out this feature on our local GPU machines.
Hi @bentsherman or @pditommaso, I'd be happy if you could have a look at this PR: …
(force-pushed from 07b6a01 to d1f1a8a)
Hi @bentsherman or @pditommaso, could you please also have a look at this PR: It proposes a fix to respect the gpuIDs set in …
(force-pushed from d1f1a8a to ec6a888)
(force-pushed from ec6a888 to b88d058)
@thealanjason thank you again, you actually inspired me to improve the overall approach and make it more generic. I removed the …
@pditommaso I think this PR is ready for serious consideration. Using …
modules/nextflow/src/main/groovy/nextflow/processor/LocalPollingMonitor.groovy
It's great that now NVIDIA, AMD, and HIP devices can be handled generically :)
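As an illustration of what generic handling could look like, here is a small Groovy sketch that maps each vendor to its device-visibility variable. Only `CUDA_VISIBLE_DEVICES` comes from this discussion; the AMD and HIP variable names (`ROCR_VISIBLE_DEVICES`, `HIP_VISIBLE_DEVICES`) are common conventions and may differ from what the PR actually sets:

```groovy
// Illustrative only: hypothetical vendor-to-variable mapping.
def visibleDevicesVar = [
    nvidia: 'CUDA_VISIBLE_DEVICES',   // from this discussion
    amd   : 'ROCR_VISIBLE_DEVICES',   // assumption, common ROCm convention
    hip   : 'HIP_VISIBLE_DEVICES'     // assumption, common HIP convention
]

// Example: build the environment for a task assigned devices 0 and 2
def assigned = [0, 2]
def taskEnv = [ (visibleDevicesVar.nvidia): assigned.join(',') ]
assert taskEnv == [CUDA_VISIBLE_DEVICES: '0,2']
```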
(force-pushed from b88d058 to 90d1422; Signed-off-by: Ben Sherman <[email protected]>)
(force-pushed from 90d1422 to 28bff07; Signed-off-by: Ben Sherman <[email protected]>)
Just commenting to say this would help with SO many of our cloud compute deployments.
@ECM893 do you typically use the local executor in the cloud for GPUs? If so, I'm curious what your process looks like.
Yes.
When a pipeline runs multiple GPU-enabled tasks on the same node, each task will see all GPUs and will not try to coordinate which task should use which GPU. NVIDIA provides the `CUDA_VISIBLE_DEVICES` environment variable to control which tasks can see which GPUs, and users generally have to manage this variable themselves. Some HPC schedulers can assign this variable automatically, or use cgroups to control GPU visibility at a lower level.

Nextflow should be able to manage this variable for the local executor, so that the user doesn't have to add complex pipeline logic to do the same. Running a GPU workload locally on a multi-GPU node is a common use case, so it is worth doing.
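For illustration, a task could request GPUs with the existing `accelerator` directive, and with this change the local executor would be responsible for turning that request into a `CUDA_VISIBLE_DEVICES` assignment. This is a minimal sketch based on the discussion above rather than the PR's documented syntax, so check the PR docs for the exact behaviour:

```nextflow
// Minimal sketch: request one GPU via the accelerator directive.
// With this PR, the local executor is expected to set CUDA_VISIBLE_DEVICES
// per task so that concurrent GPU tasks do not collide on the same device.
process gpuTask {
    accelerator 1

    script:
    """
    echo "Visible devices: \$CUDA_VISIBLE_DEVICES"
    nvidia-smi -L
    """
}

workflow {
    gpuTask()
}
```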
See the docs in the PR for usage.
To use with containers, you might have to add `CUDA_VISIBLE_DEVICES` to `docker.envWhitelist`. It is not clear whether `CUDA_VISIBLE_DEVICES` works with containers or if you have to set `NVIDIA_VISIBLE_DEVICES` instead. If you need to pass `--gpus` to the docker command in order to use the GPUs at all, that can be set in `docker.runOptions`.
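As a rough example, the container settings mentioned above might go into `nextflow.config` like this; whether the whitelist, `NVIDIA_VISIBLE_DEVICES`, or `--gpus` is actually needed depends on your container runtime:

```groovy
// Illustrative nextflow.config for Docker-based GPU tasks.
// envWhitelist passes the host-side CUDA_VISIBLE_DEVICES into the container;
// runOptions exposes the GPUs to Docker. Adjust to your runtime as needed.
docker {
    enabled      = true
    envWhitelist = 'CUDA_VISIBLE_DEVICES'
    runOptions   = '--gpus all'
}
```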
See also: #5570