Description
Name of Feature or Improvement
I'd like to replace the hardcoded nvidia.com/gpu resource with a dict (or similar) of resources. There are other accelerators, and it would be nice to specify them with an arbitrary key/value pair rather than hardcoding nvidia.com/gpu.
Description of Problem the Feature Should Solve
Hardcoding nvidia.com/gpu is suboptimal because other accelerators exist, habana.ai/gaudi to name one, and there are other potential resources and accelerators, some possibly not even public. Being able to specify these additional resources without editing the template would improve usability.
Describe the Solution You Would Like to See
I'd like to see a constructor something like:
cluster = Cluster(ClusterConfiguration(
    name='raytest',
    namespace='ray-demo',
    num_workers=2,
    min_cpus=8,
    max_cpus=8,
    min_memory=12,
    max_memory=12,
    resources={"habana.ai/gaudi": 1},
    image="quay.io/spryor/ray:synapseai-1.13-torch",
    instascale=False
))
This would simply add the keys/values from the resources argument to both the requests and limits sections of the resources block. An option to set requests and limits separately could come later, but for a first pass it's totally fine if requests == limits, since hardware devices require the two to be equal.
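As a rough illustration of the merge described above, here is a minimal sketch. The helper name merge_extended_resources and the shape of the template dict are assumptions made for this example, not the SDK's actual code; it only shows the same key/value pairs being copied into both requests and limits.

```python
from copy import deepcopy


def merge_extended_resources(container_resources, resources=None):
    """Return a copy of a container's resources block with the extra
    resources added to both requests and limits (requests == limits).

    Hypothetical helper for illustration; not part of the SDK.
    """
    merged = deepcopy(container_resources)  # leave the template untouched
    for section in ("requests", "limits"):
        merged.setdefault(section, {}).update(resources or {})
    return merged


# Assumed template shape, mirroring a Kubernetes container "resources" block.
template = {
    "requests": {"cpu": 8, "memory": "12G"},
    "limits": {"cpu": 8, "memory": "12G"},
}
merged = merge_extended_resources(template, {"habana.ai/gaudi": 1})
```

With this approach nothing vendor-specific appears unless the caller asks for it, so nvidia.com/gpu becomes just another key the user may pass.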
Describe Alternatives You Have Considered
Some alternative format ideas: separate min_resources and max_resources arguments, or a string format like "someresource": "1/2" for request 1, limit 2.
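To make the string alternative concrete, a small sketch of how a "request/limit" value might be parsed. The function name parse_resource_spec and the rule that a bare number means request == limit are assumptions for illustration only; the issue does not define this format beyond the "1/2" example.

```python
def parse_resource_spec(value):
    """Parse a resource amount into a (request, limit) tuple.

    Hypothetical format: an int or "N" means request == limit,
    while "N/M" means request N and limit M.
    """
    request, sep, limit = str(value).partition("/")
    if not sep:
        # No slash: a single amount used for both request and limit.
        return int(request), int(request)
    return int(request), int(limit)


spec = {"someresource": "1/2", "habana.ai/gaudi": 1}
parsed = {name: parse_resource_spec(amount) for name, amount in spec.items()}
```

Compared with separate min_resources and max_resources arguments, the string form keeps everything in one dict at the cost of an extra parsing step.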
Additional Context
In this case, the request concerns Habana Gaudi devices, but the scope extends beyond that.
Activity
anishasthana commented on Feb 22, 2024
cc @Bobbins228
Bobbins228 commented on Feb 26, 2024
This sounds like a useful change 👍
KPostOffice commented on Sep 19, 2024
Solved with #531