CA DRA: correctly handle Node readiness after scale-up #7780
Labels
area/cluster-autoscaler
area/core-autoscaler
wg/device-management
Which component are you using?:
/area cluster-autoscaler
/area core-autoscaler
/wg device-management
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Nodes with custom resources exposed by device plugins (e.g. GPUs) report the Ready condition before they actually expose those resources. Cluster Autoscaler has to hack such Nodes to be not-Ready until they do expose the resources; otherwise the unschedulable pods don't pack onto them in filter_out_schedulable, and CA does another, unnecessary scale-up.
The same happens for DRA resources: until the driver for a given Node publishes its ResourceSlices, the Node is considered Ready, but pods requesting those resources can't schedule on it, so CA does another scale-up.
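For reference, the existing GPU workaround essentially rewrites the Ready condition on Nodes that are missing an expected custom resource. A minimal sketch of that idea, assuming hypothetical helpers (`hasResource` and `markUnready` are illustrative, not CA's actual API):

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
)

// hasResource reports whether the Node currently exposes a non-zero
// allocatable amount of the given custom resource (e.g. nvidia.com/gpu).
func hasResource(node *apiv1.Node, resource apiv1.ResourceName) bool {
	quantity, found := node.Status.Allocatable[resource]
	return found && !quantity.IsZero()
}

// markUnready rewrites a copy of the Node so that it looks not-Ready.
// This mirrors the "GPU hack": a Ready Node that doesn't yet expose an
// expected resource is hidden from scheduling simulations until it does.
func markUnready(node *apiv1.Node) *apiv1.Node {
	updated := node.DeepCopy()
	for i := range updated.Status.Conditions {
		if updated.Status.Conditions[i].Type == apiv1.NodeReady {
			updated.Status.Conditions[i].Status = apiv1.ConditionFalse
			updated.Status.Conditions[i].Reason = "ResourceNotReady" // hypothetical reason string
		}
	}
	return updated
}
```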
Describe the solution you'd like.:
We could generalize the current GPU hack and treat Nodes that should have ResourceSlices exposed, but don't yet, as not-Ready. Whether a given Node should have ResourceSlices can be detected by comparing it with the template node for its node group, as in the sketch below.
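A minimal sketch of that detection, under assumed wiring: `templateSlices` would be the ResourceSlices attached to the node group's template node, and `nodeSlices` the slices currently published in the cluster (both inputs are hypothetical, not CA's actual API):

```go
package sketch

import (
	resourceapi "k8s.io/api/resource/v1beta1"
)

// resourceSlicesReady reports whether a scaled-up Node should already be
// treated as Ready, given the ResourceSlices expected per its node group's
// template node and the slices currently published in the cluster.
func resourceSlicesReady(nodeName string, templateSlices, nodeSlices []*resourceapi.ResourceSlice) bool {
	// If the template node has no ResourceSlices, there is nothing to wait
	// for, and the Node's Ready condition can be trusted as-is.
	if len(templateSlices) == 0 {
		return true
	}
	// Otherwise, only treat the Node as Ready once its DRA driver has
	// published at least one ResourceSlice for it. A stricter variant could
	// compare drivers and pools against the template slices.
	for _, slice := range nodeSlices {
		if slice.Spec.NodeName == nodeName {
			return true
		}
	}
	return false
}
```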
Alternatively, maybe we could add a new Condition to the Node object, indicating whether ResourceSlices have been exposed yet. CA could then just look at the condition instead of correlating with the template node. This seems like a much cleaner solution, but it requires changes to core Kubernetes objects, so it's not clear how feasible it is.
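For illustration only, a condition of that shape might look like the following; `ResourceSlicesReady` does not exist in core Kubernetes today, and the type name and reason here are made up:

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NodeResourceSlicesReady is a hypothetical condition type; nothing like it
// exists in core Kubernetes today, so adding it would need an upstream change.
const NodeResourceSlicesReady apiv1.NodeConditionType = "ResourceSlicesReady"

// exampleCondition shows what the kubelet (or the DRA plugin registration
// path) might set once all registered drivers have published their slices.
var exampleCondition = apiv1.NodeCondition{
	Type:               NodeResourceSlicesReady,
	Status:             apiv1.ConditionTrue,
	Reason:             "AllDriversPublished", // hypothetical reason
	Message:            "all registered DRA drivers have published their ResourceSlices",
	LastTransitionTime: metav1.Now(),
}
```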
Additional context.:
This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. An MVP of the support was implemented in #7530 (with the whole implementation tracked in kubernetes/kubernetes#118612). There are a number of post-MVP follow-ups to be addressed before DRA autoscaling is ready for production use - this is one of them.