CA DRA: correctly handle Node readiness after scale-up #7780
Labels
area/cluster-autoscaler
area/core-autoscaler
wg/device-management
Which component are you using?:
/area cluster-autoscaler
/area core-autoscaler
/wg device-management
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Nodes with custom resources exposed by device plugins (e.g. GPUs) report the Ready condition before they actually expose those resources. Cluster Autoscaler has to hack such Nodes to be not-Ready until they do expose the resources; otherwise the unschedulable pods don't pack onto them in filter_out_schedulable, and CA does another, unnecessary scale-up.
The same happens for DRA resources: until the driver for a given Node publishes its ResourceSlices, the Node is considered Ready, but pods requesting those resources can't schedule on it, so CA does another scale-up.
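For reference, the existing GPU workaround essentially rewrites the Ready condition on Nodes that are missing an expected custom resource. A minimal sketch of that idea, assuming hypothetical helpers (`hasResource` and `markUnready` are illustrative, not CA's actual API):

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
)

// hasResource reports whether the Node currently exposes a non-zero
// allocatable amount of the given custom resource (e.g. nvidia.com/gpu).
func hasResource(node *apiv1.Node, resource apiv1.ResourceName) bool {
	quantity, found := node.Status.Allocatable[resource]
	return found && !quantity.IsZero()
}

// markUnready rewrites a copy of the Node so that it looks not-Ready.
// This mirrors the "GPU hack": a Ready Node that doesn't yet expose an
// expected resource is hidden from scheduling simulations until it does.
func markUnready(node *apiv1.Node) *apiv1.Node {
	updated := node.DeepCopy()
	for i := range updated.Status.Conditions {
		if updated.Status.Conditions[i].Type == apiv1.NodeReady {
			updated.Status.Conditions[i].Status = apiv1.ConditionFalse
			updated.Status.Conditions[i].Reason = "ResourceNotReady" // hypothetical reason string
		}
	}
	return updated
}
```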
Describe the solution you'd like.:
We could generalize the current GPU hack and treat Nodes that should have ResourceSlices exposed, but don't yet, as not-Ready. Whether a given Node should have ResourceSlices can be detected by comparing it with the template node for its node group, as in the sketch below.
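A minimal sketch of that detection, under assumed wiring: `templateSlices` would be the ResourceSlices attached to the node group's template node, and `nodeSlices` the slices currently published in the cluster (both inputs are hypothetical, not CA's actual API):

```go
package sketch

import (
	resourceapi "k8s.io/api/resource/v1beta1"
)

// resourceSlicesReady reports whether a scaled-up Node should already be
// treated as Ready, given the ResourceSlices expected per its node group's
// template node and the slices currently published in the cluster.
func resourceSlicesReady(nodeName string, templateSlices, nodeSlices []*resourceapi.ResourceSlice) bool {
	// If the template node has no ResourceSlices, there is nothing to wait
	// for, and the Node's Ready condition can be trusted as-is.
	if len(templateSlices) == 0 {
		return true
	}
	// Otherwise, only treat the Node as Ready once its DRA driver has
	// published at least one ResourceSlice for it. A stricter variant could
	// compare drivers and pools against the template slices.
	for _, slice := range nodeSlices {
		if slice.Spec.NodeName == nodeName {
			return true
		}
	}
	return false
}
```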
Alternatively, maybe we could add a new Condition to the Node object, indicating whether ResourceSlices have been exposed yet. CA could then just look at the condition instead of correlating with the template node. This seems like a much cleaner solution, but it requires changes to core Kubernetes objects, so it's not clear how feasible it is.
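For illustration only, a condition of that shape might look like the following; `ResourceSlicesReady` does not exist in core Kubernetes today, and the type name and reason here are made up:

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NodeResourceSlicesReady is a hypothetical condition type; nothing like it
// exists in core Kubernetes today, so adding it would need an upstream change.
const NodeResourceSlicesReady apiv1.NodeConditionType = "ResourceSlicesReady"

// exampleCondition shows what the kubelet (or the DRA plugin registration
// path) might set once all registered drivers have published their slices.
var exampleCondition = apiv1.NodeCondition{
	Type:               NodeResourceSlicesReady,
	Status:             apiv1.ConditionTrue,
	Reason:             "AllDriversPublished", // hypothetical reason
	Message:            "all registered DRA drivers have published their ResourceSlices",
	LastTransitionTime: metav1.Now(),
}
```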
Additional context.:
This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. An MVP of the support was implemented in #7530 (with the whole implementation tracked in kubernetes/kubernetes#118612). There are a number of post-MVP follow-ups to be addressed before DRA autoscaling is ready for production use - this is one of them.