feat: support new topologySpread scheduling constraints #852
Conversation
Force-pushed from aed9f49 to 54af319.
Pull Request Test Coverage Report for Build 13139823506
💛 - Coveralls
Force-pushed from 4329131 to f0127f2.
Force-pushed from b23a605 to 0408162.
Force-pushed from 0408162 to 6673b5b.
Holding to include
Force-pushed from dae9cfd to 1d46f34.
Force-pushed from 1d46f34 to 7b3827f.
Hi @jmdeal, is there anything blocking this, or any way I can help move this along (testing a custom build or something)?
/assign @njtran
@engedaam: GitHub didn't allow me to assign the following users: njtarn. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign @njtran
/unassign njtran |
/assign jonathan-innis |
And I think I figured out why! Turns out that we really haven't been doing our scheduling benchmarking correctly in a while! See #1930
Force-pushed from 5958693 to 01de6ac.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jmdeal
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
for domain := range domains {
	domainCounts[domain] = 0
}
func NewTopologyGroup(topologyType TopologyType, topologyKey string, pod *v1.Pod, namespaces sets.Set[string], labelSelector *metav1.LabelSelector, maxSkew int32, minDomains *int32, taintPolicy *v1.NodeInclusionPolicy, affinityPolicy *v1.NodeInclusionPolicy, domainGroup TopologyDomainGroup) *TopologyGroup {
func NewTopologyGroup(topologyType TopologyType, topologyKey string, pod *v1.Pod, namespaces sets.Set[string], labelSelector *metav1.LabelSelector, maxSkew int32, minDomains *int32, taintPolicy *v1.NodeInclusionPolicy, affinityPolicy *v1.NodeInclusionPolicy, domainGroup TopologyDomainGroup) *TopologyGroup {
func NewTopologyGroup(topologyType TopologyType, topologyKey string, pod *v1.Pod, namespaces sets.Set[string], labelSelector *metav1.LabelSelector, maxSkew int32, minDomains *int32, taintPolicy, affinityPolicy *v1.NodeInclusionPolicy, domainGroup TopologyDomainGroup) *TopologyGroup {
the tiniest nit known to man
I've reformatted to put each argument on its own line, since this is beginning to wrap off of my screen. Given that, I think leaving both types in is more readable, but let me know what you think. Personally, I prefer always including the types since I find it a little more readable, but I don't feel too strongly.
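For reference, the one-argument-per-line formatting being described would look roughly like this, keeping the explicit type on both policy parameters as in the diff above; this is a formatting illustration only, not necessarily the final signature in the PR:

```go
func NewTopologyGroup(
	topologyType TopologyType,
	topologyKey string,
	pod *v1.Pod,
	namespaces sets.Set[string],
	labelSelector *metav1.LabelSelector,
	maxSkew int32,
	minDomains *int32,
	taintPolicy *v1.NodeInclusionPolicy,
	affinityPolicy *v1.NodeInclusionPolicy,
	domainGroup TopologyDomainGroup,
) *TopologyGroup {
	// ... construct and return the TopologyGroup ...
}
```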
First pass -- I didn't look at everything and have some starting questions
/remove-label blocked
@rschalo: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 09e5867 to a00643d.
}

// otherwise, we need to match the combination of label selector and any term of the required node affinities since
// those terms are OR'd together
var filter TopologyNodeFilter
filter := TopologyNodeFilter{
We don't attach tolerations in this codepath? Seems wrong that we wouldn't attach tolerations here -- also, since there isn't a test failing for this scenario, it seems like we need to add one.
@@ -180,9 +180,9 @@ func HasDoNotDisrupt(pod *corev1.Pod) bool {
	return pod.Annotations[v1.DoNotDisruptAnnotationKey] == "true"
}

// ToleratesDisruptedNoScheduleTaint returns true if the pod tolerates karpenter.sh/disrupted:NoSchedule taint
// ToleratesDisruptionNoScheduleTaint returns true if the pod tolerates karpenter.sh/disruption:NoSchedule=Disrupting taint
This doesn't look right -- I think we are still using karpenter.sh/disrupted:NoSchedule
"sigs.k8s.io/karpenter/pkg/scheduling" | ||
) | ||
|
||
// TopologyDomainGroup tracks the domains for a single topology. Additionally, it tracks the taints associated with |
It just tracks the eligible domains, right?
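For readers following along, here is a minimal sketch of the shape the type appears to have, inferred from the Insert and ForEachToleratedDomain signatures shown below; the package name and the actual definition in the PR may differ:

```go
package scheduling

import v1 "k8s.io/api/core/v1"

// Sketch only: TopologyDomainGroup maps each eligible topology domain (e.g. a
// zone value) to the groups of node taints under which that domain was
// discovered. A domain recorded with an empty taint group is usable by any
// pod; otherwise a pod must tolerate at least one recorded group for the
// domain to count toward the spread.
type TopologyDomainGroup map[string][][]v1.Taint
```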
func (t TopologyDomainGroup) Insert(domain string, taints ...v1.Taint) {
	// Note: This could potentially be improved by removing any set of which the new set of taints is a proper subset.
	// Currently this is only handled when the incoming set is the empty set.
	if _, ok := t[domain]; !ok || len(taints) == 0 {
It still took me a second to parse this -- it might be worth moving that comment from down below up here, or maybe adding a second one, just because I didn't get it until I saw the comment down below.
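A rough sketch of what Insert could look like with that comment hoisted above the condition, under the same assumed map layout as the sketch above; this is illustrative, not the PR's exact code:

```go
// Insert records the taints under which a domain was discovered.
func (t TopologyDomainGroup) Insert(domain string, taints ...v1.Taint) {
	// An empty incoming taint set means the domain is usable without any
	// tolerations at all, so it can replace everything recorded so far.
	// (A fuller optimization would also drop any stored set that the incoming
	// set is a proper subset of; only the empty-set case is handled here.)
	if _, ok := t[domain]; !ok || len(taints) == 0 {
		t[domain] = [][]v1.Taint{taints}
		return
	}
	t[domain] = append(t[domain], taints)
}
```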

// ForEachToleratedDomain calls f on each domain tracked by the TopologyDomainGroup which are also tolerated by the provided pod.
func (t TopologyDomainGroup) ForEachToleratedDomain(pod *v1.Pod, f func(domain string)) {
	for domain, taintGroups := range t {
Consider marking down that we could potentially improve this code by just storing the domains that map to certain cached tolerations -- small improvement but maybe worth doing eventually
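Building on the same assumed map layout, this is roughly what the tolerated-domain iteration could look like; toleratesAll is a hypothetical helper (the PR may use an existing scheduling utility instead), and the caching idea above would amount to memoizing this check per distinct taint group:

```go
// ForEachToleratedDomain calls f on each tracked domain whose recorded taints
// are tolerated by the provided pod.
func (t TopologyDomainGroup) ForEachToleratedDomain(pod *v1.Pod, f func(domain string)) {
	for domain, taintGroups := range t {
		for _, taints := range taintGroups {
			// The pod only needs to tolerate one of the recorded taint groups.
			if toleratesAll(pod.Spec.Tolerations, taints) {
				f(domain)
				break
			}
		}
	}
}

// toleratesAll (hypothetical helper) reports whether every taint is tolerated
// by at least one of the given tolerations.
func toleratesAll(tolerations []v1.Toleration, taints []v1.Taint) bool {
	for i := range taints {
		tolerated := false
		for j := range tolerations {
			if tolerations[j].ToleratesTaint(&taints[i]) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}
```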
@@ -278,6 +278,28 @@ func (t *Topology) countDomains(ctx context.Context, tg *TopologyGroup) error {
		pods = append(pods, podList.Items...)
	}

	// capture new domain values from existing nodes that may not have any pods selected by the topology group
	// scheduled to them already
	t.cluster.ForEachNode(func(n *state.StateNode) bool {
Noting down that we have plans to move this over into domainGroups since domainGroups should really capture all of the domains that are eligible across the cluster (including NodePools) rather than just the NodePools (which is what they are doing right now)
@@ -278,6 +278,28 @@ func (t *Topology) countDomains(ctx context.Context, tg *TopologyGroup) error {
		pods = append(pods, podList.Items...)
	}

	// capture new domain values from existing nodes that may not have any pods selected by the topology group
After some discussion with @jmdeal, we should consider passing state nodes directly into this function rather than using the t.cluster.ForEach and holding a read lock throughout the time
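A minimal sketch of that suggestion: the caller snapshots the state nodes once and passes them in, so the domain capture no longer runs under the cluster's read lock. The stateNodes parameter, n.Labels(), tg.Key, and tg.domains are assumptions about the surrounding code, not the PR's exact API:

```go
func (t *Topology) countDomains(ctx context.Context, tg *TopologyGroup, stateNodes []*state.StateNode) error {
	for _, n := range stateNodes {
		domain, ok := n.Labels()[tg.Key]
		if !ok {
			continue
		}
		// ensure we at least have a count of zero for this potentially new topology domain
		if _, ok := tg.domains[domain]; !ok {
			tg.domains[domain] = 0
		}
	}
	// ... existing pod-list counting logic would continue here ...
	return nil
}
```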
			return true
		}
		// ensure we at least have a count of zero for this potentially new topology domain
		if _, countExists := tg.domains[domain]; !countExists {
nit: Since this is more standard
if _, countExists := tg.domains[domain]; !countExists {
if _, ok := tg.domains[domain]; !ok {
@@ -323,7 +345,17 @@ func (t *Topology) countDomains(ctx context.Context, tg *TopologyGroup) error {
func (t *Topology) newForTopologies(p *corev1.Pod) []*TopologyGroup {
	var topologyGroups []*TopologyGroup
	for _, cs := range p.Spec.TopologySpreadConstraints {
		topologyGroups = append(topologyGroups, NewTopologyGroup(TopologyTypeSpread, cs.TopologyKey, p, sets.New(p.Namespace), cs.LabelSelector, cs.MaxSkew, cs.MinDomains, t.domains[cs.TopologyKey]))
		for _, key := range cs.MatchLabelKeys {
			if value, ok := p.ObjectMeta.Labels[key]; ok {
if value, ok := p.ObjectMeta.Labels[key]; ok {
if value, ok := p.Labels[key]; ok {
super super minor nit
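For context, a sketch of how matchLabelKeys is commonly folded into the constraint's label selector, mirroring kube-scheduler's documented behavior; this fragment would sit inside the loop shown above, and the PR's exact code may differ:

```go
// Copy the constraint's selector and AND in the pod's current value for each
// listed label key; pods without that label simply skip the key.
selector := cs.LabelSelector.DeepCopy()
if selector == nil {
	selector = &metav1.LabelSelector{}
}
for _, key := range cs.MatchLabelKeys {
	if value, ok := p.Labels[key]; ok {
		selector.MatchExpressions = append(selector.MatchExpressions, metav1.LabelSelectorRequirement{
			Key:      key,
			Operator: metav1.LabelSelectorOpIn,
			Values:   []string{value},
		})
	}
}
// selector (rather than cs.LabelSelector) would then be passed to NewTopologyGroup.
```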
@@ -323,7 +345,17 @@ func (t *Topology) countDomains(ctx context.Context, tg *TopologyGroup) error {
func (t *Topology) newForTopologies(p *corev1.Pod) []*TopologyGroup {
	var topologyGroups []*TopologyGroup
	for _, cs := range p.Spec.TopologySpreadConstraints {
for _, cs := range p.Spec.TopologySpreadConstraints {
for _, tsc := range p.Spec.TopologySpreadConstraints {
this is also minor but why not?
It("should balance pods across a label when discovered from the provisioner (NodeTaintsPolicy=ignore)", func() { | ||
const spreadLabel = "fake-label" | ||
const taintKey = "taint-key" | ||
nodePool.Spec.Template.Spec.Requirements = append(nodePool.Spec.Template.Spec.Requirements, v1.NodeSelectorRequirementWithMinValues{ |
For the tests that are validating schedulability of a pod against a nodepool or node, it might be worth it to add taints and tolerations for the nodepool/node that we are able to schedule to as well so that we ensure that we are exercising the tolerates checks that we have in the topology code
I could also see an argument that this is tested below but it probably doesn't hurt -- I could go either way
	// should fail to schedule both pods, one pod is scheduled to domain "foo" but the other can't be scheduled to domain "bar"
	ExpectSkew(ctx, env.Client, "default", &topology[0]).To(ConsistOf(1))
})
It("should balance pods across a label when discovered from the provisioner (NodeTaintsPolicy=honor)", func() {
It("should balance pods across a label when discovered from the provisioner (NodeTaintsPolicy=honor)", func() { | |
It("should balance pods across a label when discovered from the NodePool (NodeTaintsPolicy=honor)", func() { |
consider replacing the use of the word provisioner throughout this file with NodePool :P We've got some legacy wording that got kept around due to how long this has been open.
ExpectApplied(ctx, env.Client, nodePools[0], nodePools[1])
ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...)

// Expect 3 total nodes provisioned, 2 pods schedule to foo, 1 to bar, and 1 to baz
nit: consider adding a bit more to this comment (1 pod that tolerated the nodepool-1 taint on foo, 1 that tolerated the nodepool-1 taint on bar, etc.)
ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...)

// should schedule all pods to domain "foo", ignoring bar since pods don't tolerate
ExpectSkew(ctx, env.Client, "default", &topology[0]).To(ConsistOf(2))
What about a test for a set of pods that have two different tolerations and two different NodePools with mutually exclusive taints?
ExpectReconcileSucceeded(ctx, nodeStateController, client.ObjectKeyFromObject(node2))

ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov)
ignore := corev1.NodeInclusionPolicyHonor
nit: This naming is wrong from the copy
This is awesome work! This is definitely getting close! I think it's mostly a few small things -- we should write down the things that we want to refactor here, since there were some ideas thrown out about how we could move existing nodes into
Fixes #430
Description
This PR adds support for the following topology spread constraint fields (a usage sketch follows the list):
matchLabelKeys
nodeAffinityPolicy
nodeTaintsPolicy
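For anyone trying these out, a hedged sketch of a constraint that exercises all three fields; the field names come from k8s.io/api/core/v1, while the pod variable and label values are illustrative only:

```go
honor := corev1.NodeInclusionPolicyHonor
ignore := corev1.NodeInclusionPolicyIgnore
pod.Spec.TopologySpreadConstraints = []corev1.TopologySpreadConstraint{{
	MaxSkew:           1,
	TopologyKey:       corev1.LabelTopologyZone,
	WhenUnsatisfiable: corev1.DoNotSchedule,
	LabelSelector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": "inflate"}},
	// fold the pod's own value for this label into the selector at scheduling time
	MatchLabelKeys: []string{"pod-template-hash"},
	// only count domains on nodes the pod's required node affinity allows
	NodeAffinityPolicy: &honor,
	// count domains regardless of node taints
	NodeTaintsPolicy: &ignore,
}}
```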
How was this change tested?
make test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.