MULTIARCH-4974: Cluster wide architecture weighted affinity #452
base: main
Conversation
@AnnaZivkovic: This pull request references MULTIARCH-4974, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: the story was expected to target the "4.19.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has not yet been approved by any approvers. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@Prashanth684 and I were also discussing providing two merging strategies, selected via an additional field on the nodeAffinityScoring plugin.
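For concreteness, a rough sketch of what such a plugin stanza could look like on the ClusterPodPlacementConfig follows; the mergeStrategy field, its values, and the exact shape of the weights are illustrative assumptions about the idea being discussed, not an agreed API:

apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
  name: cluster
spec:
  plugins:
    nodeAffinityScoring:
      enabled: true
      # Hypothetical field for the merging strategy discussed above;
      # the name and allowed values are not part of any agreed API.
      mergeStrategy: normalize   # e.g. "normalize" vs. "append"
      platforms:
        - architecture: amd64
          weight: 50
        - architecture: arm64
          weight: 100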
Force-pushed from 6777797 to e39938b
+1, nice write-up @aleskandro. We should definitely get this recorded in an EP and define the two strategies clearly. In the future there might also be room to add more strategies, for example one where an input variable or even a normalization function is provided by the user to influence the scheduling.
Wait, I'm confused: the whole reason we thought about the merging strategy was that nodes with SSDs should be considered over nodes without them, right? Or is it purely based on the higher number?
Force-pushed from d3b7b94 to 7c27733
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ — the scheduler sums the weights of the preferred terms that match a node and prefers the node with the highest score. We could just append to this list, but we would risk unbalancing any predefined user rules.
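For instance (illustrative numbers only): with a pre-existing term of weight 1 on disktype=ssd and appended architecture terms of weight 50 (amd64) and 100 (arm64), an arm64 node without an SSD scores 100 while an amd64 node with an SSD scores 1 + 50 = 51, so the appended architecture weights dominate the user's original preference.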
Force-pushed from dcab070 to ececc41
Force-pushed from ececc41 to ea9a87b
Force-pushed from c096dfe to 4cf7fa3
Expect(err).NotTo(HaveOccurred(), "failed to update ClusterPodPlacementConfig", err)
Expect(ppc.Spec.Plugins).NotTo(BeNil())
})
It("appends ClusterPodPlacementConfig node affinity to nil preferred affinities", func() {
This test would be better placed in the pkg/e2e/operator folder.
Force-pushed from 4cf7fa3 to bee9ec9
Force-pushed from 0c14a07 to b1a6a92
@@ -384,4 +389,226 @@ var _ = Describe("Controllers/Podplacement/PodReconciler", func() {
	)
})
})
When("The node affinity scoring plugin is enabled", func() {
Hi @AnnaZivkovic, is it possible to move this "When" spec to controllers/operator/clusterpodplacementconfig_controller_test.go? The test cases in the controllers/podplacement/... suite run in parallel, so the CPPC creation in the BeforeEach(){} below might run at the same time in different processes.
I think the test belongs here; Tori has now removed the creation of the CPPC and set the plugin enabled by default in this suite's creation request for it.
/retest
Force-pushed from a19d54e to 74e0c57
By("The pod should have been processed by the webhook and the scheduling gate label should be added") | ||
Eventually(framework.VerifyPodLabels(ctx, client, ns, "app", "test", e2e.Present, schedulingGateLabel), e2e.WaitShort).Should(Succeed()) | ||
|
||
By("The pod should have been set node affinity of arch info.") |
By("The pod should have been set node affinity of arch info.") | |
By("The pod should have been set preferred node affinity of arch info.") |
pkg/testing/builder/pod.go (outdated)
func (p *PodBuilder) WithPreferredDuringSchedulingIgnoredDuringExecution(opts ...func(*v1.PreferredSchedulingTerm)) *PodBuilder {
	p.WithNodeAffinity()

	if p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution == nil {
		p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution = []v1.PreferredSchedulingTerm{}
	}

	// Apply each optional function
	for _, opt := range opts {
		term := v1.PreferredSchedulingTerm{}
		opt(&term)
		p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution = append(
			p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution,
			term,
		)
	}

	return p
}
Suggested change (pass the terms directly instead of option functions):

func (p *PodBuilder) WithPreferredDuringSchedulingIgnoredDuringExecution(values ...*v1.PreferredSchedulingTerm) *PodBuilder {
	p.WithNodeAffinity()
	if p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution == nil {
		p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution = []v1.PreferredSchedulingTerm{}
	}
	for i := range values {
		if values[i] == nil {
			panic("nil value passed to WithPreferredDuringSchedulingIgnoredDuringExecution")
		}
		p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution = append(
			p.pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution, *values[i])
	}
	return p
}
Can this func just take *v1.PreferredSchedulingTerm values directly instead of func(*v1.PreferredSchedulingTerm) options?
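If the signature changes as suggested, a test could build the term with plain corev1 types and pass the pointer directly. A minimal sketch, assuming NewPod() and Build() are the existing helpers in pkg/testing/builder and v1 is k8s.io/api/core/v1:

term := &v1.PreferredSchedulingTerm{
	Weight: 100,
	Preference: v1.NodeSelectorTerm{
		MatchExpressions: []v1.NodeSelectorRequirement{{
			Key:      "kubernetes.io/arch",
			Operator: v1.NodeSelectorOpIn,
			Values:   []string{"arm64"},
		}},
	},
}
// Pass the pointer straight into the builder; no option closure needed.
pod := NewPod().
	WithPreferredDuringSchedulingIgnoredDuringExecution(term).
	Build()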
Force-pushed from 4d7304a to 4a88f60
Force-pushed from 4a88f60 to 82d76d5
…terPodPlacementConfig more private
… during scheduling using append method
Force-pushed from c69f623 to b1a90d8
Force-pushed from b1a90d8 to ee514ed
@AnnaZivkovic: The following tests failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
@@ -95,6 +95,12 @@ var _ = Describe("The Pod Placement Operand", func() {
), e2e.WaitShort).Should(Succeed())
By("The pod should have been set node affinity of arch info.")
Eventually(framework.VerifyPodNodeAffinity(ctx, client, ns, "app", "test", *expectedNSTs), e2e.WaitShort).Should(Succeed())
archLabelNSR = NewNodeSelectorRequirement().
Suggested change:
+ By("The pod should have the expected preferred affinities")
  archLabelNSR = NewNodeSelectorRequirement().
And so on, where you will verify that…
@@ -244,6 +260,8 @@ var _ = Describe("The Pod Placement Operand", func() {
Eventually(framework.VerifyPodLabels(ctx, client, ns, "app", "test", e2e.Present, schedulingGateNotSetLabel), e2e.WaitShort).Should(Succeed())
I think the pod is ignored by the webhook when the required node affinity for architecture is already set; therefore, neither the scheduling gate nor the preferred node affinity is set.
Now that this plugin sets the preferred node affinity, we should change the condition under which the webhook skips setting the scheduling gate on the pod:
multiarch-tuning-operator/controllers/podplacement/pod_model.go
Lines 329 to 341 in f9402c7
// shouldIgnorePod returns true if the pod should be ignored by the operator.
// The operator should ignore the pods in the following cases:
// - the pod is in the same namespace as the operator
// - the pod is in the kube-* namespace
// - the pod has a node name set
// - the pod has a node selector that matches the control plane nodes
// - the pod is owned by a daemonset
func (pod *Pod) shouldIgnorePod() bool {
	return utils.Namespace() == pod.Namespace || strings.HasPrefix(pod.Namespace, "kube-") ||
		pod.Spec.NodeName != "" || pod.hasControlPlaneNodeSelector() ||
		pod.isNodeSelectorConfiguredForArchitecture() || pod.isFromDaemonSet()
}
I think pod.isNodeSelectorConfiguredForArchitecture() should change to
pod.isNodeSelectorConfiguredForArchitecture() && !cppc.Plugins.nodeAffinity.enabled ||
pod.isNodeSelectorConfiguredForArchitecture() && cppc.Plugins.nodeAffinity.enabled && pod.isPreferredAffinityConfiguredForArchitecture()
where isPreferredAffinityConfiguredForArchitecture() has to be implemented and cppc comes from the informer cache. The webhook runs in the pod placement controller, so it should already be available.
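Roughly, that condition could be factored into a helper like the sketch below; the helper name and the way the plugin flag reaches the pod model are assumptions, and only the two pod methods come from the discussion above:

// Sketch only: would replace the single pod.isNodeSelectorConfiguredForArchitecture()
// term inside shouldIgnorePod(). nodeAffinityScoringEnabled would be read from the
// ClusterPodPlacementConfig in the informer cache, and
// isPreferredAffinityConfiguredForArchitecture() still has to be implemented.
func (pod *Pod) hasArchitectureAffinityAlreadyHandled(nodeAffinityScoringEnabled bool) bool {
	if !nodeAffinityScoringEnabled {
		// Plugin disabled: a user-provided required node affinity is enough to skip the pod.
		return pod.isNodeSelectorConfiguredForArchitecture()
	}
	// Plugin enabled: skip the pod only if both the required and the preferred
	// architecture affinities are already configured by the user.
	return pod.isNodeSelectorConfiguredForArchitecture() &&
		pod.isPreferredAffinityConfiguredForArchitecture()
}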
What I would do is change this test case so that the deployment also has the preferred affinity set.
After the regression cases are in place and passing, we will need two other test cases in which the user provides either only the required node affinity or only the preferred node affinity.
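A rough Ginkgo sketch of one of those additional cases, as discussed above; VerifyPodPreferredNodeAffinity and expectedPSTs are hypothetical names alongside the existing VerifyPodNodeAffinity helper:

// Sketch, not a drop-in test: the deployment setup is elided and the verifier name is assumed.
It("sets only the preferred affinity when the user already provides the required node affinity", func() {
	// Create a deployment whose pod template already carries the required
	// architecture node affinity but no preferred terms (builder calls omitted).
	By("The webhook should still add the cluster-wide preferred affinity")
	Eventually(framework.VerifyPodPreferredNodeAffinity(ctx, client, ns, "app", "test", expectedPSTs),
		e2e.WaitShort).Should(Succeed())
})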
Implementing cluster-wide weights using PreferredDuringSchedulingIgnoredDuringExecution, which stores its items as a list. In the case where the pod already has predefined PreferredSchedulingTerms, we must preserve their weights and avoid unbalancing them with the new architecture weights. To do so, we can normalize the existing weights using the architecture weights:
new_weight = 100 * old_weight / sum(arch_weights)
For example, with user-defined architecture weights and a pre-existing rule on the pod, the pod YAML would look like the following.
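A hedged illustration (the specific numbers are made up, not taken from this PR): suppose the cluster-wide architecture weights are amd64=50 and arm64=100 (sum 150) and the pod already has a user rule with weight 20. The normalization gives the user rule 100 * 20 / 150 ≈ 13, and the resulting affinity stanza would look roughly like:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Pre-existing user rule, normalized from weight 20 to 100 * 20 / 150 ≈ 13
      - weight: 13
        preference:
          matchExpressions:
            - key: disktype
              operator: In
              values:
                - ssd
      # Terms appended from the cluster-wide architecture weights
      - weight: 50
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - arm64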
This PR depends on #369.