-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding predicate ordering design proposal #1152
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
# predicates ordering | ||
|
||
|
||
|
||
Status: proposal | ||
|
||
Author: yastij | ||
Approvers: | ||
* gmarek | ||
* bsalamat | ||
* k82cn | ||
|
||
|
||
|
||
|
||
## Abstract | ||
|
||
This document describes how and why reordering predicates helps to achieve performance for the kubernetes scheduler. | ||
We will expose the motivations behind this proposal, The two steps/solution we see to tackle this problem and the timeline decided to implement these. | ||
|
||
|
||
## Motivation | ||
|
||
While working on a [Pull request](https://github.com/kubernetes/kubernetes/pull/50185) related to a proposal, we saw that the order of running predicates isn’t defined. | ||
|
||
This makes the scheduler perform extra-computation that isn’t needed, As an example we [outlined](https://github.com/kubernetes/kubernetes/pull/50185) that the kubernetes scheduler runs predicates against nodes even if marked “unschedulable”. | ||
|
||
Reordering predicates allows us to avoid this problem, by computing the most restrictive predicates first. To do so, we propose two reordering types. | ||
|
||
|
||
|
||
## Static ordering | ||
|
||
This ordering will be the default ordering. If a policy config is provided with a subset of predicates, only those predicates will be invoked using the static ordering. | ||
|
||
|
||
|
||
|
||
|Position | Predicate | comments (note, justification...) | | ||
----------------- | ---------------------------- | ------------------ | ||
| 1 | `CheckNodeConditionPredicate` | we really don’t want to check predicates against unschedulable nodes. | | ||
| 2 | `PodFitsHost` | we check the pod.spec.nodeName. | | ||
| 3 | `PodFitsHostPorts` | we check ports asked on the spec. | | ||
| 4 | `PodMatchNodeSelector` | check node label after narrowing search. | | ||
| 5 | `PodFitsResources ` | this one comes here since it’s not restrictive enough as we do not try to match values but ranges. | | ||
| 6 | `NoDiskConflict` | Following the resource predicate, we check disk | | ||
| 7 | `PodToleratesNodeTaints '` | check toleration here, as node might have toleration | | ||
| 8 | `PodToleratesNodeNoExecuteTaints` | check toleration here, as node might have toleration | | ||
| 9 | `CheckNodeLabelPresence ` | labels are easy to check, so this one goes before | | ||
| 10 | `checkServiceAffinity ` | - | | ||
| 11 | `MaxPDVolumeCountPredicate ` | - | | ||
| 12 | `VolumeNodePredicate ` | - | | ||
| 13 | `VolumeZonePredicate ` | - | | ||
| 14 | `CheckNodeMemoryPressurePredicate` | doesn’t happen often | | ||
| 15 | `CheckNodeDiskPressurePredicate` | doesn’t happen often | | ||
| 16 | `InterPodAffinityMatches` | Most expensive predicate to compute | | ||
|
||
|
||
## End-user ordering | ||
|
||
Using scheduling policy file, the cluster admin can override the default static ordering. This gives administrator the maximum flexibility regarding scheduler behaviour and enables scheduler to adapt to cluster usage. | ||
Please note that the order must be a positive integer, also, when providing equal ordering for many predicates, scheduler will determine the order and won't guarantee that the order will remain the same between them. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if it's clear for everyone so we should mention that updating the Policy requires scheduler restart. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ping this comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is worth noting that point again in this doc, just in case someone is not fully aware of that point. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated |
||
Finally updating the scheduling policy file will require a scheduler restart. | ||
|
||
as an example the following is scheduler policy file using an end-user ordering: | ||
|
||
``` json | ||
{ | ||
"kind" : "Policy", | ||
"apiVersion" : "v1", | ||
"predicates" : [ | ||
{"name" : "PodFitsHostPorts", "order": 2}, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we just follow the order in configuration? so "order" is not necessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @k82cn - I'd prefer having a clear ordering here, as using an implicit one might confuse users. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also like the explicit order, but we should be backward compatible. So, absence of "order" should be treated as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bsalamat - also maybe add that if there's a non-logical ordering (e.g. order:1, order:2, order:16) default to static ordering with a warning ? Also I agree that absence of order should be treated as order:0 (i.e. executed at the end & randomly right after the ranked predicates) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is wrong with 1, 2, 16? It shows a clear ordering and should be treated a valid ordering. Yes for the second part. Order:0 means executed at no particular order and after the ordered predicates. If none of the predicates has an order, then should run with the default order. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Meant -16 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, order must be a positive integer. In case of an invalid value (such as a negative number or a string) we should probably fail to start. Users may miss the warning and not notice the issue in their config. Failing to start is in-line with how scheduler reacts to other config issues. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. SGTM |
||
{"name" : "PodFitsResources", "order": 3}, | ||
{"name" : "NoDiskConflict", "order": 5}, | ||
{"name" : "PodToleratesNodeTaints", "order": 4}, | ||
{"name" : "MatchNodeSelector", "order": 6}, | ||
{"name" : "PodFitsHost", "order": 1} | ||
], | ||
"priorities" : [ | ||
{"name" : "LeastRequestedPriority", "weight" : 1}, | ||
{"name" : "BalancedResourceAllocation", "weight" : 1}, | ||
{"name" : "ServiceSpreadingPriority", "weight" : 1}, | ||
{"name" : "EqualPriority", "weight" : 1} | ||
], | ||
"hardPodAffinitySymmetricWeight" : 10 | ||
} | ||
``` | ||
|
||
|
||
## Timeline | ||
|
||
* static ordering: GA in 1.9 | ||
* dynamic ordering: TBD based on customer feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal should go in the
scheduling
directory.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bsalamat - indeed