KEP 5554: In place update pod resources alongside static cpu manager policy KEP creation #5555
Conversation
esotsal commented on Sep 21, 2025
- One-line PR description: Create new KEP 5554: In place update pod resources alongside static cpu manager policy
- Issue link: Support In place update pod resources alongside static cpu manager policy #5554
- Other comments:
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: esotsal. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 4c5c393 to 1240d58.
@esotsal: GitHub didn't allow me to request PR reviews from the following users: Chunxia202410. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 1240d58 to 8973b16.
Force-pushed from c651aa2 to 24bfb5c.
Force-pushed from a0b72b1 to abc61a0.
* The pod's current NUMA Affinity (single or multiple NUMA nodes) can accommodate the resized CPU request. If yes, the current affinity is maintained.
* If the current affinity is insufficient and expansion is needed, the new NUMA affinity must always include all NUMA nodes from the current affinity.
Hi @esotsal, do you think it is necessary to generate more fine-grained NUMA hints here?
Consider an example for scale-down.
The server has 2 NUMA nodes:
NUMA 0: CPUs {0,1,2,3,4,5,6,7}
NUMA 1: CPUs {8,9,10,11,12,13,14,15}
CPU 0 is reserved.
There is a Pod with 10 CPUs and the CPU set is {4,5,8-15}, where CPUs 10 and 11 are promised CPUs.
The Pod scales down from 10 to 8 CPUs (which actually requires only 1 NUMA node).

- According to the KEP design:
  The CPU NUMA hints would be {0,1} with preferred = false (if I understand correctly).
  Assuming the Memory and Device managers both have two NUMA hints, {1} with preferred = true and {0,1} with preferred = false, the merged best NUMA hint would be {0,1} with preferred = false (according to my understanding).
  If the topology manager policy is restricted, since preferred = false for the best NUMA hint, the Pod resize would be rejected.
- Consider a more fine-grained solution for scale-down: NUMA hints should be the combinations of nodes containing the original CPUs, and must include the promised CPUs.
  The CPU NUMA hints would be {1} with preferred = true and {0,1} with preferred = false.
  Assuming the Memory and Device managers again have {1} with preferred = true and {0,1} with preferred = false, the merged best NUMA hint would be {1} with preferred = true. If the topology manager policy is restricted, the Pod would be resized.

If my understanding is incorrect, please feel free to point it out. Thank you.
I'm having a hard time picturing a downsize which can cause a topology manager policy failure. I can think of examples for upsize, but none for downsize. Case in point: a downsize is a transition which at worst keeps the container shape as restrictive as the original allocation; it can't be worse (e.g. we can't downsize in a way which violates the single-numa-node policy: if the allocation spans across 2 nodes, at worst the downsized one will still span across 2 nodes).
While writing this comment, I realized that we should probably just make explicit that on downsize the end result must be a subset of the original result. This will help.
Yes, option 2 is what I was thinking as well, @Chunxia202410, for the use case you describe. @ffromani has described it very well; I will add this sentence to the KEP for clarity.
I have confirmed this behaviour, in this KEP's PR, against a VM with the same CPU topology as the example you provided, using the restricted topology manager policy: creating a pod with one container with 2 CPUs, then scaling it up to 10, and afterwards scaling down to 2. I will push the added tests (as well as tests against the other topology manager policies, single-numa-node and best-effort) in the PR; I need to do some clean up and will try to push the commit today or tomorrow.
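Editor's note: below is a minimal Go sketch of the fine-grained scale-down hint generation discussed in this thread, applied to the 2-NUMA-node example above. The types and names (`numaHint`, `scaleDownHints`) are illustrative assumptions and do not correspond to kubelet code; the real topology manager works on bitmasks and marks minimal-width hints as preferred.

```go
package main

import "fmt"

// numaHint mimics the shape of a topology manager hint: a set of NUMA node IDs
// plus a "preferred" flag. All names here are illustrative, not kubelet types.
type numaHint struct {
	Nodes     []int
	Preferred bool
}

// scaleDownHints sketches the fine-grained option discussed above: on a downsize,
// emit one hint per combination of NUMA nodes that (a) already holds CPUs of the
// container, (b) contains all promised CPUs and (c) can fit the shrunken request
// from the CPUs it already holds. Single-node combinations are marked preferred
// here for simplicity; the real topology manager prefers minimal-width hints.
func scaleDownHints(perNode map[int][]int, assigned, promised map[int]bool, request int) []numaHint {
	// Nodes currently used by the container.
	var used []int
	for node, cpus := range perNode {
		for _, c := range cpus {
			if assigned[c] {
				used = append(used, node)
				break
			}
		}
	}

	var hints []numaHint
	// Enumerate all non-empty subsets of the used nodes.
	for mask := 1; mask < 1<<len(used); mask++ {
		var subset []int
		capacity := 0
		covered := map[int]bool{}
		for i, node := range used {
			if mask&(1<<i) == 0 {
				continue
			}
			subset = append(subset, node)
			for _, c := range perNode[node] {
				covered[c] = true
				if assigned[c] {
					capacity++
				}
			}
		}
		// Must still contain every promised CPU and fit the downsized request.
		ok := capacity >= request
		for c := range promised {
			if !covered[c] {
				ok = false
			}
		}
		if ok {
			hints = append(hints, numaHint{Nodes: subset, Preferred: len(subset) == 1})
		}
	}
	return hints
}

func main() {
	perNode := map[int][]int{0: {0, 1, 2, 3, 4, 5, 6, 7}, 1: {8, 9, 10, 11, 12, 13, 14, 15}}
	assigned := map[int]bool{4: true, 5: true, 8: true, 9: true, 10: true, 11: true, 12: true, 13: true, 14: true, 15: true}
	promised := map[int]bool{10: true, 11: true}
	// Downsize from 10 to 8 CPUs: expect {1} preferred=true and {0,1} preferred=false.
	fmt.Println(scaleDownHints(perNode, assigned, promised, 8))
}
```

On this topology the function returns the {1}/preferred and {0,1}/non-preferred hints described in the second bullet of the comment above.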
Force-pushed from abc61a0 to 37fa4b5.
Gave an initial pass. Design-wise, mostly questions and some areas to be clarified; I didn't see anything unexpected or concerning.
* prefer-closest-numa-nodes (GA, visible by default) (1.32 or higher) [KEP-3545](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3545-improved-multi-numa-alignment)
* max-allowable-numa-nodes (beta, visible by default) (1.31 or higher) [KEP-4622](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes)
these are policy options, not topology manager policies. The options alter the behavior of the policies (where applicable)
* strict-cpu-reservation
* prefer-align-cpus-by-uncorecache

The Kubelet requires the total CPU reservation from `--kube-reserved` and `--system-reserved` to be greater than zero when the CPU Manager static policy is enabled. This KEP should ensure those reserved CPUs are kept during resize.
and/or `--reserved-cpus`
* must keep “promised” CPUs of a running container of a Guaranteed Pod.
  + With the term “promised” we mean the CPUs allocated upon creation of the Guaranteed Pod, checkpointed in the CPU Manager's checkpoint file; please refer to [Promised CPUs checkpoint](#promised-cpus-checkpoint) for more details of the proposed implementation using local storage.
* an attempt to allocate additional CPUs should adhere to the combination of Topology/Memory/CPU/Device/kubelet reservation policies
  + All 40 possible combinations should work
This reads scary, but I'm fairly confident that we can quickly rule out some combinations. For example, strict-cpu-reservation should never be a factor. OTOH ensuring continued correctness (= testing) may turn out to be nontrivial.
When the CPU Manager, under the static policy, generates a NUMA topology hint for a Guaranteed pod undergoing an in place CPU pod resize, it follows these rules to determine the new affinity:

* The pod's current NUMA Affinity (single or multiple NUMA nodes) can accommodate the resized CPU request. If yes, the current affinity is maintained.
* If the current affinity is insufficient and expansion is needed, the new NUMA affinity must always include all NUMA nodes from the current affinity.
I'm worried that extending the hint may violate some topology manager policies. But this should be handled at topology manager level. It makes sense for the cpumanager to generate that hint.
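Editor's note: a small Go sketch of the two rules quoted in this hunk (keep the current affinity if it fits, otherwise expand to a superset of it). The function name, parameters and capacity bookkeeping are assumptions for illustration, not the kubelet's hint-generation code; whether the expanded hint is then accepted remains up to the topology manager policy, as the comment above points out.

```go
package main

import "fmt"

// resizeAffinity sketches the resize rules: keep the current NUMA affinity if it
// can accommodate the resized request, otherwise expand it to a superset that
// always includes every node of the current affinity. currentByNode is the
// container's current allocation per node, freeByNode is additional allocatable
// CPUs per node; all names are illustrative.
func resizeAffinity(current []int, currentByNode, freeByNode map[int]int, request int) []int {
	capacity := 0
	for _, node := range current {
		capacity += currentByNode[node] + freeByNode[node]
	}
	if capacity >= request {
		return current // rule 1: the current affinity accommodates the resize
	}

	// rule 2: expand, but never drop a node from the current affinity
	affinity := append([]int{}, current...)
	inAffinity := map[int]bool{}
	for _, n := range current {
		inAffinity[n] = true
	}
	for node, free := range freeByNode {
		if capacity >= request {
			break
		}
		if inAffinity[node] || free == 0 {
			continue
		}
		affinity = append(affinity, node)
		capacity += free
	}
	return affinity
}

func main() {
	// Container pinned to NUMA 0 with 6 CPUs; 2 more CPUs free on NUMA 0, 8 on NUMA 1.
	fmt.Println(resizeAffinity([]int{0}, map[int]int{0: 6}, map[int]int{0: 2, 1: 8}, 8))  // fits: [0]
	fmt.Println(resizeAffinity([]int{0}, map[int]int{0: 6}, map[int]int{0: 2, 1: 8}, 10)) // expand: [0 1]
}
```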
* Under pkg/kubelet/types
  + constants.go
    + ErrorIncosistentCPUAllocation
    + ErrorProhibitedCPUAllocation
    + ErrorGetPromisedCPUSet
If these errors can bubble up to the user, and I think they can, we should be consistent in naming with TopologyAffinityError, SMTAlignmentError and UnexpectedAdmissionError.
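Editor's note: for illustration, a hedged sketch of how one of the proposed errors could follow the existing SMTAlignmentError shape (a struct with Error() and Type() methods, plus a reason-string constant). The concrete names and fields below are assumptions, not the KEP's final API.

```go
package main

import "fmt"

// ErrorInconsistentCPUAllocation is a hypothetical machine-readable reason string,
// modeled after the existing ErrorSMTAlignment constant used by SMTAlignmentError.
const ErrorInconsistentCPUAllocation = "InconsistentCPUAllocationError"

// InconsistentCPUAllocationError sketches how the new resize errors could mirror
// the SMTAlignmentError pattern: a struct carrying the offending CPU sets, with
// Error() for the human-readable message and Type() for the reason string.
// Field names are illustrative assumptions.
type InconsistentCPUAllocationError struct {
	Promised  string // promised CPU set recorded at pod creation
	Requested string // CPU set computed for the resize
}

func (e InconsistentCPUAllocationError) Error() string {
	return fmt.Sprintf("cannot resize container: promised CPUs %q are not preserved by the new allocation %q", e.Promised, e.Requested)
}

// Type mirrors SMTAlignmentError.Type(), so callers can map the error to a
// resize/admission failure reason surfaced to the user.
func (e InconsistentCPUAllocationError) Type() string {
	return ErrorInconsistentCPUAllocation
}

func main() {
	err := InconsistentCPUAllocationError{Promised: "10-11", Requested: "8-9"}
	fmt.Println(err.Type(), "-", err.Error())
}
```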
#### GA

* User feedback (ideally from at least two distinct users) is green,
* No major bugs reported for three months.
* 2 examples of real-world usage
* 2 installs
* Allowing time for feedback
These may likely need to be moved to beta because of the new graduation rules.
// CPUManagerCheckpoint struct is used to store cpu/pod assignments in a checkpoint in v3 format
type CPUManagerCheckpoint struct {
	PolicyName    string                       `json:"policyName"`
	DefaultCPUSet string                       `json:"defaultCpuSet"`
	Entries       map[string]map[string]string `json:"entries,omitempty"`
	// Promised records, per pod UID and container name, the CPU set allocated at container creation (new in the proposed v3 format).
	Promised map[string]map[string]string `json:"promised,omitempty"`
	Checksum checksum.Checksum `json:"checksum"`
}
the "v2" format is now the default and the "v1" format is largely unsupported, this makes things easier. Note that "checkpoint version" is largely an internal construct, is never mentioned or set explicitly anywhere
Inspect the kubelet configuration of the nodes: check the feature gate and usage of the new option.

Check `/var/lib/kubelet/cpu_manager_state` and look for the `promised` field.
My only true nack so far. We should stop directing users to peek into this file, which is already largely abused. It is not, never was, and won't be part of the API, yet it is consumed and considered a source of truth by quite a few users, and this already makes things hard when we need to change it, as in this very KEP.
So we would need a different mechanism, be it a metric or a new field in the podresources API or in any other official API.
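Editor's note: as one possible shape for the "metric" alternative, a hedged, self-contained sketch using the Prometheus Go client. The metric name, labels and values are purely illustrative and are not proposed anywhere in the KEP; a real kubelet implementation would use its own metrics framework.

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// promisedCPUs is a hypothetical gauge exposing, per container, the number of
// promised CPUs recorded by the CPU Manager.
var promisedCPUs = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kubelet_cpu_manager_promised_cpus",
		Help: "Number of CPUs promised to a container at creation time.",
	},
	[]string{"pod", "container"},
)

func main() {
	prometheus.MustRegister(promisedCPUs)
	// In a real kubelet this would be updated whenever the checkpoint changes.
	promisedCPUs.WithLabelValues("demo-pod", "app").Set(2)

	http.Handle("/metrics", promhttp.Handler())
	fmt.Println("serving metrics on :9090/metrics")
	http.ListenAndServe(":9090", nil)
}
```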
In order to verify this feature is working, one should first inspect the kubelet configuration and ensure that all required feature gates are enabled as described in [Summary](#summary).

Then the user should create a Guaranteed QoS Pod with integer CPU requests, inspect `/var/lib/kubelet/cpu_manager_state` and check the promised CPU set for the created Pod. Upon creation, promised should be equal to the assigned CPUs. If it exists, it means the feature is enabled and working.
ditto
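Editor's note: a minimal Go sketch of the manual check described in the hunk above, assuming the v3 checkpoint layout shown earlier in this PR (a `promised` map keyed by pod UID and container name). The review comment above argues this file should not become the user-facing way to verify the feature, so treat this purely as an illustration of the current draft text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// checkpoint mirrors the subset of the proposed v3 CPUManagerCheckpoint format
// needed for this check; both maps are keyed by pod UID, then container name.
type checkpoint struct {
	PolicyName    string                       `json:"policyName"`
	DefaultCPUSet string                       `json:"defaultCpuSet"`
	Entries       map[string]map[string]string `json:"entries,omitempty"`
	Promised      map[string]map[string]string `json:"promised,omitempty"`
}

func main() {
	data, err := os.ReadFile("/var/lib/kubelet/cpu_manager_state")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read checkpoint:", err)
		os.Exit(1)
	}
	var cp checkpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse checkpoint:", err)
		os.Exit(1)
	}
	if len(cp.Promised) == 0 {
		fmt.Println("no promised entries found: feature disabled or no Guaranteed pods created yet")
		return
	}
	for podUID, containers := range cp.Promised {
		for name, cpus := range containers {
			// On a freshly created pod, promised is expected to match the assigned set in entries.
			fmt.Printf("pod %s container %s: promised=%s assigned=%s\n", podUID, name, cpus, cp.Entries[podUID][name])
		}
	}
}
```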
@@ -0,0 +1,3 @@
kep-number: 5554
alpha:
  approver: "TBD"
Not sure you need to put a name and let the PRR team rebalance. Let's check if this is sufficient to make sure the PRR team is aware of this KEP; if so, we can surely keep TBD.