Skip to content

AEP-8026: per-vpa-component-configuration #8026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

omerap12
Copy link
Member

What type of PR is this?

/kind documentation
/kind feature
/kind api-change

What this PR does / why we need it:

AEP for #7650

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 12, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler labels Apr 12, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 12, 2025
Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
@omerap12
Copy link
Member Author

/assign @raywainman

Signed-off-by: Omer Aplatony <[email protected]>
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@omerap12 omerap12 changed the title AEP: per-vpa-component-configuration AEP-8026: per-vpa-component-configuration Apr 18, 2025
Copy link
Contributor

@raywainman raywainman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this @omerap12, thanks for putting this together!

- oomBumpUpRatio
- oomMinBumpUp
- memoryAggregationInterval
- evictAfterOOMThreshold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit but I would love to be consistent with the camel-casing of OOM here to match the others, maybe we say evictAfterOomThreshold?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in: a10a0e3

* recommendationMarginFraction
* Other parameters that benefit from workload-specific tuning

### Validation via CEL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we still do some basic validating in the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, e2e tests will be included: 5b89607

## Future Work

This enhancement is designed to be extensible. As the VPA evolves and users provide feedback, additional parameters may be added to the per-object configuration. Each new parameter will:
1. Be documented in this AEP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big +1 here, if we can keep all of these in the same AEP then I think it becomes a lot easier and more streamlined to add parameters.

One thing that would be nice though is to add a small section for each new parameter we are proposing to add outlining what the parameter does.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. What do you think about that? 978208e

Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
@adrianmoisey
Copy link
Member

Do these API changes need to be feature-gated and go through the process of:

  1. Default off
  2. Default on
  3. GA

I don't think the VPA project really promises a downgrade path, but Kubernetes does. If we use what Kubernetes has defined, then we may need to do this.

@omerap12
Copy link
Member Author

omerap12 commented May 2, 2025

Do these API changes need to be feature-gated and go through the process of:

  1. Default off
  2. Default on
  3. GA

I don't think the VPA project really promises a downgrade path, but Kubernetes does. If we use what Kubernetes has defined, then we may need to do this.

I thought you are on vacation!
But I don't think we need to do the all feature gate process , maybe I'm wrong here..
@raywainman @voelzmo wdyt?

@adrianmoisey
Copy link
Member

I thought you are on vacation!

I'm easing back into open source over the next week or so, and playing catch up.

But I don't think we need to do the all feature gate process , maybe I'm wrong here..

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

In the past it seems as though we loosely adopt what Kubernetes promises (ability to downgrade), but that doesn't seem to be a formal decision.

@omerap12
Copy link
Member Author

omerap12 commented May 2, 2025

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

What does downgrade means in this context? cause you are just adding fields..

@adrianmoisey
Copy link
Member

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

What does downgrade means in this context? cause you are just adding fields..

Something like this is what I'm thinking:

  1. Someone upgrades the VPA to a version including this feature (CRDs and all components upgraded)
  2. They then use one of these fields for a VPA
  3. They downgrade. I assume the CRD is not downgraded... only a VPA component, to a version that doesn't support the new feature

Their previously-set VPA setting (oomBumpUpRatio, for example) no longer works.

It's not a big deal, but it's how Kubernetes rolls out new features.

@omerap12
Copy link
Member Author

omerap12 commented May 3, 2025

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

What does downgrade means in this context? cause you are just adding fields..

Something like this is what I'm thinking:

  1. Someone upgrades the VPA to a version including this feature (CRDs and all components upgraded)
  2. They then use one of these fields for a VPA
  3. They downgrade. I assume the CRD is not downgraded... only a VPA component, to a version that doesn't support the new feature

Their previously-set VPA setting (oomBumpUpRatio, for example) no longer works.

It's not a big deal, but it's how Kubernetes rolls out new features.

I understand. Right now, if someone updates the VPA and sets a different oomBumpUpRatio than the global setting, and then later downgrades, it will go back to the global value.
I can add a feature gate if you think it's needed.

@adrianmoisey
Copy link
Member

I understand. Right now, if someone updates the VPA and sets a different oomBumpUpRatio than the global setting, and then later downgrades, it will go back to the global value.
I can add a feature gate if you think it's needed.

Yeah, the downgrade isn't a big problem with this change.
I think the big question is if we want to align with what Kubernetes does, or if we're happy with not supporting a super clean downgrade.

I think the VPA is a lower risk than Kubernetes, so we could get away not worrying about a downgrade, but, someone may expect the same from the VPA as they do from Kubernetes. So I'm on the fence with this one.
I've added an item to the sig-autoscaling meeting (for the 12th May) to discuss it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants