Add control and data plane HPA #3492
Conversation
Hi @nowjean! Welcome to the project! 🎉 Thanks for opening this pull request!
✅ All required contributors have signed the F5 CLA for this PR. Thank you!
I have hereby read the F5 CLA and agree to its terms.
Thank you for your contribution to the project. Please run `make generate-all`.
I've completed 'make generate-all'. Could you please review my PR?
Force-pushed from 172c009 to d081d68 (compare)
So this only affects the control plane, correct? We probably want to support this for the nginx data plane as well (seems like that would be the more beneficial use case). In order to configure deployment options for the data plane, it requires a bit more work, specifically in our APIs and the code itself. The NginxProxy CRD holds the deployment configuration for the nginx data plane, which the control plane uses to configure the data plane when deploying it. Here is a simple example of how we add a new field to the API to allow for configuring these types of deployment fields: #3319.
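(For illustration only, not this PR's actual diff: following the pattern from #3319, wiring autoscaling into the data plane's deployment settings in the NginxProxy API might look roughly like the sketch below; type and field names are assumptions.)

```go
// Sketch only: names are illustrative, not the merged API.

// DeploymentSpec holds the nginx data plane Deployment settings in NginxProxy.
type DeploymentSpec struct {
	// Replicas is the desired replica count. Ignored when autoscaling is enabled.
	//
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`

	// Autoscaling configures a HorizontalPodAutoscaler for the nginx Deployment.
	//
	// +optional
	Autoscaling *HPASpec `json:"autoscaling,omitempty"`
}

// HPASpec is the configuration for Horizontal Pod Autoscaling.
type HPASpec struct {
	// Enabled turns autoscaling on or off.
	Enabled bool `json:"enabled"`

	// MinReplicas is the lower bound for the autoscaler.
	//
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`

	// MaxReplicas is the upper bound for the autoscaler.
	MaxReplicas int32 `json:"maxReplicas"`

	// TargetCPUUtilizationPercentage is the target average CPU utilization.
	//
	// +optional
	TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`
}
```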
I'd also love a more descriptive PR title, as well as a release note in the description so we can include this feature in our release notes :)
@sjberman Yes, this PR only affects the control plane. Can we also implement HPA for the data plane? AFAIK, the data plane Deployment is created by the NginxProxy CRD, and its name depends on the Gateway's name. HPA only applies to Deployments with a fixed name, like:
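(The original example was not captured above; purely as an illustration, an HPA has to point its scaleTargetRef at a fixed, known Deployment name, e.g. with a placeholder name:)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-nginx-hpa        # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-gateway-nginx       # must be a fixed, known Deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```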
So, I think we can't implement HPA via the Helm chart, especially since data plane and control plane pods are now separated in 2.0. |
@nowjean I updated my comment with a description of how it can be implemented on the data plane side. Glad we're on the same page :)
Will manually test this PR for both control plane and data plane when we have all the changes :)
@sjberman @salonichf5 I've pushed my changes to this PR. From my testing, the code correctly applies HPA to both the control plane and data plane.
Tested applying these HPAs for the control plane and data plane pods.
values.yaml
HPA details
Needed to install the metrics server (with insecure TLS enabled) to get memory resource metrics. Should this be communicated to end users, i.e. that they need to set additional fields if they want scaling to be active?
values.yaml
I saw the HPA get configured for the control plane pod, but I couldn't see one configured for the data plane pod. Events from the nginx deployment and logs look normal.
The NginxProxy resource reflects the resources value but not the autoscaling values.
So, a couple of observations:
What am I doing wrong in terms of testing? @sjberman @nowjean
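(For reference, the "additional fields" in question would look roughly like the values below; `nginx.autoscaling.enabled` and `minReplicas` appear later in this thread, while the other names and exact nesting are assumptions rather than the final chart layout. Resource requests must also be set on the NGINX container for utilization-based scaling to have any effect.)

```yaml
nginx:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetCPUUtilizationPercentage: 75
    targetMemoryUtilizationPercentage: 75
```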
@salonichf5 @sjberman Thanks for testing! Please refer to the guide below and review my PR again. I've patched the Makefile `generate-crds` target.
This option turns off the descriptions in the CRDs, because otherwise a new nginxProxy manifest file is generated.
(In my case, I had to upgrade my runc version to build the NGF docker images.)
End-users can create multiple Gateways, and each one needs its own HPA, so the logic now lives in the Gateway resource. Plus, I'm not sure about this part:
Normally, we assume that end users already have the Metrics Server running if they're using HPA or similar features. But maybe it's worth adding a note in the docs to avoid confusion.
Force-pushed from c57e992 to e8399d9 (compare)
@nowjean Would you mind updating the PR title to something more descriptive, as well as filling out the release note section in the PR description?
Thanks for your review! I think it's almost done. Please let me know if you'd like me to squash these 9 commits into one commit.
@nowjean We'll squash when we merge, so no need to do it now. Just a couple of outstanding comments from me that haven't been addressed yet.
@sjberman Do I need to do something more?
@nowjean There were still a couple of my comments that haven't been answered/addressed yet.
Thanks, I've replied to all of your comments already. Please let me know if any action is needed.
@nowjean I don't see any response to the following: #3492 (comment)
@sjberman I also replied to all four comments two weeks ago. Can't you see that?
@nowjean They're pending, which likely means you responded as a review and didn't submit it. Otherwise I can't see them.
Hey @nowjean, we looked at this issue during our community meeting. Did you get a chance to review the comments left on this PR? We're excited to get this in the next release.
@mpstefan Sure! I will push today.
apis/v1alpha2/nginxproxy_types.go (outdated)
MinReplicas int32 `json:"minReplicas"`
MinReplicas *int32 `json:"minReplicas"`
Let's mark as `optional` and `omitempty`, similar to other optional fields.
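(Applied to the field above, that suggestion would look something like this:)

```go
// MinReplicas is the minimum number of replicas for the autoscaler.
//
// +optional
MinReplicas *int32 `json:"minReplicas,omitempty"`
```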
@nowjean If you could rebase as well and fix conflicts, that would be great. I'll test this soon.
@sjberman I rebased onto the upstream main branch and resolved conflicts.
and then,
It was a super complicated task for me. Anyway, I think the hpa branch rebased successfully and all conflicts are resolved.
@@ -460,6 +466,49 @@ type DaemonSetSpec struct {
	Patches []Patch `json:"patches,omitempty"`
}

// +kubebuilder:validation:XValidation:message="at least one metric must be specified when autoscaling is enabled",rule="!self.enabled || (has(self.targetCPUUtilizationPercentage) || has(self.targetMemoryUtilizationPercentage) || (has(self.autoscalingTemplate) && size(self.autoscalingTemplate) > 0))"
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
This doesn't need to use CEL validation. You can just set minimum and maximum directly on the field; see other int values in this file as examples.
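(For example, expressing the range with standard kubebuilder markers on the field itself, rather than a CEL rule on the struct:)

```go
// TargetCPUUtilizationPercentage is the target average CPU utilization.
//
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=100
TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`
```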
// +kubebuilder:validation:XValidation:message="at least one metric must be specified when autoscaling is enabled",rule="!self.enabled || (has(self.targetCPUUtilizationPercentage) || has(self.targetMemoryUtilizationPercentage) || (has(self.autoscalingTemplate) && size(self.autoscalingTemplate) > 0))"
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
// +kubebuilder:validation:XValidation:message="memory utilization must be between 1 and 100",rule="!has(self.targetMemoryUtilizationPercentage) || (self.targetMemoryUtilizationPercentage >= 1 && self.targetMemoryUtilizationPercentage <= 100)"
Same comment
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
// +kubebuilder:validation:XValidation:message="memory utilization must be between 1 and 100",rule="!has(self.targetMemoryUtilizationPercentage) || (self.targetMemoryUtilizationPercentage >= 1 && self.targetMemoryUtilizationPercentage <= 100)"
//
// HPASpec is the configuration for the Horizontal Pod Autoscaling.
nit: put the struct comment above the `kubebuilder` comments, same pattern we follow elsewhere.
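(i.e. roughly this ordering, with the doc comment first and the markers directly above the type declaration:)

```go
// HPASpec is the configuration for Horizontal Pod Autoscaling.
//
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
type HPASpec struct {
	// ...
}
```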
# - type: Pods
#   pods:
#     metric:
#       name: nginx_gateway_fabric_nginx_process_requests_total
Where does this metric come from?
@@ -432,7 +494,7 @@ nginx:
# - port: 80
# containerPort: 80

# -- The resource requirements of the NGINX container.
# -- The resource requirements of the NGINX container. You should be set this value, If you want to use dataplane Autoscaling(HPA).
This comment is phrased strangely to me. Is its intention to say that this value should NOT be set if using autoscaling?
// objects := []client.Object{deployment, daemonSet, service}

objects := []client.Object{deployment, daemonSet, service}
// // if hpa := p.buildHPA(objectMeta, nProxyCfg); hpa != nil {
// // objects = append(objects, hpa)
// // }
Some commented out code here that can be removed.
@@ -1029,6 +1197,7 @@ func (p *NginxProvisioner) buildNginxResourceObjectsForDeletion(deploymentNSName
// serviceaccount
// configmaps
// secrets
// hpa
This should be moved after service since that's the deletion order based on the objects slice.
{{- if .Values.nginx.autoscaling.enabled }}
autoscaling:
  enabled: {{ .Values.nginx.autoscaling.enabled }}
  minReplicas: {{ .Values.nginx.autoscaling.minReplicas }}
since this is an optional field, it needs a conditional so it doesn't render if it doesn't exist
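(A guarded rendering along these lines would keep the key out of the output when the value is unset; the field paths are taken from the fragment above, the rest is a sketch:)

```yaml
autoscaling:
  enabled: {{ .Values.nginx.autoscaling.enabled }}
  {{- if .Values.nginx.autoscaling.minReplicas }}
  minReplicas: {{ .Values.nginx.autoscaling.minReplicas }}
  {{- end }}
```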
apiVersion: apps/v1
kind: Deployment
name: {{ include "nginx-gateway.fullname" . }}
minReplicas: {{ .Values.nginxGateway.autoscaling.minReplicas }}
since this is an optional field, it needs a conditional so it doesn't render if it doesn't exist
return buildNginxDeploymentHPA(objectMeta, nProxyCfg.Kubernetes.Deployment.Autoscaling)
}
Looks like we have an issue with the current way our controller handles external resource updates. We don't allow anyone but our own controller to make changes to the nginx resources. But that's a problem if the HPA wants to scale it and change the number of replicas. Are there other fields that the HPA updates besides replicas?
I'm going to need to think about how we handle this...
A workaround solution for now could be that where we set the Replicas value in objects.go for the Deployment, we first check to see if HPA is enabled, and then see if the HPA object already exists (using the k8s client) and get its `status.desiredReplicas` field and set that as the Replicas. Then at least our controller will ensure they are the same value.
I'm hoping I can figure out something better in the long term.
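(A rough sketch of that workaround, assuming the provisioner already has a controller-runtime client and that the HPA shares the nginx Deployment's name and namespace; the package, function, and parameter names here are illustrative:)

```go
package provisioner

import (
	"context"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// desiredReplicas returns the replica count to set on the nginx Deployment.
// When HPA is enabled and the HPA object already exists, its
// status.desiredReplicas takes precedence, so the controller doesn't fight
// the autoscaler when it re-applies the Deployment.
func desiredReplicas(
	ctx context.Context,
	c client.Client,
	nsName types.NamespacedName,
	configured *int32,
	hpaEnabled bool,
) *int32 {
	if !hpaEnabled {
		return configured
	}

	var hpa autoscalingv2.HorizontalPodAutoscaler
	if err := c.Get(ctx, nsName, &hpa); err != nil {
		// HPA not created yet (or not readable): fall back to the configured value.
		return configured
	}

	replicas := hpa.Status.DesiredReplicas
	return &replicas
}
```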
If this works, then in the API and helm values where Replicas is defined, we should update comments to say that the value is ignored if HPA is enabled.
I think adding a scale subresource to the NginxProxy is a better long-term solution. I think it's standard Kubernetes practice to do this when trying to scale custom setups.
One issue I can think of that could be problematic is that if our controller is down for any reason, and the HPA sets the `/scale` on the NginxProxy CRD, our controller can't act on it and nginx won't be scaled.
Oh wait...could we have it update our existing replicas API field? If so, then that actually would be really nice.
I think we can. We would still need to update the CRD to have a scale subresource, but we can set the path of the field we want to update, like:
specReplicasPath: .spec.kubernetes.deployment.replicas
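(With kubebuilder markers that would be expressed as a scale subresource on the NginxProxy type, roughly as below; the statuspath and the simplified struct are assumptions, since only the spec path was mentioned, and the type would need a matching status replicas field:)

```go
// +kubebuilder:object:root=true
// +kubebuilder:subresource:scale:specpath=.spec.kubernetes.deployment.replicas,statuspath=.status.replicas

// NginxProxy configures the nginx data plane.
type NginxProxy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   NginxProxySpec   `json:"spec"`
	Status NginxProxyStatus `json:"status,omitempty"`
}
```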
Actually, I think the scaling on the NginxProxy resource isn't going to work. The NginxProxy resource can be attached to the GatewayClass, and its settings apply to all Gateways for that class. So if you have X amount of Gateways (and therefore X amount of nginx Deployments and HPAs), they may all have different requirements for scaling. They can't all update the same replica field at the GatewayClass level. They'll just be fighting with each other.
Oh yeah that is true, I guess you'd need one nginx proxy per gateway
@nowjean So all that to say I think the solution in my earlier comment using `status.desiredReplicas` is probably our option.
Proposed changes
Write a clear and concise description that helps reviewers understand the purpose and impact of your changes. Use the following format:
Problem: I want NGF to work with a HorizontalPodAutoscaler
Solution: Add HPA for the deployment
Testing: I deployed my AKS cluster and checked that the HPA works correctly.
Closes #3447
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes, please add a brief note that summarizes the change.