Add control and data plane HPA #3492

Open · wants to merge 10 commits into main
Conversation

@nowjean commented Jun 11, 2025

Proposed changes


Problem: I want NGF to work with a HorizontalPodAutoscaler

Solution: Add HPA for the deployment

Testing: I deployed an AKS cluster and verified that the HPA works correctly.

Closes #3447

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.

Add HPA support for the control plane and data plane deployments. The data plane HPA is configured through the NginxProxy CRD resource.

nginx-bot commented Jun 11, 2025

Hi @nowjean! Welcome to the project! 🎉

Thanks for opening this pull request!
Be sure to check out our Contributing Guidelines while you wait for someone on the team to review this.

nginx-bot added the community label Jun 11, 2025
github-actions bot commented Jun 11, 2025

✅ All required contributors have signed the F5 CLA for this PR. Thank you!
Posted by the CLA Assistant Lite bot.

github-actions bot added the helm-chart (Relates to helm chart) label Jun 11, 2025
@nowjean (Author) commented Jun 11, 2025

I have hereby read the F5 CLA and agree to its terms

sjberman moved this from 🆕 New to External Pull Requests in NGINX Gateway Fabric, Jun 12, 2025
@salonichf5 (Contributor) commented:

Thank you for your contribution to the project. Please run make generate-all from the root directory to auto-generate the schema files.

nowjean requested a review from a team as a code owner, June 14, 2025 01:31
github-actions bot added the documentation (Improvements or additions to documentation) label Jun 14, 2025
@nowjean (Author) commented Jun 14, 2025

Thank you for your contribution to the project. Please run make generate-all from the root directory to auto-generate the schema files.

I’ve completed 'make generate-all'. Could you please review my PR?

nowjean force-pushed the hpa branch 2 times, most recently from 172c009 to d081d68, June 14, 2025 07:20
@sjberman (Collaborator) commented Jun 17, 2025

So this only affects the control plane, correct? We probably want to support this for the nginx data plane as well (seems like that would be the more beneficial use case).

Configuring deployment options for the data plane requires a bit more work, specifically in our APIs and the code itself. The NginxProxy CRD holds the deployment configuration for the nginx data plane, which the control plane uses to configure the data plane when deploying it.

Here is a simple example of how we add a new field to the API to allow for configuring these types of deployment fields: #3319.
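For illustration, a rough sketch of what such an API addition could look like; the field and type names here follow the pattern this PR later adopts, but are not the exact diff:

// Sketch only: an optional autoscaling field hung off the NginxProxy
// deployment settings, mirroring how other optional deployment fields are exposed.
type DeploymentSpec struct {
	// ... existing fields such as Replicas, Container, Patches ...

	// Autoscaling configures a HorizontalPodAutoscaler for the nginx Deployment.
	//
	// +optional
	Autoscaling *HPASpec `json:"autoscaling,omitempty"`
}

// HPASpec holds the Horizontal Pod Autoscaler configuration (sketch).
type HPASpec struct {
	Enabled     bool   `json:"enabled"`
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	MaxReplicas int32  `json:"maxReplicas"`
}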

sjberman added the enhancement (New feature or request) label Jun 17, 2025
@sjberman (Collaborator) commented:

I'd also love a more descriptive PR title, as well as a release note in the description so we can include this feature in our release notes :)

@nowjean (Author) commented Jun 17, 2025

@sjberman Yes, this PR only affects the control plane. Can we also implement HPA for the data plane?

AFAIK, the data plane Deployment is created by the NginxProxy CRD, and its name depends on the Gateway's name parameter. This means we can't assume a fixed name in values.yaml.

HPA only applies to Deployments with a fixed name, like:

scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: my-app

So, I think we can't implement HPA via the Helm chart, especially since data plane and control plane pods are now separated in 2.0.
Do you have any suggestions or recommended approaches for enabling HPA on the data plane?

@sjberman (Collaborator) commented:

@nowjean I updated my comment with a description on how it can be implemented on the data plane side. Glad we're on the same page :)

@salonichf5 (Contributor) commented:

Thank you for your contribution to the project. Please run make generate-all from the root directory to auto-generate the schema files.

I’ve completed 'make generate-all'. Could you please review my PR?

Will manually test this PR for both the control plane and data plane when we have all the changes :)

github-actions bot removed the enhancement (New feature or request) label Jun 20, 2025
@nowjean (Author) commented Jun 20, 2025

@sjberman @salonichf5 I've pushed my changes to this PR.
This is my first time working with Golang and creating CRDs, so I’d appreciate your review.

From my testing, the code correctly applies HPA to both the control plane and data plane.
To enable this, please set nginx.autoscaling.enabled: true and configure nginx.container.resources accordingly.
[screenshots: data plane and control plane HPA status]

@salonichf5 (Contributor) commented Jun 20, 2025

Testing applying these HPAs for the control plane and data plane pods:

  1. Control plane pod

values.yaml

  autoscaling:
    # Enable or disable Horizontal Pod Autoscaler
    enabled: true
    annotations: {
       example.com/test: "true" 
    }
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilizationPercentage: 60
    targetMemoryUtilizationPercentage: 70
    behavior: 
      scaleUp:
        stabilizationWindowSeconds: 100
        policies:
        - type: Pods
          value: 2
          periodSeconds: 20
      scaleDown:
        stabilizationWindowSeconds: 200
        policies:
        - type: Pods
          value: 1
          periodSeconds: 20
  autoscalingTemplate: 
  - type: Pods
    pods:
      metric:
        name: nginx_gateway_fabric_nginx_process_requests_total
      target:
        type: AverageValue
        averageValue: 10000m

HPA details

k describe horizontalpodautoscalers.autoscaling -n nginx-gateway ngf-nginx-gateway-fabric
Name:                                                     ngf-nginx-gateway-fabric
Namespace:                                                nginx-gateway
Labels:                                                   app.kubernetes.io/instance=ngf
                                                          app.kubernetes.io/managed-by=Helm
                                                          app.kubernetes.io/name=nginx-gateway-fabric
                                                          app.kubernetes.io/version=edge
                                                          helm.sh/chart=nginx-gateway-fabric-2.0.1
Annotations:                                              example.com/test: true
                                                          meta.helm.sh/release-name: ngf
                                                          meta.helm.sh/release-namespace: nginx-gateway
CreationTimestamp:                                        Fri, 20 Jun 2025 14:16:18 -0600
Reference:                                                Deployment/ngf-nginx-gateway-fabric
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  <unknown> / 70%
  resource cpu on pods  (as a percentage of request):     <unknown> / 60%
Min replicas:                                             2
Max replicas:                                             4
Behavior:
  Scale Up:
    Stabilization Window: 100 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 20 seconds
  Scale Down:
    Stabilization Window: 200 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 1  Period: 20 seconds
Deployment pods:    2 current / 2 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Normal   SuccessfulRescale             11m                 horizontal-pod-autoscaler  New size: 2; reason: Current number of replicas below Spec.MinReplicas
  Warning  FailedGetResourceMetric       9m5s (x8 over 10m)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedComputeMetricsReplicas  9m5s (x8 over 10m)  horizontal-pod-autoscaler  invalid metrics (2 invalid out of 2), first error is: failed to get memory resource metric value: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedGetResourceMetric       50s (x41 over 10m)  horizontal-pod-autoscaler  failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

I needed to install the metrics server (with insecure TLS enabled) to get memory metrics, and ScalingActive is false since I did not set a memory request.

Should we communicate to end users that additional fields need to be set if they want scaling to be active?

  2. Autoscaling set for both control plane and data plane pods

values.yaml


# NGINX Gateway Fabric

  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "500m"
      memory: "256Mi"

  autoscaling:
    # Enable or disable Horizontal Pod Autoscaler
    enabled: true
    annotations: {
       example.com/test: "true" 
    }
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilizationPercentage: 60
    targetMemoryUtilizationPercentage: 70
    behavior: 
      scaleUp:
        stabilizationWindowSeconds: 100
        policies:
        - type: Pods
          value: 2
          periodSeconds: 20
      scaleDown:
        stabilizationWindowSeconds: 200
        policies:
        - type: Pods
          value: 1
          periodSeconds: 20
  autoscalingTemplate: 
  - type: Pods
    pods:
      metric:
        name: nginx_gateway_fabric_nginx_process_requests_total
      target:
        type: AverageValue
        averageValue: 10000m
        


#  NGINX Deployment
  autoscaling:
    # Enable or disable Horizontal Pod Autoscaler
    enabled: true
    hpaAnnotations: {
        cafe.example.com/test: "true"
    }
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilizationPercentage: 40
    targetMemoryUtilizationPercentage: 30
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 100
        policies:
        - type: Pods
          value: 3
          periodSeconds: 20
      scaleUp:
        stabilizationWindowSeconds: 50
        policies:
        - type: Pods
          value: 1
          periodSeconds: 20



  container:
    # -- The resource requirements of the NGINX container. You should be set this value, If you want to use dataplane Autoscaling(HPA).
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
k describe horizontalpodautoscalers.autoscaling -n nginx-gateway ngf-nginx-gateway-fabric
Name:                                                     ngf-nginx-gateway-fabric
Namespace:                                                nginx-gateway
Labels:                                                   app.kubernetes.io/instance=ngf
                                                          app.kubernetes.io/managed-by=Helm
                                                          app.kubernetes.io/name=nginx-gateway-fabric
                                                          app.kubernetes.io/version=edge
                                                          helm.sh/chart=nginx-gateway-fabric-2.0.1
Annotations:                                              example.com/test: true
                                                          meta.helm.sh/release-name: ngf
                                                          meta.helm.sh/release-namespace: nginx-gateway
CreationTimestamp:                                        Fri, 20 Jun 2025 14:56:42 -0600
Reference:                                                Deployment/ngf-nginx-gateway-fabric
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  21% (29343744) / 70%
  resource cpu on pods  (as a percentage of request):     7% (7m) / 60%
Min replicas:                                             2
Max replicas:                                             4
Behavior:
  Scale Up:
    Stabilization Window: 100 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 20 seconds
  Scale Down:
    Stabilization Window: 200 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 1  Period: 20 seconds
Deployment pods:    2 current / 2 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type     Reason                   Age    From                       Message
  ----     ------                   ----   ----                       -------
  Normal   SuccessfulRescale        5m40s  horizontal-pod-autoscaler  New size: 2; reason: Current number of replicas below Spec.MinReplicas
  Warning  FailedGetResourceMetric  5m25s  horizontal-pod-autoscaler  failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)

I saw the HPA get configured for the control plane pod, but I couldn't see one configured for the data plane pod.

Events from the nginx deployment and its logs looked normal.

helm template ngf ./charts/nginx-gateway-fabric -f ./charts/nginx-gateway-fabric/values.yaml
# Source: nginx-gateway-fabric/templates/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations: 
    example.com/test: "true"
  labels:
    helm.sh/chart: nginx-gateway-fabric-2.0.1
    app.kubernetes.io/name: nginx-gateway-fabric
    app.kubernetes.io/instance: ngf
    app.kubernetes.io/version: "edge"
    app.kubernetes.io/managed-by: Helm
  name: ngf-nginx-gateway-fabric
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ngf-nginx-gateway-fabric
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 20
        type: Pods
        value: 1
      stabilizationWindowSeconds: 200
    scaleUp:
      policies:
      - periodSeconds: 20
        type: Pods
        value: 2
      stabilizationWindowSeconds: 100

The NginxProxy resource reflects the resources value but not the autoscaling fields:

k describe nginxproxies.gateway.nginx.org -n nginx-gateway ngf-proxy-config
Name:         ngf-proxy-config
Namespace:    nginx-gateway
Labels:       app.kubernetes.io/instance=ngf
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=nginx-gateway-fabric
              app.kubernetes.io/version=edge
              helm.sh/chart=nginx-gateway-fabric-2.0.1
Annotations:  meta.helm.sh/release-name: ngf
              meta.helm.sh/release-namespace: nginx-gateway
API Version:  gateway.nginx.org/v1alpha2
Kind:         NginxProxy
Metadata:
  Creation Timestamp:  2025-06-20T20:56:42Z
  Generation:          1
  Resource Version:    481755
  UID:                 6140ca30-cd48-4b13-b53f-a7e382f842db
Spec:
  Ip Family:  dual
  Kubernetes:
    Deployment:
      Container:
        Image:
          Pull Policy:  Never
          Repository:   nginx-gateway-fabric/nginx
          Tag:          sa.choudhary
        Resources:
          Requests:
            Cpu:     500m
            Memory:  2Gi
      Replicas:      1
    Service:
      External Traffic Policy:  Local
      Type:                     LoadBalancer
Events:                         <none>

So, a couple of observations:

  • You are not specifying scaleTargetRef in the NginxProxy resource.
  • Also, during helm install I am running into an error about the unknown field "spec.kubernetes.deployment.autoscaling" from the NginxProxy template. So the API has been updated, but maybe something is off with how the YAML is nested.
  • I think this could be a possible issue. I did spend some time debugging and adding target refs, but that failed too:
W0620 15:33:11.844029   43067 warnings.go:70] unknown field "spec.kubernetes.deployment.autoscaling"
W0620 15:33:11.844075   43067 warnings.go:70] unknown field "spec.scaleTargetRef"

What am I doing wrong in terms of testing? @sjberman @nowjean

@nowjean (Author) commented Jun 21, 2025

@salonichf5 @sjberman Thanks for testing!

Please refer to the guide below and review my PR again.

I've patched the Makefile's generate-crds target:

crd:maxDescLen=0

This option turns off the descriptions in the generated CRDs; without it, the new NginxProxy manifest fails with:

The CustomResourceDefinition "nginxproxies.gateway.nginx.org" is invalid: metadata.annotations: Too long: must have at most 262144 byte
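
For context, a sketch of where this flag sits in a controller-gen invocation (the paths are placeholders, not NGF's actual Makefile layout):

generate-crds:
	controller-gen crd:maxDescLen=0 paths="./apis/..." output:crd:artifacts:config=config/crd/bases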
  1. Rebuild the image
    I've updated the CRDs, so please rebuild the NGF Docker image:
    make generate-all
    make build-ngf-image
    (You'll find these targets in the Makefile.)

  2. Apply the new NginxProxy CRD:

kubectl apply -f gateway.nginx.org_nginxproxies.yaml
  3. Point Helm to the new image
    Edit values.yaml so the Helm chart uses the image you just built:
 image:
   # -- The NGINX Gateway Fabric image to use
   repository: ghcr.io/nginx/nginx-gateway-fabric  # → replace with your new image
   tag: edge  # → replace with your new tag

(In my case, I had to upgrade my runc version to build ngf docker images.)

  4. Deploy the Helm chart
  • The control-plane HPA is created automatically by the Helm template.

  • The data-plane HPA is created for each Gateway resource, because every Gateway gets its own nginx (data plane) Deployment, Service, and so on. After you apply a Gateway, you'll see an HPA for its Deployment.

End-users can create multiple Gateways, and each one needs its own HPA, so the logic now lives in the Gateway resource.

Plus, I'm not sure about this part:

I needed to install the metrics server (with insecure TLS enabled) to get memory metrics, and ScalingActive is false since I did not set a memory request.
Should we communicate to end users that additional fields need to be set if they want scaling to be active?

Normally, we assume that end users already have the Metrics Server running if they're using HPA or similar features. But maybe it's worth adding a note in the docs to avoid confusion.
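For reference, one common way to install the metrics server with insecure kubelet TLS in a test cluster (the chart repo and the --kubelet-insecure-tls argument come from the upstream metrics-server project; shown only as an example, not part of this PR):

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system --set args="{--kubelet-insecure-tls}"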

nowjean force-pushed the hpa branch 3 times, most recently from c57e992 to e8399d9, June 24, 2025 09:01
@sjberman (Collaborator) commented Jul 1, 2025

@nowjean Would you mind updating the PR title to something more descriptive, as well as filling out the release note section in the PR description?

nowjean changed the title from Hpa to Add control and data plane HPA, Jul 2, 2025
@nowjean (Author) commented Jul 2, 2025

@nowjean Would you mind updating the PR title to something more descriptive, as well as filling out the release note section in the PR description?

Thanks for your review! I think it's almost done. Please let me know if you'd like me to squash these 9 commits into one.

@sjberman (Collaborator) commented Jul 2, 2025

@nowjean We'll squash when we merge, so no need to do it now. Just a couple of outstanding comments from me that haven't been addressed yet.

@nowjean (Author) commented Jul 13, 2025

@sjberman Do I need to do anything more?

@sjberman (Collaborator) commented:

@nowjean There were still a couple of my comments that haven't been answered/addressed yet.

@nowjean (Author) commented Jul 15, 2025

@nowjean There were still a couple of my comments that haven't been answered/addressed yet.

Thanks, I've replied to all of your comments. Please let me know if any further action is needed.

@sjberman (Collaborator) commented:

@nowjean I don't see any response to the following:

#3492 (comment)
#3492 (comment)
#3492 (comment)
#3492 (comment)

@nowjean (Author) commented Jul 15, 2025

@nowjean I don't see any response to the following:

#3492 (comment) #3492 (comment) #3492 (comment) #3492 (comment)

@sjberman I also replied to all four comments two weeks ago. Can’t you see that?
[screenshots of the pending review replies]

@sjberman (Collaborator) commented:

@nowjean They're pending, which likely means you responded as a review and didn't submit it. Otherwise I can't see them.

@nowjean (Author) commented Jul 16, 2025

@nowjean They're pending, which likely means you responded as a review and didn't submit it. Otherwise I can't see them.

@sjberman So sorry about that. I've submitted my reviews. Thanks!

@mpstefan (Member) commented:

Hey @nowjean, we looked at this issue during our community meeting. Did you get a chance to review the comments left on this PR? We're excited to get this in the next release.

@nowjean (Author) commented Jul 22, 2025

@mpstefan Sure! I will push today.

Comment on lines 462 to 453
MinReplicas int32 `json:"minReplicas"`
MinReplicas *int32 `json:"minReplicas"`
Collaborator:

Let's mark as optional and omitempty, similar to other optional fields.
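For example, following the convention used for other optional fields (a sketch, not the exact diff):

// MinReplicas is the minimum number of replicas for the autoscaler.
//
// +optional
MinReplicas *int32 `json:"minReplicas,omitempty"`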

@sjberman (Collaborator) commented:

@nowjean If you could rebase as well and fix conflicts, that would be great. I'll test this soon.

@nowjean (Author) commented Jul 24, 2025

@sjberman I rebased onto the upstream main branch and resolved the conflicts.

git rebase upstream/main
[resolve conflicts..]
git add .
git rebase --continue

and then,

make generate-all
make unit-test

git add .
git commit -m "blah blah blah~~"

It was a pretty complicated task for me, but I think the hpa branch was rebased successfully and all conflicts are resolved.
Please check it! Thanks.

sjberman added the enhancement (New feature or request) label Jul 24, 2025
@@ -460,6 +466,49 @@ type DaemonSetSpec struct {
Patches []Patch `json:"patches,omitempty"`
}

// +kubebuilder:validation:XValidation:message="at least one metric must be specified when autoscaling is enabled",rule="!self.enabled || (has(self.targetCPUUtilizationPercentage) || has(self.targetMemoryUtilizationPercentage) || (has(self.autoscalingTemplate) && size(self.autoscalingTemplate) > 0))"
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
Collaborator:

This doesn't need to use CEL validation. You can just set minimum and maximum directly on the field. See other int values as examples in here.
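For example, a sketch of the marker style being suggested, with the range set directly on the field:

// TargetCPUUtilizationPercentage is the target average CPU utilization.
//
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=100
TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`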

// +kubebuilder:validation:XValidation:message="at least one metric must be specified when autoscaling is enabled",rule="!self.enabled || (has(self.targetCPUUtilizationPercentage) || has(self.targetMemoryUtilizationPercentage) || (has(self.autoscalingTemplate) && size(self.autoscalingTemplate) > 0))"
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
// +kubebuilder:validation:XValidation:message="memory utilization must be between 1 and 100",rule="!has(self.targetMemoryUtilizationPercentage) || (self.targetMemoryUtilizationPercentage >= 1 && self.targetMemoryUtilizationPercentage <= 100)"
Collaborator:

Same comment

// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
// +kubebuilder:validation:XValidation:message="memory utilization must be between 1 and 100",rule="!has(self.targetMemoryUtilizationPercentage) || (self.targetMemoryUtilizationPercentage >= 1 && self.targetMemoryUtilizationPercentage <= 100)"
//
// HPASpec is the configuration for the Horizontal Pod Autoscaling.
Collaborator:

nit: put the struct comment above the kubebuilder comments, same pattern we follow elsewhere.

# - type: Pods
# pods:
# metric:
# name: nginx_gateway_fabric_nginx_process_requests_total
@sjberman (Collaborator) commented Jul 24, 2025:

Where does this metric come from?

@@ -432,7 +494,7 @@ nginx:
# - port: 80
# containerPort: 80

# -- The resource requirements of the NGINX container.
# -- The resource requirements of the NGINX container. You should be set this value, If you want to use dataplane Autoscaling(HPA).
Collaborator:

This comment is phrased strangely to me. Is its intention to say that this value should NOT be set if using autoscaling?

Comment on lines +1222 to +1226
// objects := []client.Object{deployment, daemonSet, service}

objects := []client.Object{deployment, daemonSet, service}
// // if hpa := p.buildHPA(objectMeta, nProxyCfg); hpa != nil {
// // objects = append(objects, hpa)
// // }
Collaborator:

Some commented out code here that can be removed.

@@ -1029,6 +1197,7 @@ func (p *NginxProvisioner) buildNginxResourceObjectsForDeletion(deploymentNSName
// serviceaccount
// configmaps
// secrets
// hpa
Collaborator:

This should be moved after service, since that's the deletion order based on the objects slice.

{{- if .Values.nginx.autoscaling.enabled }}
autoscaling:
enabled: {{ .Values.nginx.autoscaling.enabled }}
minReplicas: {{ .Values.nginx.autoscaling.minReplicas }}
Collaborator:

since this is an optional field, it needs a conditional so it doesn't render if it doesn't exist
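For example, a sketch of the conditional wrapping meant here (indentation depends on where the field sits in the template):

{{- if .Values.nginx.autoscaling.minReplicas }}
minReplicas: {{ .Values.nginx.autoscaling.minReplicas }}
{{- end }}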

apiVersion: apps/v1
kind: Deployment
name: {{ include "nginx-gateway.fullname" . }}
minReplicas: {{ .Values.nginxGateway.autoscaling.minReplicas }}
Collaborator:

since this is an optional field, it needs a conditional so it doesn't render if it doesn't exist


return buildNginxDeploymentHPA(objectMeta, nProxyCfg.Kubernetes.Deployment.Autoscaling)
}

Collaborator:

Looks like we have an issue with the current way our controller handles external resource updates. We don't allow anyone but our own controller to make changes to the nginx resources. But that's a problem if the HPA wants to scale it and change the number of replicas. Are there other fields that the HPA updates besides replicas?

I'm going to need to think about how we handle this...

Collaborator:

A workaround solution for now could be that where we set the Replicas value in objects.go for the Deployment, we first check to see if HPA is enabled, and then see if the HPA object already exists (using the k8s client) and get its status.desiredReplicas field and set that as the Replicas. Then at least our controller will ensure they are the same value.

I'm hoping I can figure out something better in the long term.
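A minimal sketch of that workaround, assuming a controller-runtime client (the package, helper name, and wiring are hypothetical, not this PR's code):

package provisioner

import (
	"context"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// effectiveReplicas returns the replica count the provisioner should set on the
// nginx Deployment. When HPA is enabled and the HPA object already exists, it
// prefers status.desiredReplicas so the controller doesn't undo the autoscaler.
func effectiveReplicas(
	ctx context.Context,
	c client.Client,
	hpaName types.NamespacedName,
	configured *int32,
	hpaEnabled bool,
) *int32 {
	if !hpaEnabled {
		return configured
	}

	var hpa autoscalingv2.HorizontalPodAutoscaler
	if err := c.Get(ctx, hpaName, &hpa); err != nil {
		// HPA not created yet (or not readable): fall back to the configured value.
		return configured
	}

	if hpa.Status.DesiredReplicas > 0 {
		desired := hpa.Status.DesiredReplicas
		return &desired
	}
	return configured
}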

Collaborator:

If this works, then in the API and helm values where Replicas is defined, we should update comments to say that the value is ignored if HPA is enabled.

Contributor:

I think adding a scale subresource to the NginxProxy is a better long-term solution. It's standard Kubernetes practice when trying to scale custom setups.

Collaborator:

One issue I can think of that could be problematic is that if our controller is down for any reason, and the HPA sets the /scale on the NginxProxy CRD, our controller can't act on it and nginx won't be scaled.

Collaborator:

Oh wait...could we have it update our existing replicas API field? If so, then that actually would be really nice.

Contributor:

I think we can. We would still need to update the CRD to have a scale subresource, but we can set the path of the field we want to update, like:
specReplicasPath: .spec.kubernetes.deployment.replicas
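For reference, the kubebuilder marker for a scale subresource looks roughly like this (assuming the spec path above, plus a status replicas field that would also need to exist):

// +kubebuilder:subresource:scale:specpath=.spec.kubernetes.deployment.replicas,statuspath=.status.replicas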

Collaborator:

Actually, I think the scaling on the NginxProxy resource isn't going to work. The NginxProxy resource can be attached to the GatewayClass, and its settings apply to all Gateways for that class. So if you have X amount of Gateways (and therefore X amount of nginx Deployments and HPAs), they may all have different requirements for scaling. They can't all update the same replica field at the GatewayClass level. They'll just be fighting with each other.

Contributor:

Oh yeah, that is true. I guess you'd need one NginxProxy per Gateway.

Collaborator:

@nowjean So, all that to say: I think the solution in my earlier comment using status.desiredReplicas is probably our best option.

Labels: community · documentation (Improvements or additions to documentation) · enhancement (New feature or request) · helm-chart (Relates to helm chart) · release-notes
Projects: External Pull Requests
Development: Successfully merging this pull request may close: HorizontalPodAutoscaler Support
6 participants