GCP: Cluster Autoscaler only discovers first MIG when multiple MIGs share prefix in different zones #7772

Debraj-git · 2025-01-27T11:56:29Z

Description:
We've identified an issue with Cluster Autoscaler (CAS) where it fails to discover and manage all MIGs when multiple MIGs share the same name prefix but exist in different zones.

Version:
Cluster Autoscaler: v1.27.5
Chart version: cluster-autoscaler-9.21.1
Cloud Provider: GCP (GKE)

Current Behavior:

- When using auto-discovery with a prefix pattern, CAS only discovers and manages the first MIG it finds.
- Nodes in additional MIGs with the same prefix in other zones are skipped with "no node group config".
- Once a zone is cached for a MIG prefix, CAS never discovers similar MIGs in other zones.

Expected Behavior:
CAS should discover and manage all MIGs that match the specified prefix pattern, regardless of their zone.

Reproduction Steps:
1. Run a GKE cluster with node pools spanning multiple zones.
2. Each node pool is backed by 2 MIGs that share the same name prefix, one per zone.
3. Configure CAS with auto-discovery:

      containers:
      - command:
        - ./cluster-autoscaler
        - --cloud-provider=gce
        - --namespace=kube-system
        - --node-group-auto-discovery=mig:namePrefix=gke-gke-cluster-oa5a-enpla9up21-spot-,min=2,max=6
        - --balance-similar-node-groups=true
        - --expander=priority
        - --logtostderr=true
        - --max-node-provision-time=5m
        - --min-replica-count=0
        - --scale-down-delay-after-add=15m
        - --scale-down-delay-after-delete=5m
        - --scale-down-delay-after-failure=3m
        - --scale-down-enabled=true
        - --scale-down-unneeded-time=5m
        - --scale-down-utilization-threshold=0.7
        - --scan-interval=1m
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=true
        - --stderrthreshold=info
        - --v=4
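For reference, the `--node-group-auto-discovery` value above packs three fields into one string. The following Go sketch splits it into its parts; `migDiscoverySpec` and `parseMIGSpec` are hypothetical names, and this is an illustrative parser, not the one cluster-autoscaler uses, so its error handling and accepted keys are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// migDiscoverySpec holds the fields of a "mig:namePrefix=...,min=N,max=M" value.
// Hypothetical type for illustration only.
type migDiscoverySpec struct {
	NamePrefix string
	Min, Max   int
}

// parseMIGSpec splits the spec on "," and each field on "=".
// Only the keys used in this report are accepted (an assumption of the sketch).
func parseMIGSpec(s string) (migDiscoverySpec, error) {
	var spec migDiscoverySpec
	body, ok := strings.CutPrefix(s, "mig:")
	if !ok {
		return spec, fmt.Errorf("spec %q must start with %q", s, "mig:")
	}
	for _, field := range strings.Split(body, ",") {
		key, val, found := strings.Cut(field, "=")
		if !found {
			return spec, fmt.Errorf("malformed field %q", field)
		}
		switch key {
		case "namePrefix":
			spec.NamePrefix = val
		case "min", "max":
			n, err := strconv.Atoi(val)
			if err != nil {
				return spec, fmt.Errorf("field %s: %w", key, err)
			}
			if key == "min" {
				spec.Min = n
			} else {
				spec.Max = n
			}
		default:
			return spec, fmt.Errorf("unknown field %q", key)
		}
	}
	if spec.NamePrefix == "" {
		return spec, fmt.Errorf("namePrefix is required")
	}
	return spec, nil
}

func main() {
	spec, err := parseMIGSpec("mig:namePrefix=gke-gke-cluster-oa5a-enpla9up21-spot-,min=2,max=6")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec)
}
```

Note that the configured prefix ends in a hyphen, which is a plain string prefix; nothing in it refers to a zone, so every MIG whose name starts with it would be expected to match.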

Screenshot (not reproduced): MIGs with the same prefix across 2 availability zones.

Logs:
I0127 05:19:47.609819       1 autoscaling_gce_client.go:504] found managed instance group gke-gke-cluster-oa5a-enpla9up21-spot--8a413476-grp matching regexp ^gke-gke-cluster-oa5a-enpla9up21-spot.+
I0127 05:19:47.761732       1 mig_info_provider.go:185] Regenerating MIG instances cache for production-platform-402407/us-east4-b/gke-gke-cluster-oa5a-enpla9up21-spot--8a413476-grp

// Only processes nodes from one MIG, ignores others
I0127 05:19:47.834646       1 pre_filtering_processor.go:67] Skipping gke-gke-cluster-oa5a-enpla9up21-spot--8a413476-g3de - node group min size reached (current: 2, min: 2)
I0127 05:19:47.834663       1 pre_filtering_processor.go:57] Node gke-gke-cluster-oa5a-enpla9up21-spot--ba82cb9b-6v9m should not be processed by cluster autoscaler (no node group config)  //this node belongs to a different MIG in the same nodepool
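The annotation on the last log line can be checked from the names alone. In the names shown, the MIG and its nodes share the segment just before the final hyphen (`8a413476` in both the MIG `...--8a413476-grp` and the node `...--8a413476-g3de`). The Go sketch below compares that segment; `migKey` is a hypothetical helper, and treating that segment as a per-MIG identifier is an assumption drawn from these log lines, not a documented GKE naming contract.

```go
package main

import (
	"fmt"
	"strings"
)

// migKey returns the second-to-last hyphen-separated segment of a name,
// e.g. "8a413476" from "...-spot--8a413476-g3de".
// Assumption: in these GKE names that segment identifies the MIG.
func migKey(name string) string {
	parts := strings.Split(name, "-")
	if len(parts) < 2 {
		return ""
	}
	return parts[len(parts)-2]
}

func main() {
	// The only MIG that appears in the discovery logs above.
	discoveredMIG := "gke-gke-cluster-oa5a-enpla9up21-spot--8a413476-grp"
	nodes := []string{
		"gke-gke-cluster-oa5a-enpla9up21-spot--8a413476-g3de",
		"gke-gke-cluster-oa5a-enpla9up21-spot--ba82cb9b-6v9m",
	}
	for _, n := range nodes {
		// Prints whether each node appears to belong to the discovered MIG.
		fmt.Printf("%s in discovered MIG: %v\n", n, migKey(n) == migKey(discoveredMIG))
	}
}
```

Under that assumption, `...-g3de` belongs to the discovered MIG while `...-6v9m` does not, consistent with it being skipped with "no node group config".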


rpsadarangani · 2025-01-27T15:48:41Z

/label kind-bug

k8s-ci-robot · 2025-01-27T15:48:44Z

@rpsadarangani: The label(s) /label kind-bug cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, ci-short, ci-extended, ci-full. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label kind-bug

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

rpsadarangani · 2025-01-27T15:50:29Z

/area provider-gcp

k8s-ci-robot · 2025-01-27T15:50:32Z

@rpsadarangani: The label(s) area/provider-gcp cannot be applied, because the repository doesn't have them.

In response to this:

/area provider-gcp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

adrianmoisey · 2025-01-27T16:57:50Z

/area cluster-autoscaler

k8s-ci-robot added the area/cluster-autoscaler label Jan 27, 2025
