
Fetching kube-scheduler and kube-controller-manager metrics from AWS EKS Control Plane #1219

Open
RaulFiol93 opened this issue Feb 9, 2025 · 5 comments

RaulFiol93 commented Feb 9, 2025

First of all, I really appreciate the work you are doing with this helm chart. It helps a lot to build powerful observability solutions in a simple way!

I am having problems fetching kube-scheduler and kube-controller-manager metrics using the Cluster Metrics feature in an EKS cluster. A configuration like the following does not work for me:

clusterMetrics:
  enabled: true
  apiServer:
    enabled: true
  cAdvisor:
    enabled: true
  controlPlane:
    enabled: true
  kubeControllerManager:
    enabled: true
  kubeScheduler:
    enabled: true
  windows-exporter:
    enabled: false
  node-exporter:
    enabled: true
  kube-state-metrics:
    enabled: true

The AWS documentation mentions:

"For clusters that are Kubernetes version 1.28 and above, Amazon EKS also exposes metrics under the API group metrics.eks.amazonaws.com. These metrics include control plane components such as kube-scheduler and kube-controller-manager"

Converting part of a sample Prometheus configuration provided in the AWS documentation, I added the following extra config to the Alloy collector for metrics, and now I am able to scrape kube-scheduler and kube-controller-manager metrics from the metrics.eks.amazonaws.com API group:

alloy-metrics:
  enabled: true
  extraConfig: |
    discovery.kubernetes "kube_scheduler" {
      role = "endpoints"
    }

    discovery.kubernetes "kube_controller_manager" {
      role = "endpoints"
    }

    discovery.relabel "kube_scheduler" {
      targets = discovery.kubernetes.kube_scheduler.targets

      rule {
        source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_service_name", "__meta_kubernetes_endpoint_port_name"]
        regex         = "default;kubernetes;https"
        action        = "keep"
      }
    }

    discovery.relabel "kube_controller_manager" {
      targets = discovery.kubernetes.kube_controller_manager.targets

      rule {
        source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_service_name", "__meta_kubernetes_endpoint_port_name"]
        regex         = "default;kubernetes;https"
        action        = "keep"
      }
    }

    prometheus.scrape "kube_scheduler" {
      targets         = discovery.relabel.kube_scheduler.output
      forward_to      = [prometheus.remote_write.metricstore.receiver]
      job_name        = "kube-scheduler"
      scrape_interval = "30s"
      metrics_path    = "/apis/metrics.eks.amazonaws.com/v1/ksh/container/metrics"
      scheme          = "https"

      authorization {
        type             = "Bearer"
        credentials_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      }

      tls_config {
        ca_file              = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        insecure_skip_verify = true
      }
    }

    prometheus.scrape "kube_controller_manager" {
      targets         = discovery.relabel.kube_controller_manager.output
      forward_to      = [prometheus.remote_write.metricstore.receiver]
      job_name        = "kube-controller-manager"
      scrape_interval = "30s"
      metrics_path    = "/apis/metrics.eks.amazonaws.com/v1/kcm/container/metrics"
      scheme          = "https"

      authorization {
        type             = "Bearer"
        credentials_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      }

      tls_config {
        ca_file              = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        insecure_skip_verify = true
      }
    }

The problem is that the ClusterRole used by the Alloy pod for collecting metrics needs to be patched with the following permissions in order to access the metrics.eks.amazonaws.com endpoints:

{
  "effect": "allow",
  "apiGroups": [
    "metrics.eks.amazonaws.com"
  ],
  "resources": [
    "kcm/metrics",
    "ksh/metrics"
  ],
  "verbs": [
    "get"
  ]
}
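
For reference, the same permissions expressed as a standard Kubernetes ClusterRole rules entry would look roughly like this (a sketch; note that RBAC rules have no "effect" field, they are allow-only):

# rule to add to the ClusterRole used by the Alloy metrics pods
rules:
  - apiGroups:
      - metrics.eks.amazonaws.com
    resources:
      - kcm/metrics
      - ksh/metrics
    verbs:
      - get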

When I upgrade the chart, the patch with the new permissions is lost and the ClusterRole needs to be repatched before the metrics can be collected again. I checked the rbac.yaml template from the Alloy chart to see whether extra permissions could be added, but they seem to be hardcoded.

Is there any workaround here that could be provided? Maybe I am missing something. Thanks again!

@churtado-tech

It could be interesting to add an "additionalRulesForClusterRole" key to the values to allow adding custom rules, no?
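
Something like the following, for example (a hypothetical values key sketched only for illustration; it does not exist in the chart today):

alloy-metrics:
  # hypothetical key, not implemented in the chart yet
  additionalRulesForClusterRole:
    - apiGroups: ["metrics.eks.amazonaws.com"]
      resources: ["kcm/metrics", "ksh/metrics"]
      verbs: ["get"]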

@RaulFiol93
Author

Yes, that could help

khjean commented Feb 13, 2025

I wonder if this issue has been resolved.
I couldn't find 'additionalRulesForClusterRole' in values.yaml. Does the suggestion above mean creating the RBAC resources from scratch?

@churtado-tech

That value doesn't exist yet. I couldn't find where the RBAC rules are defined in order to create a PR, but it would be easy to add.

khjean commented Feb 13, 2025

Oh, OK. I'll try adding a custom ClusterRole and ClusterRoleBinding, but I just wanted to add a rule to the existing ClusterRole (alloy) through an extra-rules option or something like that!
Thanks :)
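
A rough sketch of that approach, applied alongside the chart so it survives upgrades (the ServiceAccount name and namespace below are assumptions and must match the ones used by the alloy-metrics pods in your release):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alloy-eks-control-plane-metrics
rules:
  - apiGroups: ["metrics.eks.amazonaws.com"]
    resources: ["kcm/metrics", "ksh/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alloy-eks-control-plane-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alloy-eks-control-plane-metrics
subjects:
  - kind: ServiceAccount
    name: k8s-monitoring-alloy-metrics  # assumption: the ServiceAccount of the Alloy metrics pods
    namespace: monitoring               # assumption: namespace where the chart is installed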
