
Fetching kube-scheduler and kube-controller-manager metrics from AWS EKS Control Plane #1219

Open
RaulFiol93 opened this issue Feb 9, 2025 · 5 comments

RaulFiol93 commented Feb 9, 2025

First of all, I really appreciate the work you are doing with this helm chart. It helps a lot to build powerful observability solutions in a simple way!

I am having problems fetching kube-scheduler and kube-controller-manager metrics using the Cluster Metrics feature in an EKS cluster. A configuration like the following does not work for me:

clusterMetrics:
  enabled: true
  apiServer:
    enabled: true
  cAdvisor:
    enabled: true
  controlPlane:
    enabled: true
  kubeControllerManager:
    enabled: true
  kubeScheduler:
    enabled: true
  windows-exporter:
    enabled: false
  node-exporter:
    enabled: true
  kube-state-metrics:
    enabled: true

The AWS documentation mentions:

"For clusters that are Kubernetes version 1.28 and above, Amazon EKS also exposes metrics under the API group metrics.eks.amazonaws.com. These metrics include control plane components such as kube-scheduler and kube-controller-manager"

Converting part of a sample Prometheus configuration provided in the AWS documentation, I added the following extra config to the Alloy collector for metrics, and now I am able to scrape kube-scheduler and kube-controller-manager metrics from the metrics.eks.amazonaws.com API group:

alloy-metrics:
  enabled: true
  extraConfig: |
    discovery.kubernetes "kube_scheduler" {
      role = "endpoints"
    }

    discovery.kubernetes "kube_controller_manager" {
      role = "endpoints"
    }

    discovery.relabel "kube_scheduler" {
      targets = discovery.kubernetes.kube_scheduler.targets

      rule {
        source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_service_name", "__meta_kubernetes_endpoint_port_name"]
        regex         = "default;kubernetes;https"
        action        = "keep"
      }
    }

    discovery.relabel "kube_controller_manager" {
      targets = discovery.kubernetes.kube_controller_manager.targets

      rule {
        source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_service_name", "__meta_kubernetes_endpoint_port_name"]
        regex         = "default;kubernetes;https"
        action        = "keep"
      }
    }

    prometheus.scrape "kube_scheduler" {
      targets         = discovery.relabel.kube_scheduler.output
      forward_to      = [prometheus.remote_write.metricstore.receiver]
      job_name        = "kube-scheduler"
      scrape_interval = "30s"
      metrics_path    = "/apis/metrics.eks.amazonaws.com/v1/ksh/container/metrics"
      scheme          = "https"

      authorization {
        type             = "Bearer"
        credentials_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      }

      tls_config {
        ca_file              = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        insecure_skip_verify = true
      }
    }

    prometheus.scrape "kube_controller_manager" {
      targets         = discovery.relabel.kube_controller_manager.output
      forward_to      = [prometheus.remote_write.metricstore.receiver]
      job_name        = "kube-controller-manager"
      scrape_interval = "30s"
      metrics_path    = "/apis/metrics.eks.amazonaws.com/v1/kcm/container/metrics"
      scheme          = "https"

      authorization {
        type             = "Bearer"
        credentials_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      }

      tls_config {
        ca_file              = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        insecure_skip_verify = true
      }
    }

The problem is that the ClusterRole used by the Alloy pod for collecting metrics needs to be patched with the following permissions in order to access the metrics.eks.amazonaws.com endpoints:

{
  "effect": "allow",
  "apiGroups": [
    "metrics.eks.amazonaws.com"
  ],
  "resources": [
    "kcm/metrics",
    "ksh/metrics"
  ],
  "verbs": [
    "get"
  ]
}
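
For reference, the same permissions expressed as a standard Kubernetes ClusterRole rules entry would look roughly like this (a sketch; note that RBAC rules have no "effect" field, they are allow-only):

# rule to add to the ClusterRole used by the Alloy metrics pods
rules:
  - apiGroups:
      - metrics.eks.amazonaws.com
    resources:
      - kcm/metrics
      - ksh/metrics
    verbs:
      - get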

When I upgrade the chart, the patch with the new permissions is lost and the ClusterRole needs to be repatched before the metrics can be collected again. I checked the rbac.yaml template from the Alloy chart to see whether extra permissions could be added, but they seem to be hardcoded.

Is there any workaround here that could be provided? Maybe I am missing something. Thanks again!

@churtado-tech

It could be interesting to add an "additionalRulesForClusterRole" key to the values to allow adding custom rules, no?
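
Something like the following, for example (a hypothetical values key sketched only for illustration; it does not exist in the chart today):

alloy-metrics:
  # hypothetical key, not implemented in the chart yet
  additionalRulesForClusterRole:
    - apiGroups: ["metrics.eks.amazonaws.com"]
      resources: ["kcm/metrics", "ksh/metrics"]
      verbs: ["get"]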

@RaulFiol93
Author

Yes, that could help

khjean commented Feb 13, 2025

I wonder if this issue has been resolved.
I couldn't find 'additionalRulesForClusterRole' in values.yaml. Does the suggestion above mean creating the RBAC resources from scratch?

@churtado-tech

That value doesn't exist yet. I couldn't find where the RBAC rules are defined in order to create a PR, but it would be easy to add.

khjean commented Feb 13, 2025

Oh, OK. I'll try adding a custom ClusterRole and ClusterRoleBinding, but I just wanted to add a rule to the existing ClusterRole (alloy) through an extra-rules option or something like that!
Thanks :)
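
A rough sketch of that approach, applied alongside the chart so it survives upgrades (the ServiceAccount name and namespace below are assumptions and must match the ones used by the alloy-metrics pods in your release):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alloy-eks-control-plane-metrics
rules:
  - apiGroups: ["metrics.eks.amazonaws.com"]
    resources: ["kcm/metrics", "ksh/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alloy-eks-control-plane-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alloy-eks-control-plane-metrics
subjects:
  - kind: ServiceAccount
    name: k8s-monitoring-alloy-metrics  # assumption: the ServiceAccount of the Alloy metrics pods
    namespace: monitoring               # assumption: namespace where the chart is installed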
