
Using targetAllocator, seeing duplicate metric scraping across Collectors #3654

Open · jlcrow opened this issue Jan 23, 2025 · 1 comment
Labels: bug (Something isn't working), needs triage

jlcrow commented Jan 23, 2025

Component(s)

target allocator

What happened?

Description

I was seeing errors from my Mimir installation indicating duplicate timestamps, so I added an attribute in the OpenTelemetry Collector pipeline to identify which collector each metric was coming from. After adding this attribute, the volume of metrics being ingested tripled: I was seeing scrapes for the same pod coming from multiple collectors managed by the targetAllocator.

Steps to Reproduce

  1. Create a statefulset collector deployment that uses the target allocator with the consistent-hashing allocation strategy, the relabel-config filter strategy, and a Prometheus scrape config alongside ServiceMonitors
  2. Deploy
  3. Observe metrics
  4. Add an attribute and an env var to identify the collector pod name in the metrics (see the sketch below this list)
  5. Observe metrics again (counts become duplicated, and the same targets show up as scraped by multiple collectors)
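
For reference, the change made in step 4 amounts to adding an attributes processor that tags each metric with the scraping collector's pod name, injected via the Kubernetes Downward API (a minimal sketch of just that change; it appears in full in the example config below):

processors:
  attributes:
    actions:
      - key: collector_instance   # label used to tell collectors apart in Mimir
        value: "${MY_POD_NAME}"
        action: insert
env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name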

Example config

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: monitoring
spec:
  config:
    processors:
      batch:
        send_batch_size: 1000
        send_batch_max_size: 2000
        timeout: 10s           
      memory_limiter: 
        check_interval: 5s
        limit_percentage: 90
      attributes:
        actions:
          - key: collector_instance
            value: "${MY_POD_NAME}"
            action: insert
    extensions:
      health_check:
        endpoint: ${MY_POD_IP}:13133
      k8s_observer:
        auth_type: serviceAccount
        node: ${K8S_NODE_NAME}        
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:               
              # Include only pods annotated for scraping
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
              # Replace path and port annotations
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
              - __meta_kubernetes_pod_phase
            scrape_interval: 30s
            scrape_timeout: 10s      
     
    exporters:
      prometheusremotewrite:
        endpoint: https://mimir-tools.staging.twmlabs.com/api/v1/push
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 10s
          max_elapsed_time: 30s
    service:
      telemetry:
        metrics:
          address: "${MY_POD_IP}:8888"
          level: basic    
        logs:
          level: "warn"  
      extensions:
      - health_check
      pipelines:
        metrics:
          receivers:
          - prometheus
          processors:
          - memory_limiter        
          - attributes
          - batch
          exporters:
          - prometheusremotewrite
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  mode: statefulset
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8888"
  autoscaler:
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 30
    maxReplicas: 10
    minReplicas: 3
    targetCPUUtilization: 70
    targetMemoryUtilization: 70
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
  targetAllocator:
    allocationStrategy: consistent-hashing
    enabled: true
    filterStrategy: relabel-config
    observability:
      metrics:
        enableMetrics: false
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      scrapeInterval: 30s
      serviceMonitorSelector: {}
    replicas: 1
    resources:
      limits:
        cpu: 250m
        memory: 500Mi
      requests:
        cpu: 250m
        memory: 500Mi

Expected Result

I would expect each scrape target to be assigned to only one collector, so that every target is scraped exactly once.

Actual Result

Scrape targets appear to be duplicated across multiple collectors: the same pod is scraped by more than one collector, producing duplicate series.

Kubernetes Version

1.30.5

Operator version

0.78.2

Collector version

0.116.0

Environment information

Environment

OS: Container-Optimized OS (GKE)
Compiler (if manually compiled): n/a (using the otel-collector-contrib distribution)

Log output

No relevant log output; only info-level logs.

Additional context

After removing the attribute:

[Screenshot: metric volume after removing the attribute]
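
For clarity, "removing the attribute" just means dropping the attributes processor from the metrics pipeline; a sketch of the reverted pipeline section, with everything else unchanged from the config above:

service:
  pipelines:
    metrics:
      receivers:
      - prometheus
      processors:
      - memory_limiter
      - batch
      exporters:
      - prometheusremotewrite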
@yuriolisa (Contributor) commented:

@jlcrow, did you have the opportunity to check whether this issue might have the same root cause as this one?
