Add pipeline for system metrics#2052

Merged
sky333999 merged 7 commits into main from sky333999/system
Mar 16, 2026
@sky333999 sky333999 commented Mar 13, 2026

Description of changes

  • Adds a new OTEL pipeline and receiver that collects JVM heap metrics and host system metrics on EC2 Linux hosts, publishing them to CloudWatch under the CWAgent/System namespace.
  • Pipeline: systemmetrics → ec2tagger → batch(15m) → awscloudwatch
  • The pipeline is set up only when at least one of the following holds:
    • it is explicitly enabled via the agent.system_metrics_enabled config field
    • it is explicitly enabled via SYSTEM_METRICS_ENABLED
    • the agent is not running in a containerized environment and is running on a recognized instance
  • All metrics are published with a fixed schema: one dimension set containing InstanceId and one with no dimensions (i.e. aggregated to the account level)
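
The enablement rules above can be sketched as a single predicate. This is a minimal illustration, not the code from this PR: the enableInputs type, its field names, and shouldEnableSystemMetrics are hypothetical, and the sketch assumes SYSTEM_METRICS_ENABLED is an environment variable holding a boolean-like string. It models only the "enable" conditions listed above and does not attempt to model explicit disabling.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// enableInputs gathers the signals named in the description.
// All names here are hypothetical; the PR does not show the real types.
type enableInputs struct {
	ConfigEnabled      bool   // value of agent.system_metrics_enabled
	EnvValue           string // raw SYSTEM_METRICS_ENABLED value (assumed to be an env var)
	Containerized      bool   // agent runs in a containerized environment
	RecognizedInstance bool   // host is a recognized instance
}

// shouldEnableSystemMetrics returns true if any one of the three
// listed conditions holds.
func shouldEnableSystemMetrics(in enableInputs) bool {
	if in.ConfigEnabled {
		return true
	}
	// Assumption: the variable is parsed as a boolean; unparseable or
	// empty values are treated as "not explicitly enabled".
	if v, err := strconv.ParseBool(in.EnvValue); err == nil && v {
		return true
	}
	return !in.Containerized && in.RecognizedInstance
}

func main() {
	in := enableInputs{
		EnvValue:           os.Getenv("SYSTEM_METRICS_ENABLED"),
		Containerized:      false,
		RecognizedInstance: true,
	}
	fmt.Println("system metrics pipeline enabled:", shouldEnableSystemMetrics(in))
}
```

Keeping the decision in one pure function makes each of the three cases easy to cover with a table-driven test.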

Published metrics (18 total)

| Metric | Source | Unit | Scope |
| --- | --- | --- | --- |
| heap_max_bytes | jvm_heap_committed_bytes from socket | Bytes | Per-JVM |
| heap_committed_bytes | jvm_heap_committed_bytes from socket | Bytes | Per-JVM |
| heap_after_gc_bytes | jvm_heap_after_gc_bytes from socket | Bytes | Per-JVM |
| heap_free_after_gc_bytes | max − afterGC | Bytes | Per-JVM |
| aggregate_heap_after_gc_utilized | SUM(afterGC) / SUM(max) × 100 | Percent | Per-Box |
| aggregate_heap_max_bytes | SUM(max) | Bytes | Per-Box |
| aggregate_heap_free_after_gc_bytes | SUM(max) − SUM(afterGC) | Bytes | Per-Box |
| aggregate_jvm_count | Count of discovered JVMs | None | Per-Box |
| mem_total | gopsutil /proc/meminfo MemTotal | Bytes | Per-Box |
| mem_available | gopsutil /proc/meminfo MemAvailable | Bytes | Per-Box |
| mem_cached | gopsutil /proc/meminfo Cached | Bytes | Per-Box |
| mem_active | gopsutil /proc/meminfo Active | Bytes | Per-Box |
| cpu_time_iowait | gopsutil /proc/stat (delta-based, 8-state denominator) | Percent | Per-Box |
| aggregate_disk_used | gopsutil disk.Partitions/Usage (aggregated, /dev/ filter) | Bytes | Per-Box |
| aggregate_disk_free | gopsutil disk.Partitions/Usage (aggregated, /dev/ filter) | Bytes | Per-Box |
| aggregate_bw_in_allowance_exceeded | safchain/ethtool (delta, summed across interfaces, skip lo/veth) | None | Per-Box |
| aggregate_bw_out_allowance_exceeded | safchain/ethtool (delta, summed across interfaces, skip lo/veth) | None | Per-Box |
| aggregate_pps_allowance_exceeded | safchain/ethtool (delta, summed across interfaces, skip lo/veth) | None | Per-Box |
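
The per-box heap formulas in the table can be worked through with a short sketch. The jvmHeap and heapAggregate types and the aggregateHeap function are hypothetical names chosen for illustration; only the formulas come from the table. Returning 0 percent when SUM(max) is zero is an assumption made here to avoid dividing by zero, since the PR does not say how that case is handled.

```go
package main

import "fmt"

// jvmHeap holds the two per-JVM readings the aggregate formulas need.
// Hypothetical type; the real receiver's types are not shown in the PR.
type jvmHeap struct {
	MaxBytes     float64
	AfterGCBytes float64
}

// heapAggregate mirrors the four aggregate_* heap metrics in the table.
type heapAggregate struct {
	JVMCount           int     // aggregate_jvm_count
	MaxBytes           float64 // aggregate_heap_max_bytes = SUM(max)
	FreeAfterGCBytes   float64 // aggregate_heap_free_after_gc_bytes = SUM(max) - SUM(afterGC)
	AfterGCUtilizedPct float64 // aggregate_heap_after_gc_utilized = SUM(afterGC) / SUM(max) * 100
}

// aggregateHeap sums the per-JVM values first and derives the ratio from
// the sums, so large JVMs weigh more than small ones.
func aggregateHeap(jvms []jvmHeap) heapAggregate {
	var sumMax, sumAfterGC float64
	for _, j := range jvms {
		sumMax += j.MaxBytes
		sumAfterGC += j.AfterGCBytes
	}
	agg := heapAggregate{
		JVMCount:         len(jvms),
		MaxBytes:         sumMax,
		FreeAfterGCBytes: sumMax - sumAfterGC,
	}
	// Assumption: report 0 percent when no heap capacity was observed.
	if sumMax > 0 {
		agg.AfterGCUtilizedPct = sumAfterGC / sumMax * 100
	}
	return agg
}

func main() {
	agg := aggregateHeap([]jvmHeap{
		{MaxBytes: 1024, AfterGCBytes: 256},
		{MaxBytes: 3072, AfterGCBytes: 768},
	})
	fmt.Printf("%+v\n", agg)
}
```

With the two sample JVMs above, SUM(max) is 4096 and SUM(afterGC) is 1024, giving 3072 free bytes and 25 percent utilization.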

CloudWatch exporter changes

Added optional max_retry_count, backoff_retry_base, and max_concurrent_publishers config fields to the CloudWatch exporter. Zero values preserve existing defaults. The system metrics pipeline uses conservative settings (2 retries, a 1-minute backoff base, and a single concurrent worker) to minimize PutMetricData API pressure.
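
The "zero values preserve existing defaults" behavior can be illustrated with a small sketch. The publisherSettings type and applyOverrides function are hypothetical, and the values in placeholderDefaults are invented placeholders, not the exporter's real defaults. Only the three field names and the system metrics settings (2 retries, 1-minute backoff base, 1 publisher) come from the description.

```go
package main

import (
	"fmt"
	"time"
)

// publisherSettings models the three optional fields from the description.
// Hypothetical type; the exporter's real config struct is not shown here.
type publisherSettings struct {
	MaxRetryCount           int           // max_retry_count
	BackoffRetryBase        time.Duration // backoff_retry_base
	MaxConcurrentPublishers int           // max_concurrent_publishers
}

// placeholderDefaults are made-up values standing in for the exporter's
// existing defaults, which the PR description does not state.
var placeholderDefaults = publisherSettings{
	MaxRetryCount:           5,
	BackoffRetryBase:        200 * time.Millisecond,
	MaxConcurrentPublishers: 10,
}

// systemMetricsOverrides are the conservative settings from the description.
var systemMetricsOverrides = publisherSettings{
	MaxRetryCount:           2,
	BackoffRetryBase:        time.Minute,
	MaxConcurrentPublishers: 1,
}

// applyOverrides keeps the default for every field left at its zero value
// and takes the override otherwise.
func applyOverrides(defaults, overrides publisherSettings) publisherSettings {
	out := defaults
	if overrides.MaxRetryCount != 0 {
		out.MaxRetryCount = overrides.MaxRetryCount
	}
	if overrides.BackoffRetryBase != 0 {
		out.BackoffRetryBase = overrides.BackoffRetryBase
	}
	if overrides.MaxConcurrentPublishers != 0 {
		out.MaxConcurrentPublishers = overrides.MaxConcurrentPublishers
	}
	return out
}

func main() {
	fmt.Printf("unset:          %+v\n", applyOverrides(placeholderDefaults, publisherSettings{}))
	fmt.Printf("system metrics: %+v\n", applyOverrides(placeholderDefaults, systemMetricsOverrides))
}
```

One consequence of this pattern is that a caller cannot request a literal zero (for example, zero retries) through these fields, because zero is reserved to mean "unset".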

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Requirements

Before committing your code, please complete the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

@sky333999 sky333999 added the ready for testing label Mar 16, 2026
@sky333999 sky333999 marked this pull request as ready for review March 16, 2026 17:44
@sky333999 sky333999 requested a review from a team as a code owner March 16, 2026 17:44
chadpatel previously approved these changes Mar 16, 2026
@chadpatel chadpatel left a comment

I reviewed agentically by comparing to the other baseline, looks good.

jefchien previously approved these changes Mar 16, 2026
@sky333999 sky333999 dismissed stale reviews from jefchien and chadpatel via a84db8f March 16, 2026 20:09
@sky333999 sky333999 merged commit 7321479 into main Mar 16, 2026
267 checks passed
@sky333999 sky333999 deleted the sky333999/system branch March 16, 2026 21:05