[workloadmeta] Add cluster-agent collector catalog by davidor · Pull Request #50482 · DataDog/datadog-agent

davidor · 2026-05-07T12:38:42Z

What does this PR do?

This PR adds a new workloadmeta catalog specific to the cluster-agent.

This is similar to what already exists for other agents: there's a catalog for dogstatsd, otel, etc.

The reason I'm doing this is that the cluster-agent only needs one collector from the catalog: kubeapiserver. And this collector isn't needed in any other sub-agent. I think having a dedicated catalog makes the code easier to reason about, and avoids pulling in dependencies we don't need (not many in this case).
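To make the split concrete, here is a minimal Go sketch that models catalogs as plain lists of collector names. It is not the agent's real API. The Collector and Catalog types and the node-agent catalog contents are hypothetical, and the real catalogs compose fx options rather than name lists.

```go
package main

import (
	"fmt"
	"sort"
)

// Collector is a hypothetical stand-in for a workloadmeta collector. In the
// agent each collector contributes fx options; this sketch keeps only a name.
type Collector struct {
	Name string
}

// Catalog is the set of collectors a binary wires into workloadmeta.
type Catalog []Collector

// Names returns the collector names in sorted order.
func (c Catalog) Names() []string {
	names := make([]string, 0, len(c))
	for _, col := range c {
		names = append(names, col.Name)
	}
	sort.Strings(names)
	return names
}

// clusterAgentCatalog captures the idea behind catalog-clusteragent: the
// cluster-agent needs a single collector, kubeapiserver.
func clusterAgentCatalog() Catalog {
	return Catalog{{Name: "kubeapiserver"}}
}

// nodeAgentCatalog is a hypothetical, partial catalog for a node-side agent;
// with a dedicated cluster-agent catalog it no longer has to list kubeapiserver.
func nodeAgentCatalog() Catalog {
	return Catalog{{Name: "kubelet"}, {Name: "docker"}, {Name: "containerd"}}
}

func main() {
	fmt.Println("cluster-agent:", clusterAgentCatalog().Names())
	fmt.Println("node agent (partial):", nodeAgentCatalog().Names())
}
```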

This change also helps simplify another PR I have open: #50305

I think there's something else we can do about workloadmeta catalogs. There's a "global" catalog, but I think most places that use it could use the "core" catalog instead. I'll leave this for a future PR to avoid introducing too many changes at once.

Describe how you validated your changes

CI, plus a local deployment on a kind cluster. I verified that the kubeapiserver collector still works in the DCA by checking agent workload-list, and that the agent check command still works in the DCA (agent check kubernetes_apiserver).

dd-octo-sts · 2026-05-07T12:46:20Z

Go Package Import Differences

Baseline: 2898c97
Comparison: 83092bd

binary	os	arch	change
cluster-agent	linux	amd64	+1, -2

-github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog-clusteragent
-github.com/DataDog/datadog-agent/comp/core/workloadmeta/proto

cluster-agent	linux	arm64	+1, -2

-github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog-clusteragent
-github.com/DataDog/datadog-agent/comp/core/workloadmeta/proto
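The "+1, -2" summary is a set difference between the baseline and comparison import sets. A minimal sketch, assuming each binary's imports are available as a list of unique package paths; importDiff is a hypothetical helper, and the shared "fmt" entry is only there to show that unchanged imports drop out.

```go
package main

import (
	"fmt"
	"sort"
)

// importDiff returns the packages added to and removed from a binary's import
// set between a baseline and a comparison build. Inputs are assumed unique.
func importDiff(baseline, comparison []string) (added, removed []string) {
	inBase := make(map[string]bool, len(baseline))
	for _, p := range baseline {
		inBase[p] = true
	}
	inComp := make(map[string]bool, len(comparison))
	for _, p := range comparison {
		inComp[p] = true
		if !inBase[p] {
			added = append(added, p) // present only in the comparison build
		}
	}
	for _, p := range baseline {
		if !inComp[p] {
			removed = append(removed, p) // present only in the baseline build
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	const prefix = "github.com/DataDog/datadog-agent/comp/core/workloadmeta/"
	baseline := []string{"fmt", prefix + "collectors/catalog", prefix + "proto"}
	comparison := []string{"fmt", prefix + "collectors/catalog-clusteragent"}
	added, removed := importDiff(baseline, comparison)
	fmt.Printf("+%d, -%d\n", len(added), len(removed))
}
```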

dd-octo-sts · 2026-05-07T13:05:55Z

Files inventory check summary

File checks results against ancestor 2898c979:

Results for datadog-agent_7.80.0~devel.git.602.83092bd.pipeline.112144741-1_amd64.deb:

No change detected

dd-octo-sts · 2026-05-07T13:14:06Z

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 2898c97
📊 Static Quality Gates Dashboard
🔗 SQG Job

Successful checks

Info

	Quality gate	Change	Size in MiB (prev → curr → max)
✅	agent_deb_amd64	-120.19 KiB (0.02% reduction)	742.626 → 742.509 → 750.310
✅	agent_deb_amd64_fips	-116.19 KiB (0.02% reduction)	700.654 → 700.540 → 702.690
✅	agent_msi	-123.5 KiB (0.02% reduction)	608.438 → 608.318 → 623.540
✅	agent_rpm_amd64	-120.19 KiB (0.02% reduction)	742.610 → 742.493 → 750.280
✅	agent_rpm_amd64_fips	-116.19 KiB (0.02% reduction)	700.637 → 700.524 → 702.670
✅	agent_rpm_arm64	-108.19 KiB (0.01% reduction)	720.515 → 720.410 → 724.050
✅	agent_rpm_arm64_fips	-108.22 KiB (0.02% reduction)	681.630 → 681.525 → 684.460
✅	agent_suse_amd64	-120.19 KiB (0.02% reduction)	742.610 → 742.493 → 750.280
✅	agent_suse_amd64_fips	-116.19 KiB (0.02% reduction)	700.637 → 700.524 → 702.670
✅	agent_suse_arm64	-108.19 KiB (0.01% reduction)	720.515 → 720.410 → 724.050
✅	agent_suse_arm64_fips	-108.22 KiB (0.02% reduction)	681.630 → 681.525 → 684.460
✅	docker_agent_amd64	-120.19 KiB (0.01% reduction)	802.814 → 802.697 → 805.870
✅	docker_agent_arm64	-108.19 KiB (0.01% reduction)	805.552 → 805.446 → 809.730
✅	docker_agent_jmx_amd64	-120.19 KiB (0.01% reduction)	993.734 → 993.616 → 996.590
✅	docker_agent_jmx_arm64	-108.19 KiB (0.01% reduction)	985.250 → 985.145 → 989.410
✅	docker_cluster_agent_amd64	-181.76 KiB (0.09% reduction)	206.662 → 206.484 → 207.600
✅	docker_cluster_agent_arm64	-193.78 KiB (0.09% reduction)	220.697 → 220.508 → 221.150

16 successful checks with minimal change (< 2 KiB)

	Quality gate	Current Size
✅	agent_heroku_amd64	309.245 MiB
✅	docker_cws_instrumentation_amd64	7.142 MiB
✅	docker_cws_instrumentation_arm64	6.689 MiB
✅	docker_host_profiler_amd64	302.171 MiB
✅	docker_host_profiler_arm64	313.666 MiB
✅	docker_dogstatsd_amd64	39.507 MiB
✅	docker_dogstatsd_arm64	37.691 MiB
✅	dogstatsd_deb_amd64	30.169 MiB
✅	dogstatsd_deb_arm64	28.294 MiB
✅	dogstatsd_rpm_amd64	30.169 MiB
✅	dogstatsd_suse_amd64	30.169 MiB
✅	iot_agent_deb_amd64	44.518 MiB
✅	iot_agent_deb_arm64	41.494 MiB
✅	iot_agent_deb_armhf	42.230 MiB
✅	iot_agent_rpm_amd64	44.518 MiB
✅	iot_agent_suse_amd64	44.518 MiB

gabedos

lgtm

Should we remove the kubeapiserver references in the other catalogs too?

datadog-agent/comp/core/workloadmeta/collectors/catalog/options.go

Line 37 in f22e034

kubeapiserver.GetFxOptions(),

davidor · 2026-05-07T13:19:32Z

@gabedos I plan to do that on a separate PR.

It's related to what I mentioned in the PR description:

I think there's something else we can do about workloadmeta catalogs. There's a "global" catalog, but I think most places that use it could use the "core" catalog instead. I'll leave this for a future PR to avoid introducing too many changes at once.

The "global" catalog is still used from a few places and I prefer to address that separately.

davidor · 2026-05-07T13:32:27Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-07T13:32:31Z


2026-05-07 13:32:31 UTC ℹ️ Start processing command /merge

2026-05-07 13:32:39 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2026-05-07 17:36:13 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

cit-pr-commenter-54b7da · 2026-05-07T13:38:51Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 6d1fd823-1b85-49b8-acf9-d7ee6613e91a

Baseline: ef81861
Comparison: 6b1604f
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	+0.26	[-2.68, +3.20]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_logs	% cpu utilization	+1.55	[+0.56, +2.54]	1	Logs bounds checks dashboard
➖	tcp_syslog_to_blackhole	ingress throughput	+1.54	[+1.36, +1.73]	1	Logs
➖	quality_gate_metrics_logs	memory utilization	+0.99	[+0.74, +1.24]	1	Logs bounds checks dashboard
➖	ddot_logs	memory utilization	+0.67	[+0.60, +0.74]	1	Logs
➖	otlp_ingest_metrics	memory utilization	+0.50	[+0.35, +0.66]	1	Logs
➖	file_tree	memory utilization	+0.38	[+0.33, +0.42]	1	Logs
➖	docker_containers_cpu	% cpu utilization	+0.26	[-2.68, +3.20]	1	Logs
➖	ddot_metrics_sum_delta	memory utilization	+0.21	[+0.02, +0.40]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	+0.13	[-0.11, +0.36]	1	Logs
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.07	[+0.03, +0.12]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	+0.07	[-0.37, +0.50]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	+0.01	[-0.13, +0.16]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.09, +0.10]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	-0.00	[-0.20, +0.20]	1	Logs
➖	otlp_ingest_logs	memory utilization	-0.00	[-0.10, +0.09]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.01	[-0.53, +0.51]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.02	[-0.21, +0.18]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	-0.03	[-0.20, +0.13]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.04	[-0.44, +0.36]	1	Logs
➖	docker_containers_memory	memory utilization	-0.05	[-0.16, +0.05]	1	Logs
➖	quality_gate_idle	memory utilization	-0.21	[-0.26, -0.16]	1	Logs bounds checks dashboard
➖	quality_gate_idle_all_features	memory utilization	-0.30	[-0.34, -0.26]	1	Logs bounds checks dashboard
➖	ddot_metrics	memory utilization	-0.37	[-0.57, -0.17]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	docker_containers_cpu	simple_check_run	10/10	694 ≥ 26
✅	docker_containers_memory	memory_usage	10/10	244.91MiB ≤ 370MiB
✅	docker_containers_memory	simple_check_run	10/10	698 ≥ 26
✅	file_to_blackhole_0ms_latency	memory_usage	10/10	0.16GiB ≤ 1.20GiB
✅	file_to_blackhole_0ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10	0.21GiB ≤ 1.20GiB
✅	file_to_blackhole_1000ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_100ms_latency	memory_usage	10/10	0.17GiB ≤ 1.20GiB
✅	file_to_blackhole_100ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_500ms_latency	memory_usage	10/10	0.19GiB ≤ 1.20GiB
✅	file_to_blackhole_500ms_latency	missed_bytes	10/10	0B = 0B
✅	quality_gate_idle	intake_connections	10/10	3 ≤ 4	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	142.10MiB ≤ 147MiB	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	3 ≤ 4	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	469.23MiB ≤ 495MiB	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	178.36MiB ≤ 195MiB	bounds checks dashboard
✅	quality_gate_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	354.90 ≤ 2000	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	3 ≤ 6	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	371.00MiB ≤ 430MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
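The three criteria above translate directly into a predicate. A sketch, assuming each experiment row exposes its Δ mean %, its 90% CI bounds and its erratic flag; the experiment type and worthInvestigating function are hypothetical, not the detector's real data model.

```go
package main

import (
	"fmt"
	"math"
)

// experiment is one row of the regression detector output.
type experiment struct {
	Name      string
	DeltaMean float64 // Δ mean %, comparison minus baseline
	CILow     float64 // lower bound of the 90% CI on Δ mean %
	CIHigh    float64 // upper bound of the 90% CI on Δ mean %
	Erratic   bool    // "erratic: true" in the experiment settings
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report.
const effectSizeTolerance = 5.0

// worthInvestigating applies the three criteria: a large enough effect,
// a confidence interval that excludes zero, and a non-erratic experiment.
func worthInvestigating(e experiment) bool {
	largeEnough := math.Abs(e.DeltaMean) >= effectSizeTolerance
	excludesZero := e.CILow > 0 || e.CIHigh < 0
	return largeEnough && excludesZero && !e.Erratic
}

func main() {
	rows := []experiment{
		{Name: "quality_gate_logs", DeltaMean: 1.55, CILow: 0.56, CIHigh: 2.54},
		{Name: "docker_containers_cpu", DeltaMean: 0.26, CILow: -2.68, CIHigh: 3.20, Erratic: true},
	}
	for _, r := range rows {
		fmt.Printf("%s: flagged=%v\n", r.Name, worthInvestigating(r))
	}
}
```

quality_gate_logs is a useful case: its CI [+0.56, +2.54] excludes zero, but +1.55% is below the 5% tolerance, so it is not flagged.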

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
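Each bounds check compares an observed value with a bound, and the decision lines report how many replicas passed. A sketch of that evaluation, assuming a gate passes only when every replica of every bounds check passes (the report shows 10/10 everywhere, so the exact threshold is not visible here); withinBound, allReplicasPassed and gatePasses are hypothetical helpers.

```go
package main

import "fmt"

// withinBound compares one observed value with a bound using the operator
// shown in the report (≤, ≥ or =).
func withinBound(op string, observed, bound float64) bool {
	switch op {
	case "≤":
		return observed <= bound
	case "≥":
		return observed >= bound
	case "=":
		return observed == bound
	default:
		return false // unknown operator: fail closed
	}
}

// allReplicasPassed parses a "passed/total" column such as "10/10".
func allReplicasPassed(ratio string) (bool, error) {
	var passed, total int
	if _, err := fmt.Sscanf(ratio, "%d/%d", &passed, &total); err != nil {
		return false, fmt.Errorf("parse %q: %w", ratio, err)
	}
	return total > 0 && passed == total, nil
}

// gatePasses assumes a gate passes only when every bounds check has all replicas passing.
func gatePasses(ratios []string) bool {
	for _, r := range ratios {
		ok, err := allReplicasPassed(r)
		if err != nil || !ok {
			return false
		}
	}
	return true
}

func main() {
	// quality_gate_idle memory_usage: 142.10 MiB against a 147 MiB bound.
	fmt.Println(withinBound("≤", 142.10, 147))
	// quality_gate_idle: memory_usage and intake_connections, both 10/10.
	fmt.Println(gatePasses([]string{"10/10", "10/10"}))
}
```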

davidor · 2026-05-07T17:58:02Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-07T17:58:06Z


2026-05-07 17:58:05 UTC ℹ️ Start processing command /merge

2026-05-07 17:58:15 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2026-05-07 22:32:10 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

davidor · 2026-05-08T06:58:29Z

/merge -c

gh-worker-devflow-routing-ef8351 · 2026-05-08T06:58:36Z


2026-05-08 06:58:32 UTC ℹ️ Start processing command /merge -c
If you need support, contact us on Slack #devflow!

2026-05-08 06:58:34 UTC ❌ Devflow: /merge -c

This merge request is not in the queue and can't be unqueued

To get help about command usage, write /merge --help

If you need support, contact us on Slack #devflow with those details!

davidor · 2026-05-08T07:26:20Z

@gabedos I changed my mind on this one. I assumed the general catalog included all collectors, and I was planning to migrate its users (a few commands such as jmx and check) to the appropriate specific catalog in a future PR. But it's not that simple: a couple of collectors, process and sbom, aren't included in the general catalog. I'm not sure about the implications of running those in commands that don't run them today, so for now I'll just remove kubeapiserver from the general catalog, since that's what I need for my other PR about lazily starting the autoscaling components.

davidor · 2026-05-08T08:12:09Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-08T08:12:13Z

View all feedbacks in Devflow UI.

2026-05-08 08:12:13 UTC ℹ️ Start processing command /merge

2026-05-08 08:12:21 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2026-05-08 08:25:14 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 2h (p90).

2026-05-08 09:03:25 UTC ℹ️ MergeQueue: This merge request was merged

davidor added this to the 7.80.0 milestone May 7, 2026

davidor requested review from a team as code owners May 7, 2026 12:38

davidor added the changelog/no-changelog label (No changelog entry needed) May 7, 2026

davidor requested a review from pgimalac May 7, 2026 12:38

davidor added the qa/done label (QA done before merge and regressions are covered by tests) May 7, 2026

dd-octo-sts Bot added the internal (Identify a non-fork PR), team/container-platform (The Container Platform Team), team/agent-runtimes and team/agent-integrations labels May 7, 2026

github-actions Bot added the medium review label (PR review might take time) May 7, 2026

pgimalac approved these changes May 7, 2026



gabedos approved these changes May 7, 2026


gabedos reviewed May 7, 2026


sarah-witt approved these changes May 7, 2026


davidor added 2 commits May 8, 2026 09:12

[workloadmeta] Add cluster-agent collector catalog (72c0cb5)
[cli/check] Parameterize check command with wmeta catalog (30618f9)
[workloadmeta] Delete kubeapiserver from generic catalog (83092bd)
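The second commit parameterizes the check command with a workloadmeta catalog. The idea can be sketched as plain dependency injection; Catalog, CheckCommand and NewCheckCommand below are hypothetical stand-ins, not the agent's real CLI or fx wiring.

```go
package main

import "fmt"

// Catalog is a hypothetical stand-in for a workloadmeta collector catalog.
type Catalog struct {
	Name       string
	Collectors []string
}

// CheckCommand is a hypothetical stand-in for the CLI check command.
type CheckCommand struct {
	catalog Catalog
}

// NewCheckCommand takes the catalog as a parameter instead of hard-coding the
// generic one, so each binary decides which collectors back the command.
func NewCheckCommand(c Catalog) *CheckCommand {
	return &CheckCommand{catalog: c}
}

// Describe reports which catalog and collectors the command would start.
func (c *CheckCommand) Describe() string {
	return fmt.Sprintf("check command using %s %v", c.catalog.Name, c.catalog.Collectors)
}

func main() {
	dca := NewCheckCommand(Catalog{Name: "catalog-clusteragent", Collectors: []string{"kubeapiserver"}})
	fmt.Println(dca.Describe())
}
```

Passing the catalog in, rather than importing the generic one, is presumably what lets the DCA's check command run with only kubeapiserver, consistent with the agent check kubernetes_apiserver validation in the PR description.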

davidor force-pushed the davidor/wmeta-split-clusteragent-catalog branch from 1994c4f to 83092bd May 8, 2026 07:25

gh-worker-dd-mergequeue-cf854d Bot merged commit 6b1604f into main May 8, 2026
290 checks passed

gh-worker-dd-mergequeue-cf854d Bot deleted the davidor/wmeta-split-clusteragent-catalog branch May 8, 2026 09:03

davidor mentioned this pull request May 8, 2026

[clusteragent/autoscaling] Defer autoscaling stack startup until first DPA #50305

Open

4 participants

davidor commented May 7, 2026 •

edited

Loading

dd-octo-sts Bot commented May 7, 2026 •

edited

Loading

dd-octo-sts Bot commented May 7, 2026 •

edited

Loading

dd-octo-sts Bot commented May 7, 2026 •

edited

Loading

gh-worker-devflow-routing-ef8351 Bot commented May 7, 2026 •

edited

Loading

cit-pr-commenter-54b7da Bot commented May 7, 2026 •

edited

Loading

gh-worker-devflow-routing-ef8351 Bot commented May 7, 2026 •

edited

Loading

gh-worker-devflow-routing-ef8351 Bot commented May 8, 2026 •

edited

Loading