Setup CWS Quality Gates #50373
Files inventory check summary
File checks results against ancestor 8f67edd8:
Results for datadog-agent_7.80.0~devel.git.484.c0afd84.pipeline.111519545-1_amd64.deb: No change detected
Static quality checks
✅ Please find below the results from static quality gates
Successful checks
Info
On-wire sizes (compressed)
Regression Detector
Regression Detector Results
Metrics dashboard
Baseline: 6416cce
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.54 | [+0.57, +6.51] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.54 | [+0.57, +6.51] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +0.93 | [-0.06, +1.92] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | +0.80 | [+0.69, +0.91] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.54 | [+0.38, +0.69] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.39 | [+0.19, +0.58] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.33 | [+0.16, +0.51] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.24 | [+0.19, +0.28] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_mean_fs_load | memory utilization | +0.18 | [+0.14, +0.22] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_idle | memory utilization | +0.14 | [+0.07, +0.22] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_memory | memory utilization | +0.10 | [+0.00, +0.21] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | +0.10 | [-0.06, +0.26] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +0.04 | [-0.17, +0.25] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.03 | [-0.41, +0.47] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.19, +0.21] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.10, +0.11] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.00 | [-0.54, +0.53] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.00 | [-0.20, +0.20] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.01 | [-0.42, +0.39] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.05 | [-0.20, +0.09] | 1 | Logs |
| ➖ | quality_gate_security_no_fs_load | memory utilization | -0.10 | [-0.21, -0.00] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.16 | [-0.21, -0.12] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.26 | [-0.31, -0.21] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_logs | memory utilization | -0.29 | [-0.37, -0.21] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.46 | [-0.69, -0.22] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.50 | [-0.75, -0.25] | 1 | Logs bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 694 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 245.35MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 700 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 141.12MiB ≤ 147MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 466.82MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 175.80MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 336.11 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 380.27MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_security_idle | cpu_usage | 10/10 | 26.12 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_idle | memory_usage | 10/10 | 291.89MiB ≤ 330MiB | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | cpu_usage | 10/10 | 52.35 ≤ 70 | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | memory_usage | 10/10 | 266.30MiB ≤ 320MiB | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | cpu_usage | 10/10 | 22.25 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | memory_usage | 10/10 | 269.67MiB ≤ 320MiB | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we classify a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
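The three criteria above can be read as a single predicate. The following is a minimal Python sketch, not the detector's actual implementation: the function name `is_regression` and its parameters are hypothetical, and the default tolerance mirrors the 5.00% value stated above.

```python
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False, tolerance_pct=5.0):
    """Return True only when all three criteria listed above hold."""
    # Criterion 1: the estimated effect is large enough to merit a look.
    big_enough = abs(delta_mean_pct) >= tolerance_pct
    # Criterion 2: the confidence interval lies entirely on one side of zero.
    ci_excludes_zero = ci_low > 0 or ci_high < 0
    # Criterion 3: the experiment is not marked "erratic".
    return big_enough and ci_excludes_zero and not erratic


# docker_containers_cpu from the table: +3.54 with CI [+0.57, +6.51].
# The CI excludes zero, but the effect is below the 5.00% tolerance.
print(is_regression(3.54, 0.57, 6.51))  # prints False
```

This matches the ➖ marker on that row: a confidence interval that excludes zero is not sufficient on its own when the effect size stays under the tolerance.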
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
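Each line above reports how many replicas satisfied a bound. A minimal sketch of that bookkeeping follows, assuming a gate passes only when every replica passes. The report shows only 10/10 cases, so the real threshold is not confirmed here, and `BoundsCheck` and `gate_passed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BoundsCheck:
    experiment: str
    name: str
    replicates_passed: int
    replicates_total: int


def gate_passed(check: BoundsCheck) -> bool:
    # Assumption: every replica must pass for the gate to pass.
    return check.replicates_passed == check.replicates_total


# Two of the gates listed above, with the counts from the report.
checks = [
    BoundsCheck("quality_gate_security_idle", "cpu_usage", 10, 10),
    BoundsCheck("quality_gate_security_idle", "memory_usage", 10, 10),
]
all_passed = all(gate_passed(c) for c in checks)
print("Passed" if all_passed else "Failed")
```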
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c0afd844a3
remote_configuration:
  enabled: false
Nest remote configuration setting under runtime_security_config
remote_configuration.enabled is currently set at the top level of system-probe.yaml, but CWS remote-config gating is read from runtime_security_config.remote_configuration.enabled (and then combined with global remote_configuration.enabled) in pkg/security/config/config.go (isRemoteConfigEnabled, lines 735-748). With this placement, the CWS-specific flag stays at its default true, so these quality gates still run with RC enabled, which can introduce remote policy/config traffic and non-deterministic overhead instead of the intended local-policy-only baseline. The same pattern appears in the other two new security quality-gate cases as well.
This goes against the guidance I was provided by CWS. We can investigate later - non blocker.
/merge
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts, but it could also be blocked by other repository rules or settings.
devflow unqueued this merge request: It did not become mergeable within the expected time
/merge
What does this PR do?
Adds three SMP regression quality gates for CWS (Workload Protection) under `test/regression/cases/`:
- `quality_gate_security_idle`: CWS on, shipped default policy, no lading generator. Baseline "turn it on and leave it alone" floor.
- `quality_gate_security_no_fs_load`: CWS on, experiment-supplied `default.policy`, no generator. Isolates policy + approver overhead at zero application load.
- `quality_gate_security_mean_fs_load`: CWS on, same experiment `default.policy`, `file_tree` generator sized to org2's per-host mean `perf_buffer.events.write` rate.
Each case enforces a memory and CPU bound.
Removes the `file_tree` experiment:
Motivation
CWS lacks quality-gate coverage in SMP. The three gates form a ladder (idle → no-load-with-policy → mean-FS-load) so a memory regression can be attributed to policy loading, approver overhead, or event-processing overhead rather than lumped together. The mean-FS-load rate (`open_per_second: 41`) reflects the org2 per-host weekly mean for `perf_buffer.events.write` so the gate tracks production rather than an arbitrary stressor.
Describe how you validated your changes
Most of the work went into understanding how things work: what the CWS SLOs are, what they measure, what that means in production, how to configure the agent to properly exercise the load we're putting on it, etc.
Additional Notes
- `file_tree` `max_nodes: 500000` is sized to exceed `open_per_second × run_duration` so the tree never exhausts mid-capture (once all nodes exist, subsequent opens are `O_RDONLY` and rejected by CWS flag approvers).
- `activity_dump.enabled: false` in both `security-agent.yaml` and `system-probe.yaml`: activity dump is being reworked and adds unpredictable kernel-event volume.
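The sizing rule for `max_nodes` is simple arithmetic and can be checked with a short sketch. The PR does not state `run_duration`, so the 600-second value below is purely hypothetical, as are the helper names. The sketch also assumes each open creates at most one new node.

```python
def tree_outlasts_run(max_nodes: int, open_per_second: int, run_duration_s: int) -> bool:
    """True when the tree has more nodes than opens issued during the run."""
    return max_nodes > open_per_second * run_duration_s


def max_safe_duration_s(max_nodes: int, open_per_second: int) -> int:
    """Longest whole-second run during which total opens stay below max_nodes."""
    return (max_nodes - 1) // open_per_second


MAX_NODES = 500_000      # from the note above
OPEN_PER_SECOND = 41     # from the Motivation section
RUN_DURATION_S = 600     # hypothetical; the PR does not state run_duration

print(tree_outlasts_run(MAX_NODES, OPEN_PER_SECOND, RUN_DURATION_S))
print(max_safe_duration_s(MAX_NODES, OPEN_PER_SECOND))
```

Under these assumptions, 500000 nodes at 41 opens per second leaves headroom for a run of roughly 12,195 seconds (about 3.4 hours) before the tree would be exhausted.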