> - **Compliance Config** (CSPM host benchmarks)
> - **SBOM Scanning** (host vulnerability management)
> ## Owners

This is a new concept for QGs.
> - Memory usage is below a threshold
> ## Additional Information

This is a new concept for QGs.
> The emitted metric from SMP should have a similar value to the production data we source.
> ### Verifying the Experiment Configuration

This is a new concept for QGs. This will allow us to close the loop and maintain QGs in a way that remains representative of real usage.
```
/test/benchmarks/apm_scripts/ @DataDog/agent-apm
/test/regression/ @DataDog/single-machine-performance
/test/regression/cases/docker_containers* @DataDog/single-machine-performance @DataDog/container-integrations
/test/regression/cases/quality_gate_security_base* @DataDog/single-machine-performance @DataDog/agent-security
```
Would want to drop SMP eventually (maybe after approval?).
**Files inventory check summary**
File checks results against ancestor baafab8c. Results for `datadog-agent_7.80.0~devel.git.462.fe74a2b.pipeline.111329495-1_amd64.deb`: No change detected.
**Static quality checks** ✅
Please find below the results from static quality gates: 32 successful checks with minimal change (< 2 KiB).
On-wire sizes (compressed)
**Regression Detector Results** (Metrics dashboard)
Baseline: baafab8
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | -0.54 | [-3.41, +2.34] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +0.62 | [+0.45, +0.80] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.49 | [+0.43, +0.55] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.48 | [+0.44, +0.52] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_memory | memory utilization | +0.46 | [+0.36, +0.56] | 1 | Logs |
| ➖ | quality_gate_security_no_fs_load | memory utilization | +0.38 | [+0.28, +0.48] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | +0.37 | [+0.27, +0.48] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.29 | [+0.09, +0.49] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.18 | [-0.00, +0.36] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.15 | [-0.00, +0.31] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.06 | [-0.51, +0.62] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.03 | [-0.20, +0.27] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.02 | [-0.38, +0.43] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.00 | [-0.20, +0.21] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.20, +0.20] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.10, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.03 | [-0.16, +0.10] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.04 | [-0.20, +0.12] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.06 | [-0.51, +0.39] | 1 | Logs |
| ➖ | quality_gate_security_idle | memory utilization | -0.08 | [-0.15, -0.01] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.13 | [-0.18, -0.07] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.27 | [-0.33, -0.22] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_mean_fs_load | memory utilization | -0.29 | [-0.32, -0.25] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.40 | [-0.65, -0.15] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_cpu | % cpu utilization | -0.54 | [-3.41, +2.34] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -1.72 | [-2.68, -0.75] | 1 | Logs bounds checks dashboard |
Bounds Checks: ❌ Failed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 697 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 244.00MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 727 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 143.42MiB ≤ 147MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 469.66MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 178.42MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 346.97 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 369.79MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_security_idle | cpu_usage | 10/10 | 24.93 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_idle | memory_usage | 10/10 | 285.01MiB ≤ 330MiB | bounds checks dashboard |
| ❌ | quality_gate_security_mean_fs_load | cpu_usage | 0/10 | 53.96 > 40 | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | memory_usage | 10/10 | 269.44MiB ≤ 320MiB | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | cpu_usage | 10/10 | 23.14 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | memory_usage | 10/10 | 274.91MiB ≤ 320MiB | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide that a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
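The three criteria above compose into a single decision. A minimal sketch (function and parameter names are illustrative, not the detector's actual API):

```python
# Sketch of the regression-detection decision described above. A change is
# flagged only if the effect size is large enough, the confidence interval
# excludes zero, and the experiment is not marked erratic.
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False,
                  effect_size_tolerance=5.0):
    big_enough = abs(delta_mean_pct) >= effect_size_tolerance
    ci_excludes_zero = ci_low > 0 or ci_high < 0
    return big_enough and ci_excludes_zero and not erratic

# quality_gate_logs above: -1.72% with CI [-2.68, -0.75]. The CI excludes
# zero, but |Δ| < 5%, so it is not flagged.
print(is_regression(-1.72, -2.68, -0.75))  # False
```

This matches the tables above: every row marked ➖ fails at least one criterion.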
CI Pass/Fail Decision
❌ Failed. Some Quality Gates were violated.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check cpu_usage: 0/10 replicas passed. Failed 10 which is > 0. Gate FAILED.
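The per-gate verdicts above reduce to a simple replica-count rule. A hedged sketch (the helper name is hypothetical; the allowance of zero failures matches "Failed 10 which is > 0"):

```python
# Each bounds check passes some number of replicas out of a total; the gate
# fails when the failure count exceeds the allowed number (here, 0).
def gate_passes(replicas_passed, replicas_total, allowed_failures=0):
    failures = replicas_total - replicas_passed
    return failures <= allowed_failures

print(gate_passes(10, 10))  # True  -> "Gate passed."
print(gate_passes(0, 10))   # False -> "Gate FAILED."
```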
This will likely need some iteration after merge. Right now this works well considering I'm manually triggering the Quality Gates.
Ideally, we use data from regression detector runs off of main (ideally with some kind of tag that we can filter on) and nothing from CI itself as we don't want PR data to influence things.
```
/test/benchmarks/apm_scripts/ @DataDog/agent-apm
/test/regression/ @DataDog/single-machine-performance
/test/regression/cases/docker_containers* @DataDog/single-machine-performance @DataDog/container-integrations
/test/regression/cases/quality_gate_security_* @DataDog/single-machine-performance @DataDog/agent-security
```
Let me know if you folks want exclusive ownership of the quality gates. I started with joint ownership, but I'd love to be able to fully hand this over.
```yaml
# Must exceed open_per_second × run_duration_seconds so that lading never
# exhausts the tree during a capture. Once all nodes exist on disk, opens
# become O_RDONLY and are rejected by CWS kernel-side flag approvers.
max_nodes: 500000
```
This is something I'm going to follow up on. We'll need to change how file system load gets generated so that there's a way to continuously generate unique files instead of using a cached set of data (I can elaborate if folks have questions).
Right now, this is a workaround that works just fine. Just not ideal.
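The sizing rule in that comment is easy to sanity-check. A quick sketch (the 30-minute capture length is an assumption for illustration; the real duration lives in the SMP experiment config):

```python
# Sanity check: max_nodes must exceed open_per_second × run_duration_seconds,
# otherwise lading exhausts the tree mid-capture and subsequent opens become
# O_RDONLY, which CWS flag approvers reject.
open_per_second = 41
run_duration_seconds = 30 * 60           # assumed capture length
min_nodes = open_per_second * run_duration_seconds
print(min_nodes)                         # 73800
print(500_000 > min_nodes)               # True: tree never exhausted
```

With a comfortable margin like this, the workaround holds for any plausible capture length.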
```yaml
# become O_RDONLY and are rejected by CWS kernel-side flag approvers.
max_nodes: 500000
open_per_second: 41
rename_per_second: 1
```
Ideally I'd like to get `rename_per_second` to 0, since opens are the majority of traffic. This is a limitation in lading's current implementation. Something I'll be following up on as well.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2226bd8fdd
```
@@ -0,0 +1,87 @@
---
name: explain-lading-config
```
Should there be a skill in Lading that this skill delegates to? Is that possible to do across repositories, given that a prerequisite for this skill seems to be a clone of the Lading repository?
> Should there be a skill in Lading that this skill delegates to?

Yes. 100% agree.

> Is that possible to do across repositories, given that a prerequisite for this skill seems to be a clone of the Lading repository?

Maybe. TBD.

All I know is that this has proven to be very useful for me and I'd like to figure out how to expose this more globally. I don't know how yet, but I know that I don't want to copy-paste this skill into every repo.

FWIW, I'd like to get the lading CLI to be a sub-command of the SMP CLI; if the SMP CLI is brew-tappable, suddenly we can get "this" into a lot of people's hands.

I don't know if we should try to package skills or we should try to push as much functionality into the lading CLI itself. I think the latter: that way it's deterministic, and users can wrap the CLI with skills themselves if the CLI exposes enough.
I aggressively tested this by exercising all possible inputs; results are here: https://datadoghq.atlassian.net/wiki/x/uoMTigE

Test matrix — 11 cases covering the various invocation paths:
Happy paths
1. Exact name → quality_gate_idle
2. Substring match → security_mean
3. Glob with ≤4 matches → *dogstatsd* (3 hits → AskUserQuestion path)
4. Exact name of disabled experiment → ddot_traces (x-disabled-cases flag)
5. Direct file path (absolute)
Error / edge paths
6. No match, no near-match → totally_fake_xyz
7. Typo triggering "did you mean?" → quality_gate_idlee
8. Ambiguous with >4 matches → quality_gate (bulleted list path)
9. No arguments → full listing path (>4 → bulleted list)
10. Non-existent file path → /tmp/nonexistent-xyz.yaml
11. Existing non-lading file → /etc/hosts
I'll run them one at a time so we can observe each outcome...
Summary
All 11 invocation paths exercised successfully:

| # | Case | Script exit | Skill behavior | Result |
|---|---|---|---|---|
| 1 | Exact name `quality_gate_idle` | 0 | Resolved + explained | PASS |
| 2 | Substring `security_mean` | 0 | Resolved unambiguously + explained | PASS |
| 3 | Glob `*dogstatsd*` (3 hits) | 3 | AskUserQuestion with 3 options | PASS |
| 4 | Disabled `ddot_traces` | 0 | Resolved + flagged x-disabled-cases (also caught a schema drift) | PASS |
| 5 | Direct absolute path | 0 | Passed through + explained | PASS |
| 6 | Bogus `totally_fake_xyz` | 2 | Not-found, no suggestions, stop | PASS |
| 7 | Typo `quality_gate_idlee` | 2 | "did you mean?" → AskUserQuestion | PASS |
| 8 | Broad `quality_gate` (7 hits) | 3 | Plain bulleted list, no AskUserQuestion | PASS |
| 9 | No argument (31 configs) | 0 | Plain bulleted list | PASS |
| 10 | Non-existent path | 2 | Direct-file branch "not found" | PASS |
| 11 | Non-lading file `/etc/hosts` | 2 | Non-lading guard fires | PASS |
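The exit-code contract the matrix exercises can be sketched as follows (the function, match order, and config list are hypothetical illustrations; the skill's real script may differ):

```python
# Hypothetical sketch of the config-resolution logic behind the test matrix.
# Exit codes mirror the "Script exit" column: 0 = resolved, 2 = not found or
# invalid file, 3 = ambiguous (caller then asks the user or lists options).
import fnmatch

def resolve(query, configs):
    if query in configs:                             # exact name
        return 0, [query]
    matches = [c for c in configs if query in c]     # substring match
    if not matches:
        matches = fnmatch.filter(configs, query)     # glob match
    if len(matches) == 1:
        return 0, matches
    if matches:
        return 3, matches                            # ambiguous
    return 2, []                                     # nothing matched

configs = ["quality_gate_idle", "quality_gate_logs",
           "uds_dogstatsd_to_api", "uds_dogstatsd_to_api_v3"]
print(resolve("quality_gate_idle", configs))  # (0, ['quality_gate_idle'])
print(resolve("*dogstatsd*", configs)[0])     # 3 (two glob hits)
print(resolve("totally_fake_xyz", configs))   # (2, [])
```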
I think for this PR, that level of testing makes sense.
Outside of scope for this PR, I'd like to see skills in the repo go through automated evals (behavioral first, then also triggering evals if we want AIs to call these skills). I'll reach out to some people internally to see what I can find.
I'm going to bring it up with Agent DevX as well. I'd like to know the immediate term stance on how to review skills and the acceptance criteria.
Otherwise, it puts reviewers in a bind.
The `system-probe.yaml` is good; this one should only contain `runtime_security_config.enabled=true` (`activity_dump` and `remote_config` should only be present on the system-probe side).
Same for the other 2 experiments.
Thanks for the callout, fixed!
> Two SMP experiments form a pair. Both run with the same custom `default.policy`; the axis that distinguishes them is whether lading generates filesystem load:
>
> - **`quality_gate_security_no_fs_load`** — CWS enabled, custom `default.policy`, `generator: []`. Measures the floor for this policy: background event noise and policy-loaded memory footprint with no application-generated filesystem events.
> - **`quality_gate_security_mean_fs_load`** — CWS enabled, custom `default.policy`, `file_tree` generator. Measures overhead under a production-representative mean filesystem load.
Is mean an interesting level of load? I would expect this to be a severely skewed distribution with a long right tail, which the mean will under-represent. Could we capture a higher percentile of the observed workload?
I had a look. To give an idea of how skewed that distribution is, some hosts top out above 400k write events per second.
I got there with this query: `top(max:datadog.runtime_security.perf_buffer.events.write{event_type:open} by {host}.as_rate(), 100, 'max', 'desc')`
More interesting is something along these lines: `percentile(max:datadog.runtime_security.perf_buffer.events.write{event_type:open} by {host}.as_rate(), 'p95', { * })`
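As a toy illustration of why the mean under-represents a heavy-tailed load distribution (synthetic lognormal data, not production numbers):

```python
# Toy illustration (synthetic data, not production rates): for a heavy-tailed
# distribution of per-host event rates, the mean sits far below the 95th
# percentile, so a mean-sized load generator exercises much less than what
# the busiest hosts actually see.
import random
import statistics

random.seed(0)
rates = [random.lognormvariate(3, 2) for _ in range(10_000)]

mean = statistics.mean(rates)
p95 = statistics.quantiles(rates, n=100)[94]  # 95th percentile cut point
print(p95 > mean)  # True: the tail dwarfs the mean
```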
Also, fwiw, I've been using this notebook to help visualize some of what we're looking at: https://app.datadoghq.com/notebook/13998267/cws-quality-gates?cell-eh89gz4d-from_ts=1776088673376&cell-eh89gz4d-refresh_mode=sliding&cell-eh89gz4d-to_ts=1776693473376&refresh_mode=paused&tpl_var_event_type=%2A&tpl_var_experiment=quality_gate_security_idle&utc_override=false&from_ts=1776710092745&to_ts=1776713692745
> Is mean an interesting level of load?
I think it is. Being able to say "on average, this is the cost" is interesting to me. Having said that, being able to say "what's the expected cost at the 95th percentile" is also interesting. I could see us having both.
I also think mean is a lot easier to reason about and visualize in DataDog than doing a percentile of the maxes on hosts. What I would really like is for the underlying metric to be a distribution so we could use a percentile directly. Alas, it's not.
Personally, I'd like to revisit this later this year once I have a better understanding of COAT and other telemetry. I'd like to provide tooling that works for any Quality Gate. This is very much an intermediate step till then.
I'm going to push back a little bit on this ask for now. Folks from CWS were open to mean as an initial QG target. I'd like to get them started with mean and in the meeting next week with CWS, I'll make sure to encourage them to adapt the QGs according to the use cases they deem most important. I'll communicate that we'll/I'll be available to assist.
**Ishirui** left a comment:
I'm reviewing specifically the agent-devx-owned files, i.e. the Claude skills you are adding:

- Could you split this into a different PR? This one is big enough as-is.
- Could you add yourselves as CODEOWNERS for those skills? We in Agent DevX don't have much context on how these work (esp. w.r.t. lading) 😅
- I think there was already a comment mentioning this, but it would be better imo to have these skills living in lading, with maybe a skill in the Agent that delegates to the lading skill.
- Would it be possible to avoid using complex bash scripts here? Anything longer than a few lines should imo be in a more easily testable language, maybe an invoke task. I think there is also quite a bit of logic (e.g. resolving the git top-level) that should be put in a "library" (and might even already exist in `tasks/libs`, haven't checked) to avoid duplication.
- In general, I think we should avoid using AI for skills, or even intermediate skill steps, that can be replaced by standard scripting. In the analysis skill for instance, steps 1 through 4 are purely deterministic and do not require an LLM IIUC.

Sorry for the big set of comments, but we're still figuring out our policies regarding new skills in the repo, and would rather not have to do expensive cleanup to get everything up to standard once they are fully determined 🙏
Since everything you commented is about the skills, I'll move them out into their own PR and open a fresh PR with you folks. That will address 1; I'll address 2. I'll take the rest offline.
I've decided to split this PR into 3:
This is so that the quality gates are separated from the introduced skills. I'll be poking folks for reviews/approvals on those PRs.

### What does this PR do?

Adds three SMP regression quality gates for CWS (Workload Protection) under `test/regression/cases/`:

- `quality_gate_security_idle` — CWS on, shipped default policy, no lading generator. Baseline "turn it on and leave it alone" floor.
- `quality_gate_security_no_fs_load` — CWS on, experiment-supplied `default.policy`, no generator. Isolates policy + approver overhead at zero application load.
- `quality_gate_security_mean_fs_load` — CWS on, same experiment `default.policy`, `file_tree` generator sized to org2's per-host mean `perf_buffer.events.write` rate.

Each case enforces a memory & CPU bound.

Also adds two Claude skills used to author and maintain these gates:

- `.claude/skills/explain-lading-config/SKILL.md`
- `.claude/skills/analyze-quality-gate-security-mean-fs-load-experiment/SKILL.md` — compares lading config vs. SMP capture vs. production `event_type:open` for the mean-FS-load gate

Removes the `file_tree` experiment.

### Motivation

CWS lacks quality-gate coverage in SMP. The three gates form a ladder — idle → no-load-with-policy → mean-FS-load — so a memory regression can be attributed to policy loading, approver overhead, or event-processing overhead rather than lumped together. The mean-FS-load rate (`open_per_second: 41`) reflects the org2 per-host weekly mean for `perf_buffer.events.write`, so the gate tracks production rather than an arbitrary stressor.

### Describe how you validated your changes

Ran `/analyze-quality-gate-security-mean-fs-load-experiment` to confirm the lading-configured open rate, the SMP-captured open rate, and the production per-host weekly open mean match within noise.

The majority of the work involved understanding how things work: what the CWS SLOs are, what they measure, what that is/means in production, configuring the agent to properly exercise the load we're putting on it, etc.

### Additional Notes

- `file_tree` `max_nodes: 500000` is sized to exceed `open_per_second × run_duration` so the tree never exhausts mid-capture (once all nodes exist, subsequent opens are `O_RDONLY` and rejected by CWS flag approvers).
- `activity_dump.enabled: false` in both `security-agent.yaml` and `system-probe.yaml` — activity dump is being reworked and adds unpredictable kernel-event volume.