Add explain-lading-config skill#50374
Add explain-lading-config skill#50374gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit intomainfrom
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Files inventory check summaryFile checks results against ancestor 762b1e43: Results for datadog-agent_7.80.0~devel.git.517.3c15b6f.pipeline.111754522-1_amd64.deb:No change detected |
Static quality checks✅ Please find below the results from static quality gates 32 successful checks with minimal change (< 2 KiB)
On-wire sizes (compressed)
|
Regression DetectorRegression Detector ResultsMetrics dashboard Baseline: 655f1ba Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.96 | [+0.97, +6.95] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.96 | [+0.97, +6.95] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +0.73 | [-0.26, +1.71] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | +0.30 | [+0.20, +0.40] | 1 | Logs |
| ➖ | quality_gate_security_mean_fs_load | memory utilization | +0.16 | [+0.12, +0.20] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | +0.14 | [+0.10, +0.19] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.08 | [+0.04, +0.13] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.06 | [+0.01, +0.11] | 1 | Logs |
| ➖ | quality_gate_security_idle | memory utilization | +0.04 | [-0.03, +0.11] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics | memory utilization | +0.03 | [-0.17, +0.22] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.02 | [-0.17, +0.21] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.08, +0.08] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.01 | [-0.21, +0.19] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.03 | [-0.46, +0.41] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.03 | [-0.43, +0.37] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.03 | [-0.23, +0.17] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.04 | [-0.59, +0.51] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.05 | [-0.11, +0.00] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.07 | [-0.21, +0.06] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.09 | [-0.20, +0.03] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.12 | [-0.33, +0.10] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.13 | [-0.36, +0.11] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.15 | [-0.32, +0.01] | 1 | Logs |
| ➖ | quality_gate_security_no_fs_load | memory utilization | -0.36 | [-0.46, -0.27] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_metrics | memory utilization | -0.47 | [-0.62, -0.32] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.64 | [-0.90, -0.39] | 1 | Logs bounds checks dashboard |
Bounds Checks: ❌ Failed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 695 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 241.93MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 582 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ❌ | file_to_blackhole_0ms_latency | missed_bytes | 8/10 | 500MiB > 0MiB | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.18GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 140.16MiB ≤ 147MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 466.62MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 175.80MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 356.60 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 390.41MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_security_idle | cpu_usage | 10/10 | 26.01 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_idle | memory_usage | 10/10 | 289.96MiB ≤ 330MiB | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | cpu_usage | 10/10 | 52.04 ≤ 70 | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | memory_usage | 10/10 | 266.20MiB ≤ 320MiB | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | cpu_usage | 10/10 | 21.12 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | memory_usage | 10/10 | 283.49MiB ≤ 320MiB | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
7177145 to
cb02849
Compare
ed82d28 to
3e76efa
Compare
cb02849 to
23b3368
Compare
3e76efa to
c0afd84
Compare
23b3368 to
11959ed
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11959ed7c7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # 1) Direct path — resolve and return if the file exists AND looks like a | ||
| # lading config. We reject arbitrary existing files (e.g. /etc/hosts) | ||
| # to avoid the downstream explainer operating on something unrelated. | ||
| if [[ "$arg" == */* || "$arg" == *.yaml ]]; then |
There was a problem hiding this comment.
Allow slash-containing experiment names to be resolved
The resolver currently treats any argument containing / as a filesystem path before trying experiment-name matching, so valid listed experiment names like tcp_rr/host and tcp_rr/vm fail with not found unless the caller passes a full file path. This breaks the documented flow of selecting by experiment name and makes disambiguation impossible for split-mode cases whose names intentionally include /.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
I'll implement a fix shortly.
There was a problem hiding this comment.
This is for split mode lading support.
There was a problem hiding this comment.
@goxberry I'm going to hold off on this. I'd like to follow-up in the next few days. This is a non-trivial change and I don't want it holding up the stacked PR on top of this.
There was a problem hiding this comment.
Alright, just pushed some changes so that it ignores the ebpf experiments for now. I'll follow up here.
| # config path. This handles both the common `<case>/lading/lading.yaml` | ||
| # layout and split-mode `<case>/<variant>/lading/lading.yaml` layouts — the | ||
| # reported name is `<case>` for the former and `<variant>` for the latter, | ||
| # with `<case>/<variant>` disambiguating split-mode rows in the listing. |
There was a problem hiding this comment.
Would it make sense to document the non-zero exit codes returned? I don't know how much value we'd get out of it, but I imagine that if a human developer -- maybe even an AI -- wanted to audit the script in the event they got suspicious output from the skill, such documentation might be useful?
There was a problem hiding this comment.
Ah, I see the documentation later on in the PR, in SKILL.md (https://github.com/DataDog/datadog-agent/pull/50374/changes#diff-f26167bc789f544351958c535aaac2c09d9d7285e2d141fe80b916ad808be6e9R50-R63).
| reading whole files, use this invariant: every default in lading follows the | ||
| pattern `#[serde(default = "default_foo")]` → `fn default_foo() -> T { ... }`. |
There was a problem hiding this comment.
I wonder if we could enforce this invariant in lading static analysis. (Not a blocker, just idle musing.)
There was a problem hiding this comment.
I'd like that and I think @blt introduced some level of custom linting in lading. Could totally see us doing this.
| - Default values for any omitted fields. **Always resolve the default to a | ||
| concrete value**, not just the function name — the user wants to know | ||
| what actually runs. Follow `#[serde(default = "default_foo")]` → the body | ||
| of `fn default_foo()`, or the `impl Default` block, and report the literal | ||
| (e.g. `block_cache_method: Fixed (via lading_payload::block::default_cache_method)`). | ||
| If the default is a nested struct with its own defaults, recurse one level; | ||
| cite further nested defaults by path rather than expanding the whole tree. |
There was a problem hiding this comment.
More idle musing: I wonder if it would make sense to have a tool that takes a lading config as input and outputs an equivalent lading.yaml config in a single, flat file with all settings made explicit.
11959ed to
2fd3f25
Compare
| - Exit non-zero: the script prints a suggested `git clone` command on stderr. | ||
| Relay that to the user and stop. |
There was a problem hiding this comment.
Why not ask confirmation and let the model do the clone itself ?
There was a problem hiding this comment.
No particular reason. Personally i have lading cloned & ready, and I didn't exercise this part of the code path. This is one of those, could see us go either way.
Want me to change it?
| - `dogstatsd` → `lading_payload/src/dogstatsd.rs` | ||
| - `opentelemetry_metrics` → `lading_payload/src/opentelemetry/metric.rs` | ||
| - `opentelemetry_logs` → `lading_payload/src/opentelemetry/log.rs` | ||
| - `datadog_logs` → `lading_payload/src/datadog_logs.rs` | ||
| **Variant serialization forms:** | ||
| - `variant: "syslog5424"` (plain string) — the enum variant carries no | ||
| config fields (unit/empty struct). There are no knobs to explain; the | ||
| module itself encodes all behaviour. | ||
| - `variant: { opentelemetry_metrics: {} }` (mapping with empty body) — | ||
| the variant has a `Config` struct and is using `Config::default()`. | ||
| Follow `impl Default for Config` and any nested `Default` impls. | ||
| - `variant: { dogstatsd: { contexts: …, kind_weights: … } }` — explicit | ||
| field overrides; report them alongside the defaults for any omitted | ||
| sibling fields. |
There was a problem hiding this comment.
nit: maybe this could be offloaded into a companion file for easier editing
There was a problem hiding this comment.
I'm not sure what the companion file would be used for here. I don't see us editing this file?
2fd3f25 to
3c15b6f
Compare
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|

What does this PR do?
Adds a Claude Code skill
/explain-lading-configthat explains a lading.yaml regression test config, grounded in the lading Rust source as the source of truth for field meanings and defaults.Motivation
Lading configs in
test/regression/cases/.../lading/lading.yamluse defaults defined in the lading source, so a config file alone doesn't tell you what actually runs. Pulling defaults out of the Rust source by hand is repetitive; this skill codifies the workflow (resolve config -> grep source -> resolve defaults to concrete values) so it can be invoked from any session.Describe how you validated your changes
resolve-lading-config.shagainst the regression suite with no argument, with exact names, with substrings, with globs, with disabled cases, and with bogus inputs, and confirmed exit codes (0/2/3/4) match whatSKILL.mddocuments.validate-lading-checkout.shwith and without~/dd/ladingpresent and withLADING_DIRoverridden.x-disabled-casesconfigs to confirm the explanation flow handles each layout.Additional Notes
.claude/skills/explain-lading-configis set to@DataDog/single-machine-performance.~/dd/lading(overridable viaLADING_DIR); the validation script prints agit clonehint when it isn't there.resolve-lading-config.shreports split-mode experiments as<case>/<variant>so they don't collide with non-split rows in the listing.