[Logs Stateful Encoding] transport layer initial implementation by TheSafo · Pull Request #48879 · DataDog/datadog-agent

TheSafo · 2026-04-03T18:31:50Z

What does this PR do?

Implements the "transport" layer of the new gRPC-based "stateful encoding" logs protocol (DataDog/agent-payload#443). This PR deliberately does not connect the change to the existing pipeline/configuration or pattern extraction code, to keep the PR at a reviewable size; follow-up PRs will implement that.

Motivation

Partial implementation of experimental new logs protocol.

Describe how you validated your changes

We have tested the full feature end to end against a real Intake as well as in the Lading regression test setup.

How to Review this PR

This change is extracted from #43321, which has the full stateful encoding setup. This PR specifically extracts pkg/logs/sender/grpc (without hooking it up to config or integrating pattern extraction) to turn the transport layer into a more reviewable chunk.

./pkg/logs/sender/grpc/DESIGN.md is included to document how this code works.

GitHub now includes file filters on the "Files changed" tab, which can be used to ignore all the go.mod / go.sum changes.

Additional Notes

Same as #46583 but pointed at the new feature branch

https://datadoghq.atlassian.net/browse/EPIN-2660

Hooks gRPC stream to Logs agent pipeline

During flow-control/back-pressure, a slow/blocking Send won't block the supervisor loop

…nsport Note: the current implementation delta-encodes at the batch level, which doesn't require a protobuf change. We still make the protobuf change in case we want to extend delta encoding to the stream level in the future.
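To illustrate what batch-level delta encoding of timestamps can look like, here is a minimal sketch. It assumes the first value in a batch is stored as-is and each later value as a signed difference from its predecessor, with the state reset for every batch. The function names are hypothetical and this is not the PR's actual encoder. Signed deltas matter because timestamps inside a batch are not guaranteed to be monotonic, which is consistent with the commit that switches the deltas to int64.

```go
package main

import "fmt"

// deltaEncode returns the first timestamp followed by the signed difference
// between each timestamp and the one before it. No state survives the call,
// so every batch can be decoded on its own.
func deltaEncode(timestamps []int64) []int64 {
	out := make([]int64, len(timestamps))
	var prev int64
	for i, ts := range timestamps {
		out[i] = ts - prev
		prev = ts
	}
	return out
}

// deltaDecode reverses deltaEncode by accumulating the deltas.
func deltaDecode(deltas []int64) []int64 {
	out := make([]int64, len(deltas))
	var prev int64
	for i, d := range deltas {
		prev += d
		out[i] = prev
	}
	return out
}

func main() {
	// The third entry is older than the second, so its delta is negative.
	batch := []int64{1700000000000, 1700000000250, 1700000000100}
	enc := deltaEncode(batch)
	fmt.Println(enc)              // [1700000000000 250 -150]
	fmt.Println(deltaDecode(enc)) // [1700000000000 1700000000250 1700000000100]
}
```

With an unsigned delta type, the third entry above could not be represented without wrapping around.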

…port

agent-platform-auto-pr · 2026-04-03T18:39:33Z

Go Package Import Differences

Baseline: a903409
Comparison: 136b0e3

binary	os	arch	change
agent	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
agent	linux	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
agent	windows	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
agent	darwin	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
agent	darwin	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
iot-agent	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
iot-agent	linux	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
heroku-agent	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
cluster-agent	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
cluster-agent	linux	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
cluster-agent-cloudfoundry	linux	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
cluster-agent-cloudfoundry	linux	arm64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
dogstatsd	linux	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
dogstatsd	linux	arm64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
process-agent	linux	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
process-agent	linux	arm64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
process-agent	windows	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
process-agent	darwin	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
process-agent	darwin	arm64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
heroku-process-agent	linux	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
security-agent	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
security-agent	linux	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
security-agent	windows	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
system-probe	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
system-probe	linux	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
system-probe	windows	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
system-probe	darwin	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
system-probe	darwin	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
otel-agent	linux	amd64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
otel-agent	linux	arm64	+2, -0 +github.com/DataDog/agent-payload/v5/statefulpb +github.com/DataDog/datadog-agent/pkg/logs/sender/grpc
privateactionrunner	linux	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
privateactionrunner	linux	arm64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
privateactionrunner	windows	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
privateactionrunner	darwin	amd64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb
privateactionrunner	darwin	arm64	+1, -0 +github.com/DataDog/agent-payload/v5/statefulpb

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 136b0e3399


chatgpt-codex-connector · 2026-04-03T19:24:31Z

+			statefulInputChan := make(chan *message.StatefulMessage, pkgconfigsetup.Datadog().GetInt("logs_config.message_channel_size"))
+			grpcsender.StartMessageTranslator(inputChan, statefulInputChan)
+
+			return grpcsender.NewBatchStrategy(statefulInputChan, outputChan, flushChan, endpoints.BatchWait, endpoints.BatchMaxSize, endpoints.BatchMaxContentSize, "logs", encoder, pipelineMonitor, instanceID)


Keep translator and strategy channels in one shutdown domain

In gRPC mode this function starts a long-lived translator from inputChan to statefulInputChan, but the returned strategy only owns (and closes) statefulInputChan on Stop(). Pipeline.Stop() stops the processor first, yet that does not close inputChan, so the translator can remain alive indefinitely; if it is still forwarding buffered messages when statefulInputChan is closed, it will panic on send to a closed channel. This makes shutdown unsafe and leaks goroutines whenever logs_config.use_grpc is enabled.
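One common way to address this kind of issue is to make the translator the only goroutine that closes its output channel, and to give it an explicit stop signal. The sketch below is a simplified illustration with hypothetical names (`translator`, `startTranslator`) and string messages. It is not the code in this PR and does not use the real `message.StatefulMessage` type.

```go
package main

import (
	"fmt"
	"sync"
)

// translator forwards values from in to out. It is the only writer to out,
// so it is also the only goroutine that closes it.
type translator struct {
	done chan struct{}
	wg   sync.WaitGroup
	once sync.Once
}

func startTranslator(in <-chan string, out chan<- string) *translator {
	t := &translator{done: make(chan struct{})}
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		defer close(out) // closed by the sender, after its last send
		for {
			select {
			case <-t.done:
				return
			case msg, ok := <-in:
				if !ok {
					return
				}
				select {
				case out <- "stateful:" + msg:
				case <-t.done:
					return
				}
			}
		}
	}()
	return t
}

// Stop signals the goroutine and waits for it to exit. Safe to call twice.
func (t *translator) Stop() {
	t.once.Do(func() { close(t.done) })
	t.wg.Wait()
}

func main() {
	in := make(chan string, 4)
	out := make(chan string, 4)
	t := startTranslator(in, out)
	in <- "a"
	fmt.Println(<-out) // stateful:a
	t.Stop()
	_, open := <-out
	fmt.Println(open) // false
}
```

Because the consumer never closes `out` itself, a send on a closed channel cannot happen, and `Stop` returning guarantees the goroutine has exited.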


chatgpt-codex-connector · 2026-04-03T19:24:31Z

+	s.backoffTimer.Stop()
+	s.drainTimer.Stop()
+	if s.batchToSendCh != nil {
+		close(s.batchToSendCh)


Prevent double-close of stream send channel during shutdown

tryBeginStreamRotation closes s.batchToSendCh when a stream fails, but the field is left non-nil; if shutdown happens while the worker is in connecting/disconnected/backoff before a new channel is installed, handleShutdown closes the same channel again and panics (close of closed channel). This is a crash path during stop under transient stream failures.
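A minimal sketch of one possible guard: route every close through a single helper that clears the field, so the existing nil check in the shutdown path becomes sufficient. The `streamWorker` type and `closeSendCh` helper are hypothetical, and the sketch assumes a single goroutine (the supervisor loop) owns the field. If several goroutines could close it, a mutex or `sync.Once` would be needed instead.

```go
package main

import "fmt"

// streamWorker is a stand-in for the worker type discussed above.
type streamWorker struct {
	batchToSendCh chan []byte
}

// closeSendCh closes the current send channel, if any, and clears the field
// so later callers see nil instead of an already-closed channel.
// It reports whether a channel was actually closed.
func (s *streamWorker) closeSendCh() bool {
	if s.batchToSendCh == nil {
		return false
	}
	close(s.batchToSendCh)
	s.batchToSendCh = nil
	return true
}

func main() {
	s := &streamWorker{batchToSendCh: make(chan []byte, 1)}
	fmt.Println(s.closeSendCh()) // true: rotation path closes the channel
	fmt.Println(s.closeSendCh()) // false: shutdown path finds nil, no panic
}
```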


agent-platform-auto-pr · 2026-04-03T19:40:25Z

Files inventory check summary

File checks results against ancestor a9034095:

Results for datadog-agent_7.79.0~devel.git.448.136b0e3.pipeline.106013071-1_amd64.deb:

No change detected

agent-platform-auto-pr · 2026-04-03T19:51:40Z

Static quality checks

❌ Please find below the results from static quality gates
Comparison made with ancestor a903409
📊 Static Quality Gates Dashboard
🔗 SQG Job

Error

	Quality gate	Change	Size (prev → curr → max)
❌	agent_deb_amd64 (on disk)	+735.41 KiB (0.10% increase)	753.022 → 753.740 → 753.380
❌	agent_deb_amd64_fips (per-PR threshold)	+688.59 KiB (0.09% increase)	709.961 → 710.634 → 713.900
❌	agent_msi (per-PR threshold)	+914.16 KiB (0.15% increase)	604.911 → 605.804 → 651.440
❌	agent_rpm_amd64 (on disk)	+735.41 KiB (0.10% increase)	753.006 → 753.724 → 753.350
❌	agent_rpm_amd64_fips (per-PR threshold)	+688.59 KiB (0.09% increase)	709.945 → 710.617 → 713.880
❌	agent_rpm_arm64 (per-PR threshold)	+719.47 KiB (0.10% increase)	731.423 → 732.125 → 735.290
❌	agent_rpm_arm64_fips (per-PR threshold)	+668.59 KiB (0.09% increase)	691.387 → 692.039 → 696.840
❌	agent_suse_amd64 (on disk)	+735.41 KiB (0.10% increase)	753.006 → 753.724 → 753.350
❌	agent_suse_amd64_fips (per-PR threshold)	+688.59 KiB (0.09% increase)	709.945 → 710.617 → 713.880
❌	agent_suse_arm64 (per-PR threshold)	+719.47 KiB (0.10% increase)	731.423 → 732.125 → 735.290
❌	agent_suse_arm64_fips (per-PR threshold)	+668.59 KiB (0.09% increase)	691.387 → 692.039 → 696.840
❌	docker_agent_amd64 (per-PR threshold)	+735.41 KiB (0.09% increase)	813.325 → 814.044 → 815.700
❌	docker_agent_arm64 (per-PR threshold)	+719.47 KiB (0.09% increase)	816.512 → 817.215 → 821.970
❌	docker_agent_jmx_amd64 (per-PR threshold)	+735.41 KiB (0.07% increase)	1004.241 → 1004.959 → 1006.580
❌	docker_agent_jmx_arm64 (per-PR threshold)	+719.47 KiB (0.07% increase)	996.207 → 996.909 → 1001.570

Gate failure full details

Quality gate	Error type	Error message
agent_deb_amd64	StaticQualityGateFailed	static_quality_gate_agent_deb_amd64 failed! Disk size 753.7 MB exceeds limit of 753.4 MB by 368.8 KB
agent_deb_amd64_fips	PerPRThresholdExceeded	On-disk size increase of 688.59 KiB exceeds the per-PR threshold of 600.0 KiB
agent_msi	PerPRThresholdExceeded	On-disk size increase of 914.16 KiB exceeds the per-PR threshold of 600.0 KiB
agent_rpm_amd64	StaticQualityGateFailed	static_quality_gate_agent_rpm_amd64 failed! Disk size 753.7 MB exceeds limit of 753.3 MB by 382.9 KB
agent_rpm_amd64_fips	PerPRThresholdExceeded	On-disk size increase of 688.59 KiB exceeds the per-PR threshold of 600.0 KiB
agent_rpm_arm64	PerPRThresholdExceeded	On-disk size increase of 719.47 KiB exceeds the per-PR threshold of 600.0 KiB
agent_rpm_arm64_fips	PerPRThresholdExceeded	On-disk size increase of 668.59 KiB exceeds the per-PR threshold of 600.0 KiB
agent_suse_amd64	StaticQualityGateFailed	static_quality_gate_agent_suse_amd64 failed! Disk size 753.7 MB exceeds limit of 753.3 MB by 382.9 KB
agent_suse_amd64_fips	PerPRThresholdExceeded	On-disk size increase of 688.59 KiB exceeds the per-PR threshold of 600.0 KiB
agent_suse_arm64	PerPRThresholdExceeded	On-disk size increase of 719.47 KiB exceeds the per-PR threshold of 600.0 KiB
agent_suse_arm64_fips	PerPRThresholdExceeded	On-disk size increase of 668.59 KiB exceeds the per-PR threshold of 600.0 KiB
docker_agent_amd64	PerPRThresholdExceeded	On-disk size increase of 735.41 KiB exceeds the per-PR threshold of 600.0 KiB
docker_agent_arm64	PerPRThresholdExceeded	On-disk size increase of 719.47 KiB exceeds the per-PR threshold of 600.0 KiB
docker_agent_jmx_amd64	PerPRThresholdExceeded	On-disk size increase of 735.41 KiB exceeds the per-PR threshold of 600.0 KiB
docker_agent_jmx_arm64	PerPRThresholdExceeded	On-disk size increase of 719.47 KiB exceeds the per-PR threshold of 600.0 KiB

Static quality gates prevent the PR from merging!
You can check the static quality gates confluence page for guidance. We also have a toolbox page available to list tools useful to debug the size increase.
Please either fix the size violation or request an exception.
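The two error types in the failure table can be read as two separate checks. The sketch below is only an illustration of that reading: the 600 KiB threshold comes from the failure messages, while the function name and the order in which the checks run are assumptions and may not match the real tooling. Sizes are passed in the same units as the table columns.

```go
package main

import "fmt"

// perPRThresholdKiB is the per-PR limit quoted in the failure messages.
const perPRThresholdKiB = 600.0

// gateResult classifies one row of the table. currSize and maxSize use the
// units of the "Size (prev → curr → max)" column; deltaKiB is the "Change"
// column. Checking the absolute limit first is an assumption.
func gateResult(currSize, maxSize, deltaKiB float64) string {
	switch {
	case currSize > maxSize:
		return "StaticQualityGateFailed"
	case deltaKiB > perPRThresholdKiB:
		return "PerPRThresholdExceeded"
	default:
		return "ok"
	}
}

func main() {
	fmt.Println(gateResult(753.740, 753.380, 735.41)) // agent_deb_amd64: StaticQualityGateFailed
	fmt.Println(gateResult(710.634, 713.900, 688.59)) // agent_deb_amd64_fips: PerPRThresholdExceeded
	fmt.Println(gateResult(313.531, 320.580, 178.97)) // agent_heroku_amd64: ok
}
```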

Successful checks

Info

	Quality gate	Change	Size (prev → curr → max)
✅	agent_heroku_amd64	+178.97 KiB (0.06% increase)	313.356 → 313.531 → 320.580
✅	docker_cluster_agent_amd64	+174.84 KiB (0.08% increase)	203.933 → 204.104 → 206.270
✅	docker_cluster_agent_arm64	+230.84 KiB (0.10% increase)	218.357 → 218.582 → 220.000
✅	docker_dogstatsd_amd64	+47.51 KiB (0.12% increase)	39.234 → 39.281 → 39.380
✅	docker_dogstatsd_arm64	+67.53 KiB (0.18% increase)	37.445 → 37.511 → 37.940
✅	dogstatsd_deb_amd64	+39.53 KiB (0.13% increase)	29.886 → 29.924 → 30.610
✅	dogstatsd_deb_arm64	+43.53 KiB (0.15% increase)	28.034 → 28.077 → 29.110
✅	dogstatsd_rpm_amd64	+39.53 KiB (0.13% increase)	29.886 → 29.924 → 30.610
✅	dogstatsd_suse_amd64	+39.53 KiB (0.13% increase)	29.886 → 29.924 → 30.610
✅	iot_agent_deb_amd64	+174.88 KiB (0.39% increase)	43.239 → 43.410 → 44.290
✅	iot_agent_deb_arm64	+174.84 KiB (0.42% increase)	40.286 → 40.456 → 41.920
✅	iot_agent_deb_armhf	+178.49 KiB (0.42% increase)	41.033 → 41.208 → 42.100
✅	iot_agent_rpm_amd64	+174.88 KiB (0.39% increase)	43.239 → 43.410 → 44.290
✅	iot_agent_suse_amd64	+174.88 KiB (0.39% increase)	43.239 → 43.410 → 44.290

2 successful checks with minimal change (< 2 KiB)

	Quality gate	Current Size
✅	docker_cws_instrumentation_amd64	7.142 MiB
✅	docker_cws_instrumentation_arm64	6.689 MiB

On-wire sizes (compressed)

	Quality gate	Change	Size (prev → curr → max)
❌	agent_deb_amd64	+139.72 KiB (0.08% increase)	174.750 → 174.886 → 178.360
❌	agent_deb_amd64_fips	+161.08 KiB (0.10% increase)	165.352 → 165.510 → 172.790
❌	agent_msi	+196.0 KiB (0.14% increase)	138.414 → 138.605 → 146.220
❌	agent_rpm_amd64	+60.4 KiB (0.03% increase)	177.595 → 177.654 → 181.830
❌	agent_rpm_amd64_fips	+30.08 KiB (0.02% increase)	167.672 → 167.701 → 173.370
❌	agent_rpm_arm64	-75.01 KiB (0.05% reduction)	159.574 → 159.501 → 163.060
❌	agent_rpm_arm64_fips	+194.63 KiB (0.13% increase)	151.401 → 151.591 → 156.170
❌	agent_suse_amd64	+60.4 KiB (0.03% increase)	177.595 → 177.654 → 181.830
❌	agent_suse_amd64_fips	+30.08 KiB (0.02% increase)	167.672 → 167.701 → 173.370
❌	agent_suse_arm64	-75.01 KiB (0.05% reduction)	159.574 → 159.501 → 163.060
❌	agent_suse_arm64_fips	+194.63 KiB (0.13% increase)	151.401 → 151.591 → 156.170
❌	docker_agent_amd64	+216.16 KiB (0.08% increase)	268.185 → 268.396 → 272.480
❌	docker_agent_arm64	+194.33 KiB (0.07% increase)	255.382 → 255.572 → 261.060
❌	docker_agent_jmx_amd64	+209.67 KiB (0.06% increase)	336.840 → 337.045 → 341.100
❌	docker_agent_jmx_arm64	+200.88 KiB (0.06% increase)	320.012 → 320.208 → 325.620
✅	agent_heroku_amd64	+38.6 KiB (0.05% increase)	75.017 → 75.054 → 79.970
✅	docker_cluster_agent_amd64	+34.48 KiB (0.05% increase)	71.368 → 71.402 → 72.920
✅	docker_cluster_agent_arm64	+42.91 KiB (0.06% increase)	66.996 → 67.038 → 68.220
✅	docker_cws_instrumentation_amd64	neutral	2.999 MiB → 3.330
✅	docker_cws_instrumentation_arm64	neutral	2.729 MiB → 3.090
✅	docker_dogstatsd_amd64	+4.25 KiB (0.03% increase)	15.174 → 15.179 → 15.820
✅	docker_dogstatsd_arm64	+12.77 KiB (0.09% increase)	14.487 → 14.500 → 14.830
✅	dogstatsd_deb_amd64	+6.32 KiB (0.08% increase)	7.893 → 7.899 → 8.790
✅	dogstatsd_deb_arm64	+7.17 KiB (0.10% increase)	6.778 → 6.785 → 7.710
✅	dogstatsd_rpm_amd64	+4.95 KiB (0.06% increase)	7.904 → 7.909 → 8.800
✅	dogstatsd_suse_amd64	+4.95 KiB (0.06% increase)	7.904 → 7.909 → 8.800
✅	iot_agent_deb_amd64	+35.94 KiB (0.31% increase)	11.391 → 11.426 → 13.040
✅	iot_agent_deb_arm64	+31.99 KiB (0.32% increase)	9.712 → 9.744 → 11.450
✅	iot_agent_deb_armhf	+29.79 KiB (0.29% increase)	9.931 → 9.960 → 11.620
✅	iot_agent_rpm_amd64	+37.7 KiB (0.32% increase)	11.407 → 11.444 → 13.060
✅	iot_agent_suse_amd64	+37.7 KiB (0.32% increase)	11.407 → 11.444 → 13.060

cit-pr-commenter-54b7da · 2026-04-03T20:00:47Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 24ec8ef2-405c-45ab-972e-b3cfbcfe65b9

Baseline: a903409
Comparison: 136b0e3
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	+1.03	[-2.04, +4.10]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_logs	% cpu utilization	+3.95	[+2.29, +5.60]	1	Logs bounds checks dashboard
➖	docker_containers_cpu	% cpu utilization	+1.03	[-2.04, +4.10]	1	Logs
➖	otlp_ingest_logs	memory utilization	+0.79	[+0.69, +0.88]	1	Logs
➖	otlp_ingest_metrics	memory utilization	+0.63	[+0.47, +0.78]	1	Logs
➖	docker_containers_memory	memory utilization	+0.53	[+0.44, +0.61]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	+0.46	[+0.32, +0.61]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	+0.43	[+0.20, +0.65]	1	Logs
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.35	[+0.29, +0.42]	1	Logs
➖	ddot_metrics	memory utilization	+0.33	[+0.15, +0.51]	1	Logs
➖	ddot_logs	memory utilization	+0.29	[+0.23, +0.34]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	+0.26	[+0.23, +0.30]	1	Logs bounds checks dashboard
➖	quality_gate_idle	memory utilization	+0.14	[+0.09, +0.19]	1	Logs bounds checks dashboard
➖	ddot_metrics_sum_delta	memory utilization	+0.11	[-0.06, +0.28]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	+0.05	[-0.17, +0.26]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	+0.02	[-0.19, +0.23]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	+0.01	[-0.43, +0.44]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.00	[-0.11, +0.11]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.02	[-0.41, +0.38]	1	Logs
➖	file_tree	memory utilization	-0.02	[-0.08, +0.04]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.02	[-0.56, +0.52]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.03	[-0.16, +0.10]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-0.08	[-0.26, +0.10]	1	Logs
➖	quality_gate_metrics_logs	memory utilization	-0.69	[-0.92, -0.46]	1	Logs bounds checks dashboard

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	docker_containers_cpu	simple_check_run	10/10	704 ≥ 26
✅	docker_containers_memory	memory_usage	10/10	274.13MiB ≤ 370MiB
✅	docker_containers_memory	simple_check_run	10/10	692 ≥ 26
✅	file_to_blackhole_0ms_latency	memory_usage	10/10	0.19GiB ≤ 1.20GiB
✅	file_to_blackhole_0ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10	0.23GiB ≤ 1.20GiB
✅	file_to_blackhole_1000ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_100ms_latency	memory_usage	10/10	0.19GiB ≤ 1.20GiB
✅	file_to_blackhole_100ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_500ms_latency	memory_usage	10/10	0.21GiB ≤ 1.20GiB
✅	file_to_blackhole_500ms_latency	missed_bytes	10/10	0B = 0B
✅	quality_gate_idle	intake_connections	10/10	3 = 3	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	172.86MiB ≤ 181MiB	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	3 = 3	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	491.58MiB ≤ 550MiB	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	201.63MiB ≤ 220MiB	bounds checks dashboard
✅	quality_gate_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	357.58 ≤ 2000	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	431.65MiB ≤ 475MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
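The three criteria above can be written as a single predicate. This is a sketch with hypothetical type and function names, not the detector's actual code. The sample rows are taken from the tables in this comment, where docker_containers_cpu is listed among the experiments ignored as erratic.

```go
package main

import (
	"fmt"
	"math"
)

// experiment holds one row of the change-detection table.
type experiment struct {
	name          string
	deltaMeanPct  float64
	ciLow, ciHigh float64
	erratic       bool
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the explanation.
const effectSizeTolerance = 5.0

// isRegression applies all three criteria: large enough effect,
// confidence interval excluding zero, and not marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.erratic
}

func main() {
	rows := []experiment{
		{"quality_gate_logs", 3.95, 2.29, 5.60, false},
		{"docker_containers_cpu", 1.03, -2.04, 4.10, true},
		{"quality_gate_metrics_logs", -0.69, -0.92, -0.46, false},
	}
	for _, e := range rows {
		fmt.Printf("%s: regression=%v\n", e.name, isRegression(e))
	}
}
```

For quality_gate_logs, the interval [+2.29, +5.60] excludes zero, but +3.95 is below the 5.00% tolerance, so the row is not flagged.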

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.

DDuongNguyen · 2026-04-07T18:07:26Z

+	// Currently this is treated as stream error, which will trigger a stream rotation
+	// and retry of the same payload, which loops on. this IS NOT the desired behavior.
+	// TODO: Implement proper handling of irrecoverable errors, by blocking the ingestion
+	log.Infof("Worker %s: irrecoverable error detected: %s", s.workerID, reason)


We should eventually make this either a metric or a log metric to track why some errors happen

Ack. I do have improved metrics in a follow-up PR. An open question I have, though, is which metrics should be shared with HTTP and which should be kept separate between the two.

Off the top of my head, I think the following should be shared:

Bytes sent / bytes dropped

Compression ratio, if possible

Logs component utilization (but I think this might be impossible, since one is a long-lived connection while the other is per instance)

And I think the following would be beneficial for tracking stateful encoding behavior:

Patterns added/removed

Inflight bytes (backpressure health)

Total state size in bytes? (This would track the total state footprint: patterns+tokens+tags*number of pipelines used. I'm not sure whether we should just estimate this hypothetically instead of actually tracking it.)

@ddrthall what do you think?

e63bacb#diff-f6b32aa8d2763ef6abcf429dd989e58459cd22700852f21645501fa44f3f736c is what I have so far as a follow-up.
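As a rough illustration of how a few of the metrics listed above could be tracked per worker, here is a sketch using atomic counters. The type and method names are hypothetical; they are not taken from the follow-up commit and do not use the agent's telemetry package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// transportMetrics keeps running totals that can be read from any goroutine.
type transportMetrics struct {
	rawBytes      atomic.Int64 // bytes before encoding/compression
	encodedBytes  atomic.Int64 // bytes handed to the stream
	inflightBytes atomic.Int64 // sent but not yet acknowledged
	droppedBytes  atomic.Int64 // given up on after being sent
}

// onSend records a batch handed to the stream.
func (m *transportMetrics) onSend(raw, encoded int64) {
	m.rawBytes.Add(raw)
	m.encodedBytes.Add(encoded)
	m.inflightBytes.Add(encoded)
}

// onAck records that a batch was acknowledged.
func (m *transportMetrics) onAck(encoded int64) { m.inflightBytes.Add(-encoded) }

// onDrop records that an in-flight batch was dropped.
func (m *transportMetrics) onDrop(encoded int64) {
	m.inflightBytes.Add(-encoded)
	m.droppedBytes.Add(encoded)
}

// compressionRatio is raw bytes divided by encoded bytes, or 0 before any send.
func (m *transportMetrics) compressionRatio() float64 {
	enc := m.encodedBytes.Load()
	if enc == 0 {
		return 0
	}
	return float64(m.rawBytes.Load()) / float64(enc)
}

func main() {
	var m transportMetrics
	m.onSend(1000, 250)
	m.onSend(600, 150)
	m.onAck(250)
	fmt.Println(m.inflightBytes.Load(), m.compressionRatio()) // 150 4
}
```

A steadily growing in-flight value would indicate that acknowledgements are falling behind sends, which is the back-pressure signal mentioned above.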

joyzhang-dd and others added 14 commits April 3, 2026 13:29

Initial implementation of stateful gRPC stream for Logs agent

1abfad8

Hooks gRPC stream to Logs agent pipeline

Handle state management and snapshot transmission

d94e3cb

Use github.com/DataDog/agent-payload/v5/statefulpb as proto

95b9e28

Implement backoff strategy at stream level covering common failures

9e0f435

Separate out stream Send into its own goroutine

f7cf975

During flow-control/back-pressure, a slow/blocking Send won't block the supervisor loop

Adding compression for stateful encoding transport

2185fb0

add per-worker metrics

c83ad68

delta encoding timestamps as int64s instead of uint64

abea64f

Added a Readme detailing the implementation details of stateful trans…

09f5812

…port

add gomod, dda inv tidy

77e0261

fix go test -race via atomic.Int32 for streamstate

4a7c6fe

agent-payload version bump

c471ae3

updated todos/placeholder docs

f0a6172
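The commit that moves the stream Send into its own goroutine describes a pattern that can be sketched as follows. This is a simplified illustration under stated assumptions: `startSender` and `tryEnqueue` are hypothetical helpers, batches are plain strings, and the real worker's state machine is omitted.

```go
package main

import (
	"fmt"
	"time"
)

// startSender runs send in its own goroutine so that a slow call only stalls
// that goroutine. The returned channel is closed once queue has been closed
// and drained.
func startSender(queue <-chan string, send func(string)) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for b := range queue {
			send(b)
		}
	}()
	return done
}

// tryEnqueue hands a batch to the sender without blocking the caller.
// It returns false when the queue is full, which a supervisor loop can
// treat as a back-pressure signal.
func tryEnqueue(queue chan<- string, batch string) bool {
	select {
	case queue <- batch:
		return true
	default:
		return false
	}
}

func main() {
	queue := make(chan string, 1)
	slowSend := func(string) { time.Sleep(50 * time.Millisecond) }
	done := startSender(queue, slowSend)

	accepted := 0
	for i := 0; i < 5; i++ {
		if tryEnqueue(queue, fmt.Sprintf("batch-%d", i)) {
			accepted++
		}
	}
	// The loop above returns immediately even though each send is slow.
	fmt.Println("supervisor loop finished; accepted:", accepted)
	close(queue)
	<-done
}
```

How many batches are accepted depends on scheduling; the point is only that the loop never waits on `send`.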

TheSafo added 2 commits April 3, 2026 14:54

Fix linter issues, bad rebase

d9ec77d

generate license

136b0e3

TheSafo changed the title ~~Jsaf/stateful transport init~~ [Logs Stateful Encoding] transport layer initial implementation Apr 3, 2026

TheSafo marked this pull request as ready for review April 3, 2026 19:18

TheSafo requested review from a team as code owners April 3, 2026 19:18

TheSafo requested review from jose-manuel-almaza and rahulkaukuntla April 3, 2026 19:18

chatgpt-codex-connector Bot reviewed Apr 3, 2026


github-actions Bot added the "long review" label (PR is complex, plan time to review it) Apr 3, 2026

DDuongNguyen reviewed Apr 7, 2026


DDuongNguyen mentioned this pull request Apr 17, 2026

feat(logs): max template size limit + json_as_raw raw bypass #49552

Draft

This was referenced Apr 20, 2026

local proto plus servic #49628

Draft

better json #49672

Draft

DDuongNguyen mentioned this pull request Apr 23, 2026

fix(logs): staging fidelity fixes — tabs, newlines, Unicode positions, UTF-8 safety #49810

Draft

This was referenced May 4, 2026

flat-log-encoding #50321

Draft

extedned json nesting #50347

Draft

jsaf/flatten-json-encoding-values #50349

Draft

json presence #50402

Draft

inflight fixes #50530

Draft

jsaf/updated-inflight-eviction #50557

Draft
