[AGENTCFG-626] Adding a consumer component for streaming configuration updates from the core agent to remote agents registered via RAR (attempt 2) #50385
Merged
gh-worker-dd-mergequeue-cf854d merged 9 commits into main from rahul/config-stream-consumer-attempt-2 on May 8, 2026
Commits
d2b7bc1 [AGENTCFG-626] Adding a consumer component for streaming configuratio… (rahulkaukuntla)
62c6e7f avoiding a repeat of incident-53989 and incident-53990 (rahulkaukuntla)
5862fb5 change config option to remote_agent.configstream.consumer.enabled an… (rahulkaukuntla)
2af3ff7 update schema (rahulkaukuntla)
d0be095 lint (rahulkaukuntla)
262106f move gazelle exclude commands to top-level bazel file (rahulkaukuntla)
821efc8 Merge branches 'rahul/config-stream-consumer-attempt-2' and 'main' of… (rahulkaukuntla)
825e666 Update BUILD.bazel (aiuto)
d28bb2e trigger ci (rahulkaukuntla)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,143 @@ | ||
| # Config Stream Consumer Component | ||
|
|
||
| A shared Go library for remote agents (system-probe, trace-agent, process-agent, etc.) to consume configuration streams from the core Datadog Agent. It provides gRPC connection management, snapshot gating, and ordered config application, writing received settings directly into the agent's `config.Component`. | ||
|
|
||
| ## Overview | ||
|
|
||
| - **Real-time config**: Receive full snapshot then incremental updates from the core agent over gRPC. | ||
| - **RAR-gated**: Only registered remote agents can subscribe; session ID is required (fixed or via `SessionIDProvider`). | ||
| - **Readiness gating**: `Start` blocks until the first config snapshot is received, aborting startup if `Params.ReadyTimeout` (default: 60s) is exceeded. | ||
| - **Single source of truth**: Streamed config is written into `config.Component` via `model.Writer`. Callers read config through `config.Component` directly — not through this component. | ||
| - **Ordered updates**: Sequential application by sequence ID; stale updates dropped, discontinuities trigger resync. | ||
| - **Restart safety**: `lastSeqID` is never reset on reconnect. If the core agent restarts and its sequence counter resets, the consumer logs an error and refuses the new snapshot until the sub-process itself restarts. | ||
| - **Telemetry**: Metrics for time-to-first-snapshot, reconnects, sequence ID, and dropped updates. | ||
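The ordering and restart-safety rules above can be sketched as a small state machine. This is a minimal illustration, not the component's actual code: the `sequencer` type, the `Decision` values, and the assumption that consecutive updates advance the sequence ID by exactly one are all hypothetical.

```go
package main

import "fmt"

// Decision is a hypothetical label for what happens to an incoming event.
type Decision string

const (
	Apply          Decision = "apply"           // in order: write it to the config
	DropStale      Decision = "drop-stale"      // at or below lastSeqID: ignore
	Resync         Decision = "resync"          // gap in sequence IDs: ask for a fresh snapshot
	RefuseSnapshot Decision = "refuse-snapshot" // snapshot older than lastSeqID: producer likely restarted
)

// sequencer tracks the last applied sequence ID.
// The value deliberately survives reconnects, as the README describes.
type sequencer struct {
	lastSeqID int32
}

// onSnapshot handles a full snapshot carrying seqID.
func (s *sequencer) onSnapshot(seqID int32) Decision {
	if seqID < s.lastSeqID {
		return RefuseSnapshot
	}
	s.lastSeqID = seqID
	return Apply
}

// onUpdate handles an incremental update carrying seqID.
// Assumption: consecutive updates increase the sequence ID by exactly one.
func (s *sequencer) onUpdate(seqID int32) Decision {
	switch {
	case seqID <= s.lastSeqID:
		return DropStale
	case seqID != s.lastSeqID+1:
		return Resync
	default:
		s.lastSeqID = seqID
		return Apply
	}
}

func main() {
	s := &sequencer{}
	fmt.Println(s.onSnapshot(10)) // initial snapshot
	fmt.Println(s.onUpdate(11))   // next in order
	fmt.Println(s.onUpdate(11))   // duplicate
	fmt.Println(s.onUpdate(13))   // gap
	fmt.Println(s.onSnapshot(3))  // counter went backwards
}
```

Because `lastSeqID` is only ever raised, a snapshot with a lower ID stays refused until the process holding the `sequencer` restarts, which matches the restart-safety bullet.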
|
|
||
| ## Architecture | ||
|
|
||
| Producer (core agent) and consumer (remote agents) communicate over the same gRPC contract: | ||
|
|
||
| ``` | ||
| ┌─────────────────────────┐ ┌─────────────────────────┐ | ||
| │ Core Agent Process │ │ Remote Agent Process │ | ||
| │ │ │ (e.g. system-probe) │ | ||
| │ ┌──────────────────┐ │ │ ┌──────────────────┐ │ | ||
| │ │ configstream │ │ gRPC │ │ configstream- │ │ | ||
| │ │ (producer) │◄──┼──────────┼─►│ consumer │ │ | ||
| │ │ │ │ stream │ │ │ │ | ||
| │ └──────────────────┘ │ │ └──────────────────┘ │ | ||
| └─────────────────────────┘ └─────────────────────────┘ | ||
| ``` | ||
|
|
||
| **Flow:** | ||
|
|
||
| 1. Remote agent registers with RAR and obtains `session_id` (or supplies it via `SessionIDProvider`). | ||
| 2. Consumer connects to core agent and calls `StreamConfigEvents` with `session_id` in gRPC metadata. | ||
| 3. Core agent validates the session and sends an initial snapshot, then streams incremental updates. | ||
| 4. Consumer applies snapshot/updates in order and writes them into `config.Component` via `model.Writer`. | ||
|
|
||
| See `../configstream/README.md` for the producer side and the gRPC/protobuf contract. | ||
|
|
||
| ## Quick Start | ||
|
|
||
| Supply **either** a fixed `SessionID` **or** a `SessionIDProvider` (e.g. from the remote agent component). The consumer uses the provider at connect time so RAR can register first. | ||
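As a rough sketch of the two options, the snippet below resolves a session ID by preferring a provider and falling back to a fixed value. The `SessionIDProvider` interface mirrors the one in this PR; `registrationProvider`, `resolveSessionID`, `demo`, and the `"session-123"` value are hypothetical names used only for illustration, and the real consumer may resolve the ID differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// SessionIDProvider mirrors the interface described in this README.
type SessionIDProvider interface {
	WaitSessionID(ctx context.Context) (string, error)
}

// registrationProvider is a hypothetical provider that blocks until
// register is called, standing in for RAR registration completing.
type registrationProvider struct {
	ready chan struct{}
	id    string
}

func newRegistrationProvider() *registrationProvider {
	return &registrationProvider{ready: make(chan struct{})}
}

// register records the session ID and unblocks all waiters.
// It must be called at most once.
func (p *registrationProvider) register(id string) {
	p.id = id
	close(p.ready)
}

// WaitSessionID blocks until registration completes or ctx is cancelled.
func (p *registrationProvider) WaitSessionID(ctx context.Context) (string, error) {
	select {
	case <-p.ready:
		return p.id, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// resolveSessionID prefers the provider when one is set and otherwise
// falls back to the fixed session ID.
func resolveSessionID(ctx context.Context, fixed string, provider SessionIDProvider) (string, error) {
	if provider != nil {
		return provider.WaitSessionID(ctx)
	}
	if fixed == "" {
		return "", errors.New("either SessionID or SessionIDProvider must be set")
	}
	return fixed, nil
}

// demo registers asynchronously and resolves the ID through the provider.
func demo() (string, error) {
	p := newRegistrationProvider()
	go p.register("session-123")
	return resolveSessionID(context.Background(), "", p)
}

func main() {
	id, err := demo()
	fmt.Println(id, err)
}
```

Resolving the ID at connect time rather than at construction is what lets registration finish first: the caller simply blocks in `WaitSessionID` until an ID exists or the context is cancelled.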
|
|
||
| ## Wiring guide | ||
|
|
||
| ### Only include the module when the feature is enabled | ||
|
|
||
| Including `configstreamconsumerfx.Module()` when config streaming is disabled will abort FX startup. Gate on `remote_agent.configstream.consumer.enabled` before building FX options: | ||
|
|
||
| ```go | ||
| if cfg.GetBool("remote_agent.configstream.consumer.enabled") { | ||
| opts = append(opts, configstreamFxOptions()) | ||
| } | ||
| ``` | ||
|
|
||
| ### Full example | ||
|
|
||
| ```go | ||
| func configstreamFxOptions() fx.Option { | ||
| return fx.Options( | ||
| // Bridge config.Component to model.Writer so the consumer can write streamed config. | ||
| fx.Provide(func(c config.Component) model.Writer { return c }), | ||
|
|
||
| // Provide the SessionIDProvider from the remote agent (blocks until RAR registration). | ||
| fx.Provide(func(ra remoteagent.Component) configstreamconsumerimpl.SessionIDProvider { | ||
| if ra == nil { | ||
| return nil | ||
| } | ||
| if p, ok := ra.(configstreamconsumerimpl.SessionIDProvider); ok { | ||
| return p | ||
| } | ||
| return nil | ||
| }), | ||
|
|
||
| // Provide Params — only reached when configstream is known to be enabled. | ||
| fx.Provide(func(c config.Component, deps struct { | ||
| fx.In | ||
| SessionProvider configstreamconsumerimpl.SessionIDProvider `optional:"true"` | ||
| }) *configstreamconsumerimpl.Params { | ||
| host := c.GetString("cmd_host") | ||
| port := c.GetInt("cmd_port") | ||
| if port <= 0 { | ||
| port = 5001 | ||
| } | ||
| return &configstreamconsumerimpl.Params{ | ||
| ClientName: "my-agent", | ||
| CoreAgentAddress: net.JoinHostPort(host, strconv.Itoa(port)), | ||
| SessionIDProvider: deps.SessionProvider, | ||
| } | ||
| }), | ||
|
|
||
| configstreamconsumerfx.Module(), | ||
| // Force instantiation so Start runs and blocks until the first snapshot. | ||
| fx.Invoke(func(_ configstreamconsumer.Component) {}), | ||
| ) | ||
| } | ||
| ``` | ||
|
|
||
| ## Requirements | ||
|
|
||
| - **Core agent**: configstream component (always on by default) and RAR enabled (`remote_agent.registry.enabled: true`). | ||
| - **Consumer opt-in**: Set `remote_agent.configstream.consumer.enabled: true` on the remote agent to enable this component. | ||
| - **RAR**: Remote agent must register with RAR before subscribing; pass `session_id` via gRPC metadata (supply fixed `SessionID` or `SessionIDProvider` with `WaitSessionID(ctx) (string, error)`). | ||
| - **IPC**: mTLS and auth token for gRPC (same as other core-agent IPC). | ||
| - **`model.Writer`**: `config.Component` must be explicitly provided as `model.Writer` in the same FX scope. Streamed settings are written using the same source the core agent assigned (e.g. `SourceDefault`, `SourceFile`, `SourceEnvVar`), preserving the original priority semantics on the remote process. | ||
|
|
||
| ## Telemetry | ||
|
|
||
| | Metric | Type | Description | | ||
| |--------|------|-------------| | ||
| | `configstream_consumer.time_to_first_snapshot_seconds` | Gauge | Time to receive first snapshot | | ||
| | `configstream_consumer.reconnect_count` | Counter | Stream reconnections | | ||
| | `configstream_consumer.last_sequence_id` | Gauge | Last received config sequence ID | | ||
| | `configstream_consumer.dropped_stale_updates` | Counter | Stale updates dropped | | ||
|
|
||
| ## Testing | ||
|
|
||
| ### Manual testing with system-probe | ||
|
|
||
| 1. Start the core agent with RAR and config stream enabled. | ||
| 2. Set `cmd_host` / `cmd_port` in the config used by system-probe. | ||
| 3. Start system-probe. You should see: | ||
| - `Waiting for initial configuration from core agent...` | ||
| - After snapshot: `Initial configuration received from core agent. Starting system-probe.` | ||
| 4. If the core agent is down or the stream never sends a snapshot, system-probe exits with: `waiting for initial config snapshot: context deadline exceeded`. | ||
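The timeout behaviour in step 4 can be approximated with a context deadline. The sketch below is an assumption about the shape of the logic, not the component's implementation: `waitForFirstSnapshot`, `defaultReadyTimeout`, and the two demo helpers are hypothetical names, and treating negative timeouts like zero is a choice made here for simplicity.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// defaultReadyTimeout matches the 60s default documented for Params.ReadyTimeout.
const defaultReadyTimeout = 60 * time.Second

// waitForFirstSnapshot blocks until ready is closed or the timeout elapses.
// A zero (or, in this sketch, negative) timeout falls back to the default.
func waitForFirstSnapshot(ctx context.Context, ready <-chan struct{}, timeout time.Duration) error {
	if timeout <= 0 {
		timeout = defaultReadyTimeout
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	select {
	case <-ready:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("waiting for initial config snapshot: %w", ctx.Err())
	}
}

// snapshotArrives simulates a stream whose first snapshot is already in.
func snapshotArrives() error {
	ready := make(chan struct{})
	close(ready)
	return waitForFirstSnapshot(context.Background(), ready, time.Second)
}

// snapshotNeverArrives simulates a core agent that never sends a snapshot.
func snapshotNeverArrives() error {
	never := make(chan struct{})
	return waitForFirstSnapshot(context.Background(), never, 10*time.Millisecond)
}

func main() {
	fmt.Println(snapshotArrives())
	fmt.Println(snapshotNeverArrives())
}
```

Wrapping `ctx.Err()` with `%w` keeps the deadline error inspectable through `errors.Is(err, context.DeadlineExceeded)` while still producing the message shown in step 4.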
|
|
||
| ## Troubleshooting | ||
|
|
||
| - **session_id required in metadata** | ||
| Ensure the remote agent registers with RAR first and that the consumer is given either a fixed `SessionID` or a `SessionIDProvider` that returns the session ID. | ||
|
|
||
| - **Startup timeout (no snapshot received within `ReadyTimeout`)** | ||
| Core agent must be running, config stream enabled, and RAR returning a valid session. Check core agent logs for config stream and RAR errors. | ||
|
|
||
| - **"core agent may have restarted" error in logs** | ||
| The consumer received a snapshot with a lower sequence ID than its last-known value, indicating the core agent restarted. Restart the sub-process to accept the new configuration. | ||
|
|
||
| ## Related documentation | ||
|
|
||
| - **Producer**: `../configstream/README.md` — core agent config streaming service and gRPC contract. | ||
| - **Test client**: `cmd/config-stream-client/README.md` — standalone client for end-to-end testing. | ||
|
|
||
| **Team**: agent-configuration |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| load("@rules_go//go:def.bzl", "go_library") | ||
|
|
||
| go_library( | ||
| name = "def", | ||
| srcs = ["component.go"], | ||
| importpath = "github.com/DataDog/datadog-agent/comp/core/configstreamconsumer/def", | ||
| visibility = ["//visibility:public"], | ||
| ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| // Unless explicitly stated otherwise all files in this repository are licensed | ||
| // under the Apache License Version 2.0. | ||
| // This product includes software developed at Datadog (https://www.datadoghq.com/). | ||
| // Copyright 2025-present Datadog, Inc. | ||
|
|
||
| // Package configstreamconsumer implements a component that consumes config streams from the core agent. | ||
| // | ||
| // team: agent-configuration | ||
| package configstreamconsumer | ||
|
|
||
| import ( | ||
| "context" | ||
| "time" | ||
| ) | ||
|
|
||
| // SessionIDProvider supplies the RAR session ID, typically after registration completes. | ||
| // When set, the consumer will call WaitSessionID at connect time instead of using Params.SessionID. | ||
| type SessionIDProvider interface { | ||
| WaitSessionID(ctx context.Context) (string, error) | ||
| } | ||
|
|
||
| // Params defines the parameters for the configstreamconsumer component | ||
| type Params struct { | ||
| // ClientName is the identity of this remote agent (e.g., "system-probe", "trace-agent") | ||
| ClientName string | ||
| // CoreAgentAddress is the address of the core agent IPC endpoint | ||
| CoreAgentAddress string | ||
| // SessionID is the RAR session ID for authorization. Required if SessionIDProvider is nil. | ||
| SessionID string | ||
| // SessionIDProvider supplies the session ID at connect time (e.g. from remote agent component). | ||
| // When set, SessionID may be empty; the consumer will block on WaitSessionID before connecting. | ||
| SessionIDProvider SessionIDProvider | ||
| // ReadyTimeout is how long OnStart blocks waiting for the first config snapshot before | ||
| // returning an error and aborting startup. Defaults to 60s when zero. | ||
| ReadyTimeout time.Duration | ||
| } | ||
|
|
||
| // Component is the config stream consumer component interface. | ||
| // Its sole purpose is to receive configuration from the core agent stream and write it | ||
| // into the local config.Component via the model.Writer provided at construction. | ||
| // Callers that need to read config or subscribe to changes should use config.Component directly. | ||
| // Readiness is guaranteed by the FX lifecycle: start blocks until the first snapshot is received. | ||
| type Component interface{} |
Do we have the same problem with `remote_agent.registry.enabled`? If it's false, will ADP fail?
No, `configstream` doesn't check whether the `remote_agent` config options are set anymore; Go-based remote agents are now responsible for checking that these options are set. ADP should be fine as-is.