@paullegranddc
Collaborator

Motivation

This feature is inspired by the Go tracer: DataDog/dd-trace-go#2188

Spans that are never finished are hard to locate and understand. This PR adds a debug mode to the tracer that tracks the age and root span name of each trace, and:

  • periodically logs a warning if a trace has been open longer than a configured timeout.
  • logs a warning if any traces are still open when the tracer shuts down.

Implementation

This feature should add no cost to the tracer when it is disabled, but it requires storing some extra data for each trace.

To avoid using any extra memory when the feature is disabled, I track this data in a separate registry that is only used in debug mode.

The registry tracks the root span name and the age of each trace.
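
For reviewers who want a mental model, here is a minimal sketch of what such a registry could look like. It is not the code from this PR: `OpenTrace`, `OpenTraceRegistry`, the `u128` trace id key, and the method names are all hypothetical, chosen only to illustrate tracking a root span name and an age per trace.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-trace record: the root span name and when the trace started.
struct OpenTrace {
    root_span_name: String,
    started_at: Instant,
}

/// Hypothetical registry of traces that are still open.
/// The tracer could hold it as `Option<OpenTraceRegistry>` and leave it `None`
/// outside debug mode, so nothing is allocated when the feature is disabled.
#[derive(Default)]
struct OpenTraceRegistry {
    traces: HashMap<u128, OpenTrace>,
}

impl OpenTraceRegistry {
    /// Record a trace the first time one of its spans is seen.
    fn register(&mut self, trace_id: u128, root_span_name: &str, now: Instant) {
        self.traces.entry(trace_id).or_insert_with(|| OpenTrace {
            root_span_name: root_span_name.to_string(),
            started_at: now,
        });
    }

    /// Forget a trace once it has been finished.
    fn finish(&mut self, trace_id: u128) {
        self.traces.remove(&trace_id);
    }

    /// Return (trace id, root span name, age) for every trace older than `timeout`,
    /// sorted by trace id so the output is deterministic.
    fn older_than(&self, timeout: Duration, now: Instant) -> Vec<(u128, &str, Duration)> {
        let mut stale: Vec<_> = self
            .traces
            .iter()
            .filter_map(|(id, trace)| {
                let age = now.saturating_duration_since(trace.started_at);
                (age > timeout).then(|| (*id, trace.root_span_name.as_str(), age))
            })
            .collect();
        stale.sort_by_key(|(id, _, _)| *id);
        stale
    }
}
```

In this sketch, a periodic task would call `older_than` with the configured timeout and log one warning per entry, and shutdown would log whatever is still registered.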

The debug mode is controlled by two new configuration options:

  • DD_TRACE_DEBUG_OPEN_SPANS
  • DD_TRACE_OPEN_SPAN_TIMEOUT
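
As a rough illustration of how the two options could be read, here is a sketch under stated assumptions. The accepted value formats are not specified in this description: treating the first option as a boolean (`1` or `true`), the timeout as a whole number of seconds, and the 60 second fallback are all assumptions, and `OpenSpanDebugConfig` and `parse_config` are hypothetical names.

```rust
use std::time::Duration;

/// Assumed fallback used only by this sketch; the real default is not stated here.
const ASSUMED_DEFAULT_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical holder for the two debug-mode settings.
#[derive(Debug, PartialEq)]
struct OpenSpanDebugConfig {
    enabled: bool,
    timeout: Duration,
}

/// Build the config from a lookup function so it can be tested without
/// touching the process environment.
fn parse_config(get: impl Fn(&str) -> Option<String>) -> OpenSpanDebugConfig {
    // Assumption: "1" or "true" (case-insensitive) enables the debug mode.
    let enabled = get("DD_TRACE_DEBUG_OPEN_SPANS")
        .map(|v| matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true"))
        .unwrap_or(false);

    // Assumption: the timeout is given as a whole number of seconds.
    let timeout = get("DD_TRACE_OPEN_SPAN_TIMEOUT")
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(ASSUMED_DEFAULT_TIMEOUT);

    OpenSpanDebugConfig { enabled, timeout }
}
```

Reading from the real environment would then be `parse_config(|key| std::env::var(key).ok())`.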

To test this feature properly, I added extra code to intercept and store logs during integration tests. This is done through a thread-local Logger, which is overridden and propagated locally during tests.

Everything is hidden behind the test-utils feature and should therefore be zero-cost.
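
The log interception idea can be sketched roughly as follows. This is a simplified illustration, not the Logger from this PR: `CAPTURED`, `log_warn`, and `with_captured_logs` are hypothetical names, the `test-utils` feature gate is left out so the snippet stands alone, and propagation to other threads is not shown.

```rust
use std::cell::RefCell;

thread_local! {
    // `Some(buffer)` while a test is capturing logs on this thread, `None` otherwise.
    static CAPTURED: RefCell<Option<Vec<String>>> = RefCell::new(None);
}

/// Hypothetical logging entry point: store the message if a capture is active
/// on the current thread, otherwise write it to stderr.
fn log_warn(msg: &str) {
    let captured = CAPTURED.with(|cell| {
        if let Some(buf) = cell.borrow_mut().as_mut() {
            buf.push(msg.to_string());
            true
        } else {
            false
        }
    });
    if !captured {
        eprintln!("WARN {msg}");
    }
}

/// Run `f` with log capture enabled on this thread and return its result
/// together with every message logged while it ran.
/// The previous capture state is restored afterwards, so calls can nest.
fn with_captured_logs<R>(f: impl FnOnce() -> R) -> (R, Vec<String>) {
    let previous = CAPTURED.with(|cell| cell.borrow_mut().replace(Vec::new()));
    let result = f();
    let logs = CAPTURED
        .with(|cell| std::mem::replace(&mut *cell.borrow_mut(), previous))
        .unwrap_or_default();
    (result, logs)
}
```

A test would wrap the code under test in `with_captured_logs` and assert on the returned messages, for example checking that a warning about an open trace was emitted.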

@paullegranddc requested a review from a team as a code owner on November 6, 2025, 16:23
"propertyKeys": ["trace_debug_open_spans"]
}
],
"DD_TRACE_OPEN_SPAN_TIMEOUT": [
Collaborator

This one is out of alphabetical order. I also think it's odd that it isn't named consistently with DD_TRACE_DEBUG_OPEN_SPANS, since the two are related. Naming it DD_TRACE_DEBUG_OPEN_SPANS_TIMEOUT would make more sense.

Collaborator Author

I agree. I picked these names because they are the ones the Go tracer uses for its debug mode.
I'll probably change the timeout one to DD_TRACE_DEBUG_OPEN_SPANS_TIMEOUT, as I don't think it's that bad to have different config names between languages in this specific case.

Comment on lines +1047 to +1049
#[cfg(feature = "test-utils")]
wait_agent_info_ready: default.wait_agent_info_ready,
span_metrics_interval: default.span_metrics_interval,
Collaborator

Is a #[cfg(feature = "test-utils")] missing here?

Suggested change
- #[cfg(feature = "test-utils")]
- wait_agent_info_ready: default.wait_agent_info_ready,
- span_metrics_interval: default.span_metrics_interval,
+ #[cfg(feature = "test-utils")]
+ wait_agent_info_ready: default.wait_agent_info_ready,
+ #[cfg(feature = "test-utils")]
+ span_metrics_interval: default.span_metrics_interval,

Comment on lines +1510 to +1512
#[cfg(feature = "test-utils")]
wait_agent_info_ready: false,
span_metrics_interval: Duration::from_secs(10),
Collaborator

And here?

Suggested change
- #[cfg(feature = "test-utils")]
- wait_agent_info_ready: false,
- span_metrics_interval: Duration::from_secs(10),
+ #[cfg(feature = "test-utils")]
+ wait_agent_info_ready: false,
+ #[cfg(feature = "test-utils")]
+ span_metrics_interval: Duration::from_secs(10),
