
Propagate opentelemetry spans across remote connections.#74

Open
rread wants to merge 5 commits into n0-computer:main from rread:rread/tracing

Conversation


@rread rread commented Oct 16, 2025

The current span context is extracted and serialized with the message, and deserialized in read_request_raw() and saved in thread local storage. The generated RemoteService now creates a new span based on the message variant name, and sets the new span's parent to the extracted context.

Fixes #71

Member

@Frando Frando left a comment


Thanks! The feature sounds useful.

  • With this PR the wire format of a protocol changes based on feature flags on irpc. We can't do this, unfortunately: you would never want your wire messages to change because some dependency enabled a feature on irpc, which - due to feature unification - affects all crates in the workspace dependency tree. So we need a different approach here. I'll have to do some thinking about how best to do this.

  • The PR makes the code generation dependent on feature flags on the proc macro crate. I think this is a no-go: due to feature unification you can never know for sure if and where a feature got enabled. So this should be based on configuration via attributes on the rpc_requests macro instead.

  • The opentelemetry integration should be behind a separate feature, use-tracing-opentelemetry or similar.

@rread
Author

rread commented Jan 15, 2026

Thanks for your review. I updated this to make OpenTelemetry span propagation opt-in per service instead of being globally enabled via feature flags.

Changes

  • Added span_propagation attribute to the rpc_requests macro - services that want span context propagation now need to explicitly opt in: #[rpc_requests(message = MyMessage, span_propagation)]
  • Moved OpenTelemetry deps behind a new use-tracing-opentelemetry feature instead of bundling them with rpc
  • Wire format is now controlled per-service via Service::SPAN_PROPAGATION - services without span_propagation skip the context wrapper
  • Removed the spans and rpc feature flags from irpc-derive - the generated code now calls a helper function that handles the cfg logic internally

@rread rread force-pushed the rread/tracing branch 2 times, most recently from a13164a to dcc2dd5 Compare January 20, 2026 21:17
@rread rread requested a review from Frando January 27, 2026 13:41
@rread rread force-pushed the rread/tracing branch 2 times, most recently from 83eb3bf to 51a993a Compare January 30, 2026 16:58
@rread
Author

rread commented Jan 30, 2026

Fixed the CI errors.

@jgrund
Contributor

jgrund commented Feb 9, 2026

Any chance this can make the next release? I'd like to use this to get distributed traces of my app

@rklaehn
Collaborator

rklaehn commented Feb 11, 2026

> Any chance this can make the next release? I'd like to use this to get distributed traces of my app

I will let @Frando push the button. But thanks a lot, this is very neat.

Member

@Frando Frando left a comment


Thanks for your continued work on this. The impl now looks much better!

  • I'm wondering if the thread-local storage based approach is correct? irpc usually runs in a multi-threaded tokio runtime, so tasks can move between threads. Wouldn't that mean the span context can end up being stored on a different thread than the one it's read from?
    I'll think a bit about whether we can pass the span context without a thread local; that would be preferable IMO.

  • irpc-iroh should also get the use-tracing-opentelemetry feature; if enabled, it should propagate the feature to irpc.

  • I think we can rename the feature flag to just tracing-opentelemetry if we set resolver = 2 in the Cargo.toml. Edit: I think we need to switch to edition 2024 for that

  • Please add docs for the span_propagation flag to the docs of the rpc_requests macro (in irpc/src/lib.rs)

  • We should add an end-to-end test to verify this all works as intended, and so that we don't break the feature accidentally in the future.

@Frando
Member

Frando commented Feb 12, 2026

I tried to get this to work, but did not succeed. Maybe I don't know enough about the interaction between tracing and opentelemetry.

I pushed a branch that contains a test example, and adds more debug logging:
https://github.com/rread/irpc/compare/rread/tracing...n0-computer:irpc:Frando/span-propagation-test?expand=1

If you check out this Frando/span-propagation-test branch, you can run my test with much debug logging with:

RUST_LOG=irpc=trace,irpc_iroh=trace cargo test -p irpc-iroh --lib --all-features -- span_propagation --nocapture

I would have expected to get the req_id = 23 from the client spans somewhere in a server span. However, I couldn't find it anywhere in the debug output (the test does not actually test anything for now).

Or did I misunderstand what this feature enables?

@rread
Author

rread commented Feb 13, 2026

Thanks for the feedback, I'll work on the suggestions and take a look at your test.

IIRC resolver 2 only requires edition 2021.

@rread
Author

rread commented Mar 20, 2026

The test needed to set up propagation and also create a tracer provider and exporter. I've included it here and used the InMemorySpanExporter so we can assert the server span has the same trace_id as the client span. Note: extra fields like "req_id" are not propagated. Only the OpenTelemetry context (trace ID and span ID) is serialized and propagated to the server.

I also updated the feature name and added docs for the span_propagation option, and other fixes.

It is still using thread-local storage. We could avoid that, but it would likely be a more invasive change.

@rread rread requested a review from Frando March 20, 2026 13:02
@jgrund
Contributor

jgrund commented Apr 9, 2026

Hi, any updates on this one? Sorry to nag, but I'd really like to see associated traces when I run my app :)

@rread rread force-pushed the rread/tracing branch from 4570e04 to 661800a Compare May 4, 2026 13:57
@rread
Author

rread commented May 4, 2026

I rebased to fix the Cargo.toml conflicts.

@rklaehn
Collaborator

rklaehn commented May 5, 2026

Hi @rread. Sorry about the conflicts.

It all looks good, but the thread-local storage needs to go. It is fundamentally broken if there is the possibility of a yield point between the store and the use. Basically: you store something in thread-local storage, then you yield, tokio schedules the task on another OS thread, and you read whatever happens to be in that thread's slot, not the original value.

This will not show up in a test that just runs through, and it also probably won't show up with a single threaded runtime. But for a server process that does a lot of stuff concurrently (whether on a single threaded or multithreaded runtime) it will break.

Tokio is using such hidden state in a lot of places (I don't like this very much, but they do...), but what they use is task-local storage. I got a robot generated branch where I am using this, and it works even in a heavily multithreaded test.

https://docs.rs/tokio/latest/tokio/macro.task_local.html

Basically as far as I can see there are two ways to solve this:

  1. use task_local as a drop in replacement for thread_local.
  2. change the public API and add some kind of context that carries the needed state without having to resort to any kind of ...local storage.

Since this is an optional API that most people won't use I would probably prefer 1. Then if we got more state to drag around we can think about 2 in the future.

Here is some claude slop that shows the issue. Take of this what you want: https://github.com/n0-computer/irpc/tree/rklaehn/fix-thread-local

@rread
Author

rread commented May 5, 2026

That was very helpful, thanks. I updated the patch to use task_local. I also added a concurrent test (confirmed it also failed with thread_local) and a new example to help folks use this. I also took the liberty of adding another line to CI to check the new feature; hope that is ok.

I see there are some new conflicts, so I will go resolve those.

rread added 5 commits May 5, 2026 13:22
Adds a new `rpc_request` feature `span_propagation` and a new crate
feature `use-tracing-opentelemetry`. When enabled, the current span
context is extracted and serialized with the message, and deserialized in
`read_request_raw()` and saved in thread-local storage. The generated
`RemoteService` creates a new span based on the message variant name, and
sets the new span's parent to the extracted context.
Added OTEL setup:
 - propagator
 - tracer provider
- update resolver to 2
- change feature name to tracing-opentelemetry
- add span_propagation option docs
* replace thread_local with task_local
* add concurrent integration test based on the sequential unit test
* add an example for how to use this
* update ci with check for tracing-opentelemetry feature
@rread rread force-pushed the rread/tracing branch from 4e1aa4a to f6ff439 Compare May 5, 2026 17:29


Development

Successfully merging this pull request may close these issues.

Support trace context propagation for rpc

4 participants