Propagate opentelemetry spans across remote connections. #74
rread wants to merge 5 commits into n0-computer:main
Conversation
Thanks! The feature sounds useful.

- With this PR the wire format of a protocol changes based on feature flags for irpc. We can't do this, unfortunately: you would never want your wire messages to change because some dependency enabled a feature on `irpc`, which will, due to feature unification, affect all crates in the workspace dependency tree. So we need a different approach here. I'll have to do some thinking about how best to do this.
- The PR makes the code generation dependent on feature flags on the proc macro crate. I think this is a no-go: due to feature unification you can never know for sure if and where a feature got enabled. So this should be based on configuration via attributes on the `rpc_requests` macro instead.
- The opentelemetry integration should be behind a separate feature, `use-tracing-opentelemetry` or similar.
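For illustration, attribute-based opt-in could look roughly like the sketch below. This is hypothetical: the `span_propagation` flag name comes from later in this thread, and the `EchoProtocol` enum, `EchoMessage` name, and `#[rpc(...)]` attribute syntax are assumptions rather than code from this PR.

```rust
use irpc::rpc_requests;
use serde::{Deserialize, Serialize};

// Hypothetical: a service that opts in to span propagation via a
// macro flag instead of a crate-wide cargo feature. Only the flag
// name `span_propagation` is from this PR; the rest is illustrative.
#[rpc_requests(message = EchoMessage, span_propagation)]
#[derive(Serialize, Deserialize, Debug)]
enum EchoProtocol {
    // Request variant with a oneshot response channel.
    #[rpc(tx = irpc::channel::oneshot::Sender<String>)]
    Echo(String),
}
```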
|
Thanks for your review. I updated this to make OpenTelemetry span propagation opt-in per service instead of being globally enabled via feature flags.
|
Force-pushed from a13164a to dcc2dd5.
Force-pushed from 83eb3bf to 51a993a.
|
Fixed the CI errors. |
|
Any chance this can make the next release? I'd like to use this to get distributed traces of my app |
I will let @Frando push the button. But thanks a lot, this is very neat. |
Thanks for your continued work on this. The impl now looks much better!

- I'm wondering if the thread-local storage based approach is correct? irpc is usually run in a multi-threaded tokio runtime, so things can move between threads. Wouldn't that mean that the span context can end up being stored on a different thread than the one it's being read from? I'll think a bit about whether we can pass the span context without a thread local; that would be preferable IMO. (See the sketch of the hazard after this list.)
- `irpc-iroh` should also get the `use-tracing-opentelemetry` feature; if enabled, it should propagate the feature to `irpc`.
- I think we can rename the feature flag to just `tracing-opentelemetry` if we set `resolver = "2"` in the `Cargo.toml`. Edit: I think we need to switch to edition 2024 for that.
- Please add docs for the `span_propagation` flag to the docs of the `rpc_requests` macro (in irpc/src/lib.rs).
- We should add an end-to-end test to verify this all works as intended, so that we don't break the feature accidentally in the future.
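To make the first point concrete, here is a minimal sketch of the hazard (illustrative types only; a `String` stands in for the span context):

```rust
use std::cell::RefCell;

thread_local! {
    // Illustrative: one slot per OS thread, not per task.
    static SPAN_CTX: RefCell<Option<String>> = RefCell::new(None);
}

async fn handler() {
    SPAN_CTX.with(|c| *c.borrow_mut() = Some("trace-abc".into()));

    // Yield point: tokio may resume this task on a different worker
    // thread whose slot was never written, or was written by an
    // unrelated task.
    tokio::task::yield_now().await;

    let ctx = SPAN_CTX.with(|c| c.borrow().clone());
    // Can fail on a multi-threaded runtime.
    assert_eq!(ctx.as_deref(), Some("trace-abc"));
}
```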
|
I tried to get this to work, but did not succeed. Maybe I don't know enough about the interaction between tracing and opentelemetry. I pushed a branch that contains a test example and adds more debug logging. If you check out this branch, I would have expected the client and server spans to be linked, but they are not. Or did I get something wrong about what this feature enables?
|
Thanks for the feedback, I'll work on the suggestions and take a look at your test. IIRC resolver 2 only requires edition 2021. |
|
The test needed to set up propagation and also create a tracer provider and exporter. I've included it here and used the InMemorySpanExporter so we can assert that the server span has the same trace_id as the client span. Note: extra fields like "req_id" are not propagated; only the OpenTelemetry context (trace ID and span ID) is serialized and propagated to the server. I also updated the feature name and added docs for the `span_propagation` flag. It is still using thread-local storage. We could avoid that, but it would likely be a more invasive change.
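For reference, the shape of such a test might look like the following. This is a sketch under assumptions: exact module paths differ across opentelemetry crate versions (for example, `InMemorySpanExporter` has lived under both `testing::trace` and `trace`, and the provider type has been named both `TracerProvider` and `SdkTracerProvider`), and the actual irpc server/client setup is elided.

```rust
use opentelemetry::global;
use opentelemetry::trace::TracerProvider as _;
use opentelemetry_sdk::propagation::TraceContextPropagator;
use opentelemetry_sdk::testing::trace::InMemorySpanExporter;
use opentelemetry_sdk::trace::SdkTracerProvider;
use tracing_subscriber::layer::SubscriberExt;

#[tokio::test]
async fn client_and_server_spans_share_a_trace_id() {
    // W3C traceparent propagation for inject/extract.
    global::set_text_map_propagator(TraceContextPropagator::new());

    // In-memory exporter so the test can inspect finished spans.
    let exporter = InMemorySpanExporter::default();
    let provider = SdkTracerProvider::builder()
        .with_simple_exporter(exporter.clone())
        .build();

    // Route `tracing` spans into OpenTelemetry for this test.
    let subscriber = tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("test")));
    let _guard = tracing::subscriber::set_default(subscriber);

    // ... spawn the server and issue a client request here ...

    let _ = provider.force_flush();
    let spans = exporter.get_finished_spans().unwrap();
    let trace_ids: std::collections::HashSet<_> =
        spans.iter().map(|s| s.span_context.trace_id()).collect();
    // Every recorded span should belong to one trace.
    assert_eq!(trace_ids.len(), 1);
}
```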
|
Hi, any updates on this one? Sorry to nag, but I'd really like to see associated traces when I run my app :) |
|
I rebased to fix the conflicts.
|
Hi @rread. Sorry about the conflicts. It all looks good, but the thread-local storage needs to go. It is fundamentally broken if there is the possibility of a yield point between the storage and the use. Basically: you store something in thread-local storage, then you yield, tokio schedules the task on another OS thread, and you get whatever is in that thread's storage slot, but not the original value. This will not show up in a test that just runs through, and it also probably won't show up with a single-threaded runtime. But for a server process that does a lot of stuff concurrently (whether on a single-threaded or multithreaded runtime) it will break.

Tokio is using such hidden state in a lot of places (I don't like this very much, but they do...), but what they use is task-local storage. I got a robot-generated branch where I am using this, and it works even in a heavily multithreaded test. https://docs.rs/tokio/latest/tokio/macro.task_local.html

Basically, as far as I can see there are two ways to solve this:

1. Replace the thread_local with tokio's task-local storage.
2. Pass the span context around explicitly.
Since this is an optional API that most people won't use, I would probably prefer 1. Then, if we get more state to drag around, we can think about 2 in the future. Here is some claude slop that shows the issue. Take from it what you want: https://github.com/n0-computer/irpc/tree/rklaehn/fix-thread-local |
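For concreteness, a minimal sketch of the task-local approach using the `tokio::task_local!` macro linked above (the names here are illustrative, not the PR's actual code):

```rust
use opentelemetry::Context;

tokio::task_local! {
    // One slot per task, so the value survives yields and migration
    // across worker threads.
    static REMOTE_SPAN_CONTEXT: Context;
}

async fn handle_request(extracted: Context) {
    REMOTE_SPAN_CONTEXT
        .scope(extracted, async {
            // Anywhere inside this future, the handler can read the
            // context and use it as the parent of its server span.
            REMOTE_SPAN_CONTEXT.with(|cx| {
                let _parent = cx.clone();
            });
        })
        .await;
}
```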
|
That was very helpful, thanks. I updated the patch to use task_local. I also added a concurrent test (confirmed it also failed with thread_local) and a new example to help folks use this. I also took the liberty of adding another line to the CI to check the new feature; hope that is ok. I see there are some new conflicts, so I will go resolve those. |
Adds a new `rpc_requests` feature `span_propagation` and a new crate feature `use-tracing-opentelemetry`. When enabled, the current span context is extracted and serialized with the message, and deserialized in `read_request_raw()` and saved in thread-local storage. The generated `RemoteService` creates a new span based on the message variant name, and sets the new span's parent to the extracted context.
Added OTEL setup:
- propagator
- tracer provider
* replace thread_local with task_local
* add concurrent integration test based on the sequential unit test
* add an example for how to use this
* update CI with a check for the tracing-opentelemetry feature
The current span context is extracted and serialized with the message, and deserialized in `read_request_raw()` and saved in thread-local storage. The generated `RemoteService` now creates a new span based on the message variant name, and sets the new span's parent to the extracted context.
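For context, a hedged sketch of that extract/inject flow using the standard opentelemetry propagator API (the function names here are illustrative, not the macro-generated code):

```rust
use std::collections::HashMap;

use opentelemetry::global;
use tracing::info_span;
use tracing_opentelemetry::OpenTelemetrySpanExt;

// Client side: serialize the current span's OTel context into a map
// that travels with the message.
fn inject_current_context() -> HashMap<String, String> {
    let mut carrier = HashMap::new();
    let cx = tracing::Span::current().context();
    global::get_text_map_propagator(|prop| prop.inject_context(&cx, &mut carrier));
    carrier
}

// Server side: rebuild the context and make it the parent of a new
// span named after the message variant.
fn attach_remote_context(carrier: &HashMap<String, String>, variant: &str) -> tracing::Span {
    let parent = global::get_text_map_propagator(|prop| prop.extract(carrier));
    let span = info_span!("request", otel.name = variant);
    span.set_parent(parent);
    span
}
```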
Fixes #71