fix: prevent OTel context leak in fire-and-forget background tasks#5168

Merged
cdoern merged 8 commits into llamastack:main from iamemilio:fix/otel-context-leak-fire-and-forget
Mar 20, 2026
Conversation

@iamemilio
Contributor

@iamemilio iamemilio commented Mar 16, 2026

What's the problem?

When you look at a trace in Jaeger, you expect it to show what happened during a single request. Instead, we found traces that looked like this during load testing:

  • A request that took 5 seconds showed a trace lasting 62 seconds
  • That trace contained 2,594 spans, including 334 database writes that belonged to completely different requests

The trace was essentially garbage -- you couldn't tell what actually happened during the request vs. what leaked in from other requests happening at the same time.

Why does this happen?

The server uses background worker tasks to write data to the database without blocking the API response. These workers are long-lived -- they start up once and process a shared queue forever.

The problem is how Python's asyncio.create_task works: it copies all context variables (including the OpenTelemetry trace context) at the moment the task is created. So whichever API request happens to first trigger worker creation permanently stamps its trace ID onto that worker. Every database write the worker processes from that point forward -- regardless of which request it came from -- gets attributed to that original request's trace.

Request A arrives → spawns worker → worker inherits trace A
Request B arrives → enqueues work → worker processes it under trace A  ← wrong!
Request C arrives → enqueues work → worker processes it under trace A  ← wrong!
...forever
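The copy-at-creation behavior can be reproduced with nothing but the standard library. The sketch below is illustrative (names like `trace_id`, `worker`, and `seen` are stand-ins, not the project's code): the worker task snapshots whatever context is active at `create_task` time and never observes later changes.

```python
import asyncio
import contextvars

# Stand-in for the OTel trace context, which is also stored in a ContextVar.
trace_id = contextvars.ContextVar("trace_id", default="none")
seen = []

async def worker(queue):
    while True:
        item = await queue.get()
        if item is None:
            return
        # The worker's context was frozen when create_task ran.
        seen.append((item, trace_id.get()))

async def main():
    queue = asyncio.Queue()
    trace_id.set("request-A")                  # request A's trace is active...
    task = asyncio.create_task(worker(queue))  # ...and gets copied into the worker here
    trace_id.set("request-B")                  # request B enqueues work later
    await queue.put("item-from-B")
    await queue.put(None)
    await task

asyncio.run(main())
print(seen)  # [('item-from-B', 'request-A')] -- B's work misattributed to A
```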

How does this fix it?

Two changes working together:

1. Workers start with a clean slate.
A new helper (create_task_with_detached_otel_context) creates the worker task with an empty trace context, so it doesn't permanently inherit any request's identity.

2. Each queue item carries its own trace context.
When a request enqueues work, it snapshots its current trace context and attaches it to the queue item. When the worker picks up that item, it temporarily activates the captured context for the duration of that work, then returns to a clean state before processing the next item.

Request A arrives → enqueues work with trace A context
Request B arrives → enqueues work with trace B context

Worker (no trace) → picks up item A → activates trace A → writes to DB → deactivates
                  → picks up item B → activates trace B → writes to DB → deactivates

The result: each database write shows up under the correct request's trace. No inflation, no cross-contamination.
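The per-item capture/activate pattern above can be sketched with a plain `ContextVar` standing in for the real OTel context (helper and variable names here are illustrative, not the PR's actual implementations):

```python
import asyncio
import contextvars

trace_id = contextvars.ContextVar("trace_id", default="none")
writes = []

def capture_context():
    # Snapshot the caller's trace context at enqueue time.
    return trace_id.get()

async def worker(queue):
    # In the real code this task is spawned detached, so it starts with no trace.
    while True:
        item = await queue.get()
        if item is None:
            return
        payload, ctx = item
        token = trace_id.set(ctx)                     # activate the captured context...
        try:
            writes.append((payload, trace_id.get()))  # ...DB write attributed correctly
        finally:
            trace_id.reset(token)                     # ...then back to a clean state

async def main():
    queue = asyncio.Queue()
    task = asyncio.create_task(worker(queue))
    for req in ("A", "B"):
        trace_id.set(f"request-{req}")  # each request is active when it enqueues
        await queue.put((f"write-from-{req}", capture_context()))
    await queue.put(None)
    await task

asyncio.run(main())
print(writes)  # [('write-from-A', 'request-A'), ('write-from-B', 'request-B')]
```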

What changed?

  • core/task.py (new) -- three utilities: create_task_with_detached_otel_context (start tasks clean), capture_otel_context (snapshot current context), activate_otel_context (temporarily restore a captured context)
  • inference_store.py -- queue items now carry the OTel context; workers activate it per-item before writing
  • openai_responses.py -- same pattern for the responses background worker
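A stdlib-only sketch of what "start tasks clean" can look like: running `asyncio.create_task` inside a brand-new empty `contextvars.Context` gives the task no inherited request state. (The real helper detaches only the OTel context via the OpenTelemetry API; this stand-in, with its hypothetical `create_detached_task` name, clears everything.)

```python
import asyncio
import contextvars

trace_id = contextvars.ContextVar("trace_id", default="detached")
result = {}

def create_detached_task(coro):
    # Create the task inside an empty Context so it inherits no
    # request-scoped variables at all.
    empty = contextvars.Context()
    return empty.run(asyncio.create_task, coro)

async def probe():
    result["seen"] = trace_id.get()  # runs in the empty context

async def main():
    trace_id.set("request-A")            # a request's trace is active...
    await create_detached_task(probe())  # ...but the detached task never sees it

asyncio.run(main())
print(result)  # {'seen': 'detached'}
```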

How is this tested?

14 new tests across three files:

  • test_task.py (9 tests) -- validates the primitives: detached tasks get clean context, captured context can be re-activated, context flows correctly through a queue, and two requests don't contaminate each other
  • test_inference_store.py (2 tests) -- end-to-end with a real SQLite-backed InferenceStore: simulates two API requests, lets the queue + workers process the writes, and asserts each write lands in the correct trace (this directly reproduces the original bug)
  • test_responses_background.py (3 tests) -- same validation for the responses worker, plus a test proving that error-handling DB writes (marking a response as failed) are also attributed to the correct trace

Test plan

  • All 14 new unit tests pass
  • All existing unit tests unaffected
  • Inference and Responses API tests that use in-memory OTel span collectors pass

@meta-cla bot added the CLA Signed label Mar 16, 2026
@mergify
Contributor

mergify bot commented Mar 16, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @iamemilio please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 16, 2026
asyncio.create_task copies all contextvars at creation time, which caused
long-lived worker tasks to permanently inherit the first request's trace
context. This inflated traces from 5s to 62s and attributed 334 unrelated
DB writes to a single request's trace.

Fix: Workers start with a detached (empty) OTel context via
create_task_with_detached_otel_context. Each queue item carries a snapshot
of the originating request's context (capture_otel_context), and workers
temporarily activate it per-item (activate_otel_context) so every DB write
is attributed to the correct request trace with no cross-contamination.

Applied to both InferenceStore and OpenAIResponsesImpl background workers.

Made-with: Cursor
@iamemilio force-pushed the fix/otel-context-leak-fire-and-forget branch from 88b220d to 11b1eb6 on March 16, 2026 17:51
@mergify mergify bot removed the needs-rebase label Mar 16, 2026
@iamemilio iamemilio marked this pull request as ready for review March 16, 2026 17:56
@iamemilio
Contributor Author

Context leak from 50 concurrent Responses create requests with background=True, before this patch. Note the 1,965 spans:
[screenshot: Jaeger trace with 1,965 spans, taken March 16, 2026]

@iamemilio
Contributor Author

After the patch, same load, all the spans are now the same length:
[screenshot: Jaeger traces after the patch, uniform span lengths, taken March 16, 2026]

mergify bot and others added 5 commits March 17, 2026 13:37
@jaideepr97
Copy link
Copy Markdown
Contributor

thanks for the detailed explanation in the PR description @iamemilio, it is super helpful
I don't have much experience or context in the observability domain so I enlisted the help of claude to help review/build context
it pointed out a few more instances of asyncio.create_task used during vector store operations that aren't fixed by this PR

would these also need to be updated to use your helper instead?

@iamemilio
Contributor Author

iamemilio commented Mar 18, 2026

I am not sure; it's hard to say without testing it, but maybe we can create an issue to test these other code locations as a follow-up action item, just so the responses work isn't blocked on it?

@jaideepr97
Contributor

jaideepr97 commented Mar 18, 2026

I am not sure; it's hard to say without testing it, but maybe we can create an issue to test these other code locations as a follow-up action item, just so the responses work isn't blocked on it?

yeah fair
I suppose those locations are not reachable from the responses API interface after all

@iamemilio
Contributor Author

Yeah, sounds good. Thanks for the leads though, that should be helpful to follow up on and clean up.

@iamemilio
Contributor Author

Follow up filed: #5205

Contributor

@jaideepr97 jaideepr97 left a comment


lgtm
thanks @iamemilio

@cdoern cdoern merged commit 20916be into llamastack:main Mar 20, 2026
71 checks passed
@cdoern
Collaborator

cdoern commented Mar 20, 2026

@Mergifyio backport release-0.6.x

@mergify
Contributor

mergify bot commented Mar 20, 2026

backport release-0.6.x

✅ Backports have been created

Details

Cherry-pick of 20916be has failed:

On branch mergify/bp/release-0.6.x/pr-5168
Your branch is up to date with 'origin/release-0.6.x'.

You are currently cherry-picking commit 20916bef.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   src/llama_stack/core/task.py
	modified:   src/llama_stack/providers/utils/inference/inference_store.py
	new file:   tests/unit/core/test_task.py
	modified:   tests/unit/providers/agents/meta_reference/test_responses_background.py
	modified:   tests/unit/utils/inference/test_inference_store.py

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   src/llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify bot pushed a commit that referenced this pull request Mar 20, 2026
…5168)

(commit message body is identical to the PR description above)
---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
(cherry picked from commit 20916be)

# Conflicts:
#	src/llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py
cdoern pushed a commit that referenced this pull request Mar 20, 2026
# What does this PR do?

Following PR description is generated using claude:

PR #5168 fixed OTel trace context leaking into background workers, but
`PROVIDER_DATA_VAR` — the `ContextVar` that carries authenticated user
identity — suffers from the same `asyncio.create_task` copy semantics.
When a background worker is spawned, it permanently inherits the
spawning request's `PROVIDER_DATA_VAR`, causing all subsequent DB writes
to be stamped with the wrong user's identity. In multi-tenant
deployments with auth enabled, this means:

- Chat completions written through the `InferenceStore` write queue get
attributed to whichever user's request first triggered worker creation,
breaking row-level access control via `AuthorizedSqlStore`.
- Responses processed through the `OpenAIResponsesImpl` background
worker pool run under the wrong user's identity, affecting status
updates, error handling, and stored response ownership.

This PR generalizes the OTel-only utilities from #5168 into a unified
`RequestContext` that captures **both** the OTel trace context and
`PROVIDER_DATA_VAR` together. The three helpers in `core/task.py` are
replaced:

| Before (#5168) | After (this PR) |
|---|---|
| `capture_otel_context()` | `capture_request_context()` — snapshots
OTel context **and** provider data |
| `activate_otel_context(ctx)` | `activate_request_context(ctx)` —
restores both per work-item |
| `create_task_with_detached_otel_context(coro)` |
`create_detached_background_task(coro)` — clears both before task
creation |

Both `InferenceStore` and `OpenAIResponsesImpl` are updated to capture a
`RequestContext` at enqueue time and activate it in the worker loop,
ensuring each work-item runs under the correct user identity and trace.
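The unified capture/activate semantics described above can be sketched with stdlib `ContextVar`s standing in for the real OTel context and `PROVIDER_DATA_VAR` (field types and internals here are assumptions; only the helper names follow the PR's tables):

```python
import contextvars
from dataclasses import dataclass
from typing import Any

# Stand-ins for the real variables; the actual OTel context and
# PROVIDER_DATA_VAR live elsewhere in the codebase.
otel_ctx = contextvars.ContextVar("otel_ctx", default=None)
provider_data = contextvars.ContextVar("provider_data", default=None)

@dataclass(frozen=True)
class RequestContext:
    otel: Any
    provider: Any

def capture_request_context() -> RequestContext:
    # Snapshot both vars at enqueue time.
    return RequestContext(otel=otel_ctx.get(), provider=provider_data.get())

class activate_request_context:
    # Restore both vars for one work item, then undo on exit.
    def __init__(self, ctx: RequestContext) -> None:
        self.ctx = ctx
    def __enter__(self) -> None:
        self._otel_token = otel_ctx.set(self.ctx.otel)
        self._provider_token = provider_data.set(self.ctx.provider)
    def __exit__(self, *exc) -> None:
        provider_data.reset(self._provider_token)
        otel_ctx.reset(self._otel_token)

# Request A enqueues: snapshot taken while A's trace and identity are active.
otel_ctx.set("trace-A")
provider_data.set({"user": "alice"})
snapshot = capture_request_context()

# By the time the worker runs the item, another request may be active.
otel_ctx.set("trace-B")
provider_data.set({"user": "bob"})

with activate_request_context(snapshot):
    restored = (otel_ctx.get(), provider_data.get())
print(restored)  # ('trace-A', {'user': 'alice'})
```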

Closes #5221

## Test Plan

- **`tests/unit/core/test_task.py`** (10 tests): Verifies
`RequestContext` capture/activate semantics, detached task isolation for
both OTel and `PROVIDER_DATA_VAR`, caller context restoration,
queue-based propagation patterns, and cross-contamination prevention.
- **`tests/unit/utils/inference/test_provider_data_leak.py`** (1 test):
Reproduces the `InferenceStore` write queue leak end-to-end — two users
store completions through the async queue, then verifies each user can
only see their own completions via `AuthorizedSqlStore` access policies.
This test fails without the fix.
- **`tests/unit/providers/agents/builtin/test_responses_background.py`**
(6 new tests):
- `TestResponsesOtelContextPropagation` (3 tests): Verifies OTel trace
attribution through the responses background worker — each response is
processed under its originating request's trace, contexts don't leak
between items, and error handlers run under the correct trace.
- `TestResponsesProviderDataPropagation` (3 tests): Verifies user
identity propagation — each response runs as the correct user, identity
doesn't leak between queue items, and error-handling DB writes use the
correct user.

---------

Signed-off-by: Jaideep Rao <jrao@redhat.com>
mergify bot pushed a commit that referenced this pull request Mar 23, 2026
(commit message body is identical to the commit above)
(cherry picked from commit 9b86ce8)

# Conflicts:
#	src/llama_stack/core/task.py
#	src/llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py
#	src/llama_stack/providers/utils/inference/inference_store.py
#	tests/unit/core/test_task.py
#	tests/unit/providers/agents/builtin/test_responses_background.py

Labels

CLA Signed (managed by the Meta Open Source bot)


4 participants