
fix: prevent OTel context leak in fire-and-forget background tasks (backport #5168)#5228

Open
mergify[bot] wants to merge 1 commit into release-0.6.x from mergify/bp/release-0.6.x/pr-5168

Conversation


mergify[bot] commented Mar 20, 2026

## What's the problem?

When you look at a trace in Jaeger, you expect it to show what happened during a single request. Instead, during load testing we found traces that looked like this:

- A request that took **5 seconds** showed a trace lasting **62 seconds**
- That trace contained **2,594 spans**, including **334 database writes that belonged to completely different requests**

The trace was essentially garbage -- you couldn't tell what actually happened during the request vs. what leaked in from other requests happening at the same time.

## Why does this happen?

The server uses background worker tasks to write data to the database without blocking the API response. These workers are long-lived -- they start up once and process a shared queue forever.

The problem is how Python's `asyncio.create_task` works: it copies all context variables (including the OpenTelemetry trace context) at the moment the task is created. So whichever API request happens to **first** trigger worker creation permanently stamps its trace ID onto that worker. Every database write the worker processes from that point forward -- regardless of which request it came from -- gets attributed to that original request's trace.

```
Request A arrives → spawns worker → worker inherits trace A
Request B arrives → enqueues work → worker processes it under trace A  ← wrong!
Request C arrives → enqueues work → worker processes it under trace A  ← wrong!
...forever
```
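
For illustration, here is a minimal, self-contained repro of that snapshot behavior using a plain `contextvars` variable (hypothetical names, not code from this PR):

```python
import asyncio
import contextvars

# Stand-in for the OTel trace context, which is also a ContextVar under the hood.
request_trace = contextvars.ContextVar("request_trace", default="none")

async def worker():
    # The worker sees whatever value was current when create_task ran --
    # "trace-A" here -- even if the creator changes it afterwards.
    print("worker sees:", request_trace.get())

async def main():
    request_trace.set("trace-A")           # request A's context is current
    task = asyncio.create_task(worker())   # context variables are copied *now*
    request_trace.set("trace-B")           # too late: the task keeps trace-A
    await task

asyncio.run(main())                        # prints: worker sees: trace-A
```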

## How does this fix it?

Two changes working together:

**1. Workers start with a clean slate.**
A new helper (`create_task_with_detached_otel_context`) creates the worker task with an empty trace context, so it doesn't permanently inherit any request's identity.
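
A minimal sketch of how such a helper could be built on the `opentelemetry-api` context primitives -- the PR's actual implementation may differ in the details:

```python
import asyncio
from typing import Any, Coroutine

from opentelemetry import context as otel_context

def create_task_with_detached_otel_context(
    coro: Coroutine[Any, Any, Any],
) -> asyncio.Task:
    """Run `coro` as a task whose OTel context starts out empty."""

    async def _detached() -> Any:
        # Attach a fresh, empty Context so no request's span is "current"
        # inside this task, no matter which request created it.
        token = otel_context.attach(otel_context.Context())
        try:
            return await coro
        finally:
            otel_context.detach(token)

    return asyncio.create_task(_detached())
```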

**2. Each queue item carries its own trace context.**
When a request enqueues work, it snapshots its current trace context and attaches it to the queue item. When the worker picks up that item, it temporarily activates the captured context for the duration of that work, then returns to a clean state before processing the next item.
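
The snapshot/restore pair might look roughly like this, again as a sketch over `opentelemetry.context` rather than the verbatim code:

```python
from contextlib import contextmanager
from typing import Iterator

from opentelemetry import context as otel_context

def capture_otel_context() -> otel_context.Context:
    """Snapshot the caller's current OTel context (e.g. at enqueue time)."""
    return otel_context.get_current()

@contextmanager
def activate_otel_context(captured: otel_context.Context) -> Iterator[None]:
    """Temporarily make a captured context current, then restore."""
    token = otel_context.attach(captured)
    try:
        yield
    finally:
        otel_context.detach(token)
```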

```
Request A arrives → enqueues work with trace A context
Request B arrives → enqueues work with trace B context

Worker (no trace) → picks up item A → activates trace A → writes to DB → deactivates
                  → picks up item B → activates trace B → writes to DB → deactivates
```
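
Putting the pieces together, the enqueue/worker pattern above could look like this (hypothetical names, using the helpers sketched earlier; the real code lives in `inference_store.py` and `openai_responses.py`):

```python
import asyncio

queue: asyncio.Queue = asyncio.Queue()

async def enqueue_write(record) -> None:
    # Called from the request path: snapshot the caller's trace context
    # and ship it alongside the payload.
    await queue.put((record, capture_otel_context()))

async def worker_loop(write_to_db) -> None:
    # Started once via create_task_with_detached_otel_context(worker_loop(...)).
    while True:
        record, ctx = await queue.get()
        with activate_otel_context(ctx):
            await write_to_db(record)  # spans attach to the right trace
        queue.task_done()              # back to a clean slate for the next item
```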

The result: each database write shows up under the correct request's trace. No inflation, no cross-contamination.

## What changed?

| File | What it does |
|------|--------------|
| `core/task.py` (new) | Three utilities: `create_task_with_detached_otel_context` (start tasks clean), `capture_otel_context` (snapshot current context), `activate_otel_context` (temporarily restore a captured context) |
| `inference_store.py` | Queue items now carry the OTel context; workers activate it per-item before writing |
| `openai_responses.py` | Same pattern for the responses background worker |

## How is this tested?

**14 new tests** across three files:

- **`test_task.py`** (9 tests) -- validates the primitives: detached tasks get a clean context, captured context can be re-activated, context flows correctly through a queue, and two requests don't contaminate each other (see the sketch after this list)
- **`test_inference_store.py`** (2 tests) -- end-to-end with a real SQLite-backed InferenceStore: simulates two API requests, lets the queue + workers process the writes, and asserts each write lands in the correct trace (this directly reproduces the original bug)
- **`test_responses_background.py`** (3 tests) -- same validation for the responses worker, plus a test proving that error-handling DB writes (marking a response as failed) are also attributed to the correct trace
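
For a flavor of the primitive tests, the "detached tasks get a clean context" property could be checked with something shaped like this (assumes `pytest-asyncio`, a configured SDK `TracerProvider`, and the helper sketched earlier -- not the PR's verbatim tests):

```python
import pytest
from opentelemetry import trace

@pytest.mark.asyncio
async def test_detached_task_gets_clean_context():
    tracer = trace.get_tracer(__name__)

    async def probe() -> bool:
        # With no active context, get_current_span() yields an invalid span.
        return trace.get_current_span().get_span_context().is_valid

    with tracer.start_as_current_span("request-a"):
        leaked = await create_task_with_detached_otel_context(probe())

    assert leaked is False  # the detached task never saw request-a's span
```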

## Test plan

- [x] All 14 new unit tests pass
- [x] All existing unit tests unaffected
- [x] Inference and Responses API tests that use in-memory OTel span collectors pass

…5168)

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
(cherry picked from commit 20916be)

# Conflicts:
#	src/llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py

mergify[bot] commented Mar 20, 2026

Cherry-pick of 20916be has failed:

```
On branch mergify/bp/release-0.6.x/pr-5168
Your branch is up to date with 'origin/release-0.6.x'.

You are currently cherry-picking commit 20916bef.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   src/llama_stack/core/task.py
	modified:   src/llama_stack/providers/utils/inference/inference_store.py
	new file:   tests/unit/core/test_task.py
	modified:   tests/unit/providers/agents/meta_reference/test_responses_background.py
	modified:   tests/unit/utils/inference/test_inference_store.py

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   src/llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py
```

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
