fix: provider_data_var context leak by jaideepr97 · Pull Request #5227 · llamastack/llama-stack

jaideepr97 · 2026-03-20T14:43:48Z

What does this PR do?

The following PR description was generated using Claude:

PR #5168 fixed OTel trace context leaking into background workers, but `PROVIDER_DATA_VAR`, the `ContextVar` that carries authenticated user identity, is subject to the same `asyncio.create_task` context-copy semantics. When a background worker is spawned, it permanently inherits the spawning request's `PROVIDER_DATA_VAR`, so every subsequent DB write it performs is stamped with the wrong user's identity. In multi-tenant deployments with auth enabled, this means:

- Chat completions written through the `InferenceStore` write queue get attributed to whichever user's request first triggered worker creation, breaking row-level access control via `AuthorizedSqlStore`.
- Responses processed through the `OpenAIResponsesImpl` background worker pool run under the wrong user's identity, affecting status updates, error handling, and stored response ownership.
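The copy semantics are easy to see in isolation. The sketch below is a minimal illustration, not code from this PR: `PROVIDER_DATA` is a hypothetical stand-in for `PROVIDER_DATA_VAR`, and `worker` stands in for a long-lived background worker. `asyncio.create_task` copies the caller's current `contextvars` context at creation time, so the worker keeps seeing `"alice"` even after the caller's value has moved on to `"bob"`.

```python
import asyncio
import contextvars

# Hypothetical stand-in for PROVIDER_DATA_VAR (carries the authenticated user).
PROVIDER_DATA = contextvars.ContextVar("provider_data", default=None)


async def main():
    PROVIDER_DATA.set("alice")  # request A is being served
    observed = []
    go = asyncio.Event()

    async def worker():
        # Runs later, but reads from the context copied at create_task() time.
        await go.wait()
        observed.append(PROVIDER_DATA.get())

    task = asyncio.create_task(worker())  # snapshot of the current context taken here
    PROVIDER_DATA.set("bob")  # the caller's context moves on (e.g. request B)
    go.set()
    await task
    return observed, PROVIDER_DATA.get()


print(asyncio.run(main()))  # (['alice'], 'bob')
```

A worker that lives for the whole process and handles many users' items is therefore pinned to whichever identity was current when it was created.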

This PR generalizes the OTel-only utilities from #5168 into a unified `RequestContext` that captures **both** the OTel trace context and `PROVIDER_DATA_VAR` together. The three helpers in `core/task.py` are replaced:

| Before (#5168) | After (this PR) | Behavior |
|---|---|---|
| `capture_otel_context()` | `capture_request_context()` | Snapshots the OTel context **and** provider data |
| `activate_otel_context(ctx)` | `activate_request_context(ctx)` | Restores both per work-item |
| `create_task_with_detached_otel_context(coro)` | `create_detached_background_task(coro)` | Clears both before task creation |
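One way the three helpers could fit together is sketched below. This is an illustrative approximation, not the code merged in this PR: to stay runnable without OpenTelemetry installed, `TRACE_CTX` is a hypothetical `ContextVar` standing in for the OTel context (real code would go through `opentelemetry.context.get_current()`, `attach()` and `detach()`), `PROVIDER_DATA` stands in for `PROVIDER_DATA_VAR`, and the `RequestContext` field names are assumptions.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Coroutine, Iterator, Optional

# Hypothetical stand-ins for the OTel context and PROVIDER_DATA_VAR.
TRACE_CTX = contextvars.ContextVar("trace_ctx", default=None)
PROVIDER_DATA = contextvars.ContextVar("provider_data", default=None)


@dataclass(frozen=True)
class RequestContext:
    """Snapshot of the per-request values a background worker must honor."""
    trace: Optional[str]
    provider_data: Optional[str]


def capture_request_context() -> RequestContext:
    # Called on the request path, e.g. at enqueue time.
    return RequestContext(trace=TRACE_CTX.get(), provider_data=PROVIDER_DATA.get())


@contextmanager
def activate_request_context(ctx: RequestContext) -> Iterator[None]:
    # Install the snapshot for one work-item, then restore the previous values.
    trace_token = TRACE_CTX.set(ctx.trace)
    data_token = PROVIDER_DATA.set(ctx.provider_data)
    try:
        yield
    finally:
        PROVIDER_DATA.reset(data_token)
        TRACE_CTX.reset(trace_token)


def create_detached_background_task(coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
    # Clear both values so the task's copied context starts empty,
    # then restore the caller's values once the task exists.
    trace_token = TRACE_CTX.set(None)
    data_token = PROVIDER_DATA.set(None)
    try:
        return asyncio.create_task(coro)
    finally:
        PROVIDER_DATA.reset(data_token)
        TRACE_CTX.reset(trace_token)


async def demo():
    TRACE_CTX.set("trace-alice")
    PROVIDER_DATA.set("alice")
    snapshot = capture_request_context()

    async def background():
        return TRACE_CTX.get(), PROVIDER_DATA.get()

    detached = await create_detached_background_task(background())
    with activate_request_context(RequestContext("trace-bob", "bob")):
        inside = (TRACE_CTX.get(), PROVIDER_DATA.get())
    after = (TRACE_CTX.get(), PROVIDER_DATA.get())
    return snapshot, detached, inside, after


print(asyncio.run(demo()))
# (RequestContext(trace='trace-alice', provider_data='alice'), (None, None), ('trace-bob', 'bob'), ('trace-alice', 'alice'))
```

The set-then-reset sequence in `create_detached_background_task` works because the new task copies the context when it is created, so restoring the caller's values afterwards does not reach the task. On Python 3.11+, `asyncio.create_task(coro, context=contextvars.Context())` is an alternative that starts the task in an empty context, though it also drops every other `ContextVar`, which a caller may not want.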

Both `InferenceStore` and `OpenAIResponsesImpl` are updated to capture a `RequestContext` at enqueue time and activate it in the worker loop, ensuring each work-item runs under the correct user identity and trace.
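The enqueue/worker pattern can be sketched with a toy write queue. This is a hedged approximation rather than the actual `InferenceStore` or `OpenAIResponsesImpl` code: `WriteQueue`, `handle_request`, `TRACE_CTX` and `PROVIDER_DATA` are hypothetical names, and the helpers are cut down to the minimum needed. Each queue item carries the `RequestContext` captured on the request path, and the worker activates it around that single item.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical stand-ins for the OTel trace context and PROVIDER_DATA_VAR.
TRACE_CTX = contextvars.ContextVar("trace_ctx", default=None)
PROVIDER_DATA = contextvars.ContextVar("provider_data", default=None)


@dataclass(frozen=True)
class RequestContext:
    trace: Optional[str]
    provider_data: Optional[str]


def capture_request_context() -> RequestContext:
    return RequestContext(TRACE_CTX.get(), PROVIDER_DATA.get())


@contextmanager
def activate_request_context(ctx: RequestContext) -> Iterator[None]:
    trace_token = TRACE_CTX.set(ctx.trace)
    data_token = PROVIDER_DATA.set(ctx.provider_data)
    try:
        yield
    finally:
        PROVIDER_DATA.reset(data_token)
        TRACE_CTX.reset(trace_token)


class WriteQueue:
    """Toy write queue: capture context at enqueue time, activate it per item."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.worker: Optional[asyncio.Task] = None
        self.written: list = []  # (row, trace, user) as seen by the worker

    async def _run(self) -> None:
        while True:
            row, ctx = await self.queue.get()
            try:
                with activate_request_context(ctx):
                    # A real DB write here would be stamped with the enqueuing request's identity.
                    self.written.append((row, TRACE_CTX.get(), PROVIDER_DATA.get()))
            finally:
                self.queue.task_done()

    async def enqueue(self, row: str) -> None:
        if self.worker is None:
            self.worker = asyncio.create_task(self._run())
        # Snapshot trace + identity on the request path, alongside the payload.
        await self.queue.put((row, capture_request_context()))


async def handle_request(store: WriteQueue, user: str, trace: str, row: str) -> None:
    TRACE_CTX.set(trace)  # what tracing/auth middleware would set per request
    PROVIDER_DATA.set(user)
    await store.enqueue(row)


async def main() -> list:
    store = WriteQueue()
    # Each request runs in its own task, as a web server typically arranges.
    await asyncio.create_task(handle_request(store, "alice", "trace-a", "alice-row"))
    await asyncio.create_task(handle_request(store, "bob", "trace-b", "bob-row"))
    await store.queue.join()
    store.worker.cancel()
    return store.written


print(asyncio.run(main()))
# [('alice-row', 'trace-a', 'alice'), ('bob-row', 'trace-b', 'bob')]
```

The worker here is started with a plain `asyncio.create_task`, so while idle it still holds the first request's values; per-item activation is what makes each write correct, and starting the worker detached additionally keeps any code outside `activate_request_context` from running under a stale identity.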

Closes #5221

Test Plan

- **`tests/unit/core/test_task.py`** (10 tests): Verifies `RequestContext` capture/activate semantics, detached task isolation for both OTel and `PROVIDER_DATA_VAR`, caller context restoration, queue-based propagation patterns, and cross-contamination prevention.
- **`tests/unit/utils/inference/test_provider_data_leak.py`** (1 test): Reproduces the `InferenceStore` write queue leak end-to-end: two users store completions through the async queue, and the test then verifies that each user can see only their own completions via `AuthorizedSqlStore` access policies. This test fails without the fix.
- **`tests/unit/providers/agents/builtin/test_responses_background.py`** (6 new tests):
  - `TestResponsesOtelContextPropagation` (3 tests): Verifies OTel trace attribution through the responses background worker: each response is processed under its originating request's trace, contexts don't leak between items, and error handlers run under the correct trace.
  - `TestResponsesProviderDataPropagation` (3 tests): Verifies user identity propagation: each response runs as the correct user, identity doesn't leak between queue items, and error-handling DB writes use the correct user.
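The end-to-end check in `test_provider_data_leak.py` can be approximated in miniature: write rows stamped with the current user, then read them back filtered by owner. `OwnedStore` below is a hypothetical toy in place of `AuthorizedSqlStore`, and the `propagate` flag toggles per-item identity activation so the same scenario can be run with and without the fix. It is a sketch of the idea, not the PR's test.

```python
import asyncio
import contextvars

# Hypothetical stand-in for PROVIDER_DATA_VAR.
PROVIDER_DATA = contextvars.ContextVar("provider_data", default=None)


class OwnedStore:
    """Toy stand-in for AuthorizedSqlStore: rows are stamped with the current
    user on write and filtered by owner on read."""

    def __init__(self) -> None:
        self.rows: list = []  # (owner, completion_id)

    def insert(self, completion_id: str) -> None:
        self.rows.append((PROVIDER_DATA.get(), completion_id))

    def list_for(self, user: str) -> list:
        return [cid for owner, cid in self.rows if owner == user]


async def run_scenario(propagate: bool) -> OwnedStore:
    store = OwnedStore()
    queue: asyncio.Queue = asyncio.Queue()
    started: list = []

    async def worker() -> None:
        while True:
            completion_id, user = await queue.get()
            # With the fix, activate the enqueuing user's identity for this item only.
            token = PROVIDER_DATA.set(user) if propagate else None
            try:
                store.insert(completion_id)
            finally:
                if token is not None:
                    PROVIDER_DATA.reset(token)
                queue.task_done()

    async def request(user: str, completion_id: str) -> None:
        PROVIDER_DATA.set(user)
        if not started:  # the first request lazily spawns the worker
            started.append(asyncio.create_task(worker()))
        await queue.put((completion_id, PROVIDER_DATA.get()))  # capture at enqueue

    await asyncio.create_task(request("alice", "chatcmpl-a"))
    await asyncio.create_task(request("bob", "chatcmpl-b"))
    await queue.join()
    started[0].cancel()
    return store


for propagate in (False, True):
    store = asyncio.run(run_scenario(propagate))
    print(propagate, store.list_for("alice"), store.list_for("bob"))
# False ['chatcmpl-a', 'chatcmpl-b'] []
# True ['chatcmpl-a'] ['chatcmpl-b']
```

Without propagation, both completions are owned by the user whose request spawned the worker, so the other user sees nothing; with it, each user sees exactly their own row.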

mergify · 2026-03-20T15:00:59Z

This pull request has merge conflicts that must be resolved before it can be merged. @jaideepr97 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Jaideep Rao <jrao@redhat.com>

iamemilio · 2026-03-20T16:12:45Z

LGTM. This was a good catch. Thanks!

cdoern · 2026-03-20T16:29:12Z

src/llama_stack/core/task.py

-    request happened to spawn them. This inflates trace durations and bundles
-    unrelated DB operations under the wrong trace.
+@dataclass
+class RequestContext:


hmmmm looking at this, I wonder if this would've been useful to be in the API pkg if used by providers... not something to change in this PR though.

jaideepr97 · 2026-03-20T17:21:03Z

@Mergifyio backport release-0.6.x

mergify · 2026-03-20T17:21:11Z

backport release-0.6.x

☑️ Command disallowed due to command restrictions in the Mergify configuration.

sender-permission >= write

jaideepr97 · 2026-03-23T12:58:12Z

@cdoern please backport this

leseb · 2026-03-23T13:33:38Z

@Mergifyio backport release-0.6.x

mergify · 2026-03-23T13:34:02Z

backport release-0.6.x

✅ Backports have been created

#5247 fix: provider_data_var context leak (backport #5227) has been created for branch release-0.6.x but encountered conflicts

Cherry-pick of 9b86ce8 has failed:

On branch mergify/bp/release-0.6.x/pr-5227
Your branch is up to date with 'origin/release-0.6.x'.

You are currently cherry-picking commit 9b86ce80.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   tests/unit/utils/inference/test_provider_data_leak.py

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	deleted by us:   src/llama_stack/core/task.py
	both modified:   src/llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py
	both modified:   src/llama_stack/providers/utils/inference/inference_store.py
	deleted by us:   tests/unit/core/test_task.py
	deleted by us:   tests/unit/providers/agents/builtin/test_responses_background.py

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Backport of commit 9b86ce8 from main to release-0.6.x.

`PROVIDER_DATA_VAR` (the `ContextVar` that carries authenticated user identity) leaks through `asyncio.create_task` copy semantics into long-lived background workers. When a background worker is spawned, it permanently inherits the spawning request's `PROVIDER_DATA_VAR`, causing all subsequent DB writes to be stamped with the wrong user's identity.

This introduces a unified `RequestContext` in `core/task.py` that captures both the OTel trace context and `PROVIDER_DATA_VAR` together. Background workers in `InferenceStore` and `OpenAIResponsesImpl` now capture context at enqueue time and re-activate it per work-item, ensuring each operation runs under the correct user identity and trace.

Adapted for the release-0.6.x directory structure (`meta_reference` paths instead of `builtin`).

Signed-off-by: Jaideep Rao <jrao@redhat.com>
Made-with: Cursor

jaideepr97 requested review from ashwinb, bbrowning, cdoern, ehhuang, franciscojavierarceo, leseb, mattf and raghotham as code owners March 20, 2026 14:43

meta-cla bot added the CLA Signed label Mar 20, 2026

mergify bot added the needs-rebase label Mar 20, 2026

fix provider_data_var context leak

793ad1f

Signed-off-by: Jaideep Rao <jrao@redhat.com>

jaideepr97 force-pushed the provider-data-leak branch from 7fa11d5 to 793ad1f Compare March 20, 2026 15:10

mergify bot removed the needs-rebase label Mar 20, 2026

empty commit

047fa0f

Signed-off-by: Jaideep Rao <jrao@redhat.com>

cdoern approved these changes Mar 20, 2026

cdoern merged commit 9b86ce8 into llamastack:main Mar 20, 2026
73 checks passed

mergify bot mentioned this pull request Mar 23, 2026

fix: provider_data_var context leak (backport #5227) #5247

Closed

jaideepr97 mentioned this pull request Mar 23, 2026

fix: provider_data_var context leak (backport #5227) #5250

Open

cdoern merged 2 commits into llamastack:main from jaideepr97:provider-data-leak
