[pull] main from danny-avila:main #112

Merged
pull[bot] merged 6 commits into innFactory:main from danny-avila:main
Apr 22, 2026
Conversation


pull[bot] commented Apr 22, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


danny-avila and others added 6 commits April 21, 2026 05:44
* 🔒 fix: Validate MCP OAuth Protected Resource Metadata binding (GHSA-gvpj-vm2f-2m23)

RFC 9728 §3.3/§7.3 requires clients to verify that the `resource` identifier
advertised by an OAuth Protected Resource Metadata document matches the URL
used to fetch it. Without this check, a malicious MCP server can serve
metadata pointing at a legitimate server's `authorization_servers`, causing
LibreChat to obtain an access token for the real server and send it to the
attacker in subsequent API calls.

Validation happens at discovery time so the entire metadata document is
discarded on mismatch — `authorization_servers` on a spoofed document is
equally untrustworthy and is the primary theft vector in the PoC.

Uses the MCP SDK's own `checkResourceAllowed` (origin + path-prefix) for
semantic parity with `selectResourceURL`, an SDK code path LibreChat bypasses.
This is looser than RFC-strict equality (handles common cases like
`/mcp/sse` server vs `/mcp` advertised resource, or trailing-slash
normalization) while still rejecting cross-origin spoofs and same-origin
sibling-path confusion.
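The origin + path-prefix binding can be sketched roughly as below. This is a hypothetical illustration of the semantics described above — the real implementation delegates to the MCP SDK's `checkResourceAllowed`, whose exact rules may differ:

```javascript
// Hypothetical sketch: does the URL we fetched the server from fall under
// the resource identifier the metadata document advertises?
function isResourceBoundToServer(advertisedResource, serverUrl) {
  const resource = new URL(advertisedResource);
  const server = new URL(serverUrl);
  // Cross-origin spoofs are rejected outright.
  if (resource.origin !== server.origin) {
    return false;
  }
  // Same-origin: the server URL must live at or under the advertised
  // resource path, so a /mcp/sse server is covered by an /mcp advertised
  // resource, while a same-origin sibling path like /admin is not.
  const normalize = (p) => (p.endsWith('/') ? p : p + '/');
  return normalize(server.pathname).startsWith(normalize(resource.pathname));
}
```

This is looser than strict string equality but still rejects the cross-origin and sibling-path cases called out above.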

* 🛡️ fix: Re-validate Resource Binding at MCP OAuth Token Exchange

Adds a defense-in-depth re-assertion of the RFC 9728 §3.3 resource/server
binding inside `completeOAuthFlow`. Flows have a 10-minute TTL, so a flow
initiated under pre-fix (vulnerable) code could still be in-flight at upgrade
time carrying unvalidated resource metadata. Re-checking here closes that
window without requiring operators to flush flow state on deploy.

Also guards against future regressions that might re-introduce unvalidated
paths into the flow-metadata pipeline (GHSA-gvpj-vm2f-2m23).

* ✅ test: Address Review Findings on MCP OAuth Resource Validation

Follow-up to GHSA-gvpj-vm2f-2m23 fix. Resolves three reviewer findings:

- Assert `failFlow` is called when `completeOAuthFlow` rejects at
  re-validation — locks in the "no stuck PENDING entry" guarantee that the
  catch block already provides in production code.
- Update the "no resource metadata" warning in `initiateOAuthFlow`: post-fix,
  that branch is only reachable when PRM discovery returned nothing (404,
  network error, server without RFC 9728). A document with a missing
  `resource` field now throws earlier in `assertResourceBoundToServer`, so
  the old "missing 'resource' property" phrasing described a case that can
  no longer reach this branch.
- Add a test for an unparseable `resource` string triggering the
  error-wrapping path in `assertResourceBoundToServer` (verifies the wrapper
  surfaces a descriptive message instead of leaking a raw `TypeError:
  Invalid URL` from the SDK's `new URL()` call).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…12763)

* 📦 chore: Bump @modelcontextprotocol/sdk to v1.29.0

* ♻️ refactor: Extract WWW-Authenticate Probe Helper for MCP OAuth

* 🔐 fix: Prefer WWW-Authenticate resource_metadata Hint for MCP OAuth

Per RFC 9728 §5.1, the `resource_metadata=<url>` parameter in a 401
`WWW-Authenticate: Bearer` challenge is the authoritative protected-resource
metadata source. Path-aware `.well-known` discovery was winning over the
hint, so split deployments that serve valid-but-wrong metadata at the
path-aware endpoint stranded OAuth at defunct authorization servers.

Threads the hint through `discoverOAuthProtectedResourceMetadata` via
`opts.resourceMetadataUrl` in both startup detection and the OAuth handler,
matching the behavior of Claude Desktop, the MCP Inspector, OpenAI tooling,
and Microsoft Copilot Studio.

Fixes #12761.

* 🧵 fix: Thread OAuth-Aware fetchFn Through Resource-Metadata Probe

Without this, admin-configured `oauthHeaders` (e.g. a gateway API key that
fronts the MCP endpoint) were stripped from the probe, causing the gateway
to 401 for the wrong reason and masking the real `WWW-Authenticate` hint.

The helper now accepts a FetchLike and defaults to global fetch, so the
startup detection path is unchanged while the handler passes its OAuth-
aware wrapper through.

* 🧹 refactor: Address MCP OAuth Probe Review Findings

- Thread `fetchFn` through `probeResourceMetadataHint` so admin-configured
  `oauthHeaders` reach the probe (a gateway API key that fronts the MCP
  endpoint would otherwise 401 us for the wrong reason and hide the real
  Bearer challenge).
- Skip the redundant HEAD request in `checkAuthErrorFallback` when the
  probe already observed a 401/403; fall back to a fresh HEAD only when
  every probe attempt threw (transient network error).
- Narrow the oauth barrel: drop `export * from './resourceHint'` so the
  helper stays an internal module.
- Add `scope` extraction coverage (`Bearer scope="read write"`) and a
  403-only observation path; isolate `MCP_OAUTH_ON_AUTH_ERROR=true` in a
  dedicated suite so precise-outcome tests aren't muddied by the safety net.

* ✅ fix: Use Zod Schema in MCP Reconnection-Storm Test Tool

MCP SDK 1.28 tightened `McpServer.tool()` to require Zod schemas instead
of plain JSON-Schema objects. Swap the `{ message: { type: 'string' } }`
shape for `z.string()` so the fixture server spins up under SDK 1.29.

* 🛡️ fix: Harden MCP OAuth resource_metadata Hint Against SSRF

The `resource_metadata` URL is echoed from an untrusted MCP server, so
handing it straight to the SDK lets a malicious server redirect discovery
at private IPs, the cloud metadata service, or any host the admin did not
intend to reach. Caught by the Copilot review on #12763.

- `handler.ts`: run the hint through the same `validateOAuthUrl` /
  `allowedDomains` gate that already guards the authorization-server URL;
  drop it and fall back to path-aware discovery on rejection.
- `detectOAuth.ts`: no admin-scoped allowedDomains here, so apply a
  strict `isSSRFTarget` + DNS resolution check and silently discard any
  hint pointing at a private/loopback/metadata address.
- Tests cover both the hostname-list and DNS-resolution rejection paths
  and assert the SDK falls back to path-aware discovery unharmed.

* 🧪 test: Mock ~/auth in fallback Suite for Consistency

Matches the main `detectOAuth.test.ts` mock so the SSRF guards added in the
previous commit don't touch the real `~/auth` module at test time.

* 🔍 fix: Scope OAuth Fallback to HEAD + Parse Multi-Scheme WWW-Authenticate

Two codex findings on #12763:

- **P1**: the merged `authChallenge` flag was letting POST-only 401/403 flip
  the `MCP_OAUTH_ON_AUTH_ERROR` fallback, misclassifying WAF/CSRF-hardened
  endpoints (HEAD 200 + POST 403) as OAuth-required. Rename to
  `headAuthChallenge` and derive it only from the HEAD probe, matching the
  legacy fallback's HEAD-only semantics. Add a regression test.
- **P2**: the SDK's `extractWWWAuthenticateParams` only inspects the first
  scheme token, so multi-scheme headers like
  `Basic realm="api", Bearer resource_metadata="..."` silently dropped the
  authoritative Bearer hint. Fall back to a regex across the full header
  when the SDK returns nothing but Bearer is present. Add a regression
  test covering the multi-scheme case.
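The P2 fallback can be sketched as a whole-header regex pass; the function name and exact pattern here are illustrative, not the LibreChat implementation:

```javascript
// Scan the full WWW-Authenticate header for a Bearer resource_metadata
// parameter, so a multi-scheme header like
//   Basic realm="api", Bearer resource_metadata="https://rs.example/prm"
// still yields the hint even when a first-scheme-only parser drops it.
function extractResourceMetadataHint(wwwAuthenticate) {
  if (!wwwAuthenticate) {
    return undefined;
  }
  const match = /\bBearer\b[^]*?resource_metadata="([^"]+)"/i.exec(wwwAuthenticate);
  return match ? match[1] : undefined;
}
```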

* 🧽 refactor: Tighten MCP OAuth Probe Semantics

Addresses the second external review pass plus codex P2:

- Merge the two stacked JSDoc blocks on `probeResourceMetadataHint` into one
  with a proper `@returns` section.
- Only short-circuit HEAD when it delivered the `resource_metadata` hint
  itself — a Bearer-without-params HEAD now lets POST run, since some
  servers surface their hint only on POST and we were missing it.
- Drop the unused `scope` field from `ResourceHintProbeResult`; no caller
  read it, and YAGNI beats a reserved field.
- Remove the redundant `OAUTH_ON_AUTH_ERROR` guard inside
  `checkAuthErrorFallback` — the only call site already gates on it.
- Codex P2: signal "HEAD status unknown" via `null` when the HEAD probe
  threw and POST returned non-auth. Previously that combination leaked a
  `{headAuthChallenge: false}` result and silently skipped the fallback's
  retry HEAD, which could misclassify OAuth-required servers after a
  transient HEAD failure.

Regression tests cover every path: Bearer-no-hint-on-HEAD + hint-on-POST,
multi-scheme `Basic + Bearer` headers, HEAD-threw + POST-200 retry, and
the WAF/CSRF-only POST 403 case.
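The tri-state signal from the codex P2 item can be illustrated as follows (names follow the commit message; the real probe code may be shaped differently):

```javascript
// headAuthChallenge as a tri-state:
//   true  -> HEAD answered with a 401/403 auth challenge
//   false -> HEAD completed without an auth challenge
//   null  -> HEAD threw, so its status is unknown and the fallback
//            should retry with a fresh HEAD rather than skip.
function deriveHeadAuthChallenge(headResult) {
  if (headResult.threw) {
    return null;
  }
  return headResult.status === 401 || headResult.status === 403;
}
```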

* 🪥 polish: Tighten Probe Null-Guard Ordering + Add Malformed-Hint Test

Two NITs from the follow-up review:

- Move `bearerChallenge` computation after the `!wwwAuth` guard so the
  variable is only derived when it can be meaningfully `true`. The
  early-return path is now a clean unconditional exit.
- Add a regression test that asserts `Bearer resource_metadata="not-a-url"`
  yields `resourceMetadataUrl: undefined` without throwing, locking in
  the try/catch safety net in `extractHintFromHeader` and the SDK
  parser alike.
Registers text/calendar across the MIME allowlists (fullMimeTypesList,
codeInterpreterMimeTypesList, textMimeTypes regex) and maps the .ics,
.ical, .ifb, .icalendar extensions in codeTypeMapping, so iCalendar
files produced by the code interpreter are accepted as valid output
and rendered as downloadable attachments.
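The registrations might look like the sketch below — the entries shown are assumptions from the commit message, and the real LibreChat lists contain many more types:

```javascript
// Illustrative shapes only; the actual allowlists are longer and the
// real regex contains more alternatives than shown here.
const codeTypeMapping = {
  ics: 'text/calendar',
  ical: 'text/calendar',
  ifb: 'text/calendar',
  icalendar: 'text/calendar',
};
// A text-MIME regex extended to admit text/calendar alongside existing types.
const textMimeTypes = /^text\/(plain|markdown|csv|calendar)$/;
```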
…Vars on Save (#12770)

* fix: prevent browser autofill from silently dropping MCP customUserVars on save

The customUserVars form in CustomUserVarsSection renders each
credential as a plain `<input type="text">` with no autofill
guards. This caused two user-visible problems:

1. Browser password managers (Chrome's built-in, 1Password, LastPass,
   Bitwarden) treat the fields as savable and offer to store values,
   which undermines the per-user credential model -- the whole point
   of customUserVars is that each user's own secrets stay in their
   own session and backend-encrypted, not cached by the browser.

2. When autofill fills a field, it does so via DOM mutation that does
   NOT fire React's synthetic `onChange`. react-hook-form's
   Controller therefore never sees the value, the form state stays
   `""`, and on submit the backend receives an empty string for
   every affected field. The user's typed credentials are silently
   dropped -- LibreChat stores 16 encrypted-empty-string rows in
   pluginauths, tool calls subsequently fail with "API key not set"
   errors, and the UI shows an "Unset" pill next to fields the user
   is certain they filled in.

The fix matches the pattern already used in
`SidePanel/Builder/ActionsAuth.tsx` for the analogous actions
credential input:

  type="new-password"
  autoComplete="new-password"

plus vendor-specific ignore attributes for LastPass and 1Password:

  data-lpignore="true"
  data-1p-ignore="true"

(Modern browsers ignore `autocomplete="off"` for password-shaped
fields, so the vendor attributes are the reliable defence against
LastPass and 1Password, which have their own heuristics.)

No behavioural change beyond preventing autofill: the input still
accepts typed or pasted values, react-hook-form still collects them,
submit still writes to the backend. The difference is that the
values now actually reach the submit handler.

* test: add regression guard for MCP customUserVars autofill prevention

Condenses the rationale comment on the credential `<Input>` and adds a
render test asserting that the autofill-prevention attributes
(`type`, `autoComplete`, `data-lpignore`, `data-1p-ignore`) remain on
the rendered input. The underlying bug (browser DOM mutations
bypassing React's synthetic onChange) is severe enough that a future
refactor accidentally dropping these attributes would silently
re-introduce credential data loss.

---------

Co-authored-by: Danny Avila <danny@librechat.ai>
* 🧹 fix: Prune Orphaned File References on File Deletion

Deleting a file via the Manage Files tab left its file_id in every agent's
tool_resources.*.file_ids. Stubs accumulate until the frontend dedupe keys
them as duplicates and blocks all new uploads (issue #12776).

- Add removeAgentResourceFilesFromAllAgents in packages/data-schemas: a
  single updateMany/$pullAll across every EToolResources category.
- Invoke it from processDeleteRequest after db.deleteFiles so every
  referencing agent is cleaned up, not just the one passed in req.body.
- Wrap the cleanup in try/catch so a stale agent update cannot mask a
  successful file deletion.
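The update document for such a single `updateMany`/`$pullAll` might be built like this (a hypothetical sketch; the real helper in packages/data-schemas may differ):

```javascript
// Build one $pullAll update that strips the deleted file_ids from every
// tool_resources category at once, across all referencing agents.
function buildPullAllUpdate(categories, fileIds) {
  const pullAll = {};
  for (const category of categories) {
    pullAll[`tool_resources.${category}.file_ids`] = fileIds;
  }
  return { $pullAll: pullAll };
}
// e.g. with Mongoose:
//   Agent.updateMany({}, buildPullAllUpdate(['file_search'], deletedIds))
```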

* 🧼 fix: Prune Orphaned File References on Agent Update

Already-affected agents would stay broken even after the delete-time fix:
the stubs sit on the agent document until something strips them. Heal them
on the next save (issue #12776).

- Add collectToolResourceFileIds + stripFileIdsFromToolResources helpers
  in @librechat/api — centralizing the tool_resources traversal used by
  the controller and the follow-up migration script.
- In updateAgentHandler, check the effective tool_resources against the
  files collection. When orphans are found, either strip them from the
  incoming tool_resources (if the update sets them) or run the bulk
  cleanup (if the update leaves tool_resources untouched).
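A minimal sketch of the traversal such a helper centralizes (names follow the commit message; the real signature in @librechat/api may differ):

```javascript
// Gather every file_id referenced across the agent-side tool_resources
// categories, deduplicated. Keys and shape are assumptions drawn from
// the commit message, not the actual @librechat/api export.
function collectToolResourceFileIds(toolResources, keys) {
  const ids = new Set();
  for (const key of keys) {
    for (const id of toolResources?.[key]?.file_ids ?? []) {
      ids.add(id);
    }
  }
  return [...ids];
}
```

The controller can then diff this list against the files collection to find orphans.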

* 🧰 chore: Add Migration to Clean Up Orphaned Agent File References

Complements the delete-time and save-time fixes by healing agents that
already accumulated orphan stubs before the upgrade (issue #12776). The
script is idempotent — re-running it on a clean database is a no-op.

- Add config/migrate-orphaned-agent-files.js following the existing
  migrate-*.js convention: writes by default unless --dry-run is passed,
  plus a --batch-size= tuning knob. Streams agents via cursor.
- Register migrate:orphaned-agent-files and :dry-run npm scripts.
- Reuse collectToolResourceFileIds from @librechat/api so migration and
  runtime share the same traversal logic.

* 🩹 fix: Address Codex/Copilot Review on Orphaned Agent File Cleanup

Refines the #12776 fix series based on automated review feedback.

- Scope save-time pruning to the current agent only. When a PATCH
  carries tool_resources, strip orphans from the incoming payload and
  pay the DB round-trip only then. Removes the collection-wide
  updateMany previously triggered when tool_resources was absent
  (Codex P2 / Copilot).
- Wrap the orphan check in try/catch so a transient db.getFiles
  failure can't turn a good save into a 500 (comprehensive review #1).
- Replace Object.values(EToolResources) casts with an explicit list of
  agent-side categories in both orphans.ts and agent.ts. code_interpreter
  belongs to the Assistants API and isn't a key of AgentToolResources —
  including it was a type lie and generated dead MongoDB clauses
  (comprehensive review #3, #8).
- Export TOOL_RESOURCE_KEYS from @librechat/api and consume it in the
  migration script, dropping one duplicated definition (#4).
- Cap migration results.details at 50 sample entries so the memory
  footprint stays bounded on deployments with thousands of corrupted
  agents (Codex P3).
- Add migrate:orphaned-agent-files:batch npm script to match the
  convention set by migrate-agent-permissions / migrate-prompt-permissions
  (#7).
- Add controller-level tests covering the three orphan-pruning paths:
  strip from incoming tool_resources, leave alone when tool_resources
  is absent, swallow db.getFiles errors and still save (#6).
- Back pre-existing "should validate tool_resources in updates" test's
  file_ids with real File docs — the new pruning would otherwise strip
  them, and that test is about OCR conversion / schema filtering, not
  file existence. Register the File model in beforeAll so the fixture
  works.

* 🩹 fix: Tighten TOOL_RESOURCE_KEYS Type and Align Migration Sample Output

Two follow-ups from the second review pass.

- Type data-schemas' TOOL_RESOURCE_KEYS as ReadonlyArray<keyof
  AgentToolResources> instead of readonly string[]. Data-schemas depends
  on data-provider, so the import is clean. Catches typos and aligns
  with the matching export in @librechat/api — doesn't guarantee
  exhaustiveness, but that's a TypeScript limitation, not a workspace
  one.
- Align the migration's console output with DETAIL_SAMPLE_LIMIT: print
  every collected detail (up to 50) and, when more agents were affected
  than the sample size allowed, show a truncation notice. The old hard
  cap of 25 meant affected agents in the 26-50 range were collected
  but never shown.

* ✅ test: Add Integration Coverage for Orphan Cleanup Paths (#12776)

Exercise the delete-time and migration paths end-to-end against a real
in-memory Mongo. Catches integration bugs the isolated unit tests on
each layer couldn't.

- api/server/services/Files/process.integration.spec.js — the primary
  repro: seed an Agent + File, call processDeleteRequest, assert the
  file_id disappears from every referencing agent's tool_resources
  while unrelated agents stay untouched. Also covers the no-op case
  and confirms a failure in the new cleanup step cannot roll back the
  file deletion itself.
- api/test/migrate-orphaned-agent-files.spec.js — drives the migration
  module: --dry-run reports without writing, apply mode prunes across
  every tool_resource category, re-running is idempotent, and
  DETAIL_SAMPLE_LIMIT caps the in-memory sample on wide corruption.
  Mocks only the connect helper (the spec owns the mongoose instance)
  so the real migration code path — cursor, $pullAll, reduce — runs.

* 🔒 fix: Run Orphan Cleanup Migration in System Tenant Context

Codex P2 catch: under TENANT_ISOLATION_STRICT=true, the migration
throws on the very first Agent.countDocuments() because the tenant
isolation plugin fail-closes on queries without tenant context — which
makes migrate:orphaned-agent-files unusable on the exact deployments
most likely to have accumulated corruption.

- Wrap the scan/prune body in runAsSystem so queries bypass the tenant
  filter (SYSTEM_TENANT_ID sentinel). The migration legitimately needs
  cross-tenant visibility — this is the same pattern seedDatabase and
  the S3 refresh job already use.
- Add a regression test that spies on Agent.countDocuments() and
  asserts the active tenantStorage context is SYSTEM_TENANT_ID during
  the call. Pins the wrap against future regressions without the
  brittleness of toggling the strict-mode env var (which caches on
  first read).

Note: the delete-time and save-time paths already run inside an
authenticated HTTP request where tenantStorage.run is set by auth
middleware, so the cleanup naturally scopes to the current tenant —
which is the correct behavior there since file ownership is
tenant-scoped.

* 🧹 chore: Drop Unused path Import From Process Integration Spec

Leftover from an earlier iteration that resolved the migration path
via path.resolve before I switched to a relative require. The import
does nothing now — removing it.
pull[bot] locked and limited conversation to collaborators Apr 22, 2026
pull[bot] added the ⤵️ pull label Apr 22, 2026
pull[bot] merged commit 181d705 into innFactory:main Apr 22, 2026
1 check passed