@aaronsteers aaronsteers commented Nov 13, 2025

feat(mcp): Add HTTP header-based authentication support for cloud operations

Summary

Implements HTTP header-based authentication for the PyAirbyte MCP server when running in HTTP/SSE modes. This allows clients to pass Airbyte Cloud credentials via HTTP headers instead of relying solely on environment variables.

Key Changes:

  • Added _request_context.py with ContextVars for storing per-request auth values
  • Added _middleware.py with ASGI middleware to extract auth from HTTP headers
  • Updated cloud_ops.py to use header values when available, falling back to env vars
  • Modified server.py to inject middleware into FastMCP app

Supported Headers (case-insensitive, both prefixes supported):

  • X-Airbyte-Cloud-Client-Id or Airbyte-Cloud-Client-Id
  • X-Airbyte-Cloud-Client-Secret or Airbyte-Cloud-Client-Secret
  • X-Airbyte-Cloud-Workspace-Id or Airbyte-Cloud-Workspace-Id
  • X-Airbyte-Cloud-Api-Url or Airbyte-Cloud-Api-Url (optional)

Design:

  • Uses Python ContextVars for request-scoped storage (no global state pollution)
  • Middleware only processes HTTP requests, STDIO mode unaffected
  • ContextVars properly reset after each request to prevent leakage
  • Header values wrapped in SecretString for security
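
As a rough illustration of the request-scoped storage described above, here is a minimal sketch of a ContextVar-backed getter with an environment fallback. The names `CLIENT_ID_CVAR`, `get_effective_client_id`, and the environment variable name are assumptions made for this example and are not taken from the PR's code.

```python
import os
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-in for the PR's per-request ContextVar.
# A default of None means "no header was supplied for this request".
CLIENT_ID_CVAR: ContextVar[Optional[str]] = ContextVar("cloud_client_id", default=None)

# Assumed environment variable name, used for illustration only.
CLIENT_ID_ENV_VAR = "AIRBYTE_CLOUD_CLIENT_ID"


def get_effective_client_id() -> Optional[str]:
    """Prefer the request-scoped value; otherwise fall back to the environment."""
    header_value = CLIENT_ID_CVAR.get()
    if header_value is not None:
        return header_value
    return os.environ.get(CLIENT_ID_ENV_VAR)
```

Because the value lives in a ContextVar rather than a module-level global, concurrent requests handled in separate asyncio tasks each see their own value.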

Review & Testing Checklist for Human

⚠️ IMPORTANT: This implementation has NOT been tested in actual HTTP/SSE mode - only lint checks and type checks have passed.

  • Test HTTP/SSE mode with headers: Start the MCP server in HTTP mode and verify it accepts auth via headers (this is the critical test - the implementation is untested in this mode)
  • Verify STDIO mode still works: Ensure backward compatibility - STDIO mode should work exactly as before
  • Test both header prefixes: Verify both X-Airbyte-Cloud-* and Airbyte-Cloud-* headers work
  • Check secret handling: Verify secrets from headers aren't being logged or exposed
  • Review type ignore: The middleware parameter has a # type: ignore[list-item] - verify this is acceptable or if there's a better approach

Test Plan

  1. Run MCP server in HTTP mode: python -m airbyte.mcp.server (may need to add HTTP mode support to main())
  2. Send requests with auth headers and verify cloud operations work
  3. Send requests without headers and verify env var fallback works
  4. Run existing STDIO mode tests to ensure no regression

Notes

Summary by CodeRabbit

  • New Features
    • MCP server supports HTTP header–based authentication for Airbyte Cloud in HTTP/SSE modes.
    • Accepts client credentials, workspace ID, and API URL via case-insensitive request headers (multiple header names supported).
    • Header-provided values apply only to the individual request and automatically fall back to environment configuration when absent.

…rations

- Add ContextVar-based request context management for per-request auth values
- Implement ASGI middleware to extract auth from HTTP headers
- Support both X-Airbyte-Cloud-* and Airbyte-Cloud-* header prefixes
- Update cloud operations to use header values when available, fallback to env vars
- Middleware only active in HTTP/SSE modes, STDIO mode unchanged

Co-Authored-By: AJ Steers <[email protected]>
@devin-ai-integration

Original prompt from AJ Steers
Received message in Slack channel #ask-devin-ai:

@Devin - Can you update connector builder mcp and pyairbyte mcp so that they can both accept a client ID and client secret (and any other required input values) via http headers? When running in remote modes, we want the servers to tolerate not getting input values from env vars.
Thread URL: https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1763054879077789

@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1763055706-mcp-http-header-auth' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1763055706-mcp-http-header-auth'

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.


- Remove CLOUD_API_URL_CVAR and get_effective_cloud_api_url()
- Remove X-Airbyte-Cloud-Api-Url header extraction from middleware
- Remove api_root parameter from CloudWorkspace construction
- Simplify to only support client ID, client secret, and workspace ID headers

Co-Authored-By: AJ Steers <[email protected]>
@coderabbitai

coderabbitai bot commented Nov 13, 2025

Walkthrough

Adds per-request header-based Cloud authentication: a ContextVar-backed request context, an ASGI HeaderAuthMiddleware that populates it from HTTP headers for MCP HTTP/SSE requests, and wiring so cloud_ops and the FastMCP app use those per-request values with environment fallbacks.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Request-context module: airbyte/mcp/_request_context.py | Adds four ContextVars (CLOUD_CLIENT_ID_CVAR, CLOUD_CLIENT_SECRET_CVAR, CLOUD_WORKSPACE_ID_CVAR, CLOUD_API_URL_CVAR) and four getters (get_effective_cloud_client_id, get_effective_cloud_client_secret, get_effective_cloud_workspace_id, get_effective_cloud_api_url) that prefer header-scoped values and fall back to existing resolvers. |
| ASGI middleware: airbyte/mcp/_middleware.py | New HeaderAuthMiddleware class that extracts client ID, client secret, workspace ID, and API URL from case-insensitive HTTP headers, sets corresponding ContextVars for request duration, logs when headers are used, delegates to the app, and resets ContextVars in a finally block. |
| Cloud operations integration: airbyte/mcp/cloud_ops.py | Replaced direct environment resolver calls with the new get_effective_cloud_* getters when constructing/using CloudWorkspace; removed legacy cloud.auth imports and adjusted docstring to reflect header-based auth option. |
| Server wiring: airbyte/mcp/server.py | FastMCP app now includes HeaderAuthMiddleware in middleware list; register_cloud_ops_tools(app) is invoked during initialization; module/app docstring updated to mention header-based authentication via the middleware. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Middleware as HeaderAuthMiddleware
    participant Context as RequestContext (ContextVars)
    participant App as FastMCP / cloud_ops
    participant Env as EnvironmentResolvers

    Client->>Middleware: HTTP request with auth headers
    activate Middleware
    Middleware->>Middleware: Normalize headers & extract auth values
    Middleware->>Context: Set CLOUD_*_CVAR values (request-scoped)
    Middleware->>App: Forward request
    activate App
    App->>Context: Call get_effective_cloud_*()
    Context-->>App: Return header value if set
    alt header not set
        App->>Env: Call resolve_cloud_*() fallbacks
        Env-->>App: Return env value
    end
    App-->>Client: Response
    deactivate App
    Middleware->>Context: Reset CLOUD_*_CVAR values (finally)
    deactivate Middleware

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • airbyte/mcp/_middleware.py: async/await correctness, header normalization, robust finally cleanup
    • airbyte/mcp/_request_context.py: ContextVar types and correct fallback to resolvers (SecretString wrapping)
    • airbyte/mcp/cloud_ops.py and airbyte/mcp/server.py: correct wiring to use getters and middleware registration
  • Potential extra attention: SecretString handling when wrapping header-provided credentials; ensure no accidental global state leakage.

Would you like me to draft unit tests for HeaderAuthMiddleware and the request-context getters? Wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: adding HTTP header-based authentication support for cloud operations in the MCP server, which is directly reflected in all the file changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte/mcp/_request_context.py (1)

36-38: Potential double-wrapping of SecretString values.

If header_value is already a SecretString (e.g., manually set in tests), line 38 wraps it again: SecretString(SecretString(...)). While this works due to inheritance, it's redundant.

Wdyt about checking if not isinstance(header_value, SecretString) before wrapping? Same applies to the other getter functions below.
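
A minimal sketch of the suggested guard follows. The `SecretString` class here is a simplified stand-in (a plain `str` subclass) rather than the real `airbyte.secrets.base.SecretString`, and `as_secret` is a hypothetical helper name chosen for this example.

```python
class SecretString(str):
    """Simplified stand-in for the real SecretString, assumed to subclass str."""


def as_secret(value: str) -> SecretString:
    """Wrap a plain string as a secret, leaving existing secrets untouched."""
    if isinstance(value, SecretString):
        # Already wrapped: return the same object instead of re-wrapping it.
        return value
    return SecretString(value)
```

With this guard, a value that a test sets directly as a `SecretString` passes through unchanged, while plain header strings are still wrapped.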

airbyte/mcp/_middleware.py (1)

60-61: Header decoding could raise UnicodeDecodeError.

Line 61 decodes headers without error handling. If a malformed header contains invalid UTF-8, this would raise UnicodeDecodeError and crash the request. While rare (most ASGI servers validate headers), wdyt about adding a try-except to gracefully handle this edge case?

For example:

-        headers = scope.get("headers", [])
-        header_dict = {name.decode().lower(): value.decode() for name, value in headers}
+        headers = scope.get("headers", [])
+        try:
+            header_dict = {name.decode().lower(): value.decode() for name, value in headers}
+        except UnicodeDecodeError:
+            logger.warning("Failed to decode headers, proceeding without auth headers")
+            header_dict = {}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 261a28f and d4f3001.

📒 Files selected for processing (3)
  • airbyte/mcp/_middleware.py (1 hunks)
  • airbyte/mcp/_request_context.py (1 hunks)
  • airbyte/mcp/cloud_ops.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/mcp/cloud_ops.py (2)
airbyte/mcp/_request_context.py (3)
  • get_effective_cloud_client_id (30-39)
  • get_effective_cloud_client_secret (42-51)
  • get_effective_cloud_workspace_id (54-63)
airbyte/cloud/workspaces.py (1)
  • CloudWorkspace (64-612)
airbyte/mcp/_request_context.py (2)
airbyte/cloud/auth.py (3)
  • resolve_cloud_client_id (17-22)
  • resolve_cloud_client_secret (9-14)
  • resolve_cloud_workspace_id (36-41)
airbyte/secrets/base.py (1)
  • SecretString (38-143)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (6)
airbyte/mcp/cloud_ops.py (2)

15-19: Clean import migration to request context pattern.

The switch from resolve_* to get_effective_* functions properly reflects the new header-aware authentication approach. Looks good!


46-50: CloudWorkspace construction looks good with current implementation.

The usage of the new get_effective_* getters is correct. Once the API URL support is added (per my earlier comment), this would need to pass api_root=get_effective_cloud_api_url() to the constructor, wdyt?

airbyte/mcp/_middleware.py (3)

56-58: Correct scope type filtering.

Only processing HTTP requests and bypassing other scope types (like websocket) is the right approach for this middleware. STDIO mode will remain unaffected. 👍


73-94: Excellent ContextVar cleanup pattern.

The token tracking and finally block ensure CVARs are properly reset even if the app raises an exception. This is the correct concurrency-safe pattern for ContextVar usage in ASGI middleware. The debug logs also avoid exposing sensitive values. Well done!
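
The pattern praised here can be sketched roughly as below. This is not the PR's HeaderAuthMiddleware: the class name, the single ContextVar, and the single header it reads are assumptions for illustration. The sketch also decodes with latin-1, which maps every byte and so cannot raise; the PR's code decodes with the default codec.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

# Hypothetical per-request ContextVar; None means no header was supplied.
CLIENT_ID_CVAR: ContextVar[Optional[str]] = ContextVar("client_id", default=None)


class HeaderAuthSketch:
    """Illustrative ASGI middleware: set a ContextVar per request, always reset it."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Non-HTTP scopes (e.g. lifespan, websocket) pass straight through.
            await self.app(scope, receive, send)
            return

        # ASGI delivers headers as (bytes, bytes) pairs; normalize names to lowercase.
        headers = {
            name.decode("latin-1").lower(): value.decode("latin-1")
            for name, value in scope.get("headers", [])
        }
        client_id = headers.get("x-airbyte-cloud-client-id")
        token = CLIENT_ID_CVAR.set(client_id) if client_id is not None else None
        try:
            await self.app(scope, receive, send)
        finally:
            # Reset even if the wrapped app raised, so nothing leaks to later requests.
            if token is not None:
                CLIENT_ID_CVAR.reset(token)


async def demo():
    """Run one fake request and report what the app saw and what remains afterwards."""
    seen = []

    async def app(scope, receive, send):
        seen.append(CLIENT_ID_CVAR.get())

    scope = {"type": "http", "headers": [(b"x-airbyte-cloud-client-id", b"abc")]}
    await HeaderAuthSketch(app)(scope, None, None)
    return seen[0], CLIENT_ID_CVAR.get()
```

Calling `asyncio.run(demo())` shows the wrapped app observing the header value during the request and the ContextVar returning to its default afterwards.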


96-109: Clean helper for header name fallback.

The implementation correctly handles trying multiple header name variants (with/without X- prefix) and returns the first match. Simple and effective!
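
A helper of this kind might look like the sketch below. The function name `first_header` and its signature are hypothetical; it assumes the header dictionary has already been normalized to lowercase keys.

```python
from typing import Mapping, Optional, Sequence


def first_header(headers: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the value of the first header in `names` that is present, else None.

    `headers` is assumed to have lowercase keys, so each candidate name is
    lowercased before lookup to keep matching case-insensitive.
    """
    for name in names:
        value = headers.get(name.lower())
        if value is not None:
            return value
    return None
```

Listing the `X-` prefixed name first means it takes precedence when a client sends both variants.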

airbyte/mcp/_request_context.py (1)

13-17: Missing API URL support - inconsistent with PR objectives.

The PR objectives state that X-Airbyte-Cloud-Api-Url / Airbyte-Cloud-Api-Url (optional) should be supported, but there's no resolve_cloud_api_url import, no CLOUD_API_URL_CVAR, and no get_effective_cloud_api_url() function in this file. The middleware also doesn't extract this header.

Was the api_url support intentionally deferred, or is this an oversight? If it's intentional, the PR objectives should be updated. Otherwise, wdyt about adding the missing pieces?

⛔ Skipped due to learnings
Learnt from: yohannj
Repo: airbytehq/PyAirbyte PR: 716
File: airbyte/logs.py:384-402
Timestamp: 2025-07-11T19:53:44.427Z
Learning: In the PyAirbyte project, when reviewing PRs, maintain clear separation of concerns. Don't suggest changes that are outside the scope of the PR's main objective, even if they would improve consistency or fix other issues. This helps with reviewing changes and potential reverts.

@github-actions

PyTest Results (Fast Tests Only, No Creds)

320 tests ±0: 320 ✅ passed ±0, 0 💤 skipped ±0, 0 ❌ failed ±0
1 suite ±0, 1 file ±0
Duration: 5m 44s ⏱️ (-10s)

Results for commit d4f3001. ± Comparison against base commit 7eb746b.

@github-actions

PyTest Results (Full)

389 tests ±0: 373 ✅ passed +1, 16 💤 skipped ±0, 0 ❌ failed -1
1 suite ±0, 1 file ±0
Duration: 25m 50s ⏱️ (-4m 34s)

Results for commit d4f3001. ± Comparison against base commit 7eb746b.
