
feat(control-plane): replace TokenReview with RSA keypair auth for runner token endpoint #1216

Merged: markturansky merged 3 commits into alpha from fix/cp-runner-rsa-auth on Apr 5, 2026

markturansky commented Apr 5, 2026

Summary

  • CP bootstraps an RSA-4096 keypair Secret (ambient-cp-token-keypair) in its namespace on startup via the project kube client; generates if missing
  • Private key loaded into token server for decryption; public key injected as AMBIENT_CP_TOKEN_PUBLIC_KEY into all runner Job pods
  • Runners RSA-OAEP/SHA-256 encrypt their SESSION_ID with the public key, send base64 ciphertext as Authorization: Bearer
  • CP decrypts to verify the caller — no TokenReview cluster permission required
  • Keypair persists across CP restarts in the K8s Secret; future path is Vault-backed ExternalSecret with no code change
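The encrypt-then-decrypt exchange in the bullets above can be sketched with the Go standard library alone. This is a minimal illustration, not the PR's code: the function names `encryptSessionID`, `decryptBearer`, and `newDemoKey` are hypothetical, the demo uses a 2048-bit key only to keep key generation fast (the PR uses RSA-4096), and the runner in the PR performs the encryption step in Python rather than Go.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newDemoKey generates a throwaway key. 2048 bits is an assumption made for
// demo speed; the PR bootstraps a 4096-bit key.
func newDemoKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

// encryptSessionID mirrors the runner side: RSA-OAEP/SHA-256 encrypt the
// session ID with the public key, then base64-encode the ciphertext.
func encryptSessionID(pub *rsa.PublicKey, sessionID string) (string, error) {
	ct, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, []byte(sessionID), nil)
	if err != nil {
		return "", fmt.Errorf("encrypt session ID: %w", err)
	}
	return base64.StdEncoding.EncodeToString(ct), nil
}

// decryptBearer mirrors the CP side: base64-decode the bearer value, then
// RSA-OAEP/SHA-256 decrypt it with the private key.
func decryptBearer(priv *rsa.PrivateKey, token string) (string, error) {
	ct, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", fmt.Errorf("decode bearer token: %w", err)
	}
	pt, err := rsa.DecryptOAEP(sha256.New(), nil, priv, ct, nil)
	if err != nil {
		return "", fmt.Errorf("decrypt bearer token: %w", err)
	}
	return string(pt), nil
}

func main() {
	key, err := newDemoKey()
	if err != nil {
		panic(err)
	}
	token, err := encryptSessionID(&key.PublicKey, "session-123")
	if err != nil {
		panic(err)
	}
	sessionID, err := decryptBearer(key, token)
	if err != nil {
		panic(err)
	}
	fmt.Println("Authorization: Bearer", token[:16]+"...")
	fmt.Println("decrypted session ID:", sessionID)
}
```

Because OAEP is randomized, encrypting the same session ID twice yields different ciphertexts. Note that anyone holding the public key and a session ID can produce a valid token, so the scheme relies on the public key and session IDs staying within the runner pods.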

Motivation

The CP SA does not have (and cannot be granted via tenant operator) cluster-scoped create tokenreviews permission. The previous TokenReview-based validation returned 401 for all runners.

Test plan

  • Deploy CP — confirm ambient-cp-token-keypair Secret created in CP namespace on first boot
  • Create a session — confirm runner pod starts without CP token endpoint unreachable error
  • Confirm acpctl session events $id streams without 502
  • Restart CP — confirm runner pods created after restart can still fetch tokens

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Automatic control-plane token keypair bootstrap on startup.
    • Added token server health check endpoint.
  • Refactor

    • Token authentication now uses RSA-encrypted session IDs with local cryptographic validation.
    • Runner now encrypts session IDs with the control-plane public key; public key is injected into runtime containers.
    • NetworkPolicy added to restrict access to the token endpoint.
  • Chores

    • Added cryptography dependency.
  • Tests

    • New unit tests for keypair bootstrapping, token handling, and runner token fetch behavior.

Ambient Code Bot and others added 2 commits April 4, 2026 20:34
…nner token endpoint

CP bootstraps an RSA-4096 keypair Secret in its namespace on startup (via project
kube client) if one does not exist. The private key decrypts bearer tokens sent by
runners; the public key is injected into runner Job pods as AMBIENT_CP_TOKEN_PUBLIC_KEY.

Runners encrypt their SESSION_ID with the public key (RSA-OAEP/SHA-256) and send
the base64 ciphertext as the Authorization header. CP decrypts it to verify the
caller is a legitimate runner without needing TokenReview cluster permissions.

Survives CP restarts: keypair persists in the Kubernetes Secret.
Future path: replace Secret with Vault-backed ExternalSecret, no code change needed.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot commented Apr 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ffc90fa-0687-43de-b05d-b6ac4939fdea

📥 Commits

Reviewing files that changed from the base of the PR and between 68cf56a and a391263.

📒 Files selected for processing (3)
  • components/ambient-control-plane/internal/keypair/bootstrap_test.go
  • components/ambient-control-plane/internal/tokenserver/handler_test.go
  • components/runners/ambient-runner/tests/test_grpc_client.py

📝 Walkthrough


Control plane now bootstraps an RSA keypair stored as a Kubernetes Secret, provides the public key to runners via env, and the token server decrypts RSA-OAEP-encrypted bearer tokens from runners to extract session IDs and mint tokens.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Control Plane bootstrap & main**<br>`components/ambient-control-plane/cmd/ambient-control-plane/main.go` | Bootstraps/ensures RSA keypair Secret for `cfg.CPRuntimeNamespace`, parses private key, passes KeyPair/private key into token server startup; removed Kubernetes REST config helper usage. |
| **Keypair management**<br>`components/ambient-control-plane/internal/keypair/bootstrap.go`, `components/ambient-control-plane/internal/keypair/bootstrap_test.go` | New package to read-or-create 4096-bit RSA keypair stored in a Secret (`private.pem`/`public.pem`), plus `ParsePrivateKey` and unit tests for generation, parsing, and Secret handling. |
| **Token server refactor**<br>`components/ambient-control-plane/internal/tokenserver/server.go`, `components/ambient-control-plane/internal/tokenserver/handler.go`, `components/ambient-control-plane/internal/tokenserver/handler_test.go` | Server `New` now accepts a parsed `*rsa.PrivateKey` (removed `rest.Config`/client creation). Handler replaces TokenReview validation with RSA-OAEP decryption of Base64 Bearer tokens to obtain session IDs; handler gained `handleHealthz` and unit tests cover decryption/validation flows. |
| **Reconciler env injection**<br>`components/ambient-control-plane/internal/reconciler/kube_reconciler.go` | Added `CPTokenPublicKey` to `KubeReconcilerConfig` and injects `AMBIENT_CP_TOKEN_PUBLIC_KEY` into runner and MCP sidecar env. |
| **Runner client changes & tests**<br>`components/runners/ambient-runner/ambient_runner/_grpc_client.py`, `components/runners/ambient-runner/tests/test_grpc_client.py`, `components/runners/ambient-runner/pyproject.toml` | Runner now encrypts `SESSION_ID` with CP public RSA key via RSA-OAEP(SHA-256), base64-encodes it, and sends as `Authorization: Bearer <encrypted>` to CP `/token`. Added `cryptography` dependency and tests covering encryption, request headers, retries, and env-driven behavior. |
| **Manifests / NetworkPolicy**<br>`components/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yaml`, `components/manifests/overlays/mpp-openshift/kustomization.yaml` | New ingress-only NetworkPolicy allowing TCP 8080 to CP pods from namespaces labeled `tenant.paas.redhat.com/tenant: ambient-code`; added to OpenShift overlay kustomization. |
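As a rough illustration of the "Keypair management" row, the sketch below generates an RSA keypair, PEM-encodes both halves, and parses the private key back. It is written under assumptions: the helper names `generateKeypairPEM` and `parsePrivateKeyPEM` are hypothetical, the PEM encodings (PKCS#1 for the private key, PKIX for the public key) are a guess since the PR's actual encoding is not shown here, and the Secret read-or-create logic is omitted entirely.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// generateKeypairPEM returns PEM-encoded private and public keys, i.e. the
// kind of values that could be stored under private.pem and public.pem.
// Assumption: PKCS#1 private key and PKIX public key encodings.
func generateKeypairPEM(bits int) (privPEM, pubPEM []byte, err error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return nil, nil, fmt.Errorf("generate RSA key: %w", err)
	}
	pubDER, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return nil, nil, fmt.Errorf("marshal public key: %w", err)
	}
	privPEM = pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	})
	pubPEM = pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: pubDER})
	return privPEM, pubPEM, nil
}

// parsePrivateKeyPEM decodes a PKCS#1 PEM block back into an RSA private key.
func parsePrivateKeyPEM(data []byte) (*rsa.PrivateKey, error) {
	block, _ := pem.Decode(data)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	return x509.ParsePKCS1PrivateKey(block.Bytes)
}

func main() {
	// 2048 bits keeps the demo fast; the PR generates a 4096-bit key.
	privPEM, pubPEM, err := generateKeypairPEM(2048)
	if err != nil {
		panic(err)
	}
	key, err := parsePrivateKeyPEM(privPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("modulus bits:", key.N.BitLen())
	fmt.Printf("public.pem is %d bytes\n", len(pubPEM))
}
```

A PKIX-encoded public key is a reasonable choice here because PEM loaders in other languages, including the one the runner calls, generally accept it.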

Sequence Diagram(s)

sequenceDiagram
    participant CP as Control Plane
    participant K8s as Kubernetes API
    participant Runner as Ambient Runner
    participant TokenServer as Token Server

    Note over CP: Startup/bootstrap
    CP->>K8s: EnsureKeypairSecret(namespace)
    K8s-->>CP: Return KeyPair (private.pem, public.pem)
    CP->>Runner: Set AMBIENT_CP_TOKEN_PUBLIC_KEY env var (public.pem)

    Note over Runner,TokenServer: Token exchange
    Runner->>Runner: Load SESSION_ID
    Runner->>Runner: Encrypt SESSION_ID with CP public key (RSA-OAEP SHA-256)
    Runner->>Runner: Base64-encode ciphertext
    Runner->>TokenServer: POST /token with Authorization: Bearer <ciphertext>

    rect rgba(200,150,100,0.5)
        TokenServer->>TokenServer: Base64-decode bearer token
        TokenServer->>TokenServer: Decrypt with private key (RSA-OAEP)
        TokenServer->>TokenServer: Validate session ID format
    end

    alt valid session ID
        TokenServer->>TokenServer: Mint access token
        TokenServer-->>Runner: 200 OK + token
    else decryption failure
        TokenServer-->>Runner: 401 Unauthorized
    else invalid session ID
        TokenServer-->>Runner: 403 Forbidden
    end
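
The status mapping in the diagram (401 on decryption failure, 403 on an invalid session ID, 200 otherwise) could look roughly like the handler below. This is a sketch under stated assumptions, not the PR's `handler.go`: `tokenHandler`, `statusFor`, `bearerFor`, and `demoKey` are hypothetical names, the session ID pattern is an invented rule standing in for the real `isValidSessionID`, and the minted token is a placeholder.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
)

// sessionIDPattern is a hypothetical format rule; the PR's check may differ.
var sessionIDPattern = regexp.MustCompile(`^[a-z0-9-]{1,63}$`)

// tokenHandler decrypts the bearer token and maps each failure to a status.
func tokenHandler(priv *rsa.PrivateKey) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		auth := r.Header.Get("Authorization")
		if !strings.HasPrefix(auth, "Bearer ") {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		ct, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(auth, "Bearer "))
		if err != nil {
			http.Error(w, "invalid token encoding", http.StatusUnauthorized)
			return
		}
		pt, err := rsa.DecryptOAEP(sha256.New(), nil, priv, ct, nil)
		if err != nil {
			http.Error(w, "invalid token", http.StatusUnauthorized)
			return
		}
		if !sessionIDPattern.Match(pt) {
			http.Error(w, "invalid session ID", http.StatusForbidden)
			return
		}
		// Placeholder response: real token minting is out of scope here.
		fmt.Fprint(w, `{"token":"placeholder"}`)
	}
}

// demoKey generates a throwaway 2048-bit key (the PR uses 4096 bits).
func demoKey() *rsa.PrivateKey {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	return key
}

// bearerFor builds the header value a runner would send for a session ID.
func bearerFor(pub *rsa.PublicKey, sessionID string) string {
	ct, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, []byte(sessionID), nil)
	if err != nil {
		panic(err)
	}
	return "Bearer " + base64.StdEncoding.EncodeToString(ct)
}

// statusFor sends one in-memory request and returns the response status.
func statusFor(h http.Handler, method, authorization string) int {
	req := httptest.NewRequest(method, "/token", nil)
	if authorization != "" {
		req.Header.Set("Authorization", authorization)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	key := demoKey()
	h := tokenHandler(key)
	fmt.Println(statusFor(h, http.MethodPost, bearerFor(&key.PublicKey, "session-123")))
	fmt.Println(statusFor(h, http.MethodPost, bearerFor(&key.PublicKey, "Not Valid!")))
	fmt.Println(statusFor(h, http.MethodPost, "Bearer AAAA"))
}
```

Treating malformed base64 and failed decryption identically as 401 avoids telling a caller which stage rejected the token.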

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Security And Secret Handling | ❌ Error | SESSION_ID logged in plaintext (handler.go lines 48, 55, 60) and Kubernetes Secret missing ownerReferences (bootstrap.go lines 46-64). | Remove session_id from logs or replace with hash; add ownerReferences to Secret metadata for proper garbage collection. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Kubernetes Resource Safety | ⚠️ Warning | Secret 'ambient-cp-token-keypair' in bootstrap.go lacks ownerReferences, causing orphaned resources after CP Deployment deletion. | Add ownerReferences field to Secret metadata referencing CP Deployment with apiVersion, kind, name, and uid for garbage collection. |
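
One way to act on the "replace with hash" resolution above is to log a short fingerprint of the session ID instead of the raw value. The sketch below is only an illustration: `redactSessionID` is a hypothetical helper, and the 6-byte truncation and the `session_hash` log field name are arbitrary choices, not something taken from the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactSessionID returns a short, stable SHA-256 fingerprint that can be
// logged in place of the raw session ID. The 6-byte length is an assumption.
func redactSessionID(sessionID string) string {
	sum := sha256.Sum256([]byte(sessionID))
	return hex.EncodeToString(sum[:6])
}

func main() {
	// The same session always maps to the same fingerprint, so log lines
	// can still be correlated without exposing the ID itself.
	fmt.Printf("token issued session_hash=%s\n", redactSessionID("session-123"))
}
```

If session IDs are short or guessable, a plain hash can be brute-forced; a keyed HMAC would be a stronger choice in that case.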
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format (feat(scope): description) and accurately describes the main change: replacing TokenReview with RSA keypair-based authentication for runner token endpoint. |
| Performance And Algorithmic Complexity | ✅ Passed | RSA-4096 OAEP-SHA256 token validation introduces acceptable cryptographic overhead. No algorithmic complexity issues: keypair bootstrap is O(1) single GetSecret call at startup, token handler decryption is O(1) per-request with no loops or N+1 patterns, and no unbounded cache growth detected. |

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/ambient-control-plane/internal/keypair/bootstrap.go`:
- Around line 81-83: In keypairFromSecret, don't ignore the error returned by
unstructured.NestedMap(secret.Object, "data"); instead capture both the map and
the error (and the existence bool), validate that the call succeeded and return
a descriptive error if it failed or if the "data" field is missing/invalid so
downstream lookups on `data` don't panic—update keypairFromSecret to check the
returned error and existence flag and propagate a clear error when malformed
Secret data is detected.

In `@components/runners/ambient-runner/ambient_runner/_grpc_client.py`:
- Around line 37-48: The _encrypt_session_id function currently calls
serialization.load_pem_public_key(public_key_pem.encode()) without handling
malformed PEMs; wrap the load_pem_public_key call (and the subsequent encrypt)
in a try/except that catches ValueError (and optionally TypeError) raised for
invalid/malformed PEMs and raise or return a clear, domain-specific error
message (e.g., raise ValueError("Invalid CP public key PEM in
_encrypt_session_id: ...") or log and rethrow) so callers get a descriptive
failure instead of an unhandled exception; keep the rest of the RSA-OAEP flow
(padding, hashes, base64 encoding) unchanged and reference _encrypt_session_id
and the public_key_pem parameter when updating error text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5ff26ea5-5556-4eb7-bff4-6b8048688778

📥 Commits

Reviewing files that changed from the base of the PR and between f5f7516 and 68cf56a.

⛔ Files ignored due to path filters (1)
  • components/runners/ambient-runner/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • components/ambient-control-plane/cmd/ambient-control-plane/main.go
  • components/ambient-control-plane/internal/keypair/bootstrap.go
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go
  • components/ambient-control-plane/internal/tokenserver/handler.go
  • components/ambient-control-plane/internal/tokenserver/server.go
  • components/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yaml
  • components/manifests/overlays/mpp-openshift/kustomization.yaml
  • components/runners/ambient-runner/ambient_runner/_grpc_client.py
  • components/runners/ambient-runner/pyproject.toml

Comment on lines +81 to +83
func keypairFromSecret(secret *unstructured.Unstructured) (*KeyPair, error) {
	data, _, _ := unstructured.NestedMap(secret.Object, "data")

⚠️ Potential issue | 🟡 Minor

Silently ignoring NestedMap error may mask corrupted Secret data.

If the Secret exists but has a malformed structure, the error is discarded and the code proceeds to fail confusingly on subsequent key lookups.

Proposed fix
 func keypairFromSecret(secret *unstructured.Unstructured) (*KeyPair, error) {
-	data, _, _ := unstructured.NestedMap(secret.Object, "data")
+	data, found, err := unstructured.NestedMap(secret.Object, "data")
+	if err != nil || !found {
+		return nil, fmt.Errorf("keypair secret has invalid or missing data field")
+	}

Comment on lines +37 to +48
def _encrypt_session_id(public_key_pem: str, session_id: str) -> str:
    """RSA-OAEP encrypt session_id with the CP public key, return base64-encoded ciphertext."""
    public_key = serialization.load_pem_public_key(public_key_pem.encode())
    ciphertext = public_key.encrypt(
        session_id.encode(),
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None,
        ),
    )
    return base64.b64encode(ciphertext).decode()

⚠️ Potential issue | 🟡 Minor

Missing error handling for invalid public key PEM.

load_pem_public_key raises ValueError if the PEM is malformed. This would surface as an unhandled exception rather than a clear error message.

Proposed fix
 def _encrypt_session_id(public_key_pem: str, session_id: str) -> str:
     """RSA-OAEP encrypt session_id with the CP public key, return base64-encoded ciphertext."""
-    public_key = serialization.load_pem_public_key(public_key_pem.encode())
+    try:
+        public_key = serialization.load_pem_public_key(public_key_pem.encode())
+    except (ValueError, TypeError) as e:
+        raise RuntimeError(f"invalid CP token public key: {e}") from e
     ciphertext = public_key.encrypt(

… and runner encryption

- keypair/bootstrap_test.go: generateKeypair, ParsePrivateKey, keypairFromSecret (valid/missing keys), EnsureKeypairSecret (creates on missing, reuses existing)
- tokenserver/handler_test.go: success path, missing/wrong auth header, invalid base64, wrong RSA key, method not allowed, isValidSessionID table, decrypt round-trip
- tests/test_grpc_client.py: URL validation, RSA-OAEP encrypt/decrypt round-trip, ciphertext randomness, fetch retries, HTTP error body capture, missing token field, from_env integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@markturansky markturansky merged commit b0ed2b8 into alpha Apr 5, 2026
45 checks passed
@markturansky markturansky deleted the fix/cp-runner-rsa-auth branch April 5, 2026 15:35