Skip to content

fix(api-server): authorize runner OIDC service account in WatchSessionMessages#1183

Merged
markturansky merged 3 commits intoalphafrom
fix/cp-credential-rolebinding-and-project-delete
Apr 3, 2026
Merged

fix(api-server): authorize runner OIDC service account in WatchSessionMessages#1183
markturansky merged 3 commits intoalphafrom
fix/cp-credential-rolebinding-and-project-delete

Conversation

@markturansky
Copy link
Copy Markdown
Contributor

@markturansky markturansky commented Apr 3, 2026

Summary

  • Runner's BOT_TOKEN is an OIDC JWT with preferred_username=service-account-ocm-ams-service
  • AMBIENT_API_TOKEN is not set on the api-server deployment → IsServiceCaller is always false
  • Runner's JWT was parsed as a regular user → ownership check fired → PERMISSION_DENIED: not authorized to watch this session
  • Fix: read GRPC_SERVICE_ACCOUNT env var at startup into a package-level var; bypass ownership enforcement when the authenticated username matches that value

Test plan

  • Deploy updated api-server image
  • Run demo-github.sh — runner pod should connect to gRPC stream and execute the task without PERMISSION_DENIED
  • Verify session reaches Completed phase

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added service account authentication support for gRPC operations
    • Introduced automatic token refresh for long-running sessions
  • Bug Fixes

    • Fixed unnecessary credential patch emissions when values haven't changed
    • Improved event handler retry mechanisms for better reliability

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR introduces session-based service account authentication, improves credential patch change detection to avoid unnecessary updates, and adds background token refresh for running SDK sessions. Authorization gates skip ownership validation when requests originate from configured service accounts. Pod volumes now mount credential tokens, and periodic background loops refresh them.

Changes

Cohort / File(s) Summary
Service Account Authorization
components/ambient-api-server/plugins/sessions/grpc_handler.go, components/ambient-api-server/plugins/sessions/plugin.go
Extended gRPC handler to accept and store grpcServiceAccount parameter from environment; modified WatchSessionMessages authorization gate to skip per-session ownership checks when caller username matches grpcServiceAccount.
Credential Patch Refinement
components/ambient-cli/cmd/acpctl/apply/cmd.go
Updated buildCredentialPatch to only emit patch fields when manifest values differ from existing credential; extended strategicMerge to support overriding provider, token, url, and email fields.
Demo Script Credential Flow
components/ambient-cli/demo-github.sh
Restructured credential creation from direct invoke to YAML manifest + apply pattern; updated role binding to use credential:token-reader role instead of credential:reader; modified cleanup to short-circuit when NO_CLEANUP is set.
Per-Handler Event Retry
components/ambient-control-plane/internal/informer/informer.go
Refactored retry mechanism to target specific handlers rather than replay all handlers; added handlerIndex to retry events and introduced dispatchHandler and scheduleRetry functions for granular retry logic with exponential backoff.
Token Refresh & Secret Management
components/ambient-control-plane/internal/reconciler/kube_reconciler.go, components/ambient-control-plane/cmd/ambient-control-plane/main.go, components/ambient-control-plane/internal/kubeclient/kubeclient.go
Pod spec now mounts credential secret's api-token to /var/run/secrets/ambient/bot-token; added background token refresh loop that periodically updates running session secrets; added UpdateSecret method to KubeClient for dynamic secret updates.

Sequence Diagrams

sequenceDiagram
    participant Timer as 10-min Ticker
    participant Reconciler as SimpleKubeReconciler
    participant KubeClient as KubeClient
    participant API as Kubernetes API
    participant Secret as Session Secret

    Timer->>Reconciler: tick
    Reconciler->>Reconciler: refreshAllRunningTokens()
    Reconciler->>KubeClient: ListSessions(Running phase)
    KubeClient->>API: List SDK Sessions
    API-->>KubeClient: session list
    KubeClient-->>Reconciler: sessions

    loop for each running session
        Reconciler->>Reconciler: compute namespace
        Reconciler->>Reconciler: refreshRunnerToken(sessionID)
        Reconciler->>KubeClient: GetSecret(by sessionID name)
        KubeClient->>API: Get Secret
        API-->>Secret: retrieved
        Secret-->>KubeClient: secret data
        KubeClient-->>Reconciler: secret obj
        Reconciler->>Reconciler: resolve fresh token
        Reconciler->>KubeClient: UpdateSecret(api-token)
        KubeClient->>API: Update Secret
        API-->>KubeClient: updated
    end
Loading
sequenceDiagram
    participant Source as Event Source
    participant Dispatcher as dispatchEvent()
    participant Handler as Handler[i]
    participant Retry as scheduleRetry()
    participant Queue as Retry Queue
    participant Timer as time.NewTimer()

    Source->>Dispatcher: ResourceEvent
    Dispatcher->>Dispatcher: iterate handlers by index

    rect rgba(200, 100, 100, 0.5)
    Dispatcher->>Handler: invoke handler[i]
    Handler-->>Dispatcher: error
    end

    Dispatcher->>Retry: ctx, event, handlerIndex, attempt, err
    Retry->>Retry: compute exponential backoff + cap
    Retry->>Queue: enqueue retryEvent{event, handlerIndex, attempt, fireAt}
    Retry->>Retry: log warning with handler index

    Queue->>Timer: wait until fireAt
    Timer-->>Queue: timer fires
    Queue->>Dispatcher: dispatchHandler(ctx, event, handlerIndex, attempt)
    Dispatcher->>Handler: invoke handler[handlerIndex] only
    Handler-->>Dispatcher: result
Loading

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
Security And Secret Handling ❌ Error Kubernetes Secrets for runner tokens lack ownerReferences, causing orphaned secrets and breaking resource lifecycle management. Add ownerReferences to secret metadata in ensureSecret() linking to Pod or SessionController; verify refreshRunnerToken() preserves these references.
Docstring Coverage ⚠️ Warning Docstring coverage is 5.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Kubernetes Resource Safety ⚠️ Warning Child Kubernetes resources lack ownerReferences, preventing garbage collection. Pod-level security contexts (runAsNonRoot, fsGroup) are missing despite container-level configurations. Add ownerReferences to all dynamically created resources and implement pod-level securityContext with runAsNonRoot and fsGroup for proper lifecycle management and security hardening.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title follows Conventional Commits format with type 'fix' and scope 'api-server', accurately describing the core fix: authorizing runner OIDC service account in WatchSessionMessages.
Performance And Algorithmic Complexity ✅ Passed PR implements proper pagination (Size: 100), prevents timer leaks, avoids N+1 patterns. Minor inefficiency: factory.Token(ctx) called per-session instead of once, but mitigated by 30-second TTL caching and 10-minute refresh cadence.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/cp-credential-rolebinding-and-project-delete
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/cp-credential-rolebinding-and-project-delete

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/ambient-cli/cmd/acpctl/apply/cmd.go`:
- Around line 25-31: The Credential overlay fields provider, token, url, and
email are being dropped during kustomize handling because strategicMerge() does
not carry those keys through, so acpctl apply -k can reconcile an incomplete
Credential; update strategicMerge() (and the code paths that call it) so that
when the resource Kind is "Credential" it copies/merges provider, token, url,
and email from the overlay into the resulting merged map/Unstructured before
applyCredential() runs (or alternatively make applyCredential() read from the
merged Unstructured returned by strategicMerge() rather than the original),
ensuring the overlay values are preserved and used during apply.
- Around line 290-319: buildCredentialPatch currently marks url, email, token,
labels and annotations as changed whenever present instead of comparing them to
existing values; update buildCredentialPatch (and use
sdktypes.NewCredentialPatchBuilder) to only call
patch.Url/Email/Token/Labels/Annotations and set changed=true when the new value
differs from the corresponding existing field (compare token after os.ExpandEnv
to existing.Token, compare URL and Email strings, and compare labels/annotations
by converting with marshalStringMap and using a map-equality check such as
reflect.DeepEqual or a helper equality function).

In `@components/ambient-cli/demo-github.sh`:
- Around line 281-287: The current credential creation passes the GitHub PAT on
the command line via the `--token "${GITHUB_TOKEN_VALUE}"` argument (in the
`CRED_JSON=$( "$ACPCTL" credential create ... )` block), which exposes the
secret in process arguments; change the `credential create` invocation to avoid
`--token` and instead provide the PAT via a secure channel such as stdin (pipe
the token into the `"$ACPCTL" credential create` command) or use the
manifest/env-expansion mechanism supported by `credential create` (i.e.,
reference an env/manifest variable rather than passing `GITHUB_TOKEN_VALUE` on
argv) so the secret is not visible in process args or logs.
- Around line 302-333: The script currently looks up and binds the wrong role
name 'credential:reader'; change the JSON lookup to find
'credential:token-reader' instead of 'credential:reader' (update the python3
snippet that sets READER_ROLE_ID to search for 'credential:token-reader' and
consider renaming READER_ROLE_ID to TOKEN_READER_ROLE_ID for clarity), and when
creating the role-binding via the "$ACPCTL" create role-binding command, use
that token-reader role ID variable (the same variable used in the dim output and
passed to --role-id) so the binding grants credential:token-reader rather than
credential:reader.

In `@components/ambient-control-plane/internal/informer/informer.go`:
- Around line 88-92: The retry logic currently requeues the entire ResourceEvent
(type retryEvent) on any handler error which causes exponential fan-out; modify
retryEvent to include a handler identifier (e.g., failedHandlerID string or
handlerIndex int) and change the dispatch/enqueue logic (the code that writes to
retryCh and the event processing loop) so only the single failing handler is
retried instead of the whole event; update handler registration/dispatch code to
pass and consume that handler identifier (and track per-handler attempt counts
in retryEvent.attempt) so retries target a specific handler and do not duplicate
dispatches to all handlers.
- Around line 157-171: The retryLoop in Informer reads a single entry from
inf.retryCh and sleeps for that entry's fireAt, which blocks newer, earlier-fire
entries behind long waits; change the loop to always determine the earliest
pending fireAt before sleeping (e.g., drain or peek the channel into a
min-heap/priority queue keyed by re.fireAt, or maintain a separate priority
structure) and only sleep until that earliest timestamp, then pop and call
inf.dispatchEvent with the popped retry's event and attempt; ensure reads from
inf.retryCh insert into the priority queue and that ctx.Done cancels the sleep
so newer short-backoff retries are not starved by older long-backoff ones.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee4f0025-0d42-4cfb-b8f3-f6dd57bd7d3b

📥 Commits

Reviewing files that changed from the base of the PR and between 281a643 and 8e11870.

📒 Files selected for processing (11)
  • components/ambient-api-server/plugins/sessions/grpc_handler.go
  • components/ambient-api-server/plugins/sessions/plugin.go
  • components/ambient-cli/cmd/acpctl/agent/cmd.go
  • components/ambient-cli/cmd/acpctl/apply/cmd.go
  • components/ambient-cli/cmd/acpctl/delete/cmd.go
  • components/ambient-cli/cmd/acpctl/describe/cmd.go
  • components/ambient-cli/cmd/acpctl/get/cmd.go
  • components/ambient-cli/demo-github.sh
  • components/ambient-control-plane/internal/informer/informer.go
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go
  • components/ambient-control-plane/internal/reconciler/project_reconciler.go
💤 Files with no reviewable changes (1)
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go

Comment on lines +157 to +171
func (inf *Informer) retryLoop(ctx context.Context) {
for {
select {
case <-ctx.Done():
return
case re := <-inf.retryCh:
wait := time.Until(re.fireAt)
if wait > 0 {
select {
case <-time.After(wait):
case <-ctx.Done():
return
}
}
inf.dispatchEvent(ctx, re.event, re.attempt)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The retry worker can starve fresh retries behind older long-backoff entries.

This loop sleeps on the first dequeued retry instead of the earliest fireAt. If a 30s retry is read before a newer 2s retry, the 2s retry sits in retryCh until the 30s sleep finishes. That violates the configured backoff timing and delays retries across all resources far longer than intended.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/ambient-control-plane/internal/informer/informer.go` around lines
157 - 171, The retryLoop in Informer reads a single entry from inf.retryCh and
sleeps for that entry's fireAt, which blocks newer, earlier-fire entries behind
long waits; change the loop to always determine the earliest pending fireAt
before sleeping (e.g., drain or peek the channel into a min-heap/priority queue
keyed by re.fireAt, or maintain a separate priority structure) and only sleep
until that earliest timestamp, then pop and call inf.dispatchEvent with the
popped retry's event and attempt; ensure reads from inf.retryCh insert into the
priority queue and that ctx.Done cancels the sleep so newer short-backoff
retries are not starved by older long-backoff ones.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/ambient-control-plane/internal/reconciler/kube_reconciler.go`:
- Around line 920-940: The refreshAllRunningTokens function currently only lists
page 1 with Size 500 so it misses sessions on later pages; change it to paginate
through all pages by looping with a page counter (e.g., page := 1; for { ...
page++ }) calling sdk.Sessions().List(ctx, &types.ListOptions{Page: page, Size:
500, Search: "phase = 'Running'"}), process each list.Items as before, and break
the loop when len(list.Items) == 0 or len(list.Items) < 500 (or when a
total/pages field indicates no more pages); ensure you continue on per-page List
errors by logging and either retrying or breaking as appropriate so
refreshRunnerToken(session.ID) is attempted for every running session across all
pages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e501c092-8ef5-42db-9418-190f720be46f

📥 Commits

Reviewing files that changed from the base of the PR and between 8e11870 and 9a15e8c.

📒 Files selected for processing (3)
  • components/ambient-control-plane/cmd/ambient-control-plane/main.go
  • components/ambient-control-plane/internal/kubeclient/kubeclient.go
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go

markturansky pushed a commit that referenced this pull request Apr 3, 2026
informer:
- Retry per failing handler (not whole event) to prevent exponential
  fan-out when multiple handlers are registered for a resource
- Use time.NewTimer per entry instead of blocking sleep on dequeued item,
  eliminating retry starvation when long-backoff entries precede short ones
- Add handler index to retryEvent; introduce dispatchHandler and
  scheduleRetry helpers

control-plane:
- Token refresh loop now paginates all running sessions (100/page)
  instead of capping at a single page of 500

cli/apply:
- strategicMerge now propagates Credential fields (provider, token,
  url, email) so acpctl apply -k overlays work correctly
- buildCredentialPatch compares each field against the existing value
  before marking changed, making repeated applies idempotent

demo-github.sh:
- Pass GitHub PAT via apply -f manifest with env-var expansion
  ($DEMO_GITHUB_PAT) rather than --token argv to avoid exposure in
  ps/crash dumps; fetch credential ID with get credentials post-apply
- Look up credential:token-reader role (not credential:reader) so the
  session can fetch tokens via GET /credentials/{id}/token

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/ambient-control-plane/internal/informer/informer.go (1)

158-175: ⚠️ Potential issue | 🟠 Major

Retry timing is still FIFO, not deadline-based.

Lines 163-172 sleep on the first dequeued item. If a long-backoff retry is read before a newer short-backoff retry, the short retry waits behind it; once retryCh backs up, the send on Line 220 blocks fresh event handling too. Pick the earliest fireAt before sleeping.

Also applies to: 219-220

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/ambient-control-plane/internal/informer/informer.go` around lines
158 - 175, The retryLoop currently sleeps on the first received retry (from
inf.retryCh) which makes timing FIFO instead of deadline-based; change retryLoop
to, after the initial receive of re from inf.retryCh, perform non-blocking reads
of inf.retryCh to collect any queued retries, compute the earliest fireAt among
them, and use that earliest fireAt to decide the wait; push the remaining
retries into a local min-heap or reinsert them into the retry queue (or replace
inf.retryCh with a priority queue stored on Informer) so dispatchHandler still
receives retries in deadline order and senders to inf.retryCh do not block.
Ensure the logic references Informer.retryLoop, inf.retryCh, re.fireAt and uses
inf.dispatchHandler to process the selected retry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/ambient-control-plane/internal/informer/informer.go`:
- Around line 158-175: The retryLoop is replaying stale events directly via
inf.dispatchHandler (function Informer.retryLoop), bypassing ordered dispatch
and current-cache checks; change retry handling to validate the resource is
still current or re-queue the retry into the ordered dispatcher instead of
calling dispatchHandler directly: in Informer.retryLoop, when pulling from
retryCh (re), either fetch the current object/revision from the informer
cache/store and compare its UID/resourceVersion (or equivalent) against re.event
metadata and only call dispatchHandler if it matches, or push re back into the
same channel/queue consumed by dispatchLoop so ordering and current-state checks
happen there (use identifiers like retryCh, dispatchCh/dispatchLoop,
dispatchHandler, and the informer cache/get method to locate where to implement
the gate).

---

Duplicate comments:
In `@components/ambient-control-plane/internal/informer/informer.go`:
- Around line 158-175: The retryLoop currently sleeps on the first received
retry (from inf.retryCh) which makes timing FIFO instead of deadline-based;
change retryLoop to, after the initial receive of re from inf.retryCh, perform
non-blocking reads of inf.retryCh to collect any queued retries, compute the
earliest fireAt among them, and use that earliest fireAt to decide the wait;
push the remaining retries into a local min-heap or reinsert them into the retry
queue (or replace inf.retryCh with a priority queue stored on Informer) so
dispatchHandler still receives retries in deadline order and senders to
inf.retryCh do not block. Ensure the logic references Informer.retryLoop,
inf.retryCh, re.fireAt and uses inf.dispatchHandler to process the selected
retry.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 380b4aaa-7657-4c80-92b4-b6ea3326de4e

📥 Commits

Reviewing files that changed from the base of the PR and between 9a15e8c and f3fd99c.

📒 Files selected for processing (4)
  • components/ambient-cli/cmd/acpctl/apply/cmd.go
  • components/ambient-cli/demo-github.sh
  • components/ambient-control-plane/internal/informer/informer.go
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go

Comment on lines +158 to +175
func (inf *Informer) retryLoop(ctx context.Context) {
for {
select {
case <-ctx.Done():
return
case re := <-inf.retryCh:
wait := time.Until(re.fireAt)
if wait > 0 {
timer := time.NewTimer(wait)
select {
case <-timer.C:
case <-ctx.Done():
timer.Stop()
return
}
}
inf.dispatchHandler(ctx, re.event, re.handlerIndex, re.attempt)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not replay stale events out of order.

Line 174 runs the queued snapshot from retryLoop, outside the ordered dispatchLoop, with no check that the resource is still current. An older failed add/modify can therefore run after a newer delete/update and reapply stale state; for example, components/ambient-control-plane/internal/reconciler/kube_reconciler.go:122-175 provisions from event.Object.Session without re-fetching the session, so a deleted session can be reprovisioned on retry. Gate retries on the current cached revision/object, or funnel them back through the ordered dispatcher.

Also applies to: 204-220

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/ambient-control-plane/internal/informer/informer.go` around lines
158 - 175, The retryLoop is replaying stale events directly via
inf.dispatchHandler (function Informer.retryLoop), bypassing ordered dispatch
and current-cache checks; change retry handling to validate the resource is
still current or re-queue the retry into the ordered dispatcher instead of
calling dispatchHandler directly: in Informer.retryLoop, when pulling from
retryCh (re), either fetch the current object/revision from the informer
cache/store and compare its UID/resourceVersion (or equivalent) against re.event
metadata and only call dispatchHandler if it matches, or push re back into the
same channel/queue consumed by dispatchLoop so ordering and current-state checks
happen there (use identifiers like retryCh, dispatchCh/dispatchLoop,
dispatchHandler, and the informer cache/get method to locate where to implement
the gate).

@ambient-code
Copy link
Copy Markdown
Contributor

ambient-code bot commented Apr 3, 2026

Review Queue Status

Check Status Detail
CI pass
Conflicts FAIL Branch has merge conflicts
Reviews pass

Action needed: Resolve merge conflicts (rebase on main)

Auto-generated by Review Queue workflow. Updated when PR changes.

Ambient Code Bot and others added 3 commits April 3, 2026 14:51
…nMessages

Runner's BOT_TOKEN (OIDC JWT with preferred_username=GRPC_SERVICE_ACCOUNT)
was failing ownership check because AMBIENT_API_TOKEN is not set, making
IsServiceCaller always false. Read GRPC_SERVICE_ACCOUNT env var at startup
and bypass ownership enforcement when the authenticated username matches.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…resh loop

OIDC client-credentials JWTs expire in minutes. The runner's BOT_TOKEN was
injected only as a frozen env var, so reconnect-on-UNAUTHENTICATED re-read
the same stale value in a tight loop.

- Mount the session-<id>-creds Secret as a file volume at
  /var/run/secrets/ambient/bot-token so kubelet propagates updates
  and the runner's file-read on reconnect picks up fresh tokens
- Add UpdateSecret to KubeClient
- Add StartTokenRefreshLoop to SimpleKubeReconciler: every 10 minutes,
  list running sessions and update each creds secret with a fresh OIDC token
- Wire the refresh loop into main.go runKubeMode for kube reconcilers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
informer:
- Retry per failing handler (not whole event) to prevent exponential
  fan-out when multiple handlers are registered for a resource
- Use time.NewTimer per entry instead of blocking sleep on dequeued item,
  eliminating retry starvation when long-backoff entries precede short ones
- Add handler index to retryEvent; introduce dispatchHandler and
  scheduleRetry helpers

control-plane:
- Token refresh loop now paginates all running sessions (100/page)
  instead of capping at a single page of 500

cli/apply:
- strategicMerge now propagates Credential fields (provider, token,
  url, email) so acpctl apply -k overlays work correctly
- buildCredentialPatch compares each field against the existing value
  before marking changed, making repeated applies idempotent

demo-github.sh:
- Pass GitHub PAT via apply -f manifest with env-var expansion
  ($DEMO_GITHUB_PAT) rather than --token argv to avoid exposure in
  ps/crash dumps; fetch credential ID with get credentials post-apply
- Look up credential:token-reader role (not credential:reader) so the
  session can fetch tokens via GET /credentials/{id}/token

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@markturansky markturansky force-pushed the fix/cp-credential-rolebinding-and-project-delete branch from f3fd99c to 85c7215 Compare April 3, 2026 18:54
@markturansky markturansky merged commit d0a19d9 into alpha Apr 3, 2026
44 checks passed
@markturansky markturansky deleted the fix/cp-credential-rolebinding-and-project-delete branch April 3, 2026 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant