Skip to content

feat: add stopOnRunFinished + improve activity tracking#1191

Closed
Gkrumbach07 wants to merge 1 commit intomainfrom
feat/amber-handler-v2
Closed

feat: add stopOnRunFinished + improve activity tracking#1191
Gkrumbach07 wants to merge 1 commit intomainfrom
feat/amber-handler-v2

Conversation

@Gkrumbach07
Copy link
Copy Markdown
Contributor

@Gkrumbach07 Gkrumbach07 commented Apr 3, 2026

Summary

Adds stopOnRunFinished CRD field and improves inactivity detection. Builds on #1180.

Changes

stopOnRunFinished (new CRD field)

  • When true, the backend auto-stops the session on RUN_FINISHED event
  • Uses in-memory cache (sync.Map) to avoid k8s API call on every RUN_FINISHED for sessions without the flag
  • Wired through: CRD → backend types → session create handler → AG-UI proxy

Activity tracking improvements

  • All AG-UI events now reset the inactivity timer (was only 4 event types — RUN_STARTED, TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TOOL_CALL_START)
  • Prevents false inactivity detection during long tool calls that don't emit those specific events
  • Activity debounce reduced from 60s to 10s to support shorter inactivity timeouts

Amber GHA updates

  • Use stop-on-run-finished: 'true' with timeout: '0' (no inactivity timeout)
  • Fix/custom prompt split: @amber alone → fix prompt, @amber <text> → custom
  • Shell injection fix: comment body via env var
  • Session summary: use || fallback instead of string concatenation

Test plan

  • Create session with stopOnRunFinished: true — verify it stops on RUN_FINISHED
  • Create session without the flag — verify no change in behavior
  • Verify activity tracking keeps sessions alive during long tool calls
  • Test @amber comment on a PR — verify fix prompt runs
  • Test @amber do something on an issue — verify custom prompt runs

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Sessions can now be configured to automatically stop when agents complete their run.
  • Improvements

    • Enhanced activity tracking responsiveness with faster status updates.
    • Updated session lifecycle management behavior for improved resource handling.

New feature: spec.stopOnRunFinished — when true, the backend
auto-stops the session on RUN_FINISHED event. Enables one-shot
automation sessions that stop cleanly without inactivity timeout.

Changes:
- CRD: add stopOnRunFinished boolean to spec
- Backend types: add to AgenticSessionSpec and CreateAgenticSessionRequest
- Backend session handler: pass through to CR on create
- AG-UI proxy: detect RUN_FINISHED + check spec → trigger stop
  with in-memory cache to avoid repeated k8s API calls
- AG-UI proxy: all events now reset inactivity timer (was only 4 types)
- AG-UI proxy: reduce activity debounce from 60s to 10s
- Amber GHA: use stop-on-run-finished, fix/custom prompt split,
  shell-driven batch, session reuse, security fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

Changes introduce auto-stop functionality for agentic sessions. When a session is created with stopOnRunFinished: true, the websocket proxy detects RUN_FINISHED events and automatically stops the session by updating its annotations. Covers workflow updates, Go type definitions, handler logic, event processing, and CRD schema.

Changes

Cohort / File(s) Summary
Workflow & CI/CD
.github/workflows/amber-issue-handler.yml
Updated action invocations to set timeout: '0' and stop-on-run-finished: 'true'; modified batch PR fixer script to pass stopOnRunFinished: True in API request instead of inactivityTimeout: 60.
Type Definitions
components/backend/types/session.go
Added StopOnRunFinished field to AgenticSessionSpec (bool) and CreateAgenticSessionRequest (*bool).
Session Handler
components/backend/handlers/sessions.go
Conditionally includes spec.stopOnRunFinished in created AgenticSession CR when request flag is set.
WebSocket Proxy
components/backend/websocket/agui_proxy.go
Reduced activity debounce interval to 10s; expanded activity detection to any typed event; added checkAndStopOnRunFinished function with in-memory cache to detect RUN_FINISHED events and update session phase to Stopped.
CRD Schema
components/manifests/base/crds/agenticsessions-crd.yaml
Added optional spec.stopOnRunFinished boolean field to AgenticSession custom resource definition.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent/Client
    participant Proxy as WebSocket Proxy
    participant Cache as stopOnRunFinished<br/>Cache
    participant API as Kubernetes API
    
    Agent->>Proxy: Emit RUN_FINISHED event
    Proxy->>Proxy: persistStreamedEvent()<br/>detects event.type
    Proxy->>Cache: checkAndStopOnRunFinished(project, session)
    Cache-->>Proxy: Check if cached?
    alt Not cached
        Proxy->>API: Query AgenticSession spec
        API-->>Proxy: Return stopOnRunFinished flag
        Proxy->>Cache: Store result
    else Cached
        Cache-->>Proxy: Return cached value
    end
    alt stopOnRunFinished = true
        Proxy->>API: Update session annotations<br/>(phase: Stopped,<br/>stop-reason: run-finished)
        API-->>Proxy: Acknowledged
    end
Loading

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
Performance And Algorithmic Complexity ❌ Error stopOnRunFinishedCache grows unbounded with no eviction—entries never deleted on session termination, causing indefinite memory accumulation. Cache keyed on bare sessionName instead of projectName/sessionName violates namespace-scoping, causing collisions. Add cache deletion to stale session cleanup loop; change cache key to projectName/sessionName; wrap Update() with RetryOnConflict for concurrent resource versions.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Kubernetes Resource Safety ⚠️ Warning PR introduces Kubernetes namespace isolation violation via bare sessionName cache key and silent Update() failures without conflict retry. Use projectName/sessionName as cache key and wrap Update() with RetryOnConflict error handling per existing codebase patterns.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title follows Conventional Commits format (feat: scope), clearly summarizes the main changes (stopOnRunFinished feature + activity tracking improvements).
Security And Secret Handling ✅ Passed Shell injection properly mitigated via environment variables; auth/authz mechanisms intact; no hardcoded secrets; Python script uses secure request handling; CRD changes schema-only with no security implications.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/amber-handler-v2
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/amber-handler-v2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/amber-issue-handler.yml (1)

68-99: ⚠️ Potential issue | 🟠 Major

Pin ambient-action to commit SHA

Lines 68–99 and 218–277 use ambient-code/ambient-action@v0.0.4 (mutable tag). Per coding guidelines, GitHub Actions must be pinned to commit SHAs to prevent tag retargeting and supply-chain drift. Replace with a specific commit SHA.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/amber-issue-handler.yml around lines 68 - 99, The workflow
currently uses a mutable tag for the action (uses:
ambient-code/ambient-action@v0.0.4); replace that tag with the corresponding
immutable commit SHA (e.g., ambient-code/ambient-action@<COMMIT_SHA>) in every
occurrence (the `uses:` lines in the Create session step and the other block
around lines 218–277) so the action is pinned; update both `uses:
ambient-code/ambient-action@v0.0.4` instances to the specific commit SHA and
verify the SHA is the exact commit from the action repo before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/sessions.go`:
- Around line 744-746: parseSpec() is not reading spec["stopOnRunFinished"], so
a create-time write via req.StopOnRunFinished only persists in the CR but never
surfaces through GetSession/ListSessions; update parseSpec(spec) to detect the
"stopOnRunFinished" key (handle both bool and pointer semantics as used
elsewhere), set the corresponding field in the returned session spec struct
(matching the type used by GetSession/ListSessions), and ensure parseSpec
returns the true value when the CR stores true so the API round-trips correctly.

In `@components/backend/websocket/agui_proxy.go`:
- Around line 60-63: The cache stopOnRunFinishedCache is keyed by sessionName
only which collides across namespaces; change it to use a composite key that
includes project/namespace plus session name (e.g. fmt.Sprintf("%s/%s",
session.GetNamespace() or session.Project, sessionName) or a small struct key)
wherever the map is set or read (the lazy population on first RUN_FINISHED, the
early-return check, and subsequent lookups around the RUN_FINISHED handling).
Update the comment to reflect "Key: project/sessionName" and ensure all
references that read or write stopOnRunFinishedCache (including the code paths
currently using sessionName alone) are updated to compute and use the composite
key.
- Around line 542-553: The stop-on-RUN_FINISHED flow in
checkAndStopOnRunFinished has two issues: it must retry on resource version
conflicts and must key the cache by namespace; change stopOnRunFinishedCache to
use projectName + "/" + sessionName (same form as updateLastActivityTime) and
wrap the fetch-modify-update sequence inside a RetryOnConflict (or equivalent
fetch-and-patch loop) in checkAndStopOnRunFinished so concurrent
UpdateStatus/Update races are retried instead of failing silently.

---

Outside diff comments:
In @.github/workflows/amber-issue-handler.yml:
- Around line 68-99: The workflow currently uses a mutable tag for the action
(uses: ambient-code/ambient-action@v0.0.4); replace that tag with the
corresponding immutable commit SHA (e.g.,
ambient-code/ambient-action@<COMMIT_SHA>) in every occurrence (the `uses:` lines
in the Create session step and the other block around lines 218–277) so the
action is pinned; update both `uses: ambient-code/ambient-action@v0.0.4`
instances to the specific commit SHA and verify the SHA is the exact commit from
the action repo before committing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 534cb425-4b0a-4862-ac9e-26ae6973dbf1

📥 Commits

Reviewing files that changed from the base of the PR and between f56fb3c and dcf0071.

📒 Files selected for processing (5)
  • .github/workflows/amber-issue-handler.yml
  • components/backend/handlers/sessions.go
  • components/backend/types/session.go
  • components/backend/websocket/agui_proxy.go
  • components/manifests/base/crds/agenticsessions-crd.yaml

Comment on lines +744 to +746
if req.StopOnRunFinished != nil && *req.StopOnRunFinished {
spec["stopOnRunFinished"] = true
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

stopOnRunFinished is write-only right now

Line 744 adds the create-time write, but parseSpec() still never reads spec.stopOnRunFinished. GetSession, ListSessions, and the other handlers that return parseSpec(spec) will still surface this field as false/omitted even when the CR stores true, so the new API does not round-trip.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/sessions.go` around lines 744 - 746, parseSpec()
is not reading spec["stopOnRunFinished"], so a create-time write via
req.StopOnRunFinished only persists in the CR but never surfaces through
GetSession/ListSessions; update parseSpec(spec) to detect the
"stopOnRunFinished" key (handle both bool and pointer semantics as used
elsewhere), set the corresponding field in the returned session spec struct
(matching the type used by GetSession/ListSessions), and ensure parseSpec
returns the true value when the CR stores true so the API round-trips correctly.

Comment on lines +60 to +63
// stopOnRunFinishedCache tracks which sessions have stopOnRunFinished set.
// Populated lazily on first RUN_FINISHED event, avoids repeated k8s API calls.
// Key: sessionName, Value: bool
var stopOnRunFinishedCache sync.Map
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Cache this by project/session, not sessionName alone

AgenticSessions are namespaced. With the current keying, a cached false for one project can make Line 1010 return early for a same-named session in another project, and that RUN_FINISHED will never even read its CR.

Also applies to: 1008-1013, 1025-1026

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/websocket/agui_proxy.go` around lines 60 - 63, The cache
stopOnRunFinishedCache is keyed by sessionName only which collides across
namespaces; change it to use a composite key that includes project/namespace
plus session name (e.g. fmt.Sprintf("%s/%s", session.GetNamespace() or
session.Project, sessionName) or a small struct key) wherever the map is set or
read (the lazy population on first RUN_FINISHED, the early-return check, and
subsequent lookups around the RUN_FINISHED handling). Update the comment to
reflect "Key: project/sessionName" and ensure all references that read or write
stopOnRunFinishedCache (including the code paths currently using sessionName
alone) are updated to compute and use the composite key.

Comment on lines +542 to +553
// Update lastActivityTime on CR for any event (debounced).
if eventType != "" {
if projectName, ok := sessionProjectMap.Load(sessionID); ok {
updateLastActivityTime(projectName.(string), sessionID, eventType == types.EventTypeRunStarted)
}
}

// Stop session on RUN_FINISHED if stopOnRunFinished is set.
if eventType == types.EventTypeRunFinished {
if projectName, ok := sessionProjectMap.Load(sessionID); ok {
go checkAndStopOnRunFinished(projectName.(string), sessionID)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '538,556p' components/backend/websocket/agui_proxy.go
sed -n '1000,1105p' components/backend/websocket/agui_proxy.go
rg -n 'RetryOnConflict|Patch\(' components/backend/websocket/agui_proxy.go || true

Repository: ambient-code/platform

Length of output: 4576


checkAndStopOnRunFinished needs conflict retry and namespaced cache key

The stop-on-RUN_FINISHED path has two functional bugs:

  1. Silent failure on resource conflict: checkAndStopOnRunFinished() fetches the object, modifies it, then calls Update() with no RetryOnConflict wrapper. Concurrent writes (e.g., from updateLastActivityTime() calling UpdateStatus()) can advance the resource version, causing Update() to fail silently with a logged error, leaving the session running.

  2. Cache key collision across namespaces: stopOnRunFinishedCache uses bare sessionName as the key, but AgenticSession resources are namespace-scoped by projectName. Sessions with identical names in different projects will collide in cache, causing incorrect stop decisions. For comparison, updateLastActivityTime correctly uses projectName + "/" + sessionName as its cache key.

Use RetryOnConflict (or fetch-and-patch) for the stop operation and key the cache as projectName + "/" + sessionName.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/websocket/agui_proxy.go` around lines 542 - 553, The
stop-on-RUN_FINISHED flow in checkAndStopOnRunFinished has two issues: it must
retry on resource version conflicts and must key the cache by namespace; change
stopOnRunFinishedCache to use projectName + "/" + sessionName (same form as
updateLastActivityTime) and wrap the fetch-modify-update sequence inside a
RetryOnConflict (or equivalent fetch-and-patch loop) in
checkAndStopOnRunFinished so concurrent UpdateStatus/Update races are retried
instead of failing silently.

@Gkrumbach07
Copy link
Copy Markdown
Contributor Author

Superseded by #1192 (backend) and #1193 (GHA).

@Gkrumbach07 Gkrumbach07 closed this Apr 3, 2026
@Gkrumbach07 Gkrumbach07 deleted the feat/amber-handler-v2 branch April 3, 2026 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant