Skip to content

feat(computer-use): coordinate-contract foundation (slice 1)#695

Merged
Yeachan-Heo merged 25 commits into
devfrom
feat/computer-use
Jun 16, 2026
Merged

feat(computer-use): coordinate-contract foundation (slice 1)#695
Yeachan-Heo merged 25 commits into
devfrom
feat/computer-use

Conversation

@Yeachan-Heo

Copy link
Copy Markdown
Owner

Native computer-use tool — Slice 1 (foundation)

Draft PR. First slice of a new model-agnostic computer tool that drives the
user's real macOS desktop via the OpenAI computer-use action set. Scoped through
GJC's deep-interview (requirements) → ralplan (Planner/Architect/Critic
consensus) workflows.

Built fresh: the open-source openai/codex repo has no GUI computer-use
source to copy (it is a proprietary Codex app/cloud feature); only the public
action schema is mirrored.

What's in this PR

  • crates/pi-natives/src/computer/coords.rs — the pure, framework-free
    coordinate contract: NormalizedDisplay maps a screenshot-space pixel to a
    macOS logical point via per-axis scale + display origin (Retina/HiDPI-safe),
    rejecting out-of-bounds and non-finite inputs. 12 unit tests (scale 1.0/2.0,
    fractional + anisotropic scale, non-zero origins, edges, out-of-bounds,
    invalid scale). No display or granted permissions required.
  • crates/pi-natives/src/computer/mod.rs + pub mod computer; in lib.rs.
  • docs/computer-use/README.md — locked decisions (ADR summary), coordinate
    contract, and the delivery roadmap.

Verification

  • cargo test -p pi-natives computer:: → 12 passed.
  • cargo clippy -p pi-natives --all-targets → no warnings from the new module.
  • Canonical Rust formatting applied (fmt:rs).

Locked decisions (full ADR in docs/computer-use/README.md)

macOS-only v1 · exact OpenAI 9-action schema (screenshot/click/double_click/
move/drag/scroll/type/keypress/wait) · any model via a generic
tool-call interface · off by default + opt-in + persistent always-on · single
normalized primary display (Rust owns transforms) · autonomous but a
daemon-enforced global kill-switch outside model control · central
execute_action state machine · SupervisorClient boundary so an out-of-process
supervisor can be swapped in later.

Why draft — remaining work (follow-ups)

The native backend, kill-switch supervisor + event-tap lifecycle, napi/TS tool
surface, and the manual macOS end-to-end acceptance require real macOS
hardware, granted TCC permissions (Accessibility + Screen Recording), and a
human-operated drill
(TextEdit all-nine + kill-switch), so they are tracked as
follow-up slices rather than landed here.

Slice Status
Coordinate contract + planning docs ✅ this PR
Native capture + input backend (capture/input/permissions/execute_action) ⬜ planned
Kill-switch supervisor + event-tap lifecycle ⬜ planned
napi bindings + TS computer tool surface (schema/gating/prompt/renderer) ⬜ planned
Manual macOS E2E acceptance ⬜ planned (macOS hardware + human operator)

Yeachan-Heo and others added 12 commits June 15, 2026 19:19
First slice of the native computer-use tool (macOS-only v1), scoped via
the deep-interview spec + ralplan consensus plan. Lands the pure,
framework-free coordinate contract: NormalizedDisplay maps normalized
screenshot pixels to macOS logical points (Retina/HiDPI-safe) with
out-of-bounds and invalid-scale rejection, plus full unit tests. Adds
docs/computer-use/ capturing locked decisions, the coordinate contract,
and the delivery roadmap.

Native capture/input backend, kill-switch supervisor, napi/TS tool
surface, and the manual macOS end-to-end acceptance are tracked
follow-ups (require macOS hardware, granted TCC, and a human-operated
drill).
Adds crates/pi-natives/src/computer/capture.rs (macOS-gated): read-only
primary-display capture via raw CoreGraphics FFI into a PNG plus the
NormalizedDisplay descriptor (scale derived from captured physical pixels
vs logical bounds). A missing Screen Recording grant surfaces
CaptureError::CaptureFailed rather than a silent black frame.

Verified live with Screen Recording granted: a real, non-uniform
primary-display capture decodes as a PNG with matching dimensions
(cargo test -p pi-natives --ignored captures_non_uniform_primary_display).
The GUI capture test is #[ignore] so CI stays deterministic.
Adds crates/pi-natives/src/computer/permissions.rs (macOS): non-prompting
preflight for Accessibility (input injection) and Screen Recording
(capture), Settings-pane openers, and require_*_for_input/capture guards
returning COMPUTER_PERMISSION_REQUIRED so callers fail closed.

This is the fail-closed gate prerequisite: input must not fire unless
Accessibility preflight passes. Live probe in this environment reports
accessibility=false (and screen_recording=false for the executing
binary, which still captured via host-process inheritance) — confirming
input injection / live E2E require Accessibility granted to the actual
executing gjc process before they can proceed.
Adds a #[napi] computerScreenshot binding over the landed CoreGraphics
capture, regenerates packages/natives bindings, and adds a TS test that
verifies a decodable PNG with matching dimensions. Verified live (bun
--cwd=packages/natives test computer: 1 pass, 13 expect()).

This wires the read-only screenshot primitive through the napi -> TS
bridge. Input primitives + kill-switch remain gated: live verification
requires Accessibility granted to the actual gjc process (a cargo/test
binary is never TCC-trusted for input injection).
Adds crates/pi-natives/src/computer/input.rs: InputController orchestrates
click/double_click/move/drag/scroll/type/keypress over an EventSink trait,
tracking held buttons so release_all cleans up after abort/error. Coords
flow through coords.rs; out-of-bounds is rejected and the drag error path
releases the held button. 9 unit tests drive a RecordingSink to verify
exact event sequences (no real OS events).

The real CGEvent-backed MacEventSink is constructed only via
guarded_controller(), which require_accessibility_for_input() gates — so
input physically cannot fire while Accessibility is ungranted. Not yet
napi/model-exposed (per the plan, input ships only after the kill-switch
is proven). Live OS-event behavior is verified in a granted gjc session;
the orchestration logic is fully unit-tested here.
Adds current_cursor_position() (CGEventGetLocation) and a guarded live
test that moves the cursor to the display center and reads it back.
First run revealed a bare kCGEventMouseMoved event does not relocate the
cursor; fixed move_cursor to use CGWarpMouseCursorPosition (then post the
moved event for hover). Live test now passes within 2 logical points with
Accessibility granted to the host app (cmux).

This proves the CGEvent input pipeline end-to-end on the real desktop:
coords.rs transform -> warp -> read-back. Input is still gated
(guarded_controller requires Accessibility) and not yet model-exposed;
click/type await the kill-switch per the plan.
Adds crates/pi-natives/src/computer/supervisor.rs: process-global
Supervisor with fail-closed input_allowed (requires hotkey_live + fresh
heartbeat + not suspended), trigger_stop latch, and user-only reset.
5 unit tests cover the gating truth table (not-live, live+fresh, stale
heartbeat, stop-latch-until-reset, lost liveness). Pure atomics so the
safety logic is deterministic; the OS hotkey listener drives it next.
Adds crates/pi-natives/src/computer/hotkey.rs: a listen-only CGEventTap on
a dedicated CFRunLoop thread that latches Supervisor::trigger_stop on the
configured global hotkey (Ctrl+Opt+Cmd+Escape), marking the supervisor
stop path live on tap creation (fails closed otherwise). Independent of
the model tool path.

Verified live: synthetic_hotkey_triggers_stop posts a synthetic hotkey
and observes the supervisor latch suspended end-to-end (Accessibility
sufficed for tap creation; no separate Input Monitoring needed). clippy
clean, fmt applied; aligned CFRunLoopGetCurrent signature with
appearance.rs to avoid the clashing-extern warning.
The `computer_screenshot` napi function lived in the macOS-gated
`computer::capture` module, so napi-rs omitted it from the generated
`index.{js,d.ts}` on non-macOS targets. CI's Linux native build
regenerated the bindings without `computerScreenshot`, breaking the
`@gajae-code/natives` type check on `test/computer.test.ts`
(TS2305: no exported member 'computerScreenshot').

Move the `ComputerScreenshot` struct and `computerScreenshot` binding
into `computer::mod` (compiled on all platforms) and gate only the macOS
CoreGraphics capture call internally, matching the `detectMacOSAppearance`
pattern. Non-macOS callers receive a clear unsupported-platform error, and
the generated TypeScript surface is now identical across platforms.
@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

CI fix: Affected path validation

Root cause. The Affected path validation check failed in the Check @gajae-code/natives step:

test/computer.test.ts(2,10): error TS2305: Module '"../native/index.js"' has no exported member 'computerScreenshot'.

The computer_screenshot napi binding lived inside the macOS-gated computer::capture module (#[cfg(target_os = "macos")] pub mod capture;). On the Linux CI runner, the native build (ci-build-native.ts) regenerates packages/natives/native/index.{js,d.ts} via napi-rs, and because the binding is not compiled on Linux it was dropped from the generated typings (−34 lines in index.d.ts, computerScreenshot count → 0). The subsequent tsgo --noEmit then failed because test/computer.test.ts statically imports computerScreenshot.

Reproduced locally (Linux): bun --cwd=packages/natives run build removed computerScreenshot from the generated bindings, and bun --cwd=packages/natives run check:types reproduced the exact TS2305.

Fix. Moved the ComputerScreenshot struct and computerScreenshot binding out of the macOS-only capture module into computer::mod (compiled on all platforms), gating only the CoreGraphics capture call internally — the same pattern already used by detectMacOSAppearance. Non-macOS callers now get a clear unsupported-platform error, and the generated TypeScript surface is identical across platforms. The macOS CoreGraphics FFI and capture_primary_display stay macOS-gated.

Validation (all green locally on Linux):

  • bun --cwd=packages/natives run check (biome + tsgo --noEmit) — pass
  • bun --cwd=packages/natives test computer.test.ts — imports cleanly, skips on non-macOS
  • cargo fmt --all -- --check — pass
  • cargo clippy -p pi-natives -- -D warnings — pass
  • Native rebuild regenerates index.{js,d.ts} with computerScreenshot present and consistent on Linux

Commit: f6252cb


[repo owner's gaebal-gajae (clawdbot) 🦞]

…001)

Adds crates/pi-natives/src/computer/executor.rs: the single side-effect
authority. execute_input runs a fail-closed gate (supervisor live+fresh+
not-suspended, Accessibility granted, matching display epoch for
coordinate actions) before dispatching to InputController, and runs
release_all on any error or mid-flight suspension. Stable error codes
(COMPUTER_SUSPENDED / _SUPERVISOR_NOT_LIVE / _PERMISSION_REQUIRED /
_DISPLAY_STALE / _COORD_INVALID / _CANCELLED). DisplayContext trait
defines the display-epoch staleness contract; PermissionGate is injectable.

9 unit tests with a real Supervisor + fake perms/display + recording sink
cover every gate-rejection path, matching-epoch success, out-of-bounds
release-all, type/keypress/wait, and stable codes. 37 computer tests pass;
clippy clean. Added InputController::into_sink accessor.
Yeachan-Heo and others added 2 commits June 16, 2026 01:18
…am gate (G004)

G002 (pi-natives): capture.rs gains display_epoch (geometry hash) +
capture_id + lightweight current_display_epoch() (no Screen Recording);
executor.rs gains MacPermissionGate/MacDisplayContext providers; new
controller.rs exposes a #[napi] ComputerController whose 9 methods are
thin adapters that all route through executor::execute_input (the single
side-effect authority); bypass_guard.rs statically asserts InputController
side-effect methods are only called from input.rs/executor.rs. Bindings
regenerated. cargo test computer:: = 38 pass (incl bypass guard), clippy
clean.

G004 (ultragoal gate): ultragoal-runtime.ts gains a trusted changeSet
data-flow (checkpoint+review), computer-touching detection from trusted
changed paths (declarations additive-only), a mandatory computer
adversarial case-set (7 IDs, no not_applicable) requiring live/structural
native proof (inline/metadata/receipt-only fail with COMPUTER_REDTEAM_*),
computer/native surface tokens, docs-only tiering, and byte-for-byte
non-computer compatibility. New fixture matrix: 7 pass; ultragoal-runtime
102 pass + review 8 pass (non-regression).
@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

Pushed a narrow follow-up for the latest check:rs failure:

  • Commit: 0e562c02 fix(pi-natives): collapse display epoch guard
  • Scope: collapsed the stale-display epoch guard in crates/pi-natives/src/computer/executor.rs exactly for the clippy::collapsible_if failure, with no behavior change intended.
  • Local verification after rebasing on the newer feat/computer-use head:
    • cargo fmt --check
    • cargo clippy -p pi-natives -- -D warnings

The PR remains draft; I did not mark it ready or merge it.


[repo owner's gaebal-gajae (clawdbot) 🦞]

Yeachan-Heo and others added 2 commits June 16, 2026 01:32
…log (G003)

Adds packages/coding-agent/src/tools/computer.ts: an AgentTool with the
exact OpenAI 9-action snake_case zod schema routing through the
@gajae-code/natives ComputerController, AbortSignal/timeout propagation,
and COMPUTER_* error mapping. Off-by-default + fail-closed: callable only
on macOS when computer.enabled||computer.alwaysOn; disabled returns
COMPUTER_DISABLED without constructing the controller or starting native
resources. First-class WITHOUT unsafe default exposure via a separate
BUILTIN_CAPABILITY_CATALOG (metadata-only) distinct from the callable
BUILTIN_TOOLS, so a disabled computer is documented/listable but not in
the session registry and not auto-activatable by search_tool_bm25. Adds
prompt, bounded renderer, settings-schema entries, and docs/tools/computer.md.
9 tool tests pass (exact schema, camelCase rejection, gating,
disabled->COMPUTER_DISABLED, dispatch mapping); biome clean.
@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

Pushed a narrow formatting follow-up for the latest Affected path validation failure:

  • Commit: style(computer-use): format ultragoal fixtures
  • Scope: Biome formatting in packages/coding-agent/src/gjc-runtime/ultragoal-runtime.ts and packages/coding-agent/test/gjc-runtime/computer-red-team-fixtures.test.ts
  • Local verification: biome check on both files and git diff --check

The PR remains draft; no ready/merge action taken.


[repo owner's gaebal-gajae (clawdbot) 🦞]

- ComputerToolDetails: add optional meta field so it satisfies the
  ToolResultBuilder DetailsWithMeta weak-type constraint; toolResult()
  now infers ComputerToolDetails instead of falling back to DetailsWithMeta
- render.ts: type summarizeComputerDetails parts as string[] so dynamic
  summary strings are not constrained to the action literal union
- computer-red-team-fixtures.test.ts: drop removed gjcObjective input
  option from createUltragoalPlan; guard possibly-undefined stderr/stdout
@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

CI fix: Affected path validation (@gajae-code/coding-agent check — TypeScript errors)

Pushed a narrow TypeScript-only repair for the latest check failure.

  • Commit: fd5dce09 fix(computer-use): repair TypeScript check errors (slice 1)
  • Head: feat/computer-use @ fd5dce09 (fast-forward from be82f07e)

Root causes & fixes (3 files, +4/-3):

  1. src/tools/computer.ts (TS2322 / TS2559 @ 206/220/227) — ToolResultBuilder's constraint DetailsWithMeta = { meta?: OutputMeta } is a weak type (all-optional). ComputerToolDetails had no meta, so it failed the weak-type check; generic inference fell back to the constraint and toolResult(details).done() returned AgentToolResult<DetailsWithMeta> instead of AgentToolResult<ComputerToolDetails>. Fix: add meta?: OutputMeta to ComputerToolDetails (matching every other tool details interface — bash.ts, ssh.ts, etc.).

  2. src/tools/computer/render.ts (TS2345 @ 33-44) — const parts = [details.action] inferred parts: ComputerActionName[] (the action literal union), so pushing dynamic summary strings (@ x,y, scroll …, screenshot WxH, …) was rejected. Fix: annotate const parts: string[] (mirrors the sibling summarizeArgs).

  3. test/gjc-runtime/computer-red-team-fixtures.test.ts (TS2353 @ 94, TS18048 @ 230) — createUltragoalPlan options no longer accept gjcObjective; removed that input. created.gjcObjective (still a field on the returned UltragoalPlan) is unchanged. result.stderr/result.stdout are optional on UltragoalCommandResult; guarded with ?? "".

Local verification:

  • bun --cwd packages/coding-agent run checkpass (biome check . + tsgo -p tsconfig.json --noEmit, exit 0). Only the two pre-existing ultragoal-runtime.ts Biome warnings remain (non-failing; left untouched to avoid unrelated cleanup).
  • bun test test/gjc-runtime/computer-red-team-fixtures.test.ts7 pass / 0 fail.
  • bun test test/tools/computer.test.ts → 6 pass / 3 fail; the 3 failures are macOS-only (isComputerSupportedPlatform === "darwin"; the suite even asserts Linux returns false) and reproduce identically on the clean be82f07e base, i.e. environmental (Linux runner) and unrelated to this change. The renderer test that exercises the render.ts change passes.

Build note: bun --cwd=packages/natives run build was run locally only to load the native addon for the focused test; the generated packages/natives/native/index.* and docs-index.generated.ts install/build artifacts were reverted and are not part of this commit.

The PR remains draft; no ready/merge action taken.


[repo owner's gaebal-gajae (clawdbot) 🦞]

@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

Heads-up: the test/bridge/bridge-endpoints-parse.test.ts Biome format error that failed Affected-path validation on this PR's merge ref originates from dev/base, not from this branch. Fixed narrowly in #726 (single-quote the outer it(...) description; formatting-only, no logic). Once #726 lands on dev, re-running CI here (or rebasing on updated dev) should clear that blocker.


[repo owner's gaebal-gajae (clawdbot) 🦞]

Yeachan-Heo and others added 3 commits June 16, 2026 10:03
Adds the COMPUTER_USE_MACOS_TEXTEDIT_ALL_NINE manual-acceptance drill as an
#[ignore] live test (crates/pi-natives/src/computer/input.rs live_tests):
drives all nine primitives (screenshot/move/click/type/keypress/
double_click/drag/scroll/wait) through the production gated path
(execute_input + Supervisor + Mac providers) against the focused app, then
waits for a human Control+Option+Command+Escape press and asserts the
kill-switch latches and blocks further input until reset. Ignored by
default (needs macOS + grants + a human keypress); run with:
cargo test -p pi-natives computer::input::live_tests::all_nine_acceptance_drill -- --ignored --nocapture

Note: the drill refreshes the supervisor heartbeat per action as a
stand-in for a periodic listener heartbeat (follow-up: tick heartbeat from
the hotkey listener thread).
@Yeachan-Heo Yeachan-Heo marked this pull request as ready for review June 16, 2026 05:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 397d33f67d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +247 to +255
return controller.click?.(
{
x: params.x,
y: params.y,
button: params.button ?? "left",
timeoutMs: secondsToMs(params.timeout),
includeScreenshot: params.include_screenshot,
},
options,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Call native computer methods with positional arguments

The real ComputerController generated API in packages/natives/native/index.d.ts exposes positional methods such as click(expectedEpoch, x, y, button), drag(expectedEpoch, x, y, toX, toY, button), etc., but this dispatcher sends a single payload object plus an options object. In a macOS session using the native controller, the object is passed where napi expects the first numeric/null expectedEpoch, so side-effecting actions fail before any input is posted; the current tests only exercise the mock object-shaped API.

Useful? React with 👍 / 👎.

Comment on lines +148 to +150
CGEventTapEnable(tap, true);
Supervisor::global().set_hotkey_live(true);
CFRunLoopRun();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep the kill-switch heartbeat fresh while the tap runs

The supervisor treats the stop path as stale after HEARTBEAT_FRESH_MS (2s), but the hotkey listener only calls set_hotkey_live(true) once before entering CFRunLoopRun() and never calls Supervisor::heartbeat() afterward. Any computer input attempted more than two seconds after the listener starts will therefore fail the gate as SupervisorNotLive even though the event tap is still running; the listener needs a periodic heartbeat while the run loop is alive.

Useful? React with 👍 / 👎.

Comment on lines +372 to +375
const maybe = error as { code?: unknown; message?: unknown };
const rawCode = typeof maybe?.code === "string" ? maybe.code : undefined;
const code =
rawCode && (NATIVE_ERROR_CODES.has(rawCode) || rawCode.startsWith("COMPUTER_")) ? rawCode : "COMPUTER_ERROR";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve native COMPUTER_ codes when mapping errors*

Native controller errors are constructed as GenericFailure with the stable code embedded in the message (for example COMPUTER_PERMISSION_REQUIRED: ...), not as a JS error.code value, so this mapper falls back to COMPUTER_ERROR for those failures. That breaks callers/UI that rely on the documented stable COMPUTER_* codes for permission, stale-display, supervisor, and suspension cases; parse the message prefix or have the native layer set a matching JS code.

Useful? React with 👍 / 👎.

… kill-switch window

The all-nine acceptance drill now writes g005-before.png, g005-after-killswitch.png, and g005-manifest.json to .gjc/ultragoal/artifacts/g005 (override COMPUTER_USE_ACCEPTANCE_DIR), so the human-run drill produces the durable live native proof the G004 mandatory computer red-team suite requires on disk. Widen the kill-switch wait from ~10s to ~60s for manual operation and drop an unnecessary mut on the act closure.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30aee963e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +113 to +118
let first = STARTED.set(true).is_ok();
if first {
thread::Builder::new()
.name("computer-killswitch".into())
.spawn(run_listener)
.ok();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Retry hotkey startup after a failed first attempt

When the first start() happens before the event tap can be created (for example, before the user grants Accessibility/Input Monitoring), STARTED is set permanently and later calls only wait for hotkey_live instead of spawning run_listener again. That makes the documented grant-then-retry flow fail closed until the whole process is restarted, because the supervisor can never become live after permissions are fixed.

Useful? React with 👍 / 👎.

scale_y: frame.display.scale_y,
origin_x: frame.display.origin_x,
origin_y: frame.display.origin_y,
display_epoch: frame.display_epoch as f64,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep display epoch out of lossy JS numbers

When a caller uses screenshot().displayEpoch as the expectedEpoch for a coordinate action, this u64 hash is converted to a JS number and loses low bits for almost all hash values; epoch_from_f64 then casts the rounded value back and compares it to the full current_epoch(), so a fresh screenshot can be rejected as COMPUTER_DISPLAY_STALE. Return the epoch as a string/BigInt-compatible value or constrain it to a 53-bit-safe integer before exposing it to JS.

Useful? React with 👍 / 👎.

if (typeof input.scroll_x === "number" || typeof input.scroll_y === "number") {
parts.push(`scroll ${input.scroll_x ?? 0},${input.scroll_y ?? 0}`);
}
if (Array.isArray(input.keys)) parts.push(`keys ${input.keys.join("+")}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Sanitize key summaries before rendering

The computer schema accepts arbitrary key strings, and this renderer writes input.keys.join("+") directly into the TUI without replaceTabs() or truncateToWidth(), despite the repo renderer contract requiring all displayed tool text to be sanitized. A malformed or very long key name in a keypress call can therefore introduce tabs/ANSI/control text or unbounded output into the TUI; sanitize and bound the joined summary before appending it.

Useful? React with 👍 / 👎.

Add isComputerLoadablePlatform (true everywhere except win32) and gate both BUILTIN_CAPABILITY_CATALOG and the BUILTIN_TOOLS computer factory on it: macOS stays callable, Linux stays listable (support planned via #712), Windows is fully absent (not registered, not advertised). Also fix a pre-existing initial-tools metadata test by constructing ComputerTool directly for loadMode coverage, mirroring how other createIf-gated tools (Ask/Ssh/Job/Recipe/Irc) are handled.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4026bad30

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (screenshot) details.screenshot = screenshot;
details.status = "success";
details.message = describeComputerSuccess(details);
return toolResult(details).text(details.message).done();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Return screenshots as image content

When a screenshot action succeeds, the tool returns only the text summary as tool content; the PNG is reduced to metadata in details, which provider adapters do not send as image input. In an enabled macOS computer session the model therefore sees only Computer screenshot completed (WxH) and never receives the pixels it needs to choose coordinates, making the core screenshot→act loop unusable. Return a bounded image/png content block alongside the text and enforce computer.screenshotMaxBytes.

Useful? React with 👍 / 👎.

Comment on lines +31 to +32
let frame =
capture_primary_display().map_err(|err| napi::Error::from_reason(format!("{err}")))?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preflight Screen Recording before capture

When Screen Recording is missing, this path calls capture_primary_display() directly and wraps its CaptureFailed as a plain N-API reason, so the TS tool surfaces a generic COMPUTER_ERROR and never opens the Screen Recording settings pane. In the documented grant-then-retry flow for first-time macOS users, screenshot should fail with COMPUTER_PERMISSION_REQUIRED; call the screen-recording preflight or map this capture failure to the stable permission code before returning.

Useful? React with 👍 / 👎.

Comment on lines +263 to +267
pub fn keypress(&mut self, keys: &[String]) -> Result<(), InputError> {
for name in keys {
let code = key_code_for(name).ok_or_else(|| InputError::UnknownKey(name.clone()))?;
self.sink.key(code, true);
self.sink.key(code, false);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Send modifier chords instead of sequential taps

For shortcut inputs such as the schema/tested keys: ["Meta", "K"], this loop resolves and presses each key independently, and key_code_for does not define modifier names like Meta/Command/Shift/Control/Option. On macOS those common shortcut calls either fail with UnknownKey before doing anything useful or, after adding codes, would still send separate taps rather than holding the modifier while pressing the key; modifiers need to be mapped and held through the chord.

Useful? React with 👍 / 👎.

Comment on lines +90 to +91
expect(enabledNames).toContain("computer");
expect(alwaysOnNames).toContain("computer");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard macOS-only tool exposure assertion

On non-Darwin CI, createTools() uses the real process.platform through ComputerTool.createIf, so even with computer.enabled or computer.alwaysOn set it returns no computer tool. These assertions therefore fail on Linux runners; gate this test to macOS or inject a platform-specific callable check instead of expecting the tool to be present everywhere.

Useful? React with 👍 / 👎.

@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

Pushed a narrow follow-up for the latest #695 Affected path validation failure.

Root cause: the computer tool tests intended to exercise macOS-enabled behavior, but the Linux CI runner still drove createTools() / ComputerTool.execute() through real process.platform, so enabled-path tests saw COMPUTER_DISABLED. The public factory-map assertion also treated the deliberately-null disabled computer factory as missing metadata.

Fix:

  • Added a test-only computer platform override hook; runtime default remains process.platform.
  • Updated enabled/dispatch computer tests to explicitly simulate darwin.
  • Preserved non-macOS negative coverage.
  • Made the public factory-map test use static capability metadata for computer, which is public but platform-gated/null on Linux.

Validation:

  • TARGET_VARIANTS="baseline modern" bun scripts/ci-build-native.ts for local native deps
  • bun test packages/coding-agent/test/tool-discovery/initial-tools.test.ts packages/coding-agent/test/tools/computer.test.ts
  • (cd packages/coding-agent && bun run check)


[repo owner's gaebal-gajae (clawdbot) 🦞]

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e996ba4da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2948 to +2949
spawnText(["git", "diff", "--name-status"], { cwd, timeoutMs: 5000 }),
spawnText(["git", "diff", "--cached", "--name-status"], { cwd, timeoutMs: 5000 }),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Include untracked files in computer gate changesets

This changeset is used to decide whether complete checkpoints must run the mandatory computer red-team cases, but the worktree side only reads git diff --name-status and git diff --cached --name-status. A newly added computer file that is still untracked, such as crates/pi-natives/src/computer/foo.rs before git add, is absent from both outputs, so trustedChangeSetRequiresComputerSuite() sees no computer path and the checkpoint can pass without the mandatory suite. Include untracked files, for example via git ls-files --others --exclude-standard, before validating.

Useful? React with 👍 / 👎.

Comment on lines +18 to +20
const clickSchema = z
.object({ action: z.literal("click"), x: z.number(), y: z.number(), button: buttonSchema.optional(), ...shared })
.strict();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Thread screenshot epochs through coordinate actions

The coordinate schemas are strict and only allow x/y/button fields; there is no displayEpoch/expectedEpoch field for click, double_click, move, drag, or scroll, and the dispatcher consequently has no value to pass to the native expected_epoch. In a session where the user acts on a screenshot after display topology or scale changes, native gate() receives None and skips the stale-display check, so the documented COMPUTER_DISPLAY_STALE protection is unreachable from the model-facing tool.

Useful? React with 👍 / 👎.

Comment on lines +223 to +225
self.cursor = end;
self.sink.move_cursor(end);
self.release(end, button);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit dragged events while the mouse button is held

During a drag, this uses the generic move_cursor path after pressing the button; on macOS that path posts kCGEventMouseMoved and warps the cursor, not a button-specific dragged event. Apps that listen for left/right/other-mouse-dragged events for selection or drag-and-drop will see a down, a plain move, and an up, so the drag primitive can fail even though the pointer moved; add a drag move event to the sink for the held button.

Useful? React with 👍 / 👎.

Comment on lines +338 to +344
fn CGEventCreateScrollWheelEvent(
source: CgEventSourceRef,
units: u32,
wheel_count: u32,
wheel1: i32,
wheel2: i32,
) -> CgEventRef;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Declare the scroll-wheel FFI as variadic

CGEventCreateScrollWheelEvent takes wheel1 followed by variadic wheel deltas, but this declaration makes wheel2 a fixed parameter. On macOS targets where variadic and non-variadic calls use different ABI rules, the horizontal delta for wheelCount = 2 is read from the wrong place, so scroll_x can be ignored or corrupted. Declare the function as C-variadic and pass the second wheel as the vararg, or use wheelCount = 1 when only vertical scrolling is needed.

Useful? React with 👍 / 👎.

@Yeachan-Heo

Copy link
Copy Markdown
Owner Author

Pushed the follow-up for the latest #695 Affected path validation failure.

Root cause: after the Linux native build, the generated @gajae-code/natives/native/index.d.ts can omit macOS-only computer bindings. packages/natives/test/computer.test.ts used describe.if(isMacOS), but the static named import still had to typecheck on Linux, so tsgo failed before runtime skip mattered.

Fix:

  • Changed the native computer binding test to dynamically import the native module only inside macOS-gated tests.
  • Kept the macOS runtime assertions intact while making Linux typecheck independent from macOS-only exports.

Validation:

  • (cd packages/natives && bun run check)
  • bun test packages/coding-agent/test/tool-discovery/initial-tools.test.ts packages/coding-agent/test/tools/computer.test.ts
  • (cd packages/coding-agent && bun run check)


[repo owner's gaebal-gajae (clawdbot) 🦞]

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4df5c6584

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

async function computeCheckpointChangeSet(cwd: string): Promise<UltragoalChangeSet | undefined> {
const inGit = await spawnText(["git", "rev-parse", "--is-inside-work-tree"], { cwd, timeoutMs: 3000 });
if (!inGit.ok || inGit.stdout.trim() !== "true") return undefined;
if (!(await Bun.file(path.join(cwd, ".git")).exists())) return undefined;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Resolve the git root before reading checkpoint changesets

When a completion checkpoint is run with cwd set to a repo subdirectory such as packages/coding-agent, git rev-parse --is-inside-work-tree succeeds but cwd/.git does not exist, so this returns undefined before parsing any diffs. That makes the trusted changeset unavailable and lets computer-path edits skip the mandatory computer red-team suite unless the executor manually declared matching changedPaths; use git rev-parse --show-toplevel or remove the cwd/.git guard.

Useful? React with 👍 / 👎.

Comment on lines +2203 to +2205
"computer.killSwitchHotkey": {
type: "string",
default: "Control+Option+Command+Escape",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Wire the configured kill-switch hotkey before exposing it

This setting is user-facing, but no code reads computer.killSwitchHotkey; the native listener still hardcodes Control+Option+Command+Escape in crates/pi-natives/src/computer/hotkey.rs. If a user changes the setting, the UI/config can advertise a stop combo that the event tap will never match, which is especially risky for the safety kill switch; either remove the configurable setting or pass it into the native listener.

Useful? React with 👍 / 👎.

const options = { signal };
switch (params.action) {
case "screenshot":
return controller.screenshot?.(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throw when the native computer method is missing

All dispatch branches use optional chaining, so a stale or partially built native addon that has ComputerController but lacks a specific method returns undefined; execute() then marks the action as successful and tells the model it completed even though no screenshot/input happened. Since createNativeComputerController() only checks the constructor, dispatch should require the selected method and fail with a stable unavailable error instead of reporting success.

Useful? React with 👍 / 👎.

Comment thread docs/tools/computer.md
@@ -0,0 +1,71 @@
# computer

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Regenerate embedded docs for the new computer page

Adding this docs page without updating packages/coding-agent/src/internal-urls/docs-index.generated.ts leaves it out of EMBEDDED_DOC_FILENAMES and EMBEDDED_DOCS, so the in-app gjc://tools/computer.md documentation lookup and listing cannot serve the new computer tool docs. Regenerate the embedded docs index after adding this file.

Useful? React with 👍 / 👎.

@Yeachan-Heo Yeachan-Heo merged commit a3967ff into dev Jun 16, 2026
6 checks passed
@Yeachan-Heo Yeachan-Heo deleted the feat/computer-use branch June 16, 2026 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant