feat(computer-use): coordinate-contract foundation (slice 1)#695
Conversation
First slice of the native computer-use tool (macOS-only v1), scoped via the deep-interview spec + ralplan consensus plan. Lands the pure, framework-free coordinate contract: NormalizedDisplay maps normalized screenshot pixels to macOS logical points (Retina/HiDPI-safe) with out-of-bounds and invalid-scale rejection, plus full unit tests. Adds docs/computer-use/ capturing locked decisions, the coordinate contract, and the delivery roadmap. Native capture/input backend, kill-switch supervisor, napi/TS tool surface, and the manual macOS end-to-end acceptance are tracked follow-ups (require macOS hardware, granted TCC, and a human-operated drill).
Adds crates/pi-natives/src/computer/capture.rs (macOS-gated): read-only primary-display capture via raw CoreGraphics FFI into a PNG plus the NormalizedDisplay descriptor (scale derived from captured physical pixels vs logical bounds). A missing Screen Recording grant surfaces CaptureError::CaptureFailed rather than a silent black frame. Verified live with Screen Recording granted: a real, non-uniform primary-display capture decodes as a PNG with matching dimensions (cargo test -p pi-natives --ignored captures_non_uniform_primary_display). The GUI capture test is #[ignore] so CI stays deterministic.
Adds crates/pi-natives/src/computer/permissions.rs (macOS): non-prompting preflight for Accessibility (input injection) and Screen Recording (capture), Settings-pane openers, and require_*_for_input/capture guards returning COMPUTER_PERMISSION_REQUIRED so callers fail closed. This is the fail-closed gate prerequisite: input must not fire unless Accessibility preflight passes. Live probe in this environment reports accessibility=false (and screen_recording=false for the executing binary, which still captured via host-process inheritance) — confirming input injection / live E2E require Accessibility granted to the actual executing gjc process before they can proceed.
Adds a #[napi] computerScreenshot binding over the landed CoreGraphics capture, regenerates packages/natives bindings, and adds a TS test that verifies a decodable PNG with matching dimensions. Verified live (bun --cwd=packages/natives test computer: 1 pass, 13 expect()). This wires the read-only screenshot primitive through the napi -> TS bridge. Input primitives + kill-switch remain gated: live verification requires Accessibility granted to the actual gjc process (a cargo/test binary is never TCC-trusted for input injection).
Adds crates/pi-natives/src/computer/input.rs: InputController orchestrates click/double_click/move/drag/scroll/type/keypress over an EventSink trait, tracking held buttons so release_all cleans up after abort/error. Coords flow through coords.rs; out-of-bounds is rejected and the drag error path releases the held button. 9 unit tests drive a RecordingSink to verify exact event sequences (no real OS events). The real CGEvent-backed MacEventSink is constructed only via guarded_controller(), which require_accessibility_for_input() gates — so input physically cannot fire while Accessibility is ungranted. Not yet napi/model-exposed (per the plan, input ships only after the kill-switch is proven). Live OS-event behavior is verified in a granted gjc session; the orchestration logic is fully unit-tested here.
Adds current_cursor_position() (CGEventGetLocation) and a guarded live test that moves the cursor to the display center and reads it back. First run revealed a bare kCGEventMouseMoved event does not relocate the cursor; fixed move_cursor to use CGWarpMouseCursorPosition (then post the moved event for hover). Live test now passes within 2 logical points with Accessibility granted to the host app (cmux). This proves the CGEvent input pipeline end-to-end on the real desktop: coords.rs transform -> warp -> read-back. Input is still gated (guarded_controller requires Accessibility) and not yet model-exposed; click/type await the kill-switch per the plan.
Adds crates/pi-natives/src/computer/supervisor.rs: process-global Supervisor with fail-closed input_allowed (requires hotkey_live + fresh heartbeat + not suspended), trigger_stop latch, and user-only reset. 5 unit tests cover the gating truth table (not-live, live+fresh, stale heartbeat, stop-latch-until-reset, lost liveness). Pure atomics so the safety logic is deterministic; the OS hotkey listener drives it next.
Adds crates/pi-natives/src/computer/hotkey.rs: a listen-only CGEventTap on a dedicated CFRunLoop thread that latches Supervisor::trigger_stop on the configured global hotkey (Ctrl+Opt+Cmd+Escape), marking the supervisor stop path live on tap creation (fails closed otherwise). Independent of the model tool path. Verified live: synthetic_hotkey_triggers_stop posts a synthetic hotkey and observes the supervisor latch suspended end-to-end (Accessibility sufficed for tap creation; no separate Input Monitoring needed). clippy clean, fmt applied; aligned CFRunLoopGetCurrent signature with appearance.rs to avoid the clashing-extern warning.
The `computer_screenshot` napi function lived in the macOS-gated
`computer::capture` module, so napi-rs omitted it from the generated
`index.{js,d.ts}` on non-macOS targets. CI's Linux native build
regenerated the bindings without `computerScreenshot`, breaking the
`@gajae-code/natives` type check on `test/computer.test.ts`
(TS2305: no exported member 'computerScreenshot').
Move the `ComputerScreenshot` struct and `computerScreenshot` binding
into `computer::mod` (compiled on all platforms) and gate only the macOS
CoreGraphics capture call internally, matching the `detectMacOSAppearance`
pattern. Non-macOS callers receive a clear unsupported-platform error, and
the generated TypeScript surface is now identical across platforms.
CI fix: Affected path validationRoot cause. The The Reproduced locally (Linux): Fix. Moved the Validation (all green locally on Linux):
Commit: f6252cb — |
…001) Adds crates/pi-natives/src/computer/executor.rs: the single side-effect authority. execute_input runs a fail-closed gate (supervisor live+fresh+ not-suspended, Accessibility granted, matching display epoch for coordinate actions) before dispatching to InputController, and runs release_all on any error or mid-flight suspension. Stable error codes (COMPUTER_SUSPENDED / _SUPERVISOR_NOT_LIVE / _PERMISSION_REQUIRED / _DISPLAY_STALE / _COORD_INVALID / _CANCELLED). DisplayContext trait defines the display-epoch staleness contract; PermissionGate is injectable. 9 unit tests with a real Supervisor + fake perms/display + recording sink cover every gate-rejection path, matching-epoch success, out-of-bounds release-all, type/keypress/wait, and stable codes. 37 computer tests pass; clippy clean. Added InputController::into_sink accessor.
…am gate (G004) G002 (pi-natives): capture.rs gains display_epoch (geometry hash) + capture_id + lightweight current_display_epoch() (no Screen Recording); executor.rs gains MacPermissionGate/MacDisplayContext providers; new controller.rs exposes a #[napi] ComputerController whose 9 methods are thin adapters that all route through executor::execute_input (the single side-effect authority); bypass_guard.rs statically asserts InputController side-effect methods are only called from input.rs/executor.rs. Bindings regenerated. cargo test computer:: = 38 pass (incl bypass guard), clippy clean. G004 (ultragoal gate): ultragoal-runtime.ts gains a trusted changeSet data-flow (checkpoint+review), computer-touching detection from trusted changed paths (declarations additive-only), a mandatory computer adversarial case-set (7 IDs, no not_applicable) requiring live/structural native proof (inline/metadata/receipt-only fail with COMPUTER_REDTEAM_*), computer/native surface tokens, docs-only tiering, and byte-for-byte non-computer compatibility. New fixture matrix: 7 pass; ultragoal-runtime 102 pass + review 8 pass (non-regression).
|
Pushed a narrow follow-up for the latest
The PR remains draft; I did not mark it ready or merge it. — |
…log (G003) Adds packages/coding-agent/src/tools/computer.ts: an AgentTool with the exact OpenAI 9-action snake_case zod schema routing through the @gajae-code/natives ComputerController, AbortSignal/timeout propagation, and COMPUTER_* error mapping. Off-by-default + fail-closed: callable only on macOS when computer.enabled||computer.alwaysOn; disabled returns COMPUTER_DISABLED without constructing the controller or starting native resources. First-class WITHOUT unsafe default exposure via a separate BUILTIN_CAPABILITY_CATALOG (metadata-only) distinct from the callable BUILTIN_TOOLS, so a disabled computer is documented/listable but not in the session registry and not auto-activatable by search_tool_bm25. Adds prompt, bounded renderer, settings-schema entries, and docs/tools/computer.md. 9 tool tests pass (exact schema, camelCase rejection, gating, disabled->COMPUTER_DISABLED, dispatch mapping); biome clean.
|
Pushed a narrow formatting follow-up for the latest Affected path validation failure:
The PR remains draft; no ready/merge action taken. — |
- ComputerToolDetails: add optional meta field so it satisfies the ToolResultBuilder DetailsWithMeta weak-type constraint; toolResult() now infers ComputerToolDetails instead of falling back to DetailsWithMeta - render.ts: type summarizeComputerDetails parts as string[] so dynamic summary strings are not constrained to the action literal union - computer-red-team-fixtures.test.ts: drop removed gjcObjective input option from createUltragoalPlan; guard possibly-undefined stderr/stdout
CI fix: Affected path validation (
|
|
Heads-up: the — |
Adds the COMPUTER_USE_MACOS_TEXTEDIT_ALL_NINE manual-acceptance drill as an #[ignore] live test (crates/pi-natives/src/computer/input.rs live_tests): drives all nine primitives (screenshot/move/click/type/keypress/ double_click/drag/scroll/wait) through the production gated path (execute_input + Supervisor + Mac providers) against the focused app, then waits for a human Control+Option+Command+Escape press and asserts the kill-switch latches and blocks further input until reset. Ignored by default (needs macOS + grants + a human keypress); run with: cargo test -p pi-natives computer::input::live_tests::all_nine_acceptance_drill -- --ignored --nocapture Note: the drill refreshes the supervisor heartbeat per action as a stand-in for a periodic listener heartbeat (follow-up: tick heartbeat from the hotkey listener thread).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 397d33f67d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return controller.click?.( | ||
| { | ||
| x: params.x, | ||
| y: params.y, | ||
| button: params.button ?? "left", | ||
| timeoutMs: secondsToMs(params.timeout), | ||
| includeScreenshot: params.include_screenshot, | ||
| }, | ||
| options, |
There was a problem hiding this comment.
Call native computer methods with positional arguments
The real ComputerController generated API in packages/natives/native/index.d.ts exposes positional methods such as click(expectedEpoch, x, y, button), drag(expectedEpoch, x, y, toX, toY, button), etc., but this dispatcher sends a single payload object plus an options object. In a macOS session using the native controller, the object is passed where napi expects the first numeric/null expectedEpoch, so side-effecting actions fail before any input is posted; the current tests only exercise the mock object-shaped API.
Useful? React with 👍 / 👎.
| CGEventTapEnable(tap, true); | ||
| Supervisor::global().set_hotkey_live(true); | ||
| CFRunLoopRun(); |
There was a problem hiding this comment.
Keep the kill-switch heartbeat fresh while the tap runs
The supervisor treats the stop path as stale after HEARTBEAT_FRESH_MS (2s), but the hotkey listener only calls set_hotkey_live(true) once before entering CFRunLoopRun() and never calls Supervisor::heartbeat() afterward. Any computer input attempted more than two seconds after the listener starts will therefore fail the gate as SupervisorNotLive even though the event tap is still running; the listener needs a periodic heartbeat while the run loop is alive.
Useful? React with 👍 / 👎.
| const maybe = error as { code?: unknown; message?: unknown }; | ||
| const rawCode = typeof maybe?.code === "string" ? maybe.code : undefined; | ||
| const code = | ||
| rawCode && (NATIVE_ERROR_CODES.has(rawCode) || rawCode.startsWith("COMPUTER_")) ? rawCode : "COMPUTER_ERROR"; |
There was a problem hiding this comment.
Preserve native COMPUTER_ codes when mapping errors*
Native controller errors are constructed as GenericFailure with the stable code embedded in the message (for example COMPUTER_PERMISSION_REQUIRED: ...), not as a JS error.code value, so this mapper falls back to COMPUTER_ERROR for those failures. That breaks callers/UI that rely on the documented stable COMPUTER_* codes for permission, stale-display, supervisor, and suspension cases; parse the message prefix or have the native layer set a matching JS code.
Useful? React with 👍 / 👎.
… kill-switch window The all-nine acceptance drill now writes g005-before.png, g005-after-killswitch.png, and g005-manifest.json to .gjc/ultragoal/artifacts/g005 (override COMPUTER_USE_ACCEPTANCE_DIR), so the human-run drill produces the durable live native proof the G004 mandatory computer red-team suite requires on disk. Widen the kill-switch wait from ~10s to ~60s for manual operation and drop an unnecessary mut on the act closure.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30aee963e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let first = STARTED.set(true).is_ok(); | ||
| if first { | ||
| thread::Builder::new() | ||
| .name("computer-killswitch".into()) | ||
| .spawn(run_listener) | ||
| .ok(); |
There was a problem hiding this comment.
Retry hotkey startup after a failed first attempt
When the first start() happens before the event tap can be created (for example, before the user grants Accessibility/Input Monitoring), STARTED is set permanently and later calls only wait for hotkey_live instead of spawning run_listener again. That makes the documented grant-then-retry flow fail closed until the whole process is restarted, because the supervisor can never become live after permissions are fixed.
Useful? React with 👍 / 👎.
| scale_y: frame.display.scale_y, | ||
| origin_x: frame.display.origin_x, | ||
| origin_y: frame.display.origin_y, | ||
| display_epoch: frame.display_epoch as f64, |
There was a problem hiding this comment.
Keep display epoch out of lossy JS numbers
When a caller uses screenshot().displayEpoch as the expectedEpoch for a coordinate action, this u64 hash is converted to a JS number and loses low bits for almost all hash values; epoch_from_f64 then casts the rounded value back and compares it to the full current_epoch(), so a fresh screenshot can be rejected as COMPUTER_DISPLAY_STALE. Return the epoch as a string/BigInt-compatible value or constrain it to a 53-bit-safe integer before exposing it to JS.
Useful? React with 👍 / 👎.
| if (typeof input.scroll_x === "number" || typeof input.scroll_y === "number") { | ||
| parts.push(`scroll ${input.scroll_x ?? 0},${input.scroll_y ?? 0}`); | ||
| } | ||
| if (Array.isArray(input.keys)) parts.push(`keys ${input.keys.join("+")}`); |
There was a problem hiding this comment.
Sanitize key summaries before rendering
The computer schema accepts arbitrary key strings, and this renderer writes input.keys.join("+") directly into the TUI without replaceTabs() or truncateToWidth(), despite the repo renderer contract requiring all displayed tool text to be sanitized. A malformed or very long key name in a keypress call can therefore introduce tabs/ANSI/control text or unbounded output into the TUI; sanitize and bound the joined summary before appending it.
Useful? React with 👍 / 👎.
Add isComputerLoadablePlatform (true everywhere except win32) and gate both BUILTIN_CAPABILITY_CATALOG and the BUILTIN_TOOLS computer factory on it: macOS stays callable, Linux stays listable (support planned via #712), Windows is fully absent (not registered, not advertised). Also fix a pre-existing initial-tools metadata test by constructing ComputerTool directly for loadMode coverage, mirroring how other createIf-gated tools (Ask/Ssh/Job/Recipe/Irc) are handled.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4026bad30
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (screenshot) details.screenshot = screenshot; | ||
| details.status = "success"; | ||
| details.message = describeComputerSuccess(details); | ||
| return toolResult(details).text(details.message).done(); |
There was a problem hiding this comment.
Return screenshots as image content
When a screenshot action succeeds, the tool returns only the text summary as tool content; the PNG is reduced to metadata in details, which provider adapters do not send as image input. In an enabled macOS computer session the model therefore sees only Computer screenshot completed (WxH) and never receives the pixels it needs to choose coordinates, making the core screenshot→act loop unusable. Return a bounded image/png content block alongside the text and enforce computer.screenshotMaxBytes.
Useful? React with 👍 / 👎.
| let frame = | ||
| capture_primary_display().map_err(|err| napi::Error::from_reason(format!("{err}")))?; |
There was a problem hiding this comment.
Preflight Screen Recording before capture
When Screen Recording is missing, this path calls capture_primary_display() directly and wraps its CaptureFailed as a plain N-API reason, so the TS tool surfaces a generic COMPUTER_ERROR and never opens the Screen Recording settings pane. In the documented grant-then-retry flow for first-time macOS users, screenshot should fail with COMPUTER_PERMISSION_REQUIRED; call the screen-recording preflight or map this capture failure to the stable permission code before returning.
Useful? React with 👍 / 👎.
| pub fn keypress(&mut self, keys: &[String]) -> Result<(), InputError> { | ||
| for name in keys { | ||
| let code = key_code_for(name).ok_or_else(|| InputError::UnknownKey(name.clone()))?; | ||
| self.sink.key(code, true); | ||
| self.sink.key(code, false); |
There was a problem hiding this comment.
Send modifier chords instead of sequential taps
For shortcut inputs such as the schema/tested keys: ["Meta", "K"], this loop resolves and presses each key independently, and key_code_for does not define modifier names like Meta/Command/Shift/Control/Option. On macOS those common shortcut calls either fail with UnknownKey before doing anything useful or, after adding codes, would still send separate taps rather than holding the modifier while pressing the key; modifiers need to be mapped and held through the chord.
Useful? React with 👍 / 👎.
| expect(enabledNames).toContain("computer"); | ||
| expect(alwaysOnNames).toContain("computer"); |
There was a problem hiding this comment.
Guard macOS-only tool exposure assertion
On non-Darwin CI, createTools() uses the real process.platform through ComputerTool.createIf, so even with computer.enabled or computer.alwaysOn set it returns no computer tool. These assertions therefore fail on Linux runners; gate this test to macOS or inject a platform-specific callable check instead of expecting the tool to be present everywhere.
Useful? React with 👍 / 👎.
|
Pushed a narrow follow-up for the latest #695 Affected path validation failure. Root cause: the computer tool tests intended to exercise macOS-enabled behavior, but the Linux CI runner still drove Fix:
Validation:
— |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0e996ba4da
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| spawnText(["git", "diff", "--name-status"], { cwd, timeoutMs: 5000 }), | ||
| spawnText(["git", "diff", "--cached", "--name-status"], { cwd, timeoutMs: 5000 }), |
There was a problem hiding this comment.
Include untracked files in computer gate changesets
This changeset is used to decide whether complete checkpoints must run the mandatory computer red-team cases, but the worktree side only reads git diff --name-status and git diff --cached --name-status. A newly added computer file that is still untracked, such as crates/pi-natives/src/computer/foo.rs before git add, is absent from both outputs, so trustedChangeSetRequiresComputerSuite() sees no computer path and the checkpoint can pass without the mandatory suite. Include untracked files, for example via git ls-files --others --exclude-standard, before validating.
Useful? React with 👍 / 👎.
| const clickSchema = z | ||
| .object({ action: z.literal("click"), x: z.number(), y: z.number(), button: buttonSchema.optional(), ...shared }) | ||
| .strict(); |
There was a problem hiding this comment.
Thread screenshot epochs through coordinate actions
The coordinate schemas are strict and only allow x/y/button fields; there is no displayEpoch/expectedEpoch field for click, double_click, move, drag, or scroll, and the dispatcher consequently has no value to pass to the native expected_epoch. In a session where the user acts on a screenshot after display topology or scale changes, native gate() receives None and skips the stale-display check, so the documented COMPUTER_DISPLAY_STALE protection is unreachable from the model-facing tool.
Useful? React with 👍 / 👎.
| self.cursor = end; | ||
| self.sink.move_cursor(end); | ||
| self.release(end, button); |
There was a problem hiding this comment.
Emit dragged events while the mouse button is held
During a drag, this uses the generic move_cursor path after pressing the button; on macOS that path posts kCGEventMouseMoved and warps the cursor, not a button-specific dragged event. Apps that listen for left/right/other-mouse-dragged events for selection or drag-and-drop will see a down, a plain move, and an up, so the drag primitive can fail even though the pointer moved; add a drag move event to the sink for the held button.
Useful? React with 👍 / 👎.
| fn CGEventCreateScrollWheelEvent( | ||
| source: CgEventSourceRef, | ||
| units: u32, | ||
| wheel_count: u32, | ||
| wheel1: i32, | ||
| wheel2: i32, | ||
| ) -> CgEventRef; |
There was a problem hiding this comment.
Declare the scroll-wheel FFI as variadic
CGEventCreateScrollWheelEvent takes wheel1 followed by variadic wheel deltas, but this declaration makes wheel2 a fixed parameter. On macOS targets where variadic and non-variadic calls use different ABI rules, the horizontal delta for wheelCount = 2 is read from the wrong place, so scroll_x can be ignored or corrupted. Declare the function as C-variadic and pass the second wheel as the vararg, or use wheelCount = 1 when only vertical scrolling is needed.
Useful? React with 👍 / 👎.
|
Pushed the follow-up for the latest #695 Affected path validation failure. Root cause: after the Linux native build, the generated Fix:
Validation:
— |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b4df5c6584
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| async function computeCheckpointChangeSet(cwd: string): Promise<UltragoalChangeSet | undefined> { | ||
| const inGit = await spawnText(["git", "rev-parse", "--is-inside-work-tree"], { cwd, timeoutMs: 3000 }); | ||
| if (!inGit.ok || inGit.stdout.trim() !== "true") return undefined; | ||
| if (!(await Bun.file(path.join(cwd, ".git")).exists())) return undefined; |
There was a problem hiding this comment.
Resolve the git root before reading checkpoint changesets
When a completion checkpoint is run with cwd set to a repo subdirectory such as packages/coding-agent, git rev-parse --is-inside-work-tree succeeds but cwd/.git does not exist, so this returns undefined before parsing any diffs. That makes the trusted changeset unavailable and lets computer-path edits skip the mandatory computer red-team suite unless the executor manually declared matching changedPaths; use git rev-parse --show-toplevel or remove the cwd/.git guard.
Useful? React with 👍 / 👎.
| "computer.killSwitchHotkey": { | ||
| type: "string", | ||
| default: "Control+Option+Command+Escape", |
There was a problem hiding this comment.
Wire the configured kill-switch hotkey before exposing it
This setting is user-facing, but no code reads computer.killSwitchHotkey; the native listener still hardcodes Control+Option+Command+Escape in crates/pi-natives/src/computer/hotkey.rs. If a user changes the setting, the UI/config can advertise a stop combo that the event tap will never match, which is especially risky for the safety kill switch; either remove the configurable setting or pass it into the native listener.
Useful? React with 👍 / 👎.
| const options = { signal }; | ||
| switch (params.action) { | ||
| case "screenshot": | ||
| return controller.screenshot?.( |
There was a problem hiding this comment.
Throw when the native computer method is missing
All dispatch branches use optional chaining, so a stale or partially built native addon that has ComputerController but lacks a specific method returns undefined; execute() then marks the action as successful and tells the model it completed even though no screenshot/input happened. Since createNativeComputerController() only checks the constructor, dispatch should require the selected method and fail with a stable unavailable error instead of reporting success.
Useful? React with 👍 / 👎.
| @@ -0,0 +1,71 @@ | |||
| # computer | |||
There was a problem hiding this comment.
Regenerate embedded docs for the new computer page
Adding this docs page without updating packages/coding-agent/src/internal-urls/docs-index.generated.ts leaves it out of EMBEDDED_DOC_FILENAMES and EMBEDDED_DOCS, so the in-app gjc://tools/computer.md documentation lookup and listing cannot serve the new computer tool docs. Regenerate the embedded docs index after adding this file.
Useful? React with 👍 / 👎.
Native computer-use tool — Slice 1 (foundation)
Draft PR. First slice of a new model-agnostic
computertool that drives theuser's real macOS desktop via the OpenAI computer-use action set. Scoped through
GJC's deep-interview (requirements) → ralplan (Planner/Architect/Critic
consensus) workflows.
What's in this PR
crates/pi-natives/src/computer/coords.rs— the pure, framework-freecoordinate contract:
NormalizedDisplaymaps a screenshot-space pixel to amacOS logical point via per-axis scale + display origin (Retina/HiDPI-safe),
rejecting out-of-bounds and non-finite inputs. 12 unit tests (scale 1.0/2.0,
fractional + anisotropic scale, non-zero origins, edges, out-of-bounds,
invalid scale). No display or granted permissions required.
crates/pi-natives/src/computer/mod.rs+pub mod computer;inlib.rs.docs/computer-use/README.md— locked decisions (ADR summary), coordinatecontract, and the delivery roadmap.
Verification
cargo test -p pi-natives computer::→ 12 passed.cargo clippy -p pi-natives --all-targets→ no warnings from the new module.fmt:rs).Locked decisions (full ADR in
docs/computer-use/README.md)macOS-only v1 · exact OpenAI 9-action schema (
screenshot/click/double_click/move/drag/scroll/type/keypress/wait) · any model via a generictool-call interface · off by default + opt-in + persistent always-on · single
normalized primary display (Rust owns transforms) · autonomous but a
daemon-enforced global kill-switch outside model control · central
execute_actionstate machine ·SupervisorClientboundary so an out-of-processsupervisor can be swapped in later.
Why draft — remaining work (follow-ups)
The native backend, kill-switch supervisor + event-tap lifecycle, napi/TS tool
surface, and the manual macOS end-to-end acceptance require real macOS
hardware, granted TCC permissions (Accessibility + Screen Recording), and a
human-operated drill (TextEdit all-nine + kill-switch), so they are tracked as
follow-up slices rather than landed here.
capture/input/permissions/execute_action)computertool surface (schema/gating/prompt/renderer)