
Add opt-in browser tools for desktop tabs #1359

Open
boudra wants to merge 22 commits into main from expose-browser-tools

@boudra
Collaborator

@boudra boudra commented Jun 5, 2026

Linked issue

None.

Type of change

  • Bug fix
  • New feature (with prior issue + design alignment)
  • Refactor / code improvement
  • Docs

What does this PR do

Adds an explicit opt-in for browser tools and lets local agents work with Paseo desktop browser tabs through the daemon MCP surface.

When browser tools are disabled, browser_* tools are omitted from newly created MCP tool lists instead of being listed and then returning an error. MCP sessions that were initialized earlier may keep their previous tool list until they reinitialize, but execution is still blocked with browser_disabled.
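A minimal TypeScript sketch of the two layers of gating described above, assuming a policy object that reads daemon.browserTools.enabled at call time. The names listTools, callTool, and BrowserToolsPolicy are hypothetical and do not come from mcp-tools.ts. The point is that filtering the advertised list and re-checking at execution time are separate steps, which is why a stale session still gets browser_disabled.

```typescript
// Hypothetical shapes; mcp-tools.ts does not necessarily use these names.
interface ToolDefinition {
  name: string;
  description: string;
}

interface BrowserToolsPolicy {
  // Assumed to reflect daemon.browserTools.enabled at call time.
  isEnabled(): boolean;
}

type ToolResult =
  | { ok: true; text: string }
  | { ok: false; code: "browser_disabled"; message: string };

const BROWSER_TOOL_PREFIX = "browser_";

function isBrowserTool(name: string): boolean {
  return name.startsWith(BROWSER_TOOL_PREFIX);
}

// Runs when a new MCP session builds its tool list: with the opt-in off,
// browser_* tools are omitted rather than listed and then failing.
function listTools(all: ToolDefinition[], policy: BrowserToolsPolicy): ToolDefinition[] {
  return policy.isEnabled() ? all : all.filter((tool) => !isBrowserTool(tool.name));
}

// Runs on every call: a session initialized while the opt-in was on keeps its
// old tool list, so execution re-checks the policy instead of trusting the list.
function callTool(name: string, policy: BrowserToolsPolicy, run: () => string): ToolResult {
  if (isBrowserTool(name) && !policy.isEnabled()) {
    return { ok: false, code: "browser_disabled", message: "Browser tools are disabled in daemon settings." };
  }
  return { ok: true, text: run() };
}
```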

Architecture review follow-up: browser automation is now advertised through a single explicit capability provided by the app, rather than through Electron detection in the client package or separate read/interaction capabilities. The app mounts browser automation on the host-runtime client lifecycle, outside React session rendering. The settings opt-in uses a React Query mutation with pending and visible error states.
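A sketch of capability-based client registration, assuming the hello message carries a boolean desktopBrowserAutomation flag. Only that field name comes from this PR (the WSHelloMessageSchema note in the Greptile summary); BrowserClientRegistry and its methods are hypothetical. The sync method is idempotent, so it can run on every hello, capability change, or resume without double-registering a client.

```typescript
// Hypothetical hello shape; only the desktopBrowserAutomation field name is from the PR.
interface HelloCapabilities {
  desktopBrowserAutomation?: boolean;
}

interface ConnectedClient {
  id: string;
  capabilities: HelloCapabilities;
}

class BrowserClientRegistry {
  private readonly registered = new Set<string>();

  // Idempotent: safe to call on hello, on a capability change, and on resume.
  sync(client: ConnectedClient): void {
    const capable = client.capabilities.desktopBrowserAutomation === true;
    if (capable === this.registered.has(client.id)) return; // already in the right state
    if (capable) {
      this.registered.add(client.id);
    } else {
      this.registered.delete(client.id);
    }
  }

  // Intended to be wired to both socket disconnect and server shutdown.
  unregister(clientId: string): void {
    this.registered.delete(clientId);
  }

  isRegistered(clientId: string): boolean {
    return this.registered.has(clientId);
  }
}
```

Keying registration on an explicit flag, rather than sniffing for Electron, means plain web and mobile clients never register simply because they omit the flag.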

When enabled, agents can create and target desktop browser tabs, inspect page state, interact with elements through snapshot refs, navigate, capture screenshots and PDFs, read logs and storage, set viewport and geolocation, set the page background color, and exercise upload/download flows. Browser ownership is scoped to the workspace, so tabs in inactive workspaces stay targetable without relying on global focus.
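Workspace-scoped ownership, together with the "don't use default" and empty-state guidance listed in the next section, can be sketched as a per-workspace map. This is a hypothetical model: WorkspaceBrowserRegistry, resolve, and the message strings are illustrative, not Paseo's API. Lookups go through the caller's workspace id rather than window focus, an empty workspace points the agent at browser_new_tab, and "default" is rejected with the real browserId values listed.

```typescript
// Hypothetical model of workspace-scoped tab ownership; names are illustrative.
interface BrowserTab {
  browserId: string;
  url: string;
}

type ResolveResult = { ok: true; tab: BrowserTab } | { ok: false; message: string };

class WorkspaceBrowserRegistry {
  // workspaceId -> (browserId -> tab). Nothing here consults window focus,
  // so a tab parked in an inactive workspace is still addressable.
  private readonly byWorkspace = new Map<string, Map<string, BrowserTab>>();

  register(workspaceId: string, tab: BrowserTab): void {
    let tabs = this.byWorkspace.get(workspaceId);
    if (!tabs) {
      tabs = new Map<string, BrowserTab>();
      this.byWorkspace.set(workspaceId, tabs);
    }
    tabs.set(tab.browserId, tab);
  }

  resolve(workspaceId: string, browserId: string): ResolveResult {
    const tabs = this.byWorkspace.get(workspaceId);
    if (!tabs || tabs.size === 0) {
      // Empty state: tell the agent how to get a tab instead of failing opaquely.
      return { ok: false, message: "No browser tabs in this workspace. Call browser_new_tab first." };
    }
    const known = Array.from(tabs.keys()).join(", ");
    if (browserId === "default") {
      return { ok: false, message: `"default" is not a browserId. Use one of: ${known}` };
    }
    const tab = tabs.get(browserId);
    if (!tab) {
      return { ok: false, message: `Unknown browserId "${browserId}". Use one of: ${known}` };
    }
    return { ok: true, tab };
  }
}
```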

Post-test fixes made the product path usable for agents instead of requiring them to guess a tab target:

  • browser_new_tab creates and opens a usable browser tab for the calling workspace.
  • Browser tool output and guidance expose real browserId values and tell agents not to use default.
  • Empty tab states now tell agents to call browser_new_tab.
  • browser_set_background supports page background changes.
  • The fallback browser registration path now returns retryable browser_timeout if both visible-tab and fallback registration fail, instead of returning success with a dead browserId.
  • browser_screenshot and browser_full_page_screenshot now expose PNG image content to agents, not just dimensions/status text. Viewport screenshots use the same CDP capture path as full-page screenshots when available.
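A sketch of the browser_new_tab registration order described above: visible registration first, fallback second, and a retryable browser_timeout if both fail, instead of a success carrying a dead browserId. registerVisible, registerFallback, and the timeout value are hypothetical stand-ins for the handler.ts internals.

```typescript
// Hypothetical stand-ins for the handler.ts registration paths.
type NewTabResult =
  | { ok: true; browserId: string }
  | { ok: false; code: "browser_timeout"; retryable: true; message: string };

// Resolves to null if the attempt rejects, throws, or has not settled within ms.
function attempt(register: () => Promise<string>, ms: number): Promise<string | null> {
  return new Promise<string | null>((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    Promise.resolve()
      .then(register) // also turns a synchronous throw from register() into a rejection
      .then(
        (browserId) => {
          clearTimeout(timer);
          resolve(browserId);
        },
        () => {
          clearTimeout(timer);
          resolve(null);
        },
      );
  });
}

async function createBrowserTab(
  registerVisible: () => Promise<string>,
  registerFallback: () => Promise<string>,
  timeoutMs: number,
): Promise<NewTabResult> {
  const visibleId = await attempt(registerVisible, timeoutMs);
  if (visibleId !== null) return { ok: true, browserId: visibleId };

  const fallbackId = await attempt(registerFallback, timeoutMs);
  if (fallbackId !== null) return { ok: true, browserId: fallbackId };

  // Never report success without a live browserId; let the agent retry instead.
  return {
    ok: false,
    code: "browser_timeout",
    retryable: true,
    message: "Browser tab registration timed out. Retry browser_new_tab.",
  };
}
```

attempt only stops waiting; it does not cancel the registration it was waiting on, so a late fallback webview can still appear after browser_timeout. That matches the fallback webview lifecycle risk listed under Risk surface.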

How did you verify it

Local validation passed:

  • npm run format
  • npm run typecheck
  • npm run lint
  • Targeted browser-tools Vitest coverage for protocol schemas, desktop automation service/snapshot engine, server MCP tools/broker, app handler, config persistence, and the settings opt-in pure config seam.
  • Targeted app handler coverage for browser_new_tab visible registration success, fallback registration success, and fallback registration timeout failure.

Real desktop QA completed through an isolated daemon and a dedicated QA Electron, without touching the main daemon or user app:

  • Phase D workspace targeting: A/B workspace browser ownership and parked inactive webviews stayed targetable.
  • Phase E interaction tools: snapshot/refs, click/fill, wait, type/key, navigation, and screenshot worked through real MCP calls.
  • Phase F advanced tools: focus/clear/check/select, hover, drag, logs, storage, viewport/geolocation, full-page screenshot, PDF, upload, and download worked through real MCP calls.
  • Real-agent product acceptance passed through product-exposed Paseo browser MCP tools:
    • Google "world cup" search: the direct search URL hit Google's /sorry bot-check page, but the Google homepage UI flow reached a real results page titled "world cup - Tìm trên Google" (Vietnamese for "Search on Google") with World Cup, Wikipedia, and FIFA links.
    • GitHub stars were read as 7.8k.
    • example.com background was changed to red.
    • localhost:8081 console logs were captured.
  • Focused post-fix audits passed, including the fallback-registration negative path.
  • Screenshot-content QA passed after launching the dedicated QA Electron/Metro stack with PASEO_WEB_PLATFORM=electron. The in-app Browser panel showed the https://example.com/ URL bar and webview chrome instead of the "Browser is desktop-only" message, and a fresh real agent listed the registered https://example.com/ tab, called browser_screenshot, received image content, and described the visible Example Domain page from the screenshot pixels.
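The image content the QA agent received is, in MCP terms, an "image" entry in the tool result's content array, carrying base64 data and a mimeType. A sketch of that conversion, assuming the desktop hands back raw PNG bytes: ScreenshotPayload and screenshotToToolResult are hypothetical, and the content shapes mirror the MCP tool-result format but are declared locally instead of imported from an SDK. The hand-rolled base64 encoder only keeps the sketch dependency-free.

```typescript
// Local mirrors of the MCP tool-result content shapes (a real server would use SDK types).
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type CallToolResult = { content: Array<TextContent | ImageContent>; isError?: boolean };

// Hypothetical desktop payload: raw PNG bytes plus the captured dimensions.
interface ScreenshotPayload {
  png: Uint8Array;
  width: number;
  height: number;
}

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 with "=" padding, written out so the sketch has no dependencies.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    const triple = (bytes[i] << 16) | ((hasSecond ? bytes[i + 1] : 0) << 8) | (hasThird ? bytes[i + 2] : 0);
    out += BASE64_ALPHABET[(triple >> 18) & 63] + BASE64_ALPHABET[(triple >> 12) & 63];
    out += hasSecond ? BASE64_ALPHABET[(triple >> 6) & 63] : "=";
    out += hasThird ? BASE64_ALPHABET[triple & 63] : "=";
  }
  return out;
}

function screenshotToToolResult(shot: ScreenshotPayload): CallToolResult {
  const isPng = PNG_SIGNATURE.every((byte, i) => shot.png[i] === byte);
  if (!isPng) {
    return { isError: true, content: [{ type: "text", text: "Screenshot payload is not a PNG." }] };
  }
  return {
    content: [
      { type: "text", text: `Screenshot ${shot.width}x${shot.height}` },
      // The image entry is what lets the agent look at the pixels, not just the size.
      { type: "image", data: toBase64(shot.png), mimeType: "image/png" },
    ],
  };
}
```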

Risk surface

  • Desktop/Electron automation: this is the highest-risk surface. The real QA path covered the desktop webview bridge and MCP routing, but platform-specific Electron behavior can still vary by OS/window state.
  • Fallback webview lifecycle: retrying browser_new_tab after registration timeout may leave hidden fallback webviews or extra browser tabs until normal app cleanup handles them.
  • Google search path: direct Google search URLs can hit a bot page; the homepage UI flow passed in product QA.
  • Protocol compatibility: new browser automation messages, command fields, and capability flags are additive/backward-compatible. Older clients should continue parsing; new browser tooling requires the new desktop capability.
  • Opt-in/persisted config: browser tools stay behind daemon.browserTools.enabled, which is persisted in daemon config and exposed in settings. The pure config seam is unit-tested; the full settings click path was not re-run during the PR body update.
  • Cross-platform caveats: browser automation is Electron desktop-only. Mobile and plain web should not attempt to run the desktop browser bridge.
  • Real-click opt-in QA gap: the browser automation flows were real-QA’d after opt-in, but this PR step has no fresh visual recording of manually clicking the settings toggle end to end.
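A sketch of the opt-in default behind daemon.browserTools.enabled, assuming persisted config is JSON and that anything other than an explicit true counts as disabled. The PR says browserTools has a default in MutableDaemonConfigSchema but not its value; false is assumed here because the feature is opt-in. readDaemonConfig and setBrowserToolsEnabled are hypothetical helpers, and this simplified DaemonConfig drops any keys it does not model.

```typescript
// Hypothetical, simplified daemon config: only the browserTools section is modeled.
interface DaemonConfig {
  browserTools: { enabled: boolean };
}

// Opt-in: only an explicit `true` enables browser tools. Missing keys (older
// config files), wrong types, and unreadable JSON all fall back to disabled.
function readDaemonConfig(json: string): DaemonConfig {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    parsed = null;
  }
  const section = (parsed as { browserTools?: { enabled?: unknown } } | null)?.browserTools;
  return { browserTools: { enabled: section?.enabled === true } };
}

// What a settings toggle would persist after a successful mutation.
function setBrowserToolsEnabled(config: DaemonConfig, enabled: boolean): string {
  const next: DaemonConfig = { ...config, browserTools: { ...config.browserTools, enabled } };
  return JSON.stringify(next);
}
```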

Checklist

  • One focused change. Unrelated cleanups split out.
  • npm run typecheck passes
  • npm run lint passes
  • npm run format ran (Biome)
  • UI changes include screenshots or video for every affected platform
  • Tests added or updated where it made sense

Adds the daemon opt-in, desktop tab routing, MCP tools, and real browser automation surfaces for Paseo desktop browser tabs.
@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

Adds an explicit opt-in for desktop browser automation, surfacing a full suite of Paseo browser MCP tools through the daemon to local agents. Browser tools are gated behind daemon.browserTools.enabled (persisted config, toggled via a new settings card) and are absent from MCP tool lists when disabled.

  • New end-to-end path: BrowserToolsBroker (server) ↔ WebSocket capability flag desktop_browser_automation ↔ desktop client ↔ Electron IPC ↔ executeAutomationCommand (service + snapshot engine). The broker uses safeParse throughout and always resolves — never rejects — so every failure surfaces as a structured BrowserToolsResponsePayload.
  • App-side handler (handler.ts) mounts on the host-runtime client lifecycle (outside React session rendering) and properly wraps both the new_tab path and the fallback-webview registration path in try/catch.
  • Browser tab ownership is scoped per workspace; file upload paths are restricted to the agent workspace; download filenames are sanitized with basename(); navigate/download URLs are restricted to http:/https:.
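The three input restrictions listed above can be sketched with Node's path module and the WHATWG URL class. The function names are hypothetical, and this version does not resolve symlinks, so a link inside the workspace that points outside it would pass; a stricter check would likely compare fs.realpath results.

```typescript
import * as path from "node:path";

// Upload: resolve the requested path against the workspace and refuse anything
// that lands outside it (e.g. "../secret.txt" or "/etc/passwd"). Symlinks are NOT resolved.
function resolveUploadPath(workspaceRoot: string, requested: string): string {
  const root = path.resolve(workspaceRoot);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  const outside = rel === "" || rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
  if (outside) {
    throw new Error(`Upload path must point to a file inside the workspace: ${requested}`);
  }
  return target;
}

// Download: keep only the final path segment of the suggested filename.
function sanitizeDownloadName(suggested: string): string {
  const name = path.basename(suggested);
  return name === "" || name === "." || name === ".." ? "download" : name;
}

// Navigate/download: only http: and https: URLs are allowed.
function assertHttpUrl(raw: string): URL {
  const url = new URL(raw); // throws a TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported URL scheme: ${url.protocol}`);
  }
  return url;
}
```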

Confidence Score: 5/5

Safe to merge. All major risks from earlier review rounds — missing try/catch on new_tab, Zod rejection in broker.execute, path-traversal in downloads, file exfiltration via upload, unsafe URL schemes, and the capability-upgrade freeze — are addressed in the commits referenced in the previous-thread replies.

All previous blocking findings have been fixed. The remaining finding is a naming/export style concern on the parking-lot module that does not affect correctness or security. The core broker, handler, WebSocket wiring, and desktop service are all in good shape.

packages/app/src/components/browser-webview-parking.electron.ts — minor test-export naming issue, no functional impact.

Important Files Changed

  • packages/server/src/server/browser-tools/broker.ts: New BrowserToolsBroker. Uses safeParse for request validation (no rejection path), resolves all pending requests on client disconnect, and properly handles send-side errors. Clean class with a small public surface.
  • packages/app/src/browser-automation/handler.ts: new_tab and all other command paths are now wrapped in try/catch; the fallback webview path is correctly guarded; dependency-injected options make it testable.
  • packages/server/src/server/websocket-server.ts: syncBrowserToolsClientRegistration is called both inside the capability-change block and unconditionally on resume; the function's early-return guard handles the double call correctly. Unregister on disconnect and on server close are both wired.
  • packages/desktop/src/features/browser-automation/service.ts: Large command-handler table. Upload paths are workspace-scoped, download filenames are basename()-sanitized, and URL schemes are validated. Tab resolution logic is consistent across all commands.
  • packages/app/src/components/browser-webview-parking.electron.ts: Module-level Map state with a clearParkedBrowserWebviewsForTests export, a test-support function exported from production code, which violates the project's test-discipline rules.
  • packages/server/src/server/browser-tools/mcp-tools.ts: Comprehensive MCP tool registrations with workspace-scoped context resolution, a structured output schema, and image content for screenshot results. Policy enforcement is correctly delegated to the broker.
  • packages/server/src/server/bootstrap.ts: BrowserToolsBroker is wired from bootstrap into both the WebSocket server and the MCP server. browserToolsEnabled is snapshotted at agent creation time (documented, intentional behavior).
  • packages/protocol/src/messages.ts: BrowserAutomationExecuteRequest/Response added to the outbound/inbound message discriminated unions; WSHelloMessageSchema updated with the desktopBrowserAutomation capability; browserTools added to MutableDaemonConfigSchema with a default.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent as MCP Agent
    participant MCPTools as mcp-tools.ts (server)
    participant Broker as BrowserToolsBroker
    participant WSServer as WebSocketServer
    participant AppClient as App (host-runtime)
    participant Handler as handler.ts
    participant IPC as Electron IPC
    participant Service as service.ts

    Agent->>MCPTools: "browser_* tool call"
    MCPTools->>Broker: execute(command)
    Broker->>Broker: policy.isEnabled()?
    alt disabled
        Broker-->>MCPTools: browser_disabled
    else enabled
        Broker->>WSServer: sendBrowserAutomationRequest via registered client
        WSServer->>AppClient: browser.automation.execute.request (WS message)
        AppClient->>Handler: handleBrowserAutomationRequest
        Handler->>IPC: executeAutomationCommand (Electron IPC)
        IPC->>Service: executeAutomationCommand(request, registry)
        Service-->>IPC: AutomationCommandPayload
        IPC-->>Handler: response
        Handler->>AppClient: sendBrowserAutomationExecuteResponse
        AppClient->>WSServer: browser.automation.execute.response (WS message)
        WSServer->>Broker: receiveResponse
        Broker-->>MCPTools: BrowserToolsResponsePayload
        MCPTools-->>Agent: CallToolResult
    end
```

Reviews (15): Last reviewed commit: "Return browser failure when send fails"

Comment thread packages/server/src/server/websocket-server.ts
Comment thread packages/server/src/server/browser-tools/index.ts Outdated
Comment thread packages/desktop/src/features/browser-automation/ipc.ts Outdated
Comment thread packages/desktop/src/features/browser-automation/service.ts
Comment thread packages/desktop/src/features/browser-automation/service.ts
Comment thread packages/app/src/browser-automation/handler.ts Outdated
@boudra
Collaborator Author

boudra commented Jun 6, 2026

Fixed the remaining Greptile blocker in 38c8918: broker response handling now uses safeParse, clears the pending request, and returns a structured browser_unknown_error when the desktop sends an invalid browser automation response. Added regression coverage in broker.test.ts.

@boudra
Collaborator Author

boudra commented Jun 6, 2026

Fixed the latest Greptile blocker in 7db3153: BrowserToolsBroker now resolves WebSocket send failures as structured browser_unknown_error payloads and clears pending state instead of rejecting into raw MCP errors. Added regression coverage in broker.test.ts.
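The broker behavior described in these two fixes and in the Greptile summary amounts to a pending-request map in which every exit path resolves. A minimal sketch, assuming a synchronous send function and string request ids. The class name, the message used on disconnect, and the hand-written shape checks (standing in for the broker's Zod safeParse) are hypothetical.

```typescript
type ErrorCode = "browser_unknown_error" | "browser_timeout";
type BrowserToolsResponse = { ok: true; result: unknown } | { ok: false; code: ErrorCode; message: string };
type SendFn = (message: { requestId: string; command: unknown }) => void;

interface PendingRequest {
  resolve: (response: BrowserToolsResponse) => void;
  timer: ReturnType<typeof setTimeout>;
}

class BrowserBrokerSketch {
  private readonly pending = new Map<string, PendingRequest>();
  private readonly send: SendFn;
  private readonly timeoutMs: number;
  private nextId = 0;

  constructor(send: SendFn, timeoutMs: number) {
    this.send = send;
    this.timeoutMs = timeoutMs;
  }

  // Always resolves: callers never see a rejection, only structured failures.
  execute(command: unknown): Promise<BrowserToolsResponse> {
    this.nextId += 1;
    const requestId = `req-${this.nextId}`;
    return new Promise<BrowserToolsResponse>((resolve) => {
      const timer = setTimeout(() => {
        this.fail(requestId, "browser_timeout", "Desktop did not respond in time.");
      }, this.timeoutMs);
      this.pending.set(requestId, { resolve, timer });
      try {
        this.send({ requestId, command });
      } catch (err) {
        // A send failure becomes a structured payload and clears pending state.
        this.fail(requestId, "browser_unknown_error", `Failed to send request: ${String(err)}`);
      }
    });
  }

  // Inbound messages are untrusted: check the shape, then settle and clear pending state.
  receiveResponse(raw: unknown): void {
    const msg = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
    const requestId = msg.requestId;
    const message = msg.message;
    if (typeof requestId !== "string" || !this.pending.has(requestId)) return;
    if (msg.ok === true) {
      this.settle(requestId, { ok: true, result: msg.result });
    } else if (msg.ok === false && typeof message === "string") {
      this.fail(requestId, "browser_unknown_error", message);
    } else {
      this.fail(requestId, "browser_unknown_error", "Invalid browser automation response.");
    }
  }

  // Resolve everything still in flight when the desktop client disconnects.
  handleDisconnect(): void {
    for (const requestId of Array.from(this.pending.keys())) {
      this.fail(requestId, "browser_unknown_error", "Desktop client disconnected.");
    }
  }

  pendingCount(): number {
    return this.pending.size;
  }

  private fail(requestId: string, code: ErrorCode, message: string): void {
    this.settle(requestId, { ok: false, code, message });
  }

  private settle(requestId: string, response: BrowserToolsResponse): void {
    const entry = this.pending.get(requestId);
    if (!entry) return; // already settled, e.g. a late response after a timeout
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(response);
  }
}
```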

