Add opt-in browser tools for desktop tabs #1359
Adds the daemon opt-in, desktop tab routing, MCP tools, and real browser automation surfaces for Paseo desktop browser tabs.
| Filename | Overview |
|---|---|
| packages/server/src/server/browser-tools/broker.ts | New BrowserToolsBroker: uses safeParse for request validation (no rejection path), resolves all pending requests on client disconnect, and properly handles send-side errors. Clean class with a small public surface. |
| packages/app/src/browser-automation/handler.ts | new_tab and all other command paths are now wrapped in try/catch; fallback webview path correctly guarded; dependency-injected options make it testable. |
| packages/server/src/server/websocket-server.ts | syncBrowserToolsClientRegistration is called both inside the capability-change block and unconditionally on resume; the function's early-return guard handles the double-call correctly. Unregister on disconnect and server close are both wired. |
| packages/desktop/src/features/browser-automation/service.ts | Large command-handler table. Upload paths are workspace-scoped, download filenames are basename()-sanitized, URL schemes are validated. Tab resolution logic is consistent across all commands. |
| packages/app/src/components/browser-webview-parking.electron.ts | Module-level Map state with a clearParkedBrowserWebviewsForTests export — a test-support function exported from production code, which violates the project's test-discipline rules. |
| packages/server/src/server/browser-tools/mcp-tools.ts | Comprehensive MCP tool registrations with workspace-scoped context resolution, structured output schema, and image content for screenshot results. Policy enforcement is correctly delegated to the broker. |
| packages/server/src/server/bootstrap.ts | BrowserToolsBroker wired from bootstrap into both WebSocket server and MCP server. browserToolsEnabled is snapshotted at agent creation time (documented, intentional behavior). |
| packages/protocol/src/messages.ts | BrowserAutomationExecuteRequest/Response added to outbound/inbound message discriminated unions; WSHelloMessageSchema updated with desktopBrowserAutomation capability; browserTools added to MutableDaemonConfigSchema with default. |
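The `service.ts` row mentions two input safeguards: download filenames reduced with `basename()`, and URL schemes validated before navigation. The sketch below shows both under stated assumptions. The function names, the `http`/`https` allowlist, and the fallback name `download` are hypothetical and are not taken from the PR.

```typescript
// Hypothetical sketch of the service.ts safeguards; names and the scheme
// allowlist are assumptions, not the PR's actual code.
const ALLOWED_SCHEMES: ReadonlySet<string> = new Set(["http", "https"]);

// Keep only the final path segment so a page-suggested name such as
// "../../etc/passwd" cannot escape the download directory. Splitting on both
// "/" and "\" is stricter than POSIX basename(), which ignores backslashes.
function sanitizeDownloadFilename(suggested: string): string {
  const base = (suggested.split(/[\\/]/).pop() ?? "").trim();
  // Empty or dot-only names ("", ".", "..") fall back to a fixed name.
  if (base === "" || /^\.+$/.test(base)) {
    return "download";
  }
  return base;
}

// Read the scheme using the RFC 3986 grammar (a letter followed by letters,
// digits, "+", "-" or ".") and accept only allowlisted schemes.
// Scheme-less input such as "example.com" is rejected.
function isAllowedNavigationUrl(url: string): boolean {
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(url.trim());
  const scheme = match?.[1];
  return scheme !== undefined && ALLOWED_SCHEMES.has(scheme.toLowerCase());
}
```

An allowlist is used rather than a blocklist so that schemes such as `javascript:` and `file:` are rejected without having to enumerate them.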
Sequence Diagram
```mermaid
sequenceDiagram
    participant Agent as MCP Agent
    participant MCPTools as mcp-tools.ts (server)
    participant Broker as BrowserToolsBroker
    participant WSServer as WebSocketServer
    participant AppClient as App (host-runtime)
    participant Handler as handler.ts
    participant IPC as Electron IPC
    participant Service as service.ts
    Agent->>MCPTools: "browser_* tool call"
    MCPTools->>Broker: execute(command)
    Broker->>Broker: policy.isEnabled()?
    alt disabled
        Broker-->>MCPTools: browser_disabled
    else enabled
        Broker->>WSServer: sendBrowserAutomationRequest via registered client
        WSServer->>AppClient: browser.automation.execute.request (WS message)
        AppClient->>Handler: handleBrowserAutomationRequest
        Handler->>IPC: executeAutomationCommand (Electron IPC)
        IPC->>Service: executeAutomationCommand(request, registry)
        Service-->>IPC: AutomationCommandPayload
        IPC-->>Handler: response
        Handler->>AppClient: sendBrowserAutomationExecuteResponse
        AppClient->>WSServer: browser.automation.execute.response (WS message)
        WSServer->>Broker: receiveResponse
        Broker-->>MCPTools: BrowserToolsResponsePayload
        MCPTools-->>Agent: CallToolResult
    end
```
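The broker's part of the diagram (policy check, send through the registered client, wait for the matching response) can be modelled as a map of pending requests keyed by request id. The sketch below is a minimal version under stated assumptions. `MiniBrowserBroker`, the response shape, and the use of `browser_unknown_error` when no desktop client is registered are hypothetical. Only `browser_disabled` and `browser_unknown_error` come from the PR text.

```typescript
// Hypothetical, simplified model of the broker round trip; not the real
// BrowserToolsBroker API.
type BrowserToolsResponse =
  | { ok: true; result: unknown }
  | { ok: false; code: string; message: string };

interface BrowserClient {
  // Sends a request to the desktop app; may throw if the socket is closed.
  send(request: { requestId: string; command: unknown }): void;
}

function failure(code: string, message: string): BrowserToolsResponse {
  return { ok: false, code, message };
}

class MiniBrowserBroker {
  private readonly pending = new Map<string, (response: BrowserToolsResponse) => void>();
  private readonly isEnabled: () => boolean;
  private client: BrowserClient | null = null;
  private nextId = 0;

  constructor(isEnabled: () => boolean) {
    this.isEnabled = isEnabled;
  }

  register(client: BrowserClient): void {
    this.client = client;
  }

  // On disconnect, settle every in-flight request instead of leaving it hanging.
  unregister(): void {
    this.client = null;
    this.pending.forEach((resolve) => {
      resolve(failure("browser_unknown_error", "desktop client disconnected"));
    });
    this.pending.clear();
  }

  execute(command: unknown): Promise<BrowserToolsResponse> {
    if (!this.isEnabled()) {
      return Promise.resolve(failure("browser_disabled", "browser tools are disabled"));
    }
    const client = this.client;
    if (client === null) {
      // Assumed error code for "no desktop registered".
      return Promise.resolve(failure("browser_unknown_error", "no desktop client registered"));
    }
    const requestId = String(this.nextId++);
    return new Promise<BrowserToolsResponse>((resolve) => {
      this.pending.set(requestId, resolve);
      try {
        client.send({ requestId, command });
      } catch (error) {
        // A failed send becomes a structured error, not a rejected promise.
        this.pending.delete(requestId);
        resolve(failure("browser_unknown_error", `send failed: ${String(error)}`));
      }
    });
  }

  // Called when the matching browser.automation.execute.response arrives.
  receiveResponse(requestId: string, response: BrowserToolsResponse): void {
    const resolve = this.pending.get(requestId);
    if (resolve === undefined) return; // late or unknown response: ignore
    this.pending.delete(requestId);
    resolve(response);
  }
}
```

Because every path resolves and none rejects, the MCP layer always receives a payload that it can turn into a `CallToolResult`.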
Reviews (15): Last reviewed commit: "Return browser failure when send fails"
Fixed the remaining Greptile blocker in 38c8918: broker response handling now uses safeParse, clears the pending request, and returns a structured browser_unknown_error when the desktop sends an invalid browser automation response. Added regression coverage in broker.test.ts.
Fixed the latest Greptile blocker in 7db3153: BrowserToolsBroker now resolves WebSocket send failures as structured browser_unknown_error payloads and clears pending state instead of rejecting into raw MCP errors. Added regression coverage in broker.test.ts.
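Both fixes follow the same rule: the broker resolves with a structured error instead of rejecting. The response-validation half can be sketched without zod as a hand-written check that returns a `safeParse`-style result. The sketch below makes several assumptions. The `ToolResult` payload shape and the `receiveResponse` helper are hypothetical, and they stand in for the real protocol schema in `messages.ts`.

```typescript
// Hypothetical payload shape; the real schemas live in packages/protocol.
type ToolResult =
  | { ok: true; result: unknown }
  | { ok: false; code: string; message: string };

// Mirrors the zod safeParse contract: it never throws, it reports success or failure.
type ParseResult<T> = { success: true; data: T } | { success: false; error: string };

function safeParseToolResult(raw: unknown): ParseResult<ToolResult> {
  if (typeof raw !== "object" || raw === null) {
    return { success: false, error: "payload is not an object" };
  }
  const { ok, result, code, message } = raw as Record<string, unknown>;
  if (ok === true) {
    return { success: true, data: { ok: true, result } };
  }
  if (ok === false && typeof code === "string" && typeof message === "string") {
    return { success: true, data: { ok: false, code, message } };
  }
  return { success: false, error: "payload does not match the expected result shape" };
}

// Settle the pending request for this requestId exactly once. An invalid
// payload resolves as browser_unknown_error instead of throwing into the MCP layer.
function receiveResponse(
  pending: Map<string, (result: ToolResult) => void>,
  requestId: string,
  rawPayload: unknown,
): void {
  const resolve = pending.get(requestId);
  if (resolve === undefined) return;
  pending.delete(requestId);
  const parsed = safeParseToolResult(rawPayload);
  resolve(
    parsed.success
      ? parsed.data
      : { ok: false, code: "browser_unknown_error", message: `invalid browser response: ${parsed.error}` },
  );
}
```

The pending entry is deleted before validation, so a malformed response still frees the slot. A retry or a disconnect therefore cannot resolve the same request twice.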
Linked issue
None.
Type of change
What does this PR do
Adds an explicit opt-in for browser tools and lets local agents work with Paseo desktop browser tabs through the daemon MCP surface.
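The opt-in is a daemon config flag (`daemon.browserTools.enabled`, named under Risk surface) that stays off unless it is explicitly set. The sketch below shows how to read it defensively from persisted config. It assumes that the key is a nested object path in arbitrary JSON and that only a literal `true` enables the feature. The helper names are hypothetical, and the real code uses a schema default in `MutableDaemonConfigSchema`.

```typescript
// Read a nested value without trusting the shape of the persisted JSON.
function getPath(value: unknown, keys: readonly string[]): unknown {
  let current: unknown = value;
  for (const key of keys) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Opt-in semantics: only a literal `true` enables browser tools. A missing
// key, an older config file, or a malformed value all leave them disabled.
function isBrowserToolsEnabled(config: unknown): boolean {
  return getPath(config, ["daemon", "browserTools", "enabled"]) === true;
}
```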
When browser tools are disabled, `browser_*` tools are absent from newly created MCP tool lists instead of appearing and returning an error. MCP sessions that were already initialized may retain their previous tool list until reinitialized, but execution is still blocked by `browser_disabled`.
Architecture review follow-up: browser automation is now advertised through one explicit app-provided capability, not through Electron detection in the client package or split read/interaction capabilities. The app mounts browser automation on the host-runtime client lifecycle, outside React session rendering. The settings opt-in uses a React Query mutation with pending and visible error states.
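The disabled-state behavior described above (hidden from new tool lists, still blocked at execution time) reduces to two independent checks. The sketch below illustrates them. The `ToolDefinition` shape, the function names, and the `example_tool` name in the usage are hypothetical. Only the `browser_` prefix and the `browser_disabled` code come from the PR.

```typescript
// Hypothetical tool entry; the real registrations live in mcp-tools.ts.
interface ToolDefinition {
  name: string;
  description: string;
}

// Tool lists are built when an MCP session initializes, so a list built while
// browser tools were enabled can outlive a later change to the setting.
function listTools(all: readonly ToolDefinition[], browserToolsEnabled: boolean): ToolDefinition[] {
  return all.filter((tool) => browserToolsEnabled || !tool.name.startsWith("browser_"));
}

type CallGuard = { ok: true } | { ok: false; code: "browser_disabled" };

// Execution re-reads the live setting, which is what still blocks stale sessions.
function guardBrowserCall(toolName: string, isEnabled: () => boolean): CallGuard {
  if (toolName.startsWith("browser_") && !isEnabled()) {
    return { ok: false, code: "browser_disabled" };
  }
  return { ok: true };
}
```

Keeping the execution-time check separate from list filtering is what makes the stale-session case safe. A session can still see a `browser_*` tool it was given earlier, but calling that tool returns `browser_disabled`.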
When enabled, agents can create and target desktop browser tabs, inspect page state, interact with refs, navigate, capture screenshots/PDFs, read logs/storage, set viewport/geolocation, set page background color, and exercise upload/download flows. Browser ownership is scoped to the workspace so inactive workspace tabs stay targetable without relying on global focus.
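Workspace-scoped ownership means a tab lookup takes the calling workspace and a `browserId`, never the focused window. The sketch below is a minimal version that assumes an in-memory map. `WorkspaceTabRegistry`, its methods, and the error wording are hypothetical. The hint toward real `browserId` values and `browser_new_tab`, instead of a guessed `default`, mirrors the post-test fixes in this PR.

```typescript
// Hypothetical registry: tabs are owned by a workspace, and lookups never
// consult which window or tab currently has focus.
interface BrowserTab {
  browserId: string;
  url: string;
}

type ResolveResult = { ok: true; tab: BrowserTab } | { ok: false; message: string };

class WorkspaceTabRegistry {
  private readonly tabsByWorkspace = new Map<string, Map<string, BrowserTab>>();

  register(workspaceId: string, tab: BrowserTab): void {
    let tabs = this.tabsByWorkspace.get(workspaceId);
    if (tabs === undefined) {
      tabs = new Map<string, BrowserTab>();
      this.tabsByWorkspace.set(workspaceId, tabs);
    }
    tabs.set(tab.browserId, tab);
  }

  list(workspaceId: string): BrowserTab[] {
    const tabs = this.tabsByWorkspace.get(workspaceId);
    return tabs === undefined ? [] : Array.from(tabs.values());
  }

  // A tab owned by another workspace is treated as not found, and the error
  // lists real ids so the agent does not fall back to guessing "default".
  resolve(workspaceId: string, browserId: string): ResolveResult {
    const tab = this.tabsByWorkspace.get(workspaceId)?.get(browserId);
    if (tab !== undefined) return { ok: true, tab };
    const known = this.list(workspaceId).map((t) => t.browserId);
    return {
      ok: false,
      message:
        known.length > 0
          ? `Unknown browserId "${browserId}". Use one of: ${known.join(", ")}.`
          : "No browser tabs in this workspace. Create one with browser_new_tab.",
    };
  }
}
```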
Post-test fixes made the product path usable for agents instead of requiring them to guess a tab target:
- `browser_new_tab` creates and opens a usable browser tab for the calling workspace.
- … `browserId` values and tell agents not to use `default`.
- … `browser_new_tab`.
- `browser_set_background` supports page background changes.
- … `browser_timeout` if both visible-tab and fallback registration fail, instead of returning success with a dead `browserId`.
- `browser_screenshot` and `browser_full_page_screenshot` now expose PNG image content to agents, not just dimensions/status text. Viewport screenshots use the same CDP capture path as full-page screenshots when available.
How did you verify it
Local validation passed:
- `npm run format`
- `npm run typecheck`
- `npm run lint`
- … `browser_new_tab` visible registration success, fallback registration success, and fallback registration timeout failure.
Real desktop QA completed through an isolated daemon and a dedicated QA Electron app, without touching the main daemon or user app:
- `world cup`: the direct search URL hit Google's `/sorry` page, but the Google homepage UI flow reached a real results page titled `world cup - Tìm trên Google` (Vietnamese for "Search on Google") with World Cup, Wikipedia, and FIFA links.
- … `7.8k`.
- `example.com` background was changed to red.
- `localhost:8081` console logs were captured.
- `PASEO_WEB_PLATFORM=electron`: the in-app Browser panel showed the `https://example.com/` URL bar and webview chrome instead of `Browser is desktop-only`, and a fresh real agent listed the registered `https://example.com/` tab, called `browser_screenshot`, received image content, and described the visible `Example Domain` page from the screenshot pixels.
Risk surface
- `browser_new_tab` after registration timeout may leave hidden fallback webviews or extra browser tabs until normal app cleanup handles them.
- … `daemon.browserTools.enabled`, persisted in daemon config, and exposed in settings. The pure config seam is unit-tested; the full settings click path was not re-run during the PR body update.
Checklist
- `npm run typecheck` passes
- `npm run lint` passes
- `npm run format` ran (Biome)