
feat: low-level interaction endpoints (mouse-wheel, init-script, capture-network, capture-requests) #4210

Open
nayrosk wants to merge 3 commits into jo-inc:master from nayrosk:feat/mouse-wheel-endpoint

nayrosk commented May 24, 2026

Summary

Four new endpoints that fill gaps in the existing interaction surface, all needed in practice to drive modern anti-automation web apps (e.g. Instagram DMs):

  1. POST /tabs/:tabId/mouse-wheel — real page.mouse.wheel() dispatched at a specific element or coordinate
  2. POST /tabs/:tabId/init-script — wraps page.addInitScript() so a hook runs on every navigation before any page script
  3. POST /tabs/:tabId/capture-network — wraps page.on("response") for a bounded duration, captures matching response bodies at the browser network layer (above the Service Worker, above any in-page closure)
  4. POST /tabs/:tabId/capture-requests — wraps page.on("request") for the symmetric request side, exposing URL + method + POST body + headers (essential when the page bundle hides outgoing payloads behind closure-cached primitives)

Each addresses a distinct interception level. Together they cover the full stack: DOM events → page lifecycle → network (both directions).


1. /tabs/:tabId/mouse-wheel

POST /tabs/:tabId/mouse-wheel
{
  "userId": "...",
  "ref": "e22",          // optional: element ref → wheel at bbox centre
  "x": 810, "y": 363,    // optional: explicit page coords (ignored if ref is set)
  "deltaX": 0,
  "deltaY": -1500
}
→ { "ok": true, "x": 810, "y": 363, "deltaX": 0, "deltaY": -1500 }

The existing /scroll endpoint calls mouse.wheel() without prior cursor positioning, so it targets wherever the cursor happens to be — too coarse for nested scrollable containers. Some sites (notably Instagram DMs) virtualise their message lists and ignore both programmatic scrollTop and JS-dispatched WheelEvents; only a real wheel at the inner container's coordinates triggers their lazy load.

Three coordinate modes (priority): ref > (x, y) > viewport centre.

  • Ref handling: same path as /click: refToLocator first, then a refreshTabRefs fallback with the pre_wheel reason; throws StaleRefsError if the ref is still unresolvable
  • Concurrency: wrapped in withTabLock
  • Settle delays: 50 ms after move, 300 ms after wheel
  • Plugin event: emits tab:mouse-wheel
  • OpenAPI: annotated
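
The coordinate priority and settle delays listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's handler: `resolveWheelTarget` and `wheelAt` are hypothetical names, and the bounding box and viewport are assumed to come from Playwright's `locator.boundingBox()` and `page.viewportSize()`.

```javascript
// Hypothetical helper: choose wheel coordinates using the documented
// priority ref > (x, y) > viewport centre.
// `refBox` is assumed to be the { x, y, width, height } box of the resolved
// ref, or undefined/null when the caller passed no ref.
function resolveWheelTarget({ refBox, x, y, viewport }) {
  if (refBox) {
    // Wheel at the centre of the element's bounding box.
    return { x: refBox.x + refBox.width / 2, y: refBox.y + refBox.height / 2 };
  }
  if (Number.isFinite(x) && Number.isFinite(y)) {
    return { x, y };
  }
  return { x: viewport.width / 2, y: viewport.height / 2 };
}

// Hypothetical driver: position the cursor first, then wheel, using the
// 50 ms and 300 ms settle delays from the list above.
async function wheelAt(page, target, deltaX, deltaY) {
  await page.mouse.move(target.x, target.y);
  await page.waitForTimeout(50);
  await page.mouse.wheel(deltaX, deltaY);
  await page.waitForTimeout(300);
}
```

Keeping the coordinate choice in a pure function makes the priority rule easy to check without a browser; only `wheelAt` needs a live Playwright page.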

2. /tabs/:tabId/init-script (authMiddleware)

POST /tabs/:tabId/init-script
{
  "userId": "...",
  "script": "window.__caps = []; const origFetch = window.fetch; window.fetch = ..."
}
→ { "ok": true, "scriptLen": 815 }

Wraps page.addInitScript({ content: script }). The script is evaluated in the page world before any other script on every navigation in the tab. Useful for hooks that must beat first-byte JS (e.g. install a fetch wrapper before the bundle imports it).

  • Auth-gated (arbitrary JS in the page world)
  • 256 KB body limit
  • One script per call; scripts from multiple calls stack
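
One way a client might build the `script` string is to write the hook as an ordinary function, serialise it as an immediately invoked expression, and check the request body against the 256 KB limit before sending. The sketch below assumes that approach; `fetchHook`, `buildInitScriptBody` and the shape of the `window.__caps` entries are illustrative and not part of the PR.

```javascript
// Illustrative hook: records the URL of every call made through window.fetch.
// It takes `window` as a parameter so it can also be exercised outside a
// browser; in the page it receives the real global object.
function fetchHook(window) {
  window.__caps = [];
  const origFetch = window.fetch;
  window.fetch = function (...args) {
    const input = args[0];
    // Accept a string, a Request-like object with .url, or a URL object.
    const url = typeof input === "string" ? input : (input && input.url) || String(input);
    window.__caps.push({ url, at: Date.now() });
    return origFetch.apply(window, args);
  };
}

const MAX_BODY_BYTES = 256 * 1024; // the endpoint's documented body limit

// Hypothetical builder for the JSON body of POST /tabs/:tabId/init-script.
function buildInitScriptBody(userId) {
  const script = `(${fetchHook.toString()})(window);`;
  const body = JSON.stringify({ userId, script });
  const bytes = new TextEncoder().encode(body).length;
  if (bytes > MAX_BODY_BYTES) {
    throw new Error(`init-script body is ${bytes} bytes, over the 256 KB limit`);
  }
  return body;
}
```

A hook of this kind only observes calls that look up `window.fetch` after it is installed, which matches the behaviour reported under Verification.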

3. /tabs/:tabId/capture-network (authMiddleware)

POST /tabs/:tabId/capture-network
{
  "userId": "...",
  "urlPattern": "graphql",    // optional, default "graphql" (regex, case-insensitive)
  "durationMs": 15000,        // capped at 60000
  "maxBodyBytes": 1000000,    // per capture
  "maxCaptures": 100
}
→ { "ok": true, "captureCount": 35, "captures": [ { "url", "status", "len", "body" }, ... ] }

Attaches page.on("response") for durationMs, then detaches and returns every matching response. Operates at the browser network layer, so it captures:

  • Fetches from window.fetch references the page bundle cached before any in-page hook had a chance
  • Fetches routed through a Service Worker
  • XHR requests that bypass monkey-patched prototypes

In practice, this was the only approach that reliably observed the API traffic of SPAs that aggressively cache primitives at bundle init time.

  • Auth-gated (response bodies may contain sensitive data)
  • Per-capture and total-count caps prevent unbounded memory use
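
The caps described above can be pictured as a small bounded buffer that a response listener feeds. The sketch below is an assumption about how such a buffer could behave, not the PR's code: `createCaptureBuffer` and `clampDuration` are hypothetical names, the `maxBodyBytes` and `maxCaptures` defaults simply reuse the example request values, oversized bodies are truncated rather than skipped, and sizes are measured in string length rather than true bytes.

```javascript
// Hypothetical bounded buffer enforcing the documented caps.
function createCaptureBuffer({ urlPattern = "graphql", maxBodyBytes = 1000000, maxCaptures = 100 } = {}) {
  const pattern = new RegExp(urlPattern, "i"); // regex, case-insensitive
  const captures = [];
  return {
    // Returns true when the response was stored, false when it was ignored.
    offer({ url, status, body }) {
      if (captures.length >= maxCaptures) return false; // total-count cap
      if (!pattern.test(url)) return false; // URL filter
      const text = String(body ?? "");
      // Assumption: `len` reports the full size, `body` is truncated to the cap.
      captures.push({ url, status, len: text.length, body: text.slice(0, maxBodyBytes) });
      return true;
    },
    result() {
      return { ok: true, captureCount: captures.length, captures };
    },
  };
}

// Clamp the requested duration to the documented 60000 ms ceiling.
function clampDuration(durationMs, max = 60000) {
  return Math.min(Math.max(0, Number(durationMs) || 0), max);
}
```

With a Playwright page, a listener along the lines of `page.on("response", async (res) => buf.offer({ url: res.url(), status: res.status(), body: await res.text() }))` would feed the buffer until the clamped duration elapses and the listener is removed.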

4. /tabs/:tabId/capture-requests (authMiddleware)

POST /tabs/:tabId/capture-requests
{
  "userId": "...",
  "urlPattern": "graphql",    // optional, default "graphql" (regex, case-insensitive)
  "durationMs": 15000,        // capped at 60000
  "maxBodyBytes": 200000,     // per capture
  "maxCaptures": 100,
  "includeHeaders": true      // optional, default true
}
→ {
    "ok": true,
    "captureCount": 36,
    "captures": [
      { "url", "method", "len", "body", "headers": { ... } },
      ...
    ]
  }

Symmetric counterpart to /capture-network: page.on("request") instead of page.on("response"). Returns POST body + headers for each matching request, captured at the browser network layer.
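
As a rough picture of the per-request record, the sketch below turns one request into the `{ url, method, len, body, headers }` shape shown in the sample response. `toRequestCapture` is a hypothetical name. The `request` argument is assumed to expose Playwright's `url()`, `method()` and `postData()`; headers are passed in separately because Playwright's synchronous `request.headers()` omits cookie-related headers, whereas the asynchronous `request.allHeaders()` includes them.

```javascript
// Hypothetical serialiser for a single outgoing request.
function toRequestCapture(request, headers, { includeHeaders = true, maxBodyBytes = 200000 } = {}) {
  const postData = request.postData() ?? ""; // postData() is null when there is no body
  const capture = {
    url: request.url(),
    method: request.method(),
    len: postData.length, // assumption: full size, measured in string length
    body: postData.slice(0, maxBodyBytes), // assumption: truncate, do not skip
  };
  if (includeHeaders) {
    capture.headers = headers;
  }
  return capture;
}
```

A listener could then call it as `page.on("request", async (req) => captures.push(toRequestCapture(req, await req.allHeaders())))`.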

Motivation: /capture-network reveals what data the page receives, but is silent on what the page sends. Without the outgoing payload (CSRF tokens, doc IDs, pagination cursors, signed query params), it's impossible to replay or extend a captured GraphQL operation from outside the page. Hook-based approaches (window.fetch override, XMLHttpRequest.prototype.send override) fail against bundles that cache the original references at module init — verified empirically against Instagram's web client. Only a Playwright-level listener catches every outbound request.
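
The closure-cache problem can be reproduced without a browser. The snippet below is a self-contained illustration that uses a plain object as a stand-in for `window`; it shows only the ordering issue for a hook installed after the bundle has taken its reference, and all names in it are made up for the example.

```javascript
// Stand-in for the page's global object.
const pageGlobal = { fetch: (url) => `original:${url}` };

// 1. The bundle stores its own reference at module init.
const bundleFetch = pageGlobal.fetch;

// 2. A hook is installed afterwards by replacing the global property.
const seenByHook = [];
pageGlobal.fetch = (url) => {
  seenByHook.push(url);
  return `hooked:${url}`;
};

// 3. Lookups through the global reach the hook; the cached reference does not.
const viaGlobal = pageGlobal.fetch("/api/graphql");
const viaBundle = bundleFetch("/api/graphql");
```

Here `seenByHook` ends up with a single entry, from the call made through `pageGlobal.fetch`, while the call through `bundleFetch` goes straight to the original function.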

  • Auth-gated (request bodies / cookies / signed tokens are sensitive)
  • headers returns the resolved request headers including cookie, x-csrftoken, x-fb-lsd, x-fb-friendly-name, etc. Set includeHeaders: false to omit them
  • Per-capture and total-count caps prevent unbounded memory use
  • Mirrors the code path of /capture-network (same handler shape, lifecycle, and cleanup) for review symmetry

Verification

All four endpoints exercised on an Instagram DM thread that:

  • Virtualises its message list (defeats /scroll's page-level wheel)
  • Bundles React with a fetch reference captured at module init (defeats any in-page hook on either window.fetch or XMLHttpRequest.prototype)
  • Uses a Service Worker for /api/graphql (defeats most network observers)

Results:

  • /mouse-wheel at the container's centre coords with deltaY=-1500: scrollHeight grew from 1230 to 5382 in 8 batches (lazy load triggered)
  • /init-script installed a fetch wrapper that captured a manual fetch("/api/graphql") call (verified hook installation), but missed all 35 of the page's own GraphQL fetches (confirms the closure-cache problem)
  • /capture-network for 25 s while navigating to the thread: captured all 35 GraphQL responses (827 KB), including the one carrying the message list
  • /capture-requests for 15 s while refreshing the thread: captured 36 GraphQL requests with full POST bodies + headers, including one IGDMessageListOffMsysQuery with its doc_id, fb_dtsg, lsd and pagination variables. The captured body was then used as a template — replacing only the variables.after cursor — to drive a full 14-batch pagination loop through the same /api/graphql endpoint, yielding 286 unique messages (the complete thread history). With only /capture-network, the same extraction would have stopped at the first 20 messages.
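
The template step in the last result, replacing only `variables.after` in a captured body, might look like the sketch below. It assumes the captured POST body is form-encoded with `variables` carried as a JSON string; that encoding is an assumption, `withCursor` is a hypothetical name, and the field values in the usage note are placeholders rather than captured data.

```javascript
// Hypothetical helper: copy a captured form-encoded GraphQL body and swap
// only variables.after, leaving every other field (doc_id, fb_dtsg, lsd, ...)
// exactly as captured.
function withCursor(capturedBody, cursor) {
  const params = new URLSearchParams(capturedBody);
  const variables = JSON.parse(params.get("variables") ?? "{}");
  variables.after = cursor;
  params.set("variables", JSON.stringify(variables));
  return params.toString();
}
```

For example, `withCursor("doc_id=123&variables=%7B%22after%22%3A%22c1%22%7D", "c2")` returns a body with the same `doc_id` and a `variables` object whose `after` is `"c2"`. A pagination loop would call it once per batch with the cursor taken from the previous response.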

Also verified:

  • node --check server.js passes
  • All four endpoints return 400 on missing userId
  • 404 on unknown tabId

nayrosk added 2 commits May 24, 2026 23:54
Some sites (notably Instagram DMs) virtualise nested scrollable
containers and ignore both programmatic scrollTop and dispatched
WheelEvents -- only a real, browser-dispatched wheel event at the container's
coordinates triggers their lazy load. The existing /scroll endpoint
dispatches at the page level via mouse.wheel without prior cursor
positioning, which is too coarse for this case.

This adds POST /tabs/:tabId/mouse-wheel with three coordinate modes:
  - ref:     element ref resolved to its bounding-box centre
  - x, y:    explicit page coordinates
  - default: viewport centre

Mirrors the conventions of /scroll and /click (no auth middleware,
withTabLock, refToLocator + refreshTabRefs fallback, StaleRefsError,
pluginEvents.emit) and ships with the matching @openapi annotation
so it appears in /docs and /openapi.json.
…ption

Modern SPA frameworks (React/Next/etc.) often capture `fetch` and
`XMLHttpRequest` references at bundle init time, before any user
script can run. In-page monkey-patches via /evaluate are bypassed
because the bundle holds its own references. Even page.addInitScript
hooks installed before the document loads can be defeated by
Service Workers that intercept fetches before they reach the page.

Two endpoints address these gaps:

POST /tabs/:tabId/init-script
  Wraps Playwright's page.addInitScript(). The script is evaluated
  before any other script on every navigation in the tab's page
  context. Useful for hooks that must beat first-byte JS.

POST /tabs/:tabId/capture-network
  Wraps page.on("response") for a bounded duration, returning every
  response whose URL matches a regex (default /graphql/i). Operates
  at the browser network layer, above the Service Worker and above
  any in-page JS, so it cannot be bypassed by either. Bodies are
  capped per-capture and per-count to keep responses manageable.

Both endpoints follow the existing conventions:
  - authMiddleware (sensitive: arbitrary script / response bodies)
  - withTabLock for the navigate-equivalent operations
  - emit tab:init-script / tab:capture-network plugin events
  - standard error handling via handleRouteError
nayrosk changed the title from "feat: POST /tabs/:tabId/mouse-wheel for element-scoped wheel events" to "feat: low-level interaction endpoints (mouse-wheel, init-script, capture-network)" May 24, 2026
…capture

Mirrors /capture-network but uses page.on("request") instead of
page.on("response"), exposing the request URL, method, POST body, and
headers. Useful when a page bundle captures fetch/XHR references in a
closure before any user JS runs (e.g. Instagram), bypassing window-level
hooks installed via evaluate.

Body params identical to /capture-network plus an includeHeaders flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nayrosk changed the title from "feat: low-level interaction endpoints (mouse-wheel, init-script, capture-network)" to "feat: low-level interaction endpoints (mouse-wheel, init-script, capture-network, capture-requests)" May 25, 2026