Skip to content

feat(core): 2-stage fetch with output handles for large-output tools (apify-mcp adoption A) #887

@shaun0927

Description

@shaun0927

Tier: core (P2-compliant additive: existing response shape preserved by default)
PR target: develop
Series: apify-mcp adoption A

Background

apify-mcp-server invokes Actors with call-actor, returns a run-id + truncated preview, and the agent retrieves the full payload via a separate get-actor-output tool with offset/limit. The pattern exists for one reason: prevent large tool responses from blowing up the host LLM's context window.

OpenChrome has several tools whose response can exceed 50–500 KB of JSON in one shot:

  • oc_evidence_bundle (src/tools/oc-evidence-bundle.ts) — bundle descriptor + (currently) inlined bytes when small.
  • crawl (src/tools/crawl.ts) — full crawl tree.
  • network (src/tools/network.ts) — network slice JSON.
  • read_page (src/tools/read-page.ts) — sanitized DOM/text.
  • extract_data (src/tools/extract-data.ts) — extracted records.

When the agent only needs a summary or first N records, the entire payload still occupies the LLM context for the rest of the conversation. With prompt-cache TTL of 5 minutes and Sonnet/Haiku models being the common deployment target, this is the single highest leverage cost-cutting opportunity OpenChrome has not yet taken. Apify's pattern translates directly: the data already lives in ~/.openchrome/trace/ (JSONL trace storage, src/core/trace/storage.ts) — we just need an opt-in handle response mode plus a fetch tool.

Portability-Harness Contract alignment:

  • P1 (tool-server identity): handles are deterministic file paths; no background work added.
  • P2 (zero-impact extension): default mode remains inline; v1.11.0 callers see byte-identical responses.
  • P3 (anywhere-compatible): no new native deps, no network calls.
  • P4 (facts not decisions): pure storage/retrieval; no LLM judgment.

Proposed Implementation

Tool surface changes

Add an optional output_mode input parameter to the five tools above:

output_mode?: 'inline' | 'handle' | 'auto'; // default 'inline' (v1.11.0-identical)
output_inline_limit_bytes?: number;          // only honored when output_mode='auto'; default 32768

Semantics:

  • inline (default): current behavior, byte-identical to v1.11.0.
  • handle: response payload is written to trace storage; the tool returns a small descriptor only.
  • auto: behaves like inline if the serialized payload ≤ output_inline_limit_bytes, otherwise spills to handle.

Handle response shape (uniform across all five tools):

{
  "output_handle": "oh_<base32-12>",
  "mime_type": "application/json" | "application/gzip" | "text/markdown",
  "size_bytes": 184320,
  "item_count": 142,
  "preview": "<first ≤ 2048 bytes of UTF-8-safe content OR null for binary>",
  "expires_at": "2026-05-19T12:34:56Z",
  "fetch_with": "oc_output_fetch"
}

New tool: oc_output_fetch

Add src/tools/oc-output-fetch.ts exposing:

{
  name: 'oc_output_fetch',
  inputSchema: {
    output_handle: string,         // REQUIRED
    offset?: number,               // default 0
    limit?: number,                // default 65536 bytes for blobs, 200 items for JSON arrays
    format?: 'bytes' | 'items' | 'auto'  // default 'auto' — JSON array → items, blob → bytes
  }
}

Returns:

{
  "output_handle": "oh_...",
  "offset": 0,
  "limit": 200,
  "returned": 200,
  "total": 1420,
  "next_offset": 200 | null,
  "content": <items | base64-bytes>,
  "eof": false
}

Storage

  • Handles live under ~/.openchrome/output/<YYYY-MM-DD>/<output_handle>.{json|bin} (atomic write via existing src/utils/atomic-file.ts).
  • TTL: 24 hours (configurable via --output-handle-ttl-hours, default 24).
  • DiskMonitor sweep extended to prune output/ directory (reuse existing 5-minute sweeper in src/core/trace/storage.ts).
  • Trace storage records every handle creation as a output_handle_created event so handles are discoverable via oc_journal.

Tier registration

oc_output_fetch is Tier 1 (always exposed) — without it, agents have no way to redeem handles.

Files touched

  • New: src/tools/oc-output-fetch.ts, src/core/output/handle-store.ts, tests/core/output-handles.test.ts.
  • Modified: src/tools/oc-evidence-bundle.ts, src/tools/crawl.ts, src/tools/network.ts, src/tools/read-page.ts, src/tools/extract-data.ts — add output_mode plumbing in input schema and result branching.
  • Modified: src/tools/index.ts — register oc_output_fetch in registerAllTools() (after the existing core tools at the head of the registry).
  • Modified: src/config/tool-tiers.ts — add oc_output_fetch at tier 1 in TOOL_TIERS (the type is 1 | 2 | 3).
  • Modified: src/index.ts — add two new CLI flags: --output-handle-ttl-hours <hours> (default 24) and --output-handle-sweep-interval-seconds <seconds> (default 300, lowerable for tests). Both flags are net-new and shipped by this PR; no fallback is required.
  • Modified: src/journal/task-journal.ts (and src/tools/journal.ts if needed) — extend the journal to record output_handle_created events alongside the existing tool-call entries. This is a small but real extension: today the journal records tool invocations, not arbitrary events. Acceptance criteria below cover the schema.

Acceptance Criteria

  • oc_output_fetch registered as Tier 1; TOOL_TIERS snapshot test updated.
  • All five target tools accept output_mode and output_inline_limit_bytes inputs without breaking existing callers (defaults preserve byte-identical v1.11.0 responses).
  • When output_mode='handle', response strictly matches the handle response shape above (validated by a Zod schema kept in src/tools/_shared/output-handle.schema.ts).
  • Handles persist under ~/.openchrome/output/<date>/ with atomic writes; DiskMonitor prunes expired handles in the next 5-minute sweep tick after expiry.
  • oc_output_fetch supports offset/limit pagination for both JSON-array payloads (item-based) and blob payloads (byte-range); next_offset is null exactly when eof=true.
  • Default-mode snapshot test in tests/core/registration-default.snapshot.ts (or extension thereof) shows that tools/list differs from v1.11.0 only by the addition of oc_output_fetch.
  • Journal infrastructure extended to record output_handle_created events. New event schema: {event: "output_handle_created", handle, source_tool, size_bytes, mime_type, ts}. oc_journal action="recent" surfaces them in the same list as tool-call entries.
  • Handle redaction: any handle created from a tool that had content-sanitization applied (--no-sanitize-content not set) preserves the sanitized payload on disk; raw content is never stored unsanitized.
  • npm run build && npm test && npm run lint:tier green.
  • PR body documents tier:core and lists every modified tool name.

Verification (post-merge, via openchrome MCP)

A tester instance of openchrome (instance T) drives a target instance (instance A) through MCP tools/list + tool calls.

Setup

npm ci && npm run build
INSTANCE_A_PORT=9881
node dist/index.js --http "$INSTANCE_A_PORT" &
A_PID=$!
mcpA() {
  curl -s -H 'content-type: application/json' \
    -d "$1" "http://localhost:$INSTANCE_A_PORT/mcp"
}

Scenario 1 — default mode is byte-identical to v1.11.0

# Navigate then read_page WITHOUT output_mode
mcpA '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"navigate","arguments":{"url":"https://example.com"}}}'
RESP=$(mcpA '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"read_page","arguments":{}}}')
echo "$RESP" | jq -e '.result.content[0].text | fromjson | has("output_handle") | not' >/dev/null \
  && echo OK || { echo "FAIL: default mode unexpectedly returned a handle"; exit 1; }

Pass: OK. Default callers never see the new envelope.

Scenario 2 — output_mode='handle' returns a redeemable descriptor

RESP=$(mcpA '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"read_page","arguments":{"output_mode":"handle"}}}')
HANDLE=$(echo "$RESP" | jq -r '.result.content[0].text | fromjson | .output_handle')
SIZE=$(echo "$RESP" | jq -r '.result.content[0].text | fromjson | .size_bytes')
PREVIEW_LEN=$(echo "$RESP" | jq -r '.result.content[0].text | fromjson | .preview | length')
[[ "$HANDLE" =~ ^oh_[A-Z0-9]{12}$ ]] || { echo "FAIL: handle format"; exit 1; }
[[ "$SIZE" -gt 0 ]] || { echo "FAIL: size_bytes zero"; exit 1; }
[[ "$PREVIEW_LEN" -le 2048 ]] || { echo "FAIL: preview exceeds 2048 bytes"; exit 1; }
echo "OK handle=$HANDLE size=$SIZE preview_len=$PREVIEW_LEN"

Pass: handle matches regex, size > 0, preview ≤ 2048 bytes.

Scenario 3 — oc_output_fetch paginates correctly

# Crawl a known small site, then redeem the handle in pages of 10 items
mcpA '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"crawl","arguments":{"url":"https://example.com","max_pages":3,"output_mode":"handle"}}}' > /tmp/crawl.json
H=$(jq -r '.result.content[0].text | fromjson | .output_handle' /tmp/crawl.json)
PAGE1=$(mcpA "{\"jsonrpc\":\"2.0\",\"id\":5,\"method\":\"tools/call\",\"params\":{\"name\":\"oc_output_fetch\",\"arguments\":{\"output_handle\":\"$H\",\"offset\":0,\"limit\":10}}}")
RETURNED=$(echo "$PAGE1" | jq -r '.result.content[0].text | fromjson | .returned')
TOTAL=$(echo "$PAGE1"   | jq -r '.result.content[0].text | fromjson | .total')
NEXT=$(echo "$PAGE1"    | jq -r '.result.content[0].text | fromjson | .next_offset')
[[ "$RETURNED" -le 10 ]] && [[ "$TOTAL" -ge "$RETURNED" ]] && echo "OK returned=$RETURNED total=$TOTAL next=$NEXT" \
  || { echo "FAIL pagination math"; exit 1; }

Pass: returned ≤ limit, total ≥ returned, next_offset consistent with eof.

Scenario 4 — unknown handle returns structured error, not a stack trace

RESP=$(mcpA '{"jsonrpc":"2.0","id":6,"method":"tools/call","params":{"name":"oc_output_fetch","arguments":{"output_handle":"oh_DEADBEEF0000"}}}')
echo "$RESP" | jq -e '.result.isError == true and (.result.content[0].text | fromjson | .error.code == "output_handle_not_found")' >/dev/null \
  && echo OK || { echo "FAIL: missing or malformed not-found error"; exit 1; }

Pass: OK. Error is structured (error.code='output_handle_not_found').

Scenario 5 — handle response size invariant

The real invariant is that the handle response itself is small and bounded, regardless of underlying payload size. Test against a fixture site committed at tests/fixtures/sites/many-pages/ (a local httpbin-style server serving ≥ 30 internal pages) so the test is hermetic.

# Start local fixture server (port 9991)
node tests/fixtures/sites/many-pages/serve.mjs &
FIX_PID=$!
sleep 0.5

# Inline mode (baseline for documentation only)
INLINE_BYTES=$(mcpA '{"jsonrpc":"2.0","id":7,"method":"tools/call","params":{"name":"crawl","arguments":{"url":"http://localhost:9991/","max_pages":30}}}' | wc -c)

# Handle mode
HANDLE_BODY=$(mcpA '{"jsonrpc":"2.0","id":8,"method":"tools/call","params":{"name":"crawl","arguments":{"url":"http://localhost:9991/","max_pages":30,"output_mode":"handle"}}}')
HANDLE_BYTES=$(echo "$HANDLE_BODY" | wc -c)

kill $FIX_PID; wait $FIX_PID 2>/dev/null
echo "inline=$INLINE_BYTES handle=$HANDLE_BYTES"
[[ "$HANDLE_BYTES" -le 4096 ]] || { echo "FAIL: handle response exceeds 4 KB"; exit 1; }
echo OK

Record both numbers in scripts/verify/A-output-handles-bytes.txt.
Pass: handle-mode response body ≤ 4 KB. (Descriptor + 2 KB preview cap means typical handle responses are 2.5–3.5 KB; the 4 KB ceiling is the actual invariant. The inline number is recorded for documentation only — it depends on workload and is not a pass/fail gate.)

Scenario 6 — TTL eviction (uses the two new sweep flags shipped by this PR)

kill $A_PID; wait $A_PID 2>/dev/null
node dist/index.js --http "$INSTANCE_A_PORT" \
  --output-handle-ttl-hours 0 \
  --output-handle-sweep-interval-seconds 2 &
A_PID=$!
sleep 1

# Create a handle
HANDLE=$(mcpA '{"jsonrpc":"2.0","id":11,"method":"tools/call","params":{"name":"read_page","arguments":{"output_mode":"handle"}}}' \
  | jq -r '.result.content[0].text | fromjson | .output_handle')

# Wait for one full sweep cycle
sleep 3

# Redeem should now fail with structured not-found
RESP=$(mcpA "{\"jsonrpc\":\"2.0\",\"id\":12,\"method\":\"tools/call\",\"params\":{\"name\":\"oc_output_fetch\",\"arguments\":{\"output_handle\":\"$HANDLE\"}}}")
echo "$RESP" | jq -e '.result.isError == true and (.result.content[0].text | fromjson | .error.code == "output_handle_not_found")' >/dev/null \
  && echo OK || { echo "FAIL: handle not evicted after TTL"; exit 1; }

Pass: redeem after TTL returns output_handle_not_found. Handle file is gone from ~/.openchrome/output/. Both flags above are new in this PR.

Scenario 7 — oc_journal records handle creation

The journal tool today accepts {action: "summary" | "recent", count}. This PR extends action="recent" to include output_handle_created entries alongside tool-call entries (see Acceptance Criteria).

JOURNAL=$(mcpA '{"jsonrpc":"2.0","id":9,"method":"tools/call","params":{"name":"oc_journal","arguments":{"action":"recent","count":50}}}')
# The journal returns either text or structured JSON depending on the action; for "recent" with the
# extension this PR ships, it returns a JSON array. Either parse from .result.content[0].text directly
# or via fromjson if the server text-wraps it (handled by both branches below).
TEXT=$(echo "$JOURNAL" | jq -r '.result.content[0].text')
echo "$TEXT" | jq -e 'if type=="array" then .[] else . end | select(.event=="output_handle_created")' >/dev/null 2>&1 \
  || (echo "$TEXT" | grep -q 'output_handle_created') \
  && echo OK || { echo "FAIL: no output_handle_created event in journal"; exit 1; }

Pass: OK.

Issue closure criteria

Scenarios 1–4 and 7 pass strictly; Scenario 5 ratio recorded and ≤ 0.10; Scenario 6 confirmed manually. Reproducer script at scripts/verify/A-output-handles.mjs.

Out of scope

  • Streaming responses (chunked transfer over MCP). The handle pattern is pull-based; streaming is a separate concern tracked under feat(observability): emit replay scripts as a tool-call byproduct (--codegen puppeteer|playwright|mcp-replay) #836-family observability work.
  • Cross-host handle exchange (HTTP daemon serving multiple clients sharing a handle). For v1.x, handles are scoped to the instance that created them.
  • Compression of inline responses. output_mode='handle' is the prescribed lever; inline-mode bytes are unchanged.
  • Auto-conversion of read_page to handle mode for sanitized small pages — mode='auto' covers it.

Dependencies

References

  • apify/apify-mcp-servercall-actor + get-actor-output pattern (Apache-2.0).
  • docs/roadmap/portability-harness-contract.md — P1, P2, P4 alignment.
  • src/core/trace/storage.ts — existing JSONL store and DiskMonitor sweep loop.
  • src/utils/atomic-file.ts — atomic write primitive reused for handle persistence.
  • Internal comparison analysis (chat thread, 2026-05-12).

Revision history

  • 2026-05-12 r1: Initial draft.

OpenChrome 실검증 체크리스트

2026-05-14 재검증 완료. 최신 origin/develop 코드, targeted Jest/lint, OpenChrome CLI 실호출, localhost fixture 산출물로 직접 확인 가능한 항목만 close 근거로 사용했다.

검증 대상

검증 증거

  • npm run build 통과.
  • npm run lint:tier 통과: 521 modules / 1239 dependencies, no dependency violations.
  • npm run lint:tool-schemas 통과: 82 baselined violations, 0 new.
  • targeted Jest 통과: 38 passed / 1 skipped suites, 436 passed / 1 skipped tests.
  • OpenChrome CLI 실호출: oc_connection_health connected, localhost fixture navigate 성공.
  • OpenChrome tools/list introspection에서 관련 default 또는 pilot-gated tool surface 존재 확인.
  • 대표 bounded diagnostic 호출이 구조화된 성공/오류 응답을 반환함을 확인.

이슈별 코드/테스트 근거

  • 관련 구현/문서/테스트 파일이 최신 트리에 존재하고 targeted 검증에 포함됨:
    • src/tools/oc-output-fetch.ts
    • src/core/output/handle-store.ts
    • src/tools/_shared/output-handle.schema.ts
    • tests/core/output-handles.test.ts

산출물

  • 증거 로그: .omx/reverify-evidence/targeted-jest.log
  • 증거 로그: .omx/reverify-evidence/lint-tier.log
  • 증거 로그: .omx/reverify-evidence/lint-tool-schemas.log
  • 증거 로그: .omx/reverify-evidence/openchrome-live-smoke.log

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1P1 highenhancementNew feature or requesthost-integrationWires module cores into host (CDP, MCP, tools, transports, OS APIs)performancePerformance, latency, throughput, or resource-use improvement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions