Tier: pilot / optional integration (external service only; core stays dependency-light)
PR target: develop
Series: OmniParser adoption B
Priority: P2 — useful for canvas/native-like UIs after the perception snapshot contract exists
Depends on: OmniParser adoption A (PerceptionSnapshot provider contract)
Related / Sequencing
Background
OmniParser's strongest transferable capability is screenshot-only UI grounding: given a screenshot, it returns structured text/icon elements with bounding boxes and captions. OpenChrome should support that capability as an optional provider, not as a bundled dependency.
This issue adds an OmniParser-compatible HTTP adapter that talks to an already-running parser service. The adapter must be safe-by-default: disabled unless configured, bounded by timeout/budget, and never required for normal OpenChrome operation.
Proposed Implementation
Add an optional perception provider:
src/vision/providers/omniparser-http-provider.ts
- config/env support in existing config surfaces:
OPENCHROME_VISION_PROVIDER=dom|omniparser-http
OPENCHROME_OMNIPARSER_URL=http://127.0.0.1:8000/parse/
OPENCHROME_OMNIPARSER_TIMEOUT_MS=3000 default or lower if global deadline is tighter
OPENCHROME_OMNIPARSER_MAX_ELEMENTS=200
- provider health/diagnostic metadata in
vision_find(format='snapshot'|'both')
Adapter behavior
- Capture screenshot through existing guarded screenshot path.
- POST
{ "base64_image": "..." } to the configured OmniParser-compatible endpoint.
- Accept response shapes compatible with OmniParser:
parsed_content_list[]
- optional
som_image_base64
- optional
latency
- Convert
bbox ratio coordinates to PerceptionElement.bbox CSS pixels and bboxRatio.
- Map element types:
- OmniParser
text -> text, interactive=false unless response says otherwise
- OmniParser
icon -> icon or control when interactivity is true/known
- Truncate labels and element count.
- On timeout/unavailable/malformed response, return a warning and fall back to the DOM provider when fallback is enabled.
Fallback policy
- Default provider remains
dom.
- If
omniparser-http is configured and fails, vision_find should return a clear warning and optionally include DOM fallback results.
- A failing external parser must not fail unrelated MCP tools or session creation.
Non-goals
- Do not vendor Microsoft OmniParser code, model weights, Python, Torch, or Docker setup.
- Do not make network calls to public third-party parser services by default.
- Do not add a built-in Windows VM or OmniTool clone.
- Do not use visual-only candidates for automatic clicking in this issue.
Acceptance Criteria
Verification (post-merge, via OpenChrome MCP)
Record artifacts under scripts/verify/omniparser-adoption-B-http-provider/.
Setup
npm ci
npm run build
mkdir -p scripts/verify/omniparser-adoption-B-http-provider
node tests/fixtures/sites/vision-perception/serve.mjs &
FIX_PID=$!
node tests/fixtures/omniparser-mock/server.mjs --port 9901 --mode success > /tmp/omniparser-mock.log 2>&1 &
MOCK_PID=$!
PORT=9892
OPENCHROME_VISION_PROVIDER=omniparser-http \
OPENCHROME_OMNIPARSER_URL=http://127.0.0.1:9901/parse/ \
node dist/index.js --http "$PORT" > /tmp/openchrome-omniparser-provider.log 2>&1 &
OC_PID=$!
sleep 1
mcp() { curl -s -H 'content-type: application/json' -d "$1" "http://localhost:$PORT/mcp"; }
Mock parser requirements:
/parse/ returns two elements: Search icon and Continue button, with ratio bboxes.
- It can switch to timeout/malformed mode through a test endpoint or restart flag.
Scenario 1 — configured provider returns OmniParser-sourced elements
mcp '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"navigate","arguments":{"url":"http://localhost:9991/perception.html"}}}' >/tmp/oc-nav-B.json
TAB=$(jq -r '.result.content[0].text | fromjson | .tabId' /tmp/oc-nav-B.json)
RESP=$(mcp "$(jq -nc --arg tab "$TAB" '{jsonrpc:"2.0",id:2,method:"tools/call",params:{name:"vision_find",arguments:{tabId:$tab,format:"snapshot",includeImage:false}}}')")
echo "$RESP" | tee scripts/verify/omniparser-adoption-B-http-provider/omniparser-success.json
BODY=$(echo "$RESP" | jq -r '.result.content[0].text | fromjson? // .result.content[0].text')
echo "$BODY" | jq -e '.provider == "omniparser-http" and any(.elements[]; .source == "omniparser-http" and (.label | test("Continue|Search")))' >/dev/null
Pass: snapshot is provider-marked and includes mock OmniParser elements.
Scenario 2 — parser timeout falls back without crashing
curl -s -X POST http://127.0.0.1:9901/mode/timeout >/dev/null
RESP2=$(mcp "$(jq -nc --arg tab "$TAB" '{jsonrpc:"2.0",id:3,method:"tools/call",params:{name:"vision_find",arguments:{tabId:$tab,format:"snapshot",includeImage:false}}}')")
echo "$RESP2" | tee scripts/verify/omniparser-adoption-B-http-provider/omniparser-timeout-fallback.json
BODY2=$(echo "$RESP2" | jq -r '.result.content[0].text | fromjson? // .result.content[0].text')
echo "$BODY2" | jq -e '(.warnings | length) >= 1 and (.provider == "dom-annotator" or any(.elements[]; .source == "dom-annotator"))' >/dev/null
Pass: timeout is visible in warnings and OpenChrome returns a fallback result instead of a server failure.
Scenario 3 — provider is opt-in
kill $OC_PID
wait $OC_PID 2>/dev/null || true
PORT=9893
node dist/index.js --http "$PORT" > /tmp/openchrome-omniparser-provider-default.log 2>&1 &
OC_PID=$!
sleep 1
RESP3=$(curl -s -H 'content-type: application/json' -d "$(jq -nc --arg tab "$TAB" '{jsonrpc:"2.0",id:4,method:"tools/call",params:{name:"vision_find",arguments:{tabId:$tab,format:"snapshot",includeImage:false}}}')" "http://localhost:$PORT/mcp")
echo "$RESP3" | tee scripts/verify/omniparser-adoption-B-http-provider/default-provider.json
! grep -q "omniparser-http" scripts/verify/omniparser-adoption-B-http-provider/default-provider.json
Pass: without env/config, the default provider is not OmniParser.
Cleanup
kill $MOCK_PID $FIX_PID $OC_PID
wait $MOCK_PID $FIX_PID $OC_PID 2>/dev/null || true
Directionality / Fit Check
This adds only an adapter boundary. It intentionally avoids making OmniParser a transitive dependency or changing OpenChrome's default lightweight browser harness behavior.
Curated scope, overlap handling, and verification checklist
Scope classification
Overlap and conflict resolution
Implementation checklist
Success criteria
Post-merge OpenChrome live verification checklist
Tier:
pilot/ optional integration (external service only; core stays dependency-light)PR target:
developSeries: OmniParser adoption B
Priority: P2 — useful for canvas/native-like UIs after the perception snapshot contract exists
Depends on: OmniParser adoption A (
PerceptionSnapshotprovider contract)Related / Sequencing
PerceptionSnapshotcontract.Background
OmniParser's strongest transferable capability is screenshot-only UI grounding: given a screenshot, it returns structured text/icon elements with bounding boxes and captions. OpenChrome should support that capability as an optional provider, not as a bundled dependency.
This issue adds an OmniParser-compatible HTTP adapter that talks to an already-running parser service. The adapter must be safe-by-default: disabled unless configured, bounded by timeout/budget, and never required for normal OpenChrome operation.
Proposed Implementation
Add an optional perception provider:
src/vision/providers/omniparser-http-provider.tsOPENCHROME_VISION_PROVIDER=dom|omniparser-httpOPENCHROME_OMNIPARSER_URL=http://127.0.0.1:8000/parse/OPENCHROME_OMNIPARSER_TIMEOUT_MS=3000default or lower if global deadline is tighterOPENCHROME_OMNIPARSER_MAX_ELEMENTS=200vision_find(format='snapshot'|'both')Adapter behavior
{ "base64_image": "..." }to the configured OmniParser-compatible endpoint.parsed_content_list[]som_image_base64latencybboxratio coordinates toPerceptionElement.bboxCSS pixels andbboxRatio.text->text,interactive=falseunless response says otherwiseicon->iconorcontrolwhen interactivity is true/knownFallback policy
dom.omniparser-httpis configured and fails,vision_findshould return a clear warning and optionally include DOM fallback results.Non-goals
Acceptance Criteria
OPENCHROME_OMNIPARSER_TIMEOUT_MS.parsed_content_listentries are converted into validPerceptionSnapshotelements.warnings/metadata.package.json.npm run build && npm test -- --runInBand omniparser visionpass, plus fullnpm run build && npm test && npm run lint:tierbefore PR completion.Verification (post-merge, via OpenChrome MCP)
Record artifacts under
scripts/verify/omniparser-adoption-B-http-provider/.Setup
Mock parser requirements:
/parse/returns two elements:Search iconandContinue button, with ratio bboxes.Scenario 1 — configured provider returns OmniParser-sourced elements
Pass: snapshot is provider-marked and includes mock OmniParser elements.
Scenario 2 — parser timeout falls back without crashing
Pass: timeout is visible in warnings and OpenChrome returns a fallback result instead of a server failure.
Scenario 3 — provider is opt-in
Pass: without env/config, the default provider is not OmniParser.
Cleanup
Directionality / Fit Check
This adds only an adapter boundary. It intentionally avoids making OmniParser a transitive dependency or changing OpenChrome's default lightweight browser harness behavior.
Curated scope, overlap handling, and verification checklist
Scope classification
PerceptionSnapshotrequests to an OmniParser-compatible HTTP service.feat/1053-omniparser-http). Use that PR as the implementation vehicle.Overlap and conflict resolution
Implementation checklist
PerceptionSnapshotelements.Success criteria
PerceptionSnapshotcontract from feat(vision): provider-neutral perception snapshots for vision_find (OmniParser adoption A) #1052.Post-merge OpenChrome live verification checklist