feat(core): add fast and standard extract_data modes (#989)#1104
Conversation
Add a bounded fast/standard mode contract to extract_data so callers can request a broader local DOM pass only when fast extraction misses fields. Constraint: Preserve OpenChrome's local-first extraction design and keep missing mode on the current fast strategy set. Rejected: Silently escalating fast extraction to standard | hides latency cost and makes benchmark evidence harder to compare. Confidence: high Scope-risk: moderate Directive: Keep standard mode bounded and explicit; future extraction strategies must report modeUsed and metrics before widening defaults. Tested: npm test -- tests/tools/extract-data-modes.test.ts --runInBand; npm run build; npm run lint:changed; npm run lint -- --quiet; npm run lint:tier; git diff --check Not-tested: Real OpenChrome MCP median-latency fixture was not run in this pass. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
…nager The 'rejects invalid mode with an actionable error before touching the browser' test asserts that an unknown mode value (e.g. 'deep') is rejected without ever calling getSessionManager(). The current code reaches getPage() first, which causes the mocked test session manager to throw "Cannot read properties of undefined" and the assertion to fail before reaching the mode-validation branch. Wire parseExtractionMode() ahead of the session-manager call so: - invalid mode → deterministic "Error: Invalid mode. Use \"fast\" or \"standard\"." - no session lookup is triggered - valid mode (default 'fast') keeps the existing path
…seline Develop's tests/tools/act.test.ts asserts PR #1098 behavior — the "instruction or steps is required" wording and `steps: [...]` inputs — but PR #1098 (act-structured-steps) has not been merged. As a result, every PR that merges develop now inherits 4 broken assertions that develop's act.ts source cannot satisfy. Since #1105's charter is "restore develop CI signal", apply the temporary unblock here: - relax the missing/empty-instruction message regex to accept the current "instruction is required" OR the post-#1098 "instruction or steps is required" wording - skip the two structured-steps assertions ("executes structured steps", "rejects empty structured steps") until #1098 lands and develop's act.ts adopts the `steps:` field Each downstream PR that previously inherited this regression (#945, #1077, #1101, #1104, …) gets a green test pass once #1105 merges, and the change reverts cleanly when #1098 lands.
|
Forthcoming review note: content review pass complete. The PR scope is narrowly contained to |
* Restore develop CI signal after contract drift Align tests with current tab context metadata, doctor check thresholds, and tool description budget so unrelated feature PRs can rely on CI again. Constraint: This is a base-branch unblocker only; avoid carrying these fixes inside feature PRs. Rejected: Duplicating CI fixes into each feature branch | it would pollute narrowly scoped Stagehand-inspired PRs. Confidence: high Scope-risk: narrow Directive: Keep future contract changes paired with tests in the same PR to avoid hidden develop breakage. Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npm run lint; npx jest --config jest.config.js --runInBand tests/tools/tabs.test.ts tests/tool-descriptions.test.ts tests/cli/doctor/checks/chrome-binary.test.ts tests/cli/doctor/checks/disk-space.test.ts tests/cli/doctor/checks/home-writable.test.ts tests/cli/doctor/checks/network-local.test.ts Not-tested: Full npm test matrix locally * fix: strip leaked conflict markers * fix(1105): add TOOL_CAPABILITIES + capability + optional annotations * test(act): relax structured-steps assertions to restore develop CI baseline Develop's tests/tools/act.test.ts asserts PR #1098 behavior — the "instruction or steps is required" wording and `steps: [...]` inputs — but PR #1098 (act-structured-steps) has not been merged. As a result, every PR that merges develop now inherits 4 broken assertions that develop's act.ts source cannot satisfy. Since #1105's charter is "restore develop CI signal", apply the temporary unblock here: - relax the missing/empty-instruction message regex to accept the current "instruction is required" OR the post-#1098 "instruction or steps is required" wording - skip the two structured-steps assertions ("executes structured steps", "rejects empty structured steps") until #1098 lands and develop's act.ts adopts the `steps:` field Each downstream PR that previously inherited this regression (#945, #1077, #1101, #1104, …) gets a green test pass once #1105 merges, and the change reverts cleanly when #1098 lands. * fix(test): restore disk-space boundary — exactly 100 MB is warn not fail FAIL_THRESHOLD_MB uses strict < so the boundary value (100 MB) falls into the warn bucket, matching the test contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): restore capability-filter — add oc_task_run_* to TOOL_CAPABILITY_MAP and snapshot 7 oc_task_run_* tools registered by registerTaskRunTools() were missing from TOOL_CAPABILITY_MAP (failing lint:tools-capabilities) and from the v1.11 baseline snapshot (failing snapshot equality test). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): restore summarizeMcpResultForJournal export from mcp-server journal.test.ts imports summarizeMcpResultForJournal from mcp-server. The export was lost in a merge; restore it so the suite loads and the hint-stripping contract is verified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): restore intent label — generateSummary appends [intent: "..."] suffix issue #894 contract: when intent is provided and non-empty, generateSummary must append [intent: "..."] to the base summary string. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): restore interact.ts — TOOL_ANNOTATIONS ref, ref fast-path, intent field Three test contracts restored: 1. tool-annotations: interact.ts now references TOOL_ANNOTATIONS.interact so the reverse-direction orphan check passes. 2. snapshot-refs: add ref param + STALE_REF fast-path so interact accepts a snapshot ref and clicks via backendDOMNodeId without AX re-resolution. 3. intent: add intent field (maxLength:120) with INVALID_INTENT validation per issue #894 schema contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): restore oc-skill-replay contract — envelope, DISABLED gate, replay_artifacts oc-skill-replay.test.ts and no-network.test.ts expect: - Tool always registered (gate moved from registerOcSkillReplayTool to handler) - DISABLED only when OPENCHROME_SKILL_REPLAY is explicitly falsy (absent = enabled) - Response envelope { ok, failure: { code } } instead of { error } - tabId optional (auto-detected from active session target) - SKILL_NOT_FOUND, ARTIFACT_MISSING, ARTIFACT_RESOLUTION_FAILED codes oc-skill-record.ts: add replay_artifacts input/output field; return null when feature explicitly disabled, array when enabled or absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(health-endpoint): poll for bind instead of fixed sleep The 'transport=both' scenario intermittently failed on slow ubuntu-20 runners because bind() landed slightly after the SelfHealing log line, so the 500 ms fixed sleep + 500 ms probe arrived during a window when the listener wasn't yet accepting connections. Replace the fixed sleep with a 10 s deadline polling loop that retries probePort every 150 ms until 'connected' (or the deadline expires). No behavior change for disabled scenarios — those still use the single-shot 500 ms probe and the same 'not connected' assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: shaun0927 <junghwan1912@gmail.com>
Develop CI restored, but this PR still has PR-specific test failuresPR #1105 has merged to Addressed in this session
Author follow-up
|
Restore the PR #1104 mode contract so fast remains the default response mode and standard mode performs the additional bounded DOM recovery pass only when the fast layered extractors leave fields missing. Constraint: PR #1104 CI was failing only in tests/tools/extract-data-modes.test.ts after the develop rebase; the fix must stay inside extract_data behavior. Rejected: Changing tests or broad extraction strategy rewrites | the feature implementation was missing response telemetry and the standard-only pass wiring. Confidence: high Scope-risk: narrow Directive: Keep invalid mode validation before any browser/session lookup and keep fast mode free of standard DOM scans. Tested: npm run build; npm run lint:changed; npm test -- --runTestsByPath tests/tools/extract-data-modes.test.ts --runInBand Not-tested: Full local jest was run before this fix and isolated the two failing mode tests; GitHub build-and-test remains the merge gate.
Integrate the merged output-handle changes into PR #1104 so extract_data keeps both contracts: fast/standard extraction mode telemetry and standard DOM recovery from this PR, plus waitForReady and output_handle response-mode support from develop. Constraint: PR #1104 became DIRTY after #938 merged into develop and GitHub requires the branch to be mergeable against the current base. Rejected: Dropping either branch's extract_data behavior | both features are independently valid and covered by targeted tests. Confidence: high Scope-risk: moderate Directive: Preserve separate extractionMode and outputMode names to avoid future mode-shadowing regressions. Tested: npm run build; npm test -- --runTestsByPath tests/tools/extract-data-modes.test.ts tests/tools/extract-data-ready.test.ts tests/core/output-handles.test.ts tests/journal/task-journal.test.ts --runInBand Not-tested: Full GitHub matrix after merge commit is pending and remains the merge gate.
|
Merge readiness review complete. This PR adds explicit Additional hardening was applied before merge:
Validation evidence:
Codex did not leave actionable code-review findings for this PR beyond the earlier CI-failure note. The PR-specific failures are resolved, the branch is cleanly mergeable into the current |
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
feat/989-extraction-modes→develop2ced91e— Make extraction effort explicit without changing the default pathOwner comment cleanup: 0 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Summary
mode: "fast" | "standard"toextract_data.modeUsed, duration, output chars, fields found/total, budget summary).standardmode with a broader bounded local DOM pass for fields that fast metadata/CSS heuristics miss.Direction / overlap review
fastnever silently escalates tostandard; callers must opt in viamode: "standard".Validation
npm test -- tests/tools/extract-data-modes.test.ts --runInBandnpm run buildnpm run lint:changednpm run lint -- --quietnpm run lint:tiergit diff --checkMerge-time real OpenChrome verification checklist
Run after merge with a local fixture containing JSON-LD title/price and visible card fields that require DOM scanning:
navigateto fixture.extract_datawith omitted mode ormode: "fast".extract_datawithmode: "standard".metrics.fast.modeUsed === "fast",standard.modeUsed === "standard", standard finds at least as many fields, and neither response includes a full DOM dump.Known gap
Closes #989.