You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Introduce an explicit mode: "fast" | "standard" policy for structured extraction paths so callers can choose between low-latency/local-heuristic extraction and deeper, more expensive page analysis.
This issue covers the mode contract and the first implementation target: extract_data query/schema extraction. It should not broaden into all browser tools in the first PR.
Why
AgentQL's fast vs standard distinction is useful because it makes the latency/accuracy tradeoff explicit. OpenChrome already follows a similar philosophy in places such as read_page AX overflow fallback being opt-in. This issue makes that pattern consistent for structured extraction.
Expected impact:
Reduce accidental heavy page traversals for simple extraction.
Give agents a deterministic way to request deeper analysis only when needed.
Make performance benchmarks easier to compare across PRs.
Scope
Add mode to extract_data with the following semantics:
fast mode
Default for query-based extraction.
Allowed work:
JSON-LD, Microdata, OpenGraph.
Bounded visible DOM scan.
Existing CSS heuristic extraction with strict node/time budgets.
No full AX traversal.
No screenshot or visual analysis.
standard mode
Opt-in when fast mode misses required fields.
Allowed additional work:
Broader DOM scan with higher but bounded node/time limits.
Parent/container context for lists.
Existing pagination/list heuristics if already available locally.
Still no outbound vendor/LLM calls.
Error/output contract
Response must include modeUsed.
If mode: "fast" misses required fields, response should suggest retrying with mode: "standard" only when evidence supports that suggestion.
Invalid mode returns an actionable error.
Non-goals
Applying mode to act, find, read_page, or vision_find in this issue.
Adding LLM-based semantic extraction.
Making standard the default.
Implementation milestones
Define shared ExtractionMode constants and per-mode time/node budgets.
Gate extraction strategies by mode without changing existing schema-call defaults.
Add modeUsed, duration, output-size, and fields-found metrics to extraction responses or benchmark artifacts.
Add tests that prove fast-only and standard-only strategy paths are distinct.
Add real OpenChrome median-latency and output-size comparison evidence.
Acceptance criteria
extract_data accepts mode: "fast" | "standard"; missing mode uses current behavior for schema calls and fast for query calls unless that would break existing tests.
modeUsed is present in successful responses.
fast mode has explicit max node and timeout budgets documented in code constants.
standard mode has explicit max node and timeout budgets documented in code constants.
The implementation has a kill-switch or safe fallback path if a mode-specific strategy fails.
Benchmark output records mode, duration, output size, and fields found.
Required tests
Unit tests for mode validation and defaulting.
Handler tests proving fast does not invoke standard-only strategy hooks.
Handler tests proving standard can recover a field intentionally omitted from fast-only metadata.
Regression test ensuring existing extract_data schema calls without mode still pass.
Real OpenChrome verification after implementation
Add a fixture and real MCP scenario with a page where:
JSON-LD contains title/price but not all visible card fields.
Some fields require visible DOM scanning.
Run OpenChrome real mode and perform:
navigate to the fixture.
extract_data with mode: "fast".
extract_data with mode: "standard".
Compare responses.
Required verification assertions:
fast.modeUsed === "fast".
standard.modeUsed === "standard".
fast completes faster than or equal to standard on median of 5 runs, or the PR explains why the fixture is too small to show a difference.
standard finds at least as many fields as fast.
Neither mode emits a full read_page-sized DOM dump.
Both modes work without external API keys.
The PR should include the real verification command and a small JSON summary with median latency and output character count.
Repo-fit check
This is a small, explicit performance contract. It avoids a broad refactor by starting with extract_data only and preserves OpenChrome's local-first, opt-in-expensive-work design.
AgentQL comparison review hardening (2026-05-12)
This issue is intentionally scoped to extraction modes. Interaction/query resolver modes for oc_query are part of #1045 and should reuse the same naming only if the behavior is documented consistently.
Additional merge-verification requirements:
The PR must publish a small benchmark artifact with 5-run median latency, p95 latency, output character count, and fields-found count for both fast and standard.
fast must never silently escalate to standard; any escalation must require an explicit option or be reported as modeUsed / escalatedFrom.
standard must remain bounded by documented node/time budgets and must settle within the configured MCP tool timeout.
If standard finds no additional data over fast, the response should explain that retrying with a broader selector or query may be needed rather than encouraging blind retries.
Curated scope, overlap handling, and verification checklist
Summary
Introduce an explicit
mode: "fast" | "standard"policy for structured extraction paths so callers can choose between low-latency/local-heuristic extraction and deeper, more expensive page analysis.This issue covers the mode contract and the first implementation target:
extract_dataquery/schema extraction. It should not broaden into all browser tools in the first PR.Why
AgentQL's
fastvsstandarddistinction is useful because it makes the latency/accuracy tradeoff explicit. OpenChrome already follows a similar philosophy in places such asread_pageAX overflow fallback being opt-in. This issue makes that pattern consistent for structured extraction.Expected impact:
Scope
Add
modetoextract_datawith the following semantics:fastmodeDefault for query-based extraction.
Allowed work:
standardmodeOpt-in when fast mode misses required fields.
Allowed additional work:
Error/output contract
modeUsed.mode: "fast"misses required fields, response should suggest retrying withmode: "standard"only when evidence supports that suggestion.Non-goals
act,find,read_page, orvision_findin this issue.standardthe default.Implementation milestones
ExtractionModeconstants and per-mode time/node budgets.modeUsed, duration, output-size, and fields-found metrics to extraction responses or benchmark artifacts.Acceptance criteria
extract_dataacceptsmode: "fast" | "standard"; missing mode uses current behavior for schema calls andfastfor query calls unless that would break existing tests.modeUsedis present in successful responses.fastmode has explicit max node and timeout budgets documented in code constants.standardmode has explicit max node and timeout budgets documented in code constants.Required tests
fastdoes not invoke standard-only strategy hooks.standardcan recover a field intentionally omitted from fast-only metadata.extract_dataschema calls without mode still pass.Real OpenChrome verification after implementation
Add a fixture and real MCP scenario with a page where:
Run OpenChrome real mode and perform:
navigateto the fixture.extract_datawithmode: "fast".extract_datawithmode: "standard".Required verification assertions:
fast.modeUsed === "fast".standard.modeUsed === "standard".fastcompletes faster than or equal tostandardon median of 5 runs, or the PR explains why the fixture is too small to show a difference.standardfinds at least as many fields asfast.read_page-sized DOM dump.The PR should include the real verification command and a small JSON summary with median latency and output character count.
Repo-fit check
This is a small, explicit performance contract. It avoids a broad refactor by starting with
extract_dataonly and preserves OpenChrome's local-first, opt-in-expensive-work design.AgentQL comparison review hardening (2026-05-12)
This issue is intentionally scoped to extraction modes. Interaction/query resolver modes for
oc_queryare part of #1045 and should reuse the same naming only if the behavior is documented consistently.Additional merge-verification requirements:
fastandstandard.fastmust never silently escalate tostandard; any escalation must require an explicit option or be reported asmodeUsed/escalatedFrom.standardmust remain bounded by documented node/time budgets and must settle within the configured MCP tool timeout.standardfinds no additional data overfast, the response should explain that retrying with a broader selector or query may be needed rather than encouraging blind retries.Curated scope, overlap handling, and verification checklist
Scope classification
mode: "fast" | "standard"support forextract_dataquery/schema paths with benchmark evidence.feat/989-extraction-modes). Continue there; do not duplicate the PR.Overlap and conflict resolution
Implementation checklist
Success criteria
extract_data.Post-merge OpenChrome live verification checklist