You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add a lightweight query/debug trace surface for structured extraction and context-aware element resolution so agents and developers can see why a query matched, missed, or selected a particular candidate.
This should be a local observability feature, not a cloud debugger.
Why
AgentQL's debugger/playground experience is valuable because query failures are easier to fix when the user can inspect the interpreted query, candidates, and result. OpenChrome already emphasizes harness guidance and observability; adding a small debug trace can reduce ambiguous failures and repeated probing.
Expected impact:
Faster diagnosis of extraction/resolver failures.
Better benchmark artifacts for semantic extraction work.
Less token waste from repeated read_page calls just to understand why a match failed.
Scope
Add a bounded per-session/tab debug record for the most recent structured query or context-aware element resolution.
Add a bounded in-memory debug store keyed by session/tab/kind with explicit eviction.
Add oc_query_debug tool and tests for empty, extraction, and element-debug records.
Instrument extraction first; instrument element resolution when the context-aware resolver work is present.
Apply sanitization/redaction and size caps before storing or returning records.
Add real OpenChrome evidence with representative capped debug JSON.
Acceptance criteria
A caller can retrieve the latest extraction debug record after extract_data query/schema mode.
A caller can retrieve the latest element-resolution debug record after a context-aware resolver attempt, if that resolver work is implemented.
Debug records are size-capped and never include an uncapped full DOM/HTML dump.
Debug records include enough information to distinguish parser failure, no candidates, low confidence/tie, and strategy timeout.
Normal extract_data / act output remains unchanged unless a debug flag is explicitly requested.
Debug storage has bounded memory behavior and is cleared on session/tab cleanup.
Required tests
Unit tests for record creation, cap enforcement, and eviction.
Sanitization/redaction test with suspicious instruction-like page text and credential-like values.
Handler/resource test that retrieves latest debug after a simulated extraction.
Negative test proving no debug record returns for unknown tab/session.
Real OpenChrome verification after implementation
Use a local fixture with repeated buttons and partial structured data.
Real verification steps:
Start OpenChrome through the real MCP harness.
Navigate to the fixture.
Run an extract_data query that intentionally asks for one missing field.
Retrieve oc_query_debug or the equivalent resource.
Verify debug payload includes:
parsed fields,
mode used,
found/missing fields,
strategy attempts,
capped output size.
Run a context-aware act scenario if implemented.
Retrieve element debug and verify top candidates include scores and context snippets.
The PR must include representative debug JSON with sensitive values absent/redacted.
Repo-fit check
This supports OpenChrome's observability and harness-guidance direction. It should stay lightweight and bounded so it does not become a parallel tracing system or token-heavy default response.
AgentQL comparison review hardening (2026-05-12)
The debug surface must support merge-time verification without becoming a default token-heavy response.
Additional requirements:
debug / oc_query_debug output must include accepted candidate, rejected candidate top N, and rejection reasons when available: role mismatch, hidden/disabled, container mismatch, low confidence, strategy timeout.
captureEvidence or equivalent must be explicit opt-in and should integrate with oc_evidence_bundle rather than inventing a separate persistent artifact format.
Debug output must redact password/token-like values and cap text snippets.
Normal debug:false tool responses must stay compact and must not include rejected candidate lists.
Additional real OpenChrome verification:
Run ambiguous fixture with debug off and debug on; record output size difference.
Verify debug on explains why header/footer/hidden candidates were rejected.
Verify password fixture does not expose input values in debug JSON or evidence metadata.
Additional dependency boundary:
Do not add AgentQL, Playwright, a browser extension, or any cloud debugger dependency for this issue. The debug trace must be produced from OpenChrome's existing local resolver/extraction/evidence data.
Curated scope, overlap handling, and verification checklist
Summary
Add a lightweight query/debug trace surface for structured extraction and context-aware element resolution so agents and developers can see why a query matched, missed, or selected a particular candidate.
This should be a local observability feature, not a cloud debugger.
Why
AgentQL's debugger/playground experience is valuable because query failures are easier to fix when the user can inspect the interpreted query, candidates, and result. OpenChrome already emphasizes harness guidance and observability; adding a small debug trace can reduce ambiguous failures and repeated probing.
Expected impact:
read_pagecalls just to understand why a match failed.Scope
Add a bounded per-session/tab debug record for the most recent structured query or context-aware element resolution.
Required v1 surface:
oc_query_debugwith{ tabId, kind?: "extract" | "element" }.Debug payload should include:
kind:extractorelement,fast/standard) when applicable,Safety constraints:
Non-goals
Implementation milestones
oc_query_debugtool and tests for empty, extraction, and element-debug records.Acceptance criteria
extract_dataquery/schema mode.extract_data/actoutput remains unchanged unless a debug flag is explicitly requested.Required tests
Real OpenChrome verification after implementation
Use a local fixture with repeated buttons and partial structured data.
Real verification steps:
extract_dataquery that intentionally asks for one missing field.oc_query_debugor the equivalent resource.actscenario if implemented.The PR must include representative debug JSON with sensitive values absent/redacted.
Repo-fit check
This supports OpenChrome's observability and harness-guidance direction. It should stay lightweight and bounded so it does not become a parallel tracing system or token-heavy default response.
AgentQL comparison review hardening (2026-05-12)
The debug surface must support merge-time verification without becoming a default token-heavy response.
Additional requirements:
debug/oc_query_debugoutput must include accepted candidate, rejected candidate top N, and rejection reasons when available: role mismatch, hidden/disabled, container mismatch, low confidence, strategy timeout.captureEvidenceor equivalent must be explicit opt-in and should integrate withoc_evidence_bundlerather than inventing a separate persistent artifact format.debug:falsetool responses must stay compact and must not include rejected candidate lists.Additional real OpenChrome verification:
Additional dependency boundary:
Curated scope, overlap handling, and verification checklist
Scope classification
oc_query_debugor equivalent bounded debug trace for recent extraction and element-resolution decisions.feat/992-query-debug). Continue there; do not create duplicate implementation work.Overlap and conflict resolution
Implementation checklist
Success criteria
read_pagecalls.Post-merge OpenChrome live verification checklist