Skip to content

feat(core): bounded query debug trace for extraction and element matching #992

@shaun0927

Description

@shaun0927

Summary

Add a lightweight query/debug trace surface for structured extraction and context-aware element resolution so agents and developers can see why a query matched, missed, or selected a particular candidate.

This should be a local observability feature, not a cloud debugger.

Why

AgentQL's debugger/playground experience is valuable because query failures are easier to fix when the user can inspect the interpreted query, candidates, and result. OpenChrome already emphasizes harness guidance and observability; adding a small debug trace can reduce ambiguous failures and repeated probing.

Expected impact:

  • Faster diagnosis of extraction/resolver failures.
  • Better benchmark artifacts for semantic extraction work.
  • Less token waste from repeated read_page calls just to understand why a match failed.

Scope

Add a bounded per-session/tab debug record for the most recent structured query or context-aware element resolution.

Required v1 surface:

  • MCP tool: oc_query_debug with { tabId, kind?: "extract" | "element" }.
  • Optional resource mirrors can be added in a later issue; do not block this issue on a resource implementation.

Debug payload should include:

  • kind: extract or element,
  • normalized query/instruction/context,
  • parsed query AST or schema summary,
  • mode used (fast / standard) when applicable,
  • strategy attempts and durations,
  • fields found/missing for extraction,
  • top N element candidates with capped metadata and scores,
  • rejection reasons where available,
  • output sizes and truncation flags.

Safety constraints:

  • Store only last N records per session/tab with small defaults.
  • Redact/sanitize text using existing content sanitization/redaction utilities where appropriate.
  • Do not include full page HTML/DOM by default.
  • Debug surface must be opt-in to call; normal tool responses should stay compact.

Non-goals

  • Browser extension debugger UI.
  • Persistent long-term trace storage.
  • Recording credentials, full DOM snapshots, screenshots, or full network bodies.
  • Replacing existing logs/metrics/recording subsystems.

Implementation milestones

  1. Add a bounded in-memory debug store keyed by session/tab/kind with explicit eviction.
  2. Add oc_query_debug tool and tests for empty, extraction, and element-debug records.
  3. Instrument extraction first; instrument element resolution when the context-aware resolver work is present.
  4. Apply sanitization/redaction and size caps before storing or returning records.
  5. Add real OpenChrome evidence with representative capped debug JSON.

Acceptance criteria

  • A caller can retrieve the latest extraction debug record after extract_data query/schema mode.
  • A caller can retrieve the latest element-resolution debug record after a context-aware resolver attempt, if that resolver work is implemented.
  • Debug records are size-capped and never include an uncapped full DOM/HTML dump.
  • Debug records include enough information to distinguish parser failure, no candidates, low confidence/tie, and strategy timeout.
  • Normal extract_data / act output remains unchanged unless a debug flag is explicitly requested.
  • Debug storage has bounded memory behavior and is cleared on session/tab cleanup.

Required tests

  • Unit tests for record creation, cap enforcement, and eviction.
  • Sanitization/redaction test with suspicious instruction-like page text and credential-like values.
  • Handler/resource test that retrieves latest debug after a simulated extraction.
  • Negative test proving no debug record returns for unknown tab/session.

Real OpenChrome verification after implementation

Use a local fixture with repeated buttons and partial structured data.

Real verification steps:

  1. Start OpenChrome through the real MCP harness.
  2. Navigate to the fixture.
  3. Run an extract_data query that intentionally asks for one missing field.
  4. Retrieve oc_query_debug or the equivalent resource.
  5. Verify debug payload includes:
    • parsed fields,
    • mode used,
    • found/missing fields,
    • strategy attempts,
    • capped output size.
  6. Run a context-aware act scenario if implemented.
  7. Retrieve element debug and verify top candidates include scores and context snippets.

The PR must include representative debug JSON with sensitive values absent/redacted.

Repo-fit check

This supports OpenChrome's observability and harness-guidance direction. It should stay lightweight and bounded so it does not become a parallel tracing system or token-heavy default response.

AgentQL comparison review hardening (2026-05-12)

The debug surface must support merge-time verification without becoming a default token-heavy response.

Additional requirements:

  • debug / oc_query_debug output must include accepted candidate, rejected candidate top N, and rejection reasons when available: role mismatch, hidden/disabled, container mismatch, low confidence, strategy timeout.
  • captureEvidence or equivalent must be explicit opt-in and should integrate with oc_evidence_bundle rather than inventing a separate persistent artifact format.
  • Debug output must redact password/token-like values and cap text snippets.
  • Normal debug:false tool responses must stay compact and must not include rejected candidate lists.

Additional real OpenChrome verification:

  • Run ambiguous fixture with debug off and debug on; record output size difference.
  • Verify debug on explains why header/footer/hidden candidates were rejected.
  • Verify password fixture does not expose input values in debug JSON or evidence metadata.

Additional dependency boundary:

  • Do not add AgentQL, Playwright, a browser extension, or any cloud debugger dependency for this issue. The debug trace must be produced from OpenChrome's existing local resolver/extraction/evidence data.

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: bounded query/matching observability.
  • Primary deliverable: oc_query_debug or equivalent bounded debug trace for recent extraction and element-resolution decisions.
  • Open PR: feat(core): bounded query debug trace (#992) #1115 (feat/992-query-debug). Continue there; do not create duplicate implementation work.
  • Non-goal: cloud debugger, unbounded candidate dumps, changing resolver decisions solely for debug, or exposing secrets in traces.

Overlap and conflict resolution

Implementation checklist

  • Add bounded per-session/tab debug record for most recent structured query or context-aware resolution.
  • Expose interpreted query, candidate count, top candidates/reasons, selected/missed reason, filters applied, and truncation status.
  • Redact sensitive text/attributes and enforce max candidates/bytes.
  • Add tests for successful match, miss, ambiguous candidates, truncation, redaction, and trace expiration/clear behavior.
  • Document how developers should call the debug surface after a failed query.

Success criteria

  • Developers can see why a query matched or missed without repeating large read_page calls.
  • Debug traces are bounded, local, and redacted.
  • Enabling debug does not change selected element/result unless explicitly documented for tracing.
  • Missing trace returns a clear no-data response.

Post-merge OpenChrome live verification checklist

  • Run a local extraction/resolver query that succeeds and verify debug trace shows selected candidate reasons.
  • Run an ambiguous or failed query and verify missed/ambiguous reasons are visible.
  • Use a fixture containing sensitive-looking values and verify redaction/truncation.
  • Record debug JSON excerpts and byte/candidate bounds in merge notes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions