feat(core): bounded query debug trace for extraction and element matching

## Summary

Add a lightweight query/debug trace surface for structured extraction and context-aware element resolution so agents and developers can see why a query matched, missed, or selected a particular candidate.

This should be a local observability feature, not a cloud debugger.

## Why

AgentQL's debugger/playground experience is valuable because query failures are easier to fix when the user can inspect the interpreted query, candidates, and result. OpenChrome already emphasizes harness guidance and observability; adding a small debug trace can reduce ambiguous failures and repeated probing.

Expected impact:

- Faster diagnosis of extraction/resolver failures.
- Better benchmark artifacts for semantic extraction work.
- Less token waste from repeated `read_page` calls just to understand why a match failed.

## Scope

Add a bounded per-session/tab debug record for the most recent structured query or context-aware element resolution.

Required v1 surface:

- MCP tool: `oc_query_debug` with `{ tabId, kind?: "extract" | "element" }`.
- Optional resource mirrors can be added in a later issue; do not block this issue on a resource implementation.

Debug payload should include:

- `kind`: `extract` or `element`,
- normalized query/instruction/context,
- parsed query AST or schema summary,
- mode used (`fast` / `standard`) when applicable,
- strategy attempts and durations,
- fields found/missing for extraction,
- top N element candidates with capped metadata and scores,
- rejection reasons where available,
- output sizes and truncation flags.

Safety constraints:

- Store only last N records per session/tab with small defaults.
- Redact/sanitize text using existing content sanitization/redaction utilities where appropriate.
- Do not include full page HTML/DOM by default.
- Debug surface must be opt-in to call; normal tool responses should stay compact.

## Non-goals

- Browser extension debugger UI.
- Persistent long-term trace storage.
- Recording credentials, full DOM snapshots, screenshots, or full network bodies.
- Replacing existing logs/metrics/recording subsystems.

## Implementation milestones

1. Add a bounded in-memory debug store keyed by session/tab/kind with explicit eviction.
2. Add `oc_query_debug` tool and tests for empty, extraction, and element-debug records.
3. Instrument extraction first; instrument element resolution when the context-aware resolver work is present.
4. Apply sanitization/redaction and size caps before storing or returning records.
5. Add real OpenChrome evidence with representative capped debug JSON.

## Acceptance criteria

- [ ] A caller can retrieve the latest extraction debug record after `extract_data` query/schema mode.
- [ ] A caller can retrieve the latest element-resolution debug record after a context-aware resolver attempt, if that resolver work is implemented.
- [ ] Debug records are size-capped and never include an uncapped full DOM/HTML dump.
- [ ] Debug records include enough information to distinguish parser failure, no candidates, low confidence/tie, and strategy timeout.
- [ ] Normal `extract_data` / `act` output remains unchanged unless a debug flag is explicitly requested.
- [ ] Debug storage has bounded memory behavior and is cleared on session/tab cleanup.

## Required tests

- Unit tests for record creation, cap enforcement, and eviction.
- Sanitization/redaction test with suspicious instruction-like page text and credential-like values.
- Handler/resource test that retrieves latest debug after a simulated extraction.
- Negative test proving no debug record returns for unknown tab/session.

## Real OpenChrome verification after implementation

Use a local fixture with repeated buttons and partial structured data.

Real verification steps:

1. Start OpenChrome through the real MCP harness.
2. Navigate to the fixture.
3. Run an `extract_data` query that intentionally asks for one missing field.
4. Retrieve `oc_query_debug` or the equivalent resource.
5. Verify debug payload includes:
   - parsed fields,
   - mode used,
   - found/missing fields,
   - strategy attempts,
   - capped output size.
6. Run a context-aware `act` scenario if implemented.
7. Retrieve element debug and verify top candidates include scores and context snippets.

The PR must include representative debug JSON with sensitive values absent/redacted.

## Repo-fit check

This supports OpenChrome's observability and harness-guidance direction. It should stay lightweight and bounded so it does not become a parallel tracing system or token-heavy default response.

## AgentQL comparison review hardening (2026-05-12)

The debug surface must support merge-time verification without becoming a default token-heavy response.

Additional requirements:

- `debug` / `oc_query_debug` output must include accepted candidate, rejected candidate top N, and rejection reasons when available: role mismatch, hidden/disabled, container mismatch, low confidence, strategy timeout.
- `captureEvidence` or equivalent must be explicit opt-in and should integrate with `oc_evidence_bundle` rather than inventing a separate persistent artifact format.
- Debug output must redact password/token-like values and cap text snippets.
- Normal `debug:false` tool responses must stay compact and must not include rejected candidate lists.

Additional real OpenChrome verification:

- Run ambiguous fixture with debug off and debug on; record output size difference.
- Verify debug on explains why header/footer/hidden candidates were rejected.
- Verify password fixture does not expose input values in debug JSON or evidence metadata.

Additional dependency boundary:

- Do not add AgentQL, Playwright, a browser extension, or any cloud debugger dependency for this issue. The debug trace must be produced from OpenChrome's existing local resolver/extraction/evidence data.



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** bounded query/matching observability.
- **Primary deliverable:** `oc_query_debug` or equivalent bounded debug trace for recent extraction and element-resolution decisions.
- **Open PR:** #1115 (`feat/992-query-debug`). Continue there; do not create duplicate implementation work.
- **Non-goal:** cloud debugger, unbounded candidate dumps, changing resolver decisions solely for debug, or exposing secrets in traces.

### Overlap and conflict resolution
- [ ] Coordinate with #991: resolver scoring can emit trace details, but this issue owns the debug surface and bounds.
- [ ] Keep extraction and element-matching traces local/session-scoped; do not add remote telemetry.
- [ ] Use token metrics (#990) if available but do not require it for v1.

### Implementation checklist
- [ ] Add bounded per-session/tab debug record for most recent structured query or context-aware resolution.
- [ ] Expose interpreted query, candidate count, top candidates/reasons, selected/missed reason, filters applied, and truncation status.
- [ ] Redact sensitive text/attributes and enforce max candidates/bytes.
- [ ] Add tests for successful match, miss, ambiguous candidates, truncation, redaction, and trace expiration/clear behavior.
- [ ] Document how developers should call the debug surface after a failed query.

### Success criteria
- [ ] Developers can see why a query matched or missed without repeating large `read_page` calls.
- [ ] Debug traces are bounded, local, and redacted.
- [ ] Enabling debug does not change selected element/result unless explicitly documented for tracing.
- [ ] Missing trace returns a clear no-data response.

### Post-merge OpenChrome live verification checklist
- [ ] Run a local extraction/resolver query that succeeds and verify debug trace shows selected candidate reasons.
- [ ] Run an ambiguous or failed query and verify missed/ambiguous reasons are visible.
- [ ] Use a fixture containing sensitive-looking values and verify redaction/truncation.
- [ ] Record debug JSON excerpts and byte/candidate bounds in merge notes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): bounded query debug trace for extraction and element matching #992

Summary

Why

Scope

Non-goals

Implementation milestones

Acceptance criteria

Required tests

Real OpenChrome verification after implementation

Repo-fit check

AgentQL comparison review hardening (2026-05-12)

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

Success criteria

Post-merge OpenChrome live verification checklist

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat(core): bounded query debug trace for extraction and element matching #992

Description

Summary

Why

Scope

Non-goals

Implementation milestones

Acceptance criteria

Required tests

Real OpenChrome verification after implementation

Repo-fit check

AgentQL comparison review hardening (2026-05-12)

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

Success criteria

Post-merge OpenChrome live verification checklist

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions