feat(dom): enhanced actionable-element annotations and safe occlusion filtering (browser-use adoption G3) #977

@shaun0927

Background

browser-use invests heavily in actionable-element quality: JS click listeners, AX roles/properties, label/span wrappers, paint order, and bounding-box filtering. OpenChrome already has compact DOM serialization, but its interactiveness detection is still mostly tag/role/attribute based. Improving element annotations should reduce wrong clicks and unnecessary retry/token loops.

Related existing work: #866 (oc_observe), #853 (occlusion/iframe/tiling for vision_find), #831 (canonical read_page refs), #828 (unify refs). This issue is limited to safe annotation and optional filtering in the DOM/read pipeline.

Goal

Improve actionable element quality by adding enhanced clickable and occlusion annotations before considering aggressive filtering.

Scope

Add an elementQuality or equivalent metadata layer for DOM/read outputs:

{
  "ref": "@e12",
  "tag": "div",
  "role": "button",
  "text": "Submit",
  "interactiveConfidence": "high",
  "signals": ["role:button", "js-click-listener", "visible-bounds"],
  "occlusion": "none|partial|covered|unknown",
  "filterDecision": "emit|annotate-only|suppress"
}
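One way to pin this shape down is a typed record plus a runtime guard. The sketch below is a minimal illustration, not OpenChrome's actual types: the names `ElementQuality` and `isElementQuality` are hypothetical, the pipe-separated strings in the sample are read as enumerations of allowed values, and the `interactiveConfidence` values are taken from the self-review clarifications later in this issue.

```typescript
// Hypothetical type names; only the field names and allowed values come from the issue text.
type InteractiveConfidence = "high" | "medium" | "low" | "unknown";
type Occlusion = "none" | "partial" | "covered" | "unknown";
type FilterDecision = "emit" | "annotate-only" | "suppress";

interface ElementQuality {
  ref: string;
  tag: string;
  role?: string;
  text?: string;
  interactiveConfidence: InteractiveConfidence;
  signals: string[];
  occlusion: Occlusion;
  filterDecision: FilterDecision;
}

const CONFIDENCE_VALUES: readonly string[] = ["high", "medium", "low", "unknown"];
const OCCLUSION_VALUES: readonly string[] = ["none", "partial", "covered", "unknown"];
const DECISION_VALUES: readonly string[] = ["emit", "annotate-only", "suppress"];

// True when `value` is a string drawn from the allowed list.
function isOneOf(value: unknown, allowed: readonly string[]): boolean {
  return typeof value === "string" && allowed.indexOf(value) !== -1;
}

// Runtime guard: rejects free-form confidence, occlusion, or decision strings.
function isElementQuality(value: unknown): value is ElementQuality {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const signals = v["signals"];
  return (
    typeof v["ref"] === "string" &&
    typeof v["tag"] === "string" &&
    isOneOf(v["interactiveConfidence"], CONFIDENCE_VALUES) &&
    isOneOf(v["occlusion"], OCCLUSION_VALUES) &&
    isOneOf(v["filterDecision"], DECISION_VALUES) &&
    Array.isArray(signals) &&
    signals.every((s) => typeof s === "string")
  );
}
```

A guard like this could run in tests against serialized output so that an unexpected value such as "very-high" fails fast instead of reaching an MCP client.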

Required behavior:

  • Start with annotate mode as the safe default for new metadata.
  • Add optional elementFiltering=safe|aggressive|off only after annotation tests pass.
  • Detect at least:
    • JS click listener or equivalent browser-side signal where feasible
    • AX interactive roles/properties where available
    • label/span wrappers around form controls
    • obviously covered/hidden elements via paint-order or bounding-box evidence where feasible
  • Never suppress elements with uncertain evidence in safe mode.
  • Preserve existing refs unless a mode explicitly asks for filtering.
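The bounding-box part of the occlusion requirement and the safe-mode rule can be sketched as two small pure functions. This is an illustration under stated assumptions, not the intended implementation: `Rect`, `classifyOcclusion`, and `maySuppressInSafeMode` are hypothetical names, the caller is assumed to supply only opaque rectangles already known to paint above the target, and the policy "suppress only when covered and no interactive signal is present" is one conservative reading of the requirements above.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

type Occlusion = "none" | "partial" | "covered" | "unknown";

// Area of the intersection of two axis-aligned rectangles (0 if they do not overlap).
function intersectionArea(a: Rect, b: Rect): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// target: element bounds, or null when bounds could not be read
//         (assumed to happen for some iframe/shadow cases).
// above:  bounds of opaque elements assumed to paint above the target.
function classifyOcclusion(target: Rect | null, above: Rect[]): Occlusion {
  // Missing or degenerate bounds are treated as uncertain rather than hidden.
  if (target === null || target.width <= 0 || target.height <= 0) return "unknown";
  const area = target.width * target.height;
  // Only the single largest overlap is considered. Several overlays that
  // jointly cover the target are reported as "partial", which errs toward
  // keeping the element.
  let largest = 0;
  for (const r of above) largest = Math.max(largest, intersectionArea(target, r));
  if (largest === 0) return "none";
  return largest >= area ? "covered" : "partial";
}

// Assumed safe-mode policy: uncertain or interactive elements are never suppressed.
function maySuppressInSafeMode(occlusion: Occlusion, hasInteractiveSignal: boolean): boolean {
  return occlusion === "covered" && !hasInteractiveSignal;
}
```

Keeping the classification separate from the suppression policy means annotate mode can ship the `occlusion` value on its own, with filtering layered on later.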

Non-goals

  • Do not require screenshot/vision to classify every element.
  • Do not make aggressive filtering the default.
  • Do not break canonical ref behavior.

Implementation notes

Success criteria

  • Existing read_page output remains compatible by default.
  • New annotation mode identifies non-native clickable controls that the current tag/role path would under-describe.
  • Safe filtering never suppresses uncertain or interactive descendants.
  • Tests cover nested label/span form controls, role-based controls, covered elements, and unknown iframe/shadow cases.

Real OpenChrome validation after implementation

Use a local fixture plus one real-world page.

  1. Create a temporary local HTML fixture with:
    • <div role="button">
    • <span><input type="checkbox"></span> inside a label-like wrapper
    • an element covered by an overlay
    • a normal link and button
  2. Serve it locally (the fixture server blocks, so run it in its own terminal):
    python3 -m http.server 8765 --directory /path/to/fixture-dir
    npm run build
    node dist/cli/index.js serve
  3. In an MCP client:
    • navigate to http://127.0.0.1:8765/fixture.html
    • read_page or inspect with enhanced annotation enabled
    • click the annotated role button and wrapped checkbox using OpenChrome refs
    • verify the covered element is annotated as covered/partial/unknown and is not the top recommended click target in safe mode
  4. Also validate on https://github.com/browser-use/browser-use:
    • read_page with annotation enabled
    • confirm main repo controls are still present and clickable refs work
  5. Pass condition:
    • Fixture interactions succeed using emitted refs.
    • Safe mode does not hide native button/link refs.
    • No increase in default output size unless annotation mode is requested.

Review checklist

  • Conservative default; no aggressive suppression by default.
  • Ref stability preserved.
  • Shadow/iframe uncertainty handled explicitly.
  • Improves action reliability rather than adding decorative metadata.

Self-review clarifications (added before implementation)

  • First mergeable slice should ship annotate mode only unless suppression has dedicated regression coverage. Filtering can be enabled in a follow-up PR.
  • interactiveConfidence allowed values are high | medium | low | unknown; do not invent free-form confidence strings.
  • occlusion allowed values are none | partial | covered | unknown.
  • Any element with unknown occlusion and any positive interactive signal must still be emitted in safe mode.
  • If JS click listener detection is not available through the chosen CDP path, the implementation must explicitly mark that signal as unknown, not silently omit the category.
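The last clarification, reporting an unavailable signal as unknown rather than dropping it, might look like the sketch below. The tri-state `SignalStatus`, the `Probes` shape, the `:unknown` suffix, and the `ax-interactive-role` and `label-wrapper` names are assumptions made for illustration; only `js-click-listener` appears in the sample payload above.

```typescript
type SignalStatus = "present" | "absent" | "unknown";

// Each probe is true/false when detection ran, or undefined when the
// detection path was not available (for example, no usable CDP call).
interface Probes {
  jsClickListener?: boolean;
  axInteractiveRole?: boolean;
  labelWrapper?: boolean;
}

function toSignalStatus(probe: boolean | undefined): SignalStatus {
  if (probe === undefined) return "unknown";
  return probe ? "present" : "absent";
}

// Builds the `signals` array: present signals are listed by name,
// unknown ones are listed with an explicit ":unknown" suffix, and
// absent ones are omitted.
function signalStrings(probes: Probes): string[] {
  const entries: Array<[string, boolean | undefined]> = [
    ["js-click-listener", probes.jsClickListener],
    ["ax-interactive-role", probes.axInteractiveRole],
    ["label-wrapper", probes.labelWrapper],
  ];
  const out: string[] = [];
  for (const [name, probe] of entries) {
    const status = toSignalStatus(probe);
    if (status === "present") out.push(name);
    else if (status === "unknown") out.push(`${name}:unknown`);
  }
  return out;
}
```

With this shape a consumer can tell "no listener found" (category omitted) apart from "listener detection did not run" (category marked unknown).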

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: interaction/action reliability.
  • Primary deliverable: enhanced actionable-element annotations and safe occlusion filtering (browser-use adoption G3).
  • Open PR: none currently linked in the active priority map; verify GitHub again before implementation.
  • Detected labels: enhancement, P1, performance, adversarial-robust.
  • Affected OpenChrome surfaces from issue text: read_page, vision_find, act, interact, find, navigate, oc_observe.
  • Non-goal: breaking existing tool response compatibility, changing defaults without opt-in, or adding server-side autonomous planning.

Overlap and conflict resolution

Implementation checklist

  • Restate the exact contract for enhanced actionable-element annotations and safe occlusion filtering (browser-use adoption G3) in code/docs before changing behavior.
  • Implement the narrow surface named by this issue before broadening to adjacent systems.
  • Preserve existing behavior by default and gate new behavior with explicit config/tool arguments where appropriate.
  • Add targeted unit/integration tests for success, failure, compatibility, and bounded output.
  • Add regression coverage for the issue-specific happy path, failure path, default/disabled path, and artifact/output bounds.
  • Update user-facing docs or inline tool descriptions when hosts must choose a new flag, mode, policy, or workflow.

Success criteria

  • The implementation satisfies the primary deliverable without broadening into non-goals.
  • Existing default behavior remains backward-compatible or the issue explicitly documents the compatibility break.
  • Failure cases return bounded, actionable diagnostics rather than silent fallback or unbounded dumps.
  • Tests/benchmarks cover the concrete surface named in this issue, not only helper utilities.
  • Any produced artifact is deterministic, redacted, and small enough for merge review or stored behind handles.

Post-merge OpenChrome live verification checklist

  • Run the documented local OpenChrome fixture or smoke path for enhanced actionable-element annotations and safe occlusion filtering (browser-use adoption G3) and capture the exact command/tool calls.
  • Verify read_page behavior matches the issue goal in both the enabled path and the default/disabled compatibility path.
  • Inspect generated artifacts/logs/responses for bounded size, redaction, source links, and clear failure diagnostics.
  • Record sanitized output excerpts, artifact paths, and any benchmark/latency/payload numbers in merge verification notes.
