feat(contracts): add deterministic browser task signatures #1049

@shaun0927

Problem

OpenChrome has outcome contracts, plan execution, and hinting, but task boundaries are still spread across natural-language instructions, tool arguments, and optional success checks. This makes it harder for agents to know when to stop, whether a tool sequence has left the intended action space, or whether repeated observation calls are wandering without progress.

DSPy-style lesson to adopt: separate the task interface from the prompt/instruction wording. OpenChrome should add a deterministic, typed browser task signature that declares inputs, allowed tools, success contract, stop conditions, and loop guards.

Direction / fit

  • Tier: core for schema validation and deterministic evaluation; pilot only if later wired to policy/retry execution.
  • No LLM judgment, no DSPy dependency, no prompt optimizer.
  • Builds on existing src/contracts/** assertions rather than replacing them.
  • Should reduce wandering by making success and stop conditions explicit in machine-readable form.

Goal

Add a BrowserTaskSignature schema and validator that can be attached to compiled plans, workflow runs, or explicit tool calls to guide deterministic progress evaluation and response metadata.

Proposed implementation

  1. Define a schema, e.g. src/contracts/task-signature.ts:
    interface BrowserTaskSignature {
      version: 1;
      id: string;
      description: string;
      inputs: Record<string, { type: 'string' | 'number' | 'boolean'; required: boolean; redaction?: 'secret' | 'none' }>;
      allowedTools: string[];
      success: Assertion;
      stopWhen?: Assertion[];
      failureWhen?: Assertion[];
      loopGuards?: Array<{
        kind: 'max_same_tool' | 'max_observation_calls' | 'max_non_progress_calls';
        limit: number;
        window: number;
      }>;
      budgets?: {
        maxToolCalls?: number;
        maxWallMs?: number;
      };
    }
  2. Add a validator with batched errors, consistent with the existing contract validator style.
  3. Add a deterministic evaluator helper that consumes:
    • task signature;
    • current EvalContext;
    • recent tool call summary;
    • elapsed time/tool count.
  4. The evaluator returns a structured status and performs a preflight check before executing a signature-bound compiled plan. If a planned step uses a tool outside allowedTools, the plan must not execute and must return failure with a disallowed-tool reason:
    type TaskSignatureStatus =
      | { status: 'continue'; reasons: string[] }
      | { status: 'success'; evidence: unknown }
      | { status: 'stop'; reasons: string[] }
      | { status: 'failure'; reasons: string[] }
      | { status: 'budget_exhausted'; reasons: string[] };
  5. Integrate minimally where it creates clear value without changing default behavior:
    • execute_plan may accept an optional signature ID/object and include status in result metadata.
    • workflow_status may surface signature progress if a workflow was initialized with one.
    • Existing calls without a signature behave exactly as before.
  6. Add docs with examples for safe tasks and stop conditions.
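
The evaluator in steps 3 and 4 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: Assertion here is a hypothetical stand-in for the real type in src/contracts/**, the holds callback stands in for evaluating an assertion against the current EvalContext, and the precedence order (success, then failure, then stop, then budgets) is an assumption that the issue does not specify. Loop guards are left out of this sketch.

```typescript
// Hypothetical stand-in: the real Assertion type lives in src/contracts/**.
type Assertion = { kind: string; [key: string]: unknown };

// Subset of BrowserTaskSignature that this sketch needs.
interface SignatureLike {
  success: Assertion;
  stopWhen?: Assertion[];
  failureWhen?: Assertion[];
  budgets?: { maxToolCalls?: number; maxWallMs?: number };
}

type TaskSignatureStatus =
  | { status: 'continue'; reasons: string[] }
  | { status: 'success'; evidence: unknown }
  | { status: 'stop'; reasons: string[] }
  | { status: 'failure'; reasons: string[] }
  | { status: 'budget_exhausted'; reasons: string[] };

// Hypothetical progress summary: tool count and elapsed wall time.
interface Progress {
  toolCalls: number;
  elapsedMs: number;
}

// `holds` is an assumed callback that evaluates one assertion against the
// current EvalContext. Precedence order below is an assumption.
function evaluateSignature(
  sig: SignatureLike,
  holds: (assertion: Assertion) => boolean,
  progress: Progress,
): TaskSignatureStatus {
  if (holds(sig.success)) {
    return { status: 'success', evidence: { matched: sig.success } };
  }
  const failed = (sig.failureWhen ?? []).filter((a) => holds(a));
  if (failed.length > 0) {
    return { status: 'failure', reasons: failed.map((a) => `failureWhen matched: ${a.kind}`) };
  }
  const stopped = (sig.stopWhen ?? []).filter((a) => holds(a));
  if (stopped.length > 0) {
    return { status: 'stop', reasons: stopped.map((a) => `stopWhen matched: ${a.kind}`) };
  }
  const reasons: string[] = [];
  const { maxToolCalls, maxWallMs } = sig.budgets ?? {};
  if (maxToolCalls !== undefined && progress.toolCalls >= maxToolCalls) {
    reasons.push(`maxToolCalls reached (${progress.toolCalls}/${maxToolCalls})`);
  }
  if (maxWallMs !== undefined && progress.elapsedMs >= maxWallMs) {
    reasons.push(`maxWallMs reached (${progress.elapsedMs}/${maxWallMs})`);
  }
  if (reasons.length > 0) return { status: 'budget_exhausted', reasons };
  return { status: 'continue', reasons: [] };
}
```

Because the status depends only on the assertion results and the two counters, the same inputs always yield the same status, which is the determinism property the issue asks for.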

Acceptance criteria

  • BrowserTaskSignature schema is implemented and exported from a stable module.
  • Validator returns all structural errors at once and rejects unknown assertion kinds via existing contract validation.
  • Evaluator detects success, failure, explicit stop, max tool calls, max wall time, and loop guard violations.
  • Optional integration with execute_plan or workflow_status is backward-compatible: without a signature, behavior is unchanged and the response carries no taskSignature metadata.
  • Tests cover valid signature, invalid schema, success assertion pass, failure assertion pass, stop condition pass, budget exhaustion, loop guard violation, and redacted secret inputs.
  • Docs explain that signatures are deterministic task boundaries, not LLM prompts.
  • No outbound LLM calls, Python, or DSPy dependency is introduced.
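
A validator that "returns all structural errors at once" might look like the sketch below. The function name validateSignatureShape and the error strings are hypothetical, as is the rule that limit and window must be positive integers. Assertion bodies are only checked for being objects here, on the assumption that the existing contract validator would handle their kinds.

```typescript
// Hypothetical sketch: structural checks only. Assertion contents would be
// delegated to the existing contract validator, which is not modeled here.
const INPUT_TYPES = ['string', 'number', 'boolean'];
const GUARD_KINDS = ['max_same_tool', 'max_observation_calls', 'max_non_progress_calls'];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Collects every structural problem instead of stopping at the first one.
function validateSignatureShape(raw: unknown): string[] {
  if (!isRecord(raw)) return ['signature must be an object'];
  const errors: string[] = [];
  const { version, id, description, inputs, allowedTools, success, loopGuards } = raw;

  if (version !== 1) errors.push('version must be 1');
  if (typeof id !== 'string' || id.length === 0) errors.push('id must be a non-empty string');
  if (typeof description !== 'string') errors.push('description must be a string');

  if (!isRecord(inputs)) {
    errors.push('inputs must be an object');
  } else {
    for (const [name, spec] of Object.entries(inputs)) {
      if (!isRecord(spec)) {
        errors.push(`inputs.${name} must be an object`);
        continue;
      }
      if (!INPUT_TYPES.includes(spec.type as string)) errors.push(`inputs.${name}.type is invalid`);
      if (typeof spec.required !== 'boolean') errors.push(`inputs.${name}.required must be a boolean`);
    }
  }

  if (!Array.isArray(allowedTools) || !allowedTools.every((t) => typeof t === 'string')) {
    errors.push('allowedTools must be an array of strings');
  }
  if (!isRecord(success)) errors.push('success must be an assertion object');

  if (loopGuards !== undefined) {
    if (!Array.isArray(loopGuards)) {
      errors.push('loopGuards must be an array');
    } else {
      loopGuards.forEach((guard, i) => {
        if (!isRecord(guard) || !GUARD_KINDS.includes(guard.kind as string)) {
          errors.push(`loopGuards[${i}].kind is invalid`);
          return;
        }
        // Assumption: limit and window must both be positive integers.
        for (const field of ['limit', 'window']) {
          const n = guard[field];
          if (!Number.isInteger(n) || (n as number) < 1) {
            errors.push(`loopGuards[${i}].${field} must be a positive integer`);
          }
        }
      });
    }
  }
  return errors;
}
```

An empty array means the shape is acceptable; a caller would reject the signature before any plan step runs when the array is non-empty.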

Required OpenChrome real-validation after implementation

Use local fixtures and a built OpenChrome server.

Setup

npm ci
npm run build
node tests/fixtures/sites/task-signature/serve.mjs --port 9995 >/tmp/oc-task-signature-fixtures.log 2>&1 &
FIX_PID=$!
OPENCHROME_AUTH_MODE=disabled node dist/cli/index.js serve --http 9878 >/tmp/oc-task-signature.log 2>&1 &
OC_PID=$!
sleep 2
mcp() { curl -s -H 'content-type: application/json' -d "$1" http://localhost:9878/mcp; }

Validation A — success boundary

Create/use a fixture signature equivalent to:

{
  "version": 1,
  "id": "fixture.search.success",
  "description": "Search form reaches result state",
  "inputs": { "query": { "type": "string", "required": true, "redaction": "none" } },
  "allowedTools": ["navigate", "find", "interact", "read_page"],
  "success": { "kind": "dom_text", "selector": "#result", "contains": "Searched: cats" },
  "loopGuards": [{ "kind": "max_observation_calls", "limit": 2, "window": 4 }],
  "budgets": { "maxToolCalls": 8, "maxWallMs": 30000 }
}

Run execute_plan or the implemented signature-aware workflow against http://localhost:9995/search.html.

# Exact tool payload may differ; PR must include the final payload used.
# The result must expose taskSignature.status == "success".
jq -e '.result.content[0].text | fromjson | .taskSignature.status == "success"' /tmp/task-signature-success-response.json

Pass: signature status becomes success once the DOM assertion is true.

Validation B — loop guard catches wandering

Run a fixture sequence that performs repeated read_page/screenshot observations without progress.

jq -e '.result.content[0].text | fromjson | .taskSignature.status == "stop" and (.taskSignature.reasons | tostring | test("max_observation_calls|max_non_progress_calls"))' /tmp/task-signature-loop-response.json

Pass: loop guard returns a deterministic stop status before the global timeout.
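
One possible reading of the loop guard semantics is sketched below. The issue does not define how limit and window interact, so this assumes a guard is violated when more than limit matching calls occur within the last window tool calls. The ToolCall shape, the madeProgress flag, and the set of observation tools (read_page and screenshot, taken from this validation step) are also assumptions.

```typescript
type GuardKind = 'max_same_tool' | 'max_observation_calls' | 'max_non_progress_calls';

interface LoopGuard {
  kind: GuardKind;
  limit: number;
  window: number;
}

// Hypothetical summary of one recent tool call.
interface ToolCall {
  tool: string;
  madeProgress: boolean;
}

// Assumption: these are the tools that count as pure observation.
const OBSERVATION_TOOLS = new Set(['read_page', 'screenshot']);

// Returns one reason per violated guard; an empty array means no violation.
function violatedGuards(guards: LoopGuard[], history: ToolCall[]): string[] {
  const reasons: string[] = [];
  for (const guard of guards) {
    // Only the most recent `window` calls are considered.
    const recent = guard.window > 0 ? history.slice(-guard.window) : [];
    let count = 0;
    if (guard.kind === 'max_observation_calls') {
      count = recent.filter((c) => OBSERVATION_TOOLS.has(c.tool)).length;
    } else if (guard.kind === 'max_non_progress_calls') {
      count = recent.filter((c) => !c.madeProgress).length;
    } else {
      // max_same_tool: use the count of the most frequent tool in the window.
      const perTool = new Map<string, number>();
      for (const c of recent) perTool.set(c.tool, (perTool.get(c.tool) ?? 0) + 1);
      count = Math.max(0, ...Array.from(perTool.values()));
    }
    if (count > guard.limit) {
      reasons.push(`${guard.kind}: ${count} calls exceed limit ${guard.limit} within last ${guard.window}`);
    }
  }
  return reasons;
}
```

Under this reading, the fixture guard with limit 2 and window 4 trips on the third observation call inside any run of four calls, and the reason string carries the guard kind that the jq check above matches on.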

Validation C — allowed tool boundary

Attempt to run a plan/signature that uses a tool outside allowedTools, e.g. javascript_tool when not allowed.

jq -e '.result.content[0].text | fromjson | .taskSignature.status == "failure" and (.taskSignature.reasons | tostring | test("not allowed"; "i"))' /tmp/task-signature-tool-boundary.json

Pass: response identifies the disallowed tool and the disallowed step is not executed.
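
The preflight check behind this validation could be as small as the sketch below. PlanStep and preflightAllowedTools are hypothetical names; the real compiled plan shape is not shown in this issue. The point is that the check runs over the whole plan before any step executes.

```typescript
// Hypothetical minimal view of a compiled plan step.
interface PlanStep {
  tool: string;
}

type PreflightResult =
  | { status: 'continue'; reasons: string[] }
  | { status: 'failure'; reasons: string[] };

// Scans every planned step up front so that no step runs when any
// step uses a tool outside the declared action space.
function preflightAllowedTools(allowedTools: string[], steps: PlanStep[]): PreflightResult {
  const allowed = new Set(allowedTools);
  const reasons: string[] = [];
  steps.forEach((step, index) => {
    if (!allowed.has(step.tool)) {
      reasons.push(`step ${index}: tool "${step.tool}" is not allowed by the task signature`);
    }
  });
  return reasons.length > 0
    ? { status: 'failure', reasons }
    : { status: 'continue', reasons: [] };
}
```

A caller would execute the plan only on continue. On failure, the reasons name each disallowed tool, which is what the "not allowed" match above looks for.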

Validation D — backward compatibility

Run the same execute_plan or workflow without a task signature.

jq -e '.result.content[0].text | fromjson | has("taskSignature") | not' /tmp/no-signature-response.json

Pass: existing behavior remains unchanged when no signature is supplied.
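
One way to keep the no-signature path shape-compatible is to add the taskSignature key only when a status exists, rather than emitting it as null or undefined. The helper below is a hypothetical sketch of that idea; withSignatureMetadata and StatusLike are illustrative names, not existing OpenChrome APIs.

```typescript
// Hypothetical minimal status shape for this sketch.
interface StatusLike {
  status: string;
  reasons?: string[];
}

// Returns the original result object untouched when no status is supplied,
// so serialized output is identical to the pre-signature behavior.
function withSignatureMetadata<T extends object>(
  result: T,
  status?: StatusLike,
): T | (T & { taskSignature: StatusLike }) {
  if (status === undefined) return result;
  return { ...result, taskSignature: status };
}
```

With this shape, the has("taskSignature") check above is false for unsigned runs because the key is never created.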

Cleanup

kill $OC_PID $FIX_PID
wait $OC_PID $FIX_PID 2>/dev/null || true

Non-goals

  • Do not replace src/contracts/**.
  • Do not create an autonomous task runner.
  • Do not infer signatures from natural language in the server.
  • Do not add new irreversible-action policy beyond existing hooks; this issue only enforces the declared allowedTools boundary for signature-bound execution.

Self-review checklist for implementer

  • Is each status deterministic from tool calls and contract evaluators?
  • Does this reduce ambiguity for agents without adding server-side AI decisions?
  • Are no-signature paths byte/shape-compatible with existing clients?
  • Are secret-marked inputs excluded from logs and reports?
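
For the last checklist item, a redaction helper might look like the following sketch. The name redactInputs and the '[REDACTED]' placeholder are hypothetical. It also assumes that values without a declared input spec should be redacted too, so that the helper fails closed.

```typescript
interface InputSpec {
  type: 'string' | 'number' | 'boolean';
  required: boolean;
  redaction?: 'secret' | 'none';
}

// Produces a copy of the input values that is safe to log or report.
// Assumption: values with no declared spec are redacted as well (fail closed).
function redactInputs(
  specs: Record<string, InputSpec>,
  values: Record<string, unknown>,
): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(values)) {
    const declared = Object.prototype.hasOwnProperty.call(specs, name);
    const isSecret = !declared || specs[name].redaction === 'secret';
    safe[name] = isSecret ? '[REDACTED]' : value;
  }
  return safe;
}
```

Logging and report code would then receive only the redacted copy, never the raw input values.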

Related DSPy-inspired harness hardening set

These issues are intentionally scoped to OpenChrome-native deterministic harnessing. They must not introduce DSPy/Python runtime dependencies or server-side LLM decisions.

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: deterministic task contracts / typed task boundary metadata.
  • Primary deliverable: BrowserTaskSignature, a typed description of allowed task inputs, action space, expected evidence, and stop conditions.
  • Open PR: feat(contracts): add deterministic browser task signatures #1114 (feat/1049-task-signature). Amend that PR rather than duplicating work.
  • Non-goal: LLM prompt generation, DSPy optimization, autonomous planning, or replacing existing Outcome Contracts.

Overlap and conflict resolution

Implementation checklist

  • Define BrowserTaskSignature types/schema with explicit fields for task identity, allowed tools/actions, required evidence, success/failure boundaries, and optional progress requirements.
  • Add validation/normalization utilities that fail closed on unknown or ambiguous signature fields.
  • Integrate only at contract boundaries needed to make the signature useful, without changing tool execution defaults.
  • Add examples/docs showing how a host or harness would attach a signature to a task.
  • Add tests for valid signatures, invalid signatures, stable serialization, action-space constraints, and compatibility with existing Outcome Contracts.
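
"Stable serialization" in the checklist could be approached with a canonical JSON encoder that sorts object keys, so that two signatures with the same content serialize identically regardless of key order. The sketch below is one assumed approach; stableStringify is a hypothetical name, and it handles only JSON-compatible values.

```typescript
// Serializes JSON-compatible values with object keys in sorted order.
// Undefined object properties are dropped, matching JSON.stringify.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (typeof value === 'object' && value !== null) {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .filter((key) => obj[key] !== undefined)
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${members.join(',')}}`;
  }
  // JSON.stringify returns undefined for undefined input; emit null instead,
  // which matches how JSON.stringify treats undefined array elements.
  return JSON.stringify(value) ?? 'null';
}
```

A test for this property can serialize the same signature with two different key orders and compare the resulting strings for equality.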

Success criteria

  • A task signature can be serialized, validated, and used as deterministic metadata independent of prompt wording.
  • Existing Outcome Contract behavior continues to pass without requiring signatures.
  • Invalid or ambiguous signatures are rejected with clear diagnostics.
  • The implementation reduces ambiguity about stop/evidence requirements without introducing a new runner.

Post-merge OpenChrome live verification checklist

  • Start OpenChrome with a sample signed task fixture and verify the signature can be loaded/validated.
  • Execute a minimal local-browser task whose allowed evidence and stop boundary are described by the signature.
  • Try an intentionally invalid signature and verify a bounded validation error is returned before task execution.
  • Record the sample signature, validation result, and any contract evidence path in merge verification notes.

Metadata

Assignees

No one assigned

    Labels

    P2: Medium priority
    enhancement: New feature or request
    outcome-contracts: Verifiable execution via pre/post-condition contracts (Q2)
    reliability: Reliability and stability improvement
