feat(contracts): add deterministic browser task signatures

## Problem

OpenChrome has outcome contracts, plan execution, and hinting, but task boundaries are still spread across natural-language instructions, tool arguments, and optional success checks. This makes it harder for agents to know when to stop, to tell when a tool sequence falls outside the intended action space, or to notice when repeated observation calls are wandering without progress.

DSPy-style lesson to adopt: **separate task interface from prompt/instruction wording**. OpenChrome should add a deterministic, typed browser task signature that declares inputs, allowed tools, success contract, stop conditions, and loop guards.

## Direction / fit

- Tier: `core` for schema validation and deterministic evaluation; `pilot` only if later wired to policy/retry execution.
- No LLM judgment, no DSPy dependency, no prompt optimizer.
- Builds on existing `src/contracts/**` assertions rather than replacing them.
- Should reduce wandering by making success and stop conditions explicit in machine-readable form.

## Goal

Add a `BrowserTaskSignature` schema and validator that can be attached to compiled plans, workflow runs, or explicit tool calls to guide deterministic progress evaluation and response metadata.

## Proposed implementation

1. Define a schema, e.g. `src/contracts/task-signature.ts`:
   ```ts
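   // Assertion is the existing Outcome Contract assertion type from src/contracts/**; this import path is assumed.
   import type { Assertion } from './types';

   export interface BrowserTaskSignature {
     version: 1;
     id: string;
     description: string;
     // Declared task inputs; values marked 'secret' must never reach logs or reports.
     inputs: Record<string, { type: 'string' | 'number' | 'boolean'; required: boolean; redaction?: 'secret' | 'none' }>;
     // Action space: a signature-bound plan may only call these tools.
     allowedTools: string[];
     success: Assertion;
     stopWhen?: Assertion[];
     failureWhen?: Assertion[];
     loopGuards?: Array<{
       kind: 'max_same_tool' | 'max_observation_calls' | 'max_non_progress_calls';
       limit: number;
       window: number;
     }>;
     budgets?: {
       maxToolCalls?: number;
       maxWallMs?: number;
     };
   }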
   ```
2. Add a validator that reports all structural errors in one batch, consistent with the existing contract validator style.
3. Add a deterministic evaluator helper that consumes:
   - the task signature;
   - the current `EvalContext`;
   - a summary of recent tool calls;
   - elapsed time and tool-call count.
4. Make the evaluator return a structured status (type below). It must also run a preflight check before a signature-bound compiled plan executes: if any planned step uses a tool outside `allowedTools`, the plan must not execute and must return `failure` with a reason that names the disallowed tool.
   ```ts
   type TaskSignatureStatus =
     | { status: 'continue'; reasons: string[] }
     | { status: 'success'; evidence: unknown }
     | { status: 'stop'; reasons: string[] }
     | { status: 'failure'; reasons: string[] }
     | { status: 'budget_exhausted'; reasons: string[] };
   ```
5. Integrate minimally where it creates clear value without changing default behavior:
   - `execute_plan` may accept an optional signature ID/object and include status in result metadata.
   - `workflow_status` may surface signature progress if a workflow was initialized with one.
   - Existing calls without a signature behave exactly as before.
6. Add docs with examples for safe tasks and stop conditions.
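
A minimal sketch of the step 2 validator, assuming the signature arrives as untrusted JSON. `validateTaskSignature`, `ValidationError`, and the `validateAssertion` hook are hypothetical names; the hook stands in for the existing `src/contracts/**` assertion validator, which stays responsible for rejecting unknown assertion kinds. This sketch rejects unknown top-level fields only, and its error messages are illustrative.

```typescript
// One structural error; the JSONPath-like `path` format is an assumption of this sketch.
interface ValidationError {
  path: string;
  message: string;
}

const KNOWN_FIELDS = new Set([
  'version', 'id', 'description', 'inputs', 'allowedTools',
  'success', 'stopWhen', 'failureWhen', 'loopGuards', 'budgets',
]);
const INPUT_TYPES = new Set(['string', 'number', 'boolean']);
const GUARD_KINDS = new Set(['max_same_tool', 'max_observation_calls', 'max_non_progress_calls']);

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v);
const isPositiveInt = (v: unknown): boolean => Number.isInteger(v) && (v as number) > 0;

// Collects every structural error instead of stopping at the first one.
// `validateAssertion` is a hypothetical hook into the existing contract validator.
function validateTaskSignature(
  raw: unknown,
  validateAssertion: (assertion: unknown, path: string) => ValidationError[],
): ValidationError[] {
  if (!isObject(raw)) return [{ path: '$', message: 'signature must be an object' }];
  const errors: ValidationError[] = [];
  const push = (path: string, message: string): void => {
    errors.push({ path, message });
  };

  // Fail closed: unknown top-level fields are errors, not silently ignored.
  for (const key of Object.keys(raw)) {
    if (!KNOWN_FIELDS.has(key)) push(`$.${key}`, 'unknown field');
  }
  if (raw.version !== 1) push('$.version', 'must be 1');
  if (typeof raw.id !== 'string' || raw.id === '') push('$.id', 'must be a non-empty string');
  if (typeof raw.description !== 'string') push('$.description', 'must be a string');

  const inputs = raw.inputs;
  if (!isObject(inputs)) {
    push('$.inputs', 'must be an object');
  } else {
    for (const [name, spec] of Object.entries(inputs)) {
      if (!isObject(spec) || !INPUT_TYPES.has(String(spec.type)) || typeof spec.required !== 'boolean') {
        push(`$.inputs.${name}`, 'needs type string|number|boolean and a boolean required flag');
      } else if (spec.redaction !== undefined && spec.redaction !== 'secret' && spec.redaction !== 'none') {
        push(`$.inputs.${name}.redaction`, "must be 'secret' or 'none'");
      }
    }
  }

  const tools = raw.allowedTools;
  if (!Array.isArray(tools) || tools.length === 0 || !tools.every((t) => typeof t === 'string')) {
    push('$.allowedTools', 'must be a non-empty array of tool names');
  }

  // Assertions are delegated so unknown kinds are rejected by existing contract validation.
  if (raw.success === undefined) push('$.success', 'is required');
  else errors.push(...validateAssertion(raw.success, '$.success'));
  for (const field of ['stopWhen', 'failureWhen'] as const) {
    const list = raw[field];
    if (list === undefined) continue;
    if (!Array.isArray(list)) push(`$.${field}`, 'must be an array');
    else list.forEach((a, i) => errors.push(...validateAssertion(a, `$.${field}[${i}]`)));
  }

  const guards = raw.loopGuards;
  if (guards !== undefined) {
    if (!Array.isArray(guards)) push('$.loopGuards', 'must be an array');
    else guards.forEach((g, i) => {
      if (!isObject(g) || !GUARD_KINDS.has(String(g.kind)) || !isPositiveInt(g.limit) || !isPositiveInt(g.window)) {
        push(`$.loopGuards[${i}]`, 'needs a known kind and positive integer limit and window');
      }
    });
  }

  const budgets = raw.budgets;
  if (budgets !== undefined) {
    if (!isObject(budgets)) push('$.budgets', 'must be an object');
    else for (const key of ['maxToolCalls', 'maxWallMs'] as const) {
      if (budgets[key] !== undefined && !isPositiveInt(budgets[key])) push(`$.budgets.${key}`, 'must be a positive integer');
    }
  }
  return errors;
}
```

Because every check appends to one array instead of returning early, a caller sees every structural problem in a single response.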

## Acceptance criteria

- [ ] `BrowserTaskSignature` schema is implemented and exported from a stable module.
- [ ] Validator returns all structural errors at once and rejects unknown assertion kinds via existing contract validation.
- [ ] Evaluator detects success, failure, explicit stop, max tool calls, max wall time, and loop guard violations.
- [ ] Optional integration with `execute_plan` or workflow status is backward-compatible: no signature means no behavior change except absent metadata.
- [ ] Tests cover valid signature, invalid schema, success assertion pass, failure assertion pass, stop condition pass, budget exhaustion, loop guard violation, and redacted secret inputs.
- [ ] Docs explain that signatures are deterministic task boundaries, not LLM prompts.
- [ ] No outbound LLM calls, Python, or DSPy dependency is introduced.
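
The issue does not fix the order in which the evaluator checks its conditions. The sketch below assumes failure takes precedence over success, then explicit stop, then budgets, then loop guards, and it treats a budget as exhausted once the count or elapsed time reaches the limit. `evaluateSignature`, `SignatureView`, `Progress`, and the `holds` callback (standing in for the existing assertion evaluator bound to the current `EvalContext`) are hypothetical; loop-guard reasons are assumed to be computed separately and passed in.

```typescript
// Simplified stand-ins; the real Assertion and EvalContext types live in src/contracts/**.
type Assertion = { kind: string; [key: string]: unknown };

interface SignatureView {
  success: Assertion;
  stopWhen?: Assertion[];
  failureWhen?: Assertion[];
  budgets?: { maxToolCalls?: number; maxWallMs?: number };
}

type TaskSignatureStatus =
  | { status: 'continue'; reasons: string[] }
  | { status: 'success'; evidence: unknown }
  | { status: 'stop'; reasons: string[] }
  | { status: 'failure'; reasons: string[] }
  | { status: 'budget_exhausted'; reasons: string[] };

// Hypothetical progress snapshot gathered by the caller.
interface Progress {
  toolCalls: number;
  elapsedMs: number;
  loopGuardReasons: string[]; // computed by a separate loop-guard check
}

// `holds` stands in for the existing assertion evaluator bound to the current EvalContext.
function evaluateSignature(
  sig: SignatureView,
  progress: Progress,
  holds: (assertion: Assertion) => boolean,
): TaskSignatureStatus {
  // Assumed precedence: failure, success, explicit stop, budgets, loop guards.
  const failed = (sig.failureWhen ?? []).filter(holds);
  if (failed.length > 0) {
    return { status: 'failure', reasons: failed.map((a) => `failureWhen matched: ${a.kind}`) };
  }
  if (holds(sig.success)) {
    // The evidence shape is illustrative only.
    return { status: 'success', evidence: { assertion: sig.success.kind, toolCalls: progress.toolCalls } };
  }
  const stopped = (sig.stopWhen ?? []).filter(holds);
  if (stopped.length > 0) {
    return { status: 'stop', reasons: stopped.map((a) => `stopWhen matched: ${a.kind}`) };
  }

  // A budget counts as exhausted once the limit is reached, not only when it is exceeded.
  const budgetReasons: string[] = [];
  const maxToolCalls = sig.budgets?.maxToolCalls;
  const maxWallMs = sig.budgets?.maxWallMs;
  if (maxToolCalls !== undefined && progress.toolCalls >= maxToolCalls) {
    budgetReasons.push(`maxToolCalls reached: ${progress.toolCalls}/${maxToolCalls}`);
  }
  if (maxWallMs !== undefined && progress.elapsedMs >= maxWallMs) {
    budgetReasons.push(`maxWallMs reached: ${progress.elapsedMs}/${maxWallMs}`);
  }
  if (budgetReasons.length > 0) return { status: 'budget_exhausted', reasons: budgetReasons };

  if (progress.loopGuardReasons.length > 0) return { status: 'stop', reasons: progress.loopGuardReasons };
  return { status: 'continue', reasons: [] };
}
```

Every branch depends only on its inputs, so the same signature, assertion results, and counters always produce the same status.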

## Required OpenChrome real-validation after implementation

Use local fixtures and a built OpenChrome server.

### Setup

```bash
npm ci
npm run build
node tests/fixtures/sites/task-signature/serve.mjs --port 9995 >/tmp/oc-task-signature-fixtures.log 2>&1 &
FIX_PID=$!
OPENCHROME_AUTH_MODE=disabled node dist/cli/index.js serve --http 9878 >/tmp/oc-task-signature.log 2>&1 &
OC_PID=$!
sleep 2
mcp() { curl -s -H 'content-type: application/json' -d "$1" http://localhost:9878/mcp; }
```

### Validation A — success boundary

Create/use a fixture signature equivalent to:

```json
{
  "version": 1,
  "id": "fixture.search.success",
  "description": "Search form reaches result state",
  "inputs": { "query": { "type": "string", "required": true, "redaction": "none" } },
  "allowedTools": ["navigate", "find", "interact", "read_page"],
  "success": { "kind": "dom_text", "selector": "#result", "contains": "Searched: cats" },
  "loopGuards": [{ "kind": "max_observation_calls", "limit": 2, "window": 4 }],
  "budgets": { "maxToolCalls": 8, "maxWallMs": 30000 }
}
```

Run `execute_plan` (or the implemented signature-aware workflow) against `http://localhost:9995/search.html` and save the raw MCP response to `/tmp/task-signature-success-response.json`.

```bash
# Exact tool payload may differ; PR must include the final payload used.
# The result must expose taskSignature.status == "success".
jq -e '.result.content[0].text | fromjson | .taskSignature.status == "success"' /tmp/task-signature-success-response.json
```

**Pass:** signature status becomes `success` once the DOM assertion is true.

### Validation B — loop guard catches wandering

Run a fixture sequence that repeatedly calls `read_page` or takes screenshots without making progress, and save the response to `/tmp/task-signature-loop-response.json`.

```bash
jq -e '.result.content[0].text | fromjson | .taskSignature.status == "stop" and (.taskSignature.reasons | tostring | test("max_observation_calls|max_non_progress_calls"))' /tmp/task-signature-loop-response.json
```

**Pass:** loop guard returns a deterministic stop status before the global timeout.
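
One way to evaluate `loopGuards` deterministically is a fixed-size window over the most recent calls. In this sketch a guard fires when the matching count inside the window exceeds `limit`; that reading of `limit`, the `ToolCallSummary` shape, and how a call is classified as an observation or as making progress are all assumptions. Any returned reasons would be reported under a `stop` status, which is what Validation B checks for.

```typescript
// Hypothetical per-call summary; how `observation` and `progressed` are derived is an assumption.
interface ToolCallSummary {
  tool: string;
  observation: boolean; // e.g. read_page or a screenshot
  progressed: boolean;  // whether the call changed tracked page or assertion state
}

interface LoopGuard {
  kind: 'max_same_tool' | 'max_observation_calls' | 'max_non_progress_calls';
  limit: number;
  window: number;
}

// Returns one reason per violated guard; an empty array means no guard fired.
function checkLoopGuards(guards: readonly LoopGuard[], calls: readonly ToolCallSummary[]): string[] {
  const reasons: string[] = [];
  for (const guard of guards) {
    // Only the most recent `window` calls are inspected (the validator should require window >= 1).
    const recent = calls.slice(-guard.window);
    let count = 0;
    switch (guard.kind) {
      case 'max_same_tool': {
        // Highest number of calls to any single tool inside the window.
        const perTool = new Map<string, number>();
        for (const call of recent) perTool.set(call.tool, (perTool.get(call.tool) ?? 0) + 1);
        count = Math.max(0, ...Array.from(perTool.values()));
        break;
      }
      case 'max_observation_calls':
        count = recent.filter((call) => call.observation).length;
        break;
      case 'max_non_progress_calls':
        count = recent.filter((call) => !call.progressed).length;
        break;
    }
    if (count > guard.limit) {
      reasons.push(`${guard.kind}: ${count} > ${guard.limit} in last ${recent.length} calls`);
    }
  }
  return reasons;
}
```

With the fixture guard from Validation A (`limit: 2, window: 4`), one `navigate` followed by three `read_page` calls fires the guard, while two `read_page` calls do not.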

### Validation C — allowed tool boundary

Attempt to run a signature-bound plan in which one step uses a tool outside `allowedTools` (e.g. `javascript_tool` when it is not listed), and save the response to `/tmp/task-signature-tool-boundary.json`.

```bash
jq -e '.result.content[0].text | fromjson | .taskSignature.status == "failure" and (.taskSignature.reasons | tostring | test("not allowed"; "i"))' /tmp/task-signature-tool-boundary.json
```

**Pass:** response identifies the disallowed tool and the disallowed step is not executed.
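
A sketch of the preflight check, assuming a compiled plan can be flattened into steps that each name one tool. `PlanStep`, `PreflightResult`, and `preflightAllowedTools` are hypothetical names. The check inspects every step before any of them runs, which is what guarantees the disallowed step is never executed.

```typescript
// Hypothetical flattened step; the real compiled-plan type may carry more fields.
interface PlanStep {
  index: number;
  tool: string;
}

type PreflightResult =
  | { ok: true }
  | { ok: false; status: 'failure'; reasons: string[] };

// Checks every step before any step runs, so a disallowed step is never dispatched.
function preflightAllowedTools(
  allowedTools: readonly string[],
  steps: readonly PlanStep[],
): PreflightResult {
  const allowed = new Set(allowedTools);
  const reasons = steps
    .filter((step) => !allowed.has(step.tool))
    .map((step) => `step ${step.index}: tool "${step.tool}" is not allowed by the task signature`);
  return reasons.length === 0 ? { ok: true } : { ok: false, status: 'failure', reasons };
}
```

A caller would dispatch the plan only when `ok` is true and otherwise return the `failure` reasons as `taskSignature` metadata.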

### Validation D — backward compatibility

Run the same `execute_plan` call or workflow without a task signature and save the response to `/tmp/no-signature-response.json`.

```bash
jq -e '.result.content[0].text | fromjson | has("taskSignature") | not' /tmp/no-signature-response.json
```

**Pass:** existing behavior remains unchanged when no signature is supplied.
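
A sketch of how signature metadata could be attached while keeping the no-signature path shape-compatible, assuming the `execute_plan` result is a plain JSON object. `withTaskSignature` is a hypothetical helper and its status type is a simplified stand-in. When no status is supplied, the original object is returned as-is, so there is no `taskSignature` key for the `has("taskSignature") | not` check to find.

```typescript
// Simplified stand-in for the TaskSignatureStatus union.
interface TaskSignatureStatusLike {
  status: string;
  reasons?: string[];
  evidence?: unknown;
}

// Hypothetical helper: the existing execute_plan result is treated as an opaque JSON object.
function withTaskSignature<T extends Record<string, unknown>>(
  result: T,
  status: TaskSignatureStatusLike | undefined,
): T | (T & { taskSignature: TaskSignatureStatusLike }) {
  // No signature: return the original object untouched, so clients see the same keys and values.
  if (status === undefined) return result;
  // With a signature: add metadata on a copy without mutating the original result.
  return { ...result, taskSignature: status };
}
```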

### Cleanup

```bash
kill $OC_PID $FIX_PID
wait $OC_PID $FIX_PID 2>/dev/null || true
```

## Non-goals

- Do not replace `src/contracts/**`.
- Do not create an autonomous task runner.
- Do not infer signatures from natural language in the server.
- Do not add new irreversible-action policy beyond existing hooks; this issue only enforces the declared `allowedTools` boundary for signature-bound execution.

## Self-review checklist for implementer

- [ ] Is each status deterministic from tool calls and contract evaluators?
- [ ] Does this reduce ambiguity for agents without adding server-side AI decisions?
- [ ] Are no-signature paths byte/shape-compatible with existing clients?
- [ ] Are secret-marked inputs excluded from logs and reports?
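
For the secret-input check, a sketch of a redaction pass applied before inputs reach logs or reports. `redactInputs` and the `[REDACTED]` placeholder are hypothetical; treating undeclared inputs as secret is an assumed fail-closed choice, not something this issue specifies.

```typescript
// Mirrors the per-input shape of BrowserTaskSignature.inputs.
interface InputSpec {
  type: 'string' | 'number' | 'boolean';
  required: boolean;
  redaction?: 'secret' | 'none';
}

type InputValue = string | number | boolean;

const REDACTED = '[REDACTED]'; // hypothetical placeholder text

// Returns a copy that is safe to log or report; the caller keeps the original for execution.
function redactInputs(
  specs: Record<string, InputSpec>,
  values: Record<string, InputValue>,
): Record<string, InputValue> {
  const safe: Record<string, InputValue> = {};
  for (const [name, value] of Object.entries(values)) {
    const spec: InputSpec | undefined = specs[name];
    // Assumed fail-closed rule: inputs missing from the signature are redacted too.
    safe[name] = spec === undefined || spec.redaction === 'secret' ? REDACTED : value;
  }
  return safe;
}
```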

## Related DSPy-inspired harness hardening set

- #1047 — metric-driven reliability certification suite
- #1048 — recovery feedback bundles
- #1049 — deterministic browser task signatures
- #1050 — recovery/hint candidate evaluation
- #1051 — bounded parallel runner with straggler reporting

These issues are intentionally scoped to OpenChrome-native deterministic harnessing. They must not introduce DSPy/Python runtime dependencies or server-side LLM decisions.



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** deterministic task contracts / typed task boundary metadata.
- **Primary deliverable:** `BrowserTaskSignature`, a typed description of allowed task inputs, action space, expected evidence, and stop conditions.
- **Open PR:** #1114 (`feat/1049-task-signature`). Amend that PR rather than duplicating work.
- **Non-goal:** LLM prompt generation, DSPy optimization, autonomous planning, or replacing existing Outcome Contracts.

### Overlap and conflict resolution
- [ ] Keep separate from #1031: final verification gates may consume signatures, but this issue defines the signature contract itself.
- [ ] Keep separate from #1041: bulk progress can be represented in a signature, but the bulk progress runtime guard remains #1041.
- [ ] Keep separate from #1060: progress diagnostics can reference signatures, but status reporting is not implemented here.
- [ ] Keep separate from #1058/#1047: runners/certification may load signatures, but benchmark policy is out of scope.

### Implementation checklist
- [ ] Define `BrowserTaskSignature` types/schema with explicit fields for task identity, allowed tools/actions, required evidence, success/failure boundaries, and optional progress requirements.
- [ ] Add validation/normalization utilities that fail closed on unknown or ambiguous signature fields.
- [ ] Integrate only at contract boundaries needed to make the signature useful, without changing tool execution defaults.
- [ ] Add examples/docs showing how a host or harness would attach a signature to a task.
- [ ] Add tests for valid signatures, invalid signatures, stable serialization, action-space constraints, and compatibility with existing Outcome Contracts.
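
For the stable-serialization test, one common approach is canonical JSON with object keys sorted recursively, so two signatures with the same content serialize to identical strings regardless of key order. This is a sketch; `stableStringify` is a hypothetical name, and array order is deliberately preserved because element order may be meaningful.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes with object keys sorted at every level; arrays keep their order.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(',')}}`;
}
```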

### Success criteria
- [ ] A task signature can be serialized, validated, and used as deterministic metadata independent of prompt wording.
- [ ] Existing Outcome Contract behavior continues to pass without requiring signatures.
- [ ] Invalid or ambiguous signatures are rejected with clear diagnostics.
- [ ] The implementation reduces ambiguity about stop/evidence requirements without introducing a new runner.

### Post-merge OpenChrome live verification checklist
- [ ] Start OpenChrome with a sample signed task fixture and verify the signature can be loaded/validated.
- [ ] Execute a minimal local-browser task whose allowed evidence and stop boundary are described by the signature.
- [ ] Try an intentionally invalid signature and verify a bounded validation error is returned before task execution.
- [ ] Record the sample signature, validation result, and any contract evidence path in merge verification notes.
