feat(judges): Add harness context to judge API #46

Merged: dcramer merged 1 commit into main from codex/judge-harness-context on May 3, 2026

@dcramer (Member) commented May 3, 2026

Give every judge the same JudgeContext populated from the configured harness. LLM-backed judges now reuse the suite model seam through the required harness.prompt(...) method, while adapter-specific runtime objects stay scoped to app execution internals such as tools and events.

Single Judge API

Automatic judges, explicit toSatisfyJudge(...) calls, and built-in deterministic judges all receive the same normalized context. The harness-specific judge context and judge-facing runtime object are gone, so custom judges read run data, metadata, tool calls, and the configured harness from one place.
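
As a rough illustration of what "one place" could look like for a custom judge, here is a minimal sketch. Only the names JudgeContext, harness.prompt, and harness.run come from this PR; every field name, the ToolCall and JudgeResult shapes, and the usedTool helper are assumptions made up for the example, not the library's actual types.

```typescript
// Hypothetical shapes for illustration only. The PR names JudgeContext,
// harness.prompt(...) and harness.run(...); the fields below are assumed.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface Harness {
  prompt(text: string): Promise<string>; // model seam reused by LLM-backed judges
  run(input: string): Promise<string>; // runs the application again
}

interface JudgeContext {
  input: string;
  output: string;
  metadata: Record<string, unknown>;
  toolCalls: ToolCall[];
  harness: Harness;
}

interface JudgeResult {
  pass: boolean;
  reason: string;
}

// Deterministic and LLM-backed judges share one signature and one context.
type Judge = (ctx: JudgeContext) => JudgeResult | Promise<JudgeResult>;

// A deterministic judge (hypothetical helper): passes when the run
// called the named tool. It never touches ctx.harness.
function usedTool(name: string): (ctx: JudgeContext) => JudgeResult {
  return (ctx) => {
    const called = ctx.toolCalls.some((call) => call.name === name);
    return {
      pass: called,
      reason: called ? `called ${name}` : `never called ${name}`,
    };
  };
}

// Both kinds of judge fit the same list type.
const judges: Judge[] = [usedTool("search")];
```

Because the harness travels on the same context, a judge that needs a model and one that does not differ only in whether they read ctx.harness.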

Implicit Matcher Context

Fixture-backed runs register their run, session, and output objects so matcher calls can infer input, metadata, tool calls, and harness without repetitive options. Exact registered objects win over the latest-run fallback, which keeps expect(result.output).toSatisfyJudge(...) concise without hardcoding one narrow case.
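
The lookup rule above (exact registered object first, latest run second) can be sketched as follows. This is an assumption about one way to implement it, not the code in this PR: ContextRegistry, RunContext, and their methods are hypothetical names, and the sketch relies only on the standard WeakMap.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's code.
interface RunContext {
  input: string;
  metadata: Record<string, unknown>;
}

class ContextRegistry {
  // Keyed by object identity, so entries vanish when the objects are collected.
  private byObject = new WeakMap<object, RunContext>();
  private latest: RunContext | undefined;

  // Register the run, session, and output objects that belong to one run.
  register(ctx: RunContext, ...objects: object[]): void {
    for (const obj of objects) {
      this.byObject.set(obj, ctx);
    }
    this.latest = ctx;
  }

  // Exact registered objects win; anything else falls back to the latest run.
  resolve(value: unknown): RunContext | undefined {
    if (typeof value === "object" && value !== null) {
      const exact = this.byObject.get(value);
      if (exact !== undefined) {
        return exact;
      }
    }
    return this.latest;
  }
}
```

With a registry like this, a matcher handed result.output from an earlier run still resolves that run's context, while a plain string or an unregistered object resolves to the most recent run.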

Required Harness Prompt

Harness.prompt is required across the root type and first-party harness constructors. Rubric and factuality judges call harness.prompt(...); harness.run(...) remains the explicit escape hatch for intentionally running the application again.
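
To make the prompt-versus-run split concrete, here is a minimal sketch of a rubric-style judge that grades through harness.prompt(...) and never calls harness.run(...). The prompt wording, the PASS/FAIL reply convention, and the rubricJudge and stubHarness helpers are all assumptions for illustration; the real judges in this PR may format prompts and parse replies differently.

```typescript
// Hypothetical sketch: only harness.prompt and harness.run are named in the PR.
interface Harness {
  prompt(text: string): Promise<string>; // required, not optional
  run(input: string): Promise<string>;
}

interface JudgeContext {
  input: string;
  output: string;
  harness: Harness;
}

interface JudgeResult {
  pass: boolean;
  reason: string;
}

// Grades the recorded output with the suite's model; does not rerun the app.
function rubricJudge(rubric: string) {
  return async (ctx: JudgeContext): Promise<JudgeResult> => {
    const reply = await ctx.harness.prompt(
      [
        `Rubric: ${rubric}`,
        `Input: ${ctx.input}`,
        `Output: ${ctx.output}`,
        "Answer PASS or FAIL, followed by a short reason.",
      ].join("\n"),
    );
    const trimmed = reply.trim();
    // Assumed reply convention: the verdict is the first word.
    return { pass: trimmed.toUpperCase().startsWith("PASS"), reason: trimmed };
  };
}

// Test double that records which harness method was used.
function stubHarness(reply: string) {
  const calls = { prompt: 0, run: 0 };
  const harness: Harness = {
    async prompt() {
      calls.prompt += 1;
      return reply;
    },
    async run() {
      calls.run += 1;
      return "app output";
    },
  };
  return { harness, calls };
}
```

Since prompt is a required member of the interface, the judge needs no fallback branch for a harness that lacks it.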

API Policy

Add a small API design policy that captures this lesson: prefer one shared contextual API, keep owned capabilities mandatory, put capabilities on the object that owns their configuration, and avoid parallel public objects with overlapping lifecycle names such as harness and runtime.

Fixes #45

@dcramer force-pushed the codex/judge-harness-context branch 2 times, most recently from bc202a9 to 7897a6d, on May 3, 2026 21:56
Pass configured harness context into automatic and explicit judge calls so rubric judges can reuse the suite prompt seam without duplicating provider setup.

Register fixture run context for matcher assertions, including raw output and session objects, while keeping explicit matcher overrides available for manual values.

Make harness prompt configuration required and keep judge prompting on context.harness.prompt(...) so the API does not split judge capabilities across harness and runtime objects.

Fixes GH-45

Co-Authored-By: OpenAI Codex <codex@openai.com>
@dcramer force-pushed the codex/judge-harness-context branch from 7897a6d to df19255 on May 3, 2026 22:09
@dcramer marked this pull request as ready for review on May 3, 2026 22:22
@dcramer merged commit 760ea18 into main on May 3, 2026
8 checks passed
@dcramer deleted the codex/judge-harness-context branch on May 3, 2026 22:23
Linked issue: Add a provider-agnostic LLM rubric judge adapter