Add helpers for constructing HarnessRun objects #49

## Problem

Custom harnesses currently need to hand-assemble `HarnessRun` objects. For realistic app/integration harnesses, that means every consumer repeats low-level normalization code:

- convert arbitrary app artifacts into `JsonValue`
- build normalized `messages`
- attach assistant output and `session.outputText`
- attach tool calls to a synthetic assistant/tool message
- preserve metadata
- fill `usage`, `errors`, `artifacts`, and optional timings

The primitives exist (`HarnessRun`, `NormalizedMessage`, `ToolCallRecord`, `toJsonValue`, etc.), but the core package does not provide a higher-level construction helper. As a result, custom harnesses are verbose and easy to get subtly wrong, especially in the reporter-facing output and the judge-facing text.

## Desired Shape

Add a small helper for constructing a normalized `HarnessRun` from common app-harness artifacts.

Example:

```ts
import { createHarnessRun } from "vitest-evals/harness";

return createHarnessRun({
  output: {
    assistant_posts: posts,
    channel_posts: channelPosts,
    reactions,
  },
  messages: posts.map((post) => ({
    role: "assistant",
    content: post.text,
    metadata: {
      channel: post.channel,
      thread_ts: post.thread_ts,
      files: post.files,
    },
  })),
  toolCalls,
  usage: {
    toolCalls: toolCalls.length,
  },
});
```

The helper should normalize values into the existing `JsonValue` contract and fill the fields judges/reporters expect.

## Possible API

Minimal option:

```ts
type CreateHarnessRunOptions = {
  output?: unknown;
  outputText?: string;
  messages?: Array<{
    role: NormalizedMessage["role"];
    content?: unknown;
    toolCalls?: ToolCallRecord[];
    metadata?: Record<string, unknown>;
  }>;
  toolCalls?: ToolCallRecord[];
  usage?: UsageSummary;
  timings?: TimingSummary;
  artifacts?: Record<string, unknown>;
  errors?: Array<Record<string, unknown>>;
  metadata?: Record<string, unknown>;
};

function createHarnessRun(options: CreateHarnessRunOptions): HarnessRun;
```

Useful defaults:

- `output` is normalized with `toJsonValue`.
- `session.outputText` defaults to `outputText`, then string output, then pretty JSON for object/array output.
- `messages` are normalized with `normalizeContent` / `normalizeMetadata`.
- `toolCalls` can either be appended to an existing assistant/tool message or added as a synthetic assistant tool-call message when no message includes them.
- `usage.toolCalls` defaults to `toolCalls.length` when omitted.
- `errors` defaults to `[]`.
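
As a rough illustration of the defaults listed above, here is a minimal sketch. It is not the package's implementation: the local `JsonValue`, `ToolCallRecord`, `UsageSummary`, and `Message` types are simplified stand-ins, and `resolveOutputText`, `applyDefaults`, and `attachToolCalls` are hypothetical names. It also assumes the "synthetic assistant message" option for tool calls rather than appending to an existing message.

```ts
// Simplified stand-ins for the core package types; the real shapes may differ.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };
type ToolCallRecord = { name: string };
type UsageSummary = { toolCalls?: number };
type Message = {
  role: "user" | "assistant" | "tool"; // assumed role set
  content?: JsonValue;
  toolCalls?: ToolCallRecord[];
};

// outputText: explicit value first, then string output,
// then pretty-printed JSON for object/array output.
function resolveOutputText(
  output: JsonValue | undefined,
  outputText?: string,
): string | undefined {
  if (outputText !== undefined) return outputText;
  if (typeof output === "string") return output;
  if (output !== null && typeof output === "object") {
    return JSON.stringify(output, null, 2);
  }
  return undefined; // assumption: other scalars get no default text
}

// Fill usage.toolCalls and errors only when the caller omitted them.
function applyDefaults(input: {
  toolCalls?: ToolCallRecord[];
  usage?: UsageSummary;
  errors?: Array<Record<string, JsonValue>>;
}) {
  const toolCalls = input.toolCalls ?? [];
  return {
    usage: {
      ...input.usage,
      toolCalls: input.usage?.toolCalls ?? toolCalls.length,
    },
    errors: input.errors ?? [],
  };
}

// Add a synthetic assistant tool-call message when no message carries tool calls.
function attachToolCalls(
  messages: Message[],
  toolCalls: ToolCallRecord[],
): Message[] {
  if (toolCalls.length === 0) return messages;
  const alreadyAttached = messages.some(
    (message) => (message.toolCalls?.length ?? 0) > 0,
  );
  if (alreadyAttached) return messages;
  return [...messages, { role: "assistant", toolCalls }];
}
```

Using `??` rather than `||` keeps explicit values such as `usage.toolCalls: 0` or an empty `outputText` from being overwritten.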

Potential convenience helpers:

```ts
assistantMessage(content, metadata?)
userMessage(content, metadata?)
toolCall(name, args?, result?)
```

Those should only be added if they keep the API smaller in practice; a single `createHarnessRun` may be enough.
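
For discussion, the convenience helpers could be as thin as the sketch below. The `MessageInput` and `ToolCallInput` types and the `arguments` / `result` field names are assumptions made for illustration; the actual `ToolCallRecord` shape in the core package may differ.

```ts
// Hypothetical input shapes; field names are assumptions, not the package's API.
type MessageInput = {
  role: "user" | "assistant";
  content?: unknown;
  metadata?: Record<string, unknown>;
};
type ToolCallInput = { name: string; arguments?: unknown; result?: unknown };

// Shared builder that omits `metadata` entirely when none is given.
function buildMessage(
  role: MessageInput["role"],
  content: unknown,
  metadata?: Record<string, unknown>,
): MessageInput {
  return metadata === undefined ? { role, content } : { role, content, metadata };
}

function assistantMessage(
  content: unknown,
  metadata?: Record<string, unknown>,
): MessageInput {
  return buildMessage("assistant", content, metadata);
}

function userMessage(
  content: unknown,
  metadata?: Record<string, unknown>,
): MessageInput {
  return buildMessage("user", content, metadata);
}

// Only include optional fields the caller actually supplied.
function toolCall(name: string, args?: unknown, result?: unknown): ToolCallInput {
  const record: ToolCallInput = { name };
  if (args !== undefined) record.arguments = args;
  if (result !== undefined) record.result = result;
  return record;
}
```

Each helper saves only a few characters over an object literal, which supports the caution above about whether they are worth adding.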

## Design Constraints

- Do not make app harnesses opaque. The helper should build a standard `HarnessRun`, not introduce a new abstraction layer around `Harness.run`.
- Preserve explicit consumer values. If callers pass `outputText`, `usage`, `errors`, or `artifacts`, the helper should not silently replace them.
- Keep normalization predictable and JSON-safe.
- Avoid app-specific concepts like Slack posts, web requests, or agent decisions.
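
To make "predictable and JSON-safe" concrete, the sketch below shows one possible set of normalization rules. It is a hypothetical stand-in named `toJsonSafe`, not the existing `toJsonValue`, whose actual behavior may differ. The specific choices here (dates to ISO strings, non-finite numbers and functions to `null`, `"[Circular]"` for cycles, `undefined` object properties dropped) are assumptions for illustration.

```ts
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Hypothetical normalizer: always returns a value JSON.stringify can round-trip.
function toJsonSafe(
  value: unknown,
  seen: WeakSet<object> = new WeakSet(),
): JsonValue {
  if (typeof value === "string" || typeof value === "boolean") return value;
  if (typeof value === "number") return Number.isFinite(value) ? value : null;
  if (typeof value === "bigint") return String(value);
  // undefined, functions, symbols, and null all collapse to null.
  if (typeof value !== "object" || value === null) return null;
  if (value instanceof Date) {
    return Number.isNaN(value.getTime()) ? null : value.toISOString();
  }
  if (value instanceof Error) {
    return { name: value.name, message: value.message };
  }
  // Guard against cycles along the current path.
  if (seen.has(value)) return "[Circular]";
  seen.add(value);
  let result: JsonValue;
  if (Array.isArray(value)) {
    result = value.map((item) => toJsonSafe(item, seen));
  } else {
    const source = value as Record<string, unknown>;
    const target: { [key: string]: JsonValue } = {};
    for (const key of Object.keys(source)) {
      // Drop undefined properties, matching JSON.stringify for objects.
      if (source[key] !== undefined) target[key] = toJsonSafe(source[key], seen);
    }
    result = target;
  }
  // Allow the same object to appear again on a sibling path.
  seen.delete(value);
  return result;
}
```

Whatever rules the real helper uses, documenting them in one place lets harness authors predict what judges and reporters will see.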

## Why This Helps

- Reduces boilerplate in custom harnesses that are not covered by first-party runtime adapters.
- Makes reporter-facing and judge-facing output more consistent.
- Lowers the chance that custom harnesses omit `errors`, forget `outputText`, or attach tool calls in a shape built-in judges do not read.
- Gives docs a canonical way to teach custom harness authoring.

## Acceptance Criteria

- A custom app harness can build a correct `HarnessRun` with one helper call.
- The helper covers output, messages, tool calls, metadata, usage, artifacts, timings, and errors.
- Tests cover non-JSON values, omitted fields/defaults, tool-call attachment, and explicit override behavior.
- Existing lower-level helpers remain available for advanced harnesses.

