While evals are primarily qualitative, one important thing we care about is how an LLM arrives at its answer.
For example, I want to test that a specific tool was called as part of an eval.
import { describeEval } from "vitest-evals";
import { Factuality, FIXTURES, TaskRunner } from "./utils";

describeEval("begin-autofix", {
  data: async () => {
    return [
      {
        input: `Can you root cause this issue in Sentry?\n${FIXTURES.autofixIssueUrl}\n\nJust kick off the process and give me the Run ID.`,
        expected: "The analysis has started.\nRun ID: 123",
        assert: () => {
          // do something here
        },
      },
      {
        input: `What's the status on root causing this issue in Sentry?\n${FIXTURES.autofixIssueUrl}`,
        expected:
          'Batched TRPC request incorrectly passed bottle ID 3216 to `bottleById`, instead of 16720, resulting in a "Bottle not found" error.',
      },
      {
        input: `Can you root cause this issue and retrieve the analysis?\n${FIXTURES.autofixIssueUrl}`,
        expected:
          'Batched TRPC request incorrectly passed bottle ID 3216 to `bottleById`, instead of 16720, resulting in a "Bottle not found" error.',
      },
    ];
  },
  task: TaskRunner(),
  scorers: [Factuality()],
  threshold: 0.6,
  timeout: 30000,
});
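The empty assert callback above is where that kind of check belongs. Below is a minimal sketch of the idea, assuming the task can surface the tool calls the agent made alongside its final answer. The runTaskWithToolLog helper, the toolCalls shape, and the begin_autofix tool name are all assumptions for illustration; they stand in for whatever your agent harness actually exposes, and none of them are vitest-evals API.

import { describeEval } from "vitest-evals";
import { Factuality, FIXTURES } from "./utils";

// Hypothetical helper, stubbed here: in practice this would drive the
// agent and record every tool invocation as it happens.
async function runTaskWithToolLog(
  _input: string
): Promise<{ result: string; toolCalls: { name: string }[] }> {
  throw new Error("wire this up to your agent harness");
}

describeEval("begin-autofix-calls-tool", {
  data: async () => {
    return [
      {
        input: `Can you root cause this issue in Sentry?\n${FIXTURES.autofixIssueUrl}\n\nJust kick off the process and give me the Run ID.`,
        expected: "The analysis has started.\nRun ID: 123",
      },
    ];
  },
  task: async (input) => {
    const { result, toolCalls } = await runTaskWithToolLog(input);
    // Fail the eval outright if the model never invoked the tool,
    // no matter how plausible its prose answer looks.
    if (!toolCalls.some((call) => call.name === "begin_autofix")) {
      throw new Error("expected the begin_autofix tool to be called");
    }
    return result;
  },
  scorers: [Factuality()],
  threshold: 0.6,
  timeout: 30000,
});

Throwing from the task fails the test before any scorer runs, so a plausible-sounding answer can't earn a passing Factuality score when the model never actually touched the tool.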