feat(harness): Add OpenAI Agents harness #53
Merged
Add a first-party OpenAI Agents harness and demo app for refund evals. Move VCR policy to harness-level toolReplay config for AI SDK and Pi harnesses, and reject unsafe OpenAI Agents replay configs instead of silently running live tools. Fixes GH-51 Co-Authored-By: OpenAI Codex <codex@openai.com>
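The rejection of unsafe replay configs described above can be sketched as a fail-fast validation pass. This is a minimal illustration with hypothetical type and function names (the real harness API may differ): the point is that a config naming an unknown tool, or a tool with no `invoke()`, throws before any live tool could run.

```typescript
// Hypothetical shapes; the actual harness types may differ.
interface AgentTool {
  name: string;
  invoke?: (input: unknown) => Promise<unknown>;
}

interface ToolReplayConfig {
  tools: string[]; // tool names the replay cassette should cover
}

// Reject unsafe configs before execution: unknown tool names and tools
// that cannot be invoked for recording both throw, instead of silently
// falling through to live tool calls.
function validateToolReplay(config: ToolReplayConfig, tools: AgentTool[]): void {
  const byName = new Map(tools.map((tool) => [tool.name, tool]));
  for (const name of config.tools) {
    const tool = byName.get(name);
    if (!tool) {
      throw new Error(`toolReplay references unknown tool "${name}"`);
    }
    if (typeof tool.invoke !== "function") {
      throw new Error(`toolReplay tool "${name}" has no invoke()`);
    }
  }
}
```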
Show a small describeEval usage block in each harness README so the public API shape is visible alongside harness construction. Refs GH-51 Co-Authored-By: OpenAI Codex <codex@openai.com>
Keep shared demo eval replay defaults in the eval CLI so all demo packages record and replay with the same behavior unless callers override the environment. Prefer locally captured OpenAI Agents tool results over model-visible output wrappers, and cover the demo CLI defaults with tests. Refs GH-51 Co-Authored-By: OpenAI Codex <codex@openai.com>
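The "shared defaults unless callers override the environment" behavior amounts to filling in env vars only when they are absent. A minimal sketch, assuming a hypothetical variable name (the real CLI may use different names and defaults):

```typescript
// Hypothetical names; the actual env var and defaults may differ.
const REPLAY_ENV = "VITEST_EVALS_REPLAY";
const DEFAULT_REPLAY = "auto"; // recordings land under .vitest-evals/recordings

// Apply the shared demo default only when the caller has not already
// set the variable, so explicit environment overrides always win.
function withReplayDefaults(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  return {
    ...env,
    [REPLAY_ENV]: env[REPLAY_ENV] ?? DEFAULT_REPLAY,
  };
}
```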
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 532af0a.
Keep explicit null tool outputs from locally captured OpenAI Agents calls instead of treating them as missing when merging SDK run items. Include script helper tests in the root Vitest targets so shared eval CLI defaults are covered by pnpm test. Refs GH-51 Co-Authored-By: OpenAI Codex <codex@openai.com>
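The null-preserving merge above hinges on distinguishing "the tool returned `null`" from "there is no local capture". A sketch of that distinction with hypothetical names (the real merge logic is more involved): checking key presence with `in` keeps an explicit `null`, where a truthiness or `!= null` check would wrongly fall back to the SDK wrapper.

```typescript
// Hypothetical shape for a locally captured tool call.
interface CapturedCall {
  output?: unknown; // may be an explicit null, or absent entirely
}

// Only fall back to the SDK's model-visible output wrapper when the
// local capture truly has no output key; an explicit null is kept.
function mergeToolOutput(captured: CapturedCall, sdkOutput: unknown): unknown {
  return "output" in captured ? captured.output : sdkOutput;
}
```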
Drop an unreachable string-model branch now covered by stringProperty(result, "model"). This keeps OpenAI Agents metadata normalization simpler without changing behavior. Refs GH-51 Co-Authored-By: OpenAI Codex <codex@openai.com>
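For context on why a separate string-model branch is redundant, here is a plausible shape for the `stringProperty` helper the commit names (the actual implementation may differ): it already narrows any `unknown` input and returns the property only when it is a string, so an extra branch for string-typed results has nothing left to handle.

```typescript
// Plausible sketch of the helper: safely read a property from an
// unknown value, returning it only when it is a string.
function stringProperty(value: unknown, key: string): string | undefined {
  if (typeof value === "object" && value !== null) {
    const candidate = (value as Record<string, unknown>)[key];
    if (typeof candidate === "string") return candidate;
  }
  return undefined; // non-objects (including bare strings) yield undefined
}
```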

Add a first-party OpenAI Agents harness with a refund demo app, release metadata, and docs so OpenAI Agents workflows can run through normalized vitest-evals sessions. Replay configuration now lives at the harness boundary through `toolReplay`, with AI SDK and Pi examples updated away from tool-definition opt-ins.

**Replay Safety**

Pi native tool replay records in a native cassette namespace, while delegated runtime calls avoid duplicate traces and cassette writes. OpenAI Agents replay config now fails before execution for unknown tools or tools without `invoke()`, and locally captured function-tool results are preserved over model-visible output wrappers, including explicit `null` results.

**Demo And Docs**

Add `apps/demo-openai-agents` with deterministic tests, passing refund evals, and failing examples that are skipped without `OPENAI_API_KEY`. Demo eval scripts now share a default replay env of `auto`, with recordings under `.vitest-evals/recordings`, while still respecting explicit caller overrides. Each harness README includes a minimal `describeEval(..., { harness }, ...)` example so the public API shape is visible next to harness construction.

**Test Coverage**

Root test scripts now include `scripts` so shared eval CLI helper tests run under `pnpm test` and CI.

Validated with `pnpm exec biome lint .`, `pnpm run typecheck`, `pnpm run test`, `pnpm release:check`, `pnpm run build`, and `pnpm --dir apps/demo-openai-agents run evals`.

Fixes GH-51
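Moving replay configuration to the harness boundary means the decision to replay a tool is resolved from one harness-level `toolReplay` object rather than per-tool opt-in flags. A hedged sketch of that resolution, with a hypothetical config shape (the real `toolReplay` options may differ):

```typescript
type ReplayMode = "record" | "replay" | "auto";

// Hypothetical harness-level config: replay policy declared once at the
// harness boundary instead of opt-in flags on each tool definition.
interface HarnessOptions {
  toolReplay?: {
    mode: ReplayMode;
    tools?: string[]; // restrict the cassette to these tools; default: all
  };
}

// Decide whether a given tool call goes through the replay cassette.
function shouldReplayTool(options: HarnessOptions, toolName: string): boolean {
  const replay = options.toolReplay;
  if (!replay) return false; // no harness-level policy: run live
  if (replay.tools && !replay.tools.includes(toolName)) return false;
  return true; // the cassette handles record vs. replay per mode
}
```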