…xecution

- Override handle_error in ThreadingLMServer to silently ignore BrokenPipeError
  and ConnectionResetError (expected when subprocess environments disconnect)
- Update LMRequestHandler.handle() to catch these errors before attempting
  to send error responses on broken connections

This keeps benchmark output clean when running parallel samples.
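The override described above can be sketched as follows. This is a minimal, assumed shape — the PR's actual `ThreadingLMServer` may differ — using the standard-library `socketserver` base class and its `handle_error` hook:

```python
import socketserver
import sys

class ThreadingLMServer(socketserver.ThreadingTCPServer):
    """Sketch of a threaded server that ignores client-disconnect errors."""
    daemon_threads = True
    allow_reuse_address = True

    def handle_error(self, request, client_address):
        # Swallow errors raised when the client has already disconnected
        # (expected when subprocess environments drop their connection);
        # defer to the default traceback printer for everything else.
        exc_type = sys.exc_info()[0]
        if exc_type is not None and issubclass(
            exc_type, (BrokenPipeError, ConnectionResetError)
        ):
            return
        super().handle_error(request, client_address)
```

`socketserver.BaseServer.handle_error` is called from inside the `except` block that wraps request processing, so `sys.exc_info()` reliably identifies the in-flight exception here.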

Add a unified model specification format using a colon separator:
- --models backend:model [backend:model ...]
- Examples: openai:gpt-4o, anthropic:claude-sonnet-4-20250514
- Model-only format defaults to openai (e.g., "gpt-4o" -> "openai:gpt-4o")

Supports comparing multiple models in a single run:
  python -m benchmarks.cli run --benchmark niah \
    --models openai:gpt-4o anthropic:claude-sonnet-4-20250514

Legacy --backend and --model arguments still work for backward compatibility.
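The spec parsing described above can be sketched as below. `parse_model_spec` is a hypothetical helper name — the CLI's actual code may differ — but the behavior matches the examples: split on the first colon, defaulting the backend to `openai` when only a model is given:

```python
def parse_model_spec(spec: str, default_backend: str = "openai") -> tuple[str, str]:
    """Parse a "backend:model" spec into a (backend, model) pair."""
    backend, sep, model = spec.partition(":")
    if not sep:
        # No colon: the spec is a bare model name, so apply the default backend.
        return default_backend, spec
    return backend, model
```

For example, `parse_model_spec("gpt-4o")` yields `("openai", "gpt-4o")`, while `parse_model_spec("anthropic:claude-sonnet-4-20250514")` yields `("anthropic", "claude-sonnet-4-20250514")`.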
@ShaneIsley ShaneIsley merged commit 57fe93e into main Jan 16, 2026
3 checks passed
