…xecution

- Override handle_error in ThreadingLMServer to silently ignore BrokenPipeError
  and ConnectionResetError (expected when subprocess environments disconnect)
- Update LMRequestHandler.handle() to catch these errors before attempting
  to send error responses on broken connections

This keeps benchmark output clean when running parallel samples.
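The override described above can be sketched as follows. This is a minimal, assumed shape — the PR's actual `ThreadingLMServer` may differ — using the standard-library `socketserver` base class and its `handle_error` hook:

```python
import socketserver
import sys

class ThreadingLMServer(socketserver.ThreadingTCPServer):
    """Sketch of a threaded server that ignores client-disconnect errors."""
    daemon_threads = True
    allow_reuse_address = True

    def handle_error(self, request, client_address):
        # Swallow errors raised when the client has already disconnected
        # (expected when subprocess environments drop their connection);
        # defer to the default traceback printer for everything else.
        exc_type = sys.exc_info()[0]
        if exc_type is not None and issubclass(
            exc_type, (BrokenPipeError, ConnectionResetError)
        ):
            return
        super().handle_error(request, client_address)
```

`socketserver.BaseServer.handle_error` is called from inside the `except` block that wraps request processing, so `sys.exc_info()` reliably identifies the in-flight exception here.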

Add a unified model specification format using a colon separator:
- --models backend:model [backend:model ...]
- Examples: openai:gpt-4o, anthropic:claude-sonnet-4-20250514
- Model-only format defaults to openai (e.g., "gpt-4o" -> "openai:gpt-4o")

Supports comparing multiple models in a single run:
  python -m benchmarks.cli run --benchmark niah \
    --models openai:gpt-4o anthropic:claude-sonnet-4-20250514

Legacy --backend and --model arguments still work for backward compatibility.
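The spec parsing described above can be sketched as below. `parse_model_spec` is a hypothetical helper name — the CLI's actual code may differ — but the behavior matches the examples: split on the first colon, defaulting the backend to `openai` when only a model is given:

```python
def parse_model_spec(spec: str, default_backend: str = "openai") -> tuple[str, str]:
    """Parse a "backend:model" spec into a (backend, model) pair."""
    backend, sep, model = spec.partition(":")
    if not sep:
        # No colon: the spec is a bare model name, so apply the default backend.
        return default_backend, spec
    return backend, model
```

For example, `parse_model_spec("gpt-4o")` yields `("openai", "gpt-4o")`, while `parse_model_spec("anthropic:claude-sonnet-4-20250514")` yields `("anthropic", "claude-sonnet-4-20250514")`.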
@ShaneIsley ShaneIsley merged commit 57fe93e into main Jan 16, 2026
3 checks passed
