@ShaneIsley
Owner

No description provided.

Always set RLM verbose=False in the benchmark runner to prevent RLM's
configuration box output from interleaving with the progress bar.

The runner's -v flag now only controls per-sample result output:
  [✓] Sample niah-0000: {'correct': 1.0, 'f1': 1.0}

This keeps benchmark output clean while still allowing detailed
per-sample visibility when requested.
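
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: `StubRLM`, `format_sample_line`, and `run_samples` are hypothetical names, the stub stands in for the real RLM class, and the `✗` mark for failed samples is an assumption (only the `✓` line appears in the commit message).

```python
from dataclasses import dataclass, field


@dataclass
class StubRLM:
    """Hypothetical stand-in for RLM; it only records the arguments it was given."""
    backend: str
    backend_kwargs: dict = field(default_factory=dict)
    verbose: bool = False


def format_sample_line(sample_id: str, metrics: dict) -> str:
    # Assumption: a sample counts as passed when 'correct' is 1.0.
    mark = "✓" if metrics.get("correct", 0.0) >= 1.0 else "✗"
    return f"[{mark}] Sample {sample_id}: {metrics}"


def run_samples(results, backend, backend_kwargs, verbose=False, rlm_cls=StubRLM):
    # The model is always constructed with verbose=False, regardless of -v,
    # so its own configuration output cannot interleave with the progress bar.
    rlm = rlm_cls(backend=backend, backend_kwargs=backend_kwargs, verbose=False)

    # The runner's verbose flag only decides whether per-sample lines are emitted.
    lines = []
    for sample_id, metrics in results:
        if verbose:
            lines.append(format_sample_line(sample_id, metrics))
    return rlm, lines
```

With `verbose=True`, a single passing sample would produce one line in the format shown in the commit message; with `verbose=False`, no per-sample lines are produced at all.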

Adds TestModelPropagation test class with 4 tests:
- test_runner_config_model_name: verifies model_name in backend_kwargs
- test_runner_config_preserves_extra_kwargs: verifies extra kwargs preserved
- test_model_spec_to_runner: end-to-end model spec parsing test
- test_rlm_receives_correct_config: mocked test verifying RLM receives correct backend_kwargs

These tests confirm that the backend:model CLI syntax correctly propagates
model configuration through BenchmarkRunner to RLM initialization.
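
As a rough illustration of what such propagation tests check, the sketch below parses a `backend:model` spec, folds the model name into `backend_kwargs`, and uses a mock to confirm what the model constructor receives. All helper names here (`parse_model_spec`, `build_backend_kwargs`, `init_rlm`, `check_rlm_receives_correct_config`) and the example backend and model strings are hypothetical; the real BenchmarkRunner and RLM signatures may differ.

```python
from unittest.mock import MagicMock


def parse_model_spec(spec: str):
    """Split 'backend:model' into (backend, model_name); model_name is None if absent."""
    # partition splits on the first colon only, so a model name that itself
    # contains colons is kept whole.
    backend, sep, model = spec.partition(":")
    return backend, (model if sep and model else None)


def build_backend_kwargs(model_name, extra=None):
    # Copy the extra kwargs so caller-supplied options are preserved, then add model_name.
    kwargs = dict(extra or {})
    if model_name is not None:
        kwargs["model_name"] = model_name
    return kwargs


def init_rlm(rlm_cls, spec, extra=None):
    # Hypothetical constructor signature: backend, backend_kwargs, verbose.
    backend, model_name = parse_model_spec(spec)
    return rlm_cls(
        backend=backend,
        backend_kwargs=build_backend_kwargs(model_name, extra),
        verbose=False,
    )


def check_rlm_receives_correct_config():
    # A mock replaces the model class, so no real backend is contacted.
    mock_rlm = MagicMock()
    init_rlm(mock_rlm, "examplebackend:example-model", extra={"temperature": 0.0})
    mock_rlm.assert_called_once_with(
        backend="examplebackend",
        backend_kwargs={"temperature": 0.0, "model_name": "example-model"},
        verbose=False,
    )
    return True
```

Mocking the constructor keeps the check fast and offline: it asserts only on the arguments passed through, which is the property the four tests are described as verifying.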
@ShaneIsley merged commit bb1f2ab into main on Jan 16, 2026
2 of 3 checks passed

3 participants