Claude/rlm testing framework byf lb #20

ShaneIsley · 2026-01-16T22:13:08Z

No description provided.

Always set RLM verbose=False in benchmark runner to prevent RLM's configuration box output from interleaving with the progress bar.

The runner's -v flag now only controls per-sample result output:

    [✓] Sample niah-0000: {'correct': 1.0, 'f1': 1.0}

This keeps benchmark output clean while still allowing detailed per-sample visibility when requested.
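The behavior described in this commit could be sketched roughly as follows. This is an illustrative reconstruction, not the code in the PR: `FakeRLM`, `SketchRunner`, `report`, and the constructor signature are hypothetical names, and the ✗ mark for incorrect samples is an assumption (the commit message only shows the ✓ case).

```python
from dataclasses import dataclass


@dataclass
class FakeRLM:
    """Hypothetical stand-in for the real RLM class; its signature is assumed."""
    backend: str
    backend_kwargs: dict
    verbose: bool = False


class SketchRunner:
    """Hypothetical runner showing the two separate verbosity controls."""

    def __init__(self, backend, model_name, verbose=False, rlm_cls=FakeRLM):
        # The runner's own -v flag: controls per-sample result lines only.
        self.verbose = verbose
        # RLM verbosity is always off, regardless of the runner's flag, so
        # RLM's configuration box cannot interleave with the progress bar.
        self.rlm = rlm_cls(
            backend=backend,
            backend_kwargs={"model_name": model_name},
            verbose=False,
        )

    def report(self, sample_id, metrics):
        """Return a per-sample line when -v is set, otherwise None."""
        if not self.verbose:
            return None
        # Assumption: a sample counts as passed when 'correct' equals 1.0.
        mark = "✓" if metrics.get("correct") == 1.0 else "✗"
        return f"[{mark}] Sample {sample_id}: {metrics}"


if __name__ == "__main__":
    runner = SketchRunner("some-backend", "some-model", verbose=True)
    print(runner.report("niah-0000", {"correct": 1.0, "f1": 1.0}))
```

Keeping the two flags separate means a quiet run shows only the progress bar, while `-v` adds one line per sample without pulling in RLM's own logging.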

Adds TestModelPropagation test class with 4 tests:

- test_runner_config_model_name: verifies model_name in backend_kwargs
- test_runner_config_preserves_extra_kwargs: verifies extra kwargs preserved
- test_model_spec_to_runner: end-to-end model spec parsing test
- test_rlm_receives_correct_config: mocked test verifying RLM receives correct backend_kwargs

These tests confirm that the backend:model CLI syntax correctly propagates model configuration through BenchmarkRunner to RLM initialization.
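A minimal sketch of what such propagation tests might look like is given below. Only the test names and the `backend:model` syntax come from the commit message; `parse_model_spec`, `build_backend_kwargs`, `make_rlm`, and the RLM constructor arguments are hypothetical helpers written for illustration, and the real BenchmarkRunner API may differ.

```python
from unittest.mock import MagicMock


def parse_model_spec(spec):
    """Split a 'backend:model' string; the model part is optional (assumed)."""
    backend, sep, model = spec.partition(":")
    return backend, (model if sep else None)


def build_backend_kwargs(model_name, **extra):
    """Merge model_name into backend kwargs without dropping extra entries."""
    kwargs = dict(extra)
    if model_name is not None:
        kwargs["model_name"] = model_name
    return kwargs


def make_rlm(rlm_cls, spec, **extra):
    """Hypothetical glue: parse the spec and construct the RLM object."""
    backend, model = parse_model_spec(spec)
    return rlm_cls(
        backend=backend,
        backend_kwargs=build_backend_kwargs(model, **extra),
        verbose=False,
    )


def test_runner_config_model_name():
    assert build_backend_kwargs("my-model")["model_name"] == "my-model"


def test_runner_config_preserves_extra_kwargs():
    kwargs = build_backend_kwargs("my-model", temperature=0.0)
    assert kwargs == {"model_name": "my-model", "temperature": 0.0}


def test_rlm_receives_correct_config():
    # A mock replaces the RLM class so no real backend is contacted.
    rlm_cls = MagicMock()
    make_rlm(rlm_cls, "my-backend:my-model", temperature=0.0)
    rlm_cls.assert_called_once_with(
        backend="my-backend",
        backend_kwargs={"model_name": "my-model", "temperature": 0.0},
        verbose=False,
    )
```

Mocking the RLM class keeps the test fast and offline while still checking the exact arguments that reach initialization.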

claude added 2 commits January 16, 2026 21:53

ShaneIsley merged commit bb1f2ab into main Jan 16, 2026
2 of 3 checks passed
