
Wprazuch/byob gym integration #782

Draft
wprazuch wants to merge 4 commits into main from wprazuch/byob-gym-integration

@wprazuch
Contributor

No description provided.

Glorf and others added 4 commits February 25, 2026 19:12
Signed-off-by: Michal Bien <mbien@nvidia.com>
Signed-off-by: Michal Bien <mbien@nvidia.com>
Implements ByobGymHarness(GymHarness) that adapts any BYOB benchmark
(@benchmark + @scorer) to Gym's 2-method resource server interface:
- get_dataset(): loads data, renders prompts, formats for Gym
- verify(): calls BYOB scorer, maps scores to (reward, extracted_answer)
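
The two-method shape described above can be illustrated with a toy adapter. This is only a sketch under stated assumptions: `ToyHarness`, `exact_match`, the row fields, and the prompt template are hypothetical names invented for illustration and are not the actual `ByobGymHarness`, `GymHarness`, or BYOB APIs.

```python
from typing import Callable


def exact_match(response: str, expected: str) -> dict:
    """Hypothetical scorer: returns a dict of named scores."""
    return {"correct": response.strip() == expected}


class ToyHarness:
    """Hypothetical stand-in for a harness exposing get_dataset() and verify()."""

    def __init__(self, rows: list, prompt_template: str,
                 scorer: Callable[[str, str], dict]):
        self.rows = rows
        self.prompt_template = prompt_template
        self.scorer = scorer

    def get_dataset(self) -> list:
        # Load rows, render each prompt, and emit one record per row.
        return [
            {"prompt": self.prompt_template.format(**row),
             "expected": row["answer"]}
            for row in self.rows
        ]

    def verify(self, response: str, expected: str) -> tuple:
        # Call the scorer, then reduce its output to (reward, extracted_answer).
        scores = self.scorer(response, expected)
        reward = float(scores["correct"])
        return reward, response.strip()
```

With a single row such as `{"question": "2+2?", "answer": "4"}`, `get_dataset()` yields one rendered prompt record and `verify()` returns a `(reward, extracted_answer)` pair for a model response.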

Score-to-reward mapping uses a priority chain:
correct > judge_score > first bool > first float > mean
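
One possible reading of this priority chain, as a minimal sketch: the function name `score_to_reward`, the assumption that scores arrive as a dict, the use of insertion order for "first", and the 0.0 fallback for an empty dict are all assumptions made here for illustration, not details confirmed by the PR.

```python
from numbers import Real
from statistics import fmean


def score_to_reward(scores: dict) -> float:
    """Collapse a scorer's output dict into one scalar reward (illustrative)."""
    # 1. An explicit "correct" entry wins outright.
    if "correct" in scores:
        return float(scores["correct"])
    # 2. Otherwise use "judge_score" if present.
    if "judge_score" in scores:
        return float(scores["judge_score"])
    # 3. Otherwise the first boolean value, in insertion order.
    for value in scores.values():
        if isinstance(value, bool):
            return float(value)
    # 4. Otherwise the first float value, in insertion order.
    for value in scores.values():
        if isinstance(value, float):
            return value
    # 5. Otherwise the mean of the remaining numeric values
    #    (assumed fallback: 0.0 when there are none).
    numeric = [v for v in scores.values() if isinstance(v, Real)]
    return fmean(numeric) if numeric else 0.0
```

Under this reading, `{"judge_score": 0.25, "passed": True}` maps to 0.25 because the named key outranks any boolean, while `{"a": 0.5, "b": False}` maps to 0.0 because a boolean outranks a float regardless of position.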

Includes 68 unit tests covering score mapping, dataset formatting,
verify pipeline, factory methods, and end-to-end integration.

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wprazuch wprazuch changed the base branch from gym-integration to main February 27, 2026 15:04

2 participants