Fix dns-blacklisting reward bug, standardize task.toml format #22

Merged
benediktstroebl merged 2 commits into main from feature/testing-28r on Mar 24, 2026

@benediktstroebl
Collaborator

Tested all 8 recipes with the oracle agent (8/8 pass) and 5 of them with live codex/gpt-5-mini (5/5 pass). Found and fixed:

  • dns-blacklisting `test.sh` always reported reward=1 regardless of test outcome. The `|| true` + `PIPESTATUS[0]` pattern was broken: a deliberately failing agent run still got reward=1. Replaced it with the standard `$?` pattern used by all other recipes.
  • simple-task and multi-reward `task.toml` used `memory = "2G"` / `storage = "10G"` while the other 6 recipes used `memory_mb` / `storage_mb`. Standardized to the majority format.
  • multi-reward README buried the required `-c config.yaml` flag in a "Metrics note" section. Without it, `harbor run` crashes with `ValueError: Expected exactly one key in reward dictionary, got 2`. Moved it front and center.
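
For reviewers unfamiliar with the first bug, here is a minimal sketch of how a `|| true` + `PIPESTATUS[0]` check can end up always reporting success. This is not the actual `test.sh`. The `run_tests` function and the `broken_reward` / `fixed_reward` variables are hypothetical stand-ins, and the real script may differ in detail (for example, it may pipe test output through another command).

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real test command; it always fails.
run_tests() { return 1; }

# Broken pattern: when run_tests fails, `true` runs next, and bash
# overwrites PIPESTATUS with the status of `true`, i.e. (0).
run_tests || true
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
  broken_reward=1
else
  broken_reward=0
fi

# $? pattern: capture the exit status immediately after the command,
# before any other command can overwrite it.
run_tests
status=$?
if [ "$status" -eq 0 ]; then
  fixed_reward=1
else
  fixed_reward=0
fi

echo "broken_reward=$broken_reward fixed_reward=$fixed_reward"
```

With the failing stand-in, the first check still yields a reward of 1 while the second yields 0. Note that under `set -e` a bare failing command would abort the script, so a script using errexit would need a variant such as `status=0; run_tests || status=$?`.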

- dns-blacklisting test.sh: the || true + PIPESTATUS[0] pattern always
  reported reward=1 regardless of test outcome. Replaced with standard
  $? pattern used by all other recipes.
- simple-task and multi-reward task.toml: standardize memory/storage
  fields to memory_mb/storage_mb matching the other 6 recipes.
- multi-reward README: move the required -c config.yaml flag into the
  primary run command since harbor run crashes without it.
@benediktstroebl benediktstroebl merged commit 9fd8cef into main Mar 24, 2026
2 checks passed
@benediktstroebl benediktstroebl deleted the feature/testing-28r branch March 24, 2026 19:52