@alt-glitch alt-glitch commented Jan 6, 2026

Hey @alexzhang13! Adding an example from the Oolong benchmark to examples/.
Running it gives a proper RLM rollout that people can look at and play with.

PR Summary

  • Adds an example demonstrating recursive RLM rollouts on the Oolong benchmark
  • Previously, examples lacked coverage of recursive calls
  • Loads context and question from oolongbench/oolong-real dataset
  • Runs RLM completion with logging enabled
  • Validates response against expected answer

@alt-glitch alt-glitch force-pushed the examples/oolong-benchmark branch from 87757d3 to 6367b6d Compare January 6, 2026 19:00
ShaneIsley pushed a commit to ShaneIsley/rlm that referenced this pull request Jan 14, 2026
Original-Author: alt-glitch (balyan.sid@gmail.com)
Upstream-PR: alexzhang13#34
Comment on lines +25 to +29
def load_oolong_row(index: int = 1) -> dict:
    """Load a single row from the Oolong benchmark."""
    streaming_ds = load_dataset("oolongbench/oolong-real", "toy_dnd", split="test", streaming=True)
    row = next(islice(streaming_ds, index, index + 1))
    return row
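
For readers unfamiliar with the `islice` idiom in the snippet above, here is a minimal sketch of how it picks one row out of a stream without materializing the rest. The `nth_item` helper and the `fake_rows` generator are hypothetical stand-ins for illustration (they are not part of the PR or the dataset library). Note that indexing is zero-based, so the default `index=1` selects the second row.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def nth_item(stream: Iterable[T], index: int) -> T:
    """Return the item at zero-based `index`, consuming the stream only up to it."""
    # islice skips the first `index` items and yields exactly one;
    # next() raises StopIteration if the stream is shorter than index + 1.
    return next(islice(stream, index, index + 1))


def fake_rows() -> Iterator[dict]:
    """Hypothetical stand-in for a streaming dataset: yields rows lazily."""
    for i in range(5):
        yield {"id": i}


second_row = nth_item(fake_rows(), 1)  # same shape of call as load_oolong_row(index=1)
first_row = nth_item(fake_rows(), 0)
```

An out-of-range index raises `StopIteration`, so a caller that accepts arbitrary indices may want to catch it and report a clearer error.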

Suggested change
def cp1252_fix(row: dict | str) -> dict | str:
    """Fixes cp1252 encoding issues in dataset rows."""
    if isinstance(row, dict):
        for key, value in row.items():
            if isinstance(value, str):
                row[key] = value.encode('cp1252', 'replace').decode('cp1252')
    else:
        row = row.encode('cp1252', 'replace').decode('cp1252')
    return row


def load_oolong_row(index: int = 1) -> dict:
    """Load a single row from the Oolong benchmark."""
    streaming_ds = load_dataset("oolongbench/oolong-real", "toy_dnd", split="test", streaming=True)
    row = next(islice(streaming_ds, index, index + 1))
    return cp1252_fix(row)

Thanks for sharing this example! I noticed that cp1252 decoding is broken on Windows, so here's a quick fix for that.
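
To make the effect of that round trip concrete, here is a small sketch of what `encode('cp1252', 'replace').decode('cp1252')` does to a string. The round trip is lossy: characters that cp1252 can represent (such as `é`) survive, while anything else is replaced with `?`. The helper names and the sample row below are hypothetical, made up for illustration, and the copy-returning variant is just one possible alternative to mutating the row in place.

```python
def cp1252_roundtrip(text: str) -> str:
    """Replace every character that cp1252 cannot encode with '?'."""
    return text.encode("cp1252", "replace").decode("cp1252")


def cp1252_fix_copy(row: dict) -> dict:
    """Hypothetical non-mutating variant: returns a new dict, leaves `row` untouched."""
    return {
        key: cp1252_roundtrip(value) if isinstance(value, str) else value
        for key, value in row.items()
    }


# Made-up sample row; the arrow (U+2192) is not representable in cp1252.
sample = {"question": "d20 \u2192 17", "answer": "Caf\u00e9 owner", "turns": 3}
cleaned = cp1252_fix_copy(sample)
```

Here `cleaned["question"]` becomes `"d20 ? 17"` while `cleaned["answer"]` is unchanged, and `sample` itself is not modified. If the replaced characters matter for the benchmark answer, it may be safer to leave the data alone and change the output encoding instead.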
