Add xorl rl_on_policy_target + fix req_to_token_pool slot leak by kiddyboots216 · Pull Request #2 · togethercomputer/xorl-sglang

kiddyboots216 · 2026-03-20T09:47:46Z

Summary

Add xorl as a valid --rl-on-policy-target choice alongside tomni
Fix req_to_token_pool slot leak caused by stale is_prefill_only flag on ScheduleBatch

Bug details

During RL training with Qwen3-235B (8-node training + SGLang inference), SGLang's req_to_token_pool exhausts all 128 slots and becomes unresponsive (HTTP 503).

Root cause: The overlap scheduler's is_prefill_only flag on ScheduleBatch is never updated after batch merges. When a max_new_tokens=0 request creates a batch with is_prefill_only=True, and that batch becomes running_batch, subsequent normal generation requests get merged in but the flag stays True. This causes get_next_batch_to_run() to skip the decode path — requests allocate pool slots during prefill but never decode, never finish, and never free their slots.
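The failure mode can be reproduced in a toy model. This is a minimal sketch, not SGLang code: `ToyReq`, `ToyBatch`, and `simulate_leak` are hypothetical names, and the pool is reduced to a plain counter. It only shows how a stale flag keeps decode from ever running, so allocated slots are never returned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReq:
    # Hypothetical stand-in for a request; not SGLang's Req class.
    max_new_tokens: int
    generated: int = 0


@dataclass
class ToyBatch:
    # Hypothetical stand-in for ScheduleBatch, reduced to the two relevant fields.
    reqs: List[ToyReq] = field(default_factory=list)
    is_prefill_only: bool = False

    def merge_batch(self, other: "ToyBatch") -> None:
        # Reproduces the bug: requests are merged, the flag is left untouched.
        self.reqs.extend(other.reqs)


def simulate_leak(pool_size: int, incoming: int, decode_steps: int) -> int:
    """Return the number of free slots left after the simulation."""
    free_slots = pool_size
    # State left behind by a max_new_tokens=0 request that became running_batch.
    running = ToyBatch(reqs=[], is_prefill_only=True)

    for _ in range(incoming):
        if free_slots == 0:
            break  # pool exhausted; a real server would start rejecting requests
        free_slots -= 1  # one slot is allocated during prefill
        running.merge_batch(ToyBatch(reqs=[ToyReq(max_new_tokens=2)]))

    for _ in range(decode_steps):
        if running.is_prefill_only:
            continue  # decode path skipped because of the stale flag
        for req in running.reqs:
            req.generated += 1
        done = [r for r in running.reqs if r.generated >= r.max_new_tokens]
        running.reqs = [r for r in running.reqs if r.generated < r.max_new_tokens]
        free_slots += len(done)  # finished requests give their slots back

    return free_slots


print(simulate_leak(pool_size=128, incoming=200, decode_steps=50))  # prints 0
```

In this model no slot is ever freed, however many decode steps are run, which matches the exhaustion described above.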

Fix (4 lines):

schedule_batch.py: Update is_prefill_only in merge_batch() — only True when ALL merged requests are prefill-only
scheduler.py: Recompute is_prefill_only from actual request state when replacing running_batch with last_batch
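The two changes listed above can be sketched roughly as follows. This is an illustration under assumptions, not the actual diff: `ToyReq`, `ToyBatch`, and `recompute_is_prefill_only` are hypothetical names, and the real `ScheduleBatch.merge_batch()` does considerably more than extend a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReq:
    # Hypothetical stand-in for a request; not SGLang's Req class.
    max_new_tokens: int

    @property
    def is_prefill_only(self) -> bool:
        return self.max_new_tokens == 0


@dataclass
class ToyBatch:
    # Hypothetical stand-in for ScheduleBatch.
    reqs: List[ToyReq] = field(default_factory=list)
    is_prefill_only: bool = False

    def merge_batch(self, other: "ToyBatch") -> None:
        self.reqs.extend(other.reqs)
        # Change 1: the merged batch is prefill-only only if both sides were.
        self.is_prefill_only = self.is_prefill_only and other.is_prefill_only


def recompute_is_prefill_only(batch: ToyBatch) -> bool:
    # Change 2: derive the flag from the requests instead of trusting a cached value.
    # Treating an empty batch as not prefill-only is a choice made for this sketch.
    return bool(batch.reqs) and all(r.is_prefill_only for r in batch.reqs)


# A prefill-only running batch receives a normal generation request.
running = ToyBatch(reqs=[ToyReq(max_new_tokens=0)], is_prefill_only=True)
running.merge_batch(ToyBatch(reqs=[ToyReq(max_new_tokens=16)], is_prefill_only=False))
print(running.is_prefill_only)  # prints False
```

Once the flag is cleared, the scheduler can take the decode path again, so the merged requests can finish and release their slots.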

Test plan

Verified --rl-on-policy-target xorl launches SGLang with Qwen3-235B-A22B on 8×H100
Ran 4+ steps of RL training (32×32 batch, pipeline mode) — SGLang stays healthy (HTTP 200)
Verified 0 req_to_token_pool leak warnings in SGLang logs

xorl is the new name for the tomni training server. This change:
- Replaces tomni and tomni-batch-invariant with xorl and xorl-batch-invariant as rl_on_policy_target choices
- Updates model_runner batch-invariant check and NCCL init comments
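For illustration, a renamed choice list of this kind might look like the sketch below. It is written with plain `argparse` and does not reproduce SGLang's server argument code: `build_parser` and `is_batch_invariant` are hypothetical helpers, the suffix check is only a guess at what a batch-invariant check could test, and the real flag may accept further values not shown here.

```python
import argparse
from typing import Optional

# Only the two values named in this commit message; the real list may be longer.
RL_ON_POLICY_TARGET_CHOICES = ["xorl", "xorl-batch-invariant"]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; SGLang defines its arguments elsewhere.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--rl-on-policy-target",
        type=str,
        default=None,
        choices=RL_ON_POLICY_TARGET_CHOICES,
    )
    return parser


def is_batch_invariant(target: Optional[str]) -> bool:
    # Hypothetical helper: treats the "-batch-invariant" suffix as the marker.
    return target is not None and target.endswith("-batch-invariant")


args = build_parser().parse_args(["--rl-on-policy-target", "xorl-batch-invariant"])
print(is_batch_invariant(args.rl_on_policy_target))  # prints True
```

With a list like this, passing the old name tomni would be rejected by `argparse` at startup rather than silently ignored.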

When a max_new_tokens=0 (prefill-only) request arrives during an idle window, its ScheduleBatch becomes running_batch with is_prefill_only=True. When normal generation requests are later merged in, merge_batch() never updates is_prefill_only, so get_next_batch_to_run() skips the decode path. Requests allocate pool slots during prefill but never decode, never finish, and never free their slots, which exhausts the req_to_token_pool.

Fix in two places:
1. schedule_batch.py merge_batch(): clear is_prefill_only when merging a batch that contains non-prefill-only requests.
2. scheduler.py: recompute is_prefill_only from actual request state when replacing running_batch with last_batch.

kiddyboots216 force-pushed the xorl-rl-target branch 3 times, most recently from ddd0fb2 to a86c591 Compare March 20, 2026 10:10

kiddyboots216 changed the title ~~Add xorl as rl_on_policy_target choice~~ Add xorl rl_on_policy_target + fix req_to_token_pool slot leak Mar 20, 2026

kiddyboots216 force-pushed the xorl-rl-target branch from a920a29 to 78fb1c8 Compare March 20, 2026 19:15

kiddyboots216 added 2 commits March 22, 2026 23:08

Rename tomni rl_on_policy_target to xorl

9452505

kiddyboots216 force-pushed the xorl-rl-target branch from 78fb1c8 to b438667 Compare March 23, 2026 06:09

kiddyboots216 requested a review from qywu March 23, 2026 06:46

qywu approved these changes Mar 23, 2026

zzz0906 approved these changes Mar 23, 2026

kiddyboots216 wants to merge 2 commits into main from xorl-rl-target

3 participants
