Add xorl rl_on_policy_target + fix req_to_token_pool slot leak #2
Open
kiddyboots216 wants to merge 2 commits into main
xorl is the new name for the tomni training server. This change:
- Replaces tomni and tomni-batch-invariant with xorl and xorl-batch-invariant as rl_on_policy_target choices
- Updates model_runner batch-invariant check and NCCL init comments
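As a rough illustration of the renamed choices, the option could be declared as below. This is a minimal argparse sketch, not SGLang's actual server-argument code: the parser, the default of None, the help text, and the helper is_batch_invariant are assumptions made for the example. Only the flag name and the two xorl values come from the commit message.

```python
import argparse

# Choice values taken from the commit message; everything else is illustrative.
RL_ON_POLICY_TARGET_CHOICES = ["xorl", "xorl-batch-invariant"]


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that exposes only --rl-on-policy-target."""
    parser = argparse.ArgumentParser(prog="toy-server")
    parser.add_argument(
        "--rl-on-policy-target",
        type=str,
        default=None,  # assumption: the feature is off unless requested
        choices=RL_ON_POLICY_TARGET_CHOICES,
        help="RL on-policy target (toy example).",
    )
    return parser


def is_batch_invariant(target) -> bool:
    """Hypothetical helper: True for the batch-invariant variant."""
    return target is not None and target.endswith("-batch-invariant")
```

With this sketch, passing an old value such as tomni would be rejected by argparse because it is no longer in the choices list.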
When a max_new_tokens=0 (prefill-only) request arrives during an idle window, its ScheduleBatch becomes running_batch with is_prefill_only=True. When normal generation requests are later merged in, merge_batch() never updates is_prefill_only, so get_next_batch_to_run() skips the decode path. Requests allocate pool slots during prefill but never decode, never finish, and never free their slots, which exhausts the req_to_token_pool.

Fix in two places:
1. schedule_batch.py merge_batch(): clear is_prefill_only when merging a batch that contains non-prefill-only requests.
2. scheduler.py: recompute is_prefill_only from actual request state when replacing running_batch with last_batch.
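The merge rule from step 1 can be sketched on a toy model. ToyReq and ToyBatch below are hypothetical stand-ins, reduced to the fields that matter for the flag; they are not SGLang's ScheduleBatch or its request class, and the real merge_batch() does much more than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReq:
    """Hypothetical request: max_new_tokens == 0 means prefill-only."""
    rid: str
    max_new_tokens: int

    @property
    def is_prefill_only(self) -> bool:
        return self.max_new_tokens == 0


@dataclass
class ToyBatch:
    """Hypothetical stand-in for ScheduleBatch."""
    reqs: List[ToyReq] = field(default_factory=list)
    is_prefill_only: bool = False

    @classmethod
    def from_reqs(cls, reqs: List[ToyReq]) -> "ToyBatch":
        # A non-empty batch is prefill-only when every request in it is.
        flag = bool(reqs) and all(r.is_prefill_only for r in reqs)
        return cls(reqs=list(reqs), is_prefill_only=flag)

    def merge_batch(self, other: "ToyBatch") -> None:
        self.reqs.extend(other.reqs)
        # The fix: stay prefill-only only if both halves were prefill-only.
        self.is_prefill_only = self.is_prefill_only and other.is_prefill_only


# A prefill-only batch becomes the running batch, then a normal request merges in.
running = ToyBatch.from_reqs([ToyReq("probe", max_new_tokens=0)])
running.merge_batch(ToyBatch.from_reqs([ToyReq("gen", max_new_tokens=16)]))
```

After the merge, running.is_prefill_only is False in this sketch, so a scheduler that consults the flag would take the decode path again.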
qywu
approved these changes
Mar 23, 2026
zzz0906
approved these changes
Mar 23, 2026
Summary
- Add xorl as a valid --rl-on-policy-target choice alongside tomni
- Fix req_to_token_pool slot leak caused by stale is_prefill_only flag on ScheduleBatch

Bug details
During RL training with Qwen3-235B (8-node training + SGLang inference), SGLang's req_to_token_pool runs out of all 128 slots and the server becomes unresponsive (HTTP 503).

Root cause: The overlap scheduler's is_prefill_only flag on ScheduleBatch is never updated after batch merges. When a max_new_tokens=0 request creates a batch with is_prefill_only=True, and that batch becomes running_batch, subsequent normal generation requests get merged in but the flag stays True. This causes get_next_batch_to_run() to skip the decode path: requests allocate pool slots during prefill but never decode, never finish, and never free their slots.

Fix (4 lines):
- schedule_batch.py: Update is_prefill_only in merge_batch() so that it is True only when ALL merged requests are prefill-only
- scheduler.py: Recompute is_prefill_only from actual request state when replacing running_batch with last_batch

Test plan
- --rl-on-policy-target xorl launches SGLang with Qwen3-235B-A22B on 8×H100
- req_to_token_pool leak warnings in SGLang logs
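
To make the leak mechanism from the bug details concrete, here is a toy simulation of a slot pool driven by a stale flag. Everything in it is an assumption for illustration: ToySlotPool, simulate, and the rule that each request finishes after a single decode step are hypothetical and far simpler than SGLang's scheduler. Only the pool size of 128 and the idea of recomputing the flag from request state come from the description above.

```python
class ToySlotPool:
    """Hypothetical fixed-size slot pool."""

    def __init__(self, size: int) -> None:
        self.free_slots = list(range(size))

    def alloc(self):
        # Return a free slot index, or None when the pool is exhausted.
        return self.free_slots.pop() if self.free_slots else None

    def free(self, slot: int) -> None:
        self.free_slots.append(slot)


def simulate(num_requests: int, pool_size: int, recompute_flag: bool) -> int:
    """Return how many requests were admitted before the pool ran dry."""
    pool = ToySlotPool(pool_size)
    running = []  # list of (slot, max_new_tokens) pairs
    # Stale state left behind by an earlier max_new_tokens=0 request.
    is_prefill_only = True

    for admitted in range(num_requests):
        slot = pool.alloc()  # prefill takes a slot
        if slot is None:
            return admitted
        running.append((slot, 1))  # a normal request that wants one token

        if recompute_flag:
            # Recompute the flag from the requests actually in the batch.
            is_prefill_only = all(max_new == 0 for _, max_new in running)

        if not is_prefill_only:
            # Decode path: requests finish and give their slots back.
            for used_slot, _ in running:
                pool.free(used_slot)
            running = []

    return num_requests
```

Under these assumptions, simulate(200, 128, recompute_flag=False) stops after 128 admitted requests because the decode path is never taken, while simulate(200, 128, recompute_flag=True) admits all 200.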