Add xorl rl_on_policy_target + fix req_to_token_pool slot leak #2

Open
kiddyboots216 wants to merge 2 commits into main from xorl-rl-target


kiddyboots216 commented Mar 20, 2026

Summary

  • Add xorl as a valid --rl-on-policy-target choice alongside tomni
  • Fix req_to_token_pool slot leak caused by stale is_prefill_only flag on ScheduleBatch

Bug details

During RL training with Qwen3-235B (8-node training + SGLang inference), SGLang's req_to_token_pool exhausts all 128 slots and becomes unresponsive (HTTP 503).

Root cause: The overlap scheduler's is_prefill_only flag on ScheduleBatch is never updated after batch merges. When a max_new_tokens=0 request creates a batch with is_prefill_only=True, and that batch becomes running_batch, subsequent normal generation requests get merged in but the flag stays True. As a result, get_next_batch_to_run() skips the decode path: requests allocate pool slots during prefill but are never decoded, never finish, and never free their slots.
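The failure mode can be reproduced with a toy model. This is not SGLang code: Req, ToyBatch, prefill, step and steps_until_exhausted are hypothetical stand-ins, a plain list of free indices stands in for req_to_token_pool, and the 4-token generation length is arbitrary. It models only the stale is_prefill_only flag on a merged batch.

```python
from dataclasses import dataclass, field
from typing import List

POOL_SIZE = 128  # slot count reported above


@dataclass
class Req:
    max_new_tokens: int
    slot: int = -1
    generated: int = 0

    @property
    def finished(self) -> bool:
        return self.generated >= self.max_new_tokens


@dataclass
class ToyBatch:
    reqs: List[Req] = field(default_factory=list)
    is_prefill_only: bool = False

    def merge_batch(self, other: "ToyBatch") -> None:
        # Buggy behaviour: requests are merged, but is_prefill_only is never updated.
        self.reqs.extend(other.reqs)


def prefill(free_slots: List[int], reqs: List[Req]) -> ToyBatch:
    for r in reqs:
        r.slot = free_slots.pop()  # prefill takes one pool slot per request
    return ToyBatch(reqs, all(r.max_new_tokens == 0 for r in reqs))


def step(free_slots: List[int], batch: ToyBatch) -> None:
    if not batch.is_prefill_only:
        for r in batch.reqs:
            r.generated += 1  # decode path: one token per request
    # Only finished requests hand their slot back to the pool.
    done = [r for r in batch.reqs if r.finished]
    free_slots.extend(r.slot for r in done)
    batch.reqs = [r for r in batch.reqs if not r.finished]


def steps_until_exhausted(max_steps: int = 1000) -> int:
    free_slots = list(range(POOL_SIZE))
    # A max_new_tokens=0 request arrives in an idle window and becomes running_batch.
    running_batch = prefill(free_slots, [Req(max_new_tokens=0)])
    step(free_slots, running_batch)  # it finishes and frees its slot; the flag stays True
    for i in range(max_steps):
        if not free_slots:
            return i  # every slot is held by a request that will never decode
        running_batch.merge_batch(prefill(free_slots, [Req(max_new_tokens=4)]))
        step(free_slots, running_batch)
    return -1


if __name__ == "__main__":
    print(steps_until_exhausted())  # 128
```

Each merged request takes a slot during prefill, but because the running batch still claims to be prefill-only, no request is ever decoded, so the pool drains one slot per arrival.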

Fix (4 lines):

  1. schedule_batch.py: Update is_prefill_only in merge_batch() so that it is True only when ALL merged requests are prefill-only
  2. scheduler.py: Recompute is_prefill_only from actual request state when replacing running_batch with last_batch
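A minimal sketch of the two fixes, using hypothetical stand-in classes rather than SGLang's real ScheduleBatch: merge_batch ANDs the two flags, and recompute_is_prefill_only (a hypothetical helper name) derives the flag from the requests the batch actually holds. Treating an empty batch as not prefill-only is a choice made for this sketch, not something the PR specifies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Req:
    max_new_tokens: int

    @property
    def is_prefill_only(self) -> bool:
        # A max_new_tokens=0 request only needs prefill.
        return self.max_new_tokens == 0


@dataclass
class ToyBatch:
    reqs: List[Req] = field(default_factory=list)
    is_prefill_only: bool = False

    def merge_batch(self, other: "ToyBatch") -> None:
        self.reqs.extend(other.reqs)
        # Fix 1: the merged batch stays prefill-only only if both inputs were.
        self.is_prefill_only = self.is_prefill_only and other.is_prefill_only


def recompute_is_prefill_only(batch: ToyBatch) -> ToyBatch:
    # Fix 2: when last_batch is about to become running_batch, rebuild the
    # flag from its requests instead of trusting the value it carried.
    # An empty batch is treated as not prefill-only (a choice for this sketch).
    batch.is_prefill_only = bool(batch.reqs) and all(r.is_prefill_only for r in batch.reqs)
    return batch


if __name__ == "__main__":
    running = ToyBatch([Req(max_new_tokens=0)], is_prefill_only=True)
    running.merge_batch(ToyBatch([Req(max_new_tokens=16)], is_prefill_only=False))
    print(running.is_prefill_only)  # False
```

Either fix alone clears the flag in the scenario above; keeping both means a stale flag cannot survive through either code path.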

Test plan

  • Verified --rl-on-policy-target xorl launches SGLang with Qwen3-235B-A22B on 8×H100
  • Ran 4+ steps of RL training (32×32 batch, pipeline mode) — SGLang stays healthy (HTTP 200)
  • Verified 0 req_to_token_pool leak warnings in SGLang logs
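One way to repeat the leak check from the test plan is to count matching lines in the SGLang server log. The exact wording of SGLang's leak warning is not given here, so the pattern in this sketch (any line mentioning both req_to_token_pool and "leak", case-insensitive) is an assumption to adjust against real logs; count_leak_warnings and leak_warnings_in_file are hypothetical helper names.

```python
import re
from typing import Iterable

# Assumed pattern: any line that mentions both the pool name and "leak".
LEAK_RE = re.compile(r"req_to_token_pool.*leak|leak.*req_to_token_pool", re.IGNORECASE)


def count_leak_warnings(lines: Iterable[str]) -> int:
    """Count log lines that look like req_to_token_pool leak warnings."""
    return sum(1 for line in lines if LEAK_RE.search(line))


def leak_warnings_in_file(path: str) -> int:
    # Tolerate odd bytes in long-running server logs.
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_leak_warnings(f)
```

A result of 0 from leak_warnings_in_file on the server log corresponds to the "0 leak warnings" check above, provided the pattern really matches SGLang's message.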

kiddyboots216 force-pushed the xorl-rl-target branch 3 times, most recently from ddd0fb2 to a86c591 on March 20, 2026 10:10
kiddyboots216 changed the title from "Add xorl as rl_on_policy_target choice" to "Add xorl rl_on_policy_target + fix req_to_token_pool slot leak" on Mar 20, 2026

xorl is the new name for the tomni training server. This change:
- Replaces tomni and tomni-batch-invariant with xorl and
  xorl-batch-invariant as rl_on_policy_target choices
- Updates model_runner batch-invariant check and NCCL init comments
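The target change amounts to new command-line choices plus a string check. This sketch follows the commit message (xorl and xorl-batch-invariant replace tomni and tomni-batch-invariant); any other values the real --rl-on-policy-target flag accepts are omitted, and uses_batch_invariant is a hypothetical helper, not the actual model_runner code.

```python
import argparse
from typing import Optional

# Values named in this PR only; other existing choices are omitted here.
RL_ON_POLICY_TARGETS = ["xorl", "xorl-batch-invariant"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # argparse stores this flag as args.rl_on_policy_target.
    parser.add_argument("--rl-on-policy-target", choices=RL_ON_POLICY_TARGETS, default=None)
    return parser


def uses_batch_invariant(target: Optional[str]) -> bool:
    # Hypothetical stand-in for the model_runner batch-invariant check.
    return target == "xorl-batch-invariant"


if __name__ == "__main__":
    args = build_parser().parse_args(["--rl-on-policy-target", "xorl"])
    print(args.rl_on_policy_target, uses_batch_invariant(args.rl_on_policy_target))  # xorl False
```

Using choices lets argparse reject a misspelled target at launch (it exits with a usage error) instead of failing later inside the model runner.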

When a max_new_tokens=0 (prefill-only) request arrives during an idle
window, its ScheduleBatch becomes running_batch with is_prefill_only=True.
When normal generation requests are later merged in, merge_batch() never
updates is_prefill_only, so get_next_batch_to_run() skips the decode
path. Requests allocate pool slots during prefill but never decode, never
finish, and never free their slots, exhausting the req_to_token_pool.

Fix in two places:

1. schedule_batch.py merge_batch(): clear is_prefill_only when merging
   a batch that contains non-prefill-only requests.

2. scheduler.py: recompute is_prefill_only from actual request state
   when replacing running_batch with last_batch.