fix: use aread() to fully consume HTTP response body #1488
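Summary
- Fix rollout hang when using `--use-rollout-routing-replay` with large response payloads
- Replace `response.json()` with `response.aread()` + `json.loads()` to ensure the full body is consumed
- Apply the change to both the `_post()` and `get()` functions in `http_utils.py`

Problem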
When using routing replay, SGLang returns large `routed_experts` data (10-15 MB per response for 8K sequences). The `response.json()` method may not fully read the response body, leaving bytes in the TCP receive buffer. This causes connections to hang in the `CLOSE_WAIT` state, so rollout generation gets stuck at ~99% completion.

Test plan
- Run with `--use-rollout-routing-replay --use-slime-router`
- Monitor connections with `netstat -tnp | grep <rollout_manager_pid>`
- Verify that no connections remain in the `CLOSE_WAIT` state after rollout completes

🤖 Generated with Claude Code
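A minimal sketch of the read pattern from the Summary, assuming an httpx-style async client whose responses expose `raise_for_status()` and an async `aread()`. `read_json_fully` and `post_json` are hypothetical names used for illustration only; `post_json` merely approximates what `_post()` in `http_utils.py` might look like, since its real signature is not shown in this PR. Typing the response as a `Protocol` keeps the sketch runnable without a live server.

```python
import asyncio
import json
from typing import Any, Protocol


class AsyncResponse(Protocol):
    """Minimal response surface assumed here (an httpx.Response provides both)."""

    def raise_for_status(self) -> Any: ...

    async def aread(self) -> bytes: ...


async def read_json_fully(response: AsyncResponse) -> Any:
    # Drain the entire body first, following the PR's approach, so no
    # bytes of a multi-megabyte payload are left unread on the connection.
    body = await response.aread()
    # json.loads accepts bytes directly and detects UTF-8/16/32 encoding.
    return json.loads(body)


async def post_json(client: Any, url: str, payload: Any) -> Any:
    # Hypothetical stand-in for _post(); `client` is assumed to be an
    # httpx.AsyncClient-like object with an async .post(url, json=...).
    response = await client.post(url, json=payload)
    response.raise_for_status()
    return await read_json_fully(response)
```

Passing the raw bytes from `aread()` straight to `json.loads` avoids a separate decode step, because `json.loads` accepts `bytes` input as well as `str`.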