Integrate sglang disagg models running on SLURM Cluster #42
+490
−19
Motivation
Integrate sglang disagg models running on SLURM Cluster
Technical Details
Refactor madengine to run the models under sglang_disagg in the MAD-private repo:
(1) The adopted models have been added to models.json.
(2) The interface is the same as in legacy madengine, e.g., madengine run --tags sglang_disagg_pd_qwen3-32B --additional-context "{'slurm_args': {'FRAMEWORK': 'sglang_disagg', 'PREFILL_NODES': '2', 'DECODE_NODES': '2', 'PARTITION': 'amd-rccl', 'TIME': '12:00:00', 'DOCKER_IMAGE': ''}}"
(3) A slurm_args field is added to the context. Its fields are FRAMEWORK, PREFILL_NODES, DECODE_NODES, PARTITION, TIME, and DOCKER_IMAGE. If DOCKER_IMAGE is empty, the default image in run.sh is used. MODEL_NAME is read from the args attribute of the selected model's entry in models.json: it is the value that follows --model, e.g., DeepSeek-V2 for --model DeepSeek-V2.
(4) If the flow finds slurm_args in the context, it executes the script 'scripts/sglang_disagg/run.sh' to submit the job to the SLURM cluster directly, skipping the run_model function that builds the Docker image and runs the container.
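For reviewers, points (3) and (4) can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names (parse_additional_context, model_name_from_args, build_slurm_env, submit) are hypothetical, and passing the values to run.sh as environment variables is an assumption, since the description does not say how the script receives them.

```python
import ast
import os
import shlex
import subprocess

# Field names taken from the PR description.
SLURM_FIELDS = ("FRAMEWORK", "PREFILL_NODES", "DECODE_NODES",
                "PARTITION", "TIME", "DOCKER_IMAGE")


def parse_additional_context(raw):
    """Parse the --additional-context string (a Python-style dict literal)."""
    context = ast.literal_eval(raw)
    if not isinstance(context, dict):
        raise ValueError("additional context must be a dict literal")
    return context


def model_name_from_args(args):
    """Return the value following --model in a models.json 'args' string."""
    tokens = shlex.split(args)
    for i, tok in enumerate(tokens):
        if tok == "--model" and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith("--model="):
            return tok.split("=", 1)[1]
    raise ValueError("no --model value found in args")


def build_slurm_env(context, model_entry):
    """Return the variables for run.sh, or None if slurm_args is absent."""
    slurm_args = context.get("slurm_args")
    if not slurm_args:
        return None  # caller falls back to the regular run_model path
    # Empty values (e.g. DOCKER_IMAGE == '') are omitted so that run.sh
    # can apply its own default (assumed behaviour of run.sh).
    env = {k: str(slurm_args[k]) for k in SLURM_FIELDS if slurm_args.get(k)}
    env["MODEL_NAME"] = model_name_from_args(model_entry["args"])
    return env


def submit(env, script="scripts/sglang_disagg/run.sh"):
    """Run the submission script with the variables exported (assumption)."""
    subprocess.run(["bash", script], env={**os.environ, **env}, check=True)


if __name__ == "__main__":
    raw = ("{'slurm_args': {'FRAMEWORK': 'sglang_disagg', 'PREFILL_NODES': '2', "
           "'DECODE_NODES': '2', 'PARTITION': 'amd-rccl', 'TIME': '12:00:00', "
           "'DOCKER_IMAGE': ''}}")
    print(build_slurm_env(parse_additional_context(raw),
                          {"args": "--model DeepSeek-V2"}))
```

In this sketch a caller would invoke submit(env) when build_slurm_env returns a dict and call run_model otherwise; ast.literal_eval is used because the context string uses single quotes and is therefore not valid JSON.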
Test Plan
Test Result
Submission Checklist