Feature/batch mode #113


Merged
merged 43 commits into main from feature/batch-mode on Jul 14, 2025
Changes from all commits (43 commits)
507d421
Update load config function to allow user to pass in config path in a…
XkunW May 8, 2025
e8f2864
Fix docstring indentation
XkunW May 13, 2025
99396c3
Add in the only other 2 short long name mappings for vllm args
XkunW May 13, 2025
7c14886
Add batch mode placeholder
XkunW May 15, 2025
bea7c07
Merge branch 'main' into feature/batch-mode
XkunW May 21, 2025
2856ea3
Merge branch 'feature/batch-mode' of https://github.com/VectorInstitu…
XkunW May 21, 2025
1169151
Add description for VLLM_SHORT_TO_LONG_MAP
XkunW Jun 17, 2025
a6e5432
Move slurm templates in new file for readibility and clarity, add tem…
XkunW Jun 18, 2025
cb80e7d
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 18, 2025
70ba53a
Add line breaks and adjust template formatting, add json update comma…
XkunW Jun 26, 2025
200b5b5
Add Batch mode Slurm script generator, renamed SLURM with Slurm in do…
XkunW Jun 26, 2025
5179abd
Add Batch mode launcher helper class
XkunW Jun 26, 2025
b2d0f02
Add BatchLaunchResponse data class, renamed node_list to nodelist
XkunW Jun 26, 2025
6d614aa
Add batch launch model function to client
XkunW Jun 26, 2025
9aeb7a3
Update batch-launch command for CLI
XkunW Jun 26, 2025
c6687d2
Resolve merge conflict
XkunW Jun 26, 2025
f8de9e0
Add Qwen3-14B
XkunW Jul 5, 2025
bb596c2
Remove unnecessary escapes
XkunW Jul 5, 2025
a113116
Change slurm_job_id type from int to string to accomodate het jobs
XkunW Jul 5, 2025
9b6c10e
Change type of slurm job id (int -> str), add het job ID handling
XkunW Jul 5, 2025
c164c3e
Change slurm job id type (str -> int), update batch mode post launch …
XkunW Jul 5, 2025
8e4aa08
Add log_dir to be part of StatusResponse
XkunW Jul 5, 2025
866f65e
Read log dir from output file path in slurm control command stdout in…
XkunW Jul 5, 2025
bc1bd12
Functions taking log_dir as optional param now marked as required par…
XkunW Jul 5, 2025
684a13b
Remove all occurances of asking user to provide log dir optionally, u…
XkunW Jul 5, 2025
2baa3d8
Update batch modemodel launch script template to fix json path slurm …
XkunW Jul 5, 2025
ca0d66d
Merge branch 'main' into feature/batch-mode
XkunW Jul 7, 2025
0eccce3
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 7, 2025
5c610fa
Misc small fixes from mypy, updated formatting
XkunW Jul 7, 2025
a46ea8b
Update example
XkunW Jul 7, 2025
35312a7
Change slurm_job_id type in Response models
XkunW Jul 7, 2025
9f4603d
Change slurm job id type in shutdown
XkunW Jul 8, 2025
d40c34e
Fixed existing tests to accomodate new changes
XkunW Jul 8, 2025
dfb8100
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 8, 2025
93ef90a
Add new client tests for batch mode
XkunW Jul 8, 2025
70da2db
Merge branch 'feature/batch-mode' of https://github.com/VectorInstitu…
XkunW Jul 8, 2025
1e8b61a
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 8, 2025
58bc43d
Remove redundant test
XkunW Jul 9, 2025
ca35604
Update CLI tests, added CLI helper tests
XkunW Jul 14, 2025
7c2208b
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2025
e4d38bd
ruff fix
XkunW Jul 14, 2025
c3a14b7
Update documentation
XkunW Jul 14, 2025
6bd09da
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2025
README.md (5 changes: 3 additions & 2 deletions)

@@ -103,10 +103,11 @@ export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml

#### Other commands

- * `status`: Check the model status by providing its Slurm job ID, `--json-mode` supported.
+ * `batch-launch`: Launch multiple model inference servers at once; currently ONLY single-node models are supported.
+ * `status`: Check the model status by providing its Slurm job ID.
* `metrics`: Streams performance metrics to the console.
* `shutdown`: Shutdown a model by providing its Slurm job ID.
- * `list`: List all available model names, or view the default/cached configuration of a specific model, `--json-mode` supported.
+ * `list`: List all available model names, or view the default/cached configuration of a specific model.
* `cleanup`: Remove old log directories. You can filter by `--model-family`, `--model-name`, `--job-id`, and/or `--before-job-id`. Use `--dry-run` to preview what would be deleted (see the example below).
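
As a quick sketch (the family name is a placeholder; the `cleanup` flags are those listed above), a dry run might look like:

```bash
vec-inf cleanup --model-family <model-family> --dry-run
```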

For more details on the usage of these commands, refer to the [User Guide](https://vectorinstitute.github.io/vector-inference/user_guide/).
docs/user_guide.md (53 changes: 51 additions & 2 deletions)

@@ -4,7 +4,7 @@

### `launch` command

- The `launch` command allows users to deploy a model as a slurm job. If the job successfully launches, a URL endpoint is exposed for the user to send requests for inference.
+ The `launch` command allows users to launch an OpenAI-compatible model inference server as a Slurm job. If the job successfully launches, a URL endpoint is exposed for the user to send requests for inference.

We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
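
A minimal sketch of that command, assuming the model's default cached configuration is used:

```bash
vec-inf launch Meta-Llama-3.1-8B-Instruct
```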

@@ -97,6 +97,53 @@ export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
* For GPU partitions with non-Ampere architectures, e.g. `rtx6000`, `t4v2`, BF16 isn't supported. For models that have BF16 as the default type, when using a non-Ampere GPU, use FP16 instead, i.e. `--dtype: float16`.
* Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set it for models that require multiple nodes of GPUs.

### `batch-launch` command

The `batch-launch` command allows users to launch multiple inference servers at once. Here is an example of launching two models:

```bash
vec-inf batch-launch DeepSeek-R1-Distill-Qwen-7B Qwen2.5-Math-PRM-7B
```

You should see output like the following:

```
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Job Config ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Slurm Job ID │ 17480109 │
│ Slurm Job Name │ BATCH-DeepSeek-R1-Distill-Qwen-7B-Qwen2.5-Math-PRM-7B │
│ Model Name │ DeepSeek-R1-Distill-Qwen-7B │
│ Partition │ a40 │
│ QoS │ m2 │
│ Time Limit │ 08:00:00 │
│ Num Nodes │ 1 │
│ GPUs/Node │ 1 │
│ CPUs/Task │ 16 │
│ Memory/Node │ 64G │
│ Log Directory │ /h/marshallw/.vec-inf-logs/BATCH-DeepSeek-R1-Distill-Qwen-7B-Qwen2.5… │
│ Model Name │ Qwen2.5-Math-PRM-7B │
│ Partition │ a40 │
│ QoS │ m2 │
│ Time Limit │ 08:00:00 │
│ Num Nodes │ 1 │
│ GPUs/Node │ 1 │
│ CPUs/Task │ 16 │
│ Memory/Node │ 64G │
│ Log Directory │ /h/marshallw/.vec-inf-logs/BATCH-DeepSeek-R1-Distill-Qwen-7B-Qwen2.5… │
└────────────────┴─────────────────────────────────────────────────────────────────────────┘
```

The inference servers will begin launching only after all requested resources have been allocated, preventing resource waste. Unlike the `launch` command, `batch-launch` does not accept additional launch parameters from the command line. Users must either:

- Specify a batch launch configuration file using the `--batch-config` option (see the example below), or
- Ensure model launch configurations are available at the default location (cached config or user-defined `VEC_INF_CONFIG`)
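
For example, passing an explicit batch config (the file path below is a placeholder) might look like:

```bash
vec-inf batch-launch DeepSeek-R1-Distill-Qwen-7B Qwen2.5-Math-PRM-7B --batch-config /h/<username>/my-batch-config.yaml
```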

Since batch launches use heterogeneous jobs, users can request different partitions and resource amounts for each model. After launch, you can monitor individual servers using the standard commands (`status`, `metrics`, etc.) by providing the specific Slurm job ID for each server (e.g. 17480109+0, 17480109+1).
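
For instance, a sketch of checking the first server from the batch above:

```bash
vec-inf status 17480109+0
```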

**NOTE**
* Currently only models that can fit on a single node (regardless of the node type) are supported; multi-node launches will be available in a future update.

### `status` command

You can check the inference server status by providing the Slurm job ID to the `status` command:
@@ -138,7 +185,9 @@ There are 5 possible states:
* **FAILED**: Inference server in an unhealthy state. Job failed reason will be shown.
* **SHUTDOWN**: Inference server is shutdown/cancelled.

- Note that the base URL is only available when model is in `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.
+ **Note**
+ * The base URL is only available when the model is in `READY` state.
+ * For servers launched with `batch-launch`, the job ID should follow the format "MAIN_JOB_ID+OFFSET" (e.g. 17480109+0, 17480109+1).

### `metrics` command

examples/slurm_dependency/run_downstream.py (2 changes: 1 addition & 1 deletion)

@@ -9,7 +9,7 @@

if len(sys.argv) < 2:
    raise ValueError("Expected server job ID as the first argument.")
- job_id = int(sys.argv[1])
+ job_id = sys.argv[1]

vi_client = VecInfClient()
print(f"Waiting for SLURM job {job_id} to be ready...")
tests/test_imports.py (1 change: 1 addition & 0 deletions)

@@ -22,6 +22,7 @@ def test_imports(self):
import vec_inf.client._exceptions
import vec_inf.client._helper
import vec_inf.client._slurm_script_generator
import vec_inf.client._slurm_templates
import vec_inf.client._utils
import vec_inf.client.api
import vec_inf.client.config