Releases: VectorInstitute/vector-inference
v0.3.3
03 Sep 21:53
Added missing package in dependencies
Fixed pre-commit hooks
Linted and formatted code
Updated outdated examples
v0.3.2
03 Sep 18:27
Add support for custom models: users can now launch any custom model as long as its architecture is supported by vLLM
Minor updates to multi-node job launching to better support custom models
Add Llama3-OpenBioLLM-70B to supported model list
v0.3.1
29 Aug 13:41
Add model-name argument to the list command to show the default setup of a specific supported model
Improved command option descriptions
Restructured models directory
Add some default values for using a custom model
v0.3.0
29 Aug 06:09
v0.2.1
06 Jul 15:58
Add CodeLlama
Update model variant names for Llama 2 in README
v0.2.0
04 Jul 14:29
Update default environment to use a Singularity container, and add the associated Dockerfile
Update vLLM to 0.5.0, add VLM support (LLaVA-1.5 and LLaVA-NeXT), and update example scripts
Refactor repo structure for a simpler model onboarding and update process
v0.1.1
23 May 20:32
Update vLLM to 0.4.2, which resolves the flash attention package-not-found issue
Update instructions for using the default environment to prevent/resolve the NCCL-not-found error
v0.1.0
24 Apr 20:21
Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM
Supported models and variants:
Command R plus
DBRX: Instruct
Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1
Supported functionalities:
Completions and chat completions
Logits generation
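Since the launched servers use vLLM's OpenAI-compatible API, completions, chat completions, and per-token log-probabilities (the "logits generation" functionality above) can be requested over plain HTTP. The sketch below only assembles a chat-completion request body; the model name is a placeholder and the exact endpoint URL depends on the node and port your Slurm job reports (both are assumptions, not values fixed by this repository).

```python
import json

def chat_completion_body(model, messages, max_tokens=128, top_logprobs=None):
    """Build an OpenAI-compatible /v1/chat/completions request body.

    Setting `top_logprobs` asks the server to also return token
    log-probabilities, which is how logit-level information is exposed
    through the OpenAI-style API.
    """
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    if top_logprobs is not None:
        body["logprobs"] = True
        body["top_logprobs"] = top_logprobs
    return body

# Placeholder model name -- substitute the model your job actually serves.
body = chat_completion_body(
    "Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Hello!"}],
    top_logprobs=5,
)
print(json.dumps(body, indent=2))
# To send it, POST this JSON to http://<node>:<port>/v1/chat/completions
# using any HTTP client (the host and port come from your Slurm job).
```

Plain `/v1/completions` requests work the same way, with a `prompt` string in place of the `messages` list.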