Releases: VectorInstitute/vector-inference
v0.3.3
03 Sep 21:53
Added missing package in dependencies
Fixed pre-commit hooks
Linted and formatted code
Updated outdated examples
v0.3.2
03 Sep 18:27
Add support for custom models: users can now launch any custom model as long as its architecture is supported by vLLM
Minor updates to multi-node job launching to better support custom models
Add Llama3-OpenBioLLM-70B to supported model list
v0.3.1
29 Aug 13:41
Add model-name argument to the list command to show the default setup of a specific supported model
Improved command option descriptions
Restructured models directory
Add some default values for using a custom model
v0.3.0
29 Aug 06:09
v0.2.1
06 Jul 15:58
Add CodeLlama
Update model variant names for Llama 2 in README
v0.2.0
04 Jul 14:29
Update default environment to use a Singularity container, and add the associated Dockerfile
Update vLLM to 0.5.0, add VLM support (LLaVA-1.5 and LLaVA-NeXT), and update example scripts
Refactor repo structure for a simpler model onboarding and update process
v0.1.1
23 May 20:32
Update vLLM to 0.4.2, which resolves the flash attention package-not-found issue
Update instructions for using the default environment to prevent/resolve the NCCL-not-found error
v0.1.0
24 Apr 20:21
Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM
Supported models and variants:
Command R plus
DBRX: Instruct
Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1
Supported functionalities:
Completions and chat completions
Logits generation
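Since the launched servers use vLLM's OpenAI-compatible API, completions, chat completions, and per-token log-probabilities (the "logits generation" functionality above) can be requested over plain HTTP. The sketch below only assembles a chat-completion request body; the model name is a placeholder and the exact endpoint URL depends on the node and port your Slurm job reports (both are assumptions, not values fixed by this repository).

```python
import json

def chat_completion_body(model, messages, max_tokens=128, top_logprobs=None):
    """Build an OpenAI-compatible /v1/chat/completions request body.

    Setting `top_logprobs` asks the server to also return token
    log-probabilities, which is how logit-level information is exposed
    through the OpenAI-style API.
    """
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    if top_logprobs is not None:
        body["logprobs"] = True
        body["top_logprobs"] = top_logprobs
    return body

# Placeholder model name -- substitute the model your job actually serves.
body = chat_completion_body(
    "Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Hello!"}],
    top_logprobs=5,
)
print(json.dumps(body, indent=2))
# To send it, POST this JSON to http://<node>:<port>/v1/chat/completions
# using any HTTP client (the host and port come from your Slurm job).
```

Plain `/v1/completions` requests work the same way, with a `prompt` string in place of the `messages` list.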