Releases: vllm-project/vllm-spyre

v0.6.0

31 Jul 22:28
a7bc26b

This release:

  • 🎉 Supports embedding models on vLLM v1! (see the sketch after this list)
  • 🔥 Removes all remaining support for vLLM v0
  • ⚡ Contains performance and stability fixes for continuous batching
    • ⚗️ Support for up to --max-num-seqs 4 --max-model-len 8192 --tensor-parallel-size 4 has been tested on ibm-granite/granite-3.3-8b-instruct
  • 📦 Officially supports vLLM 0.9.2 and 0.10.0
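
Both the embedding support and the tested continuous batching limits can be exercised through vLLM's standard Python API. A minimal offline sketch, assuming vLLM's pooling API for embeddings; the embedding model name and the prompts are illustrative:

    # Embedding models on vLLM v1 (model name is illustrative).
    from vllm import LLM

    embedder = LLM(model="ibm-granite/granite-embedding-125m-english", task="embed")
    emb = embedder.embed(["vllm-spyre now serves embedding models"])
    print(len(emb[0].outputs.embedding))  # embedding dimension

    # The tested continuous batching limits map to these engine arguments.
    llm = LLM(
        model="ibm-granite/granite-3.3-8b-instruct",
        max_num_seqs=4,
        max_model_len=8192,
        tensor_parallel_size=4,
    )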

What's Changed

New Contributors

Full Changelog: v0.5.3...v0.6.0

v0.5.3

18 Jul 20:32
639295d

This release contains test updates and fixes for continuous batching, as well as a small logging improvement.

What's Changed

Full Changelog: v0.5.2...v0.5.3

v0.5.2

16 Jul 20:27
377895d

What's Changed

Full Changelog: v0.5.1...v0.5.2

v0.5.1

11 Jul 17:47
e9604ff

This release:

  • Fixes tensor parallel support for static batching

Known Issues

⚠️⚠️⚠️⚠️⚠️
Tensor parallel support appears to still be broken for continuous batching
⚠️⚠️⚠️⚠️⚠️

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

09 Jul 22:01
d571804

This release:

  • Introduces breaking changes from upstream vLLM 0.9.2
  • Supports prompt logprobs with static batching (see the sketch after this list)
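
A minimal sketch of requesting prompt logprobs through vLLM's standard sampling API; the model name and prompt are illustrative:

    from vllm import LLM, SamplingParams

    llm = LLM(model="ibm-granite/granite-3.3-8b-instruct")
    # prompt_logprobs=1 attaches the top-1 log probability for each prompt
    # token to the returned RequestOutput.
    params = SamplingParams(max_tokens=16, prompt_logprobs=1)
    out = llm.generate(["Hello, Spyre!"], params)
    print(out[0].prompt_logprobs)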

Known Issues

⚠️⚠️⚠️⚠️⚠️
Tensor parallel support is broken; look for a bugfix release soon
⚠️⚠️⚠️⚠️⚠️

What's Changed

New Contributors

Full Changelog: v0.4.1...v0.5.0

v0.4.1

02 Jul 17:42
3172162

This release:

  • Includes a critical bugfix for batch handling with continuous batching
  • Fixes a bug where the first prompt after warmup would take a long time with continuous batching
  • Fixes a bug where canceling requests could crash the server

What's Changed

Full Changelog: v0.4.0...v0.4.1

v0.4.0

19 Jun 23:11
2c295c8

This release:

  • ➕ Adds support for ibm-fms 1.1.0
  • ➕ Adds support for the latest compiler updates in the newest base image
  • ❗ Removes v0 support for text generation
  • ⚗️ Adds (very experimental) support for continuous batching mode on Spyre hardware (see the sketch below)

This release is not compatible with vllm==0.9.1; read more details here
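
A minimal sketch of opting into the experimental mode. The VLLM_SPYRE_USE_CB toggle is an assumption based on the plugin's environment-variable conventions; verify the exact name against the vllm-spyre docs:

    import os

    # Assumed toggle for the experimental continuous batching mode;
    # confirm the variable name in the vllm-spyre documentation.
    os.environ["VLLM_SPYRE_USE_CB"] = "1"

    from vllm import LLM

    llm = LLM(model="ibm-granite/granite-3.3-8b-instruct", max_num_seqs=2)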

What's Changed

Full Changelog: v0.3.1...v0.4.0

v0.3.1

05 Jun 22:03
a82191a

This bugfix release addresses two important issues:

  • Fixes a configuration bug with tensor-parallel inference on the public quay.io/ibm-aiu/vllm-spyre image that caused 0.3.0 to fail
  • Fixes a bug where full static batches of long prompts could not be scheduled

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.3.1

v0.3.0

03 Jun 15:45
2e7c154

This release:

  • Updates vLLM compatibility to 0.9.0.1
  • Adds vLLM profiler support (see the sketch after this list)
  • Supports multi-Spyre setups with tensor parallelism out of the box
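
A minimal sketch of capturing a profiler trace, assuming vLLM's VLLM_TORCH_PROFILER_DIR hook and the offline start_profile/stop_profile API; the model name and trace directory are illustrative:

    import os

    # Must be set before the engine starts; traces land in this directory.
    os.environ["VLLM_TORCH_PROFILER_DIR"] = "/tmp/vllm-traces"

    from vllm import LLM, SamplingParams

    # tensor_parallel_size=2 exercises the out-of-the-box multi-Spyre path.
    llm = LLM(model="ibm-granite/granite-3.3-8b-instruct", tensor_parallel_size=2)
    llm.start_profile()
    llm.generate(["Profile this prompt"], SamplingParams(max_tokens=8))
    llm.stop_profile()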

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.3.0

v0.2.0

16 May 15:31
85c688b

This release:

  • Updates vLLM compatibility to ~=0.8.5
  • Adds support for sampling parameters for continuous batching (see the sketch after this list)
  • Uses the standard vLLM config for continuous batching parameters
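
A minimal sketch of per-request sampling with continuous batching, where each prompt can carry its own SamplingParams; the model name, prompts, and parameter values are illustrative:

    from vllm import LLM, SamplingParams

    # max_num_seqs is part of the standard vLLM engine config.
    llm = LLM(model="ibm-granite/granite-3.3-8b-instruct", max_num_seqs=4)
    greedy = SamplingParams(temperature=0.0, max_tokens=32)
    creative = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
    # generate() accepts one SamplingParams object per prompt.
    outputs = llm.generate(
        ["Define tensor parallelism.", "Write a haiku about accelerators."],
        [greedy, creative],
    )
    for o in outputs:
        print(o.outputs[0].text)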

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.2.0