Releases: vipshop/vllm

v0.8.5rc2

17 Apr 07:01
4fc1489

What's Changed

  • [Misc][VIP] ignore marlin_moe_wna16 local gen codes by @DefTruth in #27

Full Changelog: v0.8.4rc4...v0.8.5rc2

v0.8.5rc1

16 Apr 04:03
58f457c

What's Changed

  • [Bugfix][Kernel][VIP] fix potential cuda graph broken for merge_attn_states kernel by @DefTruth in #26

Full Changelog: v0.8.4rc4...v0.8.5rc1

v0.8.4rc4

13 Apr 07:12
f49e5af

v0.8.4rc3

12 Apr 04:44
802329d

Rebased onto vllm:main for a cleaner codebase.
Full Changelog: https://github.com/vipshop/vllm/commits/v0.8.4rc3

v0.8.4rc2

10 Apr 10:51
94680eb

What's Changed

  • [Kernel] optimize merge_attn_states CUDA kernel dispatch by @DefTruth in #22
  • [Update][VIP] Update from vllm:main and fix conflicts by @DefTruth in #23
  • [Kernel][VIP] opt cuda merge_attn_states kernel thread block dispatch by @DefTruth in #24

Full Changelog: v0.8.4rc1...v0.8.4rc2

v0.8.4rc1

08 Apr 12:53
88fef9d

What's Changed

  • [Kernel][VIP] support cuda merge_attn_states kernel, max ~3x improved by @DefTruth in #18
  • [Kernel][VIP] support cuda merge_attn_states kernel by @DefTruth in #19
  • [Kernel][VIP] dispatch merge_attn_states cuda kernel, half&bf16 by @DefTruth in #20
  • [Misc][VIP] Revert to original Triton merge_attn_states kernel by @DefTruth in #21

Full Changelog: v0.8.4rc0...v0.8.4rc1

v0.8.4rc0

07 Apr 04:15
273cd8b

v0.8.3rc1

02 Apr 09:22
9a41348

Patch release that fixes incorrect results with MLA + chunked prefill.

What's Changed

  • [Misc][VIP] Update from latest vllm:main by @DefTruth in #16
  • [Kernel][VIP] optimize merge_attn_states_kernel by @DefTruth in #17

Full Changelog: v0.8.3rc0...v0.8.3rc1

v0.8.3rc0

01 Apr 05:48
83ee400

Patch release that fixes an mm parser error and adds an FA2 inf workaround for MLA with minimal overhead.

What's Changed

Full Changelog: v0.8.2rc2.dev0...v0.8.3rc0

v0.8.2rc2.dev0

25 Mar 06:26
045247b

What's Changed

  • [Update][VIP] merge latest update from vllm/main by @DefTruth in #10
  • [Misc][VIP] revert hotfix for gptq-marlin non-contiguous by @DefTruth in #11

Full Changelog: v0.8.2rc1.dev0...v0.8.2rc2.dev0