Releases: vipshop/vllm
v0.8.5rc2
v0.8.5rc1
v0.8.4rc4
Full Changelog: v0.8.4rc3...v0.8.4rc4
v0.8.4rc3
Rebased onto vllm:main for a cleaner codebase.
Full Changelog: https://github.com/vipshop/vllm/commits/v0.8.4rc3
v0.8.4rc2
v0.8.4rc1
What's Changed
- [Kernel][VIP] support cuda merge_attn_states kernel, up to ~3x faster by @DefTruth in #18
- [Kernel][VIP] support cuda merge_attn_states kernel by @DefTruth in #19
- [Kernel][VIP] dispatch merge_attn_states cuda kernel, half&bf16 by @DefTruth in #20
- [Misc][VIP] Revert to original Triton merge_attn_states kernel by @DefTruth in #21
Full Changelog: v0.8.4rc0...v0.8.4rc1
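For readers unfamiliar with what a merge_attn_states kernel computes, here is a minimal pure-Python sketch. It assumes the kernel follows the usual log-sum-exp merge of two partial attention results computed over disjoint key chunks (for example a cached prefix and a new suffix); the release notes do not spell this out. The function names `merge_partial_attention` and `attention` are hypothetical and are not the vLLM API.

```python
import math


def attention(scores, values):
    """Reference softmax attention for one query.

    Returns the weighted value vector and the log-sum-exp (LSE) of the scores.
    """
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total
           for d in range(dim)]
    return out, m + math.log(total)


def merge_partial_attention(out_a, lse_a, out_b, lse_b):
    """Merge two partial attention results (hypothetical helper).

    Each partial output is already normalized within its own chunk, so it is
    re-weighted by exp(lse) of that chunk. Subtracting the larger LSE first
    keeps the exponentials from overflowing.
    """
    m = max(lse_a, lse_b)
    if m == -math.inf:
        # Both chunks are empty: nothing to merge.
        return [0.0] * len(out_a), -math.inf
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    total = w_a + w_b
    merged = [(w_a * a + w_b * b) / total for a, b in zip(out_a, out_b)]
    return merged, m + math.log(total)


# Usage: attention over the full key set equals the merge of two chunks.
scores = [0.5, -1.0, 2.0, 0.3]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]

full_out, full_lse = attention(scores, values)
out_a, lse_a = attention(scores[:2], values[:2])
out_b, lse_b = attention(scores[2:], values[2:])
merged_out, merged_lse = merge_partial_attention(out_a, lse_a, out_b, lse_b)
```

A CUDA or Triton kernel would apply the same arithmetic per token and per head in parallel; this sketch only illustrates the math and says nothing about the performance of either implementation.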
v0.8.4rc0
Full Changelog: v0.8.3rc1...v0.8.4rc0
v0.8.3rc1
v0.8.3rc0
Patch release that fixes the mm parser error and adds an FA2 inf workaround for MLA with minimal overhead.
What's Changed
- [Bugfix][VIP] Fix MLA chunked prefill performance with minimal overhead by @DefTruth in #12
- [Bugfix][VIP] revert mla + chunked-prefill inf fix by @DefTruth in #13
- [Bugfix][VIP] Add workaround for FA2 inf with minimal overhead by @DefTruth in #14
- [Bugfix][VIP] pick the mm parser fix from vllm-project#15828 by @DefTruth in #15
Full Changelog: v0.8.2rc2.dev0...v0.8.3rc0
v0.8.2rc2.dev0
What's Changed
- [Update][VIP] merge latest update from vllm/main by @DefTruth in #10
- [Misc][VIP] revert hotfix for gptq-marlin non-contiguous by @DefTruth in #11
Full Changelog: v0.8.2rc1.dev0...v0.8.2rc2.dev0