# Release Notes

## v0.11.0rc2 - 2025.11.21

This is the second release candidate of v0.11.0 for vLLM Ascend. In this release, we fixed many bugs to improve quality. Thanks for all your feedback! We'll keep working on bug fixes and performance improvements. The official v0.11.0 release will come soon. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.11.0-dev) to get started.

### Highlights
- CANN is upgraded to 8.3.RC2. [#4332](https://github.com/vllm-project/vllm-ascend/pull/4332)
- The ngram speculative decoding method is back. [#4092](https://github.com/vllm-project/vllm-ascend/pull/4092)
- Aclgraph performance is improved by updating the default capture sizes. [#4205](https://github.com/vllm-project/vllm-ascend/pull/4205)

### Core
- Speed up vLLM startup. [#4099](https://github.com/vllm-project/vllm-ascend/pull/4099)
- Kimi k2 with quantization works now. [#4190](https://github.com/vllm-project/vllm-ascend/pull/4190)
- Fix a bug in qwen3-next; it's more stable now. [#4025](https://github.com/vllm-project/vllm-ascend/pull/4025)

### Other
- Fix an issue in full-decode-only mode; full graph mode is more stable now. [#4106](https://github.com/vllm-project/vllm-ascend/pull/4106) [#4282](https://github.com/vllm-project/vllm-ascend/pull/4282)
- Fix an allgather op bug for DeepSeek V3 series models. [#3711](https://github.com/vllm-project/vllm-ascend/pull/3711)
- Fix some bugs in the EPLB feature. [#4150](https://github.com/vllm-project/vllm-ascend/pull/4150) [#4334](https://github.com/vllm-project/vllm-ascend/pull/4334)
- Fix a bug where VL models don't work on x86 machines. [#4285](https://github.com/vllm-project/vllm-ascend/pull/4285)
- Support IPv6 in the prefill disaggregation proxy. Note that the mooncake connector doesn't work with IPv6 yet; we're working on it. [#4242](https://github.com/vllm-project/vllm-ascend/pull/4242)
- Add a check to ensure EPLB only supports the w8a8 quantization method. [#4315](https://github.com/vllm-project/vllm-ascend/pull/4315)
- Add a check to ensure the FLASHCOMM feature isn't used with VL models; support is planned for Q4 2025. [#4222](https://github.com/vllm-project/vllm-ascend/pull/4222)
- The library required for audio models is now installed in the container image. [#4324](https://github.com/vllm-project/vllm-ascend/pull/4324)

### Known Issues
- Ray + EP doesn't work. If you run vLLM Ascend with Ray, please disable expert parallelism. [#4123](https://github.com/vllm-project/vllm-ascend/pull/4123)
- The `response_format` parameter is not supported yet. We'll support it soon. [#4175](https://github.com/vllm-project/vllm-ascend/pull/4175)
- The CPU binding feature doesn't work in the multi-instance case (such as multiple DP instances on one node). We'll fix it in the next release.

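For the Ray + EP issue above, disabling expert parallelism means simply not enabling it when launching with the Ray backend. A minimal launch sketch, assuming upstream vLLM's standard CLI (the model name is a placeholder; verify flag names against your installed vLLM version):

```shell
# Ray as the distributed backend, with expert parallelism left at its
# default (disabled) -- this is the supported combination for now.
vllm serve deepseek-ai/DeepSeek-V3 \
    --distributed-executor-backend ray \
    --tensor-parallel-size 16

# Until the issue is resolved, avoid adding --enable-expert-parallel
# to a Ray-backed deployment.
```
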
## v0.11.0rc1 - 2025.11.10

This is the first release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.11.0-dev) to get started.