[Doc] Add tutorial for Qwen3-Coder-30B-A3B #4275
base: main
Conversation
Signed-off-by: nsdie <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a tutorial for the Qwen3-Coder-30B-A3B model and updates the corresponding support matrix. The documentation is a valuable addition. However, there are a couple of clarity and completeness issues that should be addressed: the tutorial's hardware specifications are inconsistent, and a key model capability, `max-model-len`, is missing from the support matrix.
Run the following script to execute online inference.
For an Atlas A2 with 64 GB of NPU card memory, `tensor-parallel-size` should be at least 2, and for 32 GB of memory, `tensor-parallel-size` should be at least 4.
There's an inconsistency in the hardware description for the Atlas A2. Line 19 states that the Atlas 800 A2 node is equipped with 64G cards, but this line introduces a recommendation for a configuration with '32 GB of memory'. This discrepancy can confuse users. Please clarify whether Atlas A2 variants with 32 GB cards exist and are supported, or remove the reference to the 32 GB configuration to maintain consistency.
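For reference, a minimal online-serving invocation reflecting that recommendation might look like the following sketch (the model path and flag set are assumptions, not the tutorial's exact script):

```shell
# Hedged example: serve Qwen3-Coder-30B-A3B across 2 NPUs on a 64 GB Atlas A2;
# raise --tensor-parallel-size to 4 if 32 GB cards are in fact supported.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --tensor-parallel-size 2 \
    --served-model-name qwen3-coder
```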
| Qwen3 | ✅ |||||||||||||||||||
| Qwen3-based | ✅ |||||||||||||||||||
| Qwen3-Coder | ✅ |||||||||||||||||||
| Qwen3-Coder | ✅ ||✅|✅||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
The `max-model-len` column for Qwen3-Coder has been left empty. The new tutorial file mentions that the model has 'extended context support of up to 1M tokens'. This is a critical piece of information for users and should be included in this summary table. Please update the `max-model-len` column with the correct value (e.g., 1000000) to ensure the support matrix is complete and accurate.
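For context, that 1M-token capability corresponds to the `--max-model-len` serving flag; a hedged example invocation (the tutorial's actual command may differ):

```shell
# Assumed usage: cap the context window at 1M tokens
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct --max-model-len 1000000
```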
@@ -0,0 +1,118 @@
# Qwen3-Coder-30B-A3B
Add this file name to `index.md`.
It is recommended to download the model weights to a directory shared by all nodes, such as `/root/.cache/`.
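As an illustration, the weights could be fetched into that shared path like so (a sketch assuming the Hugging Face CLI and a hypothetical target directory):

```shell
# Download the weights into the shared cache so every node sees the same path
huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --local-dir /root/.cache/Qwen3-Coder-30B-A3B-Instruct
```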
### Verify Multi-node Communication (Optional)
This can be removed.
You can use our official Docker image and install an extra operator to support `Qwen3-Coder-30B-A3B-Instruct`.
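As a sketch, launching a container from the official image might look like the following (the image tag and device mounts are assumptions; consult the vLLM Ascend installation guide for the exact command):

```shell
# Hypothetical docker invocation for an Ascend NPU environment
docker run --rm -it \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    quay.io/ascend/vllm-ascend:latest bash
```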
:::{note}
Only the AArch64 architecture is currently supported due to the extra operator's installation limitations.
:::
This can be removed.
2. Install the package `custom-ops` to make the kernels available.
```shell
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/a3/CANN-custom_ops-sfa-linux.aarch64.run
```
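The download would presumably be followed by executing the `.run` package (assumed installer usage; verify the flags against the CANN documentation, and note the review comment below questioning whether this step is needed at all):

```shell
# Assumed follow-up: make the package executable and run the installer
chmod +x CANN-custom_ops-sfa-linux.aarch64.run
./CANN-custom_ops-sfa-linux.aarch64.run --install
```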
`custom_ops` is not needed.
What this PR does / why we need it?
Add tutorial for Qwen3-Coder-30B-A3B
Does this PR introduce any user-facing change?
How was this patch tested?