Conversation

@NSDie NSDie commented Nov 19, 2025

What this PR does / why we need it?

Add tutorial for Qwen3-Coder-30B-A3B

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 19, 2025
@github-actions
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@NSDie NSDie changed the title Add tutorial for Qwen3-Coder-30B Add tutorial for Qwen3-Coder-30B-A3B Nov 19, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a tutorial for the Qwen3-Coder-30B-A3B model and updates the corresponding support matrix. The documentation is a valuable addition; however, I've identified a couple of clarity and completeness issues that should be addressed. Specifically, the tutorial's hardware specifications are inconsistent, and a key model capability, max-model-len, is missing from the support matrix.


Run the following script to execute online inference.

For an Atlas A2 with 64 GB of NPU card memory, tensor-parallel-size should be at least 2, and for 32 GB of memory, tensor-parallel-size should be at least 4.

high

There's an inconsistency in the hardware description for the Atlas A2. Line 19 states that the Atlas 800 A2 node is equipped with 64G cards, yet this line introduces a recommendation for a configuration with '32 GB of memory'. This discrepancy can confuse users. Please clarify whether Atlas A2 variants with 32GB cards exist and are supported, or remove the reference to the 32GB configuration to maintain consistency.
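
For reference, the 64 GB recommendation corresponds to a launch along these lines (a minimal sketch, not the tutorial's exact script; the model ID and the `--max-model-len` value are assumptions):

```shell
# Serve Qwen3-Coder-30B-A3B on an Atlas A2 node with 64 GB cards:
# --tensor-parallel-size 2 splits the weights across two NPUs.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --tensor-parallel-size 2 \
    --max-model-len 32768
```

Per the quoted recommendation, 32 GB cards would use `--tensor-parallel-size 4` with the same command.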

| Qwen3 || |||||||||||||||||||
| Qwen3-based || |||||||||||||||||||
| Qwen3-Coder || |||||||||||||||||||
| Qwen3-Coder || ||||||||||||||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|

high

The max-model-len column for Qwen3-Coder has been left empty. The new tutorial file mentions that the model has 'extended context support of up to 1M tokens'. This is a critical piece of information for users and should be included in this summary table. Please update the max-model-len column with the correct value (e.g., 1000000) to ensure the support matrix is complete and accurate.
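
For context, the 1M-token figure would surface at serve time roughly like this (a hypothetical invocation; whether extended context also needs rope-scaling configuration is not covered here):

```shell
# Illustrative only: a 1M-token context requires a very large KV cache,
# so this assumes sufficient aggregate NPU memory across the TP group.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --tensor-parallel-size 4 \
    --max-model-len 1000000
```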

@NSDie NSDie changed the title Add tutorial for Qwen3-Coder-30B-A3B [Doc] Add tutorial for Qwen3-Coder-30B-A3B Nov 19, 2025
@@ -0,0 +1,118 @@
# Qwen3-Coder-30B-A3B
Collaborator


add this file name to index.md


It is recommended to download the model weights to a directory shared across the nodes, such as `/root/.cache/`
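
One way to do that (a sketch assuming the Hugging Face CLI is available in the container; the target path follows the tutorial's suggestion):

```shell
# Download the weights once into the shared cache so every node sees them.
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --local-dir /root/.cache/Qwen3-Coder-30B-A3B-Instruct
```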

### Verify Multi-node Communication (Optional)
Collaborator


this can be removed

You can use our official Docker image and install the extra operator to support `Qwen3-Coder-30B-A3B-Instruct`.

:::{note}
Only the AArch64 architecture is currently supported due to the extra operator's installation limitations.
Collaborator


this can be removed
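
For reference, the official image mentioned above is typically launched along these lines (a sketch based on the common vLLM Ascend docker-run pattern; the image tag and device list are assumptions, so adjust them for your node):

```shell
# Launch the official vLLM Ascend image with the NPU devices and driver
# mounts it needs; change the davinci device IDs to match your hardware.
docker run --rm -it --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /root/.cache:/root/.cache \
    quay.io/ascend/vllm-ascend:latest bash
```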

2. Install the package `custom-ops` to make the kernels available.

```shell
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/a3/CANN-custom_ops-sfa-linux.aarch64.run
```
Collaborator


custom_ops is not needed
