[Doc] Add tutorial for Qwen3-Coder-30B-A3B #4275
base: main
Conversation
Signed-off-by: nsdie <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a tutorial for the Qwen3-Coder-30B-A3B model and updates the corresponding support matrix. The documentation is a valuable addition. However, there are a couple of clarity and completeness issues that should be addressed: the tutorial's hardware specifications are inconsistent, and a key model capability, `max-model-len`, is missing from the support matrix.
Run the following script to execute online inference.
For an Atlas A2 with 64 GB of NPU card memory, `tensor-parallel-size` should be at least 2, and for 32 GB of memory, `tensor-parallel-size` should be at least 4.
There's an inconsistency in the hardware description for the Atlas A2. Line 19 states that the Atlas 800 A2 node is equipped with 64G cards, but this line introduces a recommendation for a configuration with '32 GB of memory'. This discrepancy can confuse users. Please clarify whether Atlas A2 variants with 32 GB cards exist and are supported, or remove the reference to the 32 GB configuration to maintain consistency.
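For reference, a minimal online-serving invocation reflecting that recommendation might look like the following sketch (the model path and flag set are assumptions, not the tutorial's exact script):

```shell
# Hedged example: serve Qwen3-Coder-30B-A3B across 2 NPUs on a 64 GB Atlas A2;
# raise --tensor-parallel-size to 4 if 32 GB cards are in fact supported.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --tensor-parallel-size 2 \
    --served-model-name qwen3-coder
```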
| Qwen3 | ✅ |||||||||||||||||||
| Qwen3-based | ✅ |||||||||||||||||||
| Qwen3-Coder | ✅ |||||||||||||||||||
| Qwen3-Coder | ✅ ||✅|✅||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
The `max-model-len` column for Qwen3-Coder has been left empty. The new tutorial file mentions that the model has 'extended context support of up to 1M tokens'. This is a critical piece of information for users and should be included in this summary table. Please update the `max-model-len` column with the correct value (e.g., 1000000) to ensure the support matrix is complete and accurate.
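For context, that 1M-token capability corresponds to the `--max-model-len` serving flag; a hedged example invocation (the tutorial's actual command may differ):

```shell
# Assumed usage: cap the context window at 1M tokens
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct --max-model-len 1000000
```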
@@ -0,0 +1,118 @@
# Qwen3-Coder-30B-A3B
Add this file name to `index.md`.
It is recommended to download the model weights to a directory shared by all nodes, such as `/root/.cache/`.
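As an illustration, the weights could be fetched into that shared path like so (a sketch assuming the Hugging Face CLI and a hypothetical target directory):

```shell
# Download the weights into the shared cache so every node sees the same path
huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --local-dir /root/.cache/Qwen3-Coder-30B-A3B-Instruct
```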
### Verify Multi-node Communication (Optional)
This can be removed.
You can use our official Docker image and install an extra operator to support `Qwen3-Coder-30B-A3B-Instruct`.
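As a sketch, launching a container from the official image might look like the following (the image tag and device mounts are assumptions; consult the vLLM Ascend installation guide for the exact command):

```shell
# Hypothetical docker invocation for an Ascend NPU environment
docker run --rm -it \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    quay.io/ascend/vllm-ascend:latest bash
```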
:::{note}
Only the AArch64 architecture is currently supported due to the extra operator's installation limitations.
:::
This can be removed.
2. Install the package `custom-ops` to make the kernels available.
```shell
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/a3/CANN-custom_ops-sfa-linux.aarch64.run
```
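The download would presumably be followed by executing the `.run` package (assumed installer usage; verify the flags against the CANN documentation, and note the review comment below questioning whether this step is needed at all):

```shell
# Assumed follow-up: make the package executable and run the installer
chmod +x CANN-custom_ops-sfa-linux.aarch64.run
./CANN-custom_ops-sfa-linux.aarch64.run --install
```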
`custom_ops` is not needed.
What this PR does / why we need it?
Add tutorial for Qwen3-Coder-30B-A3B
Does this PR introduce any user-facing change?
How was this patch tested?