[CI] Add mla ut #4280
Conversation
Code Review
This pull request adds unit tests for the Multi-Head Latent Attention (MLA) implementation. The new tests cover the metadata builder in test_mla_v1.py and the AscendMultiHeadLatentAttention layer in test_mla.py. The changes are a good step towards improving test coverage. I've found one high-severity issue in the new test test_forward within tests/ut/models/test_mla.py, where the mocking of a custom operation is incorrect, leading to a test that doesn't properly validate the intended functionality. I've provided a code suggestion to fix the test and make it more robust.
```python
mock_mla_forward.return_value = (3, self.hidden_size)

output = attn.forward(positions, hidden_states)

self.assertEqual(output.shape, (3, self.hidden_size))
self.assertTrue(
    torch.allclose(output, output.view(-1, self.hidden_size)))
```
The mock for `torch.ops.vllm.mla_forward` is not correctly configured, and the assertions are not sufficient to validate the behavior of the `forward` method.

- The `mla_forward` custom op is defined to return `None` and modify its `output` argument in-place. However, the test sets a tuple `(3, self.hidden_size)` as the `return_value`, which is incorrect and ignored during execution. This means the test doesn't verify that the op correctly modifies the output tensor.
- The assertions only check the shape of the output tensor and that it is `allclose` to itself, which will always pass. The test should verify that `mla_forward` is called correctly and that the output tensor is populated as expected.

To make this test more robust, you should use `side_effect` to simulate the in-place modification of the output tensor and add assertions to check the op's arguments and the output's content.
Suggested change:

```diff
- mock_mla_forward.return_value = (3, self.hidden_size)
- output = attn.forward(positions, hidden_states)
- self.assertEqual(output.shape, (3, self.hidden_size))
- self.assertTrue(
-     torch.allclose(output, output.view(-1, self.hidden_size)))
+ def mla_forward_side_effect(hidden_states, need_gather_q_kv, output, prefix):
+     # Simulate the op writing to the output tensor
+     output.fill_(1.0)
+ mock_mla_forward.side_effect = mla_forward_side_effect
+ output = attn.forward(positions, hidden_states)
+ mock_mla_forward.assert_called_once_with(hidden_states, False, output, self.prefix)
+ self.assertEqual(output.shape, (3, self.hidden_size))
+ self.assertTrue(torch.all(output == 1.0))
```
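For reference, here is a minimal, self-contained sketch of the same `side_effect` pattern, independent of the test class. The names `run_attention` and `fake_mla_forward` are illustrative only and not part of the vllm-ascend codebase:

```python
import torch
from unittest import mock


def run_attention(op, hidden_states):
    # Stand-in for attn.forward: allocate the output buffer and let the
    # (mocked) custom op fill it in place; the op itself returns None.
    output = torch.empty_like(hidden_states)
    op(hidden_states, False, output, "layer.0")
    return output


def fake_mla_forward(hidden_states, need_gather_q_kv, output, prefix):
    # Simulate the op writing to its output argument in place.
    output.fill_(1.0)


mocked_op = mock.MagicMock(side_effect=fake_mla_forward)

hidden = torch.zeros(3, 8)
result = run_attention(mocked_op, hidden)

# The call arguments and the in-place result can now be asserted directly.
mocked_op.assert_called_once_with(hidden, False, result, "layer.0")
assert torch.all(result == 1.0)
```

Because the mock's `side_effect` performs the in-place write, the test exercises the real contract of the op (populate `output`, return nothing) rather than relying on a `return_value` that the caller never reads.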
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: GDzhu01 <[email protected]>
Please update the PR message.
…cend into eplb_ci_bugfix

* 'eplb_ci_bugfix' of https://github.com/845473182/vllm-ascend: (31 commits)
  - [Test] Add ut test for torchair (vllm-project#4287)
  - [Feat][Doc] Add a load_balance_dp_proxy in examples and external dp doc. (vllm-project#4265)
  - [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (vllm-project#4324)
  - [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (vllm-project#4272)
  - [Readme] EPLB Support Scenarios (vllm-project#4314)
  - eplb redundant expert bugfix (vllm-project#4291)
  - [Feat][BugFix]Support the Qwen3-Next-80B-A3B-Instruct quantization model&Fix the NZ issue (vllm-project#4245)
  - [Test] Add ACL graph capture/replay DP test (vllm-project#4259)
  - [Test] quick fix mla ut (vllm-project#4318)
  - [Feat] Support MTP to running in full graph mode (vllm-project#3892)
  - [CI] Add mla ut (vllm-project#4280)
  - [Test] Add tests for the multi-node DeepSeek-V2-Lite network in GE Graph (vllm-project#4039)
  - avoid mrope fusion op when running qwen2.5-vl on a+x machine (vllm-project#4270)
  - [Bugfix] fix nightly multi-node EPLB tests' "DYNAMIC_EPLB=true" environment not working (vllm-project#4223)
  - [long seq feat]GQA support long-prefill-token-threshold and fixbug (vllm-project#4209)
  - [misc] clean up get_metadata_cls (vllm-project#4276)
  - [Docs] Improve the AISBench multi-modal testing docs (vllm-project#4255)
  - [doc]fix readme for kv pool user guide (vllm-project#4271)
  - remove get_metadata_cls (vllm-project#4087)
  - [Bugfix] fix hang in async scheduling (vllm-project#4233)
  - ...
### What this PR does / why we need it?

add mla_v1.py and mla.py ut

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`pytest tests/ut/attention/test_mla_v1.py`
`pytest tests/ut/models/test_mla.py`

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b

Signed-off-by: GDzhu01 <[email protected]>
Signed-off-by: 白永斌 <[email protected]>
What this PR does / why we need it?
add mla_v1.py and mla.py ut
Does this PR introduce any user-facing change?
No
How was this patch tested?
`pytest tests/ut/attention/test_mla_v1.py`
`pytest tests/ut/models/test_mla.py`
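If it helps reviewers, both modules can also be run in a single invocation; a minimal sketch assuming pytest is installed and the repository root is the working directory:

```python
# Hypothetical helper script; equivalent to running the two pytest commands above.
import sys

import pytest

if __name__ == "__main__":
    sys.exit(
        pytest.main([
            "tests/ut/attention/test_mla_v1.py",
            "tests/ut/models/test_mla.py",
            "-q",
        ]))
```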