
Conversation

@GDzhu01 (Contributor) commented Nov 19, 2025

What this PR does / why we need it?

Add unit tests for mla_v1.py and mla.py.

Does this PR introduce any user-facing change?

No

How was this patch tested?

pytest tests/ut/attention/test_mla_v1.py
pytest tests/ut/models/test_mla.py

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request adds unit tests for the Multi-Head Latent Attention (MLA) implementation. The new tests cover the metadata builder in test_mla_v1.py and the AscendMultiHeadLatentAttention layer in test_mla.py. The changes are a good step towards improving test coverage. I've found one high-severity issue in the new test test_forward within tests/ut/models/test_mla.py, where the mocking of a custom operation is incorrect, leading to a test that doesn't properly validate the intended functionality. I've provided a code suggestion to fix the test and make it more robust.

Comment on lines +209 to +215
mock_mla_forward.return_value = (3, self.hidden_size)

output = attn.forward(positions, hidden_states)

self.assertEqual(output.shape, (3, self.hidden_size))
self.assertTrue(
    torch.allclose(output, output.view(-1, self.hidden_size)))
Severity: high

The mock for torch.ops.vllm.mla_forward is not correctly configured, and the assertions are not sufficient to validate the behavior of the forward method.

  1. The mla_forward custom op is defined to return None and modify its output argument in-place. However, the test sets a tuple (3, self.hidden_size) as the return_value, which is incorrect and ignored during execution. This means the test doesn't verify that the op correctly modifies the output tensor.
  2. The assertions only check the shape of the output tensor and that it's allclose to itself, which will always pass. The test should verify that mla_forward is called correctly and that the output tensor is populated as expected.

To make this test more robust, you should use side_effect to simulate the in-place modification of the output tensor and add assertions to check the op's arguments and the output's content.

Suggested change

Replace:

    mock_mla_forward.return_value = (3, self.hidden_size)

    output = attn.forward(positions, hidden_states)

    self.assertEqual(output.shape, (3, self.hidden_size))
    self.assertTrue(
        torch.allclose(output, output.view(-1, self.hidden_size)))

with:

    def mla_forward_side_effect(hidden_states, need_gather_q_kv, output, prefix):
        # Simulate the op writing to the output tensor
        output.fill_(1.0)

    mock_mla_forward.side_effect = mla_forward_side_effect

    output = attn.forward(positions, hidden_states)

    mock_mla_forward.assert_called_once_with(hidden_states, False, output, self.prefix)
    self.assertEqual(output.shape, (3, self.hidden_size))
    self.assertTrue(torch.all(output == 1.0))
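
As an aside, the side_effect pattern above can be exercised on its own. The following is a minimal, self-contained sketch; the InPlaceMockExample test case and its generic stand-in op are hypothetical and not part of vllm-ascend or its test suite:

    import unittest
    from unittest import mock

    import torch


    class InPlaceMockExample(unittest.TestCase):
        """Hypothetical example: mocking an op that writes its result in-place."""

        def test_side_effect_populates_output(self):
            hidden_states = torch.randn(3, 8)
            output = torch.empty(3, 8)

            # A return_value is useless to callers that only read `output`;
            # side_effect performs the in-place write the real op would do.
            mock_op = mock.Mock(side_effect=lambda hs, out: out.fill_(1.0))
            mock_op(hidden_states, output)

            # Passes because the identical tensor objects compare by identity.
            mock_op.assert_called_once_with(hidden_states, output)
            self.assertTrue(torch.all(output == 1.0))


    if __name__ == "__main__":
        unittest.main()

This mirrors points 1 and 2 above: the assertion on the output's contents only passes if side_effect actually mutated the tensor, whereas a return_value would be silently ignored.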

@github-actions bot

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Signed-off-by: GDzhu01 <[email protected]>
@MengqingCao (Collaborator) commented

Please update the PR message.

@wangxiyuan merged commit 15c1eb0 into vllm-project:main on Nov 20, 2025
18 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 21, 2025
…cend into eplb_ci_bugfix

* 'eplb_ci_bugfix' of https://github.com/845473182/vllm-ascend: (31 commits)
  [Test] Add ut test for torchair (vllm-project#4287)
  [Feat][Doc] Add a load_balance_dp_proxy in examples and external dp doc. (vllm-project#4265)
  [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (vllm-project#4324)
  [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (vllm-project#4272)
  [Readme] EPLB Support Scenarios (vllm-project#4314)
  eplb redundant expert bugfix (vllm-project#4291)
  [Feat][BugFix]Support the Qwen3-Next-80B-A3B-Instruct quantization model&Fix the NZ issue (vllm-project#4245)
  [Test] Add ACL graph capture/replay DP test (vllm-project#4259)
  [Test] quick fix mla ut (vllm-project#4318)
  [Feat] Support MTP to running in full graph mode (vllm-project#3892)
  [CI] Add mla ut (vllm-project#4280)
  [Test] Add tests for the multi-node DeepSeek-V2-Lite network in GE Graph  (vllm-project#4039)
  avoid mrope fusion op when running qwen2.5-vl on a+x machine (vllm-project#4270)
  [Bugfix] fix nightly multi-node EPLB tests' "DYNAMIC_EPLB=true" environment not working (vllm-project#4223)
  [long seq feat]GQA support long-prefill-token-threshold and fixbug (vllm-project#4209)
  [misc] clean up get_metadata_cls (vllm-project#4276)
  [Docs] Improve the AISBench multi-modal testing docs (vllm-project#4255)
  [doc]fix readme for kv pool user guide  (vllm-project#4271)
  remove get_metadata_cls (vllm-project#4087)
  [Bugfix] fix hang in async scheduling (vllm-project#4233)
  ...
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 21, 2025
### What this PR does / why we need it?
add mla_v1.py and mla.py ut
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
`pytest tests/ut/attention/test_mla_v1.py`
`pytest tests/ut/models/test_mla.py`

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: GDzhu01 <[email protected]>
Signed-off-by: 白永斌 <[email protected]>