
Conversation


@0Ayachi0 0Ayachi0 commented Nov 15, 2025

Motivation

NO.12: Add unit tests for the fastdeploy/spec_decode/mtp.py feature module.

Modifications

Add unit tests in tests/spec_decode/test_mtp.py.

Usage or Command

Not needed.

Accuracy Tests

Not needed.

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings November 15, 2025 19:35

paddle-bot bot commented Nov 15, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor (External developers) label Nov 15, 2025

CLAassistant commented Nov 15, 2025

CLA assistant check
All committers have signed the CLA.

@0Ayachi0 0Ayachi0 changed the title from "[CI] [Hackathon Phase 9 Development Example NO.12] Supplementary unit test content for the fastdeploy/spec_decode/mtp.py feature module" to "[CI] [Hackathon Phase 9 Development Example NO.12] Unit test additions for the fastdeploy/spec_decode/mtp.py feature module" Nov 15, 2025
Copilot finished reviewing on behalf of 0Ayachi0 November 15, 2025 19:39
Copilot AI left a comment

Pull Request Overview

This PR adds unit tests for the fastdeploy/spec_decode/mtp.py module as part of Hackathon Phase 9 Task NO.12. The tests cover initialization and basic operations of the MTPProposer class used in speculative decoding.

  • Comprehensive test coverage for MTPProposer initialization and configuration
  • Tests for cache management methods (initialize, clear, update)
  • Tests for utility methods like exist_prefill and is_chunk_prefill_enabled
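
A condensed sketch of the pattern these tests follow (illustrative only; the real setUp, quoted below, configures the mock FDConfig and model inputs in far more detail, and the patch targets mirror those in the quoted code):

import unittest
from unittest.mock import MagicMock, patch

class TestMTPProposer(unittest.TestCase):
    def setUp(self):
        # A MagicMock stands in for FDConfig; nested attributes are auto-created.
        self.mock_fd_config = MagicMock()
        self.mock_fd_config.max_num_seqs = 8

    @patch("fastdeploy.spec_decode.mtp.MTPSampler")
    @patch("fastdeploy.spec_decode.mtp.get_rope")
    @patch("fastdeploy.spec_decode.mtp.get_attention_backend")
    def test_initialization(self, mock_attn_backend, mock_get_rope, mock_sampler):
        # Patching keeps the test independent of real attention backends,
        # rope tables, and samplers; construction and assertions go here.
        ...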

Comment on lines +76 to +124
        self.mock_target_model_inputs = {
            "block_tables": paddle.zeros([8, 100], dtype="int32"),
            "input_ids": paddle.zeros([8, 2048], dtype="int64"),
            "seq_lens_this_time": paddle.zeros([8], dtype="int32"),
            "seq_lens_encoder": paddle.zeros([8], dtype="int32"),
            "seq_lens_decoder": paddle.zeros([8], dtype="int32"),
            "step_idx": paddle.zeros([8], dtype="int32"),
            "stop_flags": paddle.zeros([8], dtype="bool"),
            "stop_nums": paddle.zeros([8], dtype="int32"),
            "pre_ids": paddle.zeros([8], dtype="int64"),
            "output_cum_offsets": paddle.zeros([8], dtype="int32"),
            "output_padding_offset": paddle.zeros([8], dtype="int32"),
            "ids_remove_padding": paddle.zeros([8], dtype="int64"),
            "batch_id_per_token": paddle.zeros([8], dtype="int32"),
            "cu_seqlens_q": paddle.zeros([9], dtype="int32"),
            "cu_seqlens_k": paddle.zeros([9], dtype="int32"),
            "decoder_batch_ids": paddle.zeros([8], dtype="int32"),
            "decoder_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
            "decoder_num_blocks_cpu": paddle.zeros([8], dtype="int32"),
            "decoder_num_blocks_device": paddle.zeros([8], dtype="int32"),
            "decoder_chunk_size_device": paddle.zeros([8], dtype="int32"),
            "max_len_tensor_cpu": paddle.zeros([8], dtype="int32"),
            "encoder_batch_ids": paddle.zeros([8], dtype="int32"),
            "encoder_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
            "encoder_num_blocks_x_cpu": paddle.zeros([8], dtype="int32"),
            "kv_batch_ids": paddle.zeros([8], dtype="int32"),
            "kv_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
            "kv_num_blocks_x_cpu": paddle.zeros([8], dtype="int32"),
            "prompt_lens": paddle.zeros([8], dtype="int32"),
            "top_p": paddle.ones([8], dtype="float32") * 0.7,
            "top_k": paddle.ones([8], dtype="int32") * 50,
            "temperature": paddle.ones([8], dtype="float32") * 1.0,
            "eos_token_id": paddle.ones([8, 1], dtype="int64") * 2,
            "penalty_score": paddle.ones([8], dtype="float32"),
            "frequency_score": paddle.zeros([8], dtype="float32"),
            "presence_score": paddle.zeros([8], dtype="float32"),
            "infer_seed": paddle.zeros([8], dtype="int64"),
            "max_dec_len": paddle.ones([8], dtype="int32") * 256,
            "min_dec_len": paddle.zeros([8], dtype="int32"),
            "bad_tokens": paddle.zeros([8, 0], dtype="int64"),
            "draft_tokens": paddle.zeros([8, 10], dtype="int64"),
            "accept_tokens": paddle.zeros([8, 10], dtype="int64"),
            "accept_num": paddle.zeros([8], dtype="int32"),
            "encoder_block_lens": paddle.zeros([8], dtype="int32"),
            "cu_batch_token_offset": paddle.zeros([9], dtype="int32"),
            "temp_scaled_logprobs": None,
            "top_p_normalized_logprobs": None,
            "draft_logits": None,
        }
Copilot AI Nov 15, 2025

[nitpick] The test creates large mock tensors (e.g., paddle.zeros([8, 2048], dtype="int64") for input_ids) that consume significant memory but aren't used meaningfully in most tests. This could slow down test execution unnecessarily. Consider:

  1. Using smaller tensor dimensions for tests that don't require full-size tensors
  2. Lazy-initializing tensors only when needed
  3. Sharing tensor instances across tests when the values don't change

For example, input_ids with shape [8, 2048] could be [2, 10] for most tests unless the full dimensions are specifically being tested.
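
For instance, a small helper along these lines could shrink the defaults while letting individual tests override sizes (a sketch; the helper name make_model_inputs and its parameters are illustrative, not from the PR):

import paddle

def make_model_inputs(batch=2, seq_len=10, max_blocks=4):
    """Build a compact mock input dict; tiny shapes keep tests fast."""
    return {
        "block_tables": paddle.zeros([batch, max_blocks], dtype="int32"),
        "input_ids": paddle.zeros([batch, seq_len], dtype="int64"),
        "seq_lens_this_time": paddle.zeros([batch], dtype="int32"),
        "top_p": paddle.full([batch], 0.7, dtype="float32"),
        # ... remaining keys follow the same pattern, parameterized by batch
    }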

@patch("fastdeploy.spec_decode.mtp.get_attention_backend")
@patch("fastdeploy.spec_decode.mtp.get_rope")
@patch("fastdeploy.spec_decode.mtp.MTPSampler")
def test_is_chunk_prefill_enabled(self, mock_sampler, mock_get_rope, mock_get_attn_backend, mock_get_model_loader):
Copilot AI Nov 15, 2025

Missing documentation for the test method. According to the custom coding guidelines, functions should have clear documentation. Add a docstring that explains what behavior is being tested, for example:

"""Test that is_chunk_prefill_enabled always returns True for MTPProposer."""

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +30 to +124
    def setUp(self):
        """Set up test environment"""
        # Mock FDConfig
        self.mock_fd_config = MagicMock()
        self.mock_fd_config.model_config = MagicMock()
        self.mock_fd_config.model_config.architectures = ["ErnieMoeForCausalLM"]
        self.mock_fd_config.model_config.num_hidden_layers = 32
        self.mock_fd_config.model_config.max_model_len = 2048
        self.mock_fd_config.model_config.hidden_size = 1024
        self.mock_fd_config.model_config.num_attention_heads = 16
        self.mock_fd_config.model_config.num_key_value_heads = 16
        self.mock_fd_config.model_config.head_dim = 64
        self.mock_fd_config.model_config.rope_theta = 10000.0
        self.mock_fd_config.model_config.enable_logprob = False
        self.mock_fd_config.speculative_config = MagicMock()
        self.mock_fd_config.speculative_config.mtp_strategy = "standard"
        self.mock_fd_config.speculative_config.num_gpu_block_expand_ratio = 1.0
        self.mock_fd_config.speculative_config.model = "test_model"
        self.mock_fd_config.speculative_config.quantization = ""
        self.mock_fd_config.speculative_config.method = "mtp"
        self.mock_fd_config.scheduler_config = MagicMock()
        self.mock_fd_config.scheduler_config.splitwise_role = "mixed"
        self.mock_fd_config.cache_config = MagicMock()
        self.mock_fd_config.cache_config.block_size = 16
        self.mock_fd_config.cache_config.enc_dec_block_num = 0
        self.mock_fd_config.cache_config.total_block_num = 100
        self.mock_fd_config.cache_config.kv_cache_ratio = 0.9
        self.mock_fd_config.cache_config.enable_prefix_caching = False
        self.mock_fd_config.cache_config.enable_chunked_prefill = False
        self.mock_fd_config.graph_opt_config = MagicMock()
        self.mock_fd_config.graph_opt_config.draft_model_use_cudagraph = False
        self.mock_fd_config.graph_opt_config.cudagraph_capture_sizes = []
        self.mock_fd_config.graph_opt_config.sot_warmup_sizes = []
        self.mock_fd_config.parallel_config = MagicMock()
        self.mock_fd_config.parallel_config.tensor_parallel_size = 1
        self.mock_fd_config.parallel_config.enable_expert_parallel = False
        self.mock_fd_config.quant_config = None
        self.mock_fd_config.load_config = MagicMock()
        self.mock_fd_config.max_num_seqs = 8
        self.mock_fd_config.max_prefill_batch = 4
        self.mock_fd_config.model_config.enable_mm = False

        # Mock main model
        self.mock_main_model = MagicMock()

        # Mock target model inputs (identical to the dict quoted in full above)
        self.mock_target_model_inputs = {
            "block_tables": paddle.zeros([8, 100], dtype="int32"),
            # ... the remaining entries are exactly as shown in the previous comment
            "draft_logits": None,
        }
Copilot AI Nov 15, 2025

[nitpick] The setUp method creates an extensive mock configuration with 71 lines of repetitive mock setup. This duplicated mock setup pattern across all test methods makes the tests harder to maintain. Consider:

  1. Extracting common mock setup into a helper method or fixture
  2. Creating a factory function that returns a properly configured mock FDConfig
  3. Using a test configuration file for default values

This would make tests more readable and easier to update when the configuration schema changes.
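
A minimal sketch of option 2, using MagicMock from unittest.mock (the helper name make_mock_fd_config and its keyword-override mechanism are illustrative, not from the PR):

from unittest.mock import MagicMock

def make_mock_fd_config(**overrides):
    """Return a MagicMock preloaded with the defaults shared by these tests."""
    cfg = MagicMock()
    # Nested attributes are auto-created by MagicMock, so these just work:
    cfg.model_config.architectures = ["ErnieMoeForCausalLM"]
    cfg.model_config.max_model_len = 2048
    cfg.cache_config.block_size = 16
    cfg.scheduler_config.splitwise_role = "mixed"
    cfg.max_num_seqs = 8
    # Top-level attributes can be overridden per test:
    for name, value in overrides.items():
        setattr(cfg, name, value)
    return cfg

# Usage: cfg = make_mock_fd_config(max_num_seqs=4)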

