[CI][Hackathon Phase 9 Development Example NO.12] Add unit tests for the fastdeploy/spec_decode/mtp.py module #5068
base: develop
Conversation
Thanks for your contribution!
Pull Request Overview
This PR adds unit tests for the fastdeploy/spec_decode/mtp.py module as part of Hackathon Phase 9 Task NO.12. The tests cover initialization and basic operations of the MTPProposer class used in speculative decoding.
- Comprehensive test coverage for MTPProposer initialization and configuration
- Tests for cache management methods (initialize, clear, update)
- Tests for utility methods like `exist_prefill` and `is_chunk_prefill_enabled`
```python
self.mock_target_model_inputs = {
    "block_tables": paddle.zeros([8, 100], dtype="int32"),
    "input_ids": paddle.zeros([8, 2048], dtype="int64"),
    "seq_lens_this_time": paddle.zeros([8], dtype="int32"),
    "seq_lens_encoder": paddle.zeros([8], dtype="int32"),
    "seq_lens_decoder": paddle.zeros([8], dtype="int32"),
    "step_idx": paddle.zeros([8], dtype="int32"),
    "stop_flags": paddle.zeros([8], dtype="bool"),
    "stop_nums": paddle.zeros([8], dtype="int32"),
    "pre_ids": paddle.zeros([8], dtype="int64"),
    "output_cum_offsets": paddle.zeros([8], dtype="int32"),
    "output_padding_offset": paddle.zeros([8], dtype="int32"),
    "ids_remove_padding": paddle.zeros([8], dtype="int64"),
    "batch_id_per_token": paddle.zeros([8], dtype="int32"),
    "cu_seqlens_q": paddle.zeros([9], dtype="int32"),
    "cu_seqlens_k": paddle.zeros([9], dtype="int32"),
    "decoder_batch_ids": paddle.zeros([8], dtype="int32"),
    "decoder_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
    "decoder_num_blocks_cpu": paddle.zeros([8], dtype="int32"),
    "decoder_num_blocks_device": paddle.zeros([8], dtype="int32"),
    "decoder_chunk_size_device": paddle.zeros([8], dtype="int32"),
    "max_len_tensor_cpu": paddle.zeros([8], dtype="int32"),
    "encoder_batch_ids": paddle.zeros([8], dtype="int32"),
    "encoder_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
    "encoder_num_blocks_x_cpu": paddle.zeros([8], dtype="int32"),
    "kv_batch_ids": paddle.zeros([8], dtype="int32"),
    "kv_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
    "kv_num_blocks_x_cpu": paddle.zeros([8], dtype="int32"),
    "prompt_lens": paddle.zeros([8], dtype="int32"),
    "top_p": paddle.ones([8], dtype="float32") * 0.7,
    "top_k": paddle.ones([8], dtype="int32") * 50,
    "temperature": paddle.ones([8], dtype="float32") * 1.0,
    "eos_token_id": paddle.ones([8, 1], dtype="int64") * 2,
    "penalty_score": paddle.ones([8], dtype="float32"),
    "frequency_score": paddle.zeros([8], dtype="float32"),
    "presence_score": paddle.zeros([8], dtype="float32"),
    "infer_seed": paddle.zeros([8], dtype="int64"),
    "max_dec_len": paddle.ones([8], dtype="int32") * 256,
    "min_dec_len": paddle.zeros([8], dtype="int32"),
    "bad_tokens": paddle.zeros([8, 0], dtype="int64"),
    "draft_tokens": paddle.zeros([8, 10], dtype="int64"),
    "accept_tokens": paddle.zeros([8, 10], dtype="int64"),
    "accept_num": paddle.zeros([8], dtype="int32"),
    "encoder_block_lens": paddle.zeros([8], dtype="int32"),
    "cu_batch_token_offset": paddle.zeros([9], dtype="int32"),
    "temp_scaled_logprobs": None,
    "top_p_normalized_logprobs": None,
    "draft_logits": None,
}
```
Copilot AI · Nov 15, 2025
[nitpick] The test creates large mock tensors (e.g., `paddle.zeros([8, 2048], dtype="int64")` for `input_ids`) that consume significant memory but aren't used meaningfully in most tests. This could slow down test execution unnecessarily. Consider:
- Using smaller tensor dimensions for tests that don't require full-size tensors
- Lazy-initializing tensors only when needed
- Sharing tensor instances across tests when the values don't change

For example, `input_ids` with shape `[8, 2048]` could be `[2, 10]` for most tests unless the full dimensions are specifically being tested.
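As a hedged sketch of the lazy-initialization idea (the `LazyInputs` class and the shape spec are illustrative suggestions, not part of this PR; in the real suite the tensor factory would be something like `paddle.zeros`):

```python
class LazyInputs(dict):
    """Mock-input dict that builds each tensor on first access.

    `spec` maps key -> (shape, dtype); `make_tensor` is the tensor
    factory (e.g. a paddle.zeros wrapper in the real tests). Entries a
    test never touches are never allocated, and repeated lookups reuse
    the same instance.
    """

    def __init__(self, spec, make_tensor):
        super().__init__()
        self._spec = spec
        self._make = make_tensor

    def __missing__(self, key):
        # Called only the first time a key is looked up; the built
        # tensor is then cached in the dict itself.
        shape, dtype = self._spec[key]
        self[key] = self._make(shape, dtype)
        return self[key]
```

A test that only reads, say, `seq_lens_this_time` then pays for one small tensor instead of the whole forty-entry dict.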
```python
@patch("fastdeploy.spec_decode.mtp.get_attention_backend")
@patch("fastdeploy.spec_decode.mtp.get_rope")
@patch("fastdeploy.spec_decode.mtp.MTPSampler")
def test_is_chunk_prefill_enabled(self, mock_sampler, mock_get_rope, mock_get_attn_backend, mock_get_model_loader):
```
Copilot AI · Nov 15, 2025
Missing documentation for the test method. According to the custom coding guidelines, functions should have clear documentation. Add a docstring that explains what behavior is being tested, for example:

```python
"""Test that is_chunk_prefill_enabled always returns True for MTPProposer."""
```

```python
def setUp(self):
    """Set up test environment"""
    # Mock FDConfig
    self.mock_fd_config = MagicMock()
    self.mock_fd_config.model_config = MagicMock()
    self.mock_fd_config.model_config.architectures = ["ErnieMoeForCausalLM"]
    self.mock_fd_config.model_config.num_hidden_layers = 32
    self.mock_fd_config.model_config.max_model_len = 2048
    self.mock_fd_config.model_config.hidden_size = 1024
    self.mock_fd_config.model_config.num_attention_heads = 16
    self.mock_fd_config.model_config.num_key_value_heads = 16
    self.mock_fd_config.model_config.head_dim = 64
    self.mock_fd_config.model_config.rope_theta = 10000.0
    self.mock_fd_config.model_config.enable_logprob = False
    self.mock_fd_config.speculative_config = MagicMock()
    self.mock_fd_config.speculative_config.mtp_strategy = "standard"
    self.mock_fd_config.speculative_config.num_gpu_block_expand_ratio = 1.0
    self.mock_fd_config.speculative_config.model = "test_model"
    self.mock_fd_config.speculative_config.quantization = ""
    self.mock_fd_config.speculative_config.method = "mtp"
    self.mock_fd_config.scheduler_config = MagicMock()
    self.mock_fd_config.scheduler_config.splitwise_role = "mixed"
    self.mock_fd_config.cache_config = MagicMock()
    self.mock_fd_config.cache_config.block_size = 16
    self.mock_fd_config.cache_config.enc_dec_block_num = 0
    self.mock_fd_config.cache_config.total_block_num = 100
    self.mock_fd_config.cache_config.kv_cache_ratio = 0.9
    self.mock_fd_config.cache_config.enable_prefix_caching = False
    self.mock_fd_config.cache_config.enable_chunked_prefill = False
    self.mock_fd_config.graph_opt_config = MagicMock()
    self.mock_fd_config.graph_opt_config.draft_model_use_cudagraph = False
    self.mock_fd_config.graph_opt_config.cudagraph_capture_sizes = []
    self.mock_fd_config.graph_opt_config.sot_warmup_sizes = []
    self.mock_fd_config.parallel_config = MagicMock()
    self.mock_fd_config.parallel_config.tensor_parallel_size = 1
    self.mock_fd_config.parallel_config.enable_expert_parallel = False
    self.mock_fd_config.quant_config = None
    self.mock_fd_config.load_config = MagicMock()
    self.mock_fd_config.max_num_seqs = 8
    self.mock_fd_config.max_prefill_batch = 4
    self.mock_fd_config.model_config.enable_mm = False

    # Mock main model
    self.mock_main_model = MagicMock()

    # Mock target model inputs
    self.mock_target_model_inputs = {
        "block_tables": paddle.zeros([8, 100], dtype="int32"),
        "input_ids": paddle.zeros([8, 2048], dtype="int64"),
        "seq_lens_this_time": paddle.zeros([8], dtype="int32"),
        "seq_lens_encoder": paddle.zeros([8], dtype="int32"),
        "seq_lens_decoder": paddle.zeros([8], dtype="int32"),
        "step_idx": paddle.zeros([8], dtype="int32"),
        "stop_flags": paddle.zeros([8], dtype="bool"),
        "stop_nums": paddle.zeros([8], dtype="int32"),
        "pre_ids": paddle.zeros([8], dtype="int64"),
        "output_cum_offsets": paddle.zeros([8], dtype="int32"),
        "output_padding_offset": paddle.zeros([8], dtype="int32"),
        "ids_remove_padding": paddle.zeros([8], dtype="int64"),
        "batch_id_per_token": paddle.zeros([8], dtype="int32"),
        "cu_seqlens_q": paddle.zeros([9], dtype="int32"),
        "cu_seqlens_k": paddle.zeros([9], dtype="int32"),
        "decoder_batch_ids": paddle.zeros([8], dtype="int32"),
        "decoder_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
        "decoder_num_blocks_cpu": paddle.zeros([8], dtype="int32"),
        "decoder_num_blocks_device": paddle.zeros([8], dtype="int32"),
        "decoder_chunk_size_device": paddle.zeros([8], dtype="int32"),
        "max_len_tensor_cpu": paddle.zeros([8], dtype="int32"),
        "encoder_batch_ids": paddle.zeros([8], dtype="int32"),
        "encoder_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
        "encoder_num_blocks_x_cpu": paddle.zeros([8], dtype="int32"),
        "kv_batch_ids": paddle.zeros([8], dtype="int32"),
        "kv_tile_ids_per_batch": paddle.zeros([8], dtype="int32"),
        "kv_num_blocks_x_cpu": paddle.zeros([8], dtype="int32"),
        "prompt_lens": paddle.zeros([8], dtype="int32"),
        "top_p": paddle.ones([8], dtype="float32") * 0.7,
        "top_k": paddle.ones([8], dtype="int32") * 50,
        "temperature": paddle.ones([8], dtype="float32") * 1.0,
        "eos_token_id": paddle.ones([8, 1], dtype="int64") * 2,
        "penalty_score": paddle.ones([8], dtype="float32"),
        "frequency_score": paddle.zeros([8], dtype="float32"),
        "presence_score": paddle.zeros([8], dtype="float32"),
        "infer_seed": paddle.zeros([8], dtype="int64"),
        "max_dec_len": paddle.ones([8], dtype="int32") * 256,
        "min_dec_len": paddle.zeros([8], dtype="int32"),
        "bad_tokens": paddle.zeros([8, 0], dtype="int64"),
        "draft_tokens": paddle.zeros([8, 10], dtype="int64"),
        "accept_tokens": paddle.zeros([8, 10], dtype="int64"),
        "accept_num": paddle.zeros([8], dtype="int32"),
        "encoder_block_lens": paddle.zeros([8], dtype="int32"),
        "cu_batch_token_offset": paddle.zeros([9], dtype="int32"),
        "temp_scaled_logprobs": None,
        "top_p_normalized_logprobs": None,
        "draft_logits": None,
    }
```
Copilot AI · Nov 15, 2025
[nitpick] The `setUp` method creates an extensive mock configuration with 71 lines of repetitive mock setup. Duplicating this setup pattern across all test methods makes the tests harder to maintain. Consider:
- Extracting common mock setup into a helper method or fixture
- Creating a factory function that returns a properly configured mock FDConfig
- Using a test configuration file for default values

This would make tests more readable and easier to update when the configuration schema changes.
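One way to realize the factory suggestion is a helper like the sketch below. The name `make_mock_fd_config` and the double-underscore override syntax are hypothetical, not part of this PR:

```python
from unittest.mock import MagicMock


def make_mock_fd_config(**overrides):
    """Return a MagicMock FDConfig with the defaults shared by these tests.

    Nested attributes are overridden with double-underscore paths,
    e.g. make_mock_fd_config(cache_config__block_size=32). Only a few
    representative defaults are shown here.
    """
    cfg = MagicMock()
    cfg.max_num_seqs = 8
    cfg.max_prefill_batch = 4
    cfg.model_config.max_model_len = 2048
    cfg.model_config.num_hidden_layers = 32
    cfg.cache_config.block_size = 16
    cfg.speculative_config.method = "mtp"
    cfg.parallel_config.tensor_parallel_size = 1
    for dotted, value in overrides.items():
        obj = cfg
        *parents, leaf = dotted.split("__")
        for name in parents:
            obj = getattr(obj, name)  # MagicMock auto-creates children
        setattr(obj, leaf, value)
    return cfg
```

Each test's `setUp` would then call the factory once and override only the attributes that test actually exercises.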
Motivation
NO.12: add unit tests for the fastdeploy/spec_decode/mtp.py module.
Modifications
Added unit tests in tests/spec_decode/test_mtp.py.
Usage or Command
Not needed.
Accuracy Tests
Not needed.
Checklist
- Add at least one tag in the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure it has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.