
Commit c21a468

Author: Andrew Briand (committed)
Comment and remove unused params

1 parent: d9068b5

1 file changed: +3 -7 lines changed

vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py

Lines changed: 3 additions & 7 deletions
@@ -339,22 +339,18 @@ def flashinfer_trtllm_fp4_routed_moe(
     topk_ids: torch.Tensor,  # Packed
     top_k: int,
     global_num_experts: int,
-    num_expert_group: int | None,
-    topk_group: int | None,
-    custom_routing_function: object | None,
 ) -> torch.Tensor:
     """
-    Apply FlashInfer TensorRT-LLM FP4 MoE kernel.
+    Apply FlashInfer TensorRT-LLM FP4 MoE kernel. Uses packed
+    input top k expert indices and scores rather than computing
+    top k expert indices from scores.
 
     Args:
         layer: The MoE layer with weights and scales
         x: Input tensor
         topk_ids: Ids of selected experts
         top_k: Number of experts to select per token
         global_num_experts: Total number of experts across all ranks
-        num_expert_group: Number of expert groups (for grouped routing)
-        topk_group: Top-k within each group
-        custom_routing_function: Custom routing function (e.g., Llama4)
 
     Returns:
         Output tensor from the MoE layer
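
For context, the removed parameters mean the kernel no longer performs expert routing itself: the caller computes the top-k expert selection up front and passes the packed ids in. Below is a minimal caller-side sketch of that idea; it assumes a simple softmax-plus-topk router, and the commented-out call is illustrative only, since the full flashinfer_trtllm_fp4_routed_moe signature (weight and scale arguments, and the exact packing of topk_ids) is not shown in this hunk.

import torch

# Hypothetical caller-side routing (assumption; not taken from the vLLM source).
num_tokens, global_num_experts, top_k = 8, 64, 2
router_logits = torch.randn(num_tokens, global_num_experts)

# Select the top-k experts per token from the routing scores ...
topk_scores, topk_ids = torch.topk(router_logits.softmax(dim=-1), k=top_k, dim=-1)

# ... then hand the already-selected (packed) ids/scores to the kernel, e.g.:
# out = flashinfer_trtllm_fp4_routed_moe(
#     layer=moe_layer,                        # MoE layer with weights and scales
#     x=hidden_states,                        # input tensor
#     topk_ids=topk_ids,                      # packed ids of selected experts
#     top_k=top_k,
#     global_num_experts=global_num_experts,
#     ...                                     # remaining args lie outside this hunk
# )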
