1 file changed, +3 −7 lines

vllm/model_executor/layers/quantization/utils

@@ -339,22 +339,18 @@ def flashinfer_trtllm_fp4_routed_moe(
     topk_ids: torch.Tensor,  # Packed
     top_k: int,
     global_num_experts: int,
-    num_expert_group: int | None,
-    topk_group: int | None,
-    custom_routing_function: object | None,
 ) -> torch.Tensor:
     """
-    Apply FlashInfer TensorRT-LLM FP4 MoE kernel.
+    Apply FlashInfer TensorRT-LLM FP4 MoE kernel. Uses packed
+    input top k expert indices and scores rather than computing
+    top k expert indices from scores.
 
     Args:
         layer: The MoE layer with weights and scales
         x: Input tensor
         topk_ids: Ids of selected experts
         top_k: Number of experts to select per token
         global_num_experts: Total number of experts across all ranks
-        num_expert_group: Number of expert groups (for grouped routing)
-        topk_group: Top-k within each group
-        custom_routing_function: Custom routing function (e.g., Llama4)
 
     Returns:
         Output tensor from the MoE layer
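With the grouped-routing parameters removed, the docstring implies the caller now selects experts up front and passes the resulting `topk_ids` in, rather than having the kernel derive them from router scores. The following is a minimal, hypothetical sketch of that caller-side step using plain `torch.topk`; the names (`router_logits`, tensor shapes) are illustrative assumptions and not part of the vllm API shown in this diff.

```python
import torch

# Illustrative shapes only; not taken from the vllm source.
num_tokens, global_num_experts, top_k = 4, 8, 2
router_logits = torch.randn(num_tokens, global_num_experts)

# Caller-side routing: normalize logits to scores, then take the
# top-k experts per token. These ids/scores would then be handed
# to a routed-MoE kernel instead of being computed inside it.
scores = torch.softmax(router_logits, dim=-1)
topk_scores, topk_ids = torch.topk(scores, k=top_k, dim=-1)

# One selected expert index per (token, slot), with its routing weight.
assert topk_ids.shape == (num_tokens, top_k)
assert topk_scores.shape == (num_tokens, top_k)
```

Precomputing the ids this way keeps the kernel's interface minimal, which matches the PR's removal of `num_expert_group`, `topk_group`, and `custom_routing_function`: any grouped or custom routing policy can run outside the kernel, so long as it produces the same packed `topk_ids`.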