Commit d9068b5

Author: Andrew Briand
1 parent dd892c7 commit d9068b5

1 file changed: +3 -0 lines changed

vllm/model_executor/layers/quantization/modelopt.py

Lines changed: 3 additions & 0 deletions
@@ -1501,10 +1501,13 @@ def apply(
                 router_logits=router_logits,
             )
 
+        # EPLB path
         if (
             self.allow_flashinfer
             and self.flashinfer_moe_backend == FlashinferMoeBackend.TENSORRT_LLM
         ):
+            # Pack top k ids and expert weights into a single int32 tensor, as
+            # required by TRT-LLM
             packed_tensor = (topk_ids.to(torch.int32) << 16) | topk_weights.to(
                 torch.bfloat16
             ).view(torch.int16)
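
The packing described by the new comment can be illustrated without PyTorch. The following is a minimal pure-Python sketch of the bit layout only, not the vLLM or TRT-LLM code: the helper names (`float_to_bf16_bits`, `bf16_bits_to_float`, `pack`, `unpack`) are hypothetical, and it assumes expert ids fit in 16 bits and routing weights are non-negative. It places the expert id in the high 16 bits and the bfloat16 bit pattern of the weight in the low 16 bits of a single 32-bit value.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern of x.

    bfloat16 is the upper half of an IEEE float32. This sketch rounds to
    nearest-even and does not special-case NaN.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def pack(expert_id: int, weight: float) -> int:
    """Hypothetical helper: id in the high 16 bits, bf16 weight in the low 16."""
    return ((expert_id & 0xFFFF) << 16) | float_to_bf16_bits(weight)


def unpack(packed: int) -> tuple[int, float]:
    """Inverse of pack: recover (expert_id, weight)."""
    return (packed >> 16) & 0xFFFF, bf16_bits_to_float(packed)


if __name__ == "__main__":
    p = pack(3, 0.5)
    print(hex(p), unpack(p))
```

One difference from the tensor expression in the diff is worth keeping in mind: there, the `int16` view is promoted to `int32` before the bitwise OR, so a weight with its sign bit set would sign-extend into the upper 16 bits. The sketch sidesteps this by masking to 16 bits, which matches the tensor version only under the non-negative-weight assumption stated above.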
