multi-lora batching
#14249
-
Does vLLM support batching prompts that use different LoRA adapters in the same batch? Is there more detailed example code?
Answered by
jeejeelee
Mar 5, 2025
-
vLLM supports this feature.
The example script `multilora_inference.py` demonstrates it: a different `lora_id` is used to represent each LoRA adapter.
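A minimal sketch of what that looks like with vLLM's offline `LLM` API, assuming a LoRA-capable base model and locally available adapter weights (the model name and adapter paths below are placeholders, and passing a list of `LoRARequest`s to pair each prompt with its own adapter depends on your vLLM version; running it requires a GPU):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support at engine construction time.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,
    max_loras=2,  # how many adapters may be active in a batch
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

prompts = [
    "Write a SQL query listing all users.",
    "Summarize: vLLM batches requests from many clients.",
]

# Each adapter gets a unique positive integer lora_id; the paths are
# hypothetical local adapter checkpoints.
lora_requests = [
    LoRARequest("sql_adapter", 1, "/path/to/sql_adapter"),
    LoRARequest("summarize_adapter", 2, "/path/to/summarize_adapter"),
]

# One LoRARequest per prompt: both prompts run in the same batch, each
# routed through its own adapter.
outputs = llm.generate(prompts, sampling_params, lora_request=lora_requests)

for out in outputs:
    print(out.outputs[0].text)
```

If your vLLM version only accepts a single `lora_request` per `generate` call, the same effect can be achieved with the async engine by attaching a `LoRARequest` to each individual request, which is the pattern `multilora_inference.py` uses.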