Even after integrating vLLM, I am not seeing any parallelism: files are still processed sequentially, one after another. Here are the details:
For example, if processing one file takes X time, processing 10 files takes roughly 10X, so total time scales linearly with the number of files and I get none of the parallel speedup I expected from vLLM.
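To make the comparison concrete, here is a minimal sketch of the timing test I have in mind. It is not my actual pipeline: `process_file` is a hypothetical stand-in that only sleeps to simulate one file's inference latency, and the file names and the 0.05 s delay are made up for illustration. It uses only the Python standard library and makes no vLLM calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_file(path: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one file's inference call.

    Sleeps for `delay` seconds to simulate latency; a real version
    would send the file's content to the model instead.
    """
    time.sleep(delay)
    return f"processed:{path}"


def run_sequential(paths):
    """Process files one at a time; total time grows linearly with len(paths)."""
    start = time.perf_counter()
    results = [process_file(p) for p in paths]
    return results, time.perf_counter() - start


def run_concurrent(paths, max_workers=10):
    """Submit all files at once from a thread pool so the calls overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results
        results = list(pool.map(process_file, paths))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    paths = [f"file_{i}.txt" for i in range(10)]  # made-up file names
    _, t_seq = run_sequential(paths)
    _, t_con = run_concurrent(paths)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

What I currently observe matches the `run_sequential` pattern; what I am hoping for is something closer to `run_concurrent`, where the per-file calls overlap instead of queuing behind each other.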
My goal is faster end-to-end processing across multiple files. Is there a recommended configuration or optimization that I am missing?
If vLLM does not support this use case directly, could you suggest an alternative approach for processing multiple files in parallel?