feat: add vLLM scale-out deployment with nginx load balancing by maryamtahhan · Pull Request #109 · redhat-et/vllm-cpu-perf-eval

maryamtahhan · 2026-04-21T11:10:51Z

Adds turnkey Ansible automation for deploying multiple vLLM instances on a single DUT with configurable nginx load balancing. Enables performance testing at scale with flexible configuration of instance count (1-10), cores per instance (8/16/32), SMT, prefix caching, and load balancing policies (round-robin/least-conn/ip-hash). Includes comprehensive documentation, example inventory, and integration with existing GuideLLM benchmark playbooks.
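To make the load-balancing options concrete, here is a minimal Python sketch of how an nginx `upstream` block could be rendered for N local vLLM instances. It is not taken from the PR's playbooks or templates. The function name `render_upstream`, the upstream name `vllm_backends`, the loopback address, and the base port 8001 are all assumptions chosen for illustration. Only the nginx directives `least_conn` and `ip_hash` are real nginx syntax, with round-robin being nginx's default when no directive is given.

```python
# Hypothetical helper: renders an nginx upstream block for several local
# vLLM instances. Names, address, and ports are illustrative assumptions,
# not values from the PR.
POLICY_DIRECTIVES = {
    "round-robin": None,        # nginx default, no directive needed
    "least-conn": "least_conn",
    "ip-hash": "ip_hash",
}


def render_upstream(instance_count, policy="round-robin",
                    base_port=8001, name="vllm_backends"):
    """Return an nginx upstream block with one server line per instance."""
    if policy not in POLICY_DIRECTIVES:
        raise ValueError(f"unsupported policy: {policy}")
    if not 1 <= instance_count <= 10:
        # The PR description states a supported range of 1-10 instances.
        raise ValueError("instance_count must be between 1 and 10")

    lines = [f"upstream {name} {{"]
    directive = POLICY_DIRECTIVES[policy]
    if directive is not None:
        lines.append(f"    {directive};")
    for i in range(instance_count):
        # Assumed layout: instance i listens on base_port + i.
        lines.append(f"    server 127.0.0.1:{base_port + i};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_upstream(3, policy="least-conn"))
```

In an Ansible setup the same logic would more likely live in a Jinja2 template. The sketch only shows the shape of the generated config.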

coderabbitai · 2026-04-21T11:10:58Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Fix incorrect default container image for vllm_bench that caused "Exec format error" on x86_64 servers.

Problem:
- Default was ARM image: quay.io/mtahhan/vllm:arm-base-cpu
- Embedding tests failed on x86_64 EC2 with "Exec format error"
- Error: "image platform (linux/arm64/v8) does not match (linux/amd64)"

Root Cause:
- vllm_bench config (line 98) used ARM-only image as default
- Introduced in commit 5a494c8 during inventory restructure
- Controller architecture (Mac ARM) is irrelevant - containers run on remote targets (EC2 x86_64)
- LLM tests unaffected - they use guidellm with multi-arch images

Fix:
- Change default to match vLLM server image (x86_64 compatible)
- Old: quay.io/mtahhan/vllm:arm-base-cpu
- New: docker.io/vllm/vllm-openai-cpu:v0.18.0
- Same image as DUT ensures version consistency

Impact:
- Embedding tests now work on x86_64 servers (AWS EC2)
- Users can still override with VLLM_BENCH_CONTAINER_IMAGE env var
- No impact on LLM tests (different benchmark tool)

Tested:
- EC2 x86_64 instances (DUT + Load Generator)
- Baseline and latency tests execute successfully

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
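The override behavior described in this commit can be sketched in a few lines of Python. This is an illustration only: the repository resolves the image in its Ansible configuration, and the helper name `resolve_bench_image` is hypothetical. The environment variable name and the default image string are the ones given in the commit message.

```python
import os

# Default image as stated in the commit message (x86_64 compatible).
DEFAULT_BENCH_IMAGE = "docker.io/vllm/vllm-openai-cpu:v0.18.0"


def resolve_bench_image(env=None):
    """Return the benchmark image, preferring VLLM_BENCH_CONTAINER_IMAGE.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default image.
    return env.get("VLLM_BENCH_CONTAINER_IMAGE") or DEFAULT_BENCH_IMAGE


if __name__ == "__main__":
    print(resolve_bench_image())
```

Treating an empty variable the same as an unset one is a design choice of this sketch, not something the commit specifies.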

maryamtahhan · 2026-05-27T08:54:08Z

@maryamtahhan make sure to also update the MLflow integration and the multi-instance results conversion (to CSV) to reflect this new feature.

Adds turnkey Ansible automation for deploying multiple vLLM instances on a single DUT with configurable nginx load balancing. Enables performance testing at scale with flexible configuration of instance count (1-10), cores per instance (8/16/32), SMT, prefix caching, and load balancing policies (round-robin/least-conn/ip-hash). Includes comprehensive documentation, example inventory, and integration with existing GuideLLM benchmark playbooks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

maryamtahhan force-pushed the multi-instance-test branch from 44c0408 to 4d50b81 Compare May 27, 2026 10:59

maryamtahhan wants to merge 2 commits into redhat-et:main from maryamtahhan:multi-instance-test
