Conversation

@christian-pinto christian-pinto commented Nov 28, 2025

In the case of BS>1, if the entity samples require the same deployment type and one is already running, they will all use it at the same time. This spoils the test results, as the experiments interfere with each other.

This PR makes the following changes:

  • Experiments using the same deployment type can run in parallel only if there are enough parallel K8s environments available, either by reusing existing ones or by creating new ones.
  • K8s deployments using the same model and starting at the same time elect one of them as the leader (the first one to get in), let it start, and only then continue with their own processing. This prevents multiple vLLM instances from downloading the same model into the shared HF cache at once and corrupting the cache; see the sketch after this list.
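
For illustration, a minimal sketch of the first-come leader election idea, assuming a lock file on the shared HF cache volume. The names (`ensure_model`, `LOCK_DIR`, the `download` callable) are hypothetical; this is not necessarily how the PR implements it:

```python
import os

from filelock import FileLock, Timeout  # pip install filelock

LOCK_DIR = "/shared-hf-cache/.locks"  # hypothetical shared-volume path


def ensure_model(model_id: str, download) -> None:
    """First caller to grab the lock is the leader and downloads the model;
    the others block until the leader is done, then reuse the shared cache."""
    os.makedirs(LOCK_DIR, exist_ok=True)
    lock = FileLock(os.path.join(LOCK_DIR, model_id.replace("/", "--") + ".lock"))
    try:
        lock.acquire(timeout=0)  # exactly one caller succeeds: the leader
    except Timeout:
        # Followers block here until the leader releases the lock. A real
        # implementation must also verify the download completed and elect
        # a new leader if it did not (the failure case tested below).
        with lock:
            return
    try:
        download(model_id)  # the leader populates the shared HF cache
    finally:
        lock.release()
```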

I have done the following tests:

  • Operation: grouped RandomWalk on all entities
  • Actuator config: out of cluster with 3 max parallel environments (Why 3? Why not?)
  • Space: 16 entities in total
| Batch size | K8s deployment |
| --- | --- |
| 2 | all same deployment |
| 3 | all same deployment |
| 6 | all same deployment |
| 2 | 4 deployment types |
| 3 | 4 deployment types |
| 6 | 4 deployment types |

All tests successful.

I have also tested artificially failing one deployment while it was downloading a model for the first time, with other deployments waiting: a new leader kicks in and the process continues.

@michael-johnston and/or @AlessandroPomponio, please try this on your environments.

Example space with 16 entities, all requesting the same K8s deployment:

```yaml
entitySpace:
  - identifier: model
    propertyDomain:
      values:
        - ibm-granite/granite-3.0-2b-instruct
  - identifier: "number_input_tokens"
    propertyDomain:
      values: [1024]
  - identifier: "request_rate"
    propertyDomain:
      values: [1, 2, 4, 8]
  - identifier: n_cpus
    propertyDomain:
      values: [2]
  - identifier: memory
    propertyDomain:
      values: ["128Gi"]
  - identifier: "max_batch_tokens"
    propertyDomain:
      values: [8192]
  - identifier: "max_num_seq"
    propertyDomain:
      values: [256]
  - identifier: "n_gpus"
    propertyDomain:
      values: [1]
  - identifier: "num_prompts"
    propertyDomain:
      values: [1, 2, 3, 4]
  - identifier: "gpu_type"
    propertyDomain:
      values: ["NVIDIA-A100-80GB-PCIe"]
experiments:
  - actuatorIdentifier: vllm_performance
    experimentIdentifier: test-deployment-v1
metadata:
  description: A space of vllm deployment configurations
  name: vllm_deployments_16_entities_same_deployment
```

Example space with 16 entities requesting 4 K8s deployments:

```yaml
entitySpace:
  - identifier: model
    propertyDomain:
      values:
        - ibm-granite/granite-3.0-2b-instruct
  - identifier: "number_input_tokens"
    propertyDomain:
      values: [1024]
  - identifier: "request_rate"
    propertyDomain:
      values: [1]
  - identifier: n_cpus
    propertyDomain:
      values: [2]
  - identifier: memory
    propertyDomain:
      values: ["128Gi"]
  - identifier: "max_batch_tokens"
    propertyDomain:
      values: [8192]
  - identifier: "max_num_seq"
    propertyDomain:
      values: [32, 64, 128, 256]
  - identifier: "n_gpus"
    propertyDomain:
      values: [1]
  - identifier: "num_prompts"
    propertyDomain:
      values: [1, 2, 3, 4]
  - identifier: "gpu_type"
    propertyDomain:
      values: ["NVIDIA-A100-80GB-PCIe"]
experiments:
  - actuatorIdentifier: vllm_performance
    experimentIdentifier: test-deployment-v1
metadata:
  description: A space of vllm deployment configurations
  name: vllm_deployments_16_entities_4_deployments
```

Example space with 16 entities requesting 4 K8s deployments using two different models:

```yaml
entitySpace:
  - identifier: model
    propertyDomain:
      values:
        - ibm-granite/granite-3.0-2b-instruct
        - ibm-granite/granite-3.0-8b-instruct
  - identifier: "number_input_tokens"
    propertyDomain:
      values: [1024]
  - identifier: "request_rate"
    propertyDomain:
      values: [1]
  - identifier: n_cpus
    propertyDomain:
      values: [2]
  - identifier: memory
    propertyDomain:
      values: ["128Gi"]
  - identifier: "max_batch_tokens"
    propertyDomain:
      values: [8192]
  - identifier: "max_num_seq"
    propertyDomain:
      values: [32, 64]
  - identifier: "n_gpus"
    propertyDomain:
      values: [1]
  - identifier: "num_prompts"
    propertyDomain:
      values: [1, 2, 3, 4]
  - identifier: "gpu_type"
    propertyDomain:
      values: ["NVIDIA-A100-80GB-PCIe"]
experiments:
  - actuatorIdentifier: vllm_performance
    experimentIdentifier: test-deployment-v1
metadata:
  description: A space of vllm deployment configurations
  name: vllm_deployments_16_entities_4_deployments_2_models
```

Sample random walk operation:

```yaml
metadata:
  name: randomwalk-grouped-vllm-performance-full
spaces:
  - your-space
actuatorConfigurationIdentifiers:
  - your-actuator-config

operation:
  module:
    moduleClass: RandomWalk
  parameters:
    numberEntities: all
    batchSize: 2
    singleMeasurement: False
    samplerConfig:
      mode: 'sequentialgrouped'
      samplerType: 'generator'
      grouping: # A unique combination of these properties is a new vLLM deployment
        - model
        - image
        - memory
        - max_batch_tokens
        - max_num_seq
        - n_gpus
        - gpu_type
        - n_cpus
```
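
To make the grouping concrete, here is a rough sketch of how a unique combination of those properties maps a batch of entities onto deployments. The dict-based entity representation and helper names are illustrative, not the orchestrator's actual data structures:

```python
# Grouping keys from the samplerConfig above: a unique combination of these
# properties corresponds to exactly one vLLM deployment.
GROUPING = ("model", "image", "memory", "max_batch_tokens",
            "max_num_seq", "n_gpus", "gpu_type", "n_cpus")


def deployment_key(entity: dict) -> tuple:
    """Hypothetical helper: the deployment an entity needs, as a hashable key."""
    return tuple(entity.get(prop) for prop in GROUPING)


def group_by_deployment(entities: list[dict]) -> dict[tuple, list[dict]]:
    """Partition a batch of entities by the deployment they require."""
    grouped: dict[tuple, list[dict]] = {}
    for entity in entities:
        grouped.setdefault(deployment_key(entity), []).append(entity)
    return grouped
```

Entities that differ only in request_rate or num_prompts share a deployment, while a different max_num_seq value (as in the second example space) requires a new one.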

DRL-NextGen commented Nov 28, 2025

No vulnerabilities found.


michael-johnston commented Dec 1, 2025

I see this behaviour with 2 environments and batch size 3, using the "Example space with 16 entities" with only one model.

The leader starts the deployment, the other two wait (they all need the same deployment)
[Screenshot 2025-12-01 at 21:29:11]

They then all appear to execute concurrently

[Screenshot 2025-12-01 at 21:30:35]

Two things:
- Since I have max_environments=2, I expected two of the three to start creating environments
- In the case of one, I did not expect them all to report they were running (is this a logging bug?)

Edit: I tested the wrong commit; it is actually working as expected.


michael-johnston commented Dec 1, 2025

Also, there is an issue, not related to this change, where (in the case of one max environment):

  • You lose connection to the cluster while the deployment is spinning up and the code is waiting
  • The max retries for checking the deployment are exceeded, raising K8SConnectionError
  • However, the deployment cannot be destroyed as there is no connection
  • The experiment is marked as invalid -> new experiments queue up
  • The connection comes back, but nothing can proceed as there is one "stray" deployment that will never be garbage collected.

Opened #273 to record this. It's rare, so we can leave it as a future fix if we find it necessary.
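
For context, such a fix would presumably be a retrying garbage-collection pass along these lines (a sketch using the kubernetes Python client; delete_with_retry is a hypothetical helper, not code from this PR):

```python
import time

import urllib3
from kubernetes import client
from kubernetes.client.rest import ApiException


def delete_with_retry(apps: client.AppsV1Api, name: str, namespace: str,
                      delay: float = 30.0) -> None:
    """Keep trying to delete a stray deployment until the API answers."""
    while True:
        try:
            apps.delete_namespaced_deployment(name=name, namespace=namespace)
            return
        except ApiException as exc:
            if exc.status == 404:
                return  # already gone, nothing to clean up
            time.sleep(delay)  # transient API error: retry later
        except urllib3.exceptions.HTTPError:
            time.sleep(delay)  # connection still down: retry later
```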


@michael-johnston michael-johnston left a comment


Tested a few scenarios and it is working as expected 🚀


@AlessandroPomponio AlessandroPomponio left a comment


looks good, just two questions

@DRL-NextGen

Checks Summary

Last run: 2025-12-03T13:20:54.049Z

Code Risk Analyzer vulnerability scan found 2 vulnerabilities:

| Severity | Identifier | Package | Details | Fix |
| --- | --- | --- | --- | --- |
| 🔷 Medium | CVE-2025-50181 | urllib3 (urllib3:2.3.0 -> kubernetes:34.1.0) | urllib3 redirects are not disabled when retries are disabled on PoolManager instantiation (GHSA-pq67-6m6q-mj2v) | 2.5.0 |
| 🔷 Medium | CVE-2025-50182 | urllib3 (urllib3:2.3.0 -> kubernetes:34.1.0) | urllib3 does not control redirects in browsers and Node.js (GHSA-48p4-8xcf-vxj5) | 2.5.0 |


@AlessandroPomponio AlessandroPomponio left a comment


LGTM thanks

@christian-pinto christian-pinto added this pull request to the merge queue Dec 3, 2025
Merged via the queue into main with commit 34a64af Dec 3, 2025
18 checks passed
@christian-pinto christian-pinto deleted the cp-fix-vllm-perf-environments branch December 3, 2025 15:18