Conversation

@christian-pinto christian-pinto commented Nov 28, 2025

In the case of BS>1, if the entity samples require the same deployment type and one is already running, they will all use it at the same time. This spoils the test results, as the experiments interfere with each other.

This PR makes the following changes:

  • Experiments using the same deployment type can run in parallel only if there are enough parallel K8s environments available, either by reusing existing ones or by creating new ones.
  • K8s deployments using the same model and starting at the same time elect one of them as the leader (the first one to get in), let it start, and only then continue with their own processing. This prevents multiple vLLM instances from downloading the same model into the shared HF cache at once and corrupting the cache; see the sketch after this list.
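
For illustration, a minimal sketch of the first-come leader election idea, assuming a lock file on the shared HF cache volume. The names (`ensure_model`, `LOCK_DIR`, the `download` callable) are hypothetical; this is not necessarily how the PR implements it:

```python
import os

from filelock import FileLock, Timeout  # pip install filelock

LOCK_DIR = "/shared-hf-cache/.locks"  # hypothetical shared-volume path


def ensure_model(model_id: str, download) -> None:
    """First caller to grab the lock is the leader and downloads the model;
    the others block until the leader is done, then reuse the shared cache."""
    os.makedirs(LOCK_DIR, exist_ok=True)
    lock = FileLock(os.path.join(LOCK_DIR, model_id.replace("/", "--") + ".lock"))
    try:
        lock.acquire(timeout=0)  # exactly one caller succeeds: the leader
    except Timeout:
        # Followers block here until the leader releases the lock. A real
        # implementation must also verify the download completed and elect
        # a new leader if it did not (the failure case tested below).
        with lock:
            return
    try:
        download(model_id)  # the leader populates the shared HF cache
    finally:
        lock.release()
```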

I have done the following tests:

  • Operation: grouped RandomWalk on all entities
  • Actuator config: out of cluster with 3 max parallel environments (Why 3? Why not?)
  • Space: 16 entities in total
| Batch size | K8s deployment |
| --- | --- |
| 2 | all same deployment |
| 3 | all same deployment |
| 6 | all same deployment |
| 2 | 4 deployment types |
| 3 | 4 deployment types |
| 6 | 4 deployment types |

All tests successful.

I have also tested artificially failing one deployment while it was downloading a model for the first time, with other deployments waiting: a new leader kicks in and the process continues.

@michael-johnston and/or @AlessandroPomponio, please try this on your environments.

Example space with 16 entities, all requesting the same K8s deployment:

```yaml
entitySpace:
  - identifier: model
    propertyDomain:
      values:
        - ibm-granite/granite-3.0-2b-instruct
  - identifier: "number_input_tokens"
    propertyDomain:
      values: [1024]
  - identifier: "request_rate"
    propertyDomain:
      values: [1, 2, 4, 8]
  - identifier: n_cpus
    propertyDomain:
      values: [2]
  - identifier: memory
    propertyDomain:
      values: ["128Gi"]
  - identifier: "max_batch_tokens"
    propertyDomain:
      values: [8192]
  - identifier: "max_num_seq"
    propertyDomain:
      values: [256]
  - identifier: "n_gpus"
    propertyDomain:
      values: [1]
  - identifier: "num_prompts"
    propertyDomain:
      values: [1, 2, 3, 4]
  - identifier: "gpu_type"
    propertyDomain:
      values: ["NVIDIA-A100-80GB-PCIe"]
experiments:
  - actuatorIdentifier: vllm_performance
    experimentIdentifier: test-deployment-v1
metadata:
  description: A space of vllm deployment configurations
  name: vllm_deployments_16_entities_same_deployment
```

Example space with 16 entities requesting 4 K8s deployments:

```yaml
entitySpace:
  - identifier: model
    propertyDomain:
      values:
        - ibm-granite/granite-3.0-2b-instruct
  - identifier: "number_input_tokens"
    propertyDomain:
      values: [1024]
  - identifier: "request_rate"
    propertyDomain:
      values: [1]
  - identifier: n_cpus
    propertyDomain:
      values: [2]
  - identifier: memory
    propertyDomain:
      values: ["128Gi"]
  - identifier: "max_batch_tokens"
    propertyDomain:
      values: [8192]
  - identifier: "max_num_seq"
    propertyDomain:
      values: [32, 64, 128, 256]
  - identifier: "n_gpus"
    propertyDomain:
      values: [1]
  - identifier: "num_prompts"
    propertyDomain:
      values: [1, 2, 3, 4]
  - identifier: "gpu_type"
    propertyDomain:
      values: ["NVIDIA-A100-80GB-PCIe"]
experiments:
  - actuatorIdentifier: vllm_performance
    experimentIdentifier: test-deployment-v1
metadata:
  description: A space of vllm deployment configurations
  name: vllm_deployments_16_entities_4_deployments
```

Example space with 16 entities requesting 4 K8s deployments using two different models:

```yaml
entitySpace:
  - identifier: model
    propertyDomain:
      values:
        - ibm-granite/granite-3.0-2b-instruct
        - ibm-granite/granite-3.0-8b-instruct
  - identifier: "number_input_tokens"
    propertyDomain:
      values: [1024]
  - identifier: "request_rate"
    propertyDomain:
      values: [1]
  - identifier: n_cpus
    propertyDomain:
      values: [2]
  - identifier: memory
    propertyDomain:
      values: ["128Gi"]
  - identifier: "max_batch_tokens"
    propertyDomain:
      values: [8192]
  - identifier: "max_num_seq"
    propertyDomain:
      values: [32, 64]
  - identifier: "n_gpus"
    propertyDomain:
      values: [1]
  - identifier: "num_prompts"
    propertyDomain:
      values: [1, 2, 3, 4]
  - identifier: "gpu_type"
    propertyDomain:
      values: ["NVIDIA-A100-80GB-PCIe"]
experiments:
  - actuatorIdentifier: vllm_performance
    experimentIdentifier: test-deployment-v1
metadata:
  description: A space of vllm deployment configurations
  name: vllm_deployments_16_entities_4_deployments_2_models
```

Sample random walk operation:

```yaml
metadata:
  name: randomwalk-grouped-vllm-performance-full
spaces:
  - your-space
actuatorConfigurationIdentifiers:
  - your-actuator-config

operation:
  module:
    moduleClass: RandomWalk
  parameters:
    numberEntities: all
    batchSize: 2
    singleMeasurement: False
    samplerConfig:
      mode: 'sequentialgrouped'
      samplerType: 'generator'
      grouping: # A unique combination of these properties is a new vLLM deployment
        - model
        - image
        - memory
        - max_batch_tokens
        - max_num_seq
        - n_gpus
        - gpu_type
        - n_cpus
```
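
To make the grouping concrete, here is a rough sketch of how a unique combination of those properties maps a batch of entities onto deployments. The dict-based entity representation and helper names are illustrative, not the orchestrator's actual data structures:

```python
# Grouping keys from the samplerConfig above: a unique combination of these
# properties corresponds to exactly one vLLM deployment.
GROUPING = ("model", "image", "memory", "max_batch_tokens",
            "max_num_seq", "n_gpus", "gpu_type", "n_cpus")


def deployment_key(entity: dict) -> tuple:
    """Hypothetical helper: the deployment an entity needs, as a hashable key."""
    return tuple(entity.get(prop) for prop in GROUPING)


def group_by_deployment(entities: list[dict]) -> dict[tuple, list[dict]]:
    """Partition a batch of entities by the deployment they require."""
    grouped: dict[tuple, list[dict]] = {}
    for entity in entities:
        grouped.setdefault(deployment_key(entity), []).append(entity)
    return grouped
```

Entities that differ only in request_rate or num_prompts share a deployment, while a different max_num_seq value (as in the second example space) requires a new one.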

DRL-NextGen commented Nov 28, 2025

No vulnerabilities found.


michael-johnston commented Dec 1, 2025

I see this behaviour with 2 environments and batch size 3, using the "Example space with 16 entities" with only one model.

The leader starts the deployment, the other two wait (they all need the same deployment)
[Screenshot 2025-12-01 at 21:29:11]

They then all appear to execute concurrently

[Screenshot 2025-12-01 at 21:30:35]

Two things:
- Since I have max_environments=2, I expected two of the three to start creating environments
- In the case of one, I did not expect them all to report they were running (is this a logging bug?)

Edit: I tested the wrong commit; it is actually working as expected.


michael-johnston commented Dec 1, 2025

Also, there is an issue, not related to this change, where (in the case of one max environment):

  • You lose connection to the cluster while the deployment is spinning up and the code is waiting
  • The max retries for checking the deployment are exceeded, raising K8SConnectionError
  • However, the deployment cannot be destroyed as there is no connection
  • The experiment is marked as invalid -> new experiments queue up
  • The connection comes back, but nothing can proceed as there is one "stray" deployment that will never be garbage collected.

Opened #273 to record this. It's rare, so we can leave it as a future fix if we find it necessary.
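
For context, such a fix would presumably be a retrying garbage-collection pass along these lines (a sketch using the kubernetes Python client; delete_with_retry is a hypothetical helper, not code from this PR):

```python
import time

import urllib3
from kubernetes import client
from kubernetes.client.rest import ApiException


def delete_with_retry(apps: client.AppsV1Api, name: str, namespace: str,
                      delay: float = 30.0) -> None:
    """Keep trying to delete a stray deployment until the API answers."""
    while True:
        try:
            apps.delete_namespaced_deployment(name=name, namespace=namespace)
            return
        except ApiException as exc:
            if exc.status == 404:
                return  # already gone, nothing to clean up
            time.sleep(delay)  # transient API error: retry later
        except urllib3.exceptions.HTTPError:
            time.sleep(delay)  # connection still down: retry later
```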


@michael-johnston michael-johnston left a comment


Tested a few scenarios and it is working as expected 🚀


@AlessandroPomponio AlessandroPomponio left a comment


looks good, just two questions

@DRL-NextGen

Checks Summary

Last run: 2025-12-03T13:20:54.049Z

Code Risk Analyzer vulnerability scan found 2 vulnerabilities:

| Severity | Identifier | Package | Details | Fix |
| --- | --- | --- | --- | --- |
| 🔷 Medium | CVE-2025-50181 | urllib3 (urllib3:2.3.0 -> kubernetes:34.1.0) | urllib3 redirects are not disabled when retries are disabled on PoolManager instantiation (GHSA-pq67-6m6q-mj2v) | 2.5.0 |
| 🔷 Medium | CVE-2025-50182 | urllib3 (urllib3:2.3.0 -> kubernetes:34.1.0) | urllib3 does not control redirects in browsers and Node.js (GHSA-48p4-8xcf-vxj5) | 2.5.0 |


@AlessandroPomponio AlessandroPomponio left a comment


LGTM thanks

@christian-pinto christian-pinto added this pull request to the merge queue Dec 3, 2025
Merged via the queue into main with commit 34a64af Dec 3, 2025
18 checks passed
@christian-pinto christian-pinto deleted the cp-fix-vllm-perf-environments branch December 3, 2025 15:18