penfever commented Dec 1, 2025

Purpose

Fix vLLM v1 engine to properly handle pipeline parallelism (PP) by correctly managing KV cache groups when layers are distributed across different ranks.

Currently, when using PP > 1, the KV cache configuration is created globally with all layer names, but each PP rank only has a subset of layers. This causes failures when:

  1. init_attn_backend tries to access layers that don't exist on the current rank
  2. _allocate_kv_cache creates tensors for layers that aren't local
  3. build_attn_metadata attempts to build metadata for non-existent layers
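To make the mismatch concrete, here is a minimal sketch (not vLLM code; the function name and the even-split assumption are hypothetical) of how a rank's local layer set can diverge from the global KV cache configuration:

```python
def layers_for_rank(all_layers: list[str], pp_rank: int, pp_size: int) -> set[str]:
    """Evenly partition layer names across pipeline-parallel ranks (illustrative only)."""
    per_rank = len(all_layers) // pp_size
    start = pp_rank * per_rank
    end = start + per_rank if pp_rank < pp_size - 1 else len(all_layers)
    return set(all_layers[start:end])

all_layers = [f"model.layers.{i}.self_attn" for i in range(8)]
local = layers_for_rank(all_layers, pp_rank=1, pp_size=2)
# Layers that appear in the global config but not on this rank are exactly
# the ones that previously tripped up init_attn_backend and _allocate_kv_cache:
missing = set(all_layers) - local
```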

This PR introduces:

  • resolve_layers_from_vllm_config() - returns both the resolved layers and the names of any missing layers, for better visibility
  • _prune_kv_cache_group_layers() - prunes KV cache groups so they only include layers local to the current rank
  • Proper handling of None attention metadata builders for empty groups
  • Debug logging for skipped remote layers
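The pruning step can be sketched roughly as follows. This is an illustrative approximation, not the actual vLLM implementation; the class and function names here are hypothetical stand-ins:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class KVCacheGroup:
    layer_names: tuple[str, ...]

def prune_kv_cache_group_layers(groups, local_layers):
    """Keep only layers present on this rank; empty groups become None so
    downstream code can skip building attention metadata for them."""
    pruned = []
    for group in groups:
        kept = tuple(n for n in group.layer_names if n in local_layers)
        skipped = set(group.layer_names) - set(kept)
        if skipped:
            # Stands in for the debug logging of skipped remote layers.
            print(f"Skipping remote layers: {sorted(skipped)}")
        pruned.append(replace(group, layer_names=kept) if kept else None)
    return pruned
```

A group whose layers all live on another rank maps to None, which is why the attention metadata builders must tolerate None for empty groups.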

Test Plan

Run the new unit test

pytest tests/config/test_vllm_layers.py -v

Run with pipeline parallelism (requires multi-GPU)

vllm serve --pipeline-parallel-size 2

Test Result

  • Pre-commit checks pass (ruff, mypy, typos, clang-format)
  • Unit tests pass for layer resolution logic

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as the test commands to run.
  • The test results, such as a before/after comparison or end-to-end results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Benjamin Feuer added 12 commits December 1, 2025 10:03
Signed-off-by: Benjamin Feuer <[email protected]>