This repository was archived by the owner on May 15, 2025. It is now read-only.

kfirtoledo

Add support for Kubernetes environment development using GIE with KGateway and vLLM
This PR introduces support for the vllm mode, enabling integration testing of GIE with vLLM.
It also adds support for the vllm-p2p mode, which includes:

  1. Deployment of Redis and LMCache alongside the vLLM image
  2. Peer-to-peer (P2P) communication between vLLM instances
  3. Use of the EPP image to enable kv-cache-aware routing
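A rough sketch of how a vllm-p2p overlay along these lines might look in Kustomize. The namespace, the resource file names, and the image name and tag are hypothetical placeholders and are not taken from this PR; only the `apiVersion` matches the file under review.

```yaml
# Hypothetical kustomization.yaml for a vllm-p2p overlay (illustrative only).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Assumed namespace; the PR does not state one.
namespace: vllm-p2p-dev

resources:
  # Placeholder manifest names, one per component named in the description.
  - redis.yaml          # Redis, deployed alongside vLLM
  - vllm-lmcache.yaml   # vLLM instances with LMCache enabled

images:
  # Placeholder image override; the real image names are not given here.
  - name: vllm
    newName: example.registry/vllm
    newTag: dev
```

If an overlay like this existed at a hypothetical path such as `deploy/vllm-p2p`, it could be applied with `kubectl apply -k deploy/vllm-p2p`.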

@kfirtoledo added the "help wanted" and "WIP" labels Apr 25, 2025
@kfirtoledo changed the title from "feat: add scripts for kubernetes dev env using vLLM and vLLM-p2p" to "feat: Add scripts for kubernetes dev env using vLLM and vLLM-p2p" Apr 25, 2025
@shaneutt (Collaborator) left a comment

Looking great. Most of my comments are smaller, but I do have some questions for other folks as to what effect this will have.

Also, cc @elevran @shmuelk who I think should take a look.

@@ -0,0 +1,11 @@
apiVersion: kustomize.config.k8s.io/v1beta1

Oh ok, I see what you're doing with the naming now. The difference is that each of these deployments now deploys only a working vLLM stack, and you then have to deploy your inference-gateway stack separately.

cc @tumido @Gregory-Pereira @vMaroon just wanting to check with you on how this will work with your Helm chart?

@shaneutt requested a review from shmuelk April 25, 2025 16:15
@kfirtoledo (Author)

@shaneutt, PTAL.

@kfirtoledo removed the "help wanted" and "WIP" labels Apr 27, 2025
@kfirtoledo (Author)

@shaneutt and @elevran PTAL

@shaneutt self-requested a review April 28, 2025 12:50
@shaneutt (Collaborator) left a comment

Approving to unblock.

Once @elevran is 👍, I'm 👍

@elevran (Collaborator) commented Apr 29, 2025

@kfirtoledo LGTM, any idea on the CI/CD failure?

@kfirtoledo merged commit f67cc34 into neuralmagic:dev Apr 29, 2025
1 check failed

Development

Successfully merging this pull request may close these issues.

OpenShift Dev Environment - Full Gateway+GIE Stack Deployment with VLLM and VLLM-P2P mode