This repository was archived by the owner on May 15, 2025. It is now read-only.
forked from kubernetes-sigs/gateway-api-inference-extension
feat: Add scripts for kubernetes dev env using vLLM and vLLM-p2p #60
Merged
5 commits
bf8017f  feat: add scripts for kubernetes dev env using vLLM and vLLM-p2p (se…  (kfirtoledo)
78157d5  [fix]: Small fixes for development YAMLs  (kfirtoledo)
a11e984  [fix]: Small fixes for deployment and fix comments  (kfirtoledo)
937bb50  [fix]: fix typos and edit the Readme and env vars  (kfirtoledo)
17a23e5  [fix] Fix the kind environemnt and set gateway service to be NodePort  (kfirtoledo)
@@ -1,10 +1,10 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferencePool
 metadata:
-  name: vllm-llama3-8b-instruct
+  name: ${POOL_NAME}
 spec:
   targetPortNumber: 8000
   selector:
-    app: vllm-llama3-8b-instruct
+    app: ${POOL_NAME}
   extensionRef:
     name: endpoint-picker
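The `${POOL_NAME}` placeholder in the InferencePool manifest has to be replaced with a concrete value before the manifest can be applied; this page does not show which tool the PR's scripts use for that step. The following is a minimal sketch of one way to do it with plain `sed`, assuming the value is the previously hard-coded name from the diff. The function name `render_pool` is hypothetical, and `envsubst` from GNU gettext is a common alternative for the same job.

```shell
#!/bin/sh
set -eu

# Assumed value: the name that was hard-coded before this change.
POOL_NAME="vllm-llama3-8b-instruct"

# Replace every literal ${POOL_NAME} read from stdin with the variable's value.
# Note: this simple form assumes the value contains no '|' or '&' characters.
render_pool() {
  sed 's|\${POOL_NAME}|'"$POOL_NAME"'|g'
}

# The quoted 'EOF' keeps the shell from expanding ${POOL_NAME} itself,
# so the substitution is done by render_pool only.
rendered="$(render_pool <<'EOF'
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: ${POOL_NAME}
spec:
  targetPortNumber: 8000
  selector:
    app: ${POOL_NAME}
  extensionRef:
    name: endpoint-picker
EOF
)"

printf '%s\n' "$rendered"
```

Because the same variable feeds both `metadata.name` and `spec.selector.app`, the pool name and the pod label it selects cannot drift apart, which appears to be the point of the change.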
@@ -0,0 +1,32 @@
+# ------------------------------------------------------------------------------
+# vLLM P2P Deployment
+#
+# This deploys the full vLLM model server, capable of serving real models such
+# as Llama 3.1-8B-Instruct via the OpenAI-compatible API. It is intended for
+# environments with GPU resources and where full inference capabilities are
+# required.
+# In addition, it adds LMCache, an LLM serving engine extension that uses Redis, to the vLLM image.
+#
+# The deployment can be customized using environment variables to set:
+#   - The container image and tag (VLLM_IMAGE, VLLM_TAG)
+#   - The model to load (MODEL_NAME)
+#
+# This setup is suitable for testing on Kubernetes (including
+# GPU-enabled nodes or clusters with scheduling for `nvidia.com/gpu`).
+# ------------------------------------------------------------------------------
+apiVersion: kustomize.config.k8s.io/v1beta1
+
+kind: Kustomization
+
+resources:
+  - vllm-deployment.yaml
+  - redis-deployment.yaml
+  - redis-service.yaml
+  - secret.yaml
+
+images:
+  - name: vllm/vllm-openai
+    newName: ${VLLM_IMAGE}
+    newTag: ${VLLM_TAG}
+  - name: redis
+    newName: ${REDIS_IMAGE}
+    newTag: ${REDIS_TAG}
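Kustomize does not expand `${...}` placeholders itself, so the image variables in this kustomization have to be substituted by some step outside kustomize, and a variable left unset could end up as an empty `newName` or `newTag`. The sketch below is a hypothetical pre-flight check, not part of the PR: the helper name `missing_vars` and the example values are assumptions chosen for illustration.

```shell
#!/bin/sh
set -eu

# Print the names of any listed variables that are unset or empty,
# separated by single spaces.
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup: expand the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s' "$missing"
}

# Example values only (assumptions, not the PR's defaults).
VLLM_IMAGE="vllm/vllm-openai"
VLLM_TAG="latest"
REDIS_IMAGE="redis"
REDIS_TAG=""   # deliberately left empty to show the check firing

result="$(missing_vars VLLM_IMAGE VLLM_TAG REDIS_IMAGE REDIS_TAG)"
if [ -n "$result" ]; then
  echo "missing variables: $result"
fi
```

A real script would probably exit with a non-zero status at this point rather than only printing the names, so that nothing is rendered or applied with a blank image reference.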
@@ -0,0 +1,50 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ${REDIS_DEPLOYMENT_NAME}
+  labels:
+    app.kubernetes.io/name: redis
+    app.kubernetes.io/component: redis-lookup-server
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: redis
+      app.kubernetes.io/component: redis-lookup-server
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: redis
+        app.kubernetes.io/component: redis-lookup-server
+    spec:
+      containers:
+        - name: lookup-server
+          image: ${REDIS_IMAGE}:${REDIS_TAG}
+          imagePullPolicy: IfNotPresent
+          command:
+            - redis-server
+          ports:
+            - name: redis-port
+              containerPort: ${REDIS_TARGET_PORT}
+              protocol: TCP
+          resources:
+            limits:
+              cpu: "4"
+              memory: 10G
+            requests:
+              cpu: "4"
+              memory: 8G
+          terminationMessagePath: /dev/termination-log
+          terminationMessagePolicy: File
+      restartPolicy: Always
+      terminationGracePeriodSeconds: 30
+      dnsPolicy: ClusterFirst
+      securityContext: {}
+      schedulerName: default-scheduler
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxUnavailable: 25%
+      maxSurge: 25%
+  revisionHistoryLimit: 10
+  progressDeadlineSeconds: 600
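With `replicas: 1`, the percentage-based rolling update settings in the Redis Deployment resolve to small absolute numbers. Kubernetes rounds `maxUnavailable` down and `maxSurge` up when converting a percentage to a pod count. The arithmetic can be sketched as follows; the helper names `pct_floor` and `pct_ceil` are hypothetical and only illustrate the rounding.

```shell
#!/bin/sh
set -eu

# floor(replicas * percent / 100): how maxUnavailable is rounded.
pct_floor() { echo $(( $1 * $2 / 100 )); }

# ceil(replicas * percent / 100): how maxSurge is rounded.
pct_ceil() { echo $(( ($1 * $2 + 99) / 100 )); }

replicas=1
max_unavailable="$(pct_floor "$replicas" 25)"
max_surge="$(pct_ceil "$replicas" 25)"

echo "replicas=$replicas maxUnavailable=$max_unavailable maxSurge=$max_surge"
# prints: replicas=1 maxUnavailable=0 maxSurge=1
```

For this single-replica Deployment that means a rollout starts one extra pod first and removes the old one only after the new one is available, so the Redis lookup server should not drop to zero pods during an update.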