Commit a80c163

feat: add scripts for kubernetes dev env using vLLM and vLLM-p2p (setup for kvcache-aware)
Signed-off-by: Kfir Toledo <[email protected]>
1 parent c744d00 commit a80c163

19 files changed, +525 -45 lines changed

DEVELOPMENT.md

+20-3
@@ -178,14 +178,31 @@ Export the name of the `Secret` to the environment:
 export REGISTRY_SECRET=anna-pull-secret
 ```
 
-Now you need to provide several other environment variables. You'll need to
-indicate the location and tag of the `vllm-sim` image:
+Set the `VLLM_MODE` environment variable based on which version of vLLM you want to deploy:
+
+- `vllm-sim`: Lightweight simulator for simple environments
+- `vllm`: Full vLLM model server for real inference
+- `vllm-p2p`: Full vLLM with LMCache P2P support for distributed KV caching
+
+```console
+export VLLM_MODE=vllm-sim # or vllm / vllm-p2p
+```
+Each mode has default image values, but you can override them:
+
+For vllm-sim:
 
 ```console
 export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
 export VLLM_SIM_TAG="<YOUR_TAG>"
 ```
 
+For vllm and vllm-p2p:
+
+```console
+export VLLM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+export VLLM_TAG="<YOUR_TAG>"
+```
+
 The same thing will need to be done for the EPP:
 
 ```console
@@ -203,7 +220,7 @@ This will deploy the entire stack to whatever namespace you chose. You can test
 by exposing the inference `Gateway` via port-forward:
 
 ```console
-kubectl -n ${NAMESPACE} port-forward service/inference-gateway-istio 8080:80
+kubectl -n ${NAMESPACE} port-forward service/inference-gateway 8080:80
 ```
 
 And making requests with `curl`:
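
The DEVELOPMENT.md change describes three `VLLM_MODE` values, each with default image settings that can be overridden. A minimal sketch of how a script could resolve the image reference per mode follows. It is not the code from this commit: the function name `resolve_vllm_image`, the fallback to `vllm-sim` when no mode is given, and the placeholder defaults (`example.invalid/...`, `latest`) are all assumptions for illustration.

```shell
# Hypothetical helper: print "<image>:<tag>" for the given VLLM_MODE.
# The default image names and tags below are placeholders, not the
# defaults used by the commit's scripts.
resolve_vllm_image() {
  case "${1:-vllm-sim}" in
    vllm-sim)
      # The simulator has its own pair of override variables.
      echo "${VLLM_SIM_IMAGE:-example.invalid/vllm-sim}:${VLLM_SIM_TAG:-latest}"
      ;;
    vllm|vllm-p2p)
      # Both full vLLM modes share VLLM_IMAGE and VLLM_TAG.
      echo "${VLLM_IMAGE:-example.invalid/vllm}:${VLLM_TAG:-latest}"
      ;;
    *)
      echo "unsupported VLLM_MODE: $1" >&2
      return 1
      ;;
  esac
}
```

Called as `resolve_vllm_image "$VLLM_MODE"`, the helper rejects unknown modes instead of silently deploying a default, which may be preferable when the mode name is mistyped.
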

Makefile

+2-5
@@ -780,11 +780,8 @@ environment.dev.kubernetes: check-kubectl check-kustomize check-envsubst
 # ------------------------------------------------------------------------------
 .PHONY: clean.environment.dev.kubernetes
 clean.environment.dev.kubernetes: check-kubectl check-kustomize check-envsubst
-ifndef NAMESPACE
-	$(error "Error: NAMESPACE is required but not set")
-endif
-	@echo "INFO: cleaning up dev environment in $(NAMESPACE)"
-	kustomize build deploy/environments/dev/kubernetes-kgateway | envsubst | kubectl -n "${NAMESPACE}" delete -f -
+	@CLEAN=true ./scripts/kubernetes-dev-env.sh 2>&1
+	@echo "INFO: Finish cleanup development environment for $(VLLM_MODE) mode in namespace $(NAMESPACE)"
 
 # -----------------------------------------------------------------------------
 # TODO: these are old aliases that we still need for the moment, but will be
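
The Makefile change removes the `ifndef NAMESPACE` guard and delegates cleanup to `scripts/kubernetes-dev-env.sh` with `CLEAN=true`. The script itself is not shown in this section, so the following is only a sketch of how such an entry point could keep the guard and choose between applying and deleting; the function name `dev_env_action` is hypothetical.

```shell
# Hypothetical sketch: decide which kubectl verb to use.
# The real scripts/kubernetes-dev-env.sh may be structured differently.
dev_env_action() {
  # Fail early when NAMESPACE is missing, as the removed Makefile check did.
  if [ -z "${NAMESPACE:-}" ]; then
    echo "Error: NAMESPACE is required but not set" >&2
    return 1
  fi
  # CLEAN=true selects teardown; anything else selects deployment.
  if [ "${CLEAN:-false}" = "true" ]; then
    echo "delete"
  else
    echo "apply"
  fi
}
```

The returned verb could then be fed to the same pipeline the old recipe used, for example `kustomize build <dir> | envsubst | kubectl -n "${NAMESPACE}" "$(dev_env_action)" -f -`, where `<dir>` stands for whichever environment directory the chosen mode maps to.
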
@@ -0,0 +1,27 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ${REDIS_NAME}
+  labels:
+    app.kubernetes.io/name: redis
+    app.kubernetes.io/component: redis-lookup-server
+spec:
+  replicas: ${REDIS_REPLICA_COUNT}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: redis
+      app.kubernetes.io/component: redis-lookup-server
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: redis
+        app.kubernetes.io/component: redis-lookup-server
+    spec:
+      containers:
+        - name: lookup-server
+          image: ${REDIS_IMAGE}:${REDIS_TAG}
+          imagePullPolicy: Always
+          command:
+            - redis-server
+          ports:
+            - containerPort: ${REDIS_TARGET_PORT}
@@ -0,0 +1,11 @@
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ${HF_SECRET_NAME}
+  namespace: ${NAMESPACE}
+  labels:
+    app.kubernetes.io/name: vllm
+    app.kubernetes.io/component: secret
+type: Opaque
+data:
+  ${HF_SECRET_KEY}: ${HF_TOKEN}
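
Kubernetes expects values under a Secret's `data:` field to be base64-encoded (plain text belongs under `stringData:`), so `${HF_TOKEN}` needs to hold an encoded value by the time this template is rendered. Whether the commit's script already does the encoding is not visible in this section. A minimal sketch, with `encode_secret_value` and `RAW_HF_TOKEN` as hypothetical names:

```shell
# Hypothetical helper: base64-encode a value for a Secret `data:` field.
encode_secret_value() {
  # printf avoids the trailing newline that echo would add to the token;
  # tr removes the line wraps some base64 implementations insert.
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Example use before rendering the template (RAW_HF_TOKEN is an assumed name):
#   export HF_TOKEN="$(encode_secret_value "$RAW_HF_TOKEN")"
```

Encoding with `printf` rather than `echo` matters here: a stray newline inside the decoded token is a common cause of authentication failures that are hard to spot.
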
@@ -0,0 +1,61 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ${VLLM_DEPLOYMENT_NAME}
+  labels:
+    app.kubernetes.io/name: vllm
+    app.kubernetes.io/model: ${MODEL_LABEL}
+    app.kubernetes.io/component: vllm
+spec:
+  replicas: ${VLLM_REPLICA_COUNT}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: vllm
+      app.kubernetes.io/component: vllm
+      app.kubernetes.io/model: ${MODEL_LABEL}
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: vllm
+        app.kubernetes.io/component: vllm
+        app.kubernetes.io/model: ${MODEL_LABEL}
+    spec:
+      containers:
+        - name: vllm
+          image: ${VLLM_IMAGE}:${VLLM_TAG}
+          imagePullPolicy: Always
+          command:
+            - /bin/sh
+            - "-c"
+          args:
+            - |
+              export LMCACHE_DISTRIBUTED_URL=${POD_IP}:80 &&
+              vllm serve ${MODEL_NAME}
+              --host 0.0.0.0
+              --port 8000
+              --enable-chunked-prefill false
+              --max-model-len ${MAX_MODEL_LEN}
+              --kv-transfer-config
+              '{"kv_connector":"LMCacheConnector","kv_role":"kv_both"}'
+          ports:
+            - name: http
+              containerPort: 8000
+            - name: lmcache-dist
+              containerPort: 80
+          env:
+            - name: HF_TOKEN
+              valueFrom:
+                secretKeyRef:
+                  name: ${HF_SECRET_NAME}
+                  key: ${HF_SECRET_KEY}
+            - name: POD_IP
+              valueFrom:
+                fieldRef:
+                  fieldPath: status.podIP
+          volumeMounts:
+            - name: model-storage
+              mountPath: ${VOLUME_MOUNT_PATH}
+      volumes:
+        - name: model-storage
+          persistentVolumeClaim:
+            claimName: ${PVC_NAME}
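
As rendered here, the `vllm serve` flags in the `args` block scalar sit on separate lines with no trailing backslashes. If that matches the committed file, `sh -c` would treat each newline as the end of a command, so the flags would not reach `vllm serve`. The sketch below illustrates the difference using `print_argc`, a hypothetical stand-in that only reports how many arguments it received.

```shell
# print_argc is a hypothetical stand-in for `vllm serve`.
print_argc() { echo "argc=$#"; }

# Newline without a continuation: the first line runs alone, then the
# shell tries to execute "--port" as a separate command and fails.
unbroken='print_argc model
--port 8000'

# Trailing backslash: the flags stay on the same command line.
continued='print_argc model \
--port 8000'

eval "$unbroken" 2>/dev/null || true   # prints argc=1
eval "$continued"                      # prints argc=3
```

Adding a backslash at the end of each flag line, or using a folded scalar (`>`) so the lines are joined with spaces, are two common ways to keep a multi-line command intact.
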
@@ -0,0 +1,30 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+namespace: ${NAMESPACE}
+
+resources:
+  - deployments/vllm-deployment.yaml
+  - deployments/redis-deployment.yaml
+  - service/redis-service.yaml
+  - pvc/volume.yaml
+  - deployments/secret.yaml
+
+images:
+  - name: vllm/vllm-openai
+    newName: ${VLLM_IMAGE}
+    newTag: ${VLLM_TAG}
+  - name: redis
+    newName: ${REDIS_IMAGE}
+    newTag: ${REDIS_TAG}
+
+configMapGenerator:
+  - name: model-config
+    literals:
+      - MODEL_NAME=${MODEL_NAME}
+      - MODEL_LABEL=${MODEL_LABEL}
+      - POOL_LABEL=${POOL_LABEL}
+      - REDIS_ENABLED=${REDIS_ENABLED}
+
+generatorOptions:
+  disableNameSuffixHash: true
+18
@@ -0,0 +1,18 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ${PVC_NAME}
+  namespace: ${NAMESPACE}
+  labels:
+    app.kubernetes.io/name: vllm
+    app.kubernetes.io/component: storage
+    app.kubernetes.io/model: ${MODEL_LABEL}
+  finalizers:
+    - kubernetes.io/pvc-protection
+spec:
+  accessModes:
+    - ${PVC_ACCESS_MODE}
+  resources:
+    requests:
+      storage: ${PVC_SIZE}
+  storageClassName: ${PVC_STORAGE_CLASS}
@@ -0,0 +1,18 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: ${REDIS_NAME}
+  namespace: ${NAMESPACE}
+  labels:
+    app.kubernetes.io/name: redis
+    app.kubernetes.io/component: redis-lookup-server
+spec:
+  ports:
+    - name: lookupserver-port
+      protocol: TCP
+      port: ${REDIS_PORT}
+      targetPort: ${REDIS_TARGET_PORT}
+  type: ${REDIS_SERVICE_TYPE}
+  selector:
+    app.kubernetes.io/name: redis
+    app.kubernetes.io/component: redis-lookup-server
+143
@@ -0,0 +1,143 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ${VLLM_DEPLOYMENT_NAME}
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: vllm-llama3-8b-instruct
+  template:
+    metadata:
+      labels:
+        app: vllm-llama3-8b-instruct
+    spec:
+      securityContext:
+        runAsUser: ${PROXY_UID}
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
+      containers:
+        - name: vllm
+          image: "vllm/vllm-openai:latest"
+          imagePullPolicy: IfNotPresent
+          command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
+          args:
+            - "--model"
+            - "meta-llama/Llama-3.1-8B-Instruct"
+            - "--tensor-parallel-size"
+            - "1"
+            - "--port"
+            - "8000"
+            - "--max-num-seq"
+            - "1024"
+            - "--compilation-config"
+            - "3"
+            - "--enable-lora"
+            - "--max-loras"
+            - "2"
+            - "--max-lora-rank"
+            - "8"
+            - "--max-cpu-loras"
+            - "12"
+          env:
+            - name: VLLM_USE_V1
+              value: "1"
+            - name: PORT
+              value: "8000"
+            - name: HUGGING_FACE_HUB_TOKEN
+              valueFrom:
+                secretKeyRef:
+                  name: hf-token
+                  key: token
+            - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING
+              value: "true"
+            - name: XDG_CACHE_HOME
+              value: /cache
+            - name: HF_HOME
+              value: /cache/huggingface
+            - name: FLASHINFER_CACHE_DIR
+              value: /cache/flashinfer
+          ports:
+            - containerPort: 8000
+              name: http
+              protocol: TCP
+          lifecycle:
+            preStop:
+              sleep:
+                seconds: 30
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: http
+              scheme: HTTP
+            periodSeconds: 1
+            successThreshold: 1
+            failureThreshold: 5
+            timeoutSeconds: 1
+          readinessProbe:
+            httpGet:
+              path: /health
+              port: http
+              scheme: HTTP
+            periodSeconds: 1
+            successThreshold: 1
+            failureThreshold: 1
+            timeoutSeconds: 1
+          startupProbe:
+            httpGet:
+              path: /health
+              port: http
+              scheme: HTTP
+            failureThreshold: 600
+            initialDelaySeconds: 2
+            periodSeconds: 1
+          resources:
+            limits:
+              nvidia.com/gpu: 1
+            requests:
+              nvidia.com/gpu: 1
+          volumeMounts:
+            - mountPath: /cache
+              name: hf-cache
+            - mountPath: /dev/shm
+              name: shm
+            - mountPath: /adapters
+              name: adapters
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
+      initContainers:
+        - name: lora-adapter-syncer
+          tty: true
+          stdin: true
+          image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/lora-syncer:main
+          restartPolicy: Always
+          imagePullPolicy: Always
+          env:
+            - name: DYNAMIC_LORA_ROLLOUT_CONFIG
+              value: "/config/configmap.yaml"
+          volumeMounts:
+            - name: config-volume
+              mountPath: /config
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
+      restartPolicy: Always
+      enableServiceLinks: false
+      terminationGracePeriodSeconds: 130
+      volumes:
+        - name: hf-cache
+          emptyDir: {}
+        - name: shm
+          emptyDir:
+            medium: Memory
+        - name: adapters
+          emptyDir: {}
+        - name: config-volume
+          configMap:
+            name: vllm-llama3-8b-instruct-adapters