This repository was archived by the owner on May 15, 2025. It is now read-only.

Commit b189362

[fix]: Small fixes for deployment and fix comments

Signed-off-by: Kfir Toledo <[email protected]>

1 parent: 1a7fa8e

25 files changed: +152 −120 lines

DEVELOPMENT.md

Lines changed: 60 additions & 31 deletions
@@ -152,6 +152,13 @@ Create the namespace:
 ```console
 kubectl create namespace ${NAMESPACE}
 ```
+Set the default namespace for kubectl commands:
+
+```console
+kubectl config set-context --current --namespace="${NAMESPACE}"
+```
+
+> NOTE: If you are using OpenShift (oc CLI), use the following instead: `oc project "${NAMESPACE}"`
 
 You'll need to provide a `Secret` with the login credentials for your private
 repository (e.g. quay.io). It should look something like this:
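(The `Secret` manifest itself falls outside this hunk. For orientation only, not part of the commit: such a pull secret is commonly created with `kubectl create secret docker-registry`; the credentials below are placeholders, and `anna-pull-secret` matches the `REGISTRY_SECRET` value used later in this file.)

```bash
# Sketch only -- not part of this diff. Creates a registry pull secret;
# username/password are placeholders for your quay.io credentials.
kubectl create secret docker-registry anna-pull-secret \
  --docker-server=quay.io \
  --docker-username="<YOUR_USER>" \
  --docker-password="<YOUR_PASSWORD>" \
  -n "${NAMESPACE}"
```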
@@ -178,13 +185,6 @@ Export the name of the `Secret` to the environment:
 export REGISTRY_SECRET=anna-pull-secret
 ```
 
-You can optionally set a custom EPP image (otherwise, the default will be used):
-
-```console
-export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
-export EPP_TAG="<YOUR_TAG>"
-```
-
 Set the `VLLM_MODE` environment variable based on which version of vLLM you want to deploy:
 
 - `vllm-sim`: Lightweight simulator for simple environments
@@ -194,24 +194,10 @@ Set the `VLLM_MODE` environment variable based on which version of vLLM you want
 ```console
 export VLLM_MODE=vllm-sim # or vllm / vllm-p2p
 ```
-Each mode has default image values, but you can override them:
 
-For vllm-sim:
-
-```console
-export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
-export VLLM_SIM_TAG="<YOUR_TAG>"
-```
-
-For vllm and vllm-p2p:
-- set Vllm image:
-```console
-export VLLM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
-export VLLM_TAG="<YOUR_TAG>"
-```
 - Set hugging face token variable:
 export HF_TOKEN="<HF_TOKEN>"
-**Warning**: For vllm mode, the default image uses llama3-8b and vllm-mistral. Make sure you have permission to access these files in their respective repositories.
+**Warning**: For vllm mode, the default image uses llama3-8b. Make sure you have permission to access these files in their respective repositories.
 
 Once all this is set up, you can deploy the environment:
 
@@ -222,30 +208,73 @@ make environment.dev.kubernetes
 This will deploy the entire stack to whatever namespace you chose. You can test
 by exposing the inference `Gateway` via port-forward:
 
-```console
+```bash
 kubectl -n ${NAMESPACE} port-forward service/inference-gateway 8080:80
 ```
 
 And making requests with `curl`:
 - vllm-sim
 
-```console
+```bash
 curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
 -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
 ```
 
-- vllm
+- vllm or vllm-p2p
 
-```console
+```bash
 curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
 -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
 ```
+#### Environment Configuration
+
+##### **1. Setting the EPP image and tag:**
+
+You can optionally set a custom EPP image (otherwise, the default will be used):
+
+```bash
+export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+export EPP_TAG="<YOUR_TAG>"
+```
+##### **2. Setting the vLLM image and tag:**
+
+Each vLLM mode has default image values, but you can override them:
+
+For `vllm-sim` mode:
+
+```bash
+export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+export VLLM_SIM_TAG="<YOUR_TAG>"
+```
+
+For `vllm` and `vllm-p2p` modes:
+
+```bash
+export VLLM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+export VLLM_TAG="<YOUR_TAG>"
+```
+
+##### **3. Setting the model name and label:**
+
+You can replace the model name that will be used in the system:
+
+```bash
+export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
+export MODEL_LABEL="${MODEL_LABEL:-mistral7b}"
+```
+
+It is also recommended to update the pool name accordingly:
+
+```bash
+export POOL_NAME="${POOL_NAME:-vllm-Mistral-7B-Instruct}"
+```
+
+##### **4. Additional environment settings:**
+
+More environment variable settings can be found in `scripts/kubernetes-dev-env.sh`.
+
+
 
-- vllm-p2p
-```console
-curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
--d '{"model":"mistralai/Mistral-7B-Instruct-v0.2","prompt":"hi","max_tokens":10,"temperature":0}' | jq
-```
 #### Development Cycle
 
 > **WARNING**: This is a very manual process at the moment. We expect to make
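Taken together, the revised instructions amount to a session like the following sketch. All variable names come from the text above; the concrete values are illustrative placeholders:

```bash
# Illustrative end-to-end session based on the revised DEVELOPMENT.md.
export NAMESPACE="my-dev-ns"                 # placeholder namespace
kubectl create namespace "${NAMESPACE}"
kubectl config set-context --current --namespace="${NAMESPACE}"

export REGISTRY_SECRET="anna-pull-secret"
export VLLM_MODE="vllm-p2p"                  # or vllm-sim / vllm
export HF_TOKEN="<HF_TOKEN>"
export MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2"
export MODEL_LABEL="mistral7b"
export POOL_NAME="vllm-Mistral-7B-Instruct"

make environment.dev.kubernetes

# Expose the gateway and send a test request (run in a second terminal,
# or background the port-forward as shown here).
kubectl -n "${NAMESPACE}" port-forward service/inference-gateway 8080:80 &
curl -s -w '\n' http://localhost:8080/v1/completions \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"${MODEL_NAME}\",\"prompt\":\"hi\",\"max_tokens\":10,\"temperature\":0}" | jq
```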

deploy/components/inference-gateway/deployments.yaml

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ spec:
         imagePullPolicy: IfNotPresent
         args:
         - -poolName
-        - "vllm-llama3-8b-instruct"
+        - "${POOL_NAME}"
         - -v
         - "4"
         - --zap-encoder
@@ -55,4 +55,4 @@ spec:
           valueFrom:
             secretKeyRef:
               name: ${HF_SECRET_NAME}
-              key: ${HF_SECRET_KEY}
+              key: ${HF_SECRET_KEY}

(The final pair of lines differs only by the addition of a trailing newline at end of file.)

deploy/components/inference-gateway/httproutes.yaml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ spec:
     backendRefs:
     - group: inference.networking.x-k8s.io
       kind: InferencePool
-      name: vllm-llama3-8b-instruct
+      name: ${POOL_NAME}
      port: 8000
    timeouts:
      request: 30s
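The `${POOL_NAME}` placeholder here is shell-style substitution rather than anything Kustomize resolves natively; presumably the dev tooling renders it before applying (the actual mechanism lives in `scripts/kubernetes-dev-env.sh`). A minimal sketch of that kind of rendering with `envsubst`:

```bash
# Sketch only: render the placeholder and preview the result client-side.
# The repo's real substitution step is in its dev scripts.
export POOL_NAME="vllm-Mistral-7B-Instruct"
envsubst < deploy/components/inference-gateway/httproutes.yaml \
  | kubectl apply --dry-run=client -f -
```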

deploy/components/inference-gateway/inference-models.yaml

Lines changed: 1 addition & 21 deletions
@@ -16,27 +16,7 @@ kind: InferenceModel
 metadata:
   name: base-model
 spec:
-  modelName: meta-llama/Llama-3.1-8B-Instruct
+  modelName: ${MODEL_NAME}
   criticality: Critical
   poolRef:
     name: ${POOL_NAME}
----
-apiVersion: inference.networking.x-k8s.io/v1alpha2
-kind: InferenceModel
-metadata:
-  name: base-model-cpu
-spec:
-  modelName: Qwen/Qwen2.5-1.5B-Instruct
-  criticality: Critical
-  poolRef:
-    name: ${POOL_NAME}
----
-apiVersion: inference.networking.x-k8s.io/v1alpha2
-kind: InferenceModel
-metadata:
-  name: mistarli
-spec:
-  modelName: mistralai/Mistral-7B-Instruct-v0.2
-  criticality: Critical
-  poolRef:
-    name: ${POOL_NAME}
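With the extra models removed, only `base-model` remains, parameterized by `${MODEL_NAME}`. A quick post-deploy check that the substitution landed (a sketch; it assumes the CRD is installed and that kubectl resolves the lowercase kind name):

```bash
# Sketch: print the modelName the deployed InferenceModel ended up with.
kubectl get inferencemodel base-model -o jsonpath='{.spec.modelName}{"\n"}'
```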

deploy/components/inference-gateway/kustomization.yaml

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ resources:
 - deployments.yaml
 - gateways.yaml
 - httproutes.yaml
+- secret.yaml
+
 
 images:
 - name: quay.io/vllm-d/gateway-api-inference-extension/epp

deploy/components/vllm-p2p/deployments/secret.yaml renamed to deploy/components/inference-gateway/secret.yaml

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@ apiVersion: v1
 kind: Secret
 metadata:
   name: ${HF_SECRET_NAME}
-  namespace: ${NAMESPACE}
   labels:
     app.kubernetes.io/name: vllm
     app.kubernetes.io/component: secret

deploy/components/vllm-p2p/kustomization.yaml

Lines changed: 20 additions & 6 deletions
@@ -1,13 +1,27 @@
+# ------------------------------------------------------------------------------
+# vLLM P2P Deployment
+#
+# This deploys the full vLLM model server, capable of serving real models such
+# as Llama 3.1-8B-Instruct via the OpenAI-compatible API. It is intended for
+# environments with GPU resources and where full inference capabilities are
+# required.
+# In addition, it adds LMCache, an LLM serving engine extension that uses Redis, to the vLLM image.
+#
+# The deployment can be customized using environment variables to set:
+# - The container image and tag (VLLM_IMAGE, VLLM_TAG)
+# - The model to load (MODEL_NAME)
+#
+# This setup is suitable for testing and production with Kubernetes (including
+# GPU-enabled nodes or clusters with scheduling for `nvidia.com/gpu`).
+# -----------------------------------------------------------------------------
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 
-namespace: ${NAMESPACE}
-
 resources:
-- deployments/vllm-deployment.yaml
-- deployments/redis-deployment.yaml
-- service/redis-service.yaml
-- deployments/secret.yaml
+- vllm-deployment.yaml
+- redis-deployment.yaml
+- redis-service.yaml
+- secret.yaml
 
 images:
 - name: vllm/vllm-openai
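Because the manifests moved up out of the `deployments/` and `service/` subdirectories, every resource path in this kustomization changed; a client-side render is an easy sanity check (a sketch; the path assumes you run it from the repository root):

```bash
# Sketch: render the vllm-p2p component without applying anything.
kubectl kustomize deploy/components/vllm-p2p
```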

deploy/components/vllm-p2p/deployments/redis-deployment.yaml renamed to deploy/components/vllm-p2p/redis-deployment.yaml

Lines changed: 1 addition & 6 deletions
@@ -1,7 +1,7 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: ${REDIS_SVC_NAME}
+  name: ${REDIS_DEPLOYMENT_NAME}
   labels:
     app.kubernetes.io/name: redis
     app.kubernetes.io/component: redis-lookup-server
@@ -48,8 +48,3 @@ spec:
       maxSurge: 25%
   revisionHistoryLimit: 10
   progressDeadlineSeconds: 600
-# securityContext:
-#   allowPrivilegeEscalation: false
-#   capabilities:
-#     drop:
-#     - ALL
deploy/components/vllm-p2p/secret.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ${HF_SECRET_NAME}
+  labels:
+    app.kubernetes.io/name: vllm
+    app.kubernetes.io/component: secret
+type: Opaque
+data:
+  ${HF_SECRET_KEY}: ${HF_TOKEN}
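One wrinkle worth noting: the `data` field of a Kubernetes Secret expects base64-encoded values, so `${HF_TOKEN}` must already be encoded by the time this manifest is rendered (otherwise the manifest would need `stringData` instead). A sketch of encoding before substitution; `HF_TOKEN_RAW` is a hypothetical variable, not something this commit defines:

```bash
# Sketch: Secret `data` values must be base64-encoded. HF_TOKEN_RAW is a
# hypothetical holder for the raw token; only HF_TOKEN feeds the manifest.
export HF_TOKEN="$(printf '%s' "${HF_TOKEN_RAW}" | base64)"
```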
