@@ -37,7 +37,7 @@ serving resources.

Run the following:

- ``` console
+ ``` bash
make environment.dev.kind
```

@@ -48,13 +48,15 @@ namespace.
There are several ways to access the gateway:

**Port forward**:
+
``` sh
$ kubectl --context kind-gie-dev port-forward service/inference-gateway 8080:80
```

**NodePort `inference-gateway-istio`**
> **Warning**: This method doesn't work correctly with `podman`, as `podman` support
> in `kind` is not fully implemented yet.
+
``` sh
# Determine the k8s node address
$ kubectl --context kind-gie-dev get node -o yaml | grep address
@@ -80,9 +82,10 @@ By default the created inference gateway, can be accessed on port 30080. This ca
be overridden to any free port in the range of 30000 to 32767 by running the above
command as follows:

- ``` console
+ ``` bash
GATEWAY_HOST_PORT=<selected-port> make environment.dev.kind
```
+
**Where:** `<selected-port>` is the port on your local machine you want to use to
access the inference gateway.
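To catch a bad value before `kind` creates anything, you can check the port first. This is a minimal sketch, not part of the repository: `check_gateway_port` is a hypothetical helper that only checks the 30000 to 32767 range given above. It does not check whether the port is actually free.

```bash
# Hypothetical helper (not part of this repository): validates a candidate
# GATEWAY_HOST_PORT against the 30000-32767 range described above.
# It does NOT check whether the port is already in use.
check_gateway_port() {
  local port="$1"
  # Accept only plain decimal digits.
  if [[ ! "$port" =~ ^[0-9]+$ ]]; then
    echo "error: '$port' is not a number" >&2
    return 1
  fi
  # Force base 10 so values with leading zeros are not read as octal.
  if (( 10#$port < 30000 || 10#$port > 32767 )); then
    echo "error: $port is outside the range 30000-32767" >&2
    return 1
  fi
}

# Example: validate first, then create the environment on that port.
# check_gateway_port 30123 && GATEWAY_HOST_PORT=30123 make environment.dev.kind
```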
@@ -96,7 +99,7 @@ access the inference gatyeway.
To test your changes to the GIE in this environment, make your changes locally
and then run the following:

- ``` console
+ ``` bash
make environment.dev.kind.update
```

@@ -122,7 +125,7 @@ the `default` namespace if the cluster is private/personal).
The following will deploy all the infrastructure-level requirements (e.g. CRDs,
Operators, etc.) to support the namespace-level development environments:

- ``` console
+ ``` bash
make environment.dev.kubernetes.infrastructure
```

@@ -140,7 +143,7 @@ To deploy a development environment to the cluster you'll need to explicitly
provide a namespace. This can be `default` if this is your personal cluster,
but on a shared cluster you should pick something unique. For example:

- ``` console
+ ``` bash
export NAMESPACE=annas-dev-environment
```
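Kubernetes only accepts namespace names that are valid RFC 1123 DNS labels: at most 63 characters, made of lowercase letters, digits and `-`, and starting and ending with a letter or digit. A quick local check, sketched below with a hypothetical `is_valid_namespace` helper, avoids a failed `kubectl create namespace` later:

```bash
# Hypothetical helper (not part of this repository): returns 0 when its
# argument is a valid Kubernetes namespace name (an RFC 1123 DNS label).
is_valid_namespace() {
  local ns="$1"
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  # DNS labels are limited to 63 characters (and must not be empty).
  (( ${#ns} >= 1 && ${#ns} <= 63 )) || return 1
  [[ "$ns" =~ $re ]]
}

# Example:
# is_valid_namespace "${NAMESPACE}" || echo "invalid namespace: ${NAMESPACE}" >&2
```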
@@ -149,10 +152,18 @@ export NAMESPACE=annas-dev-environment

Create the namespace:

- ``` console
+ ``` bash
kubectl create namespace ${NAMESPACE}
```

+ Set the default namespace for `kubectl` commands:
+
+ ``` bash
+ kubectl config set-context --current --namespace="${NAMESPACE}"
+ ```
+
+ > NOTE: If you are using OpenShift (`oc` CLI), use the following instead: `oc project "${NAMESPACE}"`
+
You'll need to provide a `Secret` with the login credentials for your private
repository (e.g. quay.io). It should look something like this:

@@ -168,51 +179,115 @@ type: kubernetes.io/dockerconfigjson

Apply that to your namespace:

- ``` console
- kubectl -n ${NAMESPACE} apply -f secret.yaml
+ ``` bash
+ kubectl apply -f secret.yaml
```
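The `secret.yaml` applied above can also be generated instead of written by hand. `kubectl create secret docker-registry <name> --docker-server=... --docker-username=... --docker-password=... --dry-run=client -o yaml` is the usual way; if you would rather not depend on `kubectl` for this step, the sketch below builds the same kind of `kubernetes.io/dockerconfigjson` manifest with plain shell tools. `make_pull_secret` is a hypothetical helper, and it assumes the user name and password contain no characters that need JSON escaping (such as `"` or `\`).

```bash
# Hypothetical helper (not part of this repository): prints a
# kubernetes.io/dockerconfigjson Secret manifest to stdout.
# Arguments: user, password, registry host (default quay.io), secret name.
make_pull_secret() {
  local user="$1" password="$2"
  local registry="${3:-quay.io}" name="${4:-anna-pull-secret}"
  local auth dockerconfig
  # "auth" is base64("user:password"); newlines are stripped so this works
  # with both GNU and BSD base64.
  auth="$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')"
  dockerconfig="$(printf '{"auths":{"%s":{"auth":"%s"}}}' "$registry" "$auth")"
  # The whole config JSON is base64-encoded again for the Secret's data field.
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: $(printf '%s' "$dockerconfig" | base64 | tr -d '\n')
EOF
}

# Example (credentials are placeholders):
# make_pull_secret "<USER>" "<PASSWORD>" quay.io "${REGISTRY_SECRET}" > secret.yaml
```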
Export the name of the `Secret` to the environment:

- ``` console
+ ``` bash
export REGISTRY_SECRET=anna-pull-secret
```

- Now you need to provide several other environment variables. You'll need to
- indicate the location and tag of the `vllm-sim` image:
+ Set the `VLLM_MODE` environment variable based on which version of vLLM you want to deploy:

- ``` console
- export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
- export VLLM_SIM_TAG="<YOUR_TAG>"
+ * `vllm-sim`: Lightweight simulator for simple environments (default).
+ * `vllm`: Full vLLM model server, using GPU/CPU for inference.
+ * `vllm-p2p`: Full vLLM with LMCache P2P support to enable KV-cache-aware routing.
+
+ ``` bash
+ export VLLM_MODE=vllm-sim # or vllm / vllm-p2p
```

- The same thing will need to be done for the EPP:
+ Set the Hugging Face token variable:

- ``` console
- export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
- export EPP_TAG="<YOUR_TAG>"
+ ``` bash
+ export HF_TOKEN="<HF_TOKEN>"
```

+ **Warning**: For `vllm` mode, the default image uses llama3-8b. Make sure your Hugging Face account has permission to access the model's repository.
+
+ **Note:** The model can be replaced. See [Environment Configuration](#environment-configuration) for model settings.
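Because a missing variable usually only shows up part-way through a deploy, it can help to check them up front. The following is a minimal sketch with a hypothetical `preflight` function. It only checks the variables described above (the full set lives in `scripts/kubernetes-dev-env.sh`), and it assumes, which this section does not state, that `HF_TOKEN` is needed only for the `vllm` and `vllm-p2p` modes.

```bash
# Hypothetical pre-flight check (not part of this repository's scripts).
# Returns non-zero and prints every problem it finds.
preflight() {
  local missing=0 var
  # Variables every mode needs, per the steps above.
  for var in NAMESPACE REGISTRY_SECRET; do
    if [[ -z "${!var:-}" ]]; then
      echo "error: $var is not set" >&2
      missing=1
    fi
  done
  # VLLM_MODE defaults to vllm-sim; assumption: only full vLLM needs HF_TOKEN.
  case "${VLLM_MODE:-vllm-sim}" in
    vllm-sim) ;;
    vllm|vllm-p2p)
      if [[ -z "${HF_TOKEN:-}" ]]; then
        echo "error: HF_TOKEN is required for VLLM_MODE=${VLLM_MODE}" >&2
        missing=1
      fi
      ;;
    *)
      echo "error: unknown VLLM_MODE '${VLLM_MODE}'" >&2
      missing=1
      ;;
  esac
  return "$missing"
}

# Example:
# preflight && make environment.dev.kubernetes
```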
+
Once all this is set up, you can deploy the environment:

- ``` console
+ ``` bash
make environment.dev.kubernetes
```

This will deploy the entire stack to whatever namespace you chose. You can test
by exposing the inference `Gateway` via port-forward:

- ``` console
- kubectl -n ${NAMESPACE} port-forward service/inference-gateway-istio 8080:80
+ ``` bash
+ kubectl port-forward service/inference-gateway 8080:80
```

Then make requests with `curl`:

- ``` console
+ **vllm-sim:**
+
+ ``` bash
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
-d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

+ **vllm or vllm-p2p:**
+
+ ``` bash
+ curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
+ -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
+ ```
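Since the two request bodies differ only in the model name, you can derive the body from `VLLM_MODE`. This is a sketch under stated assumptions: `completion_payload` is a hypothetical helper, the model names are the ones used in the two examples above, and `MODEL_NAME` (see the model settings below) is assumed to override the default for the full vLLM modes. The prompt is inserted verbatim, so it must not contain `"` or `\`.

```bash
# Hypothetical helper: prints the JSON body for a test completion request.
# Assumption: vllm-sim serves "food-review"; the full vLLM modes serve
# MODEL_NAME if it is set, else meta-llama/Llama-3.1-8B-Instruct.
completion_payload() {
  local prompt="${1:-hi}" model
  case "${VLLM_MODE:-vllm-sim}" in
    vllm-sim) model="food-review" ;;
    vllm|vllm-p2p) model="${MODEL_NAME:-meta-llama/Llama-3.1-8B-Instruct}" ;;
    *) echo "error: unknown VLLM_MODE '${VLLM_MODE}'" >&2; return 1 ;;
  esac
  # The prompt is inserted verbatim, so it must not contain '"' or '\'.
  printf '{"model":"%s","prompt":"%s","max_tokens":10,"temperature":0}\n' \
    "$model" "$prompt"
}

# Example, with the port-forward above still running:
# curl -s -w '\n' http://localhost:8080/v1/completions \
#   -H 'Content-Type: application/json' -d "$(completion_payload)" | jq
```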
+
+ #### Environment Configuration
+
+ **1. Setting the EPP image and tag:**
+
+ You can optionally set a custom EPP image (otherwise, the default will be used):
+
+ ``` bash
+ export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+ export EPP_TAG="<YOUR_TAG>"
+ ```
+
+ **2. Setting the vLLM image and tag:**
+
+ Each vLLM mode has default image values, but you can override them:
+
+ For `vllm-sim` mode:
+
+ ``` bash
+ export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+ export VLLM_SIM_TAG="<YOUR_TAG>"
+ ```
+
+ For `vllm` and `vllm-p2p` modes:
+
+ ``` bash
+ export VLLM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
+ export VLLM_TAG="<YOUR_TAG>"
+ ```
+
+ **3. Setting the model name and label:**
+
+ You can replace the model that will be used in the environment:
+
+ ``` bash
+ export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
+ export MODEL_LABEL="${MODEL_LABEL:-mistral7b}"
+ ```
+
+ It is also recommended to update the inference pool name so that it matches the model:
+
+ ``` bash
+ export POOL_NAME="${POOL_NAME:-vllm-Mistral-7B-Instruct}"
+ ```
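The `${VAR:-default}` form used above only applies the default when the variable is unset or empty, so a value you exported earlier takes precedence. A quick illustration (`my-org/my-model` is a made-up name):

```bash
# Unset: the default is used.
unset MODEL_NAME
export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
echo "$MODEL_NAME"   # mistralai/Mistral-7B-Instruct-v0.2

# Already set: the existing value is kept (my-org/my-model is hypothetical).
export MODEL_NAME="my-org/my-model"
export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
echo "$MODEL_NAME"   # my-org/my-model
```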
+
+ **4. Additional environment settings:**
+
+ More environment variables can be found in `scripts/kubernetes-dev-env.sh`.
+
#### Development Cycle

> **WARNING**: This is a very manual process at the moment. We expect to make
@@ -221,19 +296,19 @@ curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: applicati
Make your changes locally and commit them. Then select an image tag based on
the `git` SHA:

- ``` console
+ ``` bash
export EPP_TAG=$(git rev-parse HEAD)
```

Build the image:

- ``` console
+ ``` bash
DEV_VERSION=$EPP_TAG make image-build
```

Tag the image for your private registry and push it:

- ``` console
+ ``` bash
$CONTAINER_RUNTIME tag quay.io/vllm-d/gateway-api-inference-extension/epp:$EPP_TAG \
<MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
$CONTAINER_RUNTIME push <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
@@ -245,7 +320,7 @@ $CONTAINER_RUNTIME push <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
Then you can re-deploy the environment with the new changes (don't forget all
the required env vars):

- ``` console
+ ``` bash
make environment.dev.kubernetes
```
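To make the cycle above harder to get wrong, the steps can be chained in one shell function. This is a sketch, not a script that ships with the repository. `dev_cycle`, `run`, `DRY_RUN` and `TARGET_IMAGE` (standing in for `<MY_REGISTRY>/<MY_IMAGE>`) are hypothetical names, and `CONTAINER_RUNTIME` falls back to `docker` as an assumption. It also assumes that `make image-build` produces `quay.io/vllm-d/gateway-api-inference-extension/epp:$DEV_VERSION`, as in the tag step above, and that the deploy picks up `EPP_IMAGE`/`EPP_TAG`.

```bash
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper chaining the steps above: build, tag, push, re-deploy.
dev_cycle() {
  local runtime="${CONTAINER_RUNTIME:-docker}"   # assumption: docker or podman
  local built="quay.io/vllm-d/gateway-api-inference-extension/epp"
  local tag
  # Reuse EPP_TAG if it is already set, else tag with the current commit SHA.
  tag="${EPP_TAG:-$(git rev-parse HEAD)}" || return
  if [[ -z "${TARGET_IMAGE:-}" ]]; then
    echo "error: set TARGET_IMAGE to <MY_REGISTRY>/<MY_IMAGE>" >&2
    return 1
  fi

  run env DEV_VERSION="$tag" make image-build || return
  run "$runtime" tag "$built:$tag" "$TARGET_IMAGE:$tag" || return
  run "$runtime" push "$TARGET_IMAGE:$tag" || return
  # Re-deploy, pointing the environment at the freshly pushed image.
  run env EPP_IMAGE="$TARGET_IMAGE" EPP_TAG="$tag" make environment.dev.kubernetes
}

# Example: preview the commands first, then run them for real.
# DRY_RUN=1 TARGET_IMAGE=registry.example.com/me/epp dev_cycle
# TARGET_IMAGE=registry.example.com/me/epp dev_cycle
```

Running it once with `DRY_RUN=1` prints the four commands without executing them, which is a cheap way to check the image names before anything is pushed.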