@@ -152,6 +152,13 @@ Create the namespace:
152
152
``` console
153
153
kubectl create namespace ${NAMESPACE}
154
154
```
155
+ Set the default namespace for kubectl commands:
156
+
157
+ ``` console
158
+ kubectl config set-context --current --namespace="${NAMESPACE}"
159
+ ```
160
+
161
+ > NOTE: If you are using OpenShift (`oc` CLI), use the following instead: `oc project "${NAMESPACE}"`
155
162
156
163
You'll need to provide a `Secret` with the login credentials for your private
157
164
repository (e.g. quay.io). It should look something like this:
@@ -178,13 +185,6 @@ Export the name of the `Secret` to the environment:
178
185
export REGISTRY_SECRET=anna-pull-secret
179
186
```
180
187
181
- You can optionally set a custom EPP image (otherwise, the default will be used):
182
-
183
- ``` console
184
- export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
185
- export EPP_TAG="<YOUR_TAG>"
186
- ```
187
-
188
188
Set the `VLLM_MODE` environment variable based on which version of vLLM you want to deploy:
189
189
190
190
- `vllm-sim`: Lightweight simulator for simple environments
@@ -194,24 +194,10 @@ Set the `VLLM_MODE` environment variable based on which version of vLLM you want
194
194
``` console
195
195
export VLLM_MODE=vllm-sim # or vllm / vllm-p2p
196
196
```
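A typo in `VLLM_MODE` is easy to miss until deployment, so it can help to check the value first. The following is a minimal sketch, assuming the three mode names listed above are the only supported ones; `is_valid_vllm_mode` is a hypothetical helper and is not part of the repository.

```bash
# Hypothetical helper (not part of the repository): succeeds only when the
# argument is one of the three mode names described above.
is_valid_vllm_mode() {
  case "$1" in
    vllm-sim|vllm|vllm-p2p) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether the currently exported value is usable.
if is_valid_vllm_mode "${VLLM_MODE:-}"; then
  echo "VLLM_MODE=${VLLM_MODE} is supported"
else
  echo "VLLM_MODE='${VLLM_MODE:-}' is not one of: vllm-sim, vllm, vllm-p2p" >&2
fi
```

The check only compares strings; it does not verify that the corresponding images are available.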
197
- Each mode has default image values, but you can override them:
198
197
199
- For vllm-sim:
200
-
201
- ``` console
202
- export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
203
- export VLLM_SIM_TAG="<YOUR_TAG>"
204
- ```
205
-
206
- For vllm and vllm-p2p:
207
- - set Vllm image:
208
- ``` console
209
- export VLLM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
210
- export VLLM_TAG="<YOUR_TAG>"
211
- ```
212
198
- Set the Hugging Face token variable:
213
199
export HF_TOKEN="<HF_TOKEN>"
214
- **Warning**: For vllm mode, the default image uses llama3-8b and vllm-mistral. Make sure you have permission to access these files in their respective repositories.
200
+ **Warning**: For `vllm` mode, the default image uses llama3-8b. Make sure you have permission to access the model files in the corresponding repository.
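Before deploying, it may be worth confirming that the variables from the previous steps are actually exported. The sketch below is only an illustration: `missing_vars` is a hypothetical helper, and the list of variables passed to it is an assumption based on the steps in this guide (for example, `HF_TOKEN` may not be needed in `vllm-sim` mode).

```bash
# Hypothetical helper (not part of the repository): reports every listed
# variable that is unset or empty and returns non-zero if any is missing.
missing_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in "name", or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Assumed list of variables, taken from the steps in this guide.
if missing_vars NAMESPACE REGISTRY_SECRET VLLM_MODE HF_TOKEN; then
  echo "all variables are set"
fi
```

Because the helper uses `local` and indirect expansion, it needs bash rather than a plain POSIX `sh`.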
215
201
216
202
Once all this is set up, you can deploy the environment:
217
203
@@ -222,30 +208,73 @@ make environment.dev.kubernetes
222
208
This will deploy the entire stack to whatever namespace you chose. You can test
223
209
by exposing the inference `Gateway` via port-forward:
224
210
225
- ``` console
211
+ ``` bash
226
212
kubectl -n ${NAMESPACE} port-forward service/inference-gateway 8080:80
227
213
```
228
214
229
215
And making requests with `curl`:
230
216
- vllm-sim
231
217
232
- ``` console
218
+ ``` bash
233
219
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
234
220
-d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
235
221
```
236
222
237
- - vllm
223
+ - vllm or vllm-p2p
238
224
239
- ```console
225
+ ``` bash
240
226
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
241
227
-d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
242
228
```
229
+ #### Environment Configuration
230
+
231
+ ##### **1. Setting the EPP image and tag:**
232
+
233
+ You can optionally set a custom EPP image (otherwise, the default will be used):
234
+
235
+ ``` bash
236
+ export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
237
+ export EPP_TAG="<YOUR_TAG>"
238
+ ```
239
+ ##### **2. Setting the vLLM image and tag:**
240
+
241
+ Each vLLM mode has default image values, but you can override them:
242
+
243
+ **For `vllm-sim` mode:**
244
+
245
+ ``` bash
246
+ export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
247
+ export VLLM_SIM_TAG="<YOUR_TAG>"
248
+ ```
249
+
250
+ **For `vllm` and `vllm-p2p` modes:**
251
+
252
+ ``` bash
253
+ export VLLM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
254
+ export VLLM_TAG="<YOUR_TAG>"
255
+ ```
256
+
257
+ ##### **3. Setting the model name and label:**
258
+
259
+ You can change the model name and label used in the environment:
260
+
261
+ ``` bash
262
+ export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
263
+ export MODEL_LABEL="${MODEL_LABEL:-mistral7b}"
264
+ ```
265
+
266
+ It is also recommended to update the pool name accordingly:
267
+
268
+ ``` bash
269
+ export POOL_NAME="${POOL_NAME:-vllm-Mistral-7B-Instruct}"
270
+ ```
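The exports above rely on the shell's `${VAR:-default}` expansion, which keeps a value that is already set and falls back to the default only when the variable is unset or empty. The sketch below illustrates that behavior with a scratch variable; `DEMO_LABEL` and `custom-label` are hypothetical names used only for this example.

```bash
# Unset: the fallback after ":-" is used.
unset DEMO_LABEL
first="${DEMO_LABEL:-mistral7b}"

# Already set: the existing value wins over the fallback.
DEMO_LABEL="custom-label"
second="${DEMO_LABEL:-mistral7b}"

# Set but empty: ":-" treats an empty value like an unset one.
DEMO_LABEL=""
third="${DEMO_LABEL:-mistral7b}"

echo "${first} ${second} ${third}"   # prints: mistral7b custom-label mistral7b
```

In practice this means a custom value only takes effect if it is exported before the line with the default runs, or if the default in that line is edited directly.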
271
+
272
+ ##### **4. Additional environment settings:**
273
+
274
+ Additional environment variables can be found in `scripts/kubernetes-dev-env.sh`.
275
+
276
+
243
277
244
- - vllm-p2p
245
- ``` console
246
- curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
247
- -d '{"model":"mistralai/Mistral-7B-Instruct-v0.2","prompt":"hi","max_tokens":10,"temperature":0}' | jq
248
- ```
249
278
#### Development Cycle
250
279
251
280
> **WARNING**: This is a very manual process at the moment. We expect to make