
Commit 6af705e

fix: better instructions for GAIE recipe (#4525) (#4554)

Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: atchernych <[email protected]>

Parent: 985ca16

File tree

2 files changed: +16 −6 lines


recipes/README.md

Lines changed: 14 additions & 4 deletions
````diff
@@ -90,9 +90,13 @@ kubectl get storageclass
 **Step 1: Download Model**
 
 ```bash
+cd recipes
 # Update storageClassName in model-cache.yaml first!
 kubectl apply -f <model>/model-cache/ -n ${NAMESPACE}
 
+# Create model cache PVC
+kubectl apply -f <model>/model-cache/model-download.yaml -n ${NAMESPACE}
+
 # Wait for download to complete (may take 10-60 minutes depending on model size)
 kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=6000s
 
````
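The `kubectl wait --for=condition=Complete ... --timeout=6000s` line above blocks until the download Job reports the `Complete` condition or the timeout expires. Its control flow can be sketched as a plain poll loop; `check_complete` below is a local stub standing in for a Job-status query, not part of the recipe:

```shell
#!/bin/sh
# Poll until a check succeeds or a bounded number of polls elapses,
# mirroring what `kubectl wait --for=condition=Complete --timeout=...` does.

attempts=0
check_complete() {
  # Stub for "is the Job's Complete condition true?": succeeds on the 3rd poll.
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

wait_for_complete() {
  max_polls=$1  # bounded polls stand in for a wall-clock timeout
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    if check_complete; then
      echo "condition met"
      return 0
    fi
    i=$((i + 1))
  done
  echo "timed out" >&2
  return 1
}

wait_for_complete 10
```

As with the real command, a non-zero exit on timeout lets a calling script abort before applying the serving manifests against an incomplete model cache.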

````diff
@@ -102,6 +106,8 @@ kubectl logs -f job/model-download -n ${NAMESPACE}
 
 **Step 2: Deploy Service**
 
+Update the image in `<model>/<framework>/<mode>/deploy.yaml`.
+
 ```bash
 kubectl apply -f <model>/<framework>/<mode>/deploy.yaml -n ${NAMESPACE}
 
````
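The added instruction asks you to update the image in `deploy.yaml` before applying it. One way to script that edit is a `sed` substitution; the manifest content, path, and tag below are illustrative placeholders, not values from the recipe:

```shell
#!/bin/sh
# Rewrite the image tag in a deploy manifest with sed.
# The manifest content, path, and tag are illustrative placeholders.
cat > /tmp/deploy-example.yaml <<'EOF'
    containers:
    - name: main
      image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
EOF

NEW_TAG=0.7.0
# Keep everything up to the last colon, swap only the tag.
sed -i.bak "s|\(image: nvcr\.io/nvidia/ai-dynamo/vllm-runtime:\).*|\1${NEW_TAG}|" /tmp/deploy-example.yaml
grep 'image:' /tmp/deploy-example.yaml
```

`sed -i.bak` keeps a backup of the original manifest, which is useful when the same file is edited per release.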

```diff
@@ -162,7 +168,9 @@ kubectl create secret generic hf-token-secret \
   -n ${NAMESPACE}
 
 # Deploy
+cd recipes
 kubectl apply -f llama-3-70b/model-cache/ -n ${NAMESPACE}
+kubectl apply -f llama-3-70b/model-cache/model-download.yaml -n ${NAMESPACE}
 kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=6000s
 kubectl apply -f llama-3-70b/vllm/agg/deploy.yaml -n ${NAMESPACE}
 
```
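Every command in this quick-start interpolates `${NAMESPACE}`; if it is unset, the flag expands to an empty string and `kubectl` falls back to the current context's namespace. A small guard can fail fast instead; `require_namespace` is a hypothetical helper sketched here, not part of the recipe:

```shell
#!/bin/sh
# Abort early if NAMESPACE is unset or empty, instead of letting
# kubectl silently fall through to the current context's namespace.
require_namespace() {
  if [ -z "${NAMESPACE:-}" ]; then
    echo "error: NAMESPACE must be set, e.g. export NAMESPACE=dynamo" >&2
    return 1
  fi
  echo "deploying into ${NAMESPACE}"
}

# Demo value; in real use you would export NAMESPACE yourself.
NAMESPACE=demo
require_namespace
```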

````diff
@@ -174,13 +182,15 @@ kubectl port-forward svc/llama3-70b-agg-frontend 8000:8000 -n ${NAMESPACE}
 
 For Llama-3-70B with vLLM (Aggregated), an example of integration with the Inference Gateway is provided.
 
-Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE. Then apply manifests.
-Update the containers.epp.image in the deployment file, i.e. llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/deployment.yaml
-This should be the same image you have used for your deployment.
+First, deploy the Dynamo Graph per instructions above.
+
+Then follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
+
+Update the containers.epp.image in the deployment file, i.e. llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/deployment.yaml. It should match the release tag and be in the format `nvcr.io/nvidia/ai-dynamo/frontend:<my-tag>` i.e. `nvcr.io/nvstaging/ai-dynamo/dynamo-frontend:0.7.0rc2-amd64`
 
 ```bash
 export DEPLOY_PATH=llama-3-70b/vllm/agg/
-#DEPLOY_PATH=<model>/<framework>/<mode>/
+# DEPLOY_PATH=<model>/<framework>/<mode>/
 kubectl apply -R -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE"
 ```
 
````
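The new wording pins the epp image to the `nvcr.io/nvidia/ai-dynamo/frontend:<my-tag>` format. A quick format check can be scripted; the regex below is an assumption generalized from the two example references shown in the text, not an official naming rule:

```shell
#!/bin/sh
# Sanity-check that an image reference matches the frontend image format.
# The regex is an assumption generalized from the documented examples.
image_ok() {
  printf '%s' "$1" | grep -Eq '^nvcr\.io/[a-z]+/ai-dynamo/(dynamo-)?frontend:[A-Za-z0-9._-]+$'
}

image_ok "nvcr.io/nvstaging/ai-dynamo/dynamo-frontend:0.7.0rc2-amd64" && echo "ok"
image_ok "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag" || echo "rejected: not a frontend image"
```

Running such a check before `kubectl apply` catches the easy mistake of reusing the runtime image (the old default in this file) for the epp container.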

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/deployment.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -38,7 +38,7 @@ spec:
 
       containers:
       - name: epp
-        image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
+        image: nvcr.io/nvidia/ai-dynamo/frontend:<my-tag>
         imagePullPolicy: IfNotPresent
         resources:
           requests:
```
```diff
@@ -76,7 +76,7 @@ spec:
         - name: DYNAMO_NAMESPACE
           value: "$(POD_NAMESPACE)-llama3-70b-agg"
         - name: DYNAMO_MODEL
-          value: "llama3-70b-agg"
+          value: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic"
         - name: DYNAMO_KV_BLOCK_SIZE
           value: "128" # UPDATE to match the --block-size in your deploy.yaml engine command
         - name: USE_STREAMING
```
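The `DYNAMO_KV_BLOCK_SIZE` comment requires the value to match `--block-size` in the engine command of your `deploy.yaml`. A consistency check across the two manifests can be sketched as below; the file contents, paths, and args layout are stand-ins for illustration, not the recipe's actual files:

```shell
#!/bin/sh
# Cross-check DYNAMO_KV_BLOCK_SIZE in the epp manifest against
# --block-size in the engine deploy manifest. Contents are stand-ins.
cat > /tmp/epp-example.yaml <<'EOF'
        - name: DYNAMO_KV_BLOCK_SIZE
          value: "128"
EOF
cat > /tmp/engine-deploy-example.yaml <<'EOF'
        args: ["--model", "some-model", "--block-size", "128"]
EOF

epp_size=$(grep -A1 'DYNAMO_KV_BLOCK_SIZE' /tmp/epp-example.yaml | grep -Eo '[0-9]+')
engine_size=$(grep -Eo -- '--block-size", "[0-9]+' /tmp/engine-deploy-example.yaml | grep -Eo '[0-9]+$')
if [ "$epp_size" = "$engine_size" ]; then
  echo "block sizes match: $epp_size"
else
  echo "mismatch: epp=$epp_size engine=$engine_size" >&2
  exit 1
fi
```

A mismatch here is silent at deploy time but degrades KV-cache-aware routing, so a pre-apply check of this shape is cheap insurance.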
