Commit 489e974

fix: consistent model recipes and update simplified doc (#3858)

1 parent e02605b

16 files changed: +399 −154 lines

benchmarks/profiler/deploy/profile_sla_moe_job.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -30,7 +30,7 @@ spec:
       command: ["python", "-m", "benchmarks.profiler.profile_sla"]
       args:
         - --config
-        - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
+        - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
         - --output-dir
         - /data/profiling_results
         - --namespace
```

recipes/CONTRIBUTING.md

Lines changed: 23 additions & 0 deletions
# Recipes Contributing Guide

When adding new model recipes, ensure they follow the standard structure:

```text
<model-name>/
├── model-cache/
│   ├── model-cache.yaml
│   └── model-download.yaml
├── <framework>/
│   └── <deployment-mode>/
│       ├── deploy.yaml
│       └── perf.yaml (optional)
└── README.md (optional)
```

## Validation

The `run.sh` script expects this exact directory structure and will validate that the directories and files exist before deployment:

- The model directory exists in `recipes/<model>/`
- The framework is one of the supported frameworks (vllm, sglang, trtllm)
- The framework directory exists in `recipes/<model>/<framework>/`
- The deployment directory exists in `recipes/<model>/<framework>/<deployment>/`
- The required file (`deploy.yaml`) exists in the deployment directory
- If present, the performance benchmark (`perf.yaml`) will be executed automatically
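The checks above can be sketched in shell. This is an illustrative approximation of the validation, not the actual `run.sh` code; the function name and messages are hypothetical:

```shell
# Hypothetical sketch of the pre-deployment checks run.sh performs;
# not the script's actual code.
validate_recipe() {
  model="$1"; framework="$2"; deployment="$3"

  # Framework must be one of the supported backends
  case "$framework" in
    vllm|sglang|trtllm) ;;
    *) echo "unsupported framework: $framework"; return 1 ;;
  esac

  # Model, framework, and deployment directories must exist
  for path in "recipes/$model" \
              "recipes/$model/$framework" \
              "recipes/$model/$framework/$deployment"; do
    [ -d "$path" ] || { echo "missing directory: $path"; return 1; }
  done

  # deploy.yaml is required; perf.yaml is optional
  [ -f "recipes/$model/$framework/$deployment/deploy.yaml" ] \
    || { echo "missing file: deploy.yaml"; return 1; }

  echo "recipe layout OK"
}
```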

recipes/README.md

Lines changed: 240 additions & 43 deletions
# Dynamo Model Serving Recipes

This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.

## Contents

- [Available Models](#available-models)
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- Deployment Methods
  - [Option 1: Automated Deployment](#option-1-automated-deployment)
  - [Option 2: Manual Deployment](#option-2-manual-deployment)

## Available Models

| Model Family | Framework | Deployment Mode              | GPU Requirements | Status | Benchmark |
|--------------|-----------|------------------------------|------------------|--------|-----------|
| llama-3-70b  | vllm      | agg                          | 4x H100/H200     | ✅     | ✅        |
| llama-3-70b  | vllm      | disagg (1 node)              | 8x H100/H200     | ✅     | ✅        |
| llama-3-70b  | vllm      | disagg (multi-node)          | 16x H100/H200    | ✅     | ✅        |
| deepseek-r1  | sglang    | disagg (1 node, wide-ep)     | 8x H200          | ✅     | 🚧        |
| deepseek-r1  | sglang    | disagg (multi-node, wide-ep) | 16x H200         | ✅     | 🚧        |
| gpt-oss-120b | trtllm    | agg                          | 4x GB200         | ✅     | ✅        |

**Legend:**
- ✅ Functional
- 🚧 Under development

**Recipe Directory Structure:**

Recipes are organized in a directory structure that follows this pattern:

```text
<model-name>/
├── model-cache/
│   ├── model-cache.yaml     # PVC for the model cache
│   └── model-download.yaml  # Job for the model download
├── <framework>/
│   └── <deployment-mode>/
│       ├── deploy.yaml      # DynamoGraphDeployment CRD and optional ConfigMap for custom configuration
│       └── perf.yaml        # Performance benchmark (optional)
└── README.md                # Model documentation (optional)
```

## Quick Start

Follow the instructions in the [Prerequisites](#prerequisites) section to set up your environment, then choose a deployment method: the automated `run.sh` script ([Option 1](#option-1-automated-deployment)) or manual deployment steps ([Option 2](#option-2-manual-deployment)).
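As a concrete illustration (paths derived from the layout above, not verified against the repository), a chosen model, framework, and mode resolve to recipe files like these:

```shell
# Illustrative path resolution for the llama-3-70b / vllm / agg recipe
MODEL=llama-3-70b FRAMEWORK=vllm MODE=agg

echo "recipes/$MODEL/model-cache/model-cache.yaml"  # PVC for the model cache
echo "recipes/$MODEL/$FRAMEWORK/$MODE/deploy.yaml"  # deployment to apply
echo "recipes/$MODEL/$FRAMEWORK/$MODE/perf.yaml"    # optional benchmark job
```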
## Prerequisites
### 1. Environment Setup

Create a Kubernetes namespace and set the `NAMESPACE` environment variable, which is used in the later deployment and benchmarking steps:

```bash
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
```

### 2. Deploy Dynamo Platform

Install the Dynamo Cloud Platform by following the [Quickstart Guide](../docs/kubernetes/README.md).

### 3. GPU Cluster

Ensure your Kubernetes cluster has:

- GPU nodes with the appropriate GPU types (see the model requirements above)
- The GPU operator installed
- Sufficient GPU memory and compute resources

### 4. Container Registry Access

Ensure access to the NVIDIA container registry for the runtime images:

- `nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/trtllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/sglang-runtime:x.y.z`
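If the cluster cannot pull these images anonymously, an image pull secret for `nvcr.io` may be needed. Below is a sketch using the standard NGC convention (`$oauthtoken` as the username, an NGC API key as the password); the secret name and the `NGC_API_KEY` variable are illustrative, not part of the recipes:

```shell
# Illustrative: create an image pull secret for nvcr.io
# (secret name and NGC_API_KEY are placeholders)
kubectl create secret docker-registry nvcr-pull-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}" \
  -n ${NAMESPACE}
```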

### 5. HuggingFace Access and Kubernetes Secret Creation

Create a Kubernetes secret containing your HuggingFace token (deployments reference it as `envFromSecret: hf-token-secret`) so models can be downloaded:

```bash
# Update the token in the secret file
vim hf_hub_secret/hf_hub_secret.yaml

# Apply the secret
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
```

### 6. Configure Storage Class

Configure persistent storage for model caching. A shared model cache PVC stores the model weights and avoids repeated downloads:

```bash
# Check available storage classes
kubectl get storageclass
```

Replace "your-storage-class-name" with your actual storage class in the file `<model>/model-cache/model-cache.yaml`:

```yaml
# In <model>/model-cache/model-cache.yaml
spec:
  storageClassName: "your-actual-storage-class"  # Replace this
```
## Option 1: Automated Deployment
Use the `run.sh` script for fully automated deployment.

**Note:** The script automatically:
- Creates the model cache PVC and downloads the model
- Deploys the model service
- Runs the performance benchmark if a `perf.yaml` file is present in the deployment directory

#### Script Usage

```bash
./run.sh [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>
```

**Required Options:**
- `--model <model>`: Model name matching a directory name in the recipes directory (e.g., `llama-3-70b`, `gpt-oss-120b`, `deepseek-r1`)
- `--framework <framework>`: Backend framework (`vllm`, `trtllm`, `sglang`)
- `--deployment <deployment-type>`: Deployment mode (e.g., `agg`, `disagg`, `disagg-single-node`, `disagg-multi-node`)

**Optional:**
- `--namespace <namespace>`: Kubernetes namespace (default: `dynamo`)
- `--dry-run`: Show commands without executing them
- `-h, --help`: Show help message

**Environment Variables:**
- `NAMESPACE`: Kubernetes namespace (default: `dynamo`)

#### Example Usage

```bash
# Set up environment
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}

# Configure HuggingFace token
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}

# Deploy Llama-3-70B with vLLM (aggregated mode)
./run.sh --model llama-3-70b --framework vllm --deployment agg

# Deploy GPT-OSS-120B with TensorRT-LLM
./run.sh --model gpt-oss-120b --framework trtllm --deployment agg

# Deploy DeepSeek-R1 with SGLang (disaggregated mode)
./run.sh --model deepseek-r1 --framework sglang --deployment disagg

# Deploy with a custom namespace
./run.sh --namespace my-namespace --model llama-3-70b --framework vllm --deployment agg

# Dry run to see what would be executed
./run.sh --dry-run --model llama-3-70b --framework vllm --deployment agg
```

## Option 2: Manual Deployment

For step-by-step manual deployment, follow these steps:

```bash
# 0. Set up environment (see the Prerequisites section)
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}

# 1. Download the model (see Step 1: Download Model)
kubectl apply -n $NAMESPACE -f <model>/model-cache/

# 2. Deploy the model (see Step 2: Deploy Model Service)
kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/deploy.yaml

# 3. Run benchmarks (optional, if perf.yaml exists; see Step 3)
kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml
```
### Step 1: Download Model
```bash
# Start the download job
kubectl apply -n $NAMESPACE -f <model>/model-cache

# Verify job creation
kubectl get jobs -n $NAMESPACE | grep model-download
```

Monitor and wait for the model download to complete:

```bash
# Wait for job completion (timeout after 100 minutes)
kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s

# Check job status
kubectl get job model-download -n $NAMESPACE

# View download logs
kubectl logs job/model-download -n $NAMESPACE
```
### Step 2: Deploy Model Service
```bash
# Navigate to the specific deployment configuration
cd <model>/<framework>/<deployment-mode>/

# Deploy the model service
kubectl apply -n $NAMESPACE -f deploy.yaml

# Verify deployment creation
kubectl get deployments -n $NAMESPACE
```

#### Wait for Deployment Ready

```bash
# Get the deployment name from the deploy.yaml file
DEPLOYMENT_NAME=$(grep "name:" deploy.yaml | head -1 | awk '{print $2}')

# Wait for the deployment to be ready (timeout after 20 minutes)
kubectl wait --for=condition=available deployment/$DEPLOYMENT_NAME -n $NAMESPACE --timeout=1200s

# Check deployment status
kubectl get deployment $DEPLOYMENT_NAME -n $NAMESPACE

# Check pod status
kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT_NAME
```
#### Verify Model Service
```bash
# Check that the service is running
kubectl get services -n $NAMESPACE

# Test the model endpoint (port-forward to test locally)
kubectl port-forward service/${DEPLOYMENT_NAME}-frontend 8000:8000 -n $NAMESPACE

# Test the model API (in another terminal)
curl http://localhost:8000/v1/models

# Stop the port-forward when done
pkill -f "kubectl port-forward"
```
### Step 3: Performance Benchmarking (Optional)

Run performance benchmarks to evaluate model performance. Benchmarking is only available for recipes that include a `perf.yaml` file:

#### Launch Benchmark Job

```bash
# From the deployment directory
kubectl apply -n $NAMESPACE -f perf.yaml

# Verify benchmark job creation
kubectl get jobs -n $NAMESPACE
```

#### Monitor Benchmark Progress

```bash
# Get the benchmark job name
PERF_JOB_NAME=$(grep "name:" perf.yaml | head -1 | awk '{print $2}')

# Monitor benchmark logs in real time
kubectl logs -f job/$PERF_JOB_NAME -n $NAMESPACE

# Wait for benchmark completion (timeout after 100 minutes)
kubectl wait --for=condition=Complete job/$PERF_JOB_NAME -n $NAMESPACE --timeout=6000s
```

#### View Benchmark Results

```bash
# Check the final benchmark results
kubectl logs job/$PERF_JOB_NAME -n $NAMESPACE | tail -50
```
