hao-ai-lab
diff --git a/container/Dockerfile.tensorrt_llm b/container/Dockerfile.tensorrt_llm
diff --git a/docs/guides/dynamo_run.md b/docs/guides/dynamo_run.md
diff --git a/examples/tensorrt_llm/README.md b/examples/tensorrt_llm/README.md
@@ -201,6 +201,12 @@ RUN pip install dist/ai_dynamo_runtime*cp312*.whl  && \
 ENV DYNAMO_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
 ENV DYNAMO_HOME=/workspace
 
+
+# Copy launch banner
+RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/launch_message.txt \
+    sed '/^#\s/d' /workspace/launch_message.txt > ~/.launch_screen && \
+    echo "cat ~/.launch_screen" >> ~/.bashrc
+
 # FIXME: Copy more specific folders in for dev/debug after directory restructure
 COPY . /workspace
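
Review note on the launch banner step: the `sed '/^#\s/d'` expression deletes every line that starts with `#` followed by a whitespace character, which presumably strips comment or license header lines from `launch_message.txt` before it is shown. A minimal sketch of that filter follows; `strip_comment_lines` is a hypothetical name used only for illustration, and `\s` assumes GNU sed, as found in typical Linux base images.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the same sed expression used in the Dockerfile.
# Assumes GNU sed, where \s matches a whitespace character.
strip_comment_lines() {
  # Delete lines beginning with '#' followed by whitespace; pass everything else through.
  sed '/^#\s/d'
}
```

For example, `printf '# header\n#tag\nWelcome\n' | strip_comment_lines` keeps `#tag` and `Welcome`, because only `#` followed by whitespace marks a line for deletion.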
 
 
@@ -342,7 +342,7 @@ See instructions [here](/examples/tensorrt_llm/README.md#run-container) to run t
 
 Execute the following to load the TensorRT-LLM model specified in the configuration.
 ```
-dynamo run out=pystr:/workspace/examples/tensorrt_llm/engines/agg_engine.py  -- --engine_args /workspace/examples/tensorrt_llm/configs/llm_api_config.yaml
+dynamo run out=pystr:/workspace/examples/tensorrt_llm/engines/trtllm_engine.py  -- --engine_args /workspace/examples/tensorrt_llm/configs/llm_api_config.yaml
 ```
 
 #### Dynamo does the pre-processing
 
@@ -25,6 +25,14 @@ This directory contains examples and reference implementations for deploying Lar
 See [deployment architectures](../llm/README.md#deployment-architectures) to learn about the general idea of the architecture.
 Note that this TensorRT-LLM version does not support all the options yet.
 
+Note: TensorRT-LLM does not support conditional disaggregation yet. A deployment can only be configured to always use aggregated serving or always use disaggregated serving.
+
+## Getting Started
+
+1. Choose a deployment architecture based on your requirements
+2. Configure the components as needed
+3. Deploy using the provided scripts
+
 ### Prerequisites
 
 Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
@@ -68,6 +76,29 @@ This build script internally points to the base container image built with step
 ```
 ## Run Deployment
 
+This figure shows an overview of the major components to deploy:
+
+
+
+```
+
++------+      +-----------+      +------------------+             +---------------+
+| HTTP |----->| processor |----->|      Worker      |------------>|     Prefill   |
+|      |<-----|           |<-----|                  |<------------|     Worker    |
++------+      +-----------+      +------------------+             +---------------+
+                  |    ^                  |
+       query best |    | return           | publish kv events
+           worker |    | worker_id        v
+                  |    |         +------------------+
+                  |    +---------|     kv-router    |
+                  +------------->|                  |
+                                 +------------------+
+
+```
+
+Note: The diagram above shows every component. Which components are actually
+spawned depends on the chosen graph.
+
 ### Example architectures
 
 #### Aggregated serving
@@ -82,21 +113,23 @@ cd /workspace/examples/tensorrt_llm
 dynamo serve graphs.agg_router:Frontend -f ./configs/agg_router.yaml
 ```
 
-<!--
-This is work in progress and will be enabled soon.
-
 #### Disaggregated serving
 ```bash
-cd /workspace/examples/llm
-dynamo serve graphs.disagg:Frontend -f ./configs/disagg.yaml
+cd /workspace/examples/tensorrt_llm
+TRTLLM_USE_UCX_KVCACHE=1 dynamo serve graphs.disagg:Frontend -f ./configs/disagg.yaml
 ```
 
+Setting `TRTLLM_USE_UCX_KVCACHE=1` makes TensorRT-LLM use UCX to transfer the KV
+cache between the context and generation workers.
+
 #### Disaggregated serving with KV Routing
 ```bash
-cd /workspace/examples/llm
-dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
+cd /workspace/examples/tensorrt_llm
+TRTLLM_USE_UCX_KVCACHE=1 dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
 ```
--->
+
+Setting `TRTLLM_USE_UCX_KVCACHE=1` makes TensorRT-LLM use UCX to transfer the KV
+cache between the context and generation workers.
 
 ### Client
 
@@ -108,7 +141,7 @@ See [close deployment](../../docs/guides/dynamo_serve.md#close-deployment) secti
 
 Remaining tasks:
 
-- [ ] Add support for the disaggregated serving.
+- [x] Add support for disaggregated serving.
 - [ ] Add integration test coverage.
 - [ ] Add instructions for benchmarking.
 - [ ] Add multi-node support.
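
Review note on the launch commands: the README hunk sets `TRTLLM_USE_UCX_KVCACHE=1` inline for the two disaggregated graphs but not for `agg_router`. The rule can be sketched as a small helper that prints the command for a given graph. `build_serve_cmd` is a hypothetical function, not part of Dynamo; it only reassembles the commands shown in the diff and knows only the three graph names that appear there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the launch command for a graph
# named in the README. Graph names and config paths are copied from the diff.
build_serve_cmd() {
  local graph="$1"
  case "$graph" in
    agg_router)
      # Aggregated serving: the README shows no UCX variable for this graph.
      printf 'dynamo serve graphs.%s:Frontend -f ./configs/%s.yaml\n' "$graph" "$graph"
      ;;
    disagg|disagg_router)
      # Disaggregated graphs: the README sets TRTLLM_USE_UCX_KVCACHE=1 so the KV
      # cache moves between context and generation workers over UCX.
      printf 'TRTLLM_USE_UCX_KVCACHE=1 dynamo serve graphs.%s:Frontend -f ./configs/%s.yaml\n' "$graph" "$graph"
      ;;
    *)
      printf 'unknown graph: %s\n' "$graph" >&2
      return 1
      ;;
  esac
}
```

Inside the container, one way to use it would be `cd /workspace/examples/tensorrt_llm && eval "$(build_serve_cmd disagg)"`. Printing the command first keeps the environment variable visible in logs, which may help when debugging KV cache transfer.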