fix: add a note for users to use VSWA feature after 0.5.1 release (#3380)

richardhuo-nv · web-flow · commit 92b8fe076d01 · 2025-10-02T15:21:22.000-07:00
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
diff --git a/components/backends/trtllm/gemma3_sliding_window_attention.md b/components/backends/trtllm/gemma3_sliding_window_attention.md
@@ -23,6 +23,23 @@ VSWA is a mechanism in which a model’s layers alternate between multiple slidi
 > [!Note]
 > - Ensure that required services such as `nats` and `etcd` are running before starting.
 > - Request access to `google/gemma-3-1b-it` on Hugging Face and set your `HF_TOKEN` environment variable for authentication.
+> - To use VSWA, stay on the Dynamo 0.5.0 release and the TensorRT-LLM Dynamo runtime image `nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.5.0`. The 0.5.1 release bundles TensorRT-LLM v1.1.0rc5, which has a regression that breaks VSWA.
+>
+>   To try the latest TensorRT-LLM v1.2.0rc0 with VSWA, cherry-pick the following patch onto `main` or the latest release branch.
+>   ```bash
+>   # go to the dynamo repo
+>   cd dynamo
+>
+>   # apply the patch from the "vswa-patch-0.5.1" branch
+>   git fetch
+>   git cherry-pick -n 27dbaa19b2f4574bbfb55122661d58437d01de8e
+>
+>   # build the container with tensorrt-llm==1.2.0rc0
+>   ./container/build.sh --framework trtllm --tensorrtllm-pip-wheel tensorrt-llm==1.2.0rc0
+>
+>   # run the container after build
+>   ./container/run.sh --framework trtllm -it
+>   ```
 
 ### Aggregated Serving
 ```bash