Commit 92b8fe0

fix: add a note for users to use VSWA feature after 0.5.1 release (#3380)
Signed-off-by: richardhuo-nv <[email protected]>
1 parent ad21d3a, commit 92b8fe0

1 file changed: +17 -0 lines changed

components/backends/trtllm/gemma3_sliding_window_attention.md

Lines changed: 17 additions & 0 deletions
@@ -23,6 +23,23 @@ VSWA is a mechanism in which a model’s layers alternate between multiple slidi
 > [!Note]
 > - Ensure that required services such as `nats` and `etcd` are running before starting.
 > - Request access to `google/gemma-3-1b-it` on Hugging Face and set your `HF_TOKEN` environment variable for authentication.
+> - It’s recommended to continue using the VSWA feature with the Dynamo 0.5.0 release and the TensorRT-LLM dynamo runtime image nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.5.0. The 0.5.1 release bundles TensorRT-LLM v1.1.0rc5, which has a regression that breaks VSWA.
+>
+> To try the latest TensorRT-LLM v1.2.0rc0 with VSWA, apply this patch to main or the latest release branch.
+> ```bash
+> # go to the dynamo repo
+> cd dynamo
+>
+> # apply the patch from the "vswa-patch-0.5.1" branch
+> git fetch
+> git cherry-pick -n 27dbaa19b2f4574bbfb55122661d58437d01de8e
+>
+> # build the container with tensorrt-llm==1.2.0rc0
+> ./container/build.sh --framework trtllm --tensorrtllm-pip-wheel tensorrt-llm==1.2.0rc0
+>
+> # run the container after build
+> ./container/run.sh --framework trtllm -it
+> ```
 
 ### Aggregated Serving
 ```bash
