Commit 6d9b911
TRTLLM-7731 KV cache transmission in disagg with CP on gen side
Signed-off-by: Balaram Buddharaju <[email protected]>
add ds-lite tllm-gen based disagg test
Signed-off-by: Matthias Jouanneaux <[email protected]>
initial support for helix parallelism
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixed mapping tests, added working MLA module test, added disagg test for helix (WIP)
Signed-off-by: Matthias Jouanneaux <[email protected]>
Helix MLA module test: added more scenarios, removed unnecessary code
Signed-off-by: Matthias Jouanneaux <[email protected]>
MLA Helix test: restricting number of tests, better output
Signed-off-by: Matthias Jouanneaux <[email protected]>
test MLA helix: remove OOM test scenario
Signed-off-by: Matthias Jouanneaux <[email protected]>
test MLA helix: fix scenario max position embeddings
Signed-off-by: Matthias Jouanneaux <[email protected]>
test Helix MLA: try to fix NaNs
Signed-off-by: Matthias Jouanneaux <[email protected]>
added all-to-all impl
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix thop lib
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix alltoall
Signed-off-by: Matthias Jouanneaux <[email protected]>
attention MLA: remove kv heads (unused), improve heads naming, fix tests
Signed-off-by: Matthias Jouanneaux <[email protected]>
test Helix MLA: minor fixes
Signed-off-by: Matthias Jouanneaux <[email protected]>
test Helix MLA: disable numeric test
Signed-off-by: Matthias Jouanneaux <[email protected]>
test Helix MLA: add TODOs to MLA module
Signed-off-by: Matthias Jouanneaux <[email protected]>
test Helix MLA: fix MLA module
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
fully working MLA test
Signed-off-by: Matthias Jouanneaux <[email protected]>
attempt to make latent cache work
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging numerical issue
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging numerical issue
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging numerical issue
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging numerical issue
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging numerical issue
Signed-off-by: Matthias Jouanneaux <[email protected]>
adding additional test for further numerical debugging
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixing tests & correction
Signed-off-by: Matthias Jouanneaux <[email protected]>
remove debug output from tests
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix tests
Signed-off-by: Matthias Jouanneaux <[email protected]>
further debugging with multiple sequences
Signed-off-by: Matthias Jouanneaux <[email protected]>
further debugging with multiple sequences
Signed-off-by: Matthias Jouanneaux <[email protected]>
further debugging with multiple sequences
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixed multiple sequences tests
Signed-off-by: Matthias Jouanneaux <[email protected]>
automated review comments
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging of latent cache
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging of latent cache
Signed-off-by: Matthias Jouanneaux <[email protected]>
further debugging of pe values
Signed-off-by: Matthias Jouanneaux <[email protected]>
further debugging of latent cache
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixed latent cache, remove flaky test
Signed-off-by: Matthias Jouanneaux <[email protected]>
better reporting
Signed-off-by: Matthias Jouanneaux <[email protected]>
better reporting
Signed-off-by: Matthias Jouanneaux <[email protected]>
finalized test scenarios
Signed-off-by: Matthias Jouanneaux <[email protected]>
better perf measurements, added graph support
Signed-off-by: Matthias Jouanneaux <[email protected]>
added helix post process kernel
Signed-off-by: Matthias Jouanneaux <[email protected]>
added unit test, minor fix for helix kernel
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixing helix kernels
Signed-off-by: Matthias Jouanneaux <[email protected]>
better tests, minor fixes
Signed-off-by: Matthias Jouanneaux <[email protected]>
better tests, minor fixes
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging helix test
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging helix test
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging helix test
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixed helix post process kernel: main kernel had perf issue/flaw
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixed helix post process test
Signed-off-by: Matthias Jouanneaux <[email protected]>
added helix full layer test
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix full layer helix test/bench
Signed-off-by: Matthias Jouanneaux <[email protected]>
added correct mapping to ds helix
Signed-off-by: Matthias Jouanneaux <[email protected]>
further improvements for fp8 init
Signed-off-by: Matthias Jouanneaux <[email protected]>
debugging quantization config
Signed-off-by: Matthias Jouanneaux <[email protected]>
better debug output
Signed-off-by: Matthias Jouanneaux <[email protected]>
fixes for fp8
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix fp8 runs
Signed-off-by: Matthias Jouanneaux <[email protected]>
attempt to fix fp8 context
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix context phase: just randomly gen kv cache values. fix scenario sizes
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix tp size config in helix layer test
Signed-off-by: Matthias Jouanneaux <[email protected]>
minor changes for test
get trtllm-serve working with BF16 for gen with cp - v_b_proj weight loading needs to be revisited
$ CUDA_VISIBLE_DEVICES=0,1 trtllm-serve /home/scratch.trt_llm_data/llm-models/DeepSeek-V3-Lite/bf16/ --host localhost --port 8002 --cp_size 2 --extra_llm_api_options ./gen_extra-llm-api-config.yaml
end-to-end test in disagg works
$ pytest tests/integration/defs/disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_bf16_tllm_gen_helix -s -v
Switch to contiguous block distribution among CP ranks
save changes to _merge_requests()
undo changes to prepare_inputs()
Raise exception for blocks fewer than num_cp_ranks
save intermediate changes
attempt to fix attention tests
Signed-off-by: Matthias Jouanneaux <[email protected]>
save changes for minimal test
save minor dev comments
added helix inactive rank option to MLA kernels
Signed-off-by: Matthias Jouanneaux <[email protected]>
pass the right seq_lens_kv - test with seqlen 64 works
$ pytest tests/unittest/_torch/modules/test_mla_helix_expt.py -s -v
is_inactive_helix at request level
cp_allgather for position_id
helix: make inactive rank a bool tensor
Signed-off-by: Matthias Jouanneaux <[email protected]>
undo mapping changes to modeling_deepseek
Failed attempt to replace model_config.mapping
fill in helix_is_inactive for each request
update position_id logic
better way to package mapping - repurpose comms creation too
save disagg gen-only benchmark test
prep for integration test
improvements to position_id, num_cached_tokens_per_seq and tokens_per_block
changes to save blocks at prefill
changes to save blocks at decode
add changes to read KV from disk
updates to save and read KV blocks for all layers
over-allocate at prefill to get cache transmission right
prune saved KV cache files
updates to avoid over-allocation on gen side in disagg
Revert "over-allocate at prefill to get cache transmission right"
This reverts commit af7d000.
save disagg configs for DSV3 - currently goes OOM
verifying tests on 8 GPUs
helix: added (working) DS R1 8-GPU integration test
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix: added large prompt + ds lite config using large prompt
Signed-off-by: Matthias Jouanneaux <[email protected]>
save intermediate changes for fixes
fix debug printing
Signed-off-by: Matthias Jouanneaux <[email protected]>
Mention cache_transceiver_config.max_tokens_in_buffer for disagg servers
save initial changes to benchmarking script
added mjoux specific submit script, tighter timeouts, better defaults
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix slurm: increase timeouts slightly, use deepgemm moe backend for smaller models
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix slurm: add dataset caching path
Signed-off-by: Matthias Jouanneaux <[email protected]>
fix padding when input_len is divisible by tokens_per_block
save changes to test varying prompt len
fix_kvcache_split
Signed-off-by: Chuang Zhu <[email protected]>
avoid fabric memory and print send and recv sizes
auto-determine transceiver size
Signed-off-by: Matthias Jouanneaux <[email protected]>
remove verbose print output
Signed-off-by: Matthias Jouanneaux <[email protected]>
attempt to fix DS R1 run
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix slurm: fix parameters for DS R1 up to 256K tokens
Signed-off-by: Matthias Jouanneaux <[email protected]>
minor updates to reduce memory footprint and bring back warmup
enable cudagraph and add some debug prints
ugly hack to get results with 512k
updates to benchmark 1M seqlen
updates to benchmark 2M seqlen
updates for passing down moe properly
minor changes to get nsys profiles
test helix layer: support for slurm call, support for fp4
Signed-off-by: Matthias Jouanneaux <[email protected]>
test helix layer: added sbatch script
Signed-off-by: Matthias Jouanneaux <[email protected]>
add minimal cache transmission test for 1M seqlen
minor bug fix
changes to benchmark 4M seqlen
skip launch/wait of context servers when TRTLLM_DISAGG_BENCHMARK_GEN_ONLY=1
remove hacks; skip profiling; gpu_mem_frac
test helix layer: fix nvfp4 config to fit high perf mode
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix single layer: improved timing, added arg parsing, added output parsing
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix single layer: add dense option
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix slurm: fix gen_only config, support EP config, add submit script for multiple configs, remove build_wheel by default for array benchmarking
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix slurm: added parse script for results
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix single layer: fixed test, added config submit script, improved parsing
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix single layer: fix segment for sbatch script
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix: fixed TP-only runs (removed hack to make higher seq len work), improved sbatch scripts
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix: fix high node count runs, move back to e2e mode, improve parse script
Signed-off-by: Matthias Jouanneaux <[email protected]>
longer prompt for DSV3 Lite & DSR1 FP4 integration test
disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_bf16_tllm_gen_helix
disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_tllm_gen_helix
disaggregated/test_disaggregated.py::test_disaggregated_deepseek_r1_fp4_tllm_gen_helix
helix: added initial README for testing/benchmarking
Signed-off-by: Matthias Jouanneaux <[email protected]>
helix slurm: remove references to internal clusters
Signed-off-by: Matthias Jouanneaux <[email protected]>
minor updates to README
minor updates
helix: improve transpose/split for alltoall
Signed-off-by: Matthias Jouanneaux <[email protected]>
Revert "helix: improve transpose/split for alltoall"
This reverts commit c8b24b9.
helix: improve alltoall perf
Signed-off-by: Matthias Jouanneaux <[email protected]>
[https://nvbugs/5495789][feat] Optionally disable server GC and worker GC (#7995)
Signed-off-by: Tailing Yuan <[email protected]>
save changes for custom logging
redo cherry-pick of attention.py
save more changes for build and pipe-cleaning
save more changes
clean up - 1
clean up - 2
reuse mla_tensor_params instead of using helix_tensor_params
undo all_tp_rank_num_tokens
update test_disaggregated.py
updates to dsv3RopeOp
more cleanup
save fp8 disagg test
[https://nvbugs/5637012][fix] Fix helix unit tests
Signed-off-by: Balaram Buddharaju <[email protected]>
minor updates to attention.py
updates to test - seqlen 64 works
get integration test working
1 parent 268ea9b commit 6d9b911
File tree
22 files changed: +383 -60 lines changed
- cpp/tensorrt_llm
  - kernels
  - thop
- examples
  - disaggregated/clients
  - llm-api
- tensorrt_llm
  - _torch
    - attention_backend
    - distributed
    - models
    - modules
    - pyexecutor
  - commands
  - llmapi
- tests
  - integration
    - defs/disaggregated
      - test_configs
    - test_lists
  - unittest/_torch/modules