mock: cache-aware prefill/decode latency so disaggregated TTFT/TPS tests are measurable by ddjukicTT · Pull Request #4246 · tenstorrent/tt-inference-server

ddjukicTT · 2026-06-16T13:27:36Z

Make the prefill/decode disaggregation smoke suite runnable end-to-end from the top-level run.py, against a self-contained mock stack that is faithful enough to measure prefix-cache behavior.

Commands:

python run.py --workflow prefill_decode --served-model moonshotai/Kimi-K2.6      # non-catalog
python run.py --model DeepSeek-R1-0528 --workflow prefill_decode --device galaxy  # catalog

Known issue: test_07 fails on the mock simulator. The prefix cache behaves correctly, but the test's TTFT-ratio assertion trips because simulator TTFT (~40 ms) is dominated by prompt transport/tokenization, not on-device prefill. This is a simulator artifact, not a regression; the thresholds are tunable via TTFT_MEANINGFUL_S / TTFT_HIT_MAX_FRACTION.
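To illustrate why a ~40 ms simulator TTFT can trip a ratio check, here is a minimal sketch of how such a gate might be structured. It is not the code of test_07: the function name `check_ttft_ratio`, the default values, and the exact meaning given to TTFT_MEANINGFUL_S (minimum cold TTFT, in seconds, for the comparison to be considered) and TTFT_HIT_MAX_FRACTION (largest allowed cache-hit TTFT as a fraction of cold TTFT) are assumptions inferred only from the variable names.

```python
import os


def check_ttft_ratio(cold_ttft_s, hit_ttft_s, meaningful_s=None, hit_max_fraction=None):
    """Classify a cold vs. cache-hit TTFT pair as 'skip', 'pass' or 'fail'.

    Hypothetical helper: the defaults below are placeholders, not values
    taken from the PR.
    """
    if meaningful_s is None:
        meaningful_s = float(os.environ.get("TTFT_MEANINGFUL_S", "0.5"))
    if hit_max_fraction is None:
        hit_max_fraction = float(os.environ.get("TTFT_HIT_MAX_FRACTION", "0.5"))

    # Assumed gate: when the cold TTFT is tiny, fixed overhead (transport,
    # tokenization) dominates and the ratio carries no signal.
    if cold_ttft_s < meaningful_s:
        return "skip"

    # Assumed check: a prefix-cache hit should cut TTFT to at most the
    # configured fraction of the cold TTFT.
    return "pass" if hit_ttft_s <= hit_max_fraction * cold_ttft_s else "fail"


if __name__ == "__main__":
    # Overhead-dominated pair, similar in scale to the ~40 ms simulator TTFT.
    print(check_ttft_ratio(0.040, 0.038, meaningful_s=0.01, hit_max_fraction=0.5))
    # Same pair with a higher "meaningful" floor.
    print(check_ttft_ratio(0.040, 0.038, meaningful_s=0.5, hit_max_fraction=0.5))
```

Under these assumptions, a low floor lets the overhead-dominated pair reach the ratio check and fail, while raising the floor turns the same measurement into a skip.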


dmadicTT · 2026-06-16T18:44:11Z

-    echo "DYNAMO_ENDPOINT_NAME=generate"
-}
-
 start_frontend() {


Can we run deploy.sh instead of run_stack.sh?

dmadicTT · 2026-06-16T18:47:27Z

+// active sequence and takes MOCK_DECODE_SLEEP_US, modeling inter-token latency
+// so the decode tokens-per-second (TPS ≈ 1e6 / MOCK_DECODE_SLEEP_US) is
+// measurable on the mock. Default 0 (tokens emitted as fast as the loop runs).
+std::chrono::microseconds mockDecodeDelay() {


LLM_DEVICE_BACKEND=mock_pipeline already provides realistic model latency: https://github.com/tenstorrent/tt-inference-server/blob/main/tt-media-server/cpp_server/include/runtime/runners/blaze_runner/blaze_utils.hpp#L191
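The relationship quoted in the diff hunk above, TPS ≈ 1e6 / MOCK_DECODE_SLEEP_US, can be checked with quick arithmetic. The sketch below only restates that formula; the helper names are hypothetical, and treating a delay of 0 as "unbounded" is an interpretation of the "Default 0" note in the comment, not behavior confirmed by the source.

```python
def expected_decode_tps(sleep_us: int) -> float:
    """Approximate per-sequence decode tokens/second for a given per-token sleep."""
    if sleep_us <= 0:
        # Assumed reading of "Default 0": no artificial delay, so the
        # formula gives no upper bound on TPS.
        return float("inf")
    return 1_000_000 / sleep_us


def sleep_us_for_tps(target_tps: float) -> int:
    """Invert the formula: microseconds of per-token sleep for a target TPS."""
    return round(1_000_000 / target_tps)


if __name__ == "__main__":
    print(expected_decode_tps(20_000))  # 20 ms per token
    print(sleep_us_for_tps(50))         # delay needed for 50 tokens/s
```

For example, a 20,000 µs delay corresponds to roughly 50 tokens per second per active sequence, ignoring any time the loop itself takes.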

dmadicTT · 2026-06-16T18:50:10Z

In general, we should think about switching to docker-compose to deploy multiple related Docker containers, instead of relying on multiple bash scripts.


ddjukicTT added 4 commits June 16, 2026 13:20

feat: mock: cache-aware prefill/decode latency so disaggregated TTFT/TPS tests are measurable

ead3ce8

fix: extract common parts from run_stack.sh and deploy.sh

518de34

fix: ruff format

dd70eda

fix: "Failed to open existing queue: tt_mem_results

f76dd51

dmadicTT reviewed Jun 16, 2026

Comment thread tt-media-server/cpp_server/src/runtime/runners/llm_runner.cpp

dmadicTT reviewed Jun 16, 2026

Comment thread tt-media-server/cpp_server/src/runtime/runners/llm_runner.cpp Outdated

dmadicTT reviewed Jun 16, 2026

ddjukicTT added 7 commits June 17, 2026 13:56

fix: switch to Blaze mock_pipeline backend

13926ba

feat: add multi-turn prefix-cache test

d4627ab

fix: format fix

29192a9

feat: wire the prefill/decode smoke suite into the top-level run.py

aae6e63

Merge branch 'main' of https://github.com/tenstorrent/tt-inference-server into ddjukic/prefill-functional-requirements-test # Conflicts: # run.py # workflows/runtime_config.py # workflows/v2_bridge.py # workflows/validate_setup.py

ef702ec

fix: --model is optional only for --workflow prefill_decode

42be667

feat: add prefill on decode/prefill node

1508377

ddjukicTT wants to merge 11 commits into main from ddjukic/prefill-functional-requirements-test

2 participants
