Stop reporting a not-ready media model as 405 Method Not Allowed by stisiTT · Pull Request #4274 · tenstorrent/tt-inference-server

stisiTT · 2026-06-17T19:40:54Z

What this fixes

When a media model isn't ready yet (still warming up), the server replied to requests with HTTP 405 Method Not Allowed. That status is the wrong kind of signal: 405 means "you're calling this endpoint the wrong way" — a client/method/contract error. It says nothing about the model, and it points whoever sees it at the wrong problem.

That mislabeling had real consequences. Our CI health probe hit the endpoint, saw 405, and concluded the endpoint itself was broken — so it gave up instead of waiting for warmup. A human reading the same 405 would reasonably go check routing, methods, or the URL — none of which were actually wrong.
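To make the failure mode concrete, here is a minimal sketch of how a health probe might classify status codes. It is not the actual CI probe: `wait_until_ready`, `get_status`, and the retry parameters are hypothetical names chosen for illustration. The assumption is that the probe retries on 503 and treats any other non-200 status as a hard failure, which is why a 405 during warmup ends the wait early.

```python
import time
from http import HTTPStatus
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], int],
    attempts: int = 30,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the endpoint returns 200; return False if attempts run out.

    `get_status` is a hypothetical callable that performs one request and
    returns the HTTP status code it received.
    """
    for _ in range(attempts):
        status = get_status()
        if status == HTTPStatus.OK:
            return True
        if status == HTTPStatus.SERVICE_UNAVAILABLE:
            # Temporary condition: the model is still warming up, so wait.
            sleep(delay_s)
            continue
        # Any other status (for example 405) blames the request itself,
        # so retrying cannot help and the probe gives up immediately.
        raise RuntimeError(f"endpoint looks broken: HTTP {status}")
    return False
```

With a probe shaped like this, a server that answers 503 while warming up is waited on, while one that answers 405 is reported as broken on the first poll.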

The fix makes every not-ready path return 503 Service Unavailable, which truthfully describes the situation: the endpoint is correct, the service is just temporarily not ready. Callers (CI, the cloud console, Kubernetes) already treat 503 as "retry shortly," so nothing on their side changes.
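The shape of the change can be sketched as follows. This is an illustration under stated assumptions, not the server's code: `NotReadyError`, `Scheduler`, `is_ready`, and `check_is_model_ready` are hypothetical stand-ins, and only the standard library is used so the sketch runs on its own.

```python
from http import HTTPStatus


class NotReadyError(Exception):
    """Hypothetical error carrying the HTTP status to send to the caller."""

    def __init__(self, status: HTTPStatus, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


class Scheduler:
    """Hypothetical stand-in for the server's scheduler."""

    def __init__(self) -> None:
        self.is_ready = False  # flipped to True once warmup has finished

    def check_is_model_ready(self) -> None:
        """Raise 503 while the model is warming up; return None once ready."""
        if not self.is_ready:
            # Before the fix this path reported METHOD_NOT_ALLOWED (405).
            raise NotReadyError(
                HTTPStatus.SERVICE_UNAVAILABLE, "Model is warming up"
            )
```

The route, method, and path are untouched in this sketch; only the status attached to the not-ready condition differs.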

This is purely about what the endpoint reports. It does not change which endpoint anything hits, the routes, the methods, or the paths.

What changed

The scheduler's readiness check returns 503 ("warming up") instead of 405.
The same 405 was duplicated in five request endpoints (chat, llm, audio, video, fine-tuning) — all switched to 503.
Updated the /tt-liveness docstring and two tests to match.
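Because the same status was repeated in several handlers, a table-driven check is one way to confirm that no not-ready path still reports 405. The sketch below is hypothetical: `handle_request` is a simplified stand-in for the real handlers, and the endpoint names are taken from the list above rather than from the actual route definitions.

```python
from http import HTTPStatus

# Endpoint names as listed in this PR; the real route paths may differ.
ENDPOINTS = ("chat", "llm", "audio", "video", "fine_tuning")


def handle_request(endpoint: str, model_ready: bool) -> int:
    """Hypothetical handler: return the HTTP status code for one request."""
    if endpoint not in ENDPOINTS:
        return int(HTTPStatus.NOT_FOUND)
    if not model_ready:
        # Every not-ready path reports the same temporary condition.
        return int(HTTPStatus.SERVICE_UNAVAILABLE)
    return int(HTTPStatus.OK)


def statuses_when_not_ready() -> dict[str, int]:
    """Collect the status each endpoint reports while the model warms up."""
    return {name: handle_request(name, model_ready=False) for name in ENDPOINTS}
```

A test can then assert that every value in `statuses_when_not_ready()` equals 503, which would catch a handler that was missed during the switch.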

Tested

tests/test_scheduler.py — 23 passed
tests/test_tt_maintenance_api.py — 12 passed
Validation runs across media models + devices dispatched (see comment below).

Out of scope (separate follow-up)

Two related-but-distinct items are not in this PR: (1) which endpoint consumers should probe (/health vs /tt-liveness), and (2) making the server actually detect a hung (not just not-ready) model and report it unhealthy. Those are tracked separately.

The scheduler raised HTTP 405 ("Method Not Allowed") while the model was still warming up, which is semantically wrong and confused health probes: a shield liveness probe polling /tt-liveness received 405 and treated the server as broken rather than warming up. Use 503 ("Service Unavailable") for every not-ready state so probes, the cloud console, and k8s all interpret it as "retry later." Applies the same fix to the inference endpoints (chat, llm, audio, video, fine_tuning) that independently re-raised 405 on the not-ready path.

stisiTT · 2026-06-17T19:55:04Z

tt-shield validation runs

Dispatched on-dispatch release runs to validate the 405→503 change across media models and devices. Most are on this PR's branch (stisi/fix-media-liveness-405-to-503); the FLUX p300x2 run uses a throwaway branch that also carries the trace_region_size override (PR #4273) so it reaches the health endpoint instead of crashing on the trace bug.

Targeted runs

Model	Runner	Device	Branch	Run
whisper-large-v3	p150	p150	405-fix	link
speecht5_tts	p150	p150	405-fix	link
Wan2.2-T2V-A14B	bh-qb-ge	p300x2	405-fix	link
FLUX.1-dev	bh-qb-ge	p300x2	405-fix + trace override	link

Blackhole galaxy (`bh-galaxy` / `blackhole_galaxy`)

Model	Run
whisper-large-v3	link
Wan2.2-T2V-A14B	link
Wan2.2-I2V-A14B	link
stable-diffusion-xl-base-1.0	link
mochi-1-preview	link

Wormhole galaxy (`6u` / `galaxy`)

Model	Run
whisper-large-v3	link
FLUX.1-dev	link
Wan2.2-T2V-A14B	link
stable-diffusion-xl-base-1.0	link
mochi-1-preview	link

Models excluded from galaxy coverage (no galaxy spec): speecht5_tts, FLUX.1-schnell, Z-Image-Turbo.

stisiTT requested review from a team, ddjukicTT, dmadicTT, fivanovicTT, idjuricTT, knovokmetTT, ljovanovicTT, lmuravljovTT, nanicicTT, visnjakrsmanovicTT, vpetrovicTT and ztorlakTT as code owners June 17, 2026 19:40

stisiTT mentioned this pull request Jun 17, 2026

Do checks does /health endpoint work fine with media models #4263

Open

stisiTT changed the title ~~Return 503 instead of 405 when media model not ready~~ Stop reporting a not-ready media model as 405 Method Not Allowed Jun 17, 2026

This was referenced Jun 17, 2026

tt-media-server: wire canary monitor to real device health checks #4275

Open

Wire canary monitor to real device health checks #4276

Open

stisiTT wants to merge 1 commit into main from stisi/fix-media-liveness-405-to-503
