Stop reporting a not-ready media model as 405 Method Not Allowed#4274

Open: stisiTT wants to merge 1 commit into main from stisi/fix-media-liveness-405-to-503

stisiTT commented Jun 17, 2026

What this fixes

When a media model isn't ready yet (still warming up), the server replied to requests with HTTP 405 Method Not Allowed. That status is the wrong kind of signal: 405 means "you're calling this endpoint the wrong way" — a client/method/contract error. It says nothing about the model, and it points whoever sees it at the wrong problem.

That mislabeling had real consequences. Our CI health probe hit the endpoint, saw 405, and concluded the endpoint itself was broken — so it gave up instead of waiting for warmup. A human reading the same 405 would reasonably go check routing, methods, or the URL — none of which were actually wrong.

The fix makes every not-ready path return 503 Service Unavailable, which truthfully describes the situation: the endpoint is correct, the service is just temporarily not ready. Callers (CI, the cloud console, Kubernetes) already treat 503 as "retry shortly," so nothing on their side changes.

This is purely about what the endpoint reports. It does not change which endpoint anything hits, the routes, the methods, or the paths.
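
To make the difference concrete, here is a minimal sketch of how a health probe might tell "still warming up" apart from "endpoint is broken." It is not the actual CI or tt-shield probe code: the function name `wait_until_ready`, its parameters, and the choice to fail fast on any status other than 200 or 503 are all assumptions made for illustration.

```python
import time
from typing import Callable

# 503 "Service Unavailable": the endpoint is correct, the model is still warming up.
RETRYABLE = {503}


def wait_until_ready(
    get_status: Callable[[], int],
    attempts: int = 30,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until 200; keep waiting on 503; fail fast on any other status."""
    for _ in range(attempts):
        status = get_status()
        if status == 200:
            return True
        if status not in RETRYABLE:
            # For example 405: a contract error, so retrying cannot help.
            raise RuntimeError(f"endpoint looks misconfigured: HTTP {status}")
        sleep(delay_s)
    # Still 503 after every attempt: never became ready within the budget.
    return False
```

With a probe shaped like this, a server that answers 405 during warmup is abandoned on the first poll, while one that answers 503 is polled until it reports 200 or the attempt budget runs out.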

What changed

  • The scheduler's readiness check returns 503 ("warming up") instead of 405.
  • The same 405 was duplicated in five request endpoints (chat, llm, audio, video, fine-tuning) — all switched to 503.
  • Updated the /tt-liveness docstring and two tests to match.
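
The shape of the change can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `HTTPError` stands in for whatever HTTP exception the web framework provides, and `check_is_model_ready` and `handle_request` are hypothetical names. Only the status code (503 in place of 405) and the "warming up" wording come from the description above.

```python
from http import HTTPStatus


class HTTPError(Exception):
    """Stand-in for the web framework's HTTP exception (hypothetical)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def check_is_model_ready(is_ready: bool) -> None:
    """Readiness guard: report 503 (not 405) while the model is warming up."""
    if not is_ready:
        raise HTTPError(HTTPStatus.SERVICE_UNAVAILABLE, "Model is warming up")


def handle_request(is_ready: bool) -> int:
    """Hypothetical request endpoint that surfaces the guard's status code."""
    try:
        check_is_model_ready(is_ready)
    except HTTPError as exc:
        return int(exc.status_code)
    return int(HTTPStatus.OK)
```

Here `handle_request(False)` yields 503 and `handle_request(True)` yields 200, which is the behavior the updated tests are described as checking.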

Tested

  • tests/test_scheduler.py — 23 passed
  • tests/test_tt_maintenance_api.py — 12 passed
  • Validation runs across media models + devices dispatched (see comment below).

Out of scope (separate follow-up)

Two related-but-distinct items are not in this PR: (1) which endpoint consumers should probe (/health vs /tt-liveness), and (2) making the server actually detect a hung (not just not-ready) model and report it unhealthy. Those are tracked separately.

The scheduler raised HTTP 405 ("Method Not Allowed") while the model was
still warming up, which is semantically wrong and confused health probes:
a shield liveness probe polling /tt-liveness received 405 and treated the
server as broken rather than warming up.

Use 503 ("Service Unavailable") for every not-ready state so probes,
the cloud console, and k8s all interpret it as "retry later." Applies
the same fix to the inference endpoints (chat, llm, audio, video,
fine_tuning) that independently re-raised 405 on the not-ready path.

stisiTT commented Jun 17, 2026

tt-shield validation runs

Dispatched on-dispatch release runs to validate the 405→503 change across media models and devices. Most are on this PR's branch (stisi/fix-media-liveness-405-to-503); the FLUX p300x2 run uses a throwaway branch that also carries the trace_region_size override (PR #4273) so it reaches the health endpoint instead of crashing on the trace bug.

Targeted runs

| Model | Runner | Device | Branch | Run |
| --- | --- | --- | --- | --- |
| whisper-large-v3 | p150 | p150 | 405-fix | link |
| speecht5_tts | p150 | p150 | 405-fix | link |
| Wan2.2-T2V-A14B | bh-qb-ge | p300x2 | 405-fix | link |
| FLUX.1-dev | bh-qb-ge | p300x2 | 405-fix + trace override | link |

Blackhole galaxy (bh-galaxy / blackhole_galaxy)

| Model | Run |
| --- | --- |
| whisper-large-v3 | link |
| Wan2.2-T2V-A14B | link |
| Wan2.2-I2V-A14B | link |
| stable-diffusion-xl-base-1.0 | link |
| mochi-1-preview | link |

Wormhole galaxy (6u / galaxy)

| Model | Run |
| --- | --- |
| whisper-large-v3 | link |
| FLUX.1-dev | link |
| Wan2.2-T2V-A14B | link |
| stable-diffusion-xl-base-1.0 | link |
| mochi-1-preview | link |

Models excluded from galaxy coverage (no galaxy spec): speecht5_tts, FLUX.1-schnell, Z-Image-Turbo.

stisiTT changed the title from "Return 503 instead of 405 when media model not ready" to "Stop reporting a not-ready media model as 405 Method Not Allowed" on Jun 17, 2026