Replies: 4 comments
-
Update: Priority Routing Decoded + Full E2E Verification (v2.7.1)
Since the original post, we've identified the exact mechanism behind Claude Code's priority routing and replicated it in dario. Here's the full technical breakdown.
Root Cause: The Billing Tag
Claude Code embeds a billing classification tag as the first line of its system prompt:
This tag activates per-model rate limit evaluation on Anthropic's backend. Without it, requests are evaluated against the overall [...]
Proof:
Additional Routing Factors
SDK reverse engineering revealed three more fields the CLI sends:
E2E Test Results: v2.7.1 (12/12 pass)
Rate Limit Snapshot
What's New in v2.6.0–v2.7.1
Key Insight
Billing classification is determined solely by the OAuth token's subscription type, not by headers, betas, or metadata. But rate limit routing (which pool your request is evaluated against) depends on the billing tag in the system prompt. These are two separate systems:
Credit to @belangertrading for the initial 429 diagnosis and CLI fallback workaround that kicked off this investigation (#4, #6).
April 10, 2026. Tested on Claude Max 5x at 100% 7d utilization. All findings independently verified via MITM proxy capture and systematic A/B testing.
-
Clarification: Per-model headers vs per-model routing
Follow-up from today's default vs passthrough comparison testing (v2.8.0).
Finding 1: Per-model headers are always returned
The original post implied per-model rate limit headers ( [...]
The server always calculates per-model utilization. The billing tag determines whether it's used for routing decisions. At high overall utilization, default mode routes through the per-model pool (5% = allowed), while passthrough evaluates against overall 7d (100% = rejected).
Finding 2: Per-model counters have independent reset schedules
After the 7d window reset this morning:
Per-model utilization is tracked independently from overall utilization. They have different reset timestamps and don't move in lockstep. This means per-model pools can be at different utilization levels than the overall quota.
Updated mental model
Core findings from the original post remain valid. This is a precision refinement.
April 10, 2026. Tested on Claude Max 5x at 2% 7d / 34% 5h utilization (post-reset baseline).
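The mental model from this comment can be sketched as a small decision function. This is a toy model of the behaviour described here, not Anthropic's implementation: the function name, the `limit` cutoff of 100%, and the idea that a single comparison decides the outcome are all assumptions made for illustration.

```python
def evaluate_request(has_billing_tag: bool,
                     per_model_utilization: float,
                     overall_7d_utilization: float,
                     limit: float = 100.0) -> str:
    """Toy model: pick the pool a request is judged against, then compare.

    Utilization values are percentages (0-100). `limit` is a hypothetical
    cutoff; the real threshold is not documented in this thread.
    """
    # With the billing tag, routing uses the per-model pool;
    # without it (passthrough), the overall 7-day window is used.
    if has_billing_tag:
        pool, utilization = "per_model", per_model_utilization
    else:
        pool, utilization = "overall_7d", overall_7d_utilization
    verdict = "allowed" if utilization < limit else "rejected"
    return f"{pool}:{verdict}"


# Scenario from the comparison: per-model at 5%, overall 7d at 100%.
print(evaluate_request(True, 5.0, 100.0))   # default mode
print(evaluate_request(False, 5.0, 100.0))  # passthrough mode
```

Both calls use the same utilization numbers; only the presence of the tag changes which number is compared against the cutoff.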
-
Appreciate the shoutout. The 429s were driving us crazy running a multi-agent stack on Claude Max; the CLI fallback was duct tape until you found the real fix. Billing tag in the system prompt is wild. v2.8.0 running clean, zero 429s. Great work @askalf.
-
Glad it's solid for you. Your investigation on #4 and #6 is what got us to the root cause; we wouldn't have found the billing tag without your MITM analysis and the A/B testing across versions. v2.8.1 just shipped with a cleanup pass (−7% codebase) and a Haiku fix. Also skips [...] If you hit anything on the multi-agent stack, let us know.
-
Looking under the hood of Anthropic's unified rate limit system
Anthropic documents that Claude Max and Pro subscriptions have usage limits governed by two rolling windows, a 5-hour window and a 7-day ceiling, shared across claude.ai and Claude Code. They offer extra usage as a pay-as-you-go option when you exceed those limits. Claude Code exposes your utilization via the /usage command and the statusline, which shows real-time 5h and 7d usage percentages.
What I found interesting while building dario is the granular header system that powers all of this behind the scenes.
The Unified Rate Limit Headers
When you make an API call with a subscription OAuth token, Anthropic returns headers that expose the internal mechanics of the rate limit system:
Here's what stood out:
The representative claim system
When both windows are in play, the representative-claim field tells you which one is actually limiting you. In my case, the 5-hour utilization was only 38%, but the 7-day was at 81%. The representative-claim: seven_day value confirmed the weekly window was the bottleneck.
This means you can start a fresh session, be well within your 5-hour budget, and still get throttled because of accumulated usage over the past week. Claude Code's statusline shows both percentages, but knowing which one is the representative-claim requires looking at the headers.
The fallback system
When throttled, the headers show fallback-percentage: 0.5, meaning only 50% of capacity is available. In practice, this means Opus and Sonnet return 429 while Haiku keeps working. Your subscription silently degrades to the cheapest model.
The overage gap
With extra usage enabled and funded, I expected overage to kick in and keep Opus working. Instead:
The overage status changed to allowed, but Opus still returned 429. The credits unlocked the overage bucket, but the rate limiter still blocked the request. This suggests there are at least two independent throttles, and extra usage only addresses one of them.
The error message
When Opus or Sonnet gets throttled via direct API, the response is:
{"type":"error","error":{"type":"rate_limit_error","message":"Error"}}
Just "Error." Claude Code's UI handles this more gracefully: it shows notifications and the statusline updates. But API consumers (including third-party tools) get no actionable information in the error body itself. The rate limit headers are there, but you have to know to look for them.
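Since the error body carries no detail, a client can build its own message from the headers. The sketch below is one way to do that in Python. It assumes the fields appear as response headers whose names end with the suffixes quoted in this post; the full header names are not shown here, so the code matches on suffix only. The function name, the `example-` prefix, and the sample values are hypothetical.

```python
import json
from typing import Mapping, Optional

# Field-name suffixes quoted in this post. Any vendor prefix on the real
# header names is unknown here, so matching is done on the suffix.
FIELDS = (
    "unified-5h-utilization",
    "unified-7d-utilization",
    "unified-representative-claim",
)


def describe_rate_limit(status: int, body: str,
                        headers: Mapping[str, str]) -> Optional[str]:
    """Return a readable explanation for a 429, or None for other statuses."""
    if status != 429:
        return None
    try:
        error_type = json.loads(body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        error_type = "unknown"  # body was not the expected JSON shape
    found = {}
    for name, value in headers.items():
        for suffix in FIELDS:
            if name.lower().endswith(suffix):
                found[suffix] = value
    claim = found.get("unified-representative-claim", "unknown window")
    details = ", ".join(f"{k}={v}" for k, v in sorted(found.items()))
    return f"{error_type}: limited by {claim} ({details or 'no unified headers found'})"


# Illustrative input only: header prefix and value formats are placeholders.
sample_headers = {
    "example-unified-5h-utilization": "0.38",
    "example-unified-7d-utilization": "0.81",
    "example-unified-representative-claim": "seven_day",
}
sample_body = '{"type":"error","error":{"type":"rate_limit_error","message":"Error"}}'
print(describe_rate_limit(429, sample_body, sample_headers))
```

Matching on suffix keeps the sketch independent of the exact header prefix, at the cost of being slightly looser than an exact-name lookup.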
Claude Code Priority Routing
This is the most interesting finding. When Opus was returning 429 for direct API calls, I tested whether Claude Code itself could still use it.
Same OAuth token. Same API endpoint. Same headers. One goes through the Claude Code binary, one doesn't.
I verified this exhaustively with identical tokens, headers, session IDs, user agents, beta flags, and SDK versions. Every direct call returned 429. Every claude --print succeeded instantly.
Claude Code's binary appears to have some form of priority routing that isn't replicable through the public API surface. This means third-party tools using subscription OAuth tokens are subject to stricter limits than Anthropic's own client, even with the same credentials.
What would help
Anthropic already provides solid visibility into rate limits via Claude Code's /usage command and the statusline. A few additions would make the system even more transparent:
Richer error responses. Instead of "message":"Error", include the utilization percentage, which window triggered it, and when it resets in the JSON body, not just the headers. This would help third-party tool developers handle rate limits gracefully.
Clarify extra usage behavior. If extra usage is enabled and funded, does it guarantee access to all models? The current behavior suggests it doesn't always.
Equal treatment for all OAuth clients. If a user authenticates with a valid subscription token, the rate limit behavior should be the same regardless of which client makes the call.
How to check your own status
Via Claude Code (built-in)
Via headers (raw)
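One way to pull the unified rate limit fields out of any response's headers is sketched below in Python. It assumes the fields arrive as headers whose names end with the field names used in this post (the exact prefix is not shown here, so the code matches on suffix). The function name and the sample headers are hypothetical.

```python
from typing import Dict, Mapping

# Field names as written in this post; matched as header-name suffixes.
SUFFIXES = (
    "unified-5h-utilization",
    "unified-7d-utilization",
    "unified-representative-claim",
    "unified-fallback-percentage",
    "unified-overage-status",
)


def unified_fields(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect the unified rate limit fields from a header mapping."""
    out: Dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        for suffix in SUFFIXES:
            if lowered.endswith(suffix):
                out[suffix] = value
    return out


# Illustrative headers only: the prefix and value formats are placeholders.
sample = {
    "Content-Type": "application/json",
    "Example-Unified-7d-Utilization": "0.81",
    "Example-Unified-Overage-Status": "allowed",
}
for field, value in sorted(unified_fields(sample).items()):
    print(f"{field}: {value}")
```

The function accepts any mapping with an `items()` method, so the headers object returned by most Python HTTP clients can be passed in directly.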
Key fields:
unified-5h-utilization / unified-7d-utilization: your usage percentage per window
unified-representative-claim: which window is currently limiting you
unified-fallback-percentage: how degraded your access is
unified-overage-status: whether extra usage is active
The workaround
I built dario as an OAuth proxy that lets any tool use your Claude subscription. When rate limits hit, dario proxy --cli routes through the Claude Code binary, which gets the same priority access:
npx @askalf/dario login
npx @askalf/dario proxy --cli # Opus works, even when the API returns 429
April 2026. All data from direct observation on a Claude Max 5x subscription. Reproducible by anyone at >75% weekly utilization.
Sources: