What Claude's Rate Limit Headers Reveal About Subscription Throttling #1

askalf · 2026-04-08T20:54:58Z

askalf
Apr 8, 2026
Maintainer

Looking under the hood of Anthropic's unified rate limit system

Anthropic documents that Claude Max and Pro subscriptions have usage limits governed by two rolling windows — a 5-hour window and a 7-day ceiling — shared across claude.ai and Claude Code. They offer extra usage as a pay-as-you-go option when you exceed those limits. Claude Code exposes your utilization via the /usage command and the statusline, which shows real-time 5h and 7d usage percentages.

What I found interesting while building dario is the granular header system that powers all of this behind the scenes.

The Unified Rate Limit Headers

When you make an API call with a subscription OAuth token, Anthropic returns headers that expose the internal mechanics of the rate limit system:

anthropic-ratelimit-unified-status: allowed_warning
anthropic-ratelimit-unified-5h-status: allowed
anthropic-ratelimit-unified-5h-utilization: 0.38
anthropic-ratelimit-unified-5h-reset: 1775692800
anthropic-ratelimit-unified-7d-status: allowed_warning
anthropic-ratelimit-unified-7d-utilization: 0.81
anthropic-ratelimit-unified-7d-surpassed-threshold: 0.75
anthropic-ratelimit-unified-representative-claim: seven_day
anthropic-ratelimit-unified-fallback-percentage: 0.5
anthropic-ratelimit-unified-overage-status: rejected
anthropic-ratelimit-unified-overage-disabled-reason: out_of_credits
anthropic-ratelimit-unified-upgrade-paths: upgrade_plan,overage

Here's what stood out:

The representative claim system

When both windows are in play, the representative-claim field tells you which one is actually limiting you. In my case, the 5-hour utilization was only 38%, but the 7-day was at 81%. The representative-claim: seven_day confirmed the weekly window was the bottleneck.

This means you can start a fresh session, be well within your 5-hour budget, and still get throttled because of accumulated usage over the past week. Claude Code's statusline shows both percentages, but knowing which one is the representative-claim requires looking at the headers.

The fallback system

When throttled, the headers show fallback-percentage: 0.5 — meaning only 50% of capacity is available. In practice, this means Opus and Sonnet return 429 while Haiku keeps working. Your subscription silently degrades to the cheapest model.

The overage gap

With extra usage enabled and funded, I expected overage to kick in and keep Opus working. Instead:

BEFORE adding $10 in credits:
  overage-status: rejected
  overage-disabled-reason: out_of_credits

AFTER adding $10 in credits:
  overage-status: allowed
  overage-utilization: 0.0

The overage status changed to allowed, but Opus still returned 429. The credits unlocked the overage bucket, but the rate limiter still blocked the request. This suggests there are at least two independent throttles — and extra usage only addresses one of them.

The error message

When Opus or Sonnet gets throttled via direct API, the response is:

{"type":"error","error":{"type":"rate_limit_error","message":"Error"}}

Just "Error." Claude Code's UI handles this more gracefully — it shows notifications and the statusline updates. But API consumers (including third-party tools) get no actionable information in the error body itself. The rate limit headers are there, but you have to know to look for them.

Claude Code Priority Routing

This is the most interesting finding. When Opus was returning 429 for direct API calls, I tested whether Claude Code itself could still use it.

Same OAuth token. Same API endpoint. Same headers. One goes through the Claude Code binary, one doesn't.

# Direct API — 429
curl https://api.anthropic.com/v1/messages ... → 429

# Claude Code — 200
echo "Say OK" | claude --print --model claude-opus-4-6 → "OK"

I verified this exhaustively with identical tokens, headers, session IDs, user agents, beta flags, and SDK versions. Every direct call returned 429. Every claude --print succeeded instantly.

Claude Code's binary appears to have some form of priority routing that isn't replicable through the public API surface. This means third-party tools using subscription OAuth tokens are subject to stricter limits than Anthropic's own client, even with the same credentials.

What would help

Anthropic already provides solid visibility into rate limits via Claude Code's /usage command and the statusline. A few additions would make the system even more transparent:

Richer error responses. Instead of "message":"Error", include the utilization percentage, which window triggered it, and when it resets in the JSON body — not just the headers. This would help third-party tool developers handle rate limits gracefully.
Clarify extra usage behavior. If extra usage is enabled and funded, does it guarantee access to all models? The current behavior suggests it doesn't always.
Equal treatment for all OAuth clients. If a user authenticates with a valid subscription token, the rate limit behavior should be the same regardless of which client makes the call.

How to check your own status

Via Claude Code (built-in)

# Quick check
/usage                    # in Claude Code interactive mode

# Statusline (real-time)
/statusline               # configure to show 5h/7d percentages

Via headers (raw)

echo "hi" | ANTHROPIC_LOG=debug claude --print --model claude-haiku-4-5 2>&1 | grep "ratelimit"

Key fields:

unified-5h-utilization / unified-7d-utilization — your usage percentage per window
unified-representative-claim — which window is currently limiting you
unified-fallback-percentage — how degraded your access is
unified-overage-status — whether extra usage is active

The workaround

I built dario as an OAuth proxy that lets any tool use your Claude subscription. When rate limits hit, dario proxy --cli routes through the Claude Code binary, which gets the same priority access:

npx @askalf/dario login
npx @askalf/dario proxy --cli
# Opus works, even when the API returns 429

April 2026. All data from direct observation on a Claude Max 5x subscription. Reproducible by anyone at >75% weekly utilization.

Sources:

askalf · 2026-04-10T13:24:29Z

askalf
Apr 10, 2026
Maintainer Author

Update: Priority Routing Decoded + Full E2E Verification (v2.7.1)

Since the original post, we've identified the exact mechanism behind Claude Code's priority routing and replicated it in dario. Here's the full technical breakdown.

Root Cause: The Billing Tag

Claude Code embeds a billing classification tag as the first line of its system prompt:

x-anthropic-billing-header: cc_version=2.1.100.3d2; cc_entrypoint=cli; cch=98638;

This tag activates per-model rate limit evaluation on Anthropic's backend. Without it, requests are evaluated against the overall 7d utilization (which can be at 100%). With it, requests are evaluated against model-specific pools like 7d_sonnet (often at 5%) or 7d_opus.

Proof:

# Same token, same moment, 7d utilization at 100%:
Direct API (no billing tag): 429
With billing tag in system prompt: 200

Response headers confirm dual evaluation:
  7d-status: rejected          (overall: 100%)
  7d-utilization: 1.0
  7d_sonnet-status: allowed     (model-specific: 5%)
  7d_sonnet-utilization: 0.05

Additional Routing Factors

SDK reverse engineering revealed three more fields the CLI sends:

Field	What it does	dario v2.7.1
`thinking: { type: 'adaptive' }`	Model decides when/how much to think (deprecated: `type: enabled`)	✅ Injected for Opus/Sonnet
`service_tier: 'auto'`	Requests priority capacity pool (50% fallback allocation)	✅ Injected
`effort-2025-11-24` beta	Enables reasoning effort control	✅ Added to beta header
`context_management`	Thinking compaction strategy	✅ Injected for thinking models

E2E Test Results — v2.7.1 (12/12 pass)

✅ #1  Health endpoint
✅ #2  Status endpoint
✅ #3  Models endpoint (3 models)
✅ #4  Haiku non-streaming
✅ #5  Sonnet non-streaming (adaptive thinking)
✅ #6  Opus non-streaming (adaptive thinking)
✅ #7  Sonnet streaming (SSE)
✅ #8  Opus streaming (SSE + thinking deltas)
✅ #9  OpenAI compat non-streaming
✅ #10 OpenAI compat streaming
✅ #11 Tool use (function calling)
✅ #12 Rate limit headers (all 6 categories)

Rate Limit Snapshot

5h-utilization: 0.14    (14%)
7d-utilization: 0.0     (just reset)
overage-utilization: 0.0
fallback-percentage: 0.5
representative-claim: five_hour
status: allowed

What's New in v2.6.0–v2.7.1

v2.6.0: Priority routing via billing tag injection + automatic CLI fallback on 429 with SSE conversion
v2.7.0: Adaptive thinking, service_tier: auto, effort beta
v2.7.1: Fix Haiku 400 — skip thinking/context_management for non-thinking models

Key Insight

Billing classification is determined solely by the OAuth token's subscription type — not by headers, betas, or metadata. But rate limit routing (which pool your request is evaluated against) depends on the billing tag in the system prompt. These are two separate systems:

Billing → token-based → cannot be changed
Routing → system-prompt-based → can be replicated

Credit to @belangertrading for the initial 429 diagnosis and CLI fallback workaround that kicked off this investigation (#4, #6).

April 10, 2026. Tested on Claude Max 5x at 100% 7d utilization. All findings independently verified via MITM proxy capture and systematic A/B testing.

0 replies

askalf · 2026-04-10T14:11:30Z

askalf
Apr 10, 2026
Maintainer Author

Clarification: Per-model headers vs per-model routing

Follow-up from today's default vs passthrough comparison testing (v2.8.0).

Finding 1: Per-model headers are always returned

The original post implied per-model rate limit headers (7d_sonnet-utilization, etc.) only appear with priority routing. That's not accurate — they're returned by the server regardless of whether the billing tag is present.

# Passthrough mode (no billing tag, no injection):
7d_sonnet-utilization: 0.05
7d_sonnet-status: allowed
7d_sonnet-reset: 1776020400

# Default mode (with billing tag):
7d_sonnet-utilization: 0.05
7d_sonnet-status: allowed
7d_sonnet-reset: 1776020400

The server always calculates per-model utilization. The billing tag determines whether it's used for routing decisions. At high overall utilization, default mode routes through the per-model pool (5% = allowed), while passthrough evaluates against overall 7d (100% = rejected).

Finding 2: Per-model counters have independent reset schedules

After the 7d window reset this morning:

7d-utilization: 0.02        (reset to near-zero)
7d-reset: 1776427200         (next reset timestamp)

7d_sonnet-utilization: 0.05  (NOT reset — different counter)
7d_sonnet-reset: 1776020400  (different reset timestamp)

Per-model utilization is tracked independently from overall utilization. They have different reset timestamps and don't move in lockstep. This means per-model pools can be at different utilization levels than the overall quota.

Updated mental model

Server always tracks:
  - Overall 5h utilization
  - Overall 7d utilization  
  - Per-model 7d utilization (sonnet, opus, haiku separately)
  - Overage utilization

Routing decision:
  Without billing tag: evaluate against overall 7d
  With billing tag:    evaluate against per-model 7d

Core findings from the original post remain valid. This is a precision refinement.

April 10, 2026. Tested on Claude Max 5x at 2% 7d / 34% 5h utilization (post-reset baseline).

0 replies

belangertrading · 2026-04-10T14:30:07Z

belangertrading
Apr 10, 2026

Appreciate the shoutout. The 429s were driving us crazy running a multi-agent stack on Claude Max — the CLI fallback was duct tape until you found the real fix. Billing tag in the system prompt is wild. v2.8.0 running clean, zero 429s. Great work @askalf.

0 replies

askalf · 2026-04-10T14:32:39Z

askalf
Apr 10, 2026
Maintainer Author

Glad it's solid for you. Your investigation on #4 and #6 is what got us to the root cause — wouldn't have found the billing tag without your MITM analysis and the A/B testing across versions.

v2.8.1 just shipped with a cleanup pass (−7% codebase) and a Haiku fix. Also skips output_config.effort for Haiku now, same pattern as thinking. 12/12 E2E passing.

If you hit anything on the multi-agent stack, let us know.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What Claude's Rate Limit Headers Reveal About Subscription Throttling #1

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 4 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

What Claude's Rate Limit Headers Reveal About Subscription Throttling #1

Uh oh!

Uh oh!

askalf Apr 8, 2026 Maintainer

Looking under the hood of Anthropic's unified rate limit system

The Unified Rate Limit Headers

The representative claim system

The fallback system

The overage gap

The error message

Claude Code Priority Routing

What would help

How to check your own status

Via Claude Code (built-in)

Via headers (raw)

The workaround

Replies: 4 comments

Uh oh!

askalf Apr 10, 2026 Maintainer Author

Update: Priority Routing Decoded + Full E2E Verification (v2.7.1)

Root Cause: The Billing Tag

Additional Routing Factors

E2E Test Results — v2.7.1 (12/12 pass)

Rate Limit Snapshot

What's New in v2.6.0–v2.7.1

Key Insight

Uh oh!

askalf Apr 10, 2026 Maintainer Author

Clarification: Per-model headers vs per-model routing

Finding 1: Per-model headers are always returned

Finding 2: Per-model counters have independent reset schedules

Updated mental model

Uh oh!

belangertrading Apr 10, 2026

Uh oh!

askalf Apr 10, 2026 Maintainer Author

askalf
Apr 8, 2026
Maintainer

askalf
Apr 10, 2026
Maintainer Author

askalf
Apr 10, 2026
Maintainer Author

belangertrading
Apr 10, 2026

askalf
Apr 10, 2026
Maintainer Author