Why Opus 4.6 feels worse (and how to fix it without reverting) #9

askalf · 2026-04-11T04:05:03Z

askalf
Apr 11, 2026
Maintainer

TL;DR

Opus 4.6 isn't dumber. Your context is poisoned and your tokens are being wasted. Here's what's actually happening and how to fix it.

The Two Problems Everyone Is Hitting

1. "Opus 4.6 gives worse answers than 4.5"

This is real — but it's not a model regression. It's context degradation from thinking block accumulation.

Opus 4.6 introduced adaptive thinking — the model generates internal reasoning blocks (thinking type) on every response. These blocks are large (2K-20K tokens each) and by default they stay in your conversation history. By turn 15-20 of a multi-turn session, you can have 100K+ tokens of stale thinking noise sitting in context alongside your actual conversation.

The model's effective reasoning degrades well before the 1M context limit. Users report quality drops at 20-40% context fill. This isn't a model issue — it's the same model drowning in its own prior reasoning traces.

Why Opus 4.5 didn't have this problem: No adaptive thinking. Shorter default context. Less accumulated noise per turn.

2. "My Max subscription runs out in hours"

Three compounding causes:

Thinking tokens bill as output. Every adaptive thinking response generates thousands of reasoning tokens at output pricing. With effort: high, a single response can burn 20K+ thinking tokens before writing a single line of code. Ten of those and you've consumed 200K output tokens from thinking you never see.

Tool definitions are invisible context bloat. MCP servers inject tool schemas into every request. Each server adds ~18K tokens. If you have 3 MCP servers configured, that's 54K tokens of tool definitions sent on every single message — before your actual conversation even starts.

Autocompact is miscalibrated. Claude Code's autocompact triggers at ~76K tokens despite the 1M context window. This threshold was calibrated for the old 200K window and was never updated. It fires too early, compacts aggressively, and then the next turn re-expands with tool definitions anyway.

What Real Claude Code Actually Sends

From our reverse engineering of the Claude Code binary, here's what Claude Code does to manage this:

Effort: `medium`, not `high`

{
  "output_config": { "effort": "medium" }
}

Claude Code defaults to medium effort. This is deliberate — it balances reasoning quality against token consumption. Most third-party tools default to high or don't set it at all (which defaults to high). This alone can 2-3x your token burn rate.

Context management: `clear_thinking`

{
  "context_management": {
    "edits": [{ "type": "clear_thinking_20251015", "keep": "all" }]
  }
}

This strips thinking blocks from prior turns in the conversation history before sending the next request. Without it, every prior thinking trace stays in context, accumulating noise and consuming input tokens on every subsequent request.

Adaptive thinking: `adaptive`, not `enabled`

{
  "thinking": { "type": "adaptive" }
}

adaptive lets the model decide when to think and how deeply. enabled with a fixed budget forces thinking on every response. Most frameworks that "support thinking" use enabled with high budgets — burning tokens on trivial responses that don't benefit from extended reasoning.

Max tokens: `64000`

Claude Code sends max_tokens: 64000. Some frameworks send 16000 or even 8192, which constrains the model's response space. Others send 128000, which inflates the thinking budget allocation. 64000 is the sweet spot the binary uses.

What dario Does About This

dario injects all of the above automatically. When a request comes through the proxy:

Parameter	What your client probably sends	What dario injects
`thinking`	`{ type: "enabled", budget: 32000 }`	`{ type: "adaptive" }`
`output_config.effort`	`high` (or nothing, defaults to `high`)	`medium`
`context_management`	Nothing	`clear_thinking_20251015` with `keep: all`
`max_tokens`	8192-16000	64000
Billing tag	Nothing	Full Claude Code fingerprint

dario only sets defaults — it never overrides values your client explicitly sends. If you want effort: high for a specific task, send it and dario will pass it through.

Environment Variables You Can Set

From the Claude Code binary, these env vars control the behavior directly if you're running Claude Code (not through dario):

Variable	What it does	Default
`CLAUDE_CODE_EFFORT_LEVEL`	Override effort level (`low`, `medium`, `high`)	`medium`
`CLAUDE_AUTOCOMPACT_PCT_OVERRIDE`	Autocompact trigger threshold (0-100)	~76% of context window
`CLAUDE_CODE_AUTO_COMPACT_WINDOW`	Set the context window size for compaction calculation	200000

If you're using Claude Code directly and burning tokens fast, try:

export CLAUDE_CODE_EFFORT_LEVEL=medium

If you're using dario, you don't need to set anything — these are already the defaults.

The Actual Fix for "Opus 4.6 Is Dumb"

Clear thinking history. The API's context_management: clear_thinking does NOT reduce input token billing (verified: same input tokens with or without it). You must strip thinking blocks client-side, or use dario v2.9.0 which does it automatically. Restart conversations every 30-40 turns for best quality.
Use effort: medium. The quality difference between medium and high is marginal for most tasks. The token difference is ~2x (verified: 2.22x on complex prompts).
Audit your MCP servers. Run claude mcp list and count the tools. Each tool definition is ~500 tokens. 30 tools = 15K tokens per message, before you say anything. Remove servers you're not actively using.
Don't revert to Opus 4.5. The model is better. The defaults around it are worse. Fix the defaults.

If you're running agents through dario, all of this is handled automatically. npm install -g @askalf/dario

Documented from binary reverse engineering of Claude Code v2.1.100. See Discussion #8 for the full fingerprint analysis.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why Opus 4.6 feels worse (and how to fix it without reverting) #9

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Why Opus 4.6 feels worse (and how to fix it without reverting) #9

Uh oh!

Uh oh!

askalf Apr 11, 2026 Maintainer

TL;DR

The Two Problems Everyone Is Hitting

1. "Opus 4.6 gives worse answers than 4.5"

2. "My Max subscription runs out in hours"

What Real Claude Code Actually Sends

Effort: medium, not high

Context management: clear_thinking

Adaptive thinking: adaptive, not enabled

Max tokens: 64000

What dario Does About This

Environment Variables You Can Set

The Actual Fix for "Opus 4.6 Is Dumb"

Replies: 0 comments

askalf
Apr 11, 2026
Maintainer

Effort: `medium`, not `high`

Context management: `clear_thinking`

Adaptive thinking: `adaptive`, not `enabled`

Max tokens: `64000`