feat(http): per-turn cost_usd in /v1/chat/completions usage #68

Merged
manojp99 merged 2 commits into main from feat/cost-in-http-face on Jun 21, 2026
manojp99 (Collaborator) commented Jun 21, 2026

Summary

Surfaces per-turn dollar cost on the chat-completions HTTP face by lifting the cost_usd value that provider modules already emit on the NDJSON wire (via amplifier_agent_lib/bundle/hook_streaming.py) into the OpenAI usage envelope.

Pure wire translation — no new pricing data, no provider-side changes, no parallel catalog surface. The cost telemetry already flows on the JSON-RPC wire; this PR just plumbs it through to the OpenAI-shape wire that opencode and other OpenAI-compatible clients see.

What the wire looks like now

"usage": {
  "prompt_tokens":           9276,
  "completion_tokens":       54,
  "total_tokens":            9330,
  "prompt_tokens_details":   { "cached_tokens": 0 },
  "cost_usd":                "0.0118625"
}

cost_usd is a string to preserve Decimal precision. Standard OpenAI clients ignore the non-standard field; cost-aware clients can render the real dollar value.
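A cost-aware client might read the field along these lines. This is a minimal sketch, assuming the usage block has already been decoded from JSON; `read_cost` is a hypothetical helper name and is not part of this PR.

```python
from decimal import Decimal
from typing import Any, Optional


def read_cost(usage: dict[str, Any]) -> Optional[Decimal]:
    """Return the per-turn cost as a Decimal, or None if the field is absent."""
    raw = usage.get("cost_usd")
    if raw is None:
        return None
    # Parse straight from the string form so no binary-float rounding creeps in.
    return Decimal(raw)


# Usage block copied from the example above.
usage = {
    "prompt_tokens": 9276,
    "completion_tokens": 54,
    "total_tokens": 9330,
    "prompt_tokens_details": {"cached_tokens": 0},
    "cost_usd": "0.0118625",
}
cost = read_cost(usage)
```

Going through `Decimal` rather than `float` keeps the value exact, which matters if the client sums many small per-turn costs.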

Implementation

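| File | Change |
| --- | --- |
| `_event_translator.py` | `extract_usage` widens its return type from `dict[str, int]` to `dict[str, Any]`; reads `event['cost']` (set by hook_streaming from kernel `cost_usd`) and stamps `cost_usd: str(...)` on the result |
| `_wire.py` | `_build_usage_block`, `stop_chunk`, and `tool_calls_stop_chunk` accept a new `cost_usd: str \| None = None` parameter, surfaced on the usage block when set |
| `routes/chat_completions.py` | Accumulates `usage_cost: Decimal \| None` across all usage events in the turn (preserves precision), serializes it to str on emission, and passes it through to the terminal chunk helpers |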
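
The accumulation step can be sketched roughly as follows. This is an illustration, not the code in `routes/chat_completions.py`: the function name `accumulate_cost` and the exact event shape (a `cost` key on each usage event) are assumptions based on the description in this PR.

```python
from decimal import Decimal
from typing import Any, Iterable, Optional


def accumulate_cost(events: Iterable[dict[str, Any]]) -> Optional[Decimal]:
    """Sum the cost across usage events; stay None if no event carried a cost."""
    total: Optional[Decimal] = None
    for event in events:
        raw = event.get("cost")
        if raw is None:
            continue  # this usage event reported no cost
        # Go through str() so a float input does not bring its binary expansion along.
        value = Decimal(str(raw))
        total = value if total is None else total + value
    return total


# Hypothetical usage events for one user turn: two LLM calls plus one event
# without cost telemetry.
events = [{"cost": "0.0100000"}, {"cost": "0.0018625"}, {}]
total = accumulate_cost(events)
```

Keeping the running total as `None` until the first cost arrives lets the caller tell "no cost reported" apart from "cost of zero".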

cost_usd is omitted from the response entirely when no provider emitted it (older provider modules, third-party endpoints without cost telemetry).
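
One way to get this omit-when-absent behavior is to add the key only when a value exists. The sketch below is an assumption about the shape of such a helper; `build_usage_block` is a hypothetical stand-in, not the actual `_build_usage_block` in `_wire.py`.

```python
from typing import Any, Optional


def build_usage_block(
    prompt_tokens: int,
    completion_tokens: int,
    cached_tokens: int = 0,
    cost_usd: Optional[str] = None,
) -> dict[str, Any]:
    """Build an OpenAI-shaped usage dict, adding cost_usd only when provided."""
    usage: dict[str, Any] = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "prompt_tokens_details": {"cached_tokens": cached_tokens},
    }
    if cost_usd is not None:
        # Non-standard field: present only when a provider reported a cost.
        usage["cost_usd"] = cost_usd
    return usage


with_cost = build_usage_block(9276, 54, cost_usd="0.0118625")
without_cost = build_usage_block(9276, 54)
```

Leaving the key out, rather than emitting `null`, means clients that validate the usage object strictly see the same shape as before for providers without cost telemetry.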

Verified end-to-end

$ curl -X POST /v1/chat/completions -d '{"model":"claude-haiku-4-5-...", ...}' | grep usage
"usage":{"prompt_tokens":9276,"completion_tokens":54,"total_tokens":9330,
         "prompt_tokens_details":{"cached_tokens":0},"cost_usd":"0.0118625"}

This is the actual dollar cost from Anthropic's pricing, accumulated across the turn and surfaced on the OpenAI wire.
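
For a streaming client, pulling the cost out of the response might look like the sketch below. It assumes OpenAI-style SSE framing (`data: ` lines ending with `data: [DONE]`) and that the usage block rides on the terminal chunk; `cost_from_sse_lines` and the sample payloads are hypothetical, not taken from the server's actual output.

```python
import json
from decimal import Decimal
from typing import Iterable, Optional


def cost_from_sse_lines(lines: Iterable[str]) -> Optional[Decimal]:
    """Scan SSE lines and return cost_usd from the last chunk that carries it."""
    cost: Optional[Decimal] = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage") or {}  # usage may be missing or null
        if "cost_usd" in usage:
            cost = Decimal(usage["cost_usd"])
    return cost


# Hypothetical stream: one content chunk, then a terminal chunk with usage.
sample = [
    'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    "",
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":9276,"completion_tokens":54,'
    '"total_tokens":9330,"cost_usd":"0.0118625"}}',
    "",
    "data: [DONE]",
]
cost = cost_from_sse_lines(sample)
```

A client that does not know about `cost_usd` simply never looks the key up, which is why the change is additive.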

Compatibility

  • Additive — standard OpenAI clients ignore the new cost_usd field.
  • No wire-protocol bump.
  • No provider-side change required — the field is only emitted when the provider already supplies cost_usd (anthropic, openai, and chat-completions providers all do today).

Companion PRs

  • microsoft/amplifier-module-provider-anthropic#59 — closed (catalog approach was redundant with existing NDJSON flow)
  • microsoft/amplifier-app-opencode#2 — surfaces per-model limit in the opencode config (was always available in /v1/models; was stripped for paranoia)

Manoj Prabhakar Paidiparthy and others added 2 commits June 20, 2026 19:29
Captures the chat-completions HTTP face (#65) and model-routing matrix
integration (#64). New `amplifier-agent serve chat-completions` exposes
amplifier-agent as an OpenAI-compatible HTTP service for embedding in
third-party tools (opencode, custom UIs). New `amplifier-agent auth`
subcommand persists provider credentials to ~/.amplifier-agent/credentials.json
so users can configure once and have every invocation pick them up.

Wire protocol unchanged at 0.3.0; no wrapper bump required. TypeScript
wrapper stays at 0.7.0, Python wrapper stays at 0.3.0.

See CHANGELOG.md for full details.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Provider modules already compute cost_usd per turn and emit it on the
JSON-RPC NDJSON wire via hook_streaming
(amplifier_agent_lib/bundle/hook_streaming.py). The chat-completions
HTTP face was throwing this away — extracting only token counts from
kernel usage events. Now it lifts cost_usd through too, accumulating
across sub-turns (a single user turn can drive multiple LLM calls for
tool-call rounds) and emitting the total in the OpenAI usage envelope
as the non-standard `cost_usd` field. Real $$ from the provider's
own pricing, surfaced on the SSE response.

Implementation
--------------
- `_event_translator.extract_usage`: widened return type from
  `dict[str, int]` to `dict[str, Any]`; reads `event['cost']` (set by
  hook_streaming from kernel `cost_usd`) and stamps it on the result
  as `cost_usd: str(...)`.
- `_wire._build_usage_block / stop_chunk / tool_calls_stop_chunk`: new
  `cost_usd: str | None = None` parameter; surfaced on the usage block
  when set.
- `routes/chat_completions`: accumulates `usage_cost: Decimal | None`
  across all usage events in the turn (preserves precision), serializes
  to str on emission, passes through to the terminal chunk helpers.

cost_usd is a string (Decimal precision) and is omitted entirely when
no provider emitted it (older provider modules, third-party endpoints
without cost telemetry, etc.) — standard OpenAI clients ignore the
non-standard field.

Verified end-to-end against the running server: a one-word reply with
claude-haiku-4-5 produces `cost_usd: '0.0118625'` in the terminal
chunk's usage block.

This is a wire translation — it leverages cost telemetry that already
flows on the NDJSON wire. No new pricing catalog or provider-side
plumbing required.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
manojp99 changed the title from "feat(http): surface per-model pricing in /v1/models and per-turn cost in chat-completions usage" to "feat(http): per-turn cost_usd in /v1/chat/completions usage" on Jun 21, 2026
manojp99 force-pushed the feat/cost-in-http-face branch from 9344217 to bccb5f3 on June 21, 2026 17:57
manojp99 merged commit 782d132 into main on Jun 21, 2026
1 of 3 checks passed