feat(http): per-turn cost_usd in /v1/chat/completions usage by manojp99 · Pull Request #68 · microsoft/amplifier-agent

manojp99 · 2026-06-21T09:51:29Z

Summary

Surfaces per-turn dollar cost on the chat-completions HTTP face by lifting the cost_usd value that provider modules already emit on the NDJSON wire (via amplifier_agent_lib/bundle/hook_streaming.py) into the OpenAI usage envelope.

Pure wire translation — no new pricing data, no provider-side changes, no parallel catalog surface. The cost telemetry already flows on the JSON-RPC wire; this PR just plumbs it through to the OpenAI-shape wire that opencode and other OpenAI-compatible clients see.

What the wire looks like now

"usage": {
  "prompt_tokens":           9276,
  "completion_tokens":       54,
  "total_tokens":            9330,
  "prompt_tokens_details":   { "cached_tokens": 0 },
  "cost_usd":                "0.0118625"
}

cost_usd is a string to preserve Decimal precision. Standard OpenAI clients ignore the non-standard field; cost-aware clients can render the real dollar value.
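A minimal sketch of how a cost-aware client might read the field, assuming the usage block shown above. `read_cost` is a hypothetical helper for illustration, not part of amplifier-agent.

```python
import json
from decimal import Decimal
from typing import Any, Optional


def read_cost(usage: dict[str, Any]) -> Optional[Decimal]:
    """Return the per-turn cost as a Decimal, or None when the field is absent."""
    raw = usage.get("cost_usd")
    # Parse the string directly; going through float could lose digits.
    return Decimal(raw) if raw is not None else None


usage = json.loads(
    '{"prompt_tokens": 9276, "completion_tokens": 54, "total_tokens": 9330,'
    ' "prompt_tokens_details": {"cached_tokens": 0}, "cost_usd": "0.0118625"}'
)
cost = read_cost(usage)
print(cost)  # 0.0118625
```

Because the value travels as a string, the client chooses its own numeric type; a client that does not know the field never looks it up.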

Implementation

File	Change
`_event_translator.py`	`extract_usage` widens return type to `dict[str, Any]`; reads `event['cost']` (set by `hook_streaming` from kernel `cost_usd`) and stamps `cost_usd: str(...)` on the result
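`_wire.py`	`_build_usage_block`, `stop_chunk`, `tool_calls_stop_chunk` accept `cost_usd: str | None = None`; surfaced on the usage block when set
`routes/chat_completions.py`	Accumulates `usage_cost: Decimal | None` across all usage events in the turn, serializes to `str` on emission, and passes it through to the terminal chunk helpers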

cost_usd is omitted from the response entirely when no provider emitted it (older provider modules, third-party endpoints without cost telemetry).
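The flow described in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the function bodies, the `input_tokens` / `output_tokens` event keys, the `add_cost` helper, and the event values are all hypothetical; only `event['cost']`, the `cost_usd` string, and the `Decimal | None` accumulator come from the description.

```python
from decimal import Decimal
from typing import Any, Optional


def extract_usage(event: dict[str, Any]) -> dict[str, Any]:
    """Pull token counts and, when present, the cost from one usage event."""
    usage: dict[str, Any] = {
        "prompt_tokens": int(event.get("input_tokens", 0)),       # assumed key
        "completion_tokens": int(event.get("output_tokens", 0)),  # assumed key
    }
    cost = event.get("cost")
    if cost is not None:
        usage["cost_usd"] = str(cost)
    return usage


def add_cost(total: Optional[Decimal], usage: dict[str, Any]) -> Optional[Decimal]:
    """Accumulate cost across sub-turns; stays None until a provider reports one."""
    raw = usage.get("cost_usd")
    if raw is None:
        return total
    return (total or Decimal("0")) + Decimal(raw)


def build_usage_block(prompt: int, completion: int,
                      cost_usd: Optional[str] = None) -> dict[str, Any]:
    """Build the OpenAI-shape usage block; cost_usd is omitted when not set."""
    block: dict[str, Any] = {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }
    if cost_usd is not None:
        block["cost_usd"] = cost_usd
    return block


# Two sub-turns (for example, a tool-call round plus the final answer); values are made up.
events = [
    {"input_tokens": 100, "output_tokens": 10, "cost": "0.0010000"},
    {"input_tokens": 120, "output_tokens": 20, "cost": "0.0002500"},
]
prompt = completion = 0
usage_cost: Optional[Decimal] = None
for ev in events:
    u = extract_usage(ev)
    prompt += u["prompt_tokens"]
    completion += u["completion_tokens"]
    usage_cost = add_cost(usage_cost, u)

block = build_usage_block(
    prompt, completion, str(usage_cost) if usage_cost is not None else None
)
```

Keeping the accumulator as `None` rather than `Decimal("0")` lets the emitter distinguish "no provider reported cost" from "cost was zero", which is what allows the field to be omitted entirely.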

Verified end-to-end

$ curl -X POST /v1/chat/completions -d '{"model":"claude-haiku-4-5-...", ...}' | grep usage
"usage":{"prompt_tokens":9276,"completion_tokens":54,"total_tokens":9330,
         "prompt_tokens_details":{"cached_tokens":0},"cost_usd":"0.0118625"}

Real dollar cost from Anthropic's pricing, accumulated across sub-turns and surfaced on the OpenAI wire.
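For a streamed response, a client could pick the usage block out of the terminal chunk along these lines. This is a sketch that assumes the stream follows the usual OpenAI SSE shape (`data: ` lines ending with `data: [DONE]`); `usage_from_sse` and the sample lines are hypothetical.

```python
import json
from typing import Any, Iterable, Optional


def usage_from_sse(lines: Iterable[str]) -> Optional[dict[str, Any]]:
    """Return the last usage block seen in an SSE stream, or None."""
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}], '
    '"usage": {"prompt_tokens": 9276, "completion_tokens": 54, '
    '"total_tokens": 9330, "cost_usd": "0.0118625"}}',
    "data: [DONE]",
]
found = usage_from_sse(sample)
```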

Compatibility

Additive — standard OpenAI clients ignore the new cost_usd field.
No wire-protocol bump.
No provider-side change required — the field is only emitted when the provider already supplies cost_usd (anthropic, openai, and chat-completions providers all do today).

Companion PRs

~~microsoft/amplifier-module-provider-anthropic#59~~ — closed (catalog approach was redundant with existing NDJSON flow)
microsoft/amplifier-app-opencode#2 — surfaces per-model limit in the opencode config (was always available in /v1/models; was stripped for paranoia)

Captures the chat-completions HTTP face (#65) and model-routing matrix integration (#64).

New `amplifier-agent serve chat-completions` exposes amplifier-agent as an OpenAI-compatible HTTP service for embedding in third-party tools (opencode, custom UIs). New `amplifier-agent auth` subcommand persists provider credentials to ~/.amplifier-agent/credentials.json so users can configure once and have every invocation pick them up.

Wire protocol unchanged at 0.3.0; no wrapper bump required. TypeScript wrapper stays at 0.7.0, Python wrapper stays at 0.3.0. See CHANGELOG.md for full details.

Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Provider modules already compute cost_usd per turn and emit it on the JSON-RPC NDJSON wire via hook_streaming (amplifier_agent_lib/bundle/hook_streaming.py). The chat-completions HTTP face was throwing this away, extracting only token counts from kernel usage events. Now it lifts cost_usd through too, accumulating across sub-turns (a single user turn can drive multiple LLM calls for tool-call rounds) and emitting the total in the OpenAI usage envelope as the non-standard `cost_usd` field.

Real $$ from the provider's own pricing, surfaced on the SSE response.

Implementation
--------------

- `_event_translator.extract_usage`: widened return type from `dict[str, int]` to `dict[str, Any]`; reads `event['cost']` (set by hook_streaming from kernel `cost_usd`) and stamps it on the result as `cost_usd: str(...)`.
- `_wire._build_usage_block / stop_chunk / tool_calls_stop_chunk`: new `cost_usd: str | None = None` parameter; surfaced on the usage block when set.
- `routes/chat_completions`: accumulates `usage_cost: Decimal | None` across all usage events in the turn (preserves precision), serializes to str on emission, passes through to the terminal chunk helpers.

cost_usd is a string (Decimal precision) and is omitted entirely when no provider emitted it (older provider modules, third-party endpoints without cost telemetry, etc.); standard OpenAI clients ignore the non-standard field.

Verified end-to-end against the running server: a one-word reply with claude-haiku-4-5 produces `cost_usd: '0.0118625'` in the terminal chunk's usage block.

This is a wire translation: it leverages cost telemetry that already flows on the NDJSON wire. No new pricing catalog or provider-side plumbing required.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Manoj Prabhakar Paidiparthy and others added 2 commits June 20, 2026 19:29

manojp99 changed the title ~~feat(http): surface per-model pricing in /v1/models and per-turn cost in chat-completions usage~~ feat(http): per-turn cost_usd in /v1/chat/completions usage Jun 21, 2026

manojp99 force-pushed the feat/cost-in-http-face branch from 9344217 to bccb5f3 Compare June 21, 2026 17:57

manojp99 mentioned this pull request Jun 21, 2026

feat: surface per-model cost and limit from static catalog microsoft/amplifier-app-opencode#2

Merged

manojp99 merged commit 782d132 into main Jun 21, 2026
1 of 3 checks passed
