
feat(messages): add native Anthropic Messages API (/v1/messages)#5386

Draft
cdoern wants to merge 6 commits into llamastack:main from cdoern:messages-api

Conversation


@cdoern cdoern commented Mar 30, 2026

Summary

  • Adds a native /v1/messages endpoint implementing the Anthropic Messages API, enabling llama-stack to serve as a drop-in backend for Claude Code, Codex CLI, and other Anthropic-protocol clients
  • Follows the same architecture as the Responses API: a single inline::builtin provider (BuiltinMessagesImpl) that depends on Api.inference and works with all inference backends automatically
  • For providers that natively support /v1/messages (e.g. Ollama), requests are forwarded directly without translation, preserving full fidelity (thinking blocks, native streaming, etc.)
  • For all other providers, translates Anthropic Messages format to/from OpenAI Chat Completions format transparently
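The dual-path design described above can be sketched in miniature. This is an illustrative toy, not the PR's actual code: the `supports_native_messages` flag and handler names are assumptions chosen to show the control flow.

```python
# Toy sketch of the dual-path design: backends that natively speak
# /v1/messages get the request forwarded verbatim; everything else goes
# through Anthropic <-> OpenAI translation. Names here are illustrative.

def handle_messages_request(request: dict, provider) -> dict:
    if getattr(provider, "supports_native_messages", False):
        # e.g. Ollama: forward the Anthropic-format body untouched so
        # thinking blocks and native streaming survive intact.
        return provider.forward_native(request)
    # Fallback path: translate to OpenAI Chat Completions, call the
    # inference API, then translate the response back.
    openai_req = {"model": request["model"], "messages": request["messages"]}
    openai_resp = provider.chat_completion(openai_req)
    return {
        "type": "message",
        "role": "assistant",
        "content": [
            {"type": "text",
             "text": openai_resp["choices"][0]["message"]["content"]}
        ],
        "stop_reason": "end_turn",
    }


class FakeTranslatedProvider:
    """Stand-in for a backend without native /v1/messages support."""
    supports_native_messages = False

    def chat_completion(self, req):
        return {"choices": [{"message": {"content": "hi"}}]}
```

The point of the real passthrough branch is fidelity: translation necessarily drops Anthropic-only fields (such as thinking blocks) that OpenAI Chat Completions has no equivalent for.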

What's included

  1. API layer (src/llama_stack_api/messages/): Protocol, Pydantic models for all Anthropic types (content blocks, streaming events, tool use, thinking), FastAPI routes with Anthropic-specific named SSE events
  2. Provider implementation (src/llama_stack/providers/inline/messages/): Translation layer (request/response/streaming) + native passthrough for Ollama
  3. Distribution configs: Enabled in starter and ci-tests distributions
  4. Tests: 17 unit tests covering request translation, response translation, and streaming translation
  5. Generated artifacts: OpenAPI specs, provider docs, Stainless SDK config
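One detail worth illustrating from item 1 is the "Anthropic-specific named SSE events": Anthropic streams use an `event:` name line before each `data:` payload, unlike OpenAI's bare `data:` chunks. The event names below are the documented Anthropic stream events; the formatting helper itself is a sketch, not the PR's code.

```python
# Sketch of Anthropic-style named SSE framing. Each event carries an
# `event:` line naming the event type, followed by a JSON `data:` payload.
import json


def format_sse_event(event_type: str, payload: dict) -> str:
    body = dict(payload, type=event_type)
    return f"event: {event_type}\ndata: {json.dumps(body)}\n\n"


# A minimal event sequence for a single streamed text response:
events = [
    format_sse_event("message_start",
                     {"message": {"role": "assistant", "content": []}}),
    format_sse_event("content_block_start",
                     {"index": 0, "content_block": {"type": "text", "text": ""}}),
    format_sse_event("content_block_delta",
                     {"index": 0, "delta": {"type": "text_delta", "text": "Hello"}}),
    format_sse_event("content_block_stop", {"index": 0}),
    format_sse_event("message_delta", {"delta": {"stop_reason": "end_turn"}}),
    format_sse_event("message_stop", {}),
]
```

This start/delta/stop bracketing per content block is what the "Streaming content blocks → Streaming deltas" row of the translation map below has to synthesize from flat OpenAI deltas.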

Translation map

| Anthropic | OpenAI | Notes |
| --- | --- | --- |
| `system` (top-level) | `messages[0]` with `role=system` | Moved to first message |
| Content blocks (text, image) | String or content parts | Restructured |
| `tool_use` block | `tool_calls` on assistant message | Different structure |
| `tool_result` block | `role: "tool"` message | Different message type |
| `tool_choice: "any"` | `tool_choice: "required"` | Renamed |
| `stop_sequences` | `stop` | Renamed |
| `stop_reason: "end_turn"` | `finish_reason: "stop"` | Mapped |
| `stop_reason: "tool_use"` | `finish_reason: "tool_calls"` | Mapped |
| Streaming content blocks | Streaming deltas | Full event sequence |
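The request side of the table above can be sketched as a small translation function. Field names follow the public Anthropic and OpenAI wire formats, but this is an illustration of the mapping, not the PR's implementation (the `"max_tokens" -> "length"` stop-reason row is an assumption, not from the table).

```python
# Illustrative translation following the table above (sketch, not the PR's code).

# Response direction: Anthropic stop_reason <- OpenAI finish_reason is the
# inverse of this map.
STOP_REASON_TO_FINISH_REASON = {
    "end_turn": "stop",
    "tool_use": "tool_calls",
    "max_tokens": "length",  # assumption: OpenAI's name for the token limit
}


def anthropic_to_openai_request(req: dict) -> dict:
    """Translate an Anthropic Messages request body to OpenAI Chat Completions form."""
    messages = []
    # Top-level `system` moves to the first message with role=system.
    if "system" in req:
        messages.append({"role": "system", "content": req["system"]})
    messages.extend(req["messages"])
    out = {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req["max_tokens"],
    }
    # `stop_sequences` is renamed to `stop`.
    if "stop_sequences" in req:
        out["stop"] = req["stop_sequences"]
    # `tool_choice: {"type": "any"}` maps to OpenAI's "required".
    tool_choice = req.get("tool_choice")
    if tool_choice and tool_choice.get("type") == "any":
        out["tool_choice"] = "required"
    return out
```

The content-block and tool-call restructurings are where most of the real translation complexity lives, since they change message shape rather than just renaming fields.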

Test plan

  • uv run pytest tests/unit/providers/inline/messages/ -x --tb=short -v (17/17 passing)
  • uv run pre-commit run mypy --all-files (passes)
  • Manual end-to-end test: non-streaming via Ollama (translation path)
  • Manual end-to-end test: non-streaming via Ollama (native passthrough)
  • Manual end-to-end test: streaming via Ollama (native passthrough, including thinking blocks)
  • Integration tests with recording/replay

Generated with Claude Code

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Mar 30, 2026

github-actions bot commented Mar 30, 2026

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

feat(messages): add native Anthropic Messages API (/v1/messages)

Edit this comment to update it. It will appear in the SDK's changelogs.

llama-stack-client-openapi studio · code · diff

Your SDK build had at least one "warning" diagnostic, but this did not represent a regression.
generate ⚠️

New diagnostics (2 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages`
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages/count_tokens`
⚠️ llama-stack-client-go studio · conflict

Your SDK build had at least one new warning diagnostic, which is a regression from the base state.

New diagnostics (2 warning)
⚠️ Endpoint/NotConfigured: `post /v1/messages` exists in the OpenAPI spec, but isn't specified in the Stainless config, so code will not be generated for it.
⚠️ Endpoint/NotConfigured: `post /v1/messages/count_tokens` exists in the OpenAPI spec, but isn't specified in the Stainless config, so code will not be generated for it.
llama-stack-client-python studio · conflict

Your SDK build had at least one new note diagnostic, which is a regression from the base state.

New diagnostics (2 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages`
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages/count_tokens`
llama-stack-client-node studio · conflict

Your SDK build had at least one new note diagnostic, which is a regression from the base state.

New diagnostics (2 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages`
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages/count_tokens`

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-04-01 15:46:43 UTC


cdoern commented Mar 31, 2026

going to add integration tests here too since ollama is compatible


mergify bot commented Apr 1, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 1, 2026
@cdoern cdoern added this to the 1.0.0 milestone Apr 1, 2026
@mergify mergify bot removed the needs-rebase label Apr 1, 2026
cdoern and others added 6 commits April 1, 2026 11:44
Add the API layer for the Anthropic Messages API (/v1/messages). This
includes the Messages protocol definition, Pydantic models for all
Anthropic request/response types (content blocks, streaming events,
tool use, thinking), and FastAPI routes with Anthropic-specific SSE
streaming format. Also registers the "messages" logging category and
adds Api.messages to the Api enum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
…ive passthrough

Add the single BuiltinMessagesImpl provider that translates Anthropic
Messages format to/from OpenAI Chat Completions, delegating to the
inference API. For providers that natively support /v1/messages (e.g.
Ollama), requests are forwarded directly without translation. Also
registers the provider in the registry, wires the router in the server,
and adds Messages to the protocol map in the resolver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
…ions

Add the messages provider (inline::builtin) to the starter distribution
template and regenerate configs for starter and ci-tests distributions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Add 17 unit tests covering request translation, response translation,
and streaming translation. Regenerate OpenAPI specs, provider docs, and
Stainless SDK config to include the new /v1/messages endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Add a new messages integration test suite that exercises the Anthropic
Messages API (/v1/messages) end-to-end through the server. The suite
includes 13 tests covering non-streaming, streaming, system prompts,
multi-turn conversations, tool definitions, tool use round trips,
content block arrays, error handling, and response headers.

To enable replay mode (no live backend required), extend the api_recorder
to patch httpx.AsyncClient.post and httpx.AsyncClient.stream. This
captures the native Ollama passthrough requests the Messages provider
makes via raw httpx, following the same pattern used for aiohttp rerank
recording. Recordings are stored in tests/integration/messages/recordings/.

Also fix pre-commit violations: structured logging in impl.py, unused
loop variable, and remove redundant @pytest.mark.asyncio decorators
from unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
…pruned

The cleanup_recordings.py script uses ci_matrix.json to determine which
test suites are active. Without the messages suite listed, the script
considers all messages recordings unused and deletes them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>