feat(messages): add native Anthropic Messages API (/v1/messages) #5386
cdoern wants to merge 6 commits into llamastack:main from
Conversation
✱ Stainless preview builds
This PR will update the SDK preview builds below. Edit this comment to update it; it will appear in the SDK's changelogs.
✅ llama-stack-client-openapi studio · code · diff
✅ llama-stack-client-python studio · conflict
Your SDK build had at least one new note diagnostic, which is a regression from the base state.
New diagnostics (2 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages`
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages/count_tokens`
✅ llama-stack-client-node studio · conflict
Your SDK build had at least one new note diagnostic, which is a regression from the base state.
New diagnostics (2 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages`
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /v1/messages/count_tokens`
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-04-01 15:46:43 UTC
going to add integration tests here too since Ollama is compatible
This pull request has merge conflicts that must be resolved before it can be merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Add the API layer for the Anthropic Messages API (/v1/messages). This includes the Messages protocol definition, Pydantic models for all Anthropic request/response types (content blocks, streaming events, tool use, thinking), and FastAPI routes with the Anthropic-specific SSE streaming format. Also registers the "messages" logging category and adds Api.messages to the Api enum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
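The Anthropic-specific SSE format differs from OpenAI's plain `data:` stream: each payload is preceded by a named `event:` line (`message_start`, `content_block_delta`, `message_stop`, and so on). A minimal sketch of rendering such a stream; the helper name and event payloads are illustrative, not the PR's actual route code:

```python
import json

def format_sse_event(event_type: str, data: dict) -> str:
    """Render one Anthropic-style named SSE event: an `event:` line
    naming the event type, followed by the JSON payload."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"

# A minimal streamed text response in Anthropic's event sequence.
events = [
    ("message_start", {"type": "message_start",
                       "message": {"id": "msg_123", "role": "assistant", "content": []}}),
    ("content_block_start", {"type": "content_block_start", "index": 0,
                             "content_block": {"type": "text", "text": ""}}),
    ("content_block_delta", {"type": "content_block_delta", "index": 0,
                             "delta": {"type": "text_delta", "text": "Hello"}}),
    ("content_block_stop", {"type": "content_block_stop", "index": 0}),
    ("message_delta", {"type": "message_delta", "delta": {"stop_reason": "end_turn"}}),
    ("message_stop", {"type": "message_stop"}),
]
stream = "".join(format_sse_event(t, d) for t, d in events)
print(stream.splitlines()[0])  # → event: message_start
```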
…ive passthrough

Add the single BuiltinMessagesImpl provider that translates Anthropic Messages format to/from OpenAI Chat Completions, delegating to the inference API. For providers that natively support /v1/messages (e.g. Ollama), requests are forwarded directly without translation. Also registers the provider in the registry, wires the router in the server, and adds Messages to the protocol map in the resolver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
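The request-side translation described here can be sketched as a plain function; the name, field coverage, and fallbacks below are assumptions for illustration, not the actual BuiltinMessagesImpl code:

```python
def anthropic_to_openai_request(req: dict) -> dict:
    """Illustrative sketch of the Anthropic -> OpenAI Chat Completions
    request mapping (hypothetical helper, not the provider's real code)."""
    messages = []
    # Anthropic carries the system prompt top-level; OpenAI expects it
    # as the first chat message.
    if "system" in req:
        messages.append({"role": "system", "content": req["system"]})
    messages.extend(req["messages"])

    out = {"model": req["model"], "messages": messages, "max_tokens": req["max_tokens"]}
    if "stop_sequences" in req:
        out["stop"] = req["stop_sequences"]       # stop_sequences -> stop
    if req.get("tool_choice", {}).get("type") == "any":
        out["tool_choice"] = "required"           # tool_choice "any" -> "required"
    return out

translated = anthropic_to_openai_request({
    "model": "llama3.2",
    "max_tokens": 128,
    "system": "Be brief.",
    "stop_sequences": ["END"],
    "tool_choice": {"type": "any"},
    "messages": [{"role": "user", "content": "hi"}],
})
```

The real provider also has to translate content blocks and tool results in both directions; this sketch only covers the scalar fields from the translation map.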
…ions

Add the messages provider (inline::builtin) to the starter distribution template and regenerate configs for starter and ci-tests distributions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Add 17 unit tests covering request translation, response translation, and streaming translation. Regenerate OpenAPI specs, provider docs, and Stainless SDK config to include the new /v1/messages endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
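In the spirit of those tests, the response-side stop-reason mapping can be unit-tested in isolation. The mapping dict and function names below are hypothetical stand-ins for the provider's internals:

```python
# Hypothetical mirror of the provider's finish_reason -> stop_reason mapping;
# the real code lives under src/llama_stack/providers/inline/messages/.
FINISH_TO_STOP_REASON = {
    "stop": "end_turn",
    "tool_calls": "tool_use",
    "length": "max_tokens",
}

def translate_stop_reason(finish_reason: str) -> str:
    # Fall back to "end_turn" for finish reasons with no direct equivalent.
    return FINISH_TO_STOP_REASON.get(finish_reason, "end_turn")

def test_stop_reason_translation():
    assert translate_stop_reason("stop") == "end_turn"
    assert translate_stop_reason("tool_calls") == "tool_use"
    assert translate_stop_reason("content_filter") == "end_turn"

test_stop_reason_translation()
```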
Add a new messages integration test suite that exercises the Anthropic Messages API (/v1/messages) end-to-end through the server. The suite includes 13 tests covering non-streaming, streaming, system prompts, multi-turn conversations, tool definitions, tool use round trips, content block arrays, error handling, and response headers.

To enable replay mode (no live backend required), extend the api_recorder to patch httpx.AsyncClient.post and httpx.AsyncClient.stream. This captures the native Ollama passthrough requests the Messages provider makes via raw httpx, following the same pattern used for aiohttp rerank recording. Recordings are stored in tests/integration/messages/recordings/.

Also fix pre-commit violations: structured logging in impl.py, unused loop variable, and remove redundant @pytest.mark.asyncio decorators from unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
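The replay-mode patching pattern looks roughly like the sketch below. `FakeAsyncClient` stands in for `httpx.AsyncClient` so the example runs without third-party dependencies, and the recording shape (a dict keyed by URL) is an assumption, not the actual api_recorder storage format:

```python
import asyncio

class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient so this sketch is self-contained."""
    async def post(self, url, json=None):
        raise RuntimeError("network disabled")

# Canned responses keyed by URL, playing the role of stored recordings.
RECORDINGS = {
    "http://localhost:11434/v1/messages": {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "hi"}],
    },
}

async def replaying_post(self, url, json=None):
    # Replay mode: return the canned response instead of hitting the network.
    canned = RECORDINGS.get(url)
    if canned is None:
        raise RuntimeError(f"no recording for {url}")
    return canned

# Patch at the class level, the same idea as patching httpx.AsyncClient.post.
FakeAsyncClient.post = replaying_post

async def main():
    client = FakeAsyncClient()
    return await client.post("http://localhost:11434/v1/messages", json={"messages": []})

data = asyncio.run(main())
print(data["role"])  # → assistant
```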
…pruned

The cleanup_recordings.py script uses ci_matrix.json to determine which test suites are active. Without the messages suite listed, the script considers all messages recordings unused and deletes them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Summary
- `/v1/messages` endpoint implementing the Anthropic Messages API, enabling llama-stack to serve as a drop-in backend for Claude Code, Codex CLI, and other Anthropic-protocol clients
- `inline::builtin` provider (BuiltinMessagesImpl) that depends on `Api.inference` and works with all inference backends automatically
- For providers that natively support `/v1/messages` (e.g. Ollama), requests are forwarded directly without translation, preserving full fidelity (thinking blocks, native streaming, etc.)

What's included
- API layer (`src/llama_stack_api/messages/`): Protocol, Pydantic models for all Anthropic types (content blocks, streaming events, tool use, thinking), FastAPI routes with Anthropic-specific named SSE events
- Provider (`src/llama_stack/providers/inline/messages/`): Translation layer (request/response/streaming) + native passthrough for Ollama

Translation map
| Anthropic | OpenAI |
|---|---|
| `system` (top-level) | `messages[0]` with `role=system` |
| `tool_use` block | `tool_calls` on assistant msg |
| `tool_result` block | `role: "tool"` message |
| `tool_choice: "any"` | `tool_choice: "required"` |
| `stop_sequences` | `stop` |
| `stop_reason: "end_turn"` | `finish_reason: "stop"` |
| `stop_reason: "tool_use"` | `finish_reason: "tool_calls"` |

Test plan
- `uv run pytest tests/unit/providers/inline/messages/ -x --tb=short -v` (17/17 passing)
- `uv run pre-commit run mypy --all-files` (passes)

Generated with Claude Code