fix(llm/messages): normalize Claude Code in-conversation system messages #2015
yalindogusahin wants to merge 1 commit into
…ges into top-level system Signed-off-by: Yalın Şahin <yalinsahin1@gmail.com>
2a8c517 to
53e2538
|
Yes, actually, I tried two things before deciding to go with merging into system (sacrificing the positional part).
Both approaches completely messed up the model's behavior (I was testing with the Qwen series), so I decided to go with the merge. I've inspected Claude's midway system messages: they mainly inject skills information, which is completely fine to keep in system for now. |
|
Thanks @Syraxius. Anything you'd change in this PR? Naming, normalization placement? |
|
I am not 100% convinced this is the right path. The This seems more like something for vllm to fix, as they are no longer implementing Anthropic Messages API but "Something almost like anthropic messages API". I could be way off on this, didn't get a chance to research it much |
|
I ran Claude Code 2.1.158 through a logging proxy in front of api.anthropic.com to see what the real API does. The live That said, I'm honestly not 100% sure whether vLLM or agentgateway is the right place to handle it either. The case for vLLM accepting it is real, since api.anthropic.com does. The other side is that agentgateway's own typed conversion (messages -> openai) also trips on this before any backend is involved. Curious what you think with that context. |
Summary
Claude Code 2.1.157+ injects in-conversation system reminders inside the Anthropic `messages` array (`messages[*].role == "system"`), which is not part of the Anthropic Messages API spec: `system` is a top-level field, and `messages[].role` is limited to `user`/`assistant`. Strict spec implementations like vLLM's Anthropic endpoint reject the payload before request conversion (see vllm-project/vllm#44048), and the same payload also fails agentgateway's typed `Role` enum during messages → completions / bedrock translation.

This PR normalizes the request at the proxy entry point so every downstream conversion path sees a canonical Anthropic shape:

- Extracts `messages[*].role == "system"` entries from `messages`
- Merges them into the top-level `system` field, preserving any existing top-level system content first

Changes
- `crates/agentgateway/src/llm/types/messages.rs`: add a `Request::normalize_system_messages()` method that handles both string and array `ContentBlock` forms and merges the extracted content into a `TextBlock::Array` after any existing top-level system content. 6 new unit tests cover extraction, preservation order, array content blocks, no-op paths, and a regression test that exercises the actual `completions::from_messages::translate` call with a Claude Code 2.1.157+ shaped payload.
- `crates/agentgateway/src/llm/mod.rs`: call `normalize_system_messages()` inside `LLM::process_messages_request` after the request body is parsed and before `process_request` dispatch. This single chokepoint covers OpenAI, Bedrock, Vertex, and Anthropic-native passthrough.
- `crates/agentgateway/src/llm/conversion/completions.rs`, `crates/agentgateway/src/llm/conversion/bedrock.rs`, and `RequestType::to_anthropic` in `messages.rs`: defensive `normalize_system_messages()` calls in the direct library entry points so callers that bypass the proxy entry point (tests, embedders) get the same compatibility.

Trade-off (out of scope here)
The merge loses positional information: every in-conversation system block is appended after existing top-level system content. Preserving position would require provider-aware translation: OpenAI chat completions natively supports multiple `system` messages mid-conversation, Bedrock could use `system` blocks, and Anthropic native cannot. A future opt-in flag (e.g. `preserve_in_conversation_system`) could expose this for backends that benefit from it. For now, normalizing into the standard Anthropic `system` field matches the approach vLLM took in vllm-project/vllm#44048 and gives the least surprising baseline behavior.

Test Plan
- `cargo build -p agentgateway` (passes locally)
- `cargo test -p agentgateway llm::types::messages::normalize_system_messages_tests`: 6/6 passed locally
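
Illustrative sketch

For readers skimming the description, the extract-and-merge behavior described under Changes can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the `Message` and `Request` structs and the `msg` helper are hypothetical, simplified stand-ins (plain strings instead of the real `ContentBlock` / `TextBlock` types), and the return value is an assumption added only to make the behavior easy to check.

```rust
// Hypothetical, simplified types for illustration only; the real
// agentgateway `Request`, `ContentBlock`, and `TextBlock` types differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Request {
    /// Top-level system text blocks, in order.
    pub system: Vec<String>,
    /// Conversation turns; the spec allows only "user" / "assistant" here.
    pub messages: Vec<Message>,
}

/// Hypothetical convenience constructor used in the usage example.
pub fn msg(role: &str, content: &str) -> Message {
    Message {
        role: role.to_string(),
        content: content.to_string(),
    }
}

impl Request {
    /// Moves every `role == "system"` entry out of `messages` and appends
    /// its content to `system`, after whatever was already there.
    /// Returns the number of entries moved (0 means the call was a no-op).
    pub fn normalize_system_messages(&mut self) -> usize {
        let mut moved = 0;
        let mut kept = Vec::with_capacity(self.messages.len());
        // Take ownership of the old list so entries can be moved, not cloned.
        for m in std::mem::take(&mut self.messages) {
            if m.role == "system" {
                // Positional information is lost here: the block always
                // lands at the end of the top-level system content.
                self.system.push(m.content);
                moved += 1;
            } else {
                kept.push(m);
            }
        }
        self.messages = kept;
        moved
    }
}
```

Under these assumptions, a request whose `system` is `["base prompt"]` and whose `messages` are a `user` turn, a `system` reminder, and an `assistant` turn ends up with `system` equal to `["base prompt", "reminder"]` and only the `user` and `assistant` turns left in `messages`. A second call moves nothing, which is why calling it defensively from several entry points is harmless.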