[Anthropic][Frontend] auto-extract system messages from messages array #43959
aleksandaryanakiev wants to merge 4 commits into
Signed-off-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Instead of extracting and converting in a validator, could we simply allow …? I believe `_convert_messages` in `anthropic/serving.py` would then already handle turning that into a system message for Chat Completions.
It's possible; I considered that before making this fix. My only concern is that the message would leak into the conversation (I'm not sure how Claude Code would handle that).
Hmm, with just allowing … I don't see any additional system turns accumulating. You can see how the original top-level …
This is the system message that is in the `messages` array: {…} I think it should be in the system prompts, and I think your way of doing it (with allowing …
Ahh, my … Some models, such as Qwen 3 variants, will complain if any system message is not in the first position. But there are other PRs open to handle that more generally, since we have the same problem with the Responses API and Codex CLI usage with those models, so the Messages API case here is no different.
Hmm, after testing my simple proposed fix, I don't think having system messages interleaved after user messages will work for typical open-weight models. A non-trivial number of them expect all system content in the very first message, which the chat template renders with special system-prompt tokens that won't be present for later system messages.
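To make the ordering constraint concrete, here is a hypothetical Python illustration of the rule some chat templates effectively enforce. It is not code from any real template or from vLLM: `system_only_first` is a made-up helper that accepts a `system` role only at index 0, the restriction described above for some Qwen 3 variants.

```python
def system_only_first(messages: list[dict]) -> bool:
    # Hypothetical check: a "system" role is allowed only at index 0.
    # Any later system turn would be rejected or rendered without the
    # template's special system-prompt tokens.
    return all(msg["role"] != "system" for msg in messages[1:])


# The shape Claude Code sends: a system turn after the user turn.
interleaved = [
    {"role": "user", "content": "help?"},
    {"role": "system", "content": "....."},
]
# The same content with the system turn hoisted to the front.
hoisted = [
    {"role": "system", "content": "....."},
    {"role": "user", "content": "help?"},
]

print(system_only_first(interleaved))  # False
print(system_only_first(hoisted))      # True
```

Hoisting everything into the top-level `system` field sidesteps the question entirely, because the converted conversation then starts with a single system message.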
Claude Code sends the system prompt as messages with role "system" in the messages array instead of the top-level system field. Extract them during Pydantic validation and merge into the system field so models that expect system content before user messages work correctly.

Based on vllm-project#43959.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
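The commit message above describes extracting `system`-role entries during validation and merging them into the top-level `system` field. Below is a minimal, framework-free sketch of that step on the raw request dict. The helper names are hypothetical, and two behaviors are assumptions rather than a description of the PR: existing top-level `system` content is kept first with the extracted content appended in order, and everything is normalized to `{"type": "text", ...}` blocks, because in the Anthropic Messages API both `system` and message `content` may be either a plain string or a list of content blocks.

```python
from typing import Any


def _as_text_blocks(content: Any) -> list:
    # Hypothetical normalizer: None -> [], str -> one text block,
    # list of blocks -> shallow copy. Anything else is rejected; inside a
    # Pydantic "before" validator the ValueError becomes a ValidationError.
    if content is None:
        return []
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    if isinstance(content, list):
        return list(content)
    raise ValueError(f"unsupported content type: {type(content).__name__}")


def extract_system_messages(body: Any) -> Any:
    # Hypothetical helper: hoist role="system" entries out of "messages"
    # and append them to the top-level "system" field.
    if not isinstance(body, dict) or not isinstance(body.get("messages"), list):
        return body  # leave unexpected input for normal validation to report

    kept, extracted = [], []
    for msg in body["messages"]:
        if isinstance(msg, dict) and msg.get("role") == "system":
            extracted.extend(_as_text_blocks(msg.get("content")))
        else:
            kept.append(msg)

    if len(kept) == len(body["messages"]):
        return body  # no system messages found; return the input unchanged

    merged = dict(body)  # shallow copy so the caller's dict is not mutated
    merged["messages"] = kept
    # Existing top-level system content first, then extracted content in order.
    merged["system"] = _as_text_blocks(body.get("system")) + extracted
    return merged
```

Returning a new dict instead of mutating the input keeps the validator free of side effects on the caller's payload; whether the merged PR does the same is not stated in this thread.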
bbrowning
left a comment
After testing this in the real world, your original proposed fix is better than my suggestion. I was getting some odd trajectories with a self-hosted Nemotron 3 Super model and Claude Code with my simpler suggestion, while your fix of hoisting all of these into the original system field looks more stable. I haven't had time to do a full before/after eval, but given that we return a 400 error today with the latest Claude Code, I'll defer that until later.
I pulled this locally and confirmed the new unit tests pass as well as tested in a live server and things look reasonable. Thanks!
sfeng33
left a comment
Nice fix! One gap I noticed: `AnthropicCountTokensRequest` in the same file has the same `messages: list[AnthropicMessage]` field, so it would hit the same Pydantic validation error if a client sends system messages in the `messages` array to `/v1/messages/count_tokens`.
I think something like this could work:
```python
from pydantic import BaseModel, model_validator


def _extract_system_from_messages(request_body: dict) -> dict:
    # ... existing logic ...
    return request_body


# Both request models share the helper, so /v1/messages and
# /v1/messages/count_tokens accept the same payloads.
class AnthropicMessagesRequest(BaseModel):
    @model_validator(mode="before")
    @classmethod
    def extract_system_messages(cls, v):
        return _extract_system_from_messages(v)


class AnthropicCountTokensRequest(BaseModel):
    @model_validator(mode="before")
    @classmethod
    def extract_system_messages(cls, v):
        return _extract_system_from_messages(v)
```
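The reviewer's suggestion above can be exercised end to end with simplified models. This is a self-contained sketch assuming Pydantic v2; the field sets, the string-only `system` handling, and the `"\n\n"` separator are hypothetical simplifications, not vLLM's actual `AnthropicMessagesRequest` or `AnthropicCountTokensRequest` definitions. It shows that one shared `mode="before"` helper lets both models accept the same payload.

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, model_validator


def _extract_system_from_messages(data: Any) -> Any:
    # Hypothetical shared helper: move string-content "system" messages
    # into the top-level "system" string. Anything it does not understand
    # is passed through untouched so normal validation reports it.
    if not isinstance(data, dict) or not isinstance(data.get("messages"), list):
        return data
    base = data.get("system")
    if base is not None and not isinstance(base, str):
        return data  # this sketch only merges into a string system prompt
    texts, kept = [], []
    for msg in data["messages"]:
        if (
            isinstance(msg, dict)
            and msg.get("role") == "system"
            and isinstance(msg.get("content"), str)
        ):
            texts.append(msg["content"])
        else:
            kept.append(msg)
    if not texts:
        return data
    parts = ([base] if base else []) + texts
    # Build a new dict so the caller's payload is not mutated.
    return {**data, "messages": kept, "system": "\n\n".join(parts)}


class AnthropicMessage(BaseModel):
    role: Literal["user", "assistant"]
    content: str


class AnthropicMessagesRequest(BaseModel):
    model: str
    max_tokens: int
    messages: list[AnthropicMessage]
    system: Optional[str] = None

    @model_validator(mode="before")
    @classmethod
    def extract_system_messages(cls, data: Any) -> Any:
        return _extract_system_from_messages(data)


class AnthropicCountTokensRequest(BaseModel):
    model: str
    messages: list[AnthropicMessage]
    system: Optional[str] = None

    @model_validator(mode="before")
    @classmethod
    def extract_system_messages(cls, data: Any) -> Any:
        return _extract_system_from_messages(data)


payload = {
    "model": "Qwen/Qwen3.5-27B-FP8",
    "max_tokens": 64,
    "system": "You are Claude Code, Anthropic's official CLI for Claude.",
    "messages": [
        {"role": "user", "content": "help?"},
        {"role": "system", "content": "....."},
    ],
}
req = AnthropicMessagesRequest.model_validate(payload)
count_req = AnthropicCountTokensRequest.model_validate(payload)  # extra keys ignored
print(req.system)
print([m.role for m in count_req.messages])  # ['user']
```

A shared base class or mixin carrying the validator would avoid declaring it twice; the per-class version here mirrors the reviewer's sketch.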
4c4e126 to 7532441
In Claude Code v2.1.156, the CLI puts a system message in the `messages` array instead of the top-level `system` array.
I captured the request sent by CC. Both the system message and the top-level system array are present at the same time.
{
"model": "Qwen/Qwen3.5-27B-FP8",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "<system-reminder>\n.....</system-reminder>\n\n"
},
{
"type": "text",
"text": "help?",
"cache_control": {
"type": "ephemeral"
}
}
]
},
{
"role": "system",
"content": "....."
}
],
"system": [
{
"type": "text",
"text": "x-anthropic-billing-header: cc_version=2.1.160.bca; cc_entrypoint=cli; cch=d1d48;"
},
{
"type": "text",
"text": "You are Claude Code, Anthropic's official CLI for Claude.",
"cache_control": {
"type": "ephemeral"
}
},
{
"type": "text",
"text": "....",
"cache_control": {
"type": "ephemeral"
}
}
],
"tools": []
}
Just a question: why not handle it in `_convert_system_message`?
@aleksandaryanakiev Thanks for this PR. I feel the semantics of this implementation are not very clear. I have submitted a new PR, #44283, and added you as a Co-Authored-By. Could you take a look? I think the new implementation is cleaner.
Purpose
Fix: Auto-extract system messages from Anthropic messages array
In Claude Code v2.1.156, the CLI puts a system message in the `messages` array instead of the top-level `system` array. This causes a Pydantic validation error: `Input should be 'user' or 'assistant'`
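The error text above is Pydantic v2's message for a `Literal` field. A minimal reproduction, assuming the message model's `role` is declared as `Literal["user", "assistant"]` (the error suggests this, but the vLLM definition is not shown here); `Message` and `role_error` are hypothetical names:

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class Message(BaseModel):
    # Assumed shape: only the two conversational roles are accepted.
    role: Literal["user", "assistant"]
    content: str


def role_error(role: str) -> Optional[str]:
    # Return Pydantic's error message for `role`, or None if it validates.
    try:
        Message.model_validate({"role": role, "content": "....."})
    except ValidationError as exc:
        return exc.errors()[0]["msg"]
    return None


print(role_error("user"))    # None
print(role_error("system"))  # Input should be 'user' or 'assistant'
```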
This PR adds a `model_validator(mode="before")` to `AnthropicMessagesRequest` that silently extracts `system` messages from the `messages` array and moves them to the `system` field, making vLLM more compatible with this change.
Test Plan
I tested it on our machine running vLLM with this change against Claude Code v2.1.156, and it works properly. I also added 6 tests and ran them with:
pytest tests/entrypoints/anthropic/test_anthropic_messages_conversion.py
Test Result
The 400 BAD REQUEST error is gone, and everything works as expected.