Skip to content

[Anthropic] Support system role messages inside messages array#44283

Merged
sfeng33 merged 1 commit into
vllm-project:mainfrom
chaunceyjiang:anthropic_system_messages
Jun 2, 2026
Merged

[Anthropic] Support system role messages inside messages array#44283
sfeng33 merged 1 commit into
vllm-project:mainfrom
chaunceyjiang:anthropic_system_messages

Conversation

@chaunceyjiang
Copy link
Copy Markdown
Collaborator

@chaunceyjiang chaunceyjiang commented Jun 2, 2026

Purpose

[Anthropic] Support system role messages inside messages array

FIX #44000

Test Result

before
image

after
image


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Co-Authored-By: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-Authored-By: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@aleksandaryanakiev
Copy link
Copy Markdown
Contributor

This looks better, I'm closing my PR as it's not needed anymore

@chaunceyjiang
Copy link
Copy Markdown
Collaborator Author

/cc @DarkLight1337 @sfeng33 PTAL.

Copy link
Copy Markdown
Collaborator

@sfeng33 sfeng33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@sfeng33 sfeng33 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 2, 2026
@sfeng33 sfeng33 enabled auto-merge (squash) June 2, 2026 16:20
@sfeng33 sfeng33 merged commit ed9a752 into vllm-project:main Jun 2, 2026
51 checks passed
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Jun 4, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
system_parts.append(block.text)

# System messages embedded inside the messages array
for msg in anthropic_request.messages:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew @potatosalad I'm a bit concerned about the system role fix. It seems like merging a mid-conversation system:role message into a single system message could cause issues with KV-cache hits. In multi-turn conversations, this would likely change the prefix, potentially hurting cache reuse.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have also observed this issue. The fix here is not correct. I am trying a new solution.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaunceyjiang
OK, I also have an idea here. Later, I will prepare a Merge Request for you. You can check if it meets your requirements.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

felix0080 added a commit to felix0080/vllm that referenced this pull request Jun 5, 2026
…ching

PR vllm-project#44283 merged all inline system:role messages into a single leading
system message, which changes the conversation prefix and breaks
KV-cache hits in multi-turn dialogues.

This fix keeps inline system messages at their original position:

- Remove inline system extraction from _convert_system_message (only
  top-level system is handled there)
- In _convert_messages, handle system messages with a dedicated
  _extract_system_text helper that strips billing headers and only
  emits the message if real content exists — avoiding the
  _convert_block / _convert_message_content path which does not strip
  billing headers and may omit the "content" key
- Add tests for billing header stripping on inline system messages

Unlike vllm-project#44048 which moves the same merge logic to the protocol layer,
this approach fundamentally avoids the prefix-breaking merge entirely.

Co-authored-by: Hermes Agent
@felix0080
Copy link
Copy Markdown

felix0080 commented Jun 5, 2026

I noticed the prefix caching concern discussed here. I opened #44602 with an alternative approach that preserves inline role: system messages at their original position instead of merging them into the leading system message, so the conversation prefix structure stays intact for KV-cache hits. This also handles x-anthropic-billing-header stripping consistently for both top-level and inline system messages. @chaunceyjiang

felix0080 added a commit to felix0080/vllm that referenced this pull request Jun 5, 2026
…ching

PR vllm-project#44283 merged all inline system:role messages into a single leading
system message, which changes the conversation prefix and breaks
KV-cache hits in multi-turn dialogues.

This fix keeps inline system messages at their original position:

- Remove inline system extraction from _convert_system_message (only
  top-level system is handled there)
- In _convert_messages, handle system messages with a dedicated
  _extract_system_text helper that strips billing headers and only
  emits the message if real content exists — avoiding the
  _convert_block / _convert_message_content path which does not strip
  billing headers and may omit the "content" key
- Add tests for billing header stripping on inline system messages

Unlike vllm-project#44048 which moves the same merge logic to the protocol layer,
this approach fundamentally avoids the prefix-breaking merge entirely.

Co-authored-by: Hermes Agent
Signed-off-by: felix0080 <felix0080@users.noreply.github.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: JisoLya <523420504@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code CLI >= 2.1.154 sends ctx/msg/system roles and breaks vLLM Anthropic Messages API validation

4 participants