[Anthropic] Support system role messages inside messages array#44283
Conversation
Co-Authored-By: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-Authored-By: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
This looks better, I'm closing my PR as it's not needed anymore |
|
/cc @DarkLight1337 @sfeng33 PTAL. |
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
| system_parts.append(block.text) | ||
|
|
||
| # System messages embedded inside the messages array | ||
| for msg in anthropic_request.messages: |
There was a problem hiding this comment.
@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew @potatosalad I'm a bit concerned about the system role fix. It seems like merging a mid-conversation system:role message into a single system message could cause issues with KV-cache hits. In multi-turn conversations, this would likely change the prefix, potentially hurting cache reuse.
There was a problem hiding this comment.
Yes, I have also observed this issue. The fix here is not correct. I am trying a new solution.
There was a problem hiding this comment.
@chaunceyjiang
OK, I also have an idea here. Later, I will prepare a Merge Request for you. You can check if it meets your requirements.
There was a problem hiding this comment.
@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew
please check this merge request: https://github.com/vllm-project/vllm/pull/44602/changes
…ching PR vllm-project#44283 merged all inline system:role messages into a single leading system message, which changes the conversation prefix and breaks KV-cache hits in multi-turn dialogues. This fix keeps inline system messages at their original position: - Remove inline system extraction from _convert_system_message (only top-level system is handled there) - In _convert_messages, handle system messages with a dedicated _extract_system_text helper that strips billing headers and only emits the message if real content exists — avoiding the _convert_block / _convert_message_content path which does not strip billing headers and may omit the "content" key - Add tests for billing header stripping on inline system messages Unlike vllm-project#44048 which moves the same merge logic to the protocol layer, this approach fundamentally avoids the prefix-breaking merge entirely. Co-authored-by: Hermes Agent
|
I noticed the prefix caching concern discussed here. I opened #44602 with an alternative approach that preserves inline |
…ching PR vllm-project#44283 merged all inline system:role messages into a single leading system message, which changes the conversation prefix and breaks KV-cache hits in multi-turn dialogues. This fix keeps inline system messages at their original position: - Remove inline system extraction from _convert_system_message (only top-level system is handled there) - In _convert_messages, handle system messages with a dedicated _extract_system_text helper that strips billing headers and only emits the message if real content exists — avoiding the _convert_block / _convert_message_content path which does not strip billing headers and may omit the "content" key - Add tests for billing header stripping on inline system messages Unlike vllm-project#44048 which moves the same merge logic to the protocol layer, this approach fundamentally avoids the prefix-breaking merge entirely. Co-authored-by: Hermes Agent Signed-off-by: felix0080 <felix0080@users.noreply.github.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: JisoLya <523420504@qq.com>
Purpose
[Anthropic] Support system role messages inside messages array
FIX #44000
Test Result
before

after

Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)