gh-4325: Enhance cache management for Anthropic API by introducing per-message TTL and configurable content block usage optimization. #4342
Conversation
I'm aware of https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#mixing-different-ttls. I didn't make any attempt to enforce that, as, in my opinion, it's up to the user to configure it properly (and it's also not worth the complexity). I probably could have mentioned that in the documentation, though.
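For illustration, mixing TTLs with this API might look like the sketch below. The builder names are taken from snippets later in this thread; the exact signatures, the AnthropicCacheTtl.FIVE_MINUTES constant, and the surrounding configuration are assumptions.

// Hypothetical sketch: a 1-hour TTL for the stable system prompt and a
// 5-minute TTL for user messages. Per the Anthropic docs on mixing TTLs,
// longer-TTL cache breakpoints must precede shorter-TTL ones; this API
// does not enforce that ordering, so it is up to the caller.
var cacheOptions = AnthropicCacheOptions.builder()
    .strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
    .messageTypeTtls(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
    .messageTypeTtls(MessageType.USER, AnthropicCacheTtl.FIVE_MINUTES)
    .build();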
List<ContentBlock> mediaContent = userMessage.getMedia().stream().map(media -> {
    Type contentBlockType = getContentBlockTypeByMedia(media);
    var source = getSourceByMedia(media);
    return new ContentBlock(contentBlockType, source);
}).toList();
Technically these can be cached too - maybe including these content blocks as a next step. https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached
Thanks for the PR! We will start reviewing it soon.
@adase11 Can you add your name as an author to the classes you changed? I have a couple of questions regarding the defaults. Another general question: how do token count requirements work in the case of tool segments? Thanks!
@sobychacko Thanks! I'll add my name as the author to the classes I changed. With regard to your questions:
I didn't want to be prescriptive to users about a minimum right out of the gate, so rather than guessing at a default that may not continue to be valid for Anthropic's API spec, I wanted to leave that up to the user to decide.
Definitely understand what you're saying; I wanted to take the least intrusive, lowest-overhead approach possible by default. My anticipation is that for the vast majority of cases a rough approximation will be sufficient. And for the times when it's not, a more sophisticated token approximation function (the most robust way would be to actually introduce a tokenizer) probably means much more overhead than I would want to impose on users without their explicit opt-in.
That's correct, and I believe it reflects the current behavior of prompt caching (where users can optimize a bit using the different strategies). The goal of my PR is partly to address this behavior and give users more control over what is attempted to be cached (optimizing their allotted cache blocks). However, my intent is to let users opt into this optimization rather than enforcing it by default, since user contexts vary widely. Any default I set risks being a poor fit, or worse, masking inefficiencies until usage scales (e.g., small initial workloads seem fine under the defaults, but performance degrades as volume grows). I chose 1 instead of 0 because the Anthropic API doesn't allow caching of empty text blocks (docs).
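As a sketch of that opt-in, assuming the builder exposes the messageTypeMinContentLength and contentLengthFunction options named in the summary at the end of this thread (the exact method signatures are guesses):

// Hypothetical sketch: stricter cache eligibility. The default minimum of 1
// only rules out empty text blocks, which the Anthropic API rejects for
// caching; a higher minimum avoids spending cache breakpoints on content
// too small to meet Anthropic's minimum cacheable token counts.
var cacheOptions = AnthropicCacheOptions.builder()
    .strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
    // Interpreted in the same units as the length function below,
    // here roughly estimated tokens.
    .messageTypeMinContentLength(MessageType.USER, 1024)
    // Cheap token estimate (~4 characters per token for English text),
    // instead of pulling in a real tokenizer.
    .contentLengthFunction(text -> text.length() / 4)
    .build();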
For Tool definitions I took the easy way out; I should have documented it better, but I left a comment in there that addresses it. Let me know what you think and I'm happy to make any changes. I don't feel exceptionally strongly about the default min, so I'm open to changing it.
@sobychacko - I went ahead and added the author tag and updated the documentation to describe how the tool definitions are handled.
@adase11 Sounds good to me. I will let @markpollack take a look before we can proceed with the PR.
Thanks!
.cacheTtl("1h") // 1-hour cache lifetime | ||
.cacheOptions(AnthropicCacheOptions.builder() | ||
.strategy(AnthropicCacheStrategy.SYSTEM_ONLY) | ||
.messageTypeTtls(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR) |
I think just 'ttl' as a builder name is cleaner.
sounds good
Done
}
else {
    contents.add(new ContentBlock(message.getText()));
    contentBlocks.add(cacheAwareContentBlock(contentBlock, messageType, cacheEligibilityResolver));
@adase11 Doesn't this make all the user messages in the request get cached? The first if conditional checks if it's the last user message and then skips adding the cache, and in the else, you just add it to the cache as long as it's a user message (based on the outer if condition). Isn't that wrong though? I think we only need to add the cache control to the last user message, right?
You are correct, thanks
Do you mean it is not correct right now as it stands in this PR?
Yes, it's currently incorrect; I should have only attempted to cache the last user message. And I can check the content size based on the last 20 user messages: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#continuing-a-multi-turn-conversation
Ok. Are you going to update the PR?
Yeah, I see

Cache the entire conversation history up to (but not including) the current user question. This is ideal for multi-turn conversations where you want to reuse the conversation context while asking new questions.

in the documentation on AnthropicCacheStrategy.CONVERSATION_HISTORY. I think from the Anthropic docs that we could be caching up to and including the last user message. I'll make an update to keep the old logic - only apply to the next-to-last user message - and then, if we agree that the last one is actually OK to make eligible for caching, that's an easy change to make.
@sobychacko - updated, and added some ITs in AnthropicPromptCachingIT for better coverage. If we decide that all user messages are eligible, including the last one, that would simplify things a little. But for now I left that logic the same and just considered the eligible (according to Anthropic) content when looking at the content size (i.e. the prior 20 content blocks: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#when-to-use-multiple-breakpoints).
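To make the rule concrete, here is a self-contained sketch of the eligibility logic described above; the names are illustrative, not the PR's actual code.

import java.util.List;

// Illustrative sketch of the CONVERSATION_HISTORY rule discussed above: the
// cache breakpoint goes on the next-to-last user content block (the last one
// is the new question and stays uncached), and eligibility is measured over
// the preceding 20 content blocks, mirroring Anthropic's ~20-block lookback.
final class ConversationCacheSketch {

    private static final int LOOKBACK_BLOCKS = 20;

    static boolean shouldCache(List<String> contentBlocks, int index, int minContentLength) {
        int nextToLast = contentBlocks.size() - 2;
        if (nextToLast < 0 || index != nextToLast) {
            return false; // a single breakpoint, on the next-to-last block
        }
        int from = Math.max(0, index + 1 - LOOKBACK_BLOCKS);
        int eligibleLength = contentBlocks.subList(from, index + 1).stream()
            .mapToInt(String::length)
            .sum();
        return eligibleLength >= minContentLength;
    }
}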
I could have gotten more sophisticated and attempted to add additional breakpoints when there are more than 20 messages in the conversation history. I chose not to consider that for the moment, to keep things simple, but if you'd prefer that I do, I can.
Let's review your changes with Mark first and we can proceed from there. Thanks!
sounds good
…racking spring-projects gh-4325: Enhance cache management for Anthropic API by introducing per-message TTL and configurable content block usage optimization. Signed-off-by: Austin Dase <[email protected]>
Thanks! It is merged and will be in 1.1 M2. I do want to experiment a bit more to fine-tune. Merged in 1d5ab9b.
Summary:
Introduces a structured prompt caching API for Anthropic via AnthropicCacheOptions, applies cache_control deterministically, adds per-message TTLs (5m/1h), content-length eligibility, and clarifies tool caching behavior. Updates docs and adds tests to validate wire format, headers, and limits.

API: adds messageTypeMinContentLength and a contentLengthFunction for custom token estimates; auto-sets the Anthropic beta header for the 1h TTL.
Tests: validate the wire format, headers, and limits.
Documentation: updated (covering contentLengthFunction).
Compatibility:
Closes: #4325
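For context, end-to-end usage of the merged API might look roughly like the following. The ChatClient fluent calls are standard Spring AI; the cache option names come from the snippets above, their exact shape is assumed, and anthropicChatModel and longSystemPrompt are hypothetical placeholders.

// Hypothetical end-to-end sketch combining the options shown in this thread.
ChatClient chatClient = ChatClient.builder(anthropicChatModel).build();

String answer = chatClient.prompt()
    .system(longSystemPrompt)           // large, stable prefix worth caching
    .user("Summarize the key points.")  // the new question, never cached
    .options(AnthropicChatOptions.builder()
        .cacheOptions(AnthropicCacheOptions.builder()
            .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
            .messageTypeTtls(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
            .build())
        .build())
    .call()
    .content();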