Conversation

adase11
Contributor

@adase11 adase11 commented Sep 9, 2025

Summary:

Introduces a structured prompt caching API for Anthropic via AnthropicCacheOptions, applies cache_control deterministically, adds per-message TTLs (5m/1h), content-length eligibility, and clarifies tool caching behavior. Updates docs and adds tests to validate wire format, headers, and limits.

API

  • AnthropicChatOptions: adds cacheOptions(AnthropicCacheOptions); default remains no caching.
  • AnthropicCacheOptions: strategy (NONE, SYSTEM_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY); per-message TTLs via AnthropicCacheTtl (FIVE_MINUTES, ONE_HOUR);
    messageTypeMinContentLength; contentLengthFunction for custom token estimates.
  • AnthropicChatModel: applies cache_control only when eligible; caches the last tool definition; never caches the latest user question; uses array system format when caching;
    auto-sets Anthropic beta header for 1h TTL.
  • AnthropicApi: models cache_control on content blocks and tools; exposes Usage cacheCreationInputTokens and cacheReadInputTokens.
  • Utilities: CacheEligibilityResolver (with CacheBreakpointTracker) enforces Anthropic’s 4-breakpoint limit.
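
A minimal usage sketch based on the names above (builder method names are taken from this PR's description and review comments, so exact signatures may differ in the merged code; the chatModel and largeSystemPrompt variables are hypothetical):

    AnthropicChatOptions options = AnthropicChatOptions.builder()
            .model("claude-sonnet-4-20250514")                      // illustrative model id
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_AND_TOOLS)  // cache the system prompt and the last tool definition
                .build())
            .build();

    ChatResponse response = chatModel.call(
            new Prompt(List.of(new SystemMessage(largeSystemPrompt), new UserMessage("What changed in v2?")), options));

Whether the cache was written or read can then be confirmed through the new cacheCreationInputTokens / cacheReadInputTokens usage fields.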

Tests

  • Unit: AnthropicCacheOptionsTests; CacheEligibilityResolverTests; AnthropicPromptCachingMockTest (wire format, TTL beta header, 4-breakpoint limit).
  • IT: AnthropicPromptCachingIT (guarded by ANTHROPIC_API_KEY; validates real usage fields when available).

Documentation

  • Updates spring-ai-docs Anthropic page to use cacheOptions with strategy examples.
  • Adds per-message TTL examples (1h) and notes automatic beta header.
  • Adds eligibility guidance (min lengths, custom contentLengthFunction).
  • Clarifies tool caching (last tool definition) and that latest user message is not cached in conversation history.
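
For example, the eligibility guidance could translate into something like the following (a sketch only, assuming contentLengthFunction receives the message text and that the configured minimum is compared against whatever that function returns):

    AnthropicCacheOptions cacheOptions = AnthropicCacheOptions.builder()
            .strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
            .contentLengthFunction(text -> text.length() / 4)        // rough characters-per-token heuristic
            .messageTypeMinContentLength(MessageType.USER, 1024)     // Anthropic's documented 1024-token minimum (2048 for some models)
            .build();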

Compatibility:

  • Default behavior unchanged (caching disabled unless cacheOptions provided).
  • Prior doc examples using cacheStrategy/cacheTtl are replaced with cacheOptions; docs updated accordingly.

Closes: #4325

@adase11
Contributor Author

adase11 commented Sep 9, 2025

I'm aware of https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#mixing-different-ttls where

You can use both 1-hour and 5-minute cache controls in the same request, but with an important constraint: Cache entries with longer TTL must appear before shorter TTLs (i.e., a 1-hour cache entry must appear before any 5-minute cache entries).

I didn't make any attempt to enforce that since, in my opinion, it's up to the user to configure that properly (and it's also not worth the complexity). I probably could have mentioned that in the documentation, though.
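
One way a user could satisfy that ordering with the options introduced here is to give the longer TTL to the system prompt, which is serialized before the conversation messages (a sketch; it assumes CONVERSATION_HISTORY also marks the system prompt, and exact builder method names may differ from the merged code):

    AnthropicCacheOptions cacheOptions = AnthropicCacheOptions.builder()
            .strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
            .messageTypeTtl(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)   // 1-hour entry appears first in the request
            .messageTypeTtl(MessageType.USER, AnthropicCacheTtl.FIVE_MINUTES) // shorter TTLs follow it
            .build();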

List<ContentBlock> mediaContent = userMessage.getMedia().stream().map(media -> {
Type contentBlockType = getContentBlockTypeByMedia(media);
var source = getSourceByMedia(media);
return new ContentBlock(contentBlockType, source);
Contributor Author


Technically these can be cached too - maybe as a next step including these content blocks. https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached

@sobychacko
Contributor

Thanks for the PR! We will start reviewing it soon.

@sobychacko
Contributor

@adase11 Can you add your name as an author to the classes you changed?

I have a couple of questions regarding the messageTypeMinContentLengths in AnthropicCacheOptions. Why are we defaulting to a minimum content length of 1? Aren't the model requirements much higher for minimum tokens, 1024 and 2048? Also, the content length function as it stands does not map 1-to-1 with the number of tokens. I know the function is flexible so that more sophisticated token counting mechanisms can be injected, but I still wonder if using the basic content length is the way to go there.

But more importantly, my concern is with the global minimum requirements in the map. Since we default to 1, and since it's unlikely users will override that, we may end up with a situation where we try to cache content shorter than what the models stipulate. You have a check for length < this.messageTypeMinContentLengths.get(messageType) in CacheEligibilityResolver, in which you return null if true. Imagine content with a length of 200. That check won't reject it because the content is longer than the default minimum of 1, so it ends up being cached and thus wastes a breakpoint, right? Please correct me if I am wrong.

Another general question, how does token count requirements work in the case of tool segments?

Thanks!

@adase11
Contributor Author

adase11 commented Sep 12, 2025

@sobychacko Thanks! I'll add my name as the author to the classes I changed. With regard to your questions:

Why are we defaulting to a minimum content length of 1? Aren't the model requirements much higher for minimum tokens, 1024 and 2048? Also, the content length function as it stands does not map 1-to-1 with the number of tokens.

I didn't want to be prescriptive to users about a minimum out of the gate, so rather than guessing at a default that will continue to be valid for Anthropic's API spec, I wanted to leave that up to the user to decide. Plus, as you said, string text length doesn't map one-to-one with token count, so rather than guess at the right minimum string length I thought it was easier to allow everything by default and let users tweak their configuration to fit their specific use case.

Also, the content length function as it stands does not map 1-to-1 with the number of tokens. I know the function is flexible so that more sophisticated token counting mechanisms can be injected, but I still wonder if using the basic content length is the way to go there.

I definitely understand what you're saying; I wanted to take the least intrusive, lowest-overhead approach possible by default. My expectation is that a rough approximation will be sufficient the vast majority of the time. For the times when it's not, introducing a more sophisticated token approximation function (the most robust way would be to actually introduce a tokenizer) probably means more overhead than I would want to impose on users without their explicit opt-in.

Since we default to 1, and since it's unlikely users will override that, we may end up with a situation where we try to cache content shorter than what the models stipulate. You have a check for length < this.messageTypeMinContentLengths.get(messageType) in CacheEligibilityResolver, in which you return null if true. Imagine content with a length of 200. That check won't reject it because the content is longer than the default minimum of 1, so it ends up being cached and thus wastes a breakpoint, right? Please correct me if I am wrong.

That's correct, and I believe it reflects the current behavior of the prompt caching (where users can optimize a bit using the different strategies). The goal of my PR is partly to address this behavior in order to give users more control over which content we attempt to cache (optimizing their allotted cache breakpoints). However, my intent is to let users opt into this optimization rather than enforcing it by default, since user contexts vary widely. Any default I set risks being a poor fit—or worse, masking inefficiencies until usage scales (e.g., small initial workloads seem fine under defaults, but performance degrades as volume grows).

I chose 1 instead of 0 because the Anthropic API doesn’t allow caching of empty text blocks (docs).
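
As an aside, callers who do want token-accurate eligibility could back contentLengthFunction with a real tokenizer. A sketch using the jtokkit library (assumptions: the function receives the message text, jtokkit is not a dependency this change adds, and CL100K_BASE is an OpenAI encoding, so the counts are only approximate for Claude):

    import com.knuddels.jtokkit.Encodings;
    import com.knuddels.jtokkit.api.Encoding;
    import com.knuddels.jtokkit.api.EncodingType;

    Encoding encoding = Encodings.newDefaultEncodingRegistry().getEncoding(EncodingType.CL100K_BASE);

    AnthropicCacheOptions cacheOptions = AnthropicCacheOptions.builder()
            .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
            .contentLengthFunction(encoding::countTokens)            // estimate tokens instead of counting characters
            .messageTypeMinContentLength(MessageType.SYSTEM, 1024)   // compare directly against the documented token minimum
            .build();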

Another general question, how does token count requirements work in the case of tool segments?

For tool definitions I took the easy way out. I should have documented it better, but I left a comment in the code that addresses it: "Tool definition messages are always considered for caching if the strategy includes system messages."

Let me know what you think and I'm happy to make any changes. I don't feel especially strongly about the default minimum of 1 vs. something like 1000, so I'm happy to change that and adjust the documentation if you like.

@adase11
Contributor Author

adase11 commented Sep 12, 2025

@sobychacko - I went ahead and added the author tag as well as updated the documentation to talk about how the tool definitions are handled.

@sobychacko
Contributor

@adase11 Sounds good to me. I will let @markpollack take a look before we can proceed with the PR.

@adase11
Contributor Author

adase11 commented Sep 12, 2025

Thanks!

.cacheTtl("1h") // 1-hour cache lifetime
.cacheOptions(AnthropicCacheOptions.builder()
.strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
.messageTypeTtls(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
Member


I think just 'ttl' as a builder name is cleaner.

Contributor Author


sounds good

Contributor Author


Done

}
else {
contents.add(new ContentBlock(message.getText()));
contentBlocks.add(cacheAwareContentBlock(contentBlock, messageType, cacheEligibilityResolver));
Contributor


@adase11 Doesn't this make all the user messages in the request get cached? The first if conditional checks if it's the last user message and then skips adding the cache, and in the else, you just add it to the cache as long as it's a user message (based on the outer if condition). Isn't that wrong though? I think we only need to add the cache control to the last user message, right?

Contributor Author


You are correct, thanks

Contributor


Do you mean it is not correct right now as it stands in this PR?

Contributor Author


Yes, it's currently incorrect; I should have only attempted to cache the last user message. And I can check the content size based on the last 20 user messages: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#continuing-a-multi-turn-conversation

Contributor


Ok. Are you going to update the PR?

Contributor Author


Yeah, I see

Cache the entire conversation history up to (but not including) the current user question. This is ideal for multi-turn conversations where you want to reuse the conversation context while asking new questions.

in the documentation for AnthropicCacheStrategy.CONVERSATION_HISTORY - I think, based on the Anthropic docs, that we could be caching up to and including the last user message. I'll make an update to keep the old logic - only apply cache_control to the next-to-last user message - and then, if we agree that the last one is actually fine to make eligible for caching, that's an easy change to make.
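
To make that placement concrete, with CONVERSATION_HISTORY and the next-to-last-user-message behavior described above, a multi-turn prompt would be marked roughly like this (hypothetical variable names; this just restates the behavior discussed in this thread):

    List<Message> messages = List.of(
            new SystemMessage(systemText),            // cacheable when the strategy also covers the system prompt
            new UserMessage("Summarize chapter 1"),   // next-to-last user message: the history breakpoint lands here
            new AssistantMessage(chapterOneSummary),  // prior assistant turn
            new UserMessage("And chapter 2?"));       // latest user question: never marked with cache_control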

Contributor Author


@sobychacko - updated, and added some ITs in AnthropicPromptCachingIT for better coverage. If we do decide that all user messages are eligible, including the last one, that would make things a little simpler. But for now I left that logic the same and just considered the content that is eligible (according to Anthropic) when looking at the content size (i.e. the prior 20 content blocks: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#when-to-use-multiple-breakpoints).

Contributor Author


I could have gotten more sophisticated and attempted to add additional breakpoints if there are > 20 messages in the conversation history. I chose not to consider that for the moment in order to keep things simple, but if you'd prefer me to do that I can.

Contributor


Let's review your changes with Mark first and we can proceed from there. Thanks!

Contributor Author


sounds good

…racking

spring-projectsgh-4325: Enhance cache management for Anthropic API by introducing per-message TTL and configurable content block usage optimization.
Signed-off-by: Austin Dase <[email protected]>
@markpollack
Member

Thanks! It is merged and will be in 1.1 M2. I do want to experiment a bit more to fine-tune.

merged in 1d5ab9b

Successfully merging this pull request may close these issues.

Make Anthropic prompt caching message-type aware (TTLs, eligibility, min-size) and optimize cache-block usage