Bug description
The AzureOpenAiChatModel does not support the retry mechanism. According to the documentation, this feature should already be supported. However, after configuring the following parameters, retries still do not occur:
spring.ai.retry.on-client-errors=true
spring.ai.retry.on-http-codes=400,408,429,500,502,503,504
Environment
- Spring AI version: 1.0.0-M5
- Java version: 21
Steps to reproduce
- Configure the retry settings in your application properties file as follows:
spring.ai.retry.on-client-errors=true
spring.ai.retry.on-http-codes=400,408,429,500,502,503,504
- Invoke the AzureOpenAiChatModel under conditions that would trigger a retry (e.g., by forcing an HTTP 400 or another configured code).
- Observe that no retry occurs.
Expected behavior
The AzureOpenAiChatModel should attempt to retry the request according to the specified retry configuration parameters.
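For illustration, the behavior expected here can be sketched with plain JDK code. This is a hypothetical helper, not Spring AI's actual implementation: the StatusCodeException class is made up for the sketch, and the status-code set mirrors the spring.ai.retry.on-http-codes property above.

```java
import java.util.Set;
import java.util.concurrent.Callable;

/** Minimal sketch of the expected retry behavior (hypothetical helper, not Spring AI code). */
class RetrySketch {

    /** Thrown when a call fails with an HTTP status code (hypothetical exception type). */
    static class StatusCodeException extends RuntimeException {
        final int statusCode;
        StatusCodeException(int statusCode) {
            super("HTTP " + statusCode);
            this.statusCode = statusCode;
        }
    }

    // Mirrors spring.ai.retry.on-http-codes=400,408,429,500,502,503,504
    static final Set<Integer> RETRYABLE = Set.of(400, 408, 429, 500, 502, 503, 504);

    /** Retries the call up to maxAttempts times when a configured status code is seen. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        StatusCodeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (StatusCodeException e) {
                if (!RETRYABLE.contains(e.statusCode)) throw e; // not configured: fail fast
                last = e;                                       // configured: try again
            }
        }
        throw last;
    }
}
```

With the reported bug, the equivalent of `callWithRetry` never runs: the HttpResponseException from the Azure SDK propagates on the first failure.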
Error Logs
com.azure.core.exception.HttpResponseException: Status code 400, "{"error":{"inner_error":{"code":"ResponsibleAIPolicyViolation","content_filter_results":{"sexual":{"filtered":true,"severity":"high"},"violence":{"filtered":false,"severity":"safe"},"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"}}},"code":"content_filter","message":"The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: \r\nhttps://go.microsoft.com/fwlink/?linkid=2198766.","param":"prompt","type":null}}"
at com.azure.core.implementation.http.rest.RestProxyBase.instantiateUnexpectedException(RestProxyBase.java:388)
at com.azure.core.implementation.http.rest.SyncRestProxy.ensureExpectedStatus(SyncRestProxy.java:133)
at com.azure.core.implementation.http.rest.SyncRestProxy.handleRestReturnType(SyncRestProxy.java:211)
at com.azure.core.implementation.http.rest.SyncRestProxy.invoke(SyncRestProxy.java:86)
at com.azure.core.implementation.http.rest.RestProxyBase.invoke(RestProxyBase.java:124)
at com.azure.core.http.rest.RestProxy.invoke(RestProxy.java:95)
at jdk.proxy2/jdk.proxy2.$Proxy196.getChatCompletionsSync(Unknown Source)
at com.azure.ai.openai.implementation.OpenAIClientImpl.getChatCompletionsWithResponse(OpenAIClientImpl.java:1900)
at com.azure.ai.openai.OpenAIClient.getChatCompletionsWithResponse(OpenAIClient.java:350)
at com.azure.ai.openai.OpenAIClient.getChatCompletions(OpenAIClient.java:760)
at org.springframework.ai.azure.openai.AzureOpenAiChatModel.lambda$internalCall$1(AzureOpenAiChatModel.java:244)
at io.micrometer.observation.Observation.observe(Observation.java:565)
at org.springframework.ai.azure.openai.AzureOpenAiChatModel.internalCall(AzureOpenAiChatModel.java:240)
at org.springframework.ai.azure.openai.AzureOpenAiChatModel.call(AzureOpenAiChatModel.java:226)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultChatClientRequestSpec$1.aroundCall(DefaultChatClient.java:675)
at org.springframework.ai.chat.client.advisor.DefaultAroundAdvisorChain.lambda$nextAroundCall$1(DefaultAroundAdvisorChain.java:98)
at io.micrometer.observation.Observation.observe(Observation.java:565)
at org.springframework.ai.chat.client.advisor.DefaultAroundAdvisorChain.nextAroundCall(DefaultAroundAdvisorChain.java:98)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doGetChatResponse(DefaultChatClient.java:488)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.lambda$doGetObservableChatResponse$1(DefaultChatClient.java:477)
at io.micrometer.observation.Observation.observe(Observation.java:565)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doGetObservableChatResponse(DefaultChatClient.java:477)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doSingleWithBeanOutputConverter(DefaultChatClient.java:451)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.entity(DefaultChatClient.java:446)
markpollack commented on Apr 30, 2025
I believe we didn't add Spring Retry around the Azure OpenAI SDK because that SDK already has a built-in retry feature. I can't seem to find the docs on that. Could someone confirm?
markpollack commented on May 6, 2025
@mkheck any insight?
mkheck commented on May 6, 2025
@markpollack I'll dig into it and let you know. If there is no Azure-specific mechanism, I should be able to wrap it with Spring Retry. More news shortly.
mkheck commented on May 6, 2025
@markpollack Please go ahead and assign it to me. I'll look at the ones you've tagged me with for review and work through them.
iAMSagar44 commented on May 13, 2025
Hi @markpollack / @mkheck ,
There is already a feature in the azure-sdk-for-java that retries transient errors. I think the default retry count is 3.
Please check the RetryPolicy.class and ExponentialBackoff.class in the com.azure.core.http.policy package. I believe these are added to the HttpPipelinePolicy array in the createHttpPipeline() method in the OpenAIClientBuilder.class.
Here is an example of the retry occurring for a 429 error.
1st Request -
2025-05-13T17:35:59.381+10:00 INFO 10148 --- [docs-ai-assistant] [oundedElastic-8] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP request","method":"POST","url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","tryCount":1,"content-length":26643}
1st Response - Failed -
2025-05-13T17:35:59.428+10:00 INFO 10148 --- [docs-ai-assistant] [-http-kqueue-13] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP response","statusCode":429,"url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","durationMs":46,"content-length":440,"content-length":440,"body":"{"error":{"code":"429","message": "Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2025-01-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 51 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit. For Free Account customers, upgrade to Pay as you Go here: https://aka.ms/429TrialUpgrade.\"}}"}
Retry attempt 1 - 2nd Request -
2025-05-13T17:36:50.434+10:00 INFO 10148 --- [docs-ai-assistant] [ parallel-5] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP request","method":"POST","url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","tryCount":2,"content-length":26643}
Success Response -
2025-05-13T17:36:51.354+10:00 INFO 10148 --- [docs-ai-assistant] [-http-kqueue-15] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP response","statusCode":200,"url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","durationMs":921}
I did not have any custom retry mechanism in my code.
You can see these logs in your application by setting this environment variable:
export AZURE_HTTP_LOG_DETAIL_LEVEL=BODY
when using Azure OpenAI chat models, embedding models, or AI Search in your Spring AI application. Hope this analysis helps.
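The timing in the logs above (a 429 response at 17:35:59 and the retry at 17:36:50, matching the server's "Please retry after 51 seconds" message) suggests that azure-core honors the server's Retry-After value before falling back to exponential backoff. A JDK-only approximation of that delay selection is sketched below; the 800 ms base delay and 8 s cap are, to my understanding, azure-core's defaults, and the real ExponentialBackoff strategy also applies jitter, which this sketch omits.

```java
import java.time.Duration;

/** Rough approximation of azure-core's retry delay selection (sketch only; no jitter). */
class RetryDelaySketch {

    static final Duration BASE = Duration.ofMillis(800); // believed azure-core default base delay
    static final Duration MAX  = Duration.ofSeconds(8);  // believed azure-core default max delay

    /**
     * @param retryAfter server-provided Retry-After delay, or null if the header is absent
     * @param retryCount zero-based count of retries already attempted
     */
    static Duration delayFor(Duration retryAfter, int retryCount) {
        if (retryAfter != null) {
            return retryAfter; // honor the server's hint (e.g., the 51 s in the 429 log above)
        }
        long exp = BASE.toMillis() * (1L << retryCount); // base * 2^retryCount
        return Duration.ofMillis(Math.min(exp, MAX.toMillis()));
    }
}
```

This would explain why the retry in the logs happened roughly 51 seconds after the 429, rather than after the sub-second default backoff.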