Python: Function Calling Error when Using APIM Managed Azure OpenAI #10182

Closed · sophialagerkranspandey opened this issue Jan 14, 2025 · 16 comments
Labels: bug (Something isn't working) · python (Pull requests for the Python Semantic Kernel) · question (Further information is requested)

@sophialagerkranspandey (Contributor)

Discussed in #10176

Originally posted by awonglk January 14, 2025
I've followed the article referenced in the issue below to try to get my Semantic Kernel app working with APIM-managed Azure OpenAI:
#7143

If no function calls are involved, the responses from the LLM work as expected.

But as soon as I ask a question that involves a plugin (even a core plugin such as TimePlugin, for example):

    import os

    from semantic_kernel.kernel import Kernel
    from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
    from semantic_kernel.core_plugins.time_plugin import TimePlugin

    kernel = Kernel()
    service_id = "function_calling"
    api_key = os.environ.get('AZURE_OPENAI_API_KEY')
    kernel.add_service(
        AzureChatCompletion(
            service_id=service_id,
            default_headers={'Ocp-Apim-SubscriptionKey': api_key}
        )
    )

    kernel.add_plugin(TimePlugin(), plugin_name="time")

This is what I get when asking a simple question like "What is the time?":

("<class 'semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion.AzureChatCompletion'> service failed to complete the prompt", BadRequestError('Error code: 400 - {'statusCode': 400, 'message': "Unable to parse and estimate tokens from incoming request. Please ensure incoming request does not contain any images and is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto'."}'))

Is there anything obvious that I may have missed?
Using semantic-kernel 1.16.0

@crickman crickman added bug Something isn't working question Further information is requested python Pull requests for the Python Semantic Kernel labels Jan 14, 2025
@github-actions github-actions bot changed the title Function Calling Error when Using APIM Managed Azure OpenAI Python: Function Calling Error when Using APIM Managed Azure OpenAI Jan 14, 2025
@moonbox3 (Contributor) commented Jan 14, 2025

A bad request means that something in the request body is formed in a way the service doesn't accept. Can you please give us some more details about the request being made to the model?
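
For example, turning up logging before running the script will capture the request/response pairs (a minimal sketch; "httpx" and "semantic_kernel" are the standard logger names those libraries emit under):

import logging

# Show every HTTP request httpx sends (method, URL, status code) plus
# Semantic Kernel's own diagnostics (token usage, tool-call processing).
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.INFO)
logging.getLogger("semantic_kernel").setLevel(logging.DEBUG)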

@moonbox3 moonbox3 self-assigned this Jan 14, 2025
@moonbox3 moonbox3 removed the triage label Jan 14, 2025
@awonglk commented Jan 15, 2025

I am able to reproduce the issue with the very simple function-calling script below.
It fails when using an APIM endpoint similar to: https://<our_apim_URL>/openai/deployments/chat/chat/completions?api-version=2024-08-01-preview
It works fine when using the regular Azure OpenAI endpoint.

from semantic_kernel.kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.connectors.ai.open_ai.prompt_execution_settings.open_ai_prompt_execution_settings import OpenAIChatPromptExecutionSettings
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.functions.kernel_arguments import KernelArguments
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.core_plugins.time_plugin import TimePlugin

async def main():
    try:
        # Initialize kernel
        kernel = Kernel()
        
        # Configure Azure OpenAI service
        service_id = "function_calling"
        deployment_name = "chat"  # Your deployment name
        endpoint = "<endpoint>"  # your endpoint
        api_key = "<key>" # your key

        
        # Add Azure OpenAI service
        kernel.add_service(
            AzureChatCompletion(
                service_id=service_id,
                deployment_name=deployment_name,
                endpoint=endpoint,
                api_key=api_key,
                default_headers={'Ocp-Apim-SubscriptionKey': api_key}
            )
        )
        
        # Add TimePlugin
        kernel.add_plugin(TimePlugin(), plugin_name="time")
        
        # Create chat function with system message
        chat_function = kernel.add_function(
            prompt="""
            You are a helpful assistant. Use the available functions when appropriate.
            
            {{$chat_history}}
            Human: {{$user_input}}
            Assistant: Let me help you with that.
            """,
            plugin_name="ChatBot",
            function_name="Chat",
        )
        
        # Configure execution settings with function calling
        execution_settings = OpenAIChatPromptExecutionSettings(
            service_id=service_id,
            temperature=0.7,
            max_tokens=1000,
            function_choice_behavior=FunctionChoiceBehavior.Auto(
                filters={"included_plugins": ["time"]},
            ),
        )
        
        # Initialize chat history
        chat_history = ChatHistory()
        
        # Test message that should trigger function calling
        user_input = "What's the current time?"
        
        # Create arguments for the chat function
        arguments = KernelArguments(
            settings=execution_settings,
            user_input=user_input,
            chat_history=chat_history,
        )
        
        # Invoke the chat function
        print("Sending request to LLM...")
        response = await kernel.invoke(
            chat_function,
            return_function_results=False,
            arguments=arguments,
        )
        
        print("\nResponse:", str(response))
        
    except Exception as e:
        print(f"Error: {str(e)}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Let me know if there is anything else I need to provide.

@moonbox3 (Contributor)

@awonglk commented Jan 15, 2025

The only thing I'm not following from that link is that I'm not using "OnBehalfOfCredential".
We're just using a plain api_key, as our APIM is set up for API key authentication.

Is using OnBehalfOfCredential auth a must?

@moonbox3 (Contributor)

Alright, I understand. Apologies for trying to lead you down the wrong path. I'll need to figure out how to replicate your issue.

For further context, have you followed this step? Pass the API key in API requests - set-backend-service policy.

@moonbox3 (Contributor)

@awonglk, based on this previous comment, where a user was experiencing a function-calling error using APIM in SK .NET: could it be that the model you're calling doesn't support function calling?

#9016 (comment)

@moonbox3 (Contributor)

Or this option as well: #8340 (comment)

AzureChatCompletion(
    api_key="<value of Ocp-Apim-SubscriptionKey>",
    endpoint=AZURE_OPENAI_ENDPOINT,  # your endpoint, which doesn't include `deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview`
    deployment_name=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,  # gpt-4o
    api_version='2024-02-15-preview',
    default_headers={'Ocp-Apim-SubscriptionKey': '<value>'},
)

@awonglk commented Jan 15, 2025

Regarding your three comments above:

  • [Pass the API key in API requests - set-backend-service policy]
    We are using managed identity for authentication between APIM and the backend LLM, instead of passing the LLM's API key.

  • LLM support for function calling
    We are using gpt-4o (2024-05-13), which supports parallel function calling.

  • Option for passing the Ocp-Apim-SubscriptionKey header
    I am already passing this in default_headers in the code example above. Whether I include it or not makes no difference: the behaviour is the same (when no function call is involved, it works, but when function calling is involved, the error occurs).

@moonbox3 (Contributor)

We are using gpt-4o (2024-05-13), which supports parallel function calling

Which Azure OpenAI API version are you using? For example, 2024-09-01-preview

I know APIM is a part of this, but the model is not failing with a 401/403, so it doesn't look like an auth issue.

The behaviour is the same (when no function call is involved, it works, but when function calling is involved, the error occurs)

In the "it works" case, are you including tools in your request? Or are you removing tools?

@awonglk commented Jan 16, 2025

Regarding the Azure OpenAI API version: I'm using 2024-08-01-preview
https://<azure_open_ai_url>/openai/deployments/chat/chat/completions?api-version=2024-08-01-preview

Regarding "it works" case, I'm still passing the list of tools (as per simple script above). The only difference is the question I ask.

"what is the time?" -> invokes Time_plugin which fails with the error
"how are you? " -> does not invoke any plugins. Query succeeds.

@fbinotto

I have also just updated the model to version 2024-08-06, with the same endpoint URL, and got the same results.

@fbinotto

This is the error log with debug enabled:

Tell me a joke
[2025-01-16 18:34:08 - httpx:1740 - INFO] HTTP Request: POST https://aiservices.xxxxxx.com/openaitest/openai/deployments/chat/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
[2025-01-16 18:34:08 - semantic_kernel.connectors.ai.open_ai.services.open_ai_handler:194 - INFO] OpenAI usage: CompletionUsage(completion_tokens=13, prompt_tokens=142, total_tokens=155, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
Why don't scientists trust atoms?

Because they make up everything!
User > what is the state of the lights?
[2025-01-16 18:35:11 - httpx:1740 - INFO] HTTP Request: POST https://aiservices.xxxxxxxx.com/openaitest/openai/deployments/chat/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
[2025-01-16 18:35:11 - semantic_kernel.connectors.ai.open_ai.services.open_ai_handler:194 - INFO] OpenAI usage: CompletionUsage(completion_tokens=13, prompt_tokens=170, total_tokens=183, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
[2025-01-16 18:35:11 - semantic_kernel.connectors.ai.chat_completion_client_base:157 - INFO] processing 1 tool calls in parallel.
[2025-01-16 18:35:11 - semantic_kernel.kernel:378 - INFO] Calling Lights-get_lights function with args: {}
[2025-01-16 18:35:11 - semantic_kernel.functions.kernel_function:19 - INFO] Function Lights-get_lights invoking.
[2025-01-16 18:35:11 - semantic_kernel.functions.kernel_function:29 - INFO] Function Lights-get_lights succeeded.
[2025-01-16 18:35:11 - semantic_kernel.functions.kernel_function:53 - INFO] Function completed. Duration: 0.000696s
[2025-01-16 18:35:11 - httpx:1740 - INFO] HTTP Request: POST https://aiservices.xxxxxxxxxxx.com/openaitest/openai/deployments/chat/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 400 Bad Request"
Raw response: ("<class 'semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion.AzureChatCompletion'> service failed to complete the prompt", BadRequestError('Error code: 400 - {'statusCode': 400, 'message': "Unable to parse and estimate tokens from incoming request. Please ensure incoming request does not contain any images and is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto'."}'))
Traceback (most recent call last):
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\semantic_kernel\connectors\ai\open_ai\services\open_ai_handler.py", line 87, in _send_completion_request
response = await self.client.chat.completions.create(**settings_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\resources\chat\completions.py", line 1661, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai_base_client.py", line 1843, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai_base_client.py", line 1537, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai_base_client.py", line 1638, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'statusCode': 400, 'message': "Unable to parse and estimate tokens from incoming request. Please ensure incoming request does not contain any images and is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto'."}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "c:\Users\xxxxxxxxx\repos\sk\main.py", line 141, in
asyncio.run(main())
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxxx\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "c:\Users\xxxxxxx\repos\sk\main.py", line 88, in main
raise ex
File "c:\Users\xxxxxxx\repos\sk\main.py", line 80, in main
result = (await chat_completion.get_chat_message_contents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\semantic_kernel\connectors\ai\chat_completion_client_base.py", line 147, in get_chat_message_contents
completions = await self._inner_get_chat_message_contents(chat_history, settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\semantic_kernel\utils\telemetry\model_diagnostics\decorators.py", line 83, in wrapper_decorator
return await completion_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\semantic_kernel\connectors\ai\open_ai\services\open_ai_chat_completion_base.py", line 88, in _inner_get_chat_message_contents
response = await self._send_request(settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\semantic_kernel\connectors\ai\open_ai\services\open_ai_handler.py", line 59, in _send_request
return await self._send_completion_request(settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxxxxxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\semantic_kernel\connectors\ai\open_ai\services\open_ai_handler.py", line 99, in _send_completion_request
raise ServiceResponseException(
semantic_kernel.exceptions.service_exceptions.ServiceResponseException: ("<class 'semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion.AzureChatCompletion'> service failed to complete the prompt", BadRequestError('Error code: 400 - {'statusCode': 400, 'message': "Unable to parse and estimate tokens from incoming request. Please ensure incoming request does not contain any images and is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto'."}'))

@moonbox3 (Contributor)

Hi @awonglk. @fbinotto and I have been working together to track down what could be going on. I recently created an APIM resource and hooked it up to my AOAI instance, and am able to perform function calling. @fbinotto was hitting the same error, as you can see above, and we were able to get him up and running with function calling (we're still trying to determine exactly what we changed on his side to fix things).

I can give a high-level overview of what I did, since I didn't have to do much to get things working:

I created the APIM resource in the portal, and then created a new Azure OpenAI Service API based on the March preview OpenAPI spec. I checked the box to "improve SDK compatibility", which adds the display name to the base URL.

[Image: the Azure OpenAI Service API configuration in the APIM portal]

There are no secrets here, so I will share this image.

Then, in Semantic Kernel, I am doing the following:

chat = AzureChatCompletion(
    service_id="apim",
    endpoint="https://apim-ev3.azure-api.net/evan-apim/",
    deployment_name="gpt-4o",
    api_version="2024-12-01-preview",
    # default_headers={"api-key": "my APIM key"},
    api_key="my APIM key",
)

One other configuration change I made was to add my APIM managed identity to the AOAI resource's access control. But because I am using the "all access" APIM API key, I don't think this step is necessary.

When I run a simple example that leverages the TimePlugin, I get the proper function result content passed to the model, and its response:

User:> What is the current hour?
Mosscap:> The current hour is 15, which is 3 PM for those more inclined to conventional timekeeping. Is there anything else you might need assistance with during this fine hour?
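
For reference, here is a minimal sketch of the kind of chat loop that produced the output above (assuming the AzureChatCompletion configuration shown earlier; the endpoint and key values are placeholders):

import asyncio

from semantic_kernel.kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.connectors.ai.open_ai.prompt_execution_settings.open_ai_prompt_execution_settings import OpenAIChatPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.core_plugins.time_plugin import TimePlugin

async def main():
    kernel = Kernel()
    chat = AzureChatCompletion(
        service_id="apim",
        endpoint="https://<your_apim>.azure-api.net/<api_path>/",  # placeholder
        deployment_name="gpt-4o",
        api_version="2024-12-01-preview",
        api_key="<your APIM key>",  # placeholder
    )
    kernel.add_service(chat)
    kernel.add_plugin(TimePlugin(), plugin_name="time")

    # Let the model decide when to call the time plugin's functions.
    settings = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    history = ChatHistory()
    history.add_user_message("What is the current hour?")

    # Passing the kernel lets SK auto-invoke the chosen function and send
    # the result back to the model in a follow-up request.
    result = await chat.get_chat_message_content(
        chat_history=history, settings=settings, kernel=kernel
    )
    print(f"Mosscap:> {result}")

if __name__ == "__main__":
    asyncio.run(main())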

@moonbox3 (Contributor)

@awonglk the issue is occurring due to the configured policy in APIM. @fbinotto will respond tomorrow morning (Australia time) with the true root cause. We'll get you unblocked shortly, if you aren't able to resolve it first.

@fbinotto

Hi @awonglk, @moonbox3,

The issue was related to an APIM policy, more specifically azure-openai-token-limit. We had different policies applied to different products.

Basically, the attribute estimate-prompt-tokens on that policy has to be set to false. If it is set to true, APIM throws the 400 error stating that it is unable to parse and estimate tokens from the incoming request.

When that attribute is set to true, the number of tokens is estimated based on the prompt schema in the API. When it is set to false, tokens are calculated based on the actual token usage from the model's response.
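
For illustration, the fix lives in the policy XML; a sketch of the relevant fragment (attribute names per the documented azure-openai-token-limit policy; the counter key and limit are placeholders, not our actual configuration):

<!-- Sketch only: counter-key and tokens-per-minute are placeholder values. -->
<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="10000"
    estimate-prompt-tokens="false" />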

@moonbox3 (Contributor)

Hi @awonglk, both @fbinotto and I are able to use function calling with the APIM resource. Have a look at our latest responses, especially @fbinotto's, which shows the underlying root cause. Since we have provided the answer, I will close this issue. Please ping back if extra help is needed.
