toolCallStreaming is not functioning as expected with Bedrock or Google models. #5544

Open
DayuanJiang opened this issue Apr 4, 2025 · 5 comments
Labels: bug (Something isn't working)

Comments

@DayuanJiang
Description

While toolCallStreaming works correctly with openai("gpt-4o"), when I switch to google("gemini-2.0-flash-001") or bedrock('anthropic.claude-3-5-sonnet-20241022-v2:0'), toolInvocation.args does not stream; it appears to be output only after all tokens have been gathered.

However, regular messages stream without issue.

You can try at the demo site: https://next-ai-draw-io.vercel.app/
And here is the repo: https://github.com/DayuanJiang/next-ai-draw-io

Here is the code in my route.ts:

const result = streamText({
    // model: google("gemini-2.5-pro-exp-03-25"),
    // model: google("gemini-2.0-flash-001"),
    model: bedrock('anthropic.claude-3-5-sonnet-20241022-v2:0'),
    // model: openai("gpt-4o"),
    toolCallStreaming: true,
    messages: enhancedMessages,
    tools: {
      // Client-side tool that will be executed on the client
      display_diagram: {
        description: "...",
        parameters: z.object({
          xml: z.string().describe("XML string to be displayed on draw.io")
        })
      },
    },
    temperature: 0,
  });
  return result.toDataStreamResponse();
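For reference, the behavior described above can be illustrated with a plain TypeScript sketch. The chunk shape below is modeled loosely on the AI SDK's tool-call-delta stream parts and is illustrative only, not the SDK's actual types:

```typescript
// Illustrative chunk shape, loosely modeled on the AI SDK's
// tool-call-delta stream parts (not the real SDK types).
type ToolArgsChunk = { toolCallId: string; argsTextDelta: string };

// Accumulate streamed argument text per tool call, so a UI can
// render partial args as they arrive.
function accumulateArgs(chunks: ToolArgsChunk[]): Map<string, string> {
  const buffers = new Map<string, string>();
  for (const chunk of chunks) {
    const prev = buffers.get(chunk.toolCallId) ?? "";
    buffers.set(chunk.toolCallId, prev + chunk.argsTextDelta);
  }
  return buffers;
}

// With OpenAI-style streaming, args arrive as many small deltas,
// so the UI sees intermediate partial states:
const streamed = accumulateArgs([
  { toolCallId: "call_1", argsTextDelta: '{"xml":' },
  { toolCallId: "call_1", argsTextDelta: ' "<mxGraphModel/>"' },
  { toolCallId: "call_1", argsTextDelta: "}" },
]);

// With the Gemini/Bedrock behavior reported here, the same text
// arrives as one fully resolved chunk, so there is never a
// partial state to render:
const batched = accumulateArgs([
  { toolCallId: "call_1", argsTextDelta: '{"xml": "<mxGraphModel/>"}' },
]);

console.log(streamed.get("call_1") === batched.get("call_1")); // true
```

Either way the final args are identical; the difference is only whether intermediate partial states ever exist for the UI to display.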

And here is the client-side code that renders the tool info:

    const renderToolInvocation = (toolInvocation: any) => {
        const callId = toolInvocation.toolCallId;
        const { toolName, args, state } = toolInvocation;
        // Note: this runs on every render, including partial-call states
        handleDisplayChart(args?.xml);

        return (
            <div
                key={callId}
                className="p-4 my-2 text-gray-500 border border-gray-300 rounded"
            >
                <div className="flex flex-col gap-2">
                    <div className="text-xs">Tool: display_diagram</div>
                    {args && (
                        <div className="mt-1 font-mono text-xs overflow-hidden">
                            {typeof args === "object" &&
                                Object.keys(args).length > 0 &&
                                `Args: ${JSON.stringify(args, null, 2)}`}
                        </div>
                    )}
                    <div className="mt-2 text-sm">
                        {state === "partial-call" ? (
                            <div className="h-4 w-4 border-2 border-primary border-t-transparent rounded-full animate-spin" />
                        ) : state === "result" ? (
                            <div className="text-green-600">
                                Diagram generated
                            </div>
                        ) : null}
                    </div>
                </div>
            </div>
        );
    };


@DayuanJiang DayuanJiang added the bug Something isn't working label Apr 4, 2025
@techjason

Same issue here: Gemini tool calls always return in one batch, and the request can time out and simply abort if the tool call takes too long.

@techjason

Hey, I'm wondering whether you've managed to figure out tool streaming? Thanks in advance!

@iteratetograceness
Collaborator

Hi @DayuanJiang (your app looks awesome, btw!) and @techjason! It looks like for Gemini this is happening at the provider level: they send back a single chunk in which the tool args have already been completely resolved. I'm digging through the Gemini docs and discussions to see whether this is intended behavior.

For now: if it's crucial for your app to display the tool args streaming in, I would recommend sticking with openai, or testing generateText with stream-simulation middleware. I haven't tested the latter myself but can follow up here when I try.
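The stream-simulation idea can also be approximated on the client in plain TypeScript. The names below are illustrative and not the AI SDK's actual middleware API: once the fully resolved args arrive in one chunk, re-emit them in slices so the UI still animates a progressive render.

```typescript
// Illustrative sketch, not the AI SDK's middleware API: split a
// fully resolved args string into fixed-size slices to simulate
// streaming after the fact.
function* simulateArgStream(args: string, sliceSize = 8): Generator<string> {
  for (let i = 0; i < args.length; i += sliceSize) {
    yield args.slice(i, i + sliceSize);
  }
}

// Reassembling the slices recovers the original payload exactly.
const full = '{"xml": "<mxGraphModel/>"}';
let rebuilt = "";
for (const slice of simulateArgStream(full)) {
  rebuilt += slice; // a UI would re-render with `rebuilt` after each slice
}
console.log(rebuilt === full); // true
```

The trade-off: this only animates the display; the first byte still arrives no sooner than the provider's single batched chunk.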

@iteratetograceness
Collaborator

iteratetograceness commented Apr 14, 2025

I have yet to find explicit documentation that the args are always sent in a single chunk. It seems that when streaming, the structured JSON object is often returned as a single chunk; see this diagram and description here:

It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object.

But as for the following:

can timeout and simply aborts if tool call takes too long

@techjason can you further contextualize your issue? A code snippet would help to better understand what's happening (e.g. are you executing the tool client-side, or are you passing an execute to the server-side tool?).

@DayuanJiang
Copy link
Author

@iteratetograceness Thank you for your investigation. I had previously suspected the issue stemmed from the provider's side. Additionally, I think it's best to correct the documentation, as it currently shows all Google models as compatible with the Tool Streaming feature:

https://sdk.vercel.ai/providers/ai-sdk-providers/google-generative-ai#model-capabilities

(Screenshot: model capabilities table from the Google Generative AI provider documentation.)
