Problem Summary
When using Claude Code with Gemini models via CLIProxyAPI, tool calls fail because the finish_reason is incorrectly overwritten during streaming responses.
What Happens
- Chunk 1: Contains `functionCall` → should result in `finish_reason: "tool_calls"`
- Chunk 2: Contains `finishReason: "STOP"` + usage metadata → overwrites to `finish_reason: "stop"` ❌
Claude Code (and other clients) relies on `finish_reason: "tool_calls"` to detect when the model wants to use a tool. When it sees `"stop"`, it assumes the conversation ended normally, which breaks the tool call flow.
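For concreteness, here is a minimal sketch of the two upstream chunk shapes involved, trimmed down to the fields that matter here. The exact payloads and the `functionCall` part path are assumptions for illustration; the `finishReason` and `usageMetadata` paths are the ones the translator reads.
```go
package main

import (
    "fmt"

    "github.com/tidwall/gjson"
)

func main() {
    // Hypothetical, trimmed-down upstream chunks; real payloads carry more fields.
    chunk1 := []byte(`{"response":{"candidates":[{"content":{"parts":[{"functionCall":{"name":"list_files","args":{"path":"."}}}]}}]}}`)
    chunk2 := []byte(`{"response":{"candidates":[{"finishReason":"STOP"}],"usageMetadata":{"totalTokenCount":42}}}`)

    // Chunk 1 carries the tool call but no finishReason or usage metadata...
    fmt.Println(gjson.GetBytes(chunk1, "response.candidates.0.content.parts.0.functionCall").Exists()) // true
    // ...while chunk 2 carries finishReason and usage metadata, but no functionCall.
    fmt.Println(gjson.GetBytes(chunk2, "response.candidates.0.finishReason").String()) // STOP
    fmt.Println(gjson.GetBytes(chunk2, "response.usageMetadata").Exists())             // true
}
```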
Reproduction
```bash
curl -X POST http://your-server:8317/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-preview",
    "messages": [{"role": "user", "content": "List files in current directory"}],
    "tools": [{"type": "function", "function": {"name": "list_files", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}}}],
    "stream": true
  }'
```
Expected: Final chunk has `"finish_reason": "tool_calls"`
Actual: Final chunk has `"finish_reason": "stop"`
Root Cause Analysis
In `internal/translator/antigravity/openai/chat-completions/antigravity_openai_response.go`, the `hasFunctionCall` variable is local to each chunk; no state persists across chunks.
When chunk 2 arrives with `finishReason: "STOP"` but no `functionCall`, the code sets `finish_reason` based only on the current chunk, overwriting the correct value.
Current problematic flow (sketched in code below):
- Chunk 1: `functionCall` present → `hasFunctionCall=true` → sets `finish_reason="tool_calls"`
- Chunk 2: no `functionCall` → `hasFunctionCall=false` → sets `finish_reason="stop"` (WRONG!)
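To make the stateless pattern concrete, here is a simplified Go sketch of the per-chunk logic described above. It is not the literal source: the gjson path used to detect the function call is an assumption, and `rawJSON []byte` / `template string` come from the enclosing translator function, as in the snippets further down.
```go
// Simplified sketch of the buggy flow. hasFunctionCall is recomputed
// from scratch for every chunk, so by the time chunk 2 arrives, the
// tool call seen in chunk 1 has been forgotten.
hasFunctionCall := gjson.GetBytes(rawJSON, "response.candidates.0.content.parts.0.functionCall").Exists()

if finishReasonResult := gjson.GetBytes(rawJSON, "response.candidates.0.finishReason"); finishReasonResult.Exists() {
    if hasFunctionCall {
        template, _ = sjson.Set(template, "choices.0.finish_reason", "tool_calls")
    } else {
        // Chunk 2 lands here: finishReason is "STOP" and there is no
        // functionCall, so the earlier "tool_calls" value is overwritten.
        template, _ = sjson.Set(template, "choices.0.finish_reason", strings.ToLower(finishReasonResult.String()))
    }
}
```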
Proposed Solution
Add explicit state tracking that persists across chunks:
1. Update the state struct (lines 22-28)
```go
type convertCliResponseToOpenAIChatParams struct {
    UnixTimestamp        int64
    FunctionIndex        int
    SawToolCall          bool   // NEW: Tracks if any tool call was seen in the entire stream
    UpstreamFinishReason string // NEW: Caches the upstream finish reason for the final chunk
}
```
2. Cache `finish_reason` instead of setting it immediately (lines 83-86)
Before:
```go
if finishReasonResult := gjson.GetBytes(rawJSON, "response.candidates.0.finishReason"); finishReasonResult.Exists() {
    template, _ = sjson.Set(template, "choices.0.finish_reason", strings.ToLower(finishReasonResult.String()))
    template, _ = sjson.Set(template, "choices.0.native_finish_reason", strings.ToLower(finishReasonResult.String()))
}
```
After:
```go
// Cache the finish reason - do NOT set it in output yet (will be set on final chunk)
if finishReasonResult := gjson.GetBytes(rawJSON, "response.candidates.0.finishReason"); finishReasonResult.Exists() {
    (*param).(*convertCliResponseToOpenAIChatParams).UpstreamFinishReason = strings.ToUpper(finishReasonResult.String())
}
```
The reason is cached uppercase so the final-chunk comparison against `"MAX_TOKENS"` in step 4 matches regardless of upstream casing.
3. Persist tool call detection across chunks (line 141)
When a function call is detected, set the persistent flag:
```go
(*param).(*convertCliResponseToOpenAIChatParams).SawToolCall = true
```
4. Only emit `finish_reason` on the final chunk (new logic at end of function)
```go
// Determine finish_reason only on the final chunk (has both finishReason and usage metadata)
params := (*param).(*convertCliResponseToOpenAIChatParams)
upstreamFinishReason := params.UpstreamFinishReason
sawToolCall := params.SawToolCall
usageExists := gjson.GetBytes(rawJSON, "response.usageMetadata").Exists()
isFinalChunk := upstreamFinishReason != "" && usageExists
if isFinalChunk {
    var finishReason string
    if sawToolCall {
        finishReason = "tool_calls"
    } else if upstreamFinishReason == "MAX_TOKENS" {
        finishReason = "max_tokens"
    } else {
        finishReason = "stop"
    }
    template, _ = sjson.Set(template, "choices.0.finish_reason", finishReason)
    template, _ = sjson.Set(template, "choices.0.native_finish_reason", strings.ToLower(upstreamFinishReason))
}
```
Priority Logic
| Condition | finish_reason |
|---|---|
| Any tool call seen in stream | "tool_calls" |
| No tool call + MAX_TOKENS | "max_tokens" |
| No tool call + STOP | "stop" |
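If it helps review, the table above can also be read as a small pure function. This is only an illustrative sketch; `resolveFinishReason` is a hypothetical name, and the proposed fix inlines the same logic at the end of the translator function instead:
```go
// resolveFinishReason maps the persisted stream state onto the OpenAI-style
// finish_reason, following the priority table above.
func resolveFinishReason(sawToolCall bool, upstreamFinishReason string) string {
    switch {
    case sawToolCall:
        return "tool_calls" // any tool call seen anywhere in the stream wins
    case upstreamFinishReason == "MAX_TOKENS":
        return "max_tokens"
    default:
        return "stop"
    }
}
```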
Unit Tests
We wrote 5 unit tests that validate the fix:
- `TestFinishReasonToolCallsNotOverwritten` - Tool call in chunk 1, STOP in chunk 2 → `"tool_calls"`
- `TestFinishReasonStopForNormalText` - Normal text only → `"stop"`
- `TestFinishReasonMaxTokens` - MAX_TOKENS without tool calls → `"max_tokens"`
- `TestToolCallTakesPriorityOverMaxTokens` - Tool call + MAX_TOKENS → `"tool_calls"`
- `TestNoFinishReasonOnIntermediateChunks` - Intermediate chunks have null `finish_reason`
All tests pass locally.
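For illustration, here is roughly how the priority cases could be covered with a table-driven test against the hypothetical `resolveFinishReason` helper sketched above (it belongs in a `_test.go` file with `"testing"` imported; the actual test file exercises the full streaming translator instead):
```go
func TestResolveFinishReason(t *testing.T) {
    cases := []struct {
        name        string
        sawToolCall bool
        upstream    string
        want        string
    }{
        {"tool call beats STOP", true, "STOP", "tool_calls"},
        {"tool call beats MAX_TOKENS", true, "MAX_TOKENS", "tool_calls"},
        {"plain stop", false, "STOP", "stop"},
        {"truncated without tool call", false, "MAX_TOKENS", "max_tokens"},
    }
    for _, tc := range cases {
        if got := resolveFinishReason(tc.sawToolCall, tc.upstream); got != tc.want {
            t.Errorf("%s: got %q, want %q", tc.name, got, tc.want)
        }
    }
}
```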
Related PR
We attempted to submit this as PR #874, but it was blocked by the translator-path-guard CI check. The fix is ready and tested; we're asking the maintenance team to apply these changes.
Files to Modify
- `internal/translator/antigravity/openai/chat-completions/antigravity_openai_response.go`
- `internal/translator/antigravity/openai/chat-completions/antigravity_openai_response_test.go` (new file with tests)
Verification
After applying the fix, this test should pass:
```bash
curl -s -X POST http://localhost:8317/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "List files"}],
    "tools": [{"type": "function", "function": {"name": "list_files", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}}}],
    "stream": true
  }' | grep finish_reason
```
Expected output should show `"finish_reason":"tool_calls"` on the final chunk.
Impact
This bug affects all users of Claude Code (and similar clients) using Gemini models through CLIProxyAPI. Tool calls appear to fail or get stuck because the client never sees `finish_reason: "tool_calls"`.
Thank you for your time reviewing this issue!