fix(kiro): fix image lost after tool call and OpenAI tool_calls format #597

Open: wuhua111 wants to merge 1 commit into justlovemaki:main from wuhua111:fix/kiro-image-lost-after-tool-call

@wuhua111

## Problem

When using Kiro provider with multimodal (image) input and tool calling:
1. Image is lost after tool call - AI says 'I cannot see the image' in the
   second LLM request (after tool execution)
2. OpenAI format tool_calls (content: null + tool_calls array) not handled,
   causing 500 error on second-round conversations

## Root Causes

### Image lost after tool call (3 locations)
1. System prompt handler merges first user message using getContentText(),
   which only extracts text - images in the first user message are dropped
2. History loop only handles Anthropic image format (type:'image'),
   ignores OpenAI format (type:'image_url')
3. The currentMessage handler has the same issue as item 2
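
The three locations above share one gap: only `type:'image'` parts are recognized. A minimal sketch of a shared extractor that accepts both formats follows. The helper name `extractImages` and the output shape `{ format, source: { bytes } }` are assumptions made for illustration; the PR text does not confirm what `userInputMessage.images` expects.

```javascript
// Hypothetical helper (not from the PR): collects images from content parts.
// The returned shape { format, source: { bytes } } is an assumed target format.
function extractImages(content) {
  if (!Array.isArray(content)) return [];
  const images = [];
  for (const part of content) {
    if (part?.type === 'image' && part.source?.data) {
      // Anthropic format: { type: 'image', source: { media_type, data } }
      const format = (part.source.media_type || 'image/png').split('/')[1];
      images.push({ format, source: { bytes: part.source.data } });
    } else if (part?.type === 'image_url') {
      // OpenAI format: { type: 'image_url', image_url: { url } }
      const url = typeof part.image_url === 'string' ? part.image_url : part.image_url?.url;
      const match = /^data:image\/([a-zA-Z0-9.+-]+);base64,(.+)$/s.exec(url || '');
      if (match) images.push({ format: match[1], source: { bytes: match[2] } });
      // Remote (non-data) URLs are skipped in this sketch.
    }
  }
  return images;
}
```

Calling the same helper from the system-prompt merge, the history loop, and the currentMessage handler would keep the three code paths from drifting apart again.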

### tool_calls 500 error
- LangChain sends assistant messages with content:null + tool_calls array
  (OpenAI format), but buildCodewhispererRequest only handled Anthropic
  tool_use blocks inside content array
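
To make the mismatch concrete, here is a rough sketch of converting an OpenAI-style assistant message into tool-use entries. The function name `toolCallsToToolUses` and the entry shape `{ toolUseId, name, input }` are assumptions for illustration, not the PR's actual code.

```javascript
// Hypothetical helper (not from the PR). The { toolUseId, name, input } shape
// is an assumption about what a Kiro toolUses entry looks like.
function toolCallsToToolUses(message) {
  const toolCalls = Array.isArray(message?.tool_calls) ? message.tool_calls : [];
  return toolCalls.map((tc) => {
    const args = tc.function?.arguments;
    let input = {};
    if (typeof args === 'string' && args.trim() !== '') {
      try {
        input = JSON.parse(args); // OpenAI sends arguments as a JSON string
      } catch {
        input = { raw_arguments: args }; // keep unparseable arguments visible
      }
    } else if (args && typeof args === 'object') {
      input = args; // already an object
    }
    return { toolUseId: tc.id, name: tc.function?.name, input };
  });
}
```

Because the function reads only `tool_calls`, a message with `content: null` no longer needs a content array to be processed.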

## Fix

1. System prompt + first user message merge: extract images from first user
   message and attach to userInputMessage.images
2. History loop user message: add image_url handler alongside image handler
3. currentMessage: add image_url handler alongside image handler
4. Assistant message in history: handle OpenAI tool_calls array, convert to
   Kiro toolUses format
5. _sanitizeToolInput: handle string input (parse JSON, fallback single-quote)

## Reproduction

```python
# test_img_tool.py - minimal reproduction
# KEY, KIRO_URL, and MODEL are placeholders; set them for your deployment.
from openai import OpenAI

client = OpenAI(api_key=KEY, base_url=KIRO_URL)
# The rest of the get_time schema is elided ('...') in the original report.
tools = [{'type':'function','function':{'name':'get_time',...}}]
messages = [{'role':'user','content':[
    {'type':'image_url','image_url':{'url':'data:image/png;base64,...'}},
    {'type':'text','text':'Call get_time then describe the image'}
]}]

# First request: AI calls tool, image present
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
tc = resp.choices[0].message.tool_calls[0]  # the tool call returned by the model

# Append tool result and send second request
messages.append(resp.choices[0].message)
messages.append({'role':'tool','tool_call_id':tc.id,'content':'2026-01-01'})
resp2 = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
# BUG: resp2 says 'I cannot see the image' even though messages[0] still has it
# FIXED: resp2 now correctly describes the image
```

## Test

- Test 1 (tools + image, tool not triggered): image recognized correctly
- Test 2 (tools + image, after the tool is triggered): image still recognized
  correctly after tool execution

@justlovemaki
Owner

@codex


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41177c26bb

const args = tc.function?.arguments;
if (typeof args === 'string') {
// Try to parse JSON, supporting the single-quote format
input = JSON.parse(args.replace(/'/g, '"'));

P1: Parse tool-call arguments before normalizing quotes

This path rewrites every single quote in tc.function.arguments before parsing, which breaks valid JSON that contains apostrophes inside string values (for example {"query":"what's the weather"}). In that case JSON.parse throws, the code falls back to { raw_arguments: ... }, and the downstream tool input no longer matches the expected schema, so tool execution can fail whenever arguments include apostrophes.
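
One way to address this, sketched under the assumption that single-quoted pseudo-JSON should remain supported as a fallback: try strict JSON first and rewrite quotes only if that fails. The function name `parseToolArguments` is hypothetical; the `raw_arguments` fallback mirrors the behavior described above.

```javascript
// Hypothetical helper (not from the PR): strict parse first, quote rewrite second.
function parseToolArguments(args) {
  if (typeof args !== 'string') return args ?? {};
  try {
    return JSON.parse(args); // valid JSON, apostrophes inside strings are safe
  } catch {
    // fall through to the lenient path
  }
  try {
    return JSON.parse(args.replace(/'/g, '"')); // single-quoted pseudo-JSON
  } catch {
    return { raw_arguments: args }; // last resort, same fallback as the existing code
  }
}
```

With this ordering, `{"query":"what's the weather"}` parses on the first attempt and never reaches the quote rewrite.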


// Debug: print the request body sent to Kiro
logger.info('[Kiro Debug] Request history length: ' + (requestData.conversationState?.history?.length || 0));
const historyStr = JSON.stringify(requestData.conversationState?.history || [], null, 2);

P2: Avoid serializing full history for debug logging

This unconditionally calls JSON.stringify on the entire history object before truncating the log output, so large conversations (especially those with base64 image payloads) still incur the full serialization CPU and memory cost on every streaming request. In multimodal or long-thread traffic this can materially increase latency and memory pressure even though only the first 3000 characters are logged.
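
A possible shape for a cheaper debug path, as a sketch only: log counts instead of the serialized payload, and skip the work unless debugging is switched on. The helper names, the `KIRO_DEBUG` flag, and the `assistantResponseMessage` field are assumptions for illustration; `conversationState.history` and `userInputMessage.images` come from the PR text.

```javascript
// Hypothetical helper: describes history with counters instead of stringifying it.
function summarizeHistory(history) {
  const entries = Array.isArray(history) ? history : [];
  let images = 0;
  let toolUses = 0;
  for (const entry of entries) {
    images += entry.userInputMessage?.images?.length ?? 0;
    // assistantResponseMessage is an assumed field name.
    toolUses += entry.assistantResponseMessage?.toolUses?.length ?? 0;
  }
  return `history=${entries.length} images=${images} toolUses=${toolUses}`;
}

// Only does any work when debugging is enabled. KIRO_DEBUG is a made-up flag.
function logHistoryDebug(logger, requestData, debugEnabled = process.env.KIRO_DEBUG === '1') {
  if (!debugEnabled) return false;
  logger.info('[Kiro Debug] ' + summarizeHistory(requestData.conversationState?.history));
  return true;
}
```

The summary never touches the base64 payloads, so its cost grows with the number of history entries rather than with image size.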

@justlovemaki
Owner

> System prompt + first user message merge: extract images from first user message and attach to userInputMessage.images

What if the image isn't in the first item?
The program has logic to retain images from the 5 most recent historical messages.

Regarding the other points, no one has reported any related issues.
