fix(kiro): fix image lost after tool call and OpenAI tool_calls format #597
wuhua111 wants to merge 1 commit into
## Problem

When using the Kiro provider with multimodal (image) input and tool calling:

1. The image is lost after a tool call: the AI says 'I cannot see the image' in the second LLM request (after tool execution).
2. OpenAI-format tool_calls (content: null + tool_calls array) are not handled, causing a 500 error on second-round conversations.

## Root Causes

### Image lost after tool call (3 locations)

1. The system prompt handler merges the first user message using getContentText(), which only extracts text, so images in the first user message are dropped.
2. The history loop only handles the Anthropic image format (type:'image') and ignores the OpenAI format (type:'image_url').
3. The currentMessage handler has the same issue as item 2.

### tool_calls 500 error

- LangChain sends assistant messages with content:null + a tool_calls array (OpenAI format), but buildCodewhispererRequest only handled Anthropic tool_use blocks inside the content array.

## Fix

1. System prompt + first user message merge: extract images from the first user message and attach them to userInputMessage.images.
2. History loop user message: add an image_url handler alongside the image handler.
3. currentMessage: add an image_url handler alongside the image handler.
4. Assistant message in history: handle the OpenAI tool_calls array and convert it to the Kiro toolUses format.
5. _sanitizeToolInput: handle string input (parse JSON, fall back to single-quote normalization).

## Reproduction

```python
# test_img_tool.py - minimal reproduction
from openai import OpenAI

client = OpenAI(api_key=KEY, base_url=KIRO_URL)
tools = [{'type':'function','function':{'name':'get_time',...}}]
messages = [{'role':'user','content':[
    {'type':'image_url','image_url':{'url':'data:image/png;base64,...'}},
    {'type':'text','text':'Call get_time then describe the image'}
]}]

# First request: AI calls tool, image present
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)

# Append tool result and send second request
tc = resp.choices[0].message.tool_calls[0]  # the get_time call returned by the first request
messages.append(resp.choices[0].message)
messages.append({'role':'tool','tool_call_id':tc.id,'content':'2026-01-01'})
resp2 = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)

# BUG: resp2 says 'I cannot see the image' even though messages[0] still has it
# FIXED: resp2 now correctly describes the image
```

## Test

- Test 1 (tools + image, tool not triggered): image recognized correctly
- Test 2 (tools + image, after the tool is triggered): image still recognized correctly after tool execution
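Items 1 to 3 of the fix list all come down to recognizing both image part shapes wherever a user message is converted. A minimal sketch of such a helper follows. The name `extractImages` is hypothetical and not taken from the PR, and the output shape `{ format, source: { bytes } }` is an assumption about what `userInputMessage.images` expects, not something the PR text confirms.

```javascript
// Hypothetical helper (not from the PR): collect images from a message's
// content parts, accepting both Anthropic-style and OpenAI-style parts.
// ASSUMPTION: Kiro expects entries shaped like { format, source: { bytes } }.
function extractImages(content) {
  if (!Array.isArray(content)) return []; // plain-string content has no images
  const images = [];
  for (const part of content) {
    if (part?.type === 'image' && part.source?.data) {
      // Anthropic: { type: 'image', source: { media_type, data } }
      const format = (part.source.media_type || 'image/png').split('/')[1];
      images.push({ format, source: { bytes: part.source.data } });
    } else if (part?.type === 'image_url') {
      // OpenAI: { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } }
      const url = typeof part.image_url === 'string' ? part.image_url : part.image_url?.url;
      const match = /^data:image\/([a-zA-Z0-9.+-]+);base64,(.+)$/s.exec(url || '');
      // Only inline base64 data URLs are handled; remote URLs are skipped here.
      if (match) images.push({ format: match[1], source: { bytes: match[2] } });
    }
  }
  return images;
}
```

Calling the same helper from the system-prompt merge, the history loop, and the currentMessage handler would keep the three locations from drifting apart again.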
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 41177c26bb

    const args = tc.function?.arguments;
    if (typeof args === 'string') {
      // Try to parse JSON; supports single-quote format
      input = JSON.parse(args.replace(/'/g, '"'));

Parse tool-call arguments before normalizing quotes
This path rewrites every single quote in tc.function.arguments before parsing, which breaks valid JSON that contains apostrophes inside string values (for example {"query":"what's the weather"}). In that case JSON.parse throws, the code falls back to { raw_arguments: ... }, and the downstream tool input no longer matches the expected schema, so tool execution can fail whenever arguments include apostrophes.
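One way to address this is to try the arguments as-is first and only normalize quotes when strict parsing fails. The sketch below assumes a standalone helper named `parseToolArguments`, which is hypothetical; the `raw_arguments` fallback key is taken from the review comment above.

```javascript
// Hypothetical helper (not from the PR): parse OpenAI-style tool-call
// arguments, trying strict JSON before any quote normalization.
function parseToolArguments(args) {
  if (args === null || args === undefined || args === '') return {};
  if (typeof args !== 'string') return args; // already a parsed value
  try {
    // Valid JSON (including apostrophes inside string values) parses here.
    return JSON.parse(args);
  } catch {
    // Not strict JSON; fall through to the single-quote fallback.
  }
  try {
    // Fallback for single-quoted pseudo-JSON such as {'a': 1}.
    return JSON.parse(args.replace(/'/g, '"'));
  } catch {
    // Preserve the original string so nothing is silently lost.
    return { raw_arguments: args };
  }
}
```

With this ordering, `{"query":"what's the weather"}` is parsed by the first attempt and never reaches the quote rewrite. Single-quoted input that itself contains apostrophes can still fail the fallback and end up as `raw_arguments`.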

    // Debug: print the request body sent to Kiro
    logger.info('[Kiro Debug] Request history length: ' + (requestData.conversationState?.history?.length || 0));
    const historyStr = JSON.stringify(requestData.conversationState?.history || [], null, 2);

Avoid serializing full history for debug logging
This unconditionally calls JSON.stringify on the entire history object before truncating the log output, so large conversations (especially with base64 image payloads) still incur the full serialization CPU/memory cost on every streaming request. In multimodal or long-thread traffic this can materially increase latency and memory pressure even though only the first 3000 characters are logged.
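A cheaper alternative is to log a small summary behind a debug switch, so base64 payloads are never serialized. The sketch below is an illustration only: `summarizeHistory`, `logHistoryDebug`, and the `debugEnabled` flag are hypothetical names, and the `assistantResponseMessage` field is an assumption about the history entry shape (only `userInputMessage` appears in the PR text).

```javascript
// Hypothetical helpers (not from the PR): summarize the history without
// serializing it, and only log when debugging is explicitly enabled.
function summarizeHistory(history) {
  const entries = Array.isArray(history) ? history : [];
  let imageCount = 0;
  let textChars = 0;
  for (const entry of entries) {
    const user = entry?.userInputMessage;
    // ASSUMPTION: assistant turns are stored under assistantResponseMessage.
    const assistant = entry?.assistantResponseMessage;
    imageCount += user?.images?.length || 0;
    textChars += (user?.content?.length || 0) + (assistant?.content?.length || 0);
  }
  return { length: entries.length, imageCount, textChars };
}

function logHistoryDebug(logger, requestData, debugEnabled) {
  if (!debugEnabled) return; // skip all work on the normal request path
  const summary = summarizeHistory(requestData?.conversationState?.history);
  // Only the small summary object is stringified, never the image bytes.
  logger.info('[Kiro Debug] Request history summary: ' + JSON.stringify(summary));
}
```

The summary costs one pass over the history entries and reads only lengths, so its cost does not grow with the size of the attached images.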
> System prompt + first user message merge: extract images from first user message and attach to userInputMessage.images

What if the image isn't in the first item? Regarding the other points, no one has reported any related issues.