Fix Merge tool results into single message to avoid per-user rate-limit interruption #184
Open
aalinyu wants to merge 3 commits into
Two fixes for the WeChat channel outbound delivery:
1. sendMessage now parses the API response body and throws on non-zero ret/errcode, catching silent failures where the HTTP request succeeds but the WeChat backend returns an error.
2. Tool result payloads sent as chat-visible messages are truncated to 120 chars (non-error only). Previously, web_search results of 5000+ chars were echoed verbatim, flooding the chat and pushing the actual AI reply out of view.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… limiting
Instead of sending each tool result as an individual WeChat message (which triggers rate limiting after ~10 rapid-fire sends), buffer all tool results and merge them into one summary message. Non-tool messages (final conclusions, etc.) are still sent individually.
Also removes the per-delivery retry and throttle mechanisms that were added earlier: with only ~2 messages per turn instead of ~10, rate limiting is no longer a concern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
I am having a similar issue: if a request triggers 2-3 tool calls, the final summary message is delivered to WeChat successfully. But if a request triggers ~10 tool calls, the final summary message is NOT delivered.
Summary
Buffer and merge per-tool-result messages into a single delivery to stay
within the WeChat API per-user rate limit, preventing session interruption.
Problem
The WeChat Bot API enforces a per-user rate limit: when the same user
receives more than 10 messages within a 60-second window, the API
rejects further sends and the session is interrupted.
This was triggered during AI agent turns that involved multiple tool
calls. Each tool result (search, browse, compute, etc.) was dispatched
as an individual sendMessage API call. A typical complex query could
produce 8–10 tool results in rapid succession, exhausting the per-minute
budget before the final AI conclusion could be delivered.
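The budget arithmetic can be sketched as a sliding-window check. This is only an illustration: the function name and the assumption that the limit is counted over a rolling 60-second window are not taken from the PR, and the server-side accounting may differ.

```typescript
// Hypothetical helper: returns true if sending one more message at `nowMs`
// would push the user past `limit` messages within the last `windowMs`.
function wouldExceedBudget(
  sentAtMs: number[],
  nowMs: number,
  limit: number = 10,
  windowMs: number = 60000,
): boolean {
  // Count only the sends that still fall inside the rolling window.
  const recent = sentAtMs.filter((t) => nowMs - t < windowMs);
  return recent.length + 1 > limit;
}

// Ten tool results sent one second apart, then the final conclusion.
const perToolSends = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].map((s) => s * 1000);
const conclusionBlocked = wouldExceedBudget(perToolSends, 10000);

// One merged summary, then the final conclusion.
const mergedSends = [9000];
const conclusionBlockedAfterMerge = wouldExceedBudget(mergedSends, 10000);
```

Under these assumptions the conclusion is the eleventh message in the per-tool case and would be rejected, while in the merged case it is only the second.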
Two additional issues compounded the problem:
1. API errors were silently ignored: the sendMessage function
previously discarded the API response body, so rate-limit rejections
went undetected. Response validation (added in an earlier commit)
surfaced the errors but couldn't prevent them.
2. Dispatch aborted on failure: the deliver callback re-threw any
send error, causing the framework's reply dispatcher to stop entirely.
The final AI conclusion (which always arrives after all tool results)
was frequently lost.
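A minimal sketch of the kind of response validation described in the first item. The interface shape, the `errmsg` field, and the helper name are assumptions; the PR only states that non-zero ret/errcode values are treated as failures.

```typescript
// Hypothetical response shape: only `ret` and `errcode` are named in the PR.
interface WeChatApiResponse {
  ret?: number;
  errcode?: number;
  errmsg?: string; // assumed field, used only for the error text
}

// Throws when the parsed body reports a non-zero `ret` or `errcode`,
// even though the HTTP request itself succeeded.
function assertApiSuccess(body: WeChatApiResponse): void {
  const code = body.ret || body.errcode || 0;
  if (code !== 0) {
    throw new Error(`WeChat API error ${code}: ${body.errmsg ?? "unknown"}`);
  }
}
```

Validation like this makes a rate-limit rejection visible to the caller, but as noted above it cannot stop the rejection from happening.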
Per-message retry and throttle were attempted but only added latency
without addressing the root cause: the per-user, per-minute message
budget was being consumed by individual tool result echoes.
Solution
Intercept and merge tool results into a single summary message,
eliminating the burst pattern that triggers rate limiting.
The deliver callback now checks deliveryCtx.kind:
- tool result → buffered (no API call).
- any other kind → flush the buffer as a merged summary, then send this message.
- end of dispatch (finally block) → final flush for any remaining buffered results.
The merged message is formatted as a step list:
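⚙ Execution steps (3 steps):
Search: Chongqing fourth-generation housing prices over the past three years…
Browse: fetching web page content…
Compute: price analysis complete…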
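The buffering and merging logic might look roughly like the sketch below. It is not the PR's code: the class name, the "tool" kind value, the English header text, and the reuse of the 120-character truncation limit per step are all assumptions made for illustration. To keep the sketch self-contained, it returns the messages that should be sent rather than calling the API.

```typescript
const MAX_STEP_CHARS = 120; // assumed: mirrors the truncation limit from the earlier commit

// Shortens one step so a long tool result cannot flood the summary.
function truncateStep(text: string, max: number = MAX_STEP_CHARS): string {
  return text.length > max ? text.slice(0, max) + "…" : text;
}

// Formats buffered tool results as a single step-list message.
function formatSteps(steps: string[]): string {
  const header = `⚙ Execution steps (${steps.length} steps):`;
  return [header].concat(steps.map((s) => truncateStep(s))).join("\n");
}

// Hypothetical merger: decides which messages actually go out.
class ToolResultMerger {
  private steps: string[] = [];

  // Returns the messages to send for this delivery (possibly none).
  accept(text: string, kind: string): string[] {
    if (kind === "tool") {
      this.steps.push(text); // buffer only, no API call
      return [];
    }
    // Non-tool message: flush the merged summary first, then this message.
    return this.flush().concat(text);
  }

  // Returns the merged summary if anything is buffered, and clears the buffer.
  flush(): string[] {
    if (this.steps.length === 0) return [];
    const merged = formatSteps(this.steps);
    this.steps = [];
    return [merged];
  }
}
```

In this sketch a turn with ten tool results followed by a conclusion yields two outgoing messages. A caller would invoke `flush()` once more in its finally block to cover turns that end without a non-tool message.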
Other changes:
- Removed the per-delivery retry and throttle mechanisms from send.ts: with only ~2 messages per
turn instead of ~10, the per-minute budget is no longer a concern.
- Send errors are now handled
without aborting the dispatch, so one bad message doesn't block the
rest.
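One way to keep a failed send from stopping the dispatcher is to catch the error at the call site and report it instead of re-throwing. The helper below is a hypothetical sketch; its name, signature, and default logging are assumptions and not taken from the PR.

```typescript
// Hypothetical helper: runs one send and reports success instead of re-throwing.
async function safeSend(
  send: () => Promise<void>,
  onError: (err: unknown) => void = (err) => console.error("send failed:", err),
): Promise<boolean> {
  try {
    await send();
    return true;
  } catch (err) {
    onError(err); // record the failure, but let the dispatcher continue
    return false;
  }
}
```

A deliver callback built on this would resolve normally even when one message is rejected, so later messages such as the final conclusion still get their chance to be sent.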
Effect