
Fix Merge tool results into single message to avoid per-user rate-limit interruption #184

Open
aalinyu wants to merge 3 commits into Tencent:main from aalinyu:fix/merge-tool-results-single-message


aalinyu commented May 28, 2026


Summary

Buffer and merge per-tool-result messages into a single delivery to stay
within the WeChat API per-user rate limit, preventing session interruption.

Problem

The WeChat Bot API enforces a per-user rate limit: when the same user
receives more than 10 messages within a 60-second window, the API
rejects further sends and the session is interrupted.
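The limit described above can be modeled as a per-user sliding window. The TypeScript below is a minimal sketch. It assumes the backend counts sends per user over any trailing 60-second window, although the PR does not say whether the window is sliding or fixed. `PerUserWindow`, `MAX_SENDS` and `WINDOW_MS` are hypothetical names and are not part of the WeChat API.

```typescript
// Sketch of a per-user sliding-window counter. ASSUMPTION: the backend rejects
// a send once the user has already received MAX_SENDS messages in the trailing
// WINDOW_MS; the real WeChat bookkeeping (sliding vs fixed window) is unknown.
const MAX_SENDS = 10;
const WINDOW_MS = 60_000;

class PerUserWindow {
  // userId -> timestamps (ms) of accepted sends; rejected sends are not counted
  private readonly sends = new Map<string, number[]>();

  /** Records the send and returns true if it fits the budget; returns false otherwise. */
  tryRecord(userId: string, nowMs: number): boolean {
    // Drop sends that have aged out of the window ending at nowMs.
    const recent = (this.sends.get(userId) ?? []).filter((t) => nowMs - t < WINDOW_MS);
    const accepted = recent.length < MAX_SENDS;
    if (accepted) recent.push(nowMs);
    this.sends.set(userId, recent);
    return accepted;
  }
}

// One multi-tool turn: 10 tool-result sends plus the final conclusion,
// 200 ms apart, all addressed to the same user.
const limiter = new PerUserWindow();
const results: boolean[] = [];
for (let i = 0; i < 11; i++) {
  results.push(limiter.tryRecord("user-a", i * 200));
}
console.log(results.filter((ok) => ok).length); // 10
console.log(results[10]); // false: the conclusion is the send that gets rejected
```

In this model the 11th send of a burst is the one rejected. For an agent turn, that send is the final conclusion, because it always arrives last.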

This was triggered during AI agent turns that involved multiple tool
calls. Each tool result (search, browse, compute, etc.) was dispatched
as an individual sendMessage API call. A typical complex query could
produce 8–10 tool results in rapid succession, exhausting the per-minute
budget before the final AI conclusion could be delivered.

Two additional issues compounded the problem:

  1. API errors were silently ignored — the sendMessage function
    previously discarded the API response body, so rate-limit rejections
    went undetected. Response validation (added in an earlier commit)
    surfaced the errors but couldn't prevent them.

  2. Dispatch aborted on failure — the deliver callback re-threw any
    send error, causing the framework's reply dispatcher to stop entirely.
    The final AI conclusion (which always arrives after all tool results)
    was frequently lost.
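Item 1 depends on checking the response body, not only the HTTP status. A minimal TypeScript sketch of that check follows. It assumes the body is JSON carrying the `ret`/`errcode` fields that the commit messages mention. `errmsg`, `WeChatApiError`, `assertSendOk` and the `-1` parse-failure code are hypothetical.

```typescript
// ASSUMED response shape: ret/errcode are named in the PR's commit message;
// errmsg is a guess at where a human-readable reason might live.
interface WeChatSendResponse {
  ret?: number;
  errcode?: number;
  errmsg?: string;
}

class WeChatApiError extends Error {
  code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = "WeChatApiError";
    this.code = code;
    Object.setPrototypeOf(this, WeChatApiError.prototype); // keep instanceof working on ES5 targets
  }
}

/** Throws if the body reports a failure, even when the HTTP request itself succeeded. */
function assertSendOk(bodyText: string): void {
  let parsed: unknown;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    // -1 is a hypothetical local code for "body was not JSON".
    throw new WeChatApiError(-1, `unparseable response body: ${bodyText.slice(0, 80)}`);
  }
  const body = (parsed ?? {}) as WeChatSendResponse;
  // Either field being present and non-zero counts as a failure.
  for (const code of [body.ret, body.errcode]) {
    if (code !== undefined && code !== 0) {
      throw new WeChatApiError(code, body.errmsg ?? `WeChat API returned code ${code}`);
    }
  }
}

assertSendOk('{"ret":0}'); // returns normally
```

In a real `sendMessage`, this check would run on `await res.text()` after testing `res.ok`. Both HTTP-level and backend-level failures then surface as exceptions instead of being silently dropped.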

Per-message retry and throttling were tried first, but they only added
latency without addressing the root cause: the per-user, per-minute
message budget was being consumed by individual tool-result echoes.

Solution

Intercept and merge tool results into a single summary message,
eliminating the burst pattern that triggers rate limiting.

The deliver callback now checks deliveryCtx.kind:

  • Tool result → appended to an in-memory buffer; the callback returns
    immediately (no API call).
  • Non-tool message (e.g. final conclusion) → flush the buffer as one
    merged summary, then send this message.
  • Dispatch end (finally block) → final flush for any remaining
    buffered results.
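The three cases above map onto a small buffer-and-flush wrapper. The TypeScript sketch below makes several assumptions: a `kind` value of `"tool"` for tool results, a `deliver(text, ctx)` callback shape, and an injected `send` function. All of these names are hypothetical stand-ins for the plugin's real types.

```typescript
// HYPOTHETICAL shapes: the PR says the callback inspects deliveryCtx.kind, but
// these field values and the callback signature are assumptions.
type DeliveryCtx = { kind: "tool" | "final" };
type SendText = (text: string) => Promise<void>;

function createBufferedDeliver(send: SendText, formatSummary: (steps: string[]) => string) {
  const toolBuffer: string[] = [];

  // Sends all buffered tool results as one message, if there are any.
  async function flush(): Promise<void> {
    if (toolBuffer.length === 0) return;
    // splice() empties the buffer before the send, so a failed send cannot
    // make a later flush re-send the same steps.
    const steps = toolBuffer.splice(0, toolBuffer.length);
    await send(formatSummary(steps));
  }

  async function deliver(text: string, ctx: DeliveryCtx): Promise<void> {
    if (ctx.kind === "tool") {
      toolBuffer.push(text); // buffered only: no API call
      return;
    }
    await flush(); // merged summary goes out first...
    await send(text); // ...followed by the non-tool message itself
  }

  return { deliver, flush };
}

// Usage: three tool results and one conclusion produce two sends.
async function demoTurn(): Promise<string[]> {
  const sent: string[] = [];
  const { deliver, flush } = createBufferedDeliver(
    async (text) => { sent.push(text); },
    (steps) => `steps (${steps.length}):\n${steps.join("\n")}`,
  );
  try {
    await deliver("search: …", { kind: "tool" });
    await deliver("browse: …", { kind: "tool" });
    await deliver("compute: …", { kind: "tool" });
    await deliver("Final conclusion", { kind: "final" });
  } finally {
    await flush(); // dispatch end: sends any tool results left in the buffer
  }
  return sent;
}

demoTurn().then((sent) => console.log(sent.length)); // 2
```

Emptying the buffer before awaiting the send is a deliberate choice. If the merged send fails, the final flush in the `finally` block does not re-send the same steps.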

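The merged message is formatted as a step list (sample translated from Chinese):

⚙ Execution steps (3 steps):

Search: Chongqing fourth-generation housing prices over the past three years…
Browse: Fetching web page content…
Compute: Price analysis complete…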
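A formatter that produces this layout could look like the following TypeScript sketch. `formatStepSummary` is a hypothetical name. The English header is an illustrative rendering of the sample, not the bot's literal string.

```typescript
/**
 * Builds the merged step-list message from buffered tool-result lines:
 * a header with the step count, a blank line, then one step per line.
 */
function formatStepSummary(steps: readonly string[]): string {
  const n = steps.length;
  const header = `⚙ Execution steps (${n} step${n === 1 ? "" : "s"}):`;
  return [header, "", ...steps].join("\n");
}

const summary = formatStepSummary([
  "Search: Chongqing fourth-generation housing prices over the past three years…",
  "Browse: Fetching web page content…",
  "Compute: Price analysis complete…",
]);
console.log(summary);
```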

Other changes:

  • Removed retry/throttle from send.ts — with only ~2 messages per
    turn instead of ~10, the per-minute budget is no longer a concern.
  • Deliver no longer re-throws: a send failure triggers an error notice
    instead of aborting the dispatch, so one bad message doesn't block the
    rest.
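The difference between re-throwing and continuing can be shown with a simplified dispatcher. This TypeScript sketch rests on an assumption: `dispatch` is a stand-in that stops at the first error thrown by the deliver callback, as the Problem section describes. It is not the framework's actual dispatcher. The notice text and `fakeSend` are also hypothetical.

```typescript
// Simplified stand-in for the reply dispatcher (ASSUMPTION, not the real code):
// payloads are delivered in order and the loop stops at the first thrown error.
type SendFn = (text: string) => Promise<void>;
type Deliver = (text: string) => Promise<void>;

async function dispatch(payloads: string[], deliver: Deliver): Promise<void> {
  for (const text of payloads) {
    await deliver(text); // a rejection here skips every later payload
  }
}

// Old behaviour: the send error propagates and aborts the dispatch.
const rethrowingDeliver = (send: SendFn): Deliver => async (text) => {
  await send(text);
};

// New behaviour: report the failure with a notice and let the dispatch continue.
const tolerantDeliver = (send: SendFn): Deliver => async (text) => {
  try {
    await send(text);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    try {
      await send(`⚠ A message failed to send (${reason})`); // hypothetical notice text
    } catch {
      // If even the notice fails, drop it rather than abort the dispatch.
    }
  }
};

// Fake send that records delivered texts and rejects one chosen payload.
function fakeSend(sent: string[], failOn: string): SendFn {
  return async (text) => {
    if (text === failOn) throw new Error("simulated API rejection");
    sent.push(text);
  };
}

async function compareDelivers(): Promise<{ before: string[]; after: string[] }> {
  const payloads = ["step 1", "step 2", "final conclusion"];
  const before: string[] = [];
  await dispatch(payloads, rethrowingDeliver(fakeSend(before, "step 2"))).catch(() => undefined);
  const after: string[] = [];
  await dispatch(payloads, tolerantDeliver(fakeSend(after, "step 2")));
  return { before, after };
}

compareDelivers().then(({ before, after }) => {
  console.log(before.includes("final conclusion")); // false
  console.log(after.includes("final conclusion")); // true
});
```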

Effect

| | Before | After |
|---|---|---|
| Messages per turn (per user) | ~10 (individual tool results + conclusion) | ~2 (1 merged summary + conclusion) |
| Per-minute budget usage | Exhausted in seconds | Well under limit |
| Rate-limit risk | Guaranteed for multi-tool turns | Eliminated |
| Final conclusion delivery | Frequently lost (dispatch aborted) | Reliable |
| Latency | Added by retry + throttle | No artificial delays |
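The budget arithmetic behind the first three rows can be checked quickly. It assumes the limit is at most 10 sends per user per 60 s and that every send of one turn lands inside the same window. Here $n$ is the number of tool results in the turn and $s$ the number of sends.

```math
\begin{aligned}
\text{Before:}\quad & s = n + 1, \qquad n = 10 \;\Rightarrow\; s = 11 > 10
  \quad \text{(the 11th send, the conclusion, is rejected)} \\
\text{After:}\quad & s = 1 + 1 = 2 \ \text{ for any } n \ge 1
  \;\Rightarrow\; \lfloor 10 / 2 \rfloor = 5 \ \text{turns fit in one window}
\end{aligned}
```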

linyu.wang and others added 3 commits May 28, 2026 14:45
Two fixes for the WeChat channel outbound delivery:

1. sendMessage now parses the API response body and throws on non-zero
   ret/errcode, catching silent failures where the HTTP request succeeds
   but the WeChat backend returns an error.

2. Tool result payloads sent as chat-visible messages are truncated to
   120 chars (non-error only). Previously, web_search results of 5000+
   chars were echoed verbatim, flooding the chat and pushing the actual
   AI reply out of view.
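The truncation rule in item 2 can be sketched as follows. The 120-character limit and the exemption for errors come from the commit message. Two details are assumptions: counting by Unicode code point and appending `…`. The sample summary lines end in `…`, but the exact marker is not stated. `truncateToolResult` is a hypothetical name.

```typescript
const TOOL_RESULT_MAX_CHARS = 120;

/** Shortens non-error tool output for chat display; error output is left whole. */
function truncateToolResult(text: string, isError: boolean, max = TOOL_RESULT_MAX_CHARS): string {
  if (isError) return text; // keep the full failure reason visible
  // Split by code point so CJK text and emoji are not cut mid-surrogate-pair.
  const chars = Array.from(text);
  if (chars.length <= max) return text;
  return chars.slice(0, max).join("") + "…"; // ASSUMED truncation marker
}

console.log(truncateToolResult("x".repeat(5000), false).length); // 121
```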

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… limiting

Instead of sending each tool result as an individual WeChat message
(which triggers rate limiting after ~10 rapid-fire sends), buffer all
tool results and merge them into one summary message. Non-tool messages
(final conclusions, etc.) are still sent individually.

Also removes the per-delivery retry and throttle mechanisms that were
added earlier — with only ~2 messages per turn instead of ~10, rate
limiting is no longer a concern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

niruiyu commented May 29, 2026


I am having a similar issue: if a request triggers 2-3 tool calls, the final summary message is delivered to WeChat successfully. But if a request triggers ~10 tool calls, the final summary message is NOT delivered.
