feat(webhook): cap stored response body size#975
Conversation
A large destination response can make the attempt LogEntry exceed the log
queue's per-message size limit. Publishing then fails, the attempt never
lands in the logstore, and the event retries forever ("no prior attempt
found in logstore, will retry").
Add an opt-in cap (default 0 = no limit) on the response body stored on the
attempt, threaded through both webhook modes via WithMaxResponseBodyBytes.
When exceeded, the body is replaced wholesale with a placeholder rather than
truncated, so consumers never see a partial body. Reading uses a maxBytes+1
LimitReader to detect overflow without buffering the whole body.
Refs: #974
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expose the webhook response body cap as config. Unset (0) keeps the current behavior of storing the full response body. Refs: #974 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drives the full path — config -> provider option -> response capture -> logstore -> /attempts — with a destination returning a 200 KB body against a 1 KB cap, asserting the stored body is the placeholder; a second destination under the cap confirms verbatim storage. Refs: #974 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refs: #974 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Align fields after adding maxResponseBodyBytes in 56f8bf4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alexbouchardd
left a comment
There was a problem hiding this comment.
Should we have default max values based on the queue used instead of defaulting to 0? Pubsub and SQS both have per message limits, no point in allowing more then that
I'm not sure if that number is the right limit. Each entry log is the Event + Attempt. It has the body but also other bytes to wrap the data, so we can't be exact. But most importantly it contains the event.data which is sort of variable. That's why it could be a bit tricky to come up with a reasonable default here. |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
If it's tricky for us, it's even harder for the user. In this case, I'd tend to default to something fairly conservative? The other option here which we haven't really discussed yet is to introduce compression... |
|
Fair, we can consider a conservative number for this. But in this context, what if the event payload exceeds the limit? We can't not store it because the data is required for retries. Should we have the same mechanism on the event publishing input validation? |
|
Right I do think one possibility here is that we have a default conservative response limit and the operator can tweak it depending on their max size they publish for the event data. Since it's operator controlled seems odd to have a explicit limit on that |
Bounds the destination response body stored on a delivery attempt. A large response could make the attempt's log message exceed the event queue's per-message size limit (e.g. SQS's 256 KB) — publishing then fails, the attempt never lands in the logstore, and the event retries forever (
no prior attempt found in logstore, will retry). #845 split oversized batches but a single oversized message still failed; this caps the body at the source.Opt-in via
DESTINATIONS_WEBHOOK_MAX_RESPONSE_BODY_BYTES(default0= no limit), applied to both webhook modes. Over the limit, the body is replaced with a placeholder rather than truncated, so consumers never see a partial body.Closes #974.
Heads-up — body is replaced, not truncated (56f8bf4): reads via a
maxBytes+1io.LimitReaderto detect overflow without buffering the whole body, which means we stop before draining the response, so that connection won't be keep-alive reused — deliberate, vs. downloading an arbitrarily large body just to discard it.🤖 Generated with Claude Code