Problem
The benchmark CI gate (BENCHMARK FAILED: 1 filter(s) produced more tokens than raw output) flags the curl filter as negative:
🔴 curl text | curl -s https://httpbin.org/robots.txt | rtk curl ... | 8 → 40 (-400%)
rtk curl wraps the response with status/header metadata. On a tiny payload (httpbin's robots.txt is ~8 tokens) that wrapper dominates, so the filtered output is larger than the raw command output.
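The -400% figure is consistent with savings computed as (raw - filtered) / raw. A minimal sketch of that arithmetic follows, assuming this is how the benchmark derives the percentage; the function name `savings_percent` is hypothetical and not taken from the benchmark script.

```python
def savings_percent(raw_tokens: int, filtered_tokens: int) -> float:
    """Percentage of tokens saved relative to the raw output (assumed formula).

    Negative values mean the filter emitted more tokens than the raw command.
    """
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (raw_tokens - filtered_tokens) / raw_tokens * 100


# The curl case from the benchmark line: 8 raw tokens, 40 filtered tokens.
print(savings_percent(8, 40))  # -400.0
```

Under this reading, any fixed-size wrapper produces a large negative percentage once the raw payload is small enough, which is why the tiny robots.txt response trips the gate.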
Root cause
The never-worse guard added for grep in #2550 (fall back to the raw form when filtering doesn't shrink the output) is grep-only. Other filters (curl in this case) have no such cap, so they can emit more tokens than the underlying command on small inputs.
Proposal
Extract the never-worse comparison into a shared core helper and apply it across filters (start with curl, then audit the rest), so no filter ever emits more tokens than the raw command. This makes the benchmark gate hold structurally rather than per-filter.
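As an illustration of the shape such a shared helper could take, here is a language-agnostic sketch written in Python. It is not the project's code: `never_worse`, `count_tokens`, and `fake_curl_filter` are hypothetical names, and the whitespace token count is only a stand-in for whatever tokenizer the benchmark uses.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a stand-in for the real tokenizer."""
    return len(text.split())


def never_worse(raw: str, filtered: str,
                count: Callable[[str], int] = count_tokens) -> str:
    """Return the filtered form only if it is strictly smaller than raw."""
    return filtered if count(filtered) < count(raw) else raw


def fake_curl_filter(raw: str) -> str:
    """Hypothetical filter that prepends status/header metadata."""
    return f"HTTP 200 OK content-type: text/plain length: {len(raw)}\n{raw}"


# On a tiny payload the metadata wrapper dominates, so the guard keeps raw.
tiny = "User-agent: *\nDisallow: /deny"
print(never_worse(tiny, fake_curl_filter(tiny)) == tiny)  # True
```

Each filter would pass its final output through the helper as its last step. Ties resolve to the raw form, matching the description of the grep guard (fall back when filtering doesn't shrink the output).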
Evidence
🔴 curl text 8 → 40 (-400%)
The same run shows every grep shape positive: fn +94%, -l +99%, -c +95%.