Summary
rtk panics with "byte index N is not a char boundary" when the output filter truncates output that exceeds its size threshold and the truncation byte index lands inside a multibyte (CJK) character. The wrapped command then exits 127 and produces no output at all.
This is the same root cause as the closed #1837 (panic on large curl responses), but #1837 was only worked around by passing curl through verbatim. The underlying byte-index string slicing in the parser still affects every other wrapped command (test runners, compilers, linters, VCS, …) on CJK-heavy output. Related: #2318 (CJK panic in rtk gain --history).
Version / Environment
- rtk 0.42.4 (latest stable)
- Windows 11 (reproduced via both Git Bash and PowerShell). The faulty logic is OS-independent, so all platforms are expected to be affected.
Panic output
thread 'main' panicked at src\parser\mod.rs:191:39:
end byte index 13026 is not a char boundary; it is inside '시' (bytes 13024..13027 of string)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exit code: 127 → from the caller's perspective the command simply fails with no result.
Steps to reproduce (minimal, portable)
Create ~20 KB of dense Korean text so the ~13 KB truncation boundary is guaranteed to fall inside a multibyte char, then run any recognized command over it:
python -c "print(('가나다라마바사아자차카타파하 시험 데이터 출력 라인입니다 ' * 20 + chr(10)) * 40)" > big_cjk.txt
rtk read big_cjk.txt
# or: rtk grep 시험 big_cjk.txt
Real-world trigger that surfaced it: rtk vitest run <test> in a project whose Vitest logger output contains Korean — total output exceeded ~13 KB and rtk panicked, so the test command returned nothing.
Root cause / suggested fix
The truncation at src/parser/mod.rs:191 appears to slice the string by a raw byte index (e.g. &s[..n]) without ensuring n is a UTF-8 char boundary. Snap the index to a valid boundary before slicing:
- while n > 0 && !s.is_char_boundary(n) { n -= 1; }, or
- str::floor_char_boundary(n) (once stabilized), or
- truncate via s.char_indices() / take by chars() instead of by bytes.
Because #1837 was patched only for curl (passthrough), the same byte-slicing bug remains live across the rest of the filter pipeline. A sweep for raw byte-index slicing on potentially non-ASCII strings would prevent further recurrences.
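The boundary-snapping options above could look roughly like the following. This is a minimal sketch, not rtk's actual code: the function names truncate_at_char_boundary and truncate_by_char_indices are hypothetical, and it assumes the filter holds the output as a &str plus a maximum byte budget.

```rust
/// Hypothetical helper (not part of rtk): returns the longest prefix of `s`
/// that is at most `max_bytes` long and ends on a UTF-8 char boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Hypothetical alternative using char_indices(): keep whole characters
/// for as long as their end offset still fits in the byte budget.
fn truncate_by_char_indices(s: &str, max_bytes: usize) -> &str {
    let end = s
        .char_indices()
        .map(|(start, ch)| start + ch.len_utf8()) // end offset of each char
        .take_while(|&char_end| char_end <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```

For the string "ab시험" (bytes: a at 0, b at 1, 시 at 2..5, 험 at 5..8), a byte budget of 3 falls inside 시. The raw slice &s[..3] would panic with the same kind of message as above, while both helpers back off to byte 2 and return "ab". Both variants round down, so the truncated output can be up to 3 bytes shorter than the budget for CJK text.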
Impact / current workaround
- rtk becomes unusable for CJK (Korean / Japanese / Chinese) projects whenever a wrapped command emits large output (test / build / lint / log).
- Workaround: add the affected commands to [hooks] exclude_commands so rtk doesn't wrap them, but that disables rtk for exactly the high-token commands it is meant to optimize.