Panic 'not a char boundary' in output filter (src/parser/mod.rs:191) on large CJK output — affects all wrapped commands (same root as #1837) #2509

Description

@kdhclear

Summary

rtk panics with "byte index N is not a char boundary" when the output filter truncates output that exceeds its size threshold and the truncation byte index lands inside a multibyte (CJK) character. The wrapped command then exits with code 127 and produces no output at all.

This is the same root cause as the closed #1837 (panic on large curl responses), but #1837 was only worked around by passing curl through verbatim. The underlying byte-index string slicing in the parser still affects every other wrapped command (test runners, compilers, linters, VCS, …) on CJK-heavy output. Related: #2318 (CJK panic in rtk gain --history).

Version / Environment

  • rtk 0.42.4 (latest stable)
  • Windows 11 (reproduced via both Git Bash and PowerShell). The faulty logic is OS-independent, so all platforms are expected to be affected.

Panic output

thread 'main' panicked at src\parser\mod.rs:191:39:
end byte index 13026 is not a char boundary; it is inside '시' (bytes 13024..13027 of string)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Exit code: 127 → from the caller's perspective the command simply fails with no result.
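
The failure mode can be shown in isolation. The sketch below is not rtk's code; `try_prefix` is a hypothetical name used only for illustration. It relies on the standard `str::get`, which returns `None` in exactly the cases where `&s[..n]` would panic, so the condition can be observed without crashing:

```rust
/// Hypothetical helper for illustration only: returns the first `n` bytes of
/// `s` as a string slice, or `None` when `n` is out of range or falls inside
/// a multibyte character. `&s[..n]` would panic in those same cases.
fn try_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

Each Hangul syllable such as '시' occupies three bytes in UTF-8, so two out of every three byte offsets inside dense Korean text are not valid slice ends. A fixed byte threshold is therefore more likely to hit an invalid offset than a valid one.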

Steps to reproduce (minimal, portable)

Create a file of dense Korean text several times larger than the ~13 KB truncation threshold, so the cut is very likely to fall inside a multibyte char, then run any recognized command over it:

python -c "print(('가나다라마바사아자차카타파하 시험 데이터 출력 라인입니다 ' * 20 + chr(10)) * 40)" > big_cjk.txt
rtk read big_cjk.txt
# or:  rtk grep 시험 big_cjk.txt

Real-world trigger that surfaced it: rtk vitest run <test> in a project whose Vitest logger output contains Korean — total output exceeded ~13 KB and rtk panicked, so the test command returned nothing.

Root cause / suggested fix

The truncation at src/parser/mod.rs:191 appears to slice the string by a raw byte index (e.g. &s[..n]) without ensuring n is a UTF-8 char boundary. Snap the index to a valid boundary before slicing:

  • while n > 0 && !s.is_char_boundary(n) { n -= 1; }, or
  • str::floor_char_boundary(n) (once stabilized), or
  • truncate via s.char_indices() / take by chars() instead of by bytes.
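
A minimal sketch of the first and third options, assuming the truncation takes a string and a budget. The function names and signatures here are hypothetical and do not reflect rtk's actual code at src/parser/mod.rs:191:

```rust
/// Hypothetical helper, not rtk's actual code: returns the longest prefix of
/// `s` that is at most `max_bytes` long and ends on a UTF-8 char boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Hypothetical alternative: budget by characters instead of bytes.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first char to drop, so it is
        // always a valid boundary.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `max_chars` chars: nothing to cut.
        None => s,
    }
}
```

The byte-based variant keeps the existing size threshold semantics and backs off by at most three bytes, since a UTF-8 character is at most four bytes long. The char-based variant changes what the threshold means, so it may let CJK output through at up to three times the byte size of the same number of ASCII characters.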

Because #1837 was patched only for curl (passthrough), the same byte-slicing bug remains live across the rest of the filter pipeline. A sweep for raw byte-index slicing on potentially non-ASCII strings would prevent further recurrences.

Impact / current workaround

  • rtk becomes unusable for CJK (Korean / Japanese / Chinese) projects whenever a wrapped command emits large output (test / build / lint / log).
  • Workaround: add the affected commands to [hooks] exclude_commands so rtk doesn't wrap them — but that disables rtk for exactly the high-token commands it is meant to optimize.
