fix(json): tolerate raw control chars in strings instead of dropping output #2536

Open
dosthcpp wants to merge 1 commit into rtk-ai:develop from dosthcpp:fix/json-tolerate-raw-control-chars

@dosthcpp

Summary

  • rtk json (and pipes like curl … | rtk json) used strict serde_json::from_str, which rejects unescaped control characters (U+0000–U+001F) inside string values per RFC 8259 §7. Some real-world producers emit them anyway (e.g. an API echoing a user-supplied newline into a field). rtk then printed nothing and exited 1, losing the whole payload and forcing a raw re-fetch — net-negative for a token-saving tool and contrary to the Never Block principle.
  • Adds parse_json_lenient: strict parse first (fast path, zero behavior change for valid JSON), and only on failure retry once with raw in-string control chars rewritten to their equivalent \uXXXX escapes via a string-aware scanner. Whitespace between tokens and already-escaped sequences stay byte-for-byte identical; genuinely malformed input still surfaces the original strict error.
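
A minimal sketch of the strict-first, retry-once flow described above, for illustration only. It is not the code in this PR: `escape_raw_control_chars` and `parse_lenient` are hypothetical names, and the strict parser is passed in as a closure (standing in for `serde_json::from_str`) so the example stays dependency-free.

```rust
use std::borrow::Cow;

/// Rewrites raw control characters (U+0000..=U+001F) found inside JSON string
/// literals as \uXXXX escapes. Characters outside strings, including
/// whitespace between tokens, and existing escape sequences are copied as-is.
/// Returns `Cow::Borrowed` when nothing had to be rewritten.
fn escape_raw_control_chars(input: &str) -> Cow<'_, str> {
    let mut out = String::with_capacity(input.len());
    let mut changed = false;
    let mut in_string = false; // currently inside a "..." literal
    let mut escaped = false; // previous char was an unconsumed backslash

    for c in input.chars() {
        if in_string {
            if escaped {
                // Second half of an escape such as \" or \\: copy untouched.
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            } else if (c as u32) < 0x20 {
                // Raw control char inside a string: emit the equivalent escape.
                out.push_str(&format!("\\u{:04x}", c as u32));
                changed = true;
                continue;
            }
        } else if c == '"' {
            in_string = true;
        }
        out.push(c);
    }

    if changed {
        Cow::Owned(out)
    } else {
        Cow::Borrowed(input)
    }
}

/// Tries `strict` first. Only if that fails AND the scanner changed something
/// does it retry once; any failure on the retry reports the original error.
fn parse_lenient<T, E>(input: &str, strict: impl Fn(&str) -> Result<T, E>) -> Result<T, E> {
    match strict(input) {
        Ok(value) => Ok(value),
        Err(original) => match escape_raw_control_chars(input) {
            Cow::Owned(fixed) => strict(&fixed).map_err(|_| original),
            Cow::Borrowed(_) => Err(original),
        },
    }
}
```

With a real parser, a call might look like `parse_lenient(input, |s| serde_json::from_str::<serde_json::Value>(s))`, assuming `serde_json` is available as a dependency. Valid input never reaches the scanner in this sketch, which is why the fast path is unchanged.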

Before / After

$ printf '{"body":"line1\nline2"}' | rtk json -      # raw newline inside the string
# before:  rtk: Failed to parse JSON: control character (...)   (exit 1, 0 bytes stdout)
# after:   { body: "line1
#           line2" }                                            (exit 0)

Test plan

  • cargo test json_cmd: 19 passed, 0 failed (9 new + 10 existing). New tests cover recovery in compact and schema modes, control char in keys, fast-path borrow, whitespace preservation, existing-escape preservation, and malformed-still-errors.
  • cargo fmt -- --check src/cmds/system/json_cmd.rs — clean
  • cargo clippy --lib — no new warnings on the changed file
  • Manual: printf '{"id":"abc","body":"a\nb"}' | rtk json - now renders the payload (exit 0); printf '{not json' | rtk json - still errors (exit 1); valid JSON output unchanged.

Targets develop per CONTRIBUTING.md.

…output

`rtk json` (and any pipe like `curl ... | rtk json`) called strict
`serde_json::from_str`, which rejects unescaped control characters
(U+0000–U+001F) inside string values per RFC 8259 §7. Some real-world
producers emit them anyway (e.g. an API echoing a user-supplied newline
verbatim into a field). When that happened rtk printed nothing and exited
1, losing the entire payload and forcing the user to re-fetch with a raw
passthrough — a net-negative outcome for a token-saving tool, and a
violation of the "Never Block" design principle.

Add `parse_json_lenient`: parse strictly first (fast path, zero behavior
change for valid JSON), and only on failure retry once with raw in-string
control characters rewritten to their equivalent `\uXXXX` escapes via a
string-aware scanner. Whitespace between tokens and already-escaped
sequences are left byte-for-byte identical, and genuinely malformed input
still surfaces the original strict error.

Adds 9 unit tests (recovery in both compact and schema modes, control char
in keys, fast-path borrow, whitespace preservation, existing-escape
preservation, and malformed-still-errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@CLAassistant

CLAassistant commented Jun 21, 2026

CLA assistant check
All committers have signed the CLA.
