-
Couldn't load subscription status.
- Fork 340
fix: Panic on CJK character truncation in MCP prompt descriptions #3227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
19411fa to
44bd8a3
Compare
| // If we found a valid boundary, use it; otherwise use the last character start | ||
| if truncate_at == 0 && !text.is_empty() { | ||
| // Edge case: even the first character is too long | ||
| truncate_at = text.char_indices().next().map(|(i, _)| i).unwrap_or(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: Should we delete this if check? char_indices().next() always returns the first character at index 0, so this line doesn't modify truncate_at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. Removed in c029a35. All tests still pass. ![]()
d9403af to
03d4d0d
Compare
|
I've made the corrections. Please review 🙇 |
03d4d0d to
c5d5b5e
Compare
- Consolidate ASCII and CJK test cases into single test function - Reduces diff size while maintaining comprehensive coverage - Ensures backward compatibility verification
Add comprehensive test coverage for edge cases: - Very small max_length values - CJK characters that don't fit in target length - Emoji (4-byte UTF-8 characters) - Mixed ASCII and CJK text - Single CJK character within limit All tests verify UTF-8 safe truncation behavior.
Remove the unnecessary if-check that was doing nothing. char_indices().next() always returns index 0 for the first character, so this code was just reassigning truncate_at = 0 without any effect. All tests pass without this code, confirming it was redundant.
Updated Cargo.lock to enable local testing and verification of the fix.
9300ee6 to
a070120
Compare
Replace custom truncate logic in truncate_description with the existing truncate_safe_in_place utility function to ensure consistency across the codebase and leverage tested UTF-8 safe truncation logic.
| } | ||
| let mut result = text.to_string(); | ||
|
|
||
| truncate_safe_in_place(&mut result, max_length, "..."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that truncate_safe_in_place already exists in the codebase, so I used that instead ![]()
Summary
Fixes panic when truncating MCP prompt descriptions containing CJK (Chinese, Japanese, Korean) characters.
Issues
Closes #3117
Closes #3170
Related to #3136, #3086
Problem
The
truncate_descriptionfunction inprompts.rswas using unsafe byte-index slicing (&text[..n]), which panics when the index falls in the middle of a multibyte UTF-8 character. CJK characters typically use 3 bytes in UTF-8, causing crashes when truncating at byte boundaries.Error example:
byte index 37 is not a char boundary; it is inside '한' (bytes 36..39)
Solution
char_indices()iterationChanges
truncate_description()function incrates/chat-cli/src/cli/chat/cli/prompts.rsNote
Testing
Verified with test cases from reported issues:
All tests pass without panics, respecting character boundaries.
Impact
/prompts listcommand)