@mzkmnk mzkmnk commented Oct 22, 2025

Summary

Fixes a panic when truncating MCP prompt descriptions that contain CJK (Chinese, Japanese, Korean) characters.

Issues

Closes #3117
Closes #3170
Related to #3136, #3086

Problem

The truncate_description function in prompts.rs used byte-index slicing (&text[..n]), which panics when the index falls in the middle of a multibyte UTF-8 character. CJK characters typically take 3 bytes in UTF-8, so a fixed byte offset often lands inside one of them and crashes the CLI.

Error example:

byte index 37 is not a char boundary; it is inside '한' (bytes 36..39)
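
To make the failure mode concrete, here is a minimal sketch that probes the same condition without panicking. The helper names try_prefix and utf8_widths are made up for illustration and are not part of the PR; str::get and is_char_boundary are standard library methods.

```rust
/// Hypothetical helper: returns the first `n` bytes of `text` when `n`
/// lands on a character boundary, or `None` where `&text[..n]` would panic.
fn try_prefix(text: &str, n: usize) -> Option<&str> {
    // `str::get` is the non-panicking counterpart of `&text[..n]`.
    text.get(..n)
}

/// Hypothetical helper: UTF-8 byte width of each character in `text`.
fn utf8_widths(text: &str) -> Vec<usize> {
    text.chars().map(|c| c.len_utf8()).collect()
}
```

For the string "a한", the widths are 1 and 3 bytes, so offsets 1 and 4 are valid cut points while 2 and 3 fall inside '한'.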

Solution

  • Replaced panicking byte-index slicing with UTF-8-safe char_indices() iteration
  • Finds the last valid character boundary before the target length
  • Handles edge cases (empty strings, very short max_length, emojis)
  • Maintains backward compatibility with ASCII text
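
The approach in the bullets above can be sketched roughly as follows. This is an illustration and not the exact code from the PR: the name truncate_at_char_boundary is hypothetical, and it assumes the limit is measured in bytes, as the error message suggests.

```rust
/// Hypothetical sketch: longest prefix of `text` that is at most
/// `max_bytes` long and ends on a character boundary.
fn truncate_at_char_boundary(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    // Every offset yielded by `char_indices` is the start of a character,
    // so it is always a valid place to cut. Keep the last one that fits.
    let cut = text
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max_bytes)
        .last()
        .unwrap_or(0);
    &text[..cut]
}
```

ASCII input behaves exactly like a byte slice, since every byte offset is a character start. When even the first character does not fit, the result is an empty string.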

Changes

  • Modified truncate_description() function in crates/chat-cli/src/cli/chat/cli/prompts.rs
  • Added comprehensive test cases for CJK characters
  • Updated existing tests to reflect UTF-8 safe behavior

Note

  • Cargo.lock was updated to allow local testing and verification of the changes.

Testing

Verified with the test cases from the reported issues. All tests pass without panics and respect character boundaries.

Impact

  • Scope: MCP prompt description display (/prompts list command)
  • Compatibility: Fully backward compatible
  • Risk: Low - only affects truncation logic for long descriptions
[Screenshot 2025-10-22 9:36:39 PM] [Screenshot 2025-10-22 9:37:57 PM]

@mzkmnk mzkmnk marked this pull request as draft October 22, 2025 13:15
@mzkmnk mzkmnk force-pushed the fix/cjk-truncate-panic-3117-3170 branch from 19411fa to 44bd8a3 Compare October 22, 2025 13:18
@mzkmnk mzkmnk marked this pull request as ready for review October 22, 2025 13:46
// If we found a valid boundary, use it; otherwise use the last character start
if truncate_at == 0 && !text.is_empty() {
// Edge case: even the first character is too long
truncate_at = text.char_indices().next().map(|(i, _)| i).unwrap_or(0);
Contributor

qq: Should we delete this if check? char_indices().next() always returns the first character at index 0, so this line doesn't modify truncate_at

Author

@mzkmnk mzkmnk Oct 22, 2025

@evanliu048

You're absolutely right. Removed in c029a35. All tests still pass. :octocat:
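
The review point can be checked in isolation with a small sketch. The function name first_char_offset is hypothetical; its body is the expression from the removed line.

```rust
/// Hypothetical wrapper around the expression from the removed branch.
/// The first item of `char_indices` always starts at byte offset 0, and an
/// empty string falls through to `unwrap_or(0)`, so the result is always 0.
fn first_char_offset(text: &str) -> usize {
    text.char_indices().next().map(|(i, _)| i).unwrap_or(0)
}
```

Because the branch only ran when truncate_at was already 0, assigning this value changed nothing.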

@mzkmnk mzkmnk force-pushed the fix/cjk-truncate-panic-3117-3170 branch from d9403af to 03d4d0d Compare October 22, 2025 22:05
@mzkmnk mzkmnk changed the title Fix: Panic on CJK character truncation in MCP prompt descriptions fix: Panic on CJK character truncation in MCP prompt descriptions Oct 22, 2025
Author

mzkmnk commented Oct 23, 2025

@evanliu048

I've made the corrections. Please review 🙇

@mzkmnk mzkmnk force-pushed the fix/cjk-truncate-panic-3117-3170 branch from 03d4d0d to c5d5b5e Compare October 24, 2025 13:30
- Replace unsafe byte-index slicing with UTF-8 safe char_indices()
- Fixes aws#3117, aws#3170 where truncate_description panicked on multibyte characters
- Ensures truncation respects character boundaries for CJK languages
- Maintains backward compatibility with ASCII text

- Consolidate ASCII and CJK test cases into single test function
- Reduces diff size while maintaining comprehensive coverage
- Ensures backward compatibility verification

Add comprehensive test coverage for edge cases:
- Very small max_length values
- CJK characters that don't fit in target length
- Emoji (4-byte UTF-8 characters)
- Mixed ASCII and CJK text
- Single CJK character within limit

All tests verify UTF-8 safe truncation behavior.
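
As a rough illustration of how the edge cases listed above could be exercised, here is a table-driven sketch. Everything in it is hypothetical: floor_boundary and truncate_prefix are stand-ins rather than the project's functions, the limit is assumed to be in bytes, and no "..." suffix is appended.

```rust
/// Hypothetical helper: largest character boundary that is <= `idx`.
fn floor_boundary(text: &str, idx: usize) -> usize {
    if idx >= text.len() {
        return text.len();
    }
    let mut cut = idx;
    // Offset 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(cut) {
        cut -= 1;
    }
    cut
}

/// Hypothetical stand-in for the function under test.
fn truncate_prefix(text: &str, max_bytes: usize) -> &str {
    &text[..floor_boundary(text, max_bytes)]
}

/// Rows of (input, max_bytes, expected prefix), one per listed edge case.
fn edge_cases() -> Vec<(&'static str, usize, &'static str)> {
    vec![
        ("hello world", 1, "h"),   // very small max_length
        ("한국어", 2, ""),          // first CJK character (3 bytes) does not fit
        ("😀😀", 5, "😀"),          // emoji are 4 bytes each
        ("abc日本語", 7, "abc日"),  // mixed ASCII and CJK; offset 7 is inside '本'
        ("日", 3, "日"),            // single CJK character within the limit
    ]
}

/// Checks every row and the invariant that the result fits the limit.
fn check_edge_cases() {
    for (input, max_bytes, expected) in edge_cases() {
        let got = truncate_prefix(input, max_bytes);
        assert_eq!(got, expected, "input={:?} max_bytes={}", input, max_bytes);
        assert!(got.len() <= max_bytes);
    }
}
```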

Remove the unnecessary if-check that was doing nothing.
char_indices().next() always returns index 0 for the first character,
so this code was just reassigning truncate_at = 0 without any effect.

All tests pass without this code, confirming it was redundant.

Updated Cargo.lock to enable local testing and verification of the fix.
@mzkmnk mzkmnk force-pushed the fix/cjk-truncate-panic-3117-3170 branch from 9300ee6 to a070120 Compare October 25, 2025 01:58
Replace custom truncate logic in truncate_description with the existing
truncate_safe_in_place utility function to ensure consistency across the
codebase and leverage tested UTF-8 safe truncation logic.
}
let mut result = text.to_string();

truncate_safe_in_place(&mut result, max_length, "...");
Author

I found that truncate_safe_in_place already exists in the codebase, so I used that instead :octocat:
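
The body of truncate_safe_in_place is not shown in this conversation. For readers unfamiliar with the pattern, a helper with the same call shape might look roughly like the sketch below. The name truncate_in_place_sketch and its exact semantics (a byte limit that includes the suffix) are assumptions, not the project's actual behavior.

```rust
/// Hypothetical sketch of an in-place, UTF-8-safe truncation helper.
/// Assumes `max_bytes` is a byte budget that includes `suffix`.
fn truncate_in_place_sketch(s: &mut String, max_bytes: usize, suffix: &str) {
    if s.len() <= max_bytes {
        return; // already short enough, leave untouched
    }
    // Reserve room for the suffix. If the suffix alone exceeds the budget,
    // this saturates to 0 and the result is just the suffix.
    let mut cut = max_bytes.saturating_sub(suffix.len());
    // Walk back to a character boundary; offset 0 always qualifies.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut); // safe: `cut` is on a character boundary
    s.push_str(suffix);
}
```

With a 10-byte budget and "..." as the suffix, "한국어 설명" becomes "한국...": the 7-byte cut point falls inside '어', so the cut moves back to offset 6.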

Development

Successfully merging this pull request may close these issues.

Panic when byte index is not a char boundary on non-ASCII strings
Amazon Q CLI crashes when there is long CJK MCP Prompts description
