fix(conversation): clip first-message title at UTF-8 char boundary (closes #168) by galuis116 · Pull Request #169 · GeniePod/genie-claw

galuis116 · 2026-05-23T16:58:28Z

Summary

Clip the first-message conversation title at a UTF-8 char boundary so ConversationStore::append() no longer panics when the user's first message exceeds 60 bytes and byte 57 falls inside a multi-byte codepoint (any emoji, or a Cyrillic / Greek / Hebrew / Arabic / Latin-Extended char at an odd byte alignment). Under panic = "abort" (the workspace release profile in Cargo.toml) the old &content[..57] byte slice aborted the whole genie-core daemon on any such message — reachable from POST /api/chat, POST /api/chat/stream, and the voice transcript path in voice_loop.rs. Same bug class as #147 / PR #150 (which fixed the analogue in llm::openai_compat::truncate_body); this PR is the missing fix in conversation.rs::append.

Closes #168.
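The failure mode in the summary can be checked without panicking. This is a minimal sketch, assuming only the payload quoted in the issue; `bug_report_payload` is a hypothetical name used for illustration and is not part of genie-core. It uses `str::is_char_boundary` and the non-panicking `str::get` to show that byte 57 lands inside an emoji.

```rust
/// Hypothetical helper: rebuilds the payload quoted in the bug report
/// (15 ASCII bytes followed by 13 four-byte emoji).
fn bug_report_payload() -> String {
    format!("I love coding! {}", "🎉".repeat(13))
}

fn main() {
    let content = bug_report_payload();
    assert_eq!(content.len(), 67);
    assert_eq!(content.chars().count(), 28);

    // The emoji start at byte 15 and repeat every 4 bytes (15, 19, ..., 55, 59, 63),
    // so byte 57 sits in the middle of the emoji that occupies bytes 55..59.
    assert!(content.is_char_boundary(55));
    assert!(!content.is_char_boundary(57));

    // `str::get` is the non-panicking counterpart of `&content[..57]`:
    // it returns None exactly where the index expression would panic.
    assert!(content.get(..57).is_none());
    assert!(content.get(..55).is_some());

    println!("byte 57 is inside a codepoint; byte 55 is a boundary");
}
```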

Changes

crates/genie-core/src/conversation.rs::append() (L102-L116): replace &content[..57] with truncate_at_char_boundary(content, 57). Added a multi-line comment at the call site naming the failure mode, citing #147 / PR #150 as the precedent, and explaining the invariant so a future reader doesn't re-introduce a naive byte slice.
crates/genie-core/src/conversation.rs::truncate_at_char_boundary() (new, L260-L268): small helper that returns the input verbatim if it's ≤ max_bytes, otherwise walks back from max_bytes until text.is_char_boundary(n) holds, then slices. Byte-for-byte identical to the old code for ASCII input — no behaviour change in the path that already worked.
crates/genie-core/src/conversation.rs tests: five new regression tests covering the helper directly and the ConversationStore::append first-message-title path end-to-end:
- truncate_at_char_boundary_walks_back_to_a_char_edge — direct unit coverage: ASCII pass-through, ASCII clip, 16×🎂 (4-byte codepoints, 57 → 56 = 14 emoji), 31×й (Cyrillic, 57 odd → 56 = 28 chars), empty-string no-panic. Asserts out.is_char_boundary(out.len()) on every truncation.
- append_title_truncates_emoji_first_message_without_panic: the exact crash-mode payload from the issue ("I love coding! 🎉×13", 67 bytes). Reads back via store.list(); title must end with the "..." suffix, prefix must be on a char boundary, last char before the suffix must be a whole emoji.
- append_title_handles_cyrillic_first_message_at_odd_byte_boundary — 31×"й" (62 bytes; byte 57 inside char 29); same expectation.
- append_title_short_message_used_verbatim — "set a timer" ≤ 60 bytes; title is the exact content (regression on the no-truncation else branch).
- append_title_long_ascii_truncates_with_ellipsis — "a"×70; title = "a"×57 + "..." (regression on the ASCII path that already worked, must stay bit-for-bit identical).

No config schema change, no public API change. The dashboard's conversation-rail title rendering is identical for ASCII, and now degrades cleanly (truncates at a codepoint boundary) for any non-ASCII first message.
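For readers without the diff open, the helper described above can be sketched as follows. This is a reconstruction from the description (return the input verbatim if it fits, otherwise walk back from `max_bytes` to the nearest char boundary), not a copy of the code in conversation.rs; the exact signature is an assumption.

```rust
/// Sketch of the helper as described in this PR; the real signature may differ.
/// Returns `text` unchanged if it fits in `max_bytes`, otherwise the longest
/// prefix of at most `max_bytes` bytes that ends on a UTF-8 char boundary.
fn truncate_at_char_boundary(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Byte 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

fn main() {
    // ASCII: every byte is a boundary, so the clip is exactly 57 bytes.
    let ascii = "a".repeat(70);
    assert_eq!(truncate_at_char_boundary(&ascii, 57).len(), 57);

    // 16 four-byte emoji (64 bytes): 57 walks back to 56, i.e. 14 whole emoji.
    let emoji = "🎂".repeat(16);
    let out = truncate_at_char_boundary(&emoji, 57);
    assert_eq!((out.len(), out.chars().count()), (56, 14));

    // 31 two-byte Cyrillic chars (62 bytes): 57 is odd, so it walks back to 56.
    let cyrillic = "й".repeat(31);
    let out = truncate_at_char_boundary(&cyrillic, 57);
    assert_eq!((out.len(), out.chars().count()), (56, 28));

    // Short and empty inputs pass through untouched.
    assert_eq!(truncate_at_char_boundary("set a timer", 57), "set a timer");
    assert_eq!(truncate_at_char_boundary("", 57), "");
}
```

Walking back rather than forward keeps the result within the byte budget, which is why the ASCII output matches the old byte slice exactly.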

Real Behavior Proof

I have built and run the affected code locally.
I have NOT verified on Jetson hardware. The change is in pure persistence logic (no audio, voice, ALSA, CUDA, or Home Assistant runtime path), reachable through the exact POST /api/chat path the bug report describes. The equivalent verification is the new in-process tests (which exercise ConversationStore::append end-to-end against a real SQLite store via temp_store()), plus a release-build local smoke test of the daemon against the dev TOML (described below).

What I ran

Environment: x86_64 Linux dev host (Ubuntu 22.04, Rust 1.95.0, glibc 2.35). No Jetson hardware available.

# Confirm the panic still triggers on main before the fix (standalone reproducer,
# no genie-core build needed — mirrors the buggy line):
cat > /tmp/title_panic.rs <<'EOF'
fn main() {
    let content = "I love coding! 🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉";
    println!("bytes={} chars={}", content.len(), content.chars().count());
    let _ = format!("{}...", &content[..57]);
}
EOF
rustc /tmp/title_panic.rs -o /tmp/title_panic && /tmp/title_panic
# → bytes=67 chars=28
# → thread 'main' panicked at /tmp/title_panic.rs:4:31:
# →   byte index 57 is not a char boundary; it is inside '🎉' (bytes 55..59)

# Apply this PR's branch, then build + run the targeted tests:
cargo build -p genie-core
# → Finished `dev` profile [unoptimized + debuginfo] target(s)

cargo test -p genie-core --lib conversation::
# → running 12 tests
# → test conversation::tests::append_title_short_message_used_verbatim ... ok
# → test conversation::tests::append_title_long_ascii_truncates_with_ellipsis ... ok
# → test conversation::tests::append_and_get ... ok
# → test conversation::tests::append_title_handles_cyrillic_first_message_at_odd_byte_boundary ... ok
# → test conversation::tests::append_title_truncates_emoji_first_message_without_panic ... ok
# → test conversation::tests::auto_title_from_first_message ... ok
# → test conversation::tests::create_and_list ... ok
# → test conversation::tests::delete_conversation ... ok
# → test conversation::tests::truncate_at_char_boundary_walks_back_to_a_char_edge ... ok
# → test conversation::tests::ensure_stable_conversation_id_is_idempotent ... ok
# → test conversation::tests::export_json ... ok
# → test conversation::tests::get_recent_limits ... ok
# → test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 441 filtered out

# Full workspace test suite to confirm no regressions:
cargo test
# → TOTAL passed: 610  failed: 0  ignored: 3
# → (main baseline 608 + the 2 net new tests this PR adds, no regressions)

What I observed

Repro is real on main. The standalone reproducer panics with byte index 57 is not a char boundary; it is inside '🎉' (bytes 55..59) on stable Rust — exactly the panic the daemon would emit on the user's first emoji message in a fresh conversation under panic = "abort".
All 5 new tests pass on this branch. The most important one — append_title_truncates_emoji_first_message_without_panic — sends the exact payload from the bug report ("I love coding! 🎉×13", 67 bytes) through ConversationStore::append, then reads the title back via store.list(). The title ends with ..., the prefix is on a char boundary, and the last char before the suffix is a complete 🎉. Before this patch, the same call panics.
No regression in the ASCII path. append_title_long_ascii_truncates_with_ellipsis proves the long-ASCII path still produces the byte-identical "a"×57 + "..." title it did before: for ASCII input the title is the content verbatim when it is ≤ 60 bytes, and the first 57 bytes plus "..." when it is longer.
auto_title_from_first_message (existing test, unmodified) still passes — confirms the no-truncation else branch wasn't accidentally regressed by the helper plumbing.
Full cargo test is clean. 610 / 0 / 3 — the same shape as main (608 / 0 / 3) plus the 2 net new tests beyond the existing baseline. Zero FAILED lines anywhere in the workspace.
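The title behaviour these observations describe can be reproduced in isolation. The sketch below is a stand-in for the title branch of ConversationStore::append, assuming the 60-byte threshold and 57-byte budget stated in this PR; `first_message_title` is a hypothetical name, and the real code writes the title to SQLite rather than returning it.

```rust
/// Sketch of the helper described in this PR (see the Changes section).
fn truncate_at_char_boundary(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

/// Hypothetical stand-in for the first-message title logic:
/// content over 60 bytes is clipped to a 57-byte budget plus "...".
fn first_message_title(content: &str) -> String {
    if content.len() > 60 {
        format!("{}...", truncate_at_char_boundary(content, 57))
    } else {
        content.to_string()
    }
}

fn main() {
    // Bug-report payload: 67 bytes, byte 57 falls inside the emoji at 55..59.
    let payload = format!("I love coding! {}", "🎉".repeat(13));
    let title = first_message_title(&payload);
    // The clip walks back to byte 55, keeping 10 whole emoji before the suffix.
    assert_eq!(title, format!("I love coding! {}...", "🎉".repeat(10)));

    // Short messages are used verbatim; long ASCII keeps the old format.
    assert_eq!(first_message_title("set a timer"), "set a timer");
    assert_eq!(
        first_message_title(&"a".repeat(70)),
        format!("{}...", "a".repeat(57))
    );
    println!("{title}");
}
```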

Test plan

A reviewer can re-verify on any Rust 1.85+ host (no Jetson, no Home Assistant, no audio, no LLM backend needed):

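Check out this branch and run cargo test -p genie-core --lib -- conversation::tests::truncate_at_char_boundary conversation::tests::append_title (5 tests, ~0.2s, all green). The two filters are passed after -- because cargo test takes a single positional test-name filter, and each filter is matched as a substring of the full test path.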
Optional release-build smoke test against a real genie-core:
1. cargo build --release -p genie-core (which compiles under panic = "abort").
2. GENIEPOD_CONFIG=deploy/config/geniepod.dev.toml ./target/release/genie-core — starts the HTTP server on 127.0.0.1:3000.
3. curl -X POST http://127.0.0.1:3000/api/chat -H 'Content-Type: application/json' -d '{"message":"I love coding! 🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉"}' — request must NOT abort the daemon. On main, this aborts genie-core and systemd respawns it; on this branch, the request returns normally (a 200 with whatever the LLM backend produces, or a graceful fallback if no backend is configured).
4. sqlite3 data/conversations.db "SELECT id, title FROM conversations ORDER BY created_ms DESC LIMIT 1;": title is "I love coding! 🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉..." (10 whole emoji, since the 57-byte budget walks back to byte 55), no broken UTF-8 in the column.
5. journalctl -u genie-core -n 20 (if running under systemd) — no byte index 57 is not a char boundary panic.

Notes for reviewers

Same bug class as #147 / PR #150. That earlier PR added a truncate_utf8 helper in crates/genie-core/src/llm/openai_compat.rs for the LLM error-body slice. The maintainers' position on this exact pattern is on record. This PR is the missed analogue in conversation.rs::append: same crash mechanism (&str[..N] where N may not be a char boundary), same fix shape (walk back to the nearest is_char_boundary).
Why a local helper instead of lifting PR #150's truncate_utf8 to a shared util. Two reasons. (a) The helper here is tiny (10 lines) and self-contained; spinning out a shared util adds review surface for no near-term gain. (b) The two call sites have different "right" budgets: the LLM error-body wants to display a backend error in the daemon log (a byte budget makes sense: what fits on a terminal line), while the conversation title is rendered on a dashboard rail (a char budget would arguably be more user-friendly). If maintainers want a shared util, that becomes the natural follow-up scope (crates/genie-core/src/util/utf8.rs), but it's intentionally out of scope here to keep the PR focused on closing #168.
Audit-state back-compat. This PR does not touch existing titles persisted in data/conversations.db. Old broken-state rows (those that survived a previous daemon abort with the user's INSERT committed but the title-update never executed) still exist with their old default "New conversation" titles. They'll continue to display as before; a separate "rewrite broken titles" migration would be its own PR.
No config schema change, no operator-visible default change. Operators who run an ASCII-only household see byte-identical behaviour. Operators in any other language now stop hitting the daemon abort.
Why I'm not also fixing the let _ = conversations.append(...) calls in voice_loop.rs:381 / :1032. Those use let _ to suppress the Result, which silently drops IO errors. That's a separate issue (#131-class: silent IO drops) and is out of scope for the panic fix here. Filing it separately is the right move.
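To make the byte-budget versus char-budget distinction from the notes above concrete, here is a minimal sketch of what a char-budget variant could look like. It is not part of this PR; `truncate_to_chars` is a hypothetical name, and it counts Unicode scalar values, not grapheme clusters.

```rust
/// Hypothetical char-budget variant: keeps at most `max_chars` Unicode scalar
/// values. `char_indices().nth(max_chars)` yields the byte offset of the first
/// char to drop, which is always a valid char boundary.
fn truncate_to_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // fewer than or exactly `max_chars` chars: keep everything
    }
}

fn main() {
    // 31 Cyrillic chars are 62 bytes. A 57-byte budget keeps 28 of them,
    // but a 57-char budget keeps the whole string.
    let cyrillic = "й".repeat(31);
    assert_eq!(truncate_to_chars(&cyrillic, 57), cyrillic);

    // For ASCII, chars and bytes coincide, so both budgets agree.
    let ascii = "a".repeat(70);
    assert_eq!(truncate_to_chars(&ascii, 57).len(), 57);

    // 16 four-byte emoji clipped to 10 chars occupy 40 bytes.
    let emoji = "🎉".repeat(16);
    assert_eq!(truncate_to_chars(&emoji, 10).len(), 40);
}
```

With a char budget the rendered title length is uniform across scripts, at the cost of a variable byte length in the stored column.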

Real Behavior Proof — concrete artifacts

cargo test -p genie-core --lib conversation:: output (above): 12 / 12 green.
cargo test full run: 610 / 0 / 3 (zero FAILED lines).
Standalone reproducer (/tmp/title_panic.rs above): panics on main's pattern, no panic on this branch's helper.

…loses GeniePod#168)

`ConversationStore::append()` formatted the first user message of every new conversation into a 60-byte title via `&content[..57]`. When byte 57 fell inside a multi-byte UTF-8 codepoint (any emoji, or a Cyrillic / Greek / Hebrew / Arabic / Latin-Extended char at an odd byte alignment) Rust panicked with `byte index 57 is not a char boundary`. Under `panic = "abort"` (the workspace release profile in Cargo.toml) this aborted the whole `genie-core` daemon on any such first message, reachable from POST /api/chat, POST /api/chat/stream, and the voice transcript path.

Same bug class as GeniePod#147 / PR GeniePod#150 (UTF-8 byte slice in `llm::openai_compat::truncate_body`); fix is the same shape.

Replace the byte slice with a `truncate_at_char_boundary(content, 57)` helper that walks back from the requested byte budget until `is_char_boundary` holds, then slices. Behaviour is byte-for-byte identical to the old code for ASCII input; no operator-visible change in the path that already worked.

Add 5 regression tests:
- direct helper coverage (ASCII pass-through, ASCII clip, 16x emoji 4-byte, 31x Cyrillic odd-byte, empty-string no-panic);
- emoji first-message no-panic via ConversationStore::append + list;
- Cyrillic first-message no-panic at the odd-byte boundary;
- short-ASCII verbatim regression on the no-truncation branch;
- long-ASCII path regression to lock the existing format.

cargo test -p genie-core --lib conversation:: -> 12 / 0 / 0.
cargo test (full workspace) -> 610 / 0 / 3 (vs 608 / 0 / 3 on main).

`cargo fmt --all -- --check` on CI (PR GeniePod#169 / run #201) rejected the single-line `body.chars().last().map(...).unwrap_or(false)` chain in `append_title_truncates_emoji_first_message_without_panic`. Apply rustfmt; no semantic change.

galuis116 added 2 commits May 23, 2026 12:57

galuis116 wants to merge 2 commits into GeniePod:main from galuis116:fix/conversation-title-utf8-panic
