fix(build): truncate search index descriptions by rune count not bytes #100
greynewell merged 2 commits into main
…cation

Adds build_test.go covering generateSearchIndex:
- short description written verbatim
- long ASCII description truncated to exactly 120 runes
- multi-byte (é, 2-byte) description truncated at rune boundary → valid UTF-8
- search disabled → no file written

Regression for the byte-slice truncation bug fixed in build.go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
generateSearchIndex was using len(desc)/desc[:120] (byte operations) to limit descriptions to 120 characters. For multi-byte UTF-8 characters (é, ñ, ü, CJK, emoji) a byte slice can cut a character in the middle, leaving an invalid UTF-8 sequence at the end of the string. json.Marshal silently replaces invalid UTF-8 sequences with the replacement character (U+FFFD), so the corruption ends up in the output JSON.

Fix: convert to []rune, check/slice by rune count, convert back to string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `generateSearchIndex` was using `len(desc) > 120` / `desc[:120]` (byte operations) to cap description length before writing the search index JSON
- `json.Marshal` silently replaces invalid UTF-8 with `U+FFFD` (replacement character `\uFFFD`), so the search index would contain corrupted descriptions with no error returned
- Fix: convert to `[]rune`, check/slice by rune count, convert back to string

Regression test added in `build_test.go`: `é` (2 bytes each) truncated at rune boundary → valid UTF-8

Test plan
- `go test ./internal/archdocs/pssg/build/...` passes
- `go build ./...` passes
- `TestGenerateSearchIndex_MultiByteDescriptionTruncation` fails on the old byte-slice code and passes on the fix

🤖 Generated with Claude Code