
Fix: Correct chunk overlap calculation to use actual character count #6

Open
echobt wants to merge 1 commit into main from fix/issue-22-chunk-overlap-calculation

echobt commented Jan 19, 2026

Summary

This change corrects the chunk overlap calculation to properly count characters instead of using an arbitrary division by 40.

Problem

The original implementation calculated overlap by dividing the configured chunk_overlap value by 40, under the assumption that lines average 40 characters. This approach was flawed:

  • With the default chunk_overlap=64, the overlap was always exactly 1 line (64/40 truncates to 1 under integer division), regardless of actual line lengths
  • This did not match the documented behavior where chunk_overlap specifies the overlap in characters
  • Files with very short or very long lines would have inconsistent overlap behavior
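
To make the problem concrete, here is a minimal sketch of the old heuristic as described above. The function name `legacy_overlap_lines` is hypothetical and not taken from the codebase; only the division by 40 comes from the description.

```rust
/// Hypothetical reconstruction of the old heuristic: assume every line is
/// 40 characters wide and convert the character budget into a line count.
fn legacy_overlap_lines(chunk_overlap: usize) -> usize {
    // Integer division truncates, so 64 / 40 yields 1.
    chunk_overlap / 40
}
```

With this formula, a file of 5-character lines and a file of 200-character lines both get a 1-line overlap for chunk_overlap=64, which is roughly 5 and 200 characters respectively rather than the configured 64.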

Solution

The fix implements proper character-based overlap calculation by:

  1. Iterating backwards through preceding lines
  2. Accumulating character counts until the target overlap is reached
  3. Using the actual number of lines needed to achieve the desired character overlap

This ensures the overlap accurately reflects the configured character count, improving indexing consistency across files with varying line lengths.
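
The three steps above can be sketched roughly as follows. This is an illustration, not the code in this PR: the function name `overlap_line_count`, its parameters, the choice to count one extra character per line for the newline, and stopping once the total reaches at least `chunk_overlap` are all assumptions.

```rust
/// Returns how many lines immediately before index `end` are needed to cover
/// at least `chunk_overlap` characters (hypothetical helper, for illustration).
fn overlap_line_count(lines: &[&str], end: usize, chunk_overlap: usize) -> usize {
    let end = end.min(lines.len()); // guard against an out-of-range index
    let mut chars = 0usize;
    let mut count = 0usize;
    // Walk backwards through the preceding lines.
    for line in lines[..end].iter().rev() {
        if chars >= chunk_overlap {
            break; // target overlap reached
        }
        // Assumption: count Unicode scalar values plus one for the newline.
        chars += line.chars().count() + 1;
        count += 1;
    }
    count
}
```

Under these assumptions, short lines pull in more lines of overlap and long lines pull in fewer, so the overlap tracks the configured character count instead of a fixed line count.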

Testing

The fix keeps the existing function signatures; only the overlap computation inside each chunk_content() implementation changes. The affected methods are:

  • src/core/indexer.rs: Indexer.chunk_content() method
  • src/core/indexer.rs: ServerIndexer.chunk_content() method
  • src/watcher.rs: FileWatcher.chunk_content() method

Related Issue

Fixes PlatformNetwork/bounty-challenge#22

Development

Successfully merging this pull request may close these issues.

[BUG] Chunk Overlap Calculation Uses Wrong Formula (Divides by 40)
