[BUG] Empty File Extension Pattern Matches Binary Files #87

@olddev94

Project

vgrep

Description

The should_index() function at src/core/indexer.rs:310 includes an empty string "" in its list of indexable file extensions. As a result, every file without an extension (including compiled binaries, executables, and other non-text files) is treated as indexable, which leads to read errors or corrupted embeddings.
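
The mechanism can be sketched as follows. This is a minimal illustration, not the code from vgrep: the constant INDEXABLE_EXTENSIONS, the function name should_index_buggy, and the use of unwrap_or("") to handle a missing extension are all assumptions made for the example.

```rust
use std::path::Path;

// Hypothetical allowlist modelled on the issue; note the trailing "" entry.
const INDEXABLE_EXTENSIONS: &[&str] = &["rs", "py", "md", ""];

/// Assumed shape of the buggy check: a path with no extension is mapped
/// to "", which then matches the "" entry in the allowlist.
fn should_index_buggy(path: &Path) -> bool {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or(""); // `my_binary` has no extension, so this yields ""
    INDEXABLE_EXTENSIONS.contains(&ext)
}
```

Under these assumptions, should_index_buggy(Path::new("my_binary")) returns true, because Path::extension() returns None for a name without a dot and the fallback "" is in the list.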

Error Message

Error: Failed to read file
Caused by: stream did not contain valid UTF-8

In other cases no error is raised and the indexer silently produces garbage embeddings for binary content.
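
The UTF-8 error above is what the Rust standard library reports when a file containing invalid UTF-8 is read into a String. A small sketch, assuming the indexer reads files with std::fs::read_to_string or an equivalent (the helper name read_binary_as_text and the byte sequence are made up for illustration):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes a few bytes that are not valid UTF-8 to `path`, then tries to
/// read the file back as text, the way a text indexer might.
fn read_binary_as_text(path: &Path) -> io::Result<String> {
    // 0xFF never appears in well-formed UTF-8. The ELF magic prefix is
    // only there to make the bytes look like a compiled binary.
    fs::write(path, [0x7f, b'E', b'L', b'F', 0xff, 0xfe, 0x00])?;
    // Fails with io::ErrorKind::InvalidData for non-UTF-8 content.
    fs::read_to_string(path)
}
```

Calling this helper on a temporary path returns an Err whose kind() is io::ErrorKind::InvalidData, matching the "stream did not contain valid UTF-8" cause shown in the error message.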

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Create a project that contains a compiled binary:
    mkdir -p /tmp/test_project && cd /tmp/test_project
    echo 'fn main() { println!("hello"); }' > main.rs
    rustc main.rs -o my_binary
  2. Run the indexer: vgrep index
  3. Observe that the indexer attempts to index my_binary (the compiled executable)

Expected Behavior

Files without extensions should NOT be indexed by default, except for specific known filenames (Makefile, Dockerfile, etc.) which are already handled in the filename check.
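
One possible shape for the fix, as a sketch only: the lists below are hypothetical stand-ins for whatever vgrep actually defines, and the real function may take different arguments. The point is that the extension allowlist has no "" entry, so a file without an extension is accepted only through the filename check.

```rust
use std::path::Path;

// Hypothetical allowlists; the real lists in vgrep may differ.
const INDEXABLE_EXTENSIONS: &[&str] = &["rs", "py", "js", "ts", "md", "toml"];
const INDEXABLE_FILENAMES: &[&str] = &["Makefile", "Dockerfile"];

/// Returns true only for known extensions or known extensionless filenames.
fn should_index(path: &Path) -> bool {
    // Known filenames such as Makefile are accepted by exact name.
    if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
        if INDEXABLE_FILENAMES.contains(&name) {
            return true;
        }
    }
    // A missing extension is rejected instead of being mapped to "".
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => INDEXABLE_EXTENSIONS.contains(&ext),
        None => false,
    }
}
```

With this sketch, main.rs and Makefile are indexed while my_binary is skipped. Matching on None explicitly, rather than falling back to "", keeps the "no extension" case from ever reaching the allowlist lookup.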

Actual Behavior

All files without extensions are considered indexable, causing:

  1. UTF-8 decode errors for binary files
  2. Wasted processing time attempting to read binaries
  3. Potentially corrupted embeddings if binary content is partially UTF-8 valid
  4. Index bloat from non-code files

Additional Context

The same bug exists in:

  • src/core/indexer.rs:686 (ServerIndexer)
  • src/watcher.rs:244 (FileWatcher)

All three locations need to be fixed consistently.

Labels

bug, valid, vgrep