Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
v3.0.10: fix embedding token length bug
- was erroneously comparing token length to byte count in an assert

Break query into chunks, create embeddings, find similar chunks

summary of diff --git a/v3/core/chunk.go b/v3/core/chunk.go

- Break the query into chunks using `chunksFromString` instead of directly creating embeddings from the query
- Append each chunk's text to the `queryStrings` array for embedding
- Create embeddings from the `queryStrings` array
- Check whether the `embeddings` array is empty and return if so
- Calculate the average of the embeddings using `util.MeanVector`
- Use the averaged embedding to find the most similar chunks

Add comments to improve Cli, reference gitea/tea, consider urfave

summary of diff --git a/v3/core/cli.go b/v3/core/cli.go

- Add comments to the Cli function noting potential improvements by referencing gitea/tea's approach and considering the use of urfave over kong

Import envi, add token count checks, enhance debugging

summary of diff --git a/v3/core/document.go b/v3/core/document.go

- Import `github.com/stevegt/envi` to use environment variables
- Add checks that verify chunk text length by token count, both before setting chunks and for new chunks
- Assert that the token count is below `g.embeddingTokenLimit` for both existing and new chunks to prevent exceeding limits
- Use `envi.Bool` to perform the debug checks only when the `DEBUG` environment variable is set
- Enhance debugging by ensuring chunk token counts do not exceed the defined limits

Update grokker.go version from 3.0.9 to 3.0.10

summary of diff --git a/v3/core/grokker.go b/v3/core/grokker.go

- Update version from 3.0.9 to 3.0.10 in grokker.go

Remove comments and enable debug logging in createEmbeddings

summary of diff --git a/v3/core/openai.go b/v3/core/openai.go

- Remove unnecessary comments about exceeding max tokens in the `createEmbeddings` function
- Enable debug logging for creating embeddings for each text chunk in the `createEmbeddings` function

Move go-diff to own block; add envi v0.2.0 to require block

summary of diff --git a/v3/go.mod b/v3/go.mod

- Move `github.com/sergi/go-diff v1.3.1` into its own require block
- Add `github.com/stevegt/envi v0.2.0` to the require block

Add envi v0.2.0 and goadapt v0.0.13 module info to go.sum

summary of diff --git a/v3/go.sum b/v3/go.sum

- Add github.com/stevegt/envi v0.2.0 checksum and module information
- Add github.com/stevegt/goadapt v0.0.13 module information
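The document.go change checks chunk size in tokens rather than bytes, which is the bug named in the commit title. A minimal sketch, assuming a hypothetical `countTokens` that counts whitespace-separated words as a stand-in for a real tokenizer, and using `os.Getenv` in place of `envi.Bool`, whose exact signature is not shown in this commit:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countTokens is a hypothetical stand-in for a real tokenizer (grokker's
// token counter is not shown in this commit). Each whitespace-separated
// word counts as one token, purely for illustration.
func countTokens(text string) int {
	return len(strings.Fields(text))
}

// checkChunk reports an error unless the chunk's token count is below the
// embedding token limit. Using len(text) here instead would compare a
// byte count against a token limit.
func checkChunk(text string, embeddingTokenLimit int) error {
	n := countTokens(text)
	if n >= embeddingTokenLimit {
		return fmt.Errorf("chunk has %d tokens, limit is %d", n, embeddingTokenLimit)
	}
	return nil
}

func main() {
	// The commit gates its checks on the DEBUG environment variable via
	// envi.Bool; os.Getenv is a dependency-free stand-in here.
	debug := os.Getenv("DEBUG") != ""

	text := "héllo wörld"                    // 2 "tokens" here, but 13 bytes
	fmt.Println(len(text), countTokens(text)) // prints "13 2"

	if debug {
		if err := checkChunk(text, 8); err != nil {
			panic(err)
		}
	}
}
```

With a limit of 8, a byte-based check would reject this text (13 bytes) even though its token count (2) is well within the limit, which illustrates why mixing the two units in one assert is wrong.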
7 changed files with 45 additions and 9 deletions.