DOC-5801: search: add new FT.HYBRID command #2210
base: feat-ros-8.4
adrianoamaral
left a comment
Adjusting implementation details, command information and fixing errors
| ## Complexity | ||
|
|
||
| FT.HYBRID complexity depends on both the text search and vector similarity components: | ||
| - Text search: O(n) where n is the number of matching documents |
| - Text search: O(n) where n is the number of matching documents | |
| - Text search: O(n) for simple term searches, where n is the number of matching documents. In multi-term queries with INTERSECT or UNION, or when using fuzzy or prefix matches, complexity increases proportionally to the total number of entries scanned across all participating terms. |
|
|
||
| Performs hybrid search combining text search and vector similarity with configurable fusion methods. | ||
|
|
||
| FT.HYBRID simplifies the onboarding of new developers who want to explore hybrid search for semantic retrieval, RAG (Retrieval-Augmented Generation), and agent applications. This command re-uses some of the query terms syntax from FT.SEARCH and FT.AGGREGATE while simplifying and splitting the vector similarity functionality. |
| FT.HYBRID simplifies the onboarding of new developers who want to explore hybrid search for semantic retrieval, RAG (Retrieval-Augmented Generation), and agent applications. This command re-uses some of the query terms syntax from FT.SEARCH and FT.AGGREGATE while simplifying and splitting the vector similarity functionality. | |
| `FT.HYBRID` provides a unified interface for combining traditional full-text and vector-based search within a single query. It supports hybrid retrieval use cases such as semantic search, Retrieval-Augmented Generation (RAG), and intelligent agent applications. The command builds on the familiar query syntax of `FT.SEARCH` and `FT.AGGREGATE`, simplifying hybrid query construction while enabling flexible post-processing through aggregation capabilities. |
| FT.HYBRID simplifies the onboarding of new developers who want to explore hybrid search for semantic retrieval, RAG (Retrieval-Augmented Generation), and agent applications. This command re-uses some of the query terms syntax from FT.SEARCH and FT.AGGREGATE while simplifying and splitting the vector similarity functionality. | ||
|
|
||
| {{< note >}} | ||
| This command will only return keys to which the user has read access. |
| This command will only return keys to which the user has read access. | |
| This command will only return keys to which the user has read access. | |
| This command retrieves document IDs (`keyid`) and scores. To retrieve entire documents, use projections with `LOAD *` or `LOAD <count> field...`. |
| <details open> | ||
| <summary><code>VSIM @vector_field "vector-data"</code></summary> | ||
|
|
||
| defines the vector similarity component of the hybrid query. The `@vector_field` specifies which vector field in the index to search against, and `vector-data` contains the query vector for similarity comparison. |
| defines the vector similarity component of the hybrid query. The `@vector_field` specifies which vector field in the index to search against, and `vector-data` contains the query vector for similarity comparison. | |
| defines the vector similarity component of the hybrid query. The `@vector_field` specifies which vector field in the index to search against (for example, `$vector`), and `vector-data` contains the query vector for similarity comparison (for example, `PARAMS 2 $vector <vector-blob>`). |
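To make the `vector-data` argument above more concrete, here is a minimal Python sketch that packs a query vector into a binary blob and assembles the `VSIM` and `PARAMS` arguments. It rests on several assumptions: the vector field stores little-endian float32 values, the helper names `pack_vector` and `vsim_args` are hypothetical, and the argument order has not been tested against a server. The `PARAMS` name is written without `$`, as in `FT.SEARCH`; the suggestion text writes `$vector` there, so confirm which form the command accepts.

```python
import struct
from typing import List, Sequence, Union

Arg = Union[str, bytes]


def pack_vector(values: Sequence[float]) -> bytes:
    """Pack floats as little-endian float32 bytes (assumes a FLOAT32 vector field)."""
    return struct.pack(f"<{len(values)}f", *values)


def vsim_args(vector_field: str, param_name: str, values: Sequence[float]) -> List[Arg]:
    """Build the VSIM clause plus a PARAMS clause carrying the vector blob.

    Hypothetical helper: the layout mirrors the PR text and is untested.
    Other clauses (FILTER, COMBINE, LOAD) may need to sit between the two.
    """
    blob = pack_vector(values)
    return [
        "VSIM", f"@{vector_field}", f"${param_name}",  # query references $name
        "PARAMS", "2", param_name, blob,               # name/value pair, FT.SEARCH style
    ]


args = vsim_args("embedding", "vector", [1.0, 0.5])
```

The resulting list can be appended to the rest of the command arguments and passed to a client's generic command-execution call.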
| <details open> | ||
| <summary><code>FILTER "filter-expression"</code></summary> | ||
|
|
||
| applies pre-filtering to vector search results. This filter affects which documents are considered for vector similarity but doesn't impact scoring, unlike the `SEARCH` component which affects both filtering and scoring. |
| applies pre-filtering to vector search results. This filter affects which documents are considered for vector similarity but doesn't impact scoring, unlike the `SEARCH` component which affects both filtering and scoring. | |
| applies pre-filtering to vector search results or post-filtering when used after the `COMBINE` step as post-processing. This filter affects which documents are considered for vector similarity but doesn't impact scoring. In contrast, the `SEARCH` component affects both filtering and scoring. The `FILTER` syntax uses a search expression with the same syntax as [`FT.SEARCH`]({{< relref "/commands/ft.search" >}}), supporting all text search capabilities including field-specific searches, boolean operations, and phrase matching. |
| - `@__key` - reserved for loading key IDs when required | ||
| - `@__score` - reserved for the combined score (can be aliased) | ||
| - `@vector_distance` - yields the vector distance (can be aliased) | ||
| - `@__combined_score` - fused score from the COMBINE step |
| - `@__combined_score` - fused score from the COMBINE step |
| * [Map]({{< relref "/develop/reference/protocol-spec#maps" >}}) with the following fields: | ||
| - `total_results`: [Integer]({{< relref "/develop/reference/protocol-spec#integers" >}}) - total number of results | ||
| - `results`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of [maps]({{< relref "/develop/reference/protocol-spec#maps" >}}) containing document information | ||
| - `attributes`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of attribute names |
| - `attributes`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of attribute names |
| - `total_results`: [Integer]({{< relref "/develop/reference/protocol-spec#integers" >}}) - total number of results | ||
| - `results`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of [maps]({{< relref "/develop/reference/protocol-spec#maps" >}}) containing document information | ||
| - `attributes`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of attribute names | ||
| - `format`: [Simple string]({{< relref "/develop/reference/protocol-spec#simple-strings" >}}) - result format |
| - `format`: [Simple string]({{< relref "/develop/reference/protocol-spec#simple-strings" >}}) - result format |
| - `results`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of [maps]({{< relref "/develop/reference/protocol-spec#maps" >}}) containing document information | ||
| - `attributes`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of attribute names | ||
| - `format`: [Simple string]({{< relref "/develop/reference/protocol-spec#simple-strings" >}}) - result format | ||
| - `warning`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of warning messages |
| - `warning`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of warning messages | |
| - `warning`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of warning messages indicating partial results due to an index error, or because `MAXPREFIXEXPANSIONS` or `TIMEOUT` was reached | |
| - `results`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of [maps]({{< relref "/develop/reference/protocol-spec#maps" >}}) containing document information |
| One of the following: | ||
| * [Map]({{< relref "/develop/reference/protocol-spec#maps" >}}) with the following fields: | ||
| - `total_results`: [Integer]({{< relref "/develop/reference/protocol-spec#integers" >}}) - total number of results | ||
| - `results`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of [maps]({{< relref "/develop/reference/protocol-spec#maps" >}}) containing document information |
| - `results`: [Array]({{< relref "/develop/reference/protocol-spec#arrays" >}}) of [maps]({{< relref "/develop/reference/protocol-spec#maps" >}}) containing document information | |
| - `execution_time`: [Double]({{< relref "/develop/reference/protocol-spec#doubles" >}}) containing the hybrid query execution time |
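The reply fields discussed in the suggestions above (`total_results`, `results`, `warning`, `execution_time`) could be consumed roughly as follows. This is a sketch only: it assumes the client has already decoded the reply into a Python `dict`, the helper name `summarize_reply` is hypothetical, and the sample reply is hand-written rather than captured from a server. The per-document keys `__key` and `__score` are an assumption based on the reserved field names listed earlier.

```python
from typing import Any, Dict, List


def summarize_reply(reply: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the top-level fields of a map-style reply into a small summary."""
    results: List[Dict[str, Any]] = reply.get("results", [])
    warnings: List[str] = reply.get("warning", [])
    return {
        "total_results": reply.get("total_results", 0),
        "returned": len(results),
        # A non-empty warning list signals partial results.
        "partial": bool(warnings),
        "warnings": list(warnings),
        "execution_time": reply.get("execution_time"),
    }


# Hand-written sample reply; field values are illustrative only.
sample = {
    "total_results": 2,
    "results": [
        {"__key": "doc:1", "__score": "0.9"},
        {"__key": "doc:2", "__score": "0.4"},
    ],
    "warning": ["TIMEOUT reached"],
    "execution_time": 1.25,
}
summary = summarize_reply(sample)
```

Checking `partial` before trusting `total_results` keeps callers from treating a timed-out query as complete.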
@adrianoamaral: I've posted this PR as a draft, as I think we might need to iterate on it a bit. Also note that I used the time complexity and return information from the FT.SEARCH command, as (1) I don't have access to a system that includes this new command so I couldn't test, and (2) no JSON file (i.e., commands.json) was provided and I had to improvise. The syntax might be a bit wonky (again, no JSON file provided).