Skip to content

Commit

Permalink
Python: Fixes to Cosmos DB NoSQL query syntax generation. (#10373)
Browse files Browse the repository at this point in the history
### Motivation and Context

Review information:
1. Why is this change required?

There were several SQL query syntax generation issue for both
text_search() and vectorized_search() methods

2. What problem does it solve?

a) In `_build_where_clauses_from_filter()` the WHERE clause creation did
not quote string values and did not properly handle list[] data model
attribures,
b) for `_build_vector_query()` method the WHERE clause was placed after
the ORDER BY clause causing query syntax errors,
c) for `_build_search_text_query()` method the data_model_definition
items were not being interrogated, thus added `CONTAINS()` in the WHERE
clause.

3. What scenario does it contribute to?

The ability to use Azure Cosmos DB NoSQL for `text_search()` and
`vectorized_search()`.

4. Issue resolution: 
#10368 


### Description

The errors noted in the Issue and the bug fixes noted above correct the
SQL query syntax generation for the `text_search()` and
`vectorized_search()` methods and now produce accurate results for
performing both types of queries with and without filters (aka WHERE
clause).

### Contribution Checklist

- ✅ The code builds clean without any errors or
warnings
- ✅ The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- ✅ All unit tests pass, and I have added new tests
where possible
- 🙏 I didn't break anyone 😄

---------

Co-authored-by: Evan Mattson <[email protected]>
  • Loading branch information
davidatorres and moonbox3 authored Feb 6, 2025
1 parent c445f5c commit 3461ed6
Showing 1 changed file with 24 additions and 9 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -178,23 +178,25 @@ def _build_search_text_query(self, options: VectorSearchOptions) -> str:
where_clauses = self._build_where_clauses_from_filter(options.filter)
contains_clauses = " OR ".join(
f"CONTAINS(c.{field}, @search_text)"
for field in self.data_model_definition.fields
if isinstance(field, VectorStoreRecordDataField) and field.is_full_text_searchable
for field, field_def in self.data_model_definition.fields.items()
if isinstance(field_def, VectorStoreRecordDataField) and field_def.is_full_text_searchable
)
if where_clauses:
where_clauses = f" {where_clauses} AND"
return (
f"SELECT TOP @top {self._build_select_clause(options.include_vectors)} " # nosec: B608
f"FROM c WHERE ({contains_clauses}) AND {where_clauses}" # nosec: B608
f"FROM c WHERE{where_clauses} ({contains_clauses})" # nosec: B608
)

def _build_vector_query(self, options: VectorSearchOptions) -> str:
where_clauses = self._build_where_clauses_from_filter(options.filter)
if where_clauses:
where_clauses = f"WHERE {where_clauses}"
where_clauses = f"WHERE {where_clauses} "
vector_field_name: str = self.data_model_definition.try_get_vector_field(options.vector_field_name).name # type: ignore
return (
f"SELECT TOP @top {self._build_select_clause(options.include_vectors)}," # nosec: B608
f" VectorDistance(c.{vector_field_name}, @vector) AS distance FROM c ORDER " # nosec: B608
f"BY VectorDistance(c.{vector_field_name}, @vector) {where_clauses}" # nosec: B608
f"SELECT TOP @top {self._build_select_clause(options.include_vectors)}, " # nosec: B608
f"VectorDistance(c.{vector_field_name}, @vector) AS distance FROM c " # nosec: B608
f"{where_clauses}ORDER BY VectorDistance(c.{vector_field_name}, @vector)" # nosec: B608
)

def _build_select_clause(self, include_vectors: bool) -> str:
Expand All @@ -218,11 +220,24 @@ def _build_where_clauses_from_filter(self, filters: VectorSearchFilter | None) -
return ""
clauses = []
for filter in filters.filters:
field_def = self.data_model_definition.fields[filter.field_name]
match filter:
case EqualTo():
clauses.append(f"c.{filter.field_name} = {filter.value}")
clause = ""
if field_def.property_type in ["int", "float"]:
clause = f"c.{filter.field_name} = {filter.value}"
if field_def.property_type == "str":
clause = f"c.{filter.field_name} = '{filter.value}'"
if field_def.property_type == "list[str]":
filter_value = f"ARRAY_CONTAINS(c.{filter.field_name}, '{filter.value}')"
if field_def.property_type in ["list[int]", "list[float]"]:
filter_value = f"ARRAY_CONTAINS(c.{filter.field_name}, {filter.value})"
clauses.append(clause)
case AnyTagsEqualTo():
clauses.append(f"{filter.value} IN c.{filter.field_name}")
filter_value = filter.value
if field_def.property_type == "list[str]":
filter_value = f"'{filter.value}'"
clauses.append(f"{filter_value} IN c.{filter.field_name}")
case _:
raise ValueError(f"Unsupported filter: {filter}")
return " AND ".join(clauses)
Expand Down

0 comments on commit 3461ed6

Please sign in to comment.