200 changes: 75 additions & 125 deletions Quickstart-Semantic-Search/semantic-search-quickstart.ipynb
@@ -5,52 +5,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Semantic search using the azure.search.documents library in the Azure SDK for Python"
"# Semantic ranking using the azure.search.documents library in the Azure SDK for Python"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This Jupyter Notebook adds semantic search, using pre-trained models from Microsoft to re-rank results based on a semantic match to the query. "
"This notebook demonstrates a semantic configuration in a search index and the semantic query syntax for reranking search results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import the libraries needed to create a search index, upload documents, and query the index\n",
"%pip install azure-search-documents\n",
"%pip show azure-search-documents\n",
"%pip install python-dotenv\n",
"\n",
"import os\n",
"from azure.core.credentials import AzureKeyCredential\n",
"from azure.search.documents.indexes import SearchIndexClient \n",
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.indexes.models import ( \n",
" SearchIndex, \n",
" SearchFieldDataType, \n",
" SimpleField, \n",
" SearchableField,\n",
" ComplexField,\n",
" SearchIndex, \n",
" SemanticConfiguration, \n",
" PrioritizedFields, \n",
" SemanticField, \n",
" SemanticSettings, \n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this step, initialize the search client used to make each request. Provide the name and admin API key of your search service. If you get ConnectionError \"Failed to establish a new connection\", verify that the api-key is a primary or secondary admin key, and not a query key."
"## Install packages and set variables"
]
},
{
@@ -59,30 +29,9 @@
"metadata": {},
"outputs": [],
"source": [
"# Set the service endpoint and API key from the environment\n",
"\n",
"service_name = \"<YOUR-SEARCH-SERVICE-NAME>\"\n",
"admin_key = \"<YOUR-SEARCH-SERVICE-ADMIN-KEY>\"\n",
"\n",
"index_name = \"hotels-quickstart\"\n",
"\n",
"# Create an SDK client\n",
"endpoint = \"https://{}.search.windows.net/\".format(service_name)\n",
"admin_client = SearchIndexClient(endpoint=endpoint,\n",
" index_name=index_name,\n",
" credential=AzureKeyCredential(admin_key))\n",
"\n",
"search_client = SearchClient(endpoint=endpoint,\n",
" index_name=index_name,\n",
" credential=AzureKeyCredential(admin_key))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next cell, the index \"hotels-quickstart\" will be deleted if it previously existed. This step allows you to reuse the index name."
"! pip install azure-search-documents==11.6.0b1 --quiet\n",
"! pip install azure-identity --quiet\n",
"! pip install python-dotenv --quiet"
]
},
{
@@ -91,20 +40,18 @@
"metadata": {},
"outputs": [],
"source": [
"# Delete the index if it exists\n",
"try:\n",
" result = admin_client.delete_index(index_name)\n",
" print ('Index', index_name, 'Deleted')\n",
"except Exception as ex:\n",
" print (ex)\n"
"# Provide variables\n",
"search_endpoint: str = \"PUT-YOUR-SEARCH-ENDPOINT-HERE\"\n",
"search_api_key: str = \"PUT-YOUR-SEARCH-API-KEY-HERE\"\n",
"index_name: str = \"hotels-quickstart\""
]
},
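Note on the variables cell: the notebook installs `python-dotenv` but hard-codes placeholder strings. A minimal sketch of reading the same three values from environment variables might look like the following. The variable names `AZURE_SEARCH_ENDPOINT`, `AZURE_SEARCH_API_KEY`, and `AZURE_SEARCH_INDEX` and the helper `load_search_settings` are assumptions made for illustration and do not appear in the notebook.

```python
import os


def load_search_settings(env=os.environ):
    """Read search settings from a mapping, falling back to the notebook's placeholders.

    The environment variable names are hypothetical; use whatever names your .env file defines.
    """
    return {
        "search_endpoint": env.get("AZURE_SEARCH_ENDPOINT", "PUT-YOUR-SEARCH-ENDPOINT-HERE"),
        "search_api_key": env.get("AZURE_SEARCH_API_KEY", "PUT-YOUR-SEARCH-API-KEY-HERE"),
        "index_name": env.get("AZURE_SEARCH_INDEX", "hotels-quickstart"),
    }


# Example with an explicit mapping instead of the real process environment
settings = load_search_settings({"AZURE_SEARCH_ENDPOINT": "https://example.search.windows.net"})
```

If a `.env` file is used, calling `load_dotenv()` from `python-dotenv` before `load_search_settings()` would populate `os.environ` first, which keeps the API key out of the notebook itself.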
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Specify the index definition, including the fields that define each search document. This schema adds a semantic configuration that specifies how to use search fields during semantic ranking."
"## Create an index"
]
},
{
@@ -113,15 +60,34 @@
"metadata": {},
"outputs": [],
"source": [
"# Specify the index schema\n",
"name = index_name\n",
"from azure.core.credentials import AzureKeyCredential\n",
"\n",
"credential = AzureKeyCredential(search_api_key)\n",
"\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.indexes.models import (\n",
" ComplexField,\n",
" SimpleField,\n",
" SearchFieldDataType,\n",
" SearchableField,\n",
" SearchIndex,\n",
" SemanticConfiguration,\n",
" SemanticField,\n",
" SemanticPrioritizedFields,\n",
" SemanticSearch\n",
")\n",
"\n",
"# Create a search schema\n",
"index_client = SearchIndexClient(\n",
" endpoint=search_endpoint, credential=credential)\n",
"fields = [\n",
" SimpleField(name=\"HotelId\", type=SearchFieldDataType.String, key=True),\n",
" SearchableField(name=\"HotelName\", type=SearchFieldDataType.String, sortable=True),\n",
" SearchableField(name=\"Description\", type=SearchFieldDataType.String, analyzer_name=\"en.lucene\"),\n",
" SearchableField(name=\"Description_fr\", type=SearchFieldDataType.String, analyzer_name=\"fr.lucene\"),\n",
" SearchableField(name=\"Category\", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),\n",
" \n",
"\n",
" SearchableField(name=\"Tags\", collection=True, type=SearchFieldDataType.String, facetable=True, filterable=True),\n",
"\n",
" SimpleField(name=\"ParkingIncluded\", type=SearchFieldDataType.Boolean, facetable=True, filterable=True, sortable=True),\n",
@@ -136,54 +102,35 @@
" SearchableField(name=\"Country\", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),\n",
" ])\n",
" ]\n",
"\n",
"semantic_config = SemanticConfiguration(\n",
" name=\"my-semantic-config\",\n",
" prioritized_fields=PrioritizedFields(\n",
" prioritized_fields=SemanticPrioritizedFields(\n",
" title_field=SemanticField(field_name=\"HotelName\"),\n",
" prioritized_keywords_fields=[SemanticField(field_name=\"Category\")],\n",
" prioritized_content_fields=[SemanticField(field_name=\"Description\")]\n",
" keywords_fields=[SemanticField(field_name=\"Category\")],\n",
" content_fields=[SemanticField(field_name=\"Description\")]\n",
" )\n",
")\n",
"\n",
"semantic_settings = SemanticSettings(configurations=[semantic_config])\n",
"# Create the semantic settings with the configuration\n",
"semantic_search = SemanticSearch(configurations=[semantic_config])\n",
"\n",
"semantic_settings = SemanticSearch(configurations=[semantic_config])\n",
"scoring_profiles = []\n",
"suggester = [{'name': 'sg', 'source_fields': ['Tags', 'Address/City', 'Address/Country']}]\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Formulate the create_index request. This request targets the indexes collection of your search service and creates an index using the index schema from the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"index = SearchIndex(\n",
" name=name,\n",
" fields=fields,\n",
" semantic_settings=semantic_settings,\n",
" scoring_profiles=scoring_profiles,\n",
" suggesters = suggester)\n",
"suggester = [{'name': 'sg', 'source_fields': ['Tags', 'Address/City', 'Address/Country']}]\n",
"\n",
"try:\n",
" result = admin_client.create_index(index)\n",
" print ('Index', result.name, 'created')\n",
"except Exception as ex:\n",
" print (ex)"
"# Create the search index with the semantic settings\n",
"index = SearchIndex(name=index_name, fields=fields, suggesters=suggester, scoring_profiles=scoring_profiles, semantic_search=semantic_search)\n",
"result = index_client.create_or_update_index(index)\n",
"print(f' {result.name} created')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, set up documents to include four hotel documents conforming to the schema."
"## Create a documents payload"
]
},
{
@@ -192,6 +139,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Create a documents payload\n",
"documents = [\n",
" {\n",
" \"@search.action\": \"upload\",\n",
@@ -277,7 +225,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Formulate the request. This upload_documents request targets the docs collection of the hotels-quickstart index and pushes the documents from the previous step into the search index."
"## Upload documents"
]
},
{
@@ -286,23 +234,27 @@
"metadata": {},
"outputs": [],
"source": [
"# Upload documents to the index\n",
"search_client = SearchClient(endpoint=search_endpoint,\n",
" index_name=index_name,\n",
" credential=credential)\n",
"try:\n",
" result = search_client.upload_documents(documents=documents)\n",
" print(\"Upload of new document succeeded: {}\".format(result[0].succeeded))\n",
"except Exception as ex:\n",
" print (ex.message)"
" print (ex.message)\n",
"\n",
"\n",
" index_client = SearchIndexClient(\n",
" endpoint=search_endpoint, credential=credential)"
]
},
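Note on the upload cell: it prints only `result[0].succeeded`, so a failure on any later document would go unnoticed. A small sketch of checking every result is shown below. It relies on the `key` and `succeeded` attributes of the indexing results returned by `upload_documents`; the helper name `failed_keys` and the stand-in objects are hypothetical and exist only so the example runs without a search service.

```python
from types import SimpleNamespace


def failed_keys(results):
    """Return the keys of documents whose indexing result did not succeed."""
    return [r.key for r in results if not r.succeeded]


# Stand-in objects that mimic the attributes read from each indexing result
fake_results = [
    SimpleNamespace(key="1", succeeded=True),
    SimpleNamespace(key="2", succeeded=False),
]
failures = failed_keys(fake_results)
```

In the notebook, the same helper could be called as `failed_keys(result)` right after `search_client.upload_documents(documents=documents)`.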
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You're now ready to run some queries. For this operation, use search_client. \n",
"\n",
"### Empty query with unscored results\n",
"\n",
"The next cell contains a query expression that executes an empty search (`search=*`), returning an unranked list (search score = 1.0) of arbitrary documents. Because there are no criteria, all documents are included in results. This query prints fields from each document. It also adds `include_total_count=True` to get a count of all documents (4) in the results."
"## Run your first query"
]
},
{
@@ -311,6 +263,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Run an empty query (returns selected fields, all documents)\n",
"results = search_client.search(query_type='simple',\n",
" search_text=\"*\" ,\n",
" select='HotelName,Description',\n",
@@ -328,9 +281,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Full text search with BM25 ranking\n",
"\n",
"The previous query used an empty search string, which bypasses the search engine. In this query, search for \"what hotel has a good restaurant on site\". The query string undergoes lexical analysis and tokenization. The search engine scans for matches and assigns a search score based on term frequency and proximity. Higher scoring matches are returned first. In this query for \"what hotel has a good restaurant on site\", Sublime Cliff Hotel comes out on top because its description includes \"site\". Terms that occur infrequently raise the search score of the document."
"## Run a term query"
]
},
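Note on BM25 ranking: the removed explanation says that infrequent terms raise a document's score. The sketch below illustrates that effect with a simplified textbook Okapi BM25 formula. It is not the search service's implementation: the whitespace tokenizer, the `k1` and `b` defaults, the function name `bm25_scores`, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency weight
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency is dampened and normalized by document length
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "hotel with a restaurant on site",
    "hotel near the beach",
    "budget hotel downtown",
]
scores = bm25_scores("restaurant hotel", docs)
```

Here "hotel" appears in every document and contributes little, while "restaurant" appears in only one, so the first document receives the highest score.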
{
@@ -339,10 +290,12 @@
"metadata": {},
"outputs": [],
"source": [
"# Run a text query (returns a BM25-scored result set)\n",
"results = search_client.search(query_type='simple',\n",
" search_text=\"what hotel has a good restaurant on site\" ,\n",
" select='HotelName,HotelId,Description')\n",
"\n",
" select='HotelName,HotelId,Description',\n",
" include_total_count=True)\n",
" \n",
"for result in results:\n",
" print(result[\"@search.score\"])\n",
" print(result[\"HotelName\"])\n",
@@ -354,9 +307,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Semantic search with captions\n",
"\n",
"Here's the same query, but with semantic ranking. Notice that the semantic ranker correctly identifies Triple Landscape Hotel as a more relevant result given the initial query. This query also returns captions generated by the models. The inputs are too minimal in this sample to create interesting captions, but the example succeeds in demonstrating the syntax."
"## Run a semantic query"
]
},
{
@@ -365,6 +316,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Runs a semantic query (runs a BM25-ranked query and promotes the most relevant matches to the top)\n",
"results = search_client.search(query_type='semantic', semantic_configuration_name='my-semantic-config',\n",
" search_text=\"what hotel has a good restaurant on site\", \n",
" select='HotelName,Description,Category', query_caption='extractive')\n",
@@ -373,7 +325,7 @@
" print(result[\"@search.reranker_score\"])\n",
" print(result[\"HotelName\"])\n",
" print(f\"Description: {result['Description']}\")\n",
" \n",
"\n",
" captions = result[\"@search.captions\"]\n",
" if captions:\n",
" caption = captions[0]\n",
@@ -384,13 +336,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add semantic answers\n",
"\n",
"Semantic search can generate answers to a query string that has the characteristics of a question. The generated answer is extracted verbatim from your content. To get a semantic answer, the question and answer must be closely aligned, and the model must find content that clearly answers the question. If potential answers don't have a high enough confidence score, the model won't return an answer. For demonstration purposes, the question in this example is designed to get a response so that you can see the syntax."
"## Return semantic answers"
]
},
{
@@ -399,9 +348,10 @@
"metadata": {},
"outputs": [],
"source": [
"# Run a semantic query that returns semantic answers \n",
"results = search_client.search(query_type='semantic', semantic_configuration_name='my-semantic-config',\n",
" search_text=\"what hotel stands out for its gastronomic excellence\", \n",
" select='HotelName,Description,Category', query_caption='extractive', query_answer=\"extractive\",)\n",
" search_text=\"what hotel is in a historic building\",\n",
" select='HotelName,Description,Category', query_caption='extractive', query_answer=\"extractive\",)\n",
"\n",
"semantic_answers = results.get_answers()\n",
"for answer in semantic_answers:\n",
@@ -415,7 +365,7 @@
" print(result[\"@search.reranker_score\"])\n",
" print(result[\"HotelName\"])\n",
" print(f\"Description: {result['Description']}\")\n",
" \n",
"\n",
" captions = result[\"@search.captions\"]\n",
" if captions:\n",
" caption = captions[0]\n",
@@ -448,7 +398,7 @@
"outputs": [],
"source": [
"try:\n",
" result = admin_client.delete_index(index_name)\n",
" result = index_client.delete_index(index_name)\n",
" print ('Index', index_name, 'Deleted')\n",
"except Exception as ex:\n",
" print (ex)"
@@ -469,7 +419,7 @@
"outputs": [],
"source": [
"try:\n",
" result = admin_client.get_index(index_name)\n",
" result = index_client.get_index(index_name)\n",
" print (result)\n",
"except Exception as ex:\n",
" print (ex)\n"