Commit 9013110

docs: update ai-rate-limiting and ai-rag docs (#12107)
1 parent 26bff9a commit 9013110

2 files changed

+882
-93
lines changed

docs/en/latest/plugins/ai-rag.md

Lines changed: 106 additions & 71 deletions
@@ -5,7 +5,9 @@ keywords:
55
- API Gateway
66
- Plugin
77
- ai-rag
8-
description: This document contains information about the Apache APISIX ai-rag Plugin.
8+
- AI
9+
- LLM
10+
description: The ai-rag Plugin enhances LLM outputs with Retrieval-Augmented Generation (RAG), efficiently retrieving relevant documents to improve accuracy and contextual relevance in responses.
911
---
1012

1113
<!--
@@ -27,57 +29,61 @@ description: This document contains information about the Apache APISIX ai-rag P
2729
#
2830
-->
2931

32+
<head>
33+
<link rel="canonical" href="https://docs.api7.ai/hub/ai-rag" />
34+
</head>
35+
3036
## Description
3137

32-
The `ai-rag` plugin integrates Retrieval-Augmented Generation (RAG) capabilities with AI models.
33-
It allows efficient retrieval of relevant documents or information from external data sources and
34-
augments the LLM responses with that data, improving the accuracy and context of generated outputs.
38+
The `ai-rag` Plugin provides Retrieval-Augmented Generation (RAG) capabilities with LLMs. It facilitates the efficient retrieval of relevant documents or information from external data sources, which are used to enhance the LLM responses, thereby improving the accuracy and contextual relevance of the generated outputs.
39+
40+
The Plugin supports using [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service) and [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) services for generating embeddings and performing vector search.
3541

3642
**_As of now only [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service) and [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) services are supported for generating embeddings and performing vector search respectively. PRs for introducing support for other service providers are welcomed._**
3743

38-
## Plugin Attributes
44+
## Attributes
3945

40-
| **Field** | **Required** | **Type** | **Description** |
46+
| Name | Required | Type | Description |
4147
| ----------------------------------------------- | ------------ | -------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
42-
| embeddings_provider | Yes | object | Configurations of the embedding models provider |
43-
| embeddings_provider.azure_openai | Yes | object | Configurations of [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service) as the embedding models provider. |
44-
| embeddings_provider.azure_openai.endpoint | Yes | string | Azure OpenAI endpoint |
45-
| embeddings_provider.azure_openai.api_key | Yes | string | Azure OpenAI API key |
46-
| vector_search_provider | Yes | object | Configuration for the vector search provider |
47-
| vector_search_provider.azure_ai_search | Yes | object | Configuration for Azure AI Search |
48-
| vector_search_provider.azure_ai_search.endpoint | Yes | string | Azure AI Search endpoint |
49-
| vector_search_provider.azure_ai_search.api_key | Yes | string | Azure AI Search API key |
48+
| embeddings_provider | True | object | Configurations of the embedding models provider. |
49+
| embeddings_provider.azure_openai | True | object | Configurations of [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service) as the embedding models provider. |
50+
| embeddings_provider.azure_openai.endpoint | True | string | Azure OpenAI embedding model endpoint. |
51+
| embeddings_provider.azure_openai.api_key | True | string | Azure OpenAI API key. |
52+
| vector_search_provider | True | object | Configuration for the vector search provider. |
53+
| vector_search_provider.azure_ai_search | True | object | Configuration for Azure AI Search. |
54+
| vector_search_provider.azure_ai_search.endpoint | True | string | Azure AI Search endpoint. |
55+
| vector_search_provider.azure_ai_search.api_key | True | string | Azure AI Search API key. |
5056

5157
## Request Body Format
5258

5359
The following fields must be present in the request body.
5460

55-
| **Field** | **Type** | **Description** |
61+
| Field | Type | Description |
5662
| -------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- |
57-
| ai_rag | object | Configuration for AI-RAG (Retrieval Augmented Generation) |
63+
| ai_rag | object | Request body RAG specifications. |
5864
| ai_rag.embeddings | object | Request parameters required to generate embeddings. Contents will depend on the API specification of the configured provider. |
5965
| ai_rag.vector_search | object | Request parameters required to perform vector search. Contents will depend on the API specification of the configured provider. |
6066

6167
- Parameters of `ai_rag.embeddings`
6268

6369
- Azure OpenAI
6470

65-
| **Name** | **Required** | **Type** | **Description** |
71+
| Name | Required | Type | Description |
6672
| --------------- | ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------- |
67-
| input | Yes | string | Input text used to compute embeddings, encoded as a string. |
68-
| user | No | string | A unique identifier representing your end-user, which can help in monitoring and detecting abuse. |
69-
| encoding_format | No | string | The format to return the embeddings in. Can be either `float` or `base64`. Defaults to `float`. |
70-
| dimensions | No | integer | The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. |
73+
| input | True | string | Input text used to compute embeddings, encoded as a string. |
74+
| user | False | string | A unique identifier representing your end-user, which can help in monitoring and detecting abuse. |
75+
| encoding_format | False | string | The format to return the embeddings in. Can be either `float` or `base64`. Defaults to `float`. |
76+
| dimensions | False | integer | The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. |
7177

7278
For other parameters please refer to the [Azure OpenAI embeddings documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings).
7379

7480
- Parameters of `ai_rag.vector_search`
7581

7682
- Azure AI Search
7783

78-
| **Field** | **Required** | **Type** | **Description** |
84+
| Field | Required | Type | Description |
7985
| --------- | ------------ | -------- | ---------------------------- |
80-
| fields | Yes | String | Fields for the vector search |
86+
| fields | True | string | Fields for the vector search. |
8187

8288
For other parameters please refer to the [Azure AI Search documentation](https://learn.microsoft.com/en-us/rest/api/searchservice/documents/search-post).
8389

@@ -95,106 +101,135 @@ Example request body:
95101
}
96102
```
97103

98-
## Example usage
104+
## Example
105+
106+
To follow along with the example, create an [Azure account](https://portal.azure.com) and complete the following steps:
99107

100-
First initialise these shell variables:
108+
* In [Azure AI Foundry](https://oai.azure.com/portal), deploy a generative chat model, such as `gpt-4o`, and an embedding model, such as `text-embedding-3-large`. Obtain the API key and model endpoints.
109+
* Follow [Azure's example](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/basic-vector-workflow/azure-search-vector-python-sample.ipynb) to prepare for a vector search in [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) using Python. The example will create a search index called `vectest` with the desired schema and upload the [sample data](https://github.com/Azure/azure-search-vector-samples/blob/main/data/text-sample.json) which contains 108 descriptions of various Azure services, for embeddings `titleVector` and `contentVector` to be generated based on `title` and `content`. Complete all the setups before performing vector searches in Python.
110+
* In [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search), [obtain the Azure vector search API key and the search service endpoint](https://learn.microsoft.com/en-us/azure/search/search-get-started-vector?tabs=api-key#retrieve-resource-information).
111+
112+
Save the API keys and endpoints to environment variables:
101113

102114
```shell
103-
ADMIN_API_KEY=edd1c9f034335f136f87ad84b625c8f1
104-
AZURE_OPENAI_ENDPOINT=https://name.openai.azure.com/openai/deployments/gpt-4o/chat/completions
105-
VECTOR_SEARCH_ENDPOINT=https://name.search.windows.net/indexes/indexname/docs/search?api-version=2024-07-01
106-
EMBEDDINGS_ENDPOINT=https://name.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15
107-
EMBEDDINGS_KEY=secret-azure-openai-embeddings-key
108-
SEARCH_KEY=secret-azureai-search-key
109-
AZURE_OPENAI_KEY=secret-azure-openai-key
115+
# replace with your values
116+
117+
AZ_OPENAI_DOMAIN=https://ai-plugin-developer.openai.azure.com
118+
AZ_OPENAI_API_KEY=9m7VYroxITMDEqKKEnpOknn1rV7QNQT7DrIBApcwMLYJQQJ99ALACYeBjFXJ3w3AAABACOGXGcd
119+
AZ_CHAT_ENDPOINT=${AZ_OPENAI_DOMAIN}/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview
120+
AZ_EMBEDDING_MODEL=text-embedding-3-large
121+
AZ_EMBEDDINGS_ENDPOINT=${AZ_OPENAI_DOMAIN}/openai/deployments/${AZ_EMBEDDING_MODEL}/embeddings?api-version=2023-05-15
122+
123+
AZ_AI_SEARCH_SVC_DOMAIN=https://ai-plugin-developer.search.windows.net
124+
AZ_AI_SEARCH_KEY=IFZBp3fKVdq7loEVe9LdwMvVdZrad9A4lPH90AzSeC06SlR
125+
AZ_AI_SEARCH_INDEX=vectest
126+
AZ_AI_SEARCH_ENDPOINT=${AZ_AI_SEARCH_SVC_DOMAIN}/indexes/${AZ_AI_SEARCH_INDEX}/docs/search?api-version=2024-07-01
127+
```
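
Before creating the Route, it can help to confirm that every variable the example relies on is actually set. The following is a minimal sketch, not part of the Plugin or of APISIX; the function name `check_rag_env` is hypothetical, and the list of names is taken from the variables defined above.

```shell
# Hypothetical helper: report which of the variables used in this example
# are unset or empty. Prints "ok" if all are set, otherwise "missing: ..."
# and returns a non-zero status.
check_rag_env() {
  missing=""
  for name in AZ_OPENAI_API_KEY AZ_CHAT_ENDPOINT AZ_EMBEDDINGS_ENDPOINT \
              AZ_AI_SEARCH_KEY AZ_AI_SEARCH_ENDPOINT; do
    # Indirect lookup of the variable named in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Calling `check_rag_env` in the same shell session after setting the variables should print `ok`; any name it lists still needs a value.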
128+
129+
:::note
130+
131+
You can fetch the `admin_key` from `config.yaml` and save to an environment variable with the following command:
132+
133+
```bash
134+
admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')
110135
```
111136

112-
Create a route with the `ai-rag` and `ai-proxy` plugin like so:
137+
:::
138+
139+
### Integrate with Azure for RAG-Enhanced Responses
140+
141+
The following example demonstrates how you can use the [`ai-proxy`](./ai-proxy.md) Plugin to proxy requests to Azure OpenAI LLM and use the `ai-rag` Plugin to generate embeddings and perform vector search to enhance LLM responses.
142+
143+
Create a Route with the `ai-rag` and `ai-proxy` Plugins configured as follows:
113144

114145
```shell
115-
curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
146+
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
116147
-H "X-API-KEY: ${admin_key}" \
117148
-d '{
149+
"id": "ai-rag-route",
118150
"uri": "/rag",
119151
"plugins": {
120152
"ai-rag": {
121153
"embeddings_provider": {
122154
"azure_openai": {
123-
"endpoint": "'"$EMBEDDINGS_ENDPOINT"'",
124-
"api_key": "'"$EMBEDDINGS_KEY"'"
155+
"endpoint": "'"$AZ_EMBEDDINGS_ENDPOINT"'",
156+
"api_key": "'"$AZ_OPENAI_API_KEY"'"
125157
}
126158
},
127159
"vector_search_provider": {
128160
"azure_ai_search": {
129-
"endpoint": "'"$VECTOR_SEARCH_ENDPOINT"'",
130-
"api_key": "'"$SEARCH_KEY"'"
161+
"endpoint": "'"$AZ_AI_SEARCH_ENDPOINT"'",
162+
"api_key": "'"$AZ_AI_SEARCH_KEY"'"
131163
}
132164
}
133165
},
134166
"ai-proxy": {
167+
"provider": "openai",
135168
"auth": {
136169
"header": {
137-
"api-key": "'"$AZURE_OPENAI_KEY"'"
138-
},
139-
"query": {
140-
"api-version": "2023-03-15-preview"
141-
}
142-
},
143-
"model": {
144-
"provider": "openai",
145-
"name": "gpt-4",
146-
"options": {
147-
"max_tokens": 512,
148-
"temperature": 1.0
170+
"api-key": "'"$AZ_OPENAI_API_KEY"'"
149171
}
150172
},
173+
"model": "gpt-4o",
151174
"override": {
152-
"endpoint": "'"$AZURE_OPENAI_ENDPOINT"'"
175+
"endpoint": "'"$AZ_CHAT_ENDPOINT"'"
153176
}
154177
}
155-
},
156-
"upstream": {
157-
"type": "roundrobin",
158-
"nodes": {
159-
"someupstream.com:443": 1
160-
},
161-
"scheme": "https",
162-
"pass_host": "node"
163178
}
164179
}'
165180
```
166181

167-
The `ai-proxy` plugin is used here as it simplifies access to LLMs. Alternatively, you may configure the LLM service address in the upstream configuration and update the route URI as well.
168-
169-
Now send a request:
182+
Send a POST request to the Route with the vector field name, embedding model dimensions, and an input prompt in the request body:
170183

171184
```shell
172-
curl http://127.0.0.1:9080/rag -XPOST -H 'Content-Type: application/json' -d '{"ai_rag":{"vector_search":{"fields":"contentVector"},"embeddings":{"input":"which service is good for devops","dimensions":1024}}}'
185+
curl "http://127.0.0.1:9080/rag" -X POST \
186+
-H "Content-Type: application/json" \
187+
-d '{
188+
"ai_rag":{
189+
"vector_search":{
190+
"fields":"contentVector"
191+
},
192+
"embeddings":{
193+
"input":"Which Azure services are good for DevOps?",
194+
"dimensions":1024
195+
}
196+
}
197+
}'
173198
```
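
When sending several prompts, the request body can be assembled by a small shell function rather than written out by hand each time. This is only a sketch under stated assumptions: the function name `build_rag_body` is hypothetical, the defaults mirror the values used in the request above, and no JSON escaping is performed, so the arguments must not contain double quotes or backslashes.

```shell
# Hypothetical helper: build the ai_rag request body used in this example.
# $1 = input prompt (required)
# $2 = vector field name (defaults to contentVector)
# $3 = embedding dimensions (defaults to 1024)
# Assumption: arguments contain no double quotes or backslashes,
# because no JSON escaping is done here.
build_rag_body() {
  input=$1
  fields=${2:-contentVector}
  dimensions=${3:-1024}
  printf '{"ai_rag":{"vector_search":{"fields":"%s"},"embeddings":{"input":"%s","dimensions":%s}}}' \
    "$fields" "$input" "$dimensions"
}
```

The output can then be passed to `curl` with `-d "$(build_rag_body "Which Azure services are good for DevOps?")"` in place of the inline JSON.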
174199

175-
You will receive a response like this:
200+
You should receive an `HTTP/1.1 200 OK` response similar to the following:
176201

177202
```json
178203
{
179204
"choices": [
180205
{
206+
"content_filter_results": {
207+
...
208+
},
181209
"finish_reason": "length",
182210
"index": 0,
211+
"logprobs": null,
183212
"message": {
184-
"content": "Here are the details for some of the services you inquired about from your Azure search context:\n\n ... <rest of the response>",
213+
"content": "Here is a list of Azure services categorized along with a brief description of each based on the provided JSON data:\n\n### Developer Tools\n- **Azure DevOps**: A suite of services that help you plan, build, and deploy applications, including Azure Boards, Azure Repos, Azure Pipelines, Azure Test Plans, and Azure Artifacts.\n- **Azure DevTest Labs**: A fully managed service to create, manage, and share development and test environments in Azure, supporting custom templates, cost management, and integration with Azure DevOps.\n\n### Containers\n- **Azure Kubernetes Service (AKS)**: A managed container orchestration service based on Kubernetes, simplifying deployment and management of containerized applications with features like automatic upgrades and scaling.\n- **Azure Container Instances**: A serverless container runtime to run and scale containerized applications without managing the underlying infrastructure.\n- **Azure Container Registry**: A fully managed Docker registry service to store and manage container images and artifacts.\n\n### Web\n- **Azure App Service**: A fully managed platform for building, deploying, and scaling web apps, mobile app backends, and RESTful APIs with support for multiple programming languages.\n- **Azure SignalR Service**: A fully managed real-time messaging service to build and scale real-time web applications.\n- **Azure Static Web Apps**: A serverless hosting service for modern web applications using static front-end technologies and serverless APIs.\n\n### Compute\n- **Azure Virtual Machines**: Infrastructure-as-a-Service (IaaS) offering for deploying and managing virtual machines in the cloud.\n- **Azure Functions**: A serverless compute service to run event-driven code without managing infrastructure.\n- **Azure Batch**: A job scheduling service to run large-scale parallel and high-performance computing (HPC) applications.\n- **Azure Service Fabric**: A platform to build, deploy, and manage scalable and reliable microservices and container-based applications.\n- **Azure Quantum**: A quantum computing service to build and run quantum applications.\n- **Azure Stack Edge**: A managed edge computing appliance to run Azure services and AI workloads on-premises or at the edge.\n\n### Security\n- **Azure Bastion**: A fully managed service providing secure and scalable remote access to virtual machines.\n- **Azure Security Center**: A unified security management service to protect workloads across Azure and on-premises infrastructure.\n- **Azure DDoS Protection**: A cloud-based service to protect applications and resources from distributed denial-of-service (DDoS) attacks.\n\n### Databases\n",
185214
"role": "assistant"
186215
}
187216
}
188217
],
189-
"created": 1727079764,
190-
"id": "chatcmpl-AAYdA40YjOaeIHfgFBkaHkUFCWxfc",
218+
"created": 1740625850,
219+
"id": "chatcmpl-B54gQdumpfioMPIybFnirr6rq9ZZS",
191220
"model": "gpt-4o-2024-05-13",
192221
"object": "chat.completion",
193-
"system_fingerprint": "fp_67802d9a6d",
222+
"prompt_filter_results": [
223+
{
224+
"prompt_index": 0,
225+
"content_filter_results": {
226+
...
227+
}
228+
}
229+
],
230+
"system_fingerprint": "fp_65792305e4",
194231
"usage": {
195-
"completion_tokens": 512,
196-
"prompt_tokens": 6560,
197-
"total_tokens": 7072
232+
...
198233
}
199234
}
200235
```
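
In the sample response, `finish_reason` is `length`, which in the OpenAI-style chat completion format indicates that the output was cut off by a token limit rather than ending naturally (note the content stopping at `### Databases`). A minimal sketch for checking this in a script follows. The function name `finish_reason` is hypothetical, and the pattern match is a simplification that assumes the field appears as shown above; a JSON-aware tool would be more robust.

```shell
# Hypothetical helper: read a chat completion response on standard input and
# print the finish_reason value from the first input line that contains one.
# Uses a simple pattern match, not a JSON parser.
finish_reason() {
  sed -n 's/.*"finish_reason": *"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, piping the saved response body into `finish_reason` and comparing the result with `length` lets a script decide whether to retry with a shorter prompt or a larger token limit.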
