Commit 2854af9

Add multimodal embeddings guide (#3364)
--------- Co-authored-by: Louis Dureuil <[email protected]>
1 parent 6b42a0a commit 2854af9

File tree

3 files changed: +260 −10 lines changed


docs.json

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@
       "learn/ai_powered_search/getting_started_with_ai_search",
       "learn/ai_powered_search/configure_rest_embedder",
       "learn/ai_powered_search/document_template_best_practices",
+      "learn/ai_powered_search/image_search_with_multimodal_embeddings",
       "learn/ai_powered_search/image_search_with_user_provided_embeddings",
       "learn/ai_powered_search/search_with_user_provided_embeddings",
       "learn/ai_powered_search/retrieve_related_search_results",
learn/ai_powered_search/image_search_with_multimodal_embeddings.mdx

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
---
title: Image search with multimodal embeddings
description: This article shows you the main steps for performing multimodal text-to-image searches
---

This guide shows the main steps to search through a database of images using Meilisearch's experimental multimodal embeddings.
7+
8+
## Requirements
9+
10+
- A database of images
11+
- A Meilisearch project
12+
- Access to a multimodal embedding provider (for example, [VoyageAI multimodal embeddings](https://docs.voyageai.com/reference/multimodal-embeddings-api))
13+
14+
## Enable multimodal embeddings
15+
16+
First, enable the `multimodal` experimental feature:
17+
18+
```sh
19+
curl \
20+
-X PATCH 'MEILISEARCH_URL/experimental-features/' \
21+
-H 'Content-Type: application/json' \
22+
--data-binary '{
23+
"multimodal": true
24+
}'
25+
```
26+
27+
You may also enable multimodal in your Meilisearch Cloud project's general settings, under "Experimental features".
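
To confirm the feature is enabled, you can send a GET request to the same route, which returns the current status of all experimental features:

```sh
curl \
  -X GET 'MEILISEARCH_URL/experimental-features/'
```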

## Configure a multimodal embedder

Much like other embedders, multimodal embedders must set their `source` to `rest` and explicitly declare their `url`. Depending on your chosen provider, you may also have to specify `apiKey`.

All multimodal embedders must contain an `indexingFragments` field and a `searchFragments` field. Fragments are sets of embeddings built out of specific parts of document data.

Fragments must follow the structure defined by the REST API of your chosen provider.

### `indexingFragments`

Use `indexingFragments` to tell Meilisearch how to send document data to the provider's API when generating document embeddings.

For example, when using VoyageAI's multimodal model, an indexing fragment might look like this:
```json
"indexingFragments": {
  "TEXTUAL_FRAGMENT_NAME": {
    "value": {
      "content": [
        {
          "type": "text",
          "text": "A document named {{doc.title}} described as {{doc.description}}"
        }
      ]
    }
  },
  "IMAGE_FRAGMENT_NAME": {
    "value": {
      "content": [
        {
          "type": "image_url",
          "image_url": "{{doc.poster_url}}"
        }
      ]
    }
  }
}
```

The example above instructs Meilisearch to create two sets of embeddings during indexing: one for the textual description of an image, and another for the actual image.

Any JSON string value appearing in a fragment is handled as a Liquid template, where you interpolate document data present in `doc`. In `IMAGE_FRAGMENT_NAME`, the `image_url` value outputs the plain URL string stored in the document field `poster_url`. In `TEXTUAL_FRAGMENT_NAME`, `text` contains a longer string contextualizing two document fields, `title` and `description`.
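
As an illustration, consider a hypothetical movie document whose field names match the templates above (the values are invented for this guide):

```json
{
  "id": 1,
  "title": "Interstellar",
  "description": "A team of explorers travels through a wormhole in space",
  "poster_url": "https://example.com/posters/interstellar.jpg"
}
```

For this document, `TEXTUAL_FRAGMENT_NAME` renders to the string `A document named Interstellar described as A team of explorers travels through a wormhole in space`, while `IMAGE_FRAGMENT_NAME` sends the poster URL as-is.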

### `searchFragments`

Use `searchFragments` to tell Meilisearch how to send search query data to your chosen provider's REST API when converting it into embeddings:
```json
"searchFragments": {
  "USER_TEXT_FRAGMENT": {
    "value": {
      "content": [
        {
          "type": "text",
          "text": "{{q}}"
        }
      ]
    }
  },
  "USER_SUBMITTED_IMAGE_FRAGMENT": {
    "value": {
      "content": [
        {
          "type": "image_base64",
          "image_base64": "data:{{media.image.mime}};base64,{{media.image.data}}"
        }
      ]
    }
  }
}
```

In this example, two modes of search are configured:

1. A textual search based on the `q` parameter, which will be embedded as text
2. An image search based on a [data URL](https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data) rebuilt from the `image.mime` and `image.data` fields in the `media` field of the query

Search fragments have access to data present in the query parameters `media` and `q`.

Each semantic search query should match exactly one of this embedder's search fragments, so each fragment should have at least one disambiguating field. Here, for example, a query containing only `q` matches `USER_TEXT_FRAGMENT`, while a query supplying `media.image` data matches `USER_SUBMITTED_IMAGE_FRAGMENT`.

### Complete embedder configuration

With all fragments and embedding provider data in place, your embedder should look similar to this example:
```sh
curl \
  -X PATCH 'MEILISEARCH_URL/indexes/INDEX_NAME/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "MULTIMODAL_EMBEDDER_NAME": {
        "source": "rest",
        "url": "https://api.voyageai.com/v1/multimodal-embeddings",
        "apiKey": "VOYAGE_API_KEY",
        "indexingFragments": {
          "TEXTUAL_FRAGMENT_NAME": {
            "value": {
              "content": [
                {
                  "type": "text",
                  "text": "A document named {{doc.title}} described as {{doc.description}}"
                }
              ]
            }
          },
          "IMAGE_FRAGMENT_NAME": {
            "value": {
              "content": [
                {
                  "type": "image_url",
                  "image_url": "{{doc.poster_url}}"
                }
              ]
            }
          }
        },
        "searchFragments": {
          "USER_TEXT_FRAGMENT": {
            "value": {
              "content": [
                {
                  "type": "text",
                  "text": "{{q}}"
                }
              ]
            }
          },
          "USER_SUBMITTED_IMAGE_FRAGMENT": {
            "value": {
              "content": [
                {
                  "type": "image_base64",
                  "image_base64": "data:{{media.image.mime}};base64,{{media.image.data}}"
                }
              ]
            }
          }
        }
      }
    }
  }'
```

## Add documents

Once your embedder is configured, you can [add documents to your index](/learn/getting_started/cloud_quick_start) with the [`/documents` endpoint](/reference/api/documents).

During indexing, Meilisearch will automatically generate multimodal embeddings for each document using the configured `indexingFragments`.
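
For example, reusing the hypothetical movie document from earlier, the request might look like this:

```sh
curl \
  -X POST 'MEILISEARCH_URL/indexes/INDEX_NAME/documents' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {
      "id": 1,
      "title": "Interstellar",
      "description": "A team of explorers travels through a wormhole in space",
      "poster_url": "https://example.com/posters/interstellar.jpg"
    }
  ]'
```

As with all document additions, this is an asynchronous operation: monitor the returned task to confirm that indexing, and with it embedding generation, completed successfully.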

## Perform searches

The final step is to perform searches using different types of content.

### Use text to search for images

Use the following search query to retrieve a mix of documents with images matching the description and documents containing the specified keywords:
```sh
curl -X POST 'MEILISEARCH_URL/indexes/INDEX_NAME/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "a mountain sunset with snow",
    "hybrid": {
      "embedder": "MULTIMODAL_EMBEDDER_NAME"
    }
  }'
```

### Use an image to search for images

You can also use an image to search for other, similar images:
```sh
curl -X POST 'MEILISEARCH_URL/indexes/INDEX_NAME/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "media": {
      "image": {
        "mime": "image/jpeg",
        "data": "<BASE64_ENCODED_IMAGE>"
      }
    },
    "hybrid": {
      "embedder": "MULTIMODAL_EMBEDDER_NAME"
    }
  }'
```

<Tip>
In most cases, you will need a GUI that allows users to submit their images and converts them to Base64 format. Creating this is outside the scope of this guide.
</Tip>
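
For quick tests from the command line, you can build the Base64 payload yourself. This sketch assumes a local `poster.jpg` file and the GNU coreutils `base64` utility (on macOS, use `base64 -i poster.jpg` instead):

```sh
# Encode the image without line wrapping, then embed it in the search payload
IMAGE_DATA=$(base64 -w0 poster.jpg)

curl -X POST 'MEILISEARCH_URL/indexes/INDEX_NAME/search' \
  -H 'Content-Type: application/json' \
  --data-binary "{
    \"media\": {
      \"image\": {
        \"mime\": \"image/jpeg\",
        \"data\": \"$IMAGE_DATA\"
      }
    },
    \"hybrid\": {
      \"embedder\": \"MULTIMODAL_EMBEDDER_NAME\"
    }
  }"
```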

## Conclusion

With multimodal embedders you can:

1. Configure Meilisearch to embed both images and queries
2. Add image documents — Meilisearch automatically generates embeddings
3. Accept text or image input from users
4. Run hybrid searches using a mix of textual input and input from other types of media, or run pure semantic searches using only non-textual input

reference/api/settings.mdx

Lines changed: 30 additions & 10 deletions
@@ -2939,6 +2939,14 @@ For example, for [VoyageAI's multimodal embedding route](https://docs.voyageai.c
 
 Use Liquid templates to interpolate document data into the fragment fields, where `doc` gives you access to all fields within a document.
 
+<Warning>
+If a Liquid template appearing inside of a fragment cannot be rendered, no embedding will be generated for that fragment and that document. If a document has no indexing fragments, it will not be returned in multimodal searches. In most cases, a fragment is not rendered because a field it references is missing in the document.
+
+This is different from embeddings based on `documentTemplate`, which abort the indexing task if the document template cannot be rendered for a document.
+
+You can check which documents have embeddings for a given fragment using [vector filters](/learn/filtering_and_sorting/filter_expression_reference#vector-filters).
+</Warning>
+
 `indexingFragments` is optional when using the `rest` source.
 
 `indexingFragments` is incompatible with all other embedder sources.
@@ -2974,19 +2982,31 @@ curl \
 
 As with `indexingFragments`, the content of `value` should follow your model's specification.
 
-Use Liquid templates to interpolate search query data into the fragment fields, where `media` gives you access to all multimodal data received with a query:
+Use Liquid templates to interpolate search query data into the fragment fields, where `{{media.*}}` gives you access to all [multimodal data received with a query](/reference/api/search#media) and `{{q}}` gives you access to the regular textual query:
 
 ```json
-"SEARCH_FRAGMENT_A": {
-  "value": {
-    "content": [
-      {
-        "type": "image_base64",
-        "image_base64": "data:{{media.image.mime}};base64,{{media.image.data}}"
-      }
-    ]
+{
+  "SEARCH_FRAGMENT_A": {
+    "value": {
+      "content": [
+        {
+          "type": "image_base64",
+          "image_base64": "data:{{media.image.mime}};base64,{{media.image.data}}"
+        }
+      ]
+    }
+  },
+  "SEARCH_FRAGMENT_B": {
+    "value": {
+      "content": [
+        {
+          "type": "text",
+          "text": "{{q}}"
+        }
+      ]
+    }
   }
-},
+}
 ```
 
 `searchFragments` is optional when using the `rest` source.
