- `EmbeddingClient` implementation that computes sentence embeddings locally with SBERT transformers.
- Uses pre-trained transformer models, serialized into the Open Neural Network Exchange (ONNX) format.
- The Deep Java Library and the Microsoft ONNX Java Runtime are used to run the ONNX models and compute the embeddings efficiently.
- Add default tokenizer.json and model.onnx for sentence-transformers/all-MiniLM-L6-v2.
- Add a configurable resource caching service that allows caching remote (http/https) resources on the local file system.
- README.md provides information on how to serialize ONNX models.
- Add Git LFS configuration for large ONNX model files.

The `TransformersEmbeddingClient` is an `EmbeddingClient` implementation that locally computes [sentence embeddings](https://www.sbert.net/examples/applications/computing-embeddings/README.html#sentence-embeddings-with-transformers) using a selected [sentence transformer](https://www.sbert.net/).

It uses [pre-trained](https://www.sbert.net/docs/pretrained_models.html) transformer models, serialized into the [Open Neural Network Exchange (ONNX)](https://onnx.ai/) format.

The [Deep Java Library](https://djl.ai/) and the Microsoft [ONNX Java Runtime](https://onnxruntime.ai/docs/get-started/with-java.html) libraries are used to run the ONNX models and compute the embeddings in Java.
## Serialize the Tokenizer and the Transformer Model
To run things in Java, we need to serialize the tokenizer and the transformer model into the ONNX format.
### Serialize with optimum-cli
One quick way to achieve this is to use the [optimum-cli](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command-line tool.

The following snippet creates a Python virtual environment, installs the required packages, and runs optimum-cli to serialize (i.e. export) the models (a sketch; exact package versions and flags may vary across `optimum` releases):
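```bash
# Sketch of the export flow; package versions and flags may differ.
python3 -m venv venv
source ./venv/bin/activate
pip install --upgrade pip
pip install optimum onnx onnxruntime
optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 onnx-output-folder
```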
The `optimum-cli` command exports the [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) transformer into the `onnx-output-folder` folder. The latter includes the `tokenizer.json` and `model.onnx` files used by the embedding client.
## Apply the ONNX model
Use the `setTokenizerResource(tokenizerJsonUri)` and `setModelResource(modelOnnxUri)` methods to set the URI locations of the exported `tokenizer.json` and `model.onnx` files.
The `classpath:`, `file:` or `https:` URI schemes are supported.
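For example, a client might be pointed at the files exported earlier; a minimal sketch, assuming the exported artifacts live under the illustrative path `/tmp/onnx-output-folder` and that the setters accept URI strings, as the setter names above suggest:

```java
TransformersEmbeddingClient embeddingClient = new TransformersEmbeddingClient();

// Illustrative URIs; adjust to wherever the exported files actually live.
embeddingClient.setTokenizerResource("file:/tmp/onnx-output-folder/tokenizer.json");
embeddingClient.setModelResource("file:/tmp/onnx-output-folder/model.onnx");
```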
If no other model is explicitly set, the `TransformersEmbeddingClient` defaults to the [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model:

| Metric | Value |
| -------- | ------- |
| Dimensions | 384 |
| Avg. performance | 58.80 |
| Speed | 14200 sentences/sec |
| Size | 80MB |
The following snippet illustrates how to use the `TransformersEmbeddingClient`:
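The example below is a minimal sketch; it assumes the client is created outside of a Spring application context, so its `InitializingBean` lifecycle method `afterPropertiesSet()` is invoked manually to load the model:

```java
// Create the client; with no tokenizer/model resources set explicitly it
// falls back to the bundled sentence-transformers/all-MiniLM-L6-v2 defaults.
TransformersEmbeddingClient embeddingClient = new TransformersEmbeddingClient();

// Outside a Spring context, trigger initialization (model loading) manually.
embeddingClient.afterPropertiesSet();

// Compute a 384-dimensional embedding for each input sentence.
List<List<Double>> embeddings = embeddingClient.embed(List.of("Hello world", "World is big"));
```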