
feat: Add ONNX & OpenVINO backend support, and torch dtype kwargs in Sentence Transformers Components #8813

Open
wants to merge 12 commits into base: main

Conversation

@lbux (Contributor) commented on Feb 5, 2025

Related Issues

Proposed Changes:

Sentence Transformers-based components now expose a backend parameter that lets the user specify a backend other than the default, torch. Supported backends are onnx and openvino. Documentation for these backends can be found here.
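
A minimal usage sketch, assuming the new parameter mirrors sentence-transformers' own backend argument; the exact component signature may differ from what is finally merged:

```python
from haystack.components.embedders import SentenceTransformersTextEmbedder

# backend defaults to "torch"; "onnx" and "openvino" are the new options
embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    backend="onnx",
)
embedder.warm_up()
result = embedder.run(text="ONNX-accelerated embedding")
print(len(result["embedding"]))
```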

How did you test it?

Integration tests were added for each of the supported backends. I am opening this PR early so that CI can assist with some of the tests, as I cannot run them locally.

Notes for the reviewer

  • There is a major dependency conflict in the test environment. We cannot install both the onnx and openvino backends at the same time due to a limitation in optimum-intel[openvino]==1.21.0, which does not support transformers>=4.47. The next optimum-intel release should add support for transformers>=4.47. For now, openvino tests are skipped.
  • I am looking into implementing explicit onnx-gpu support.
  • I am also looking into applying the changes to the SentenceTransformersDiversityRanker.
  • I am going to add tests for torch dtype quantization (a usage sketch follows this list).
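
For the torch dtype point above, a sketch of what such a test might exercise, assuming the component forwards model_kwargs to sentence-transformers the way sentence-transformers' own model_kwargs argument works:

```python
import torch
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

# Load the underlying model in half precision; the model_kwargs passthrough
# is assumed to reach the sentence-transformers model loader unchanged.
embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"torch_dtype": torch.float16},
)
embedder.warm_up()
```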

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@lbux lbux requested review from a team as code owners February 5, 2025 00:58
@lbux lbux requested review from dfokina and anakin87 and removed request for a team February 5, 2025 00:58
@coveralls (Collaborator) commented on Feb 5, 2025

Pull Request Test Coverage Report for Build 13208168523

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 12 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.01%) to 92.288%

Files with Coverage Reduction | New Missed Lines | %
--- | --- | ---
components/embedders/sentence_transformers_document_embedder.py | 2 | 96.83%
components/embedders/sentence_transformers_text_embedder.py | 2 | 96.36%
components/rankers/sentence_transformers_diversity.py | 8 | 94.84%

Totals (Coverage Status)

  • Change from base Build 13203068476: -0.01%
  • Covered Lines: 9155
  • Relevant Lines: 9920

💛 - Coveralls

@lbux (Contributor, Author) commented on Feb 5, 2025

The currently failing tests are due to #8811. The PR is still a WIP, but the tests pass locally.

@lbux lbux marked this pull request as draft February 5, 2025 01:14
@anakin87 (Member) commented on Feb 5, 2025

@lbux To fix the failing tests, try doing something similar to #8809. It should work...

@github-actions github-actions bot added the type:documentation Improvements on the docs label Feb 5, 2025
@lbux lbux changed the title feat: expose backend parameter in Sentence Transformer components for onnx and openvino models feat: Add ONNX & OpenVINO backend support, and torch dtype kwargs in Sentence Transformers Components Feb 6, 2025
@lbux (Contributor, Author) commented on Feb 7, 2025

Okay, after taking a look at how we could support onnxruntime-gpu, I don't think it makes sense for us to explicitly "support" it or add tests for it. While it should technically work, some of its requirements make testing it complicated. Currently, when using onnx, everything runs on the CPU, since the backend accelerates CPU inference; it also allows using quantized models. If a user installs the onnxruntime-gpu package instead of onnxruntime, they can pass model_kwargs={"provider": "CUDAExecutionProvider"} or model_kwargs={"provider": "TensorrtExecutionProvider"}, depending on whether their objective is to offload some of the operations to the GPU while using an unquantized model, or to offload and accelerate some of the operations while using statically quantized models, respectively. More information on the limitations can be found here.
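
For illustration, a sketch of the provider passthrough described above, assuming onnxruntime-gpu is installed and that model_kwargs reaches sentence-transformers unchanged:

```python
from haystack.components.embedders import SentenceTransformersTextEmbedder

# Requires the onnxruntime-gpu package instead of onnxruntime;
# otherwise the ONNX backend runs on the CPU.
embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={"provider": "CUDAExecutionProvider"},
)
```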

At this point, I will undraft this PR and take any suggestions/reviews for what I should fix or modify.

@lbux lbux marked this pull request as ready for review February 7, 2025 20:58
Development

Successfully merging this pull request may close these issues.

Sentence Transformers components do not support ONNX or OpenVINO formats