Provide a clear working example of how to orchestrate multiple services #1272
Comments
@omenking Is this what you want?
(I'm not an OPEA maintainer, but I've dabbled a bit with this project.)
It refers to this: https://pypi.org/project/opea-comps/
You use it to set up the backend service routing graph in your AI service frontend. All the example applications in GenAIExamples use it. The sub-services need to be started separately, but there are example Docker Compose files and Kubernetes Helm charts for that.
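For concreteness, here is a minimal sketch of such a routing graph built with opea-comps. The class and parameter names follow the GenAIComps examples (e.g. the ChatQnA mega service), so check them against the version you have installed; the hosts and ports are placeholders for sub-services you have already started via Compose or Helm:

```python
# Minimal sketch: wire two remote micro-services into one "mega service"
# using opea-comps. Assumes an embedding service and an LLM service are
# already running at the given hosts/ports (placeholders below).
import os

from comps import MicroService, ServiceOrchestrator, ServiceType

EMBEDDING_HOST = os.getenv("EMBEDDING_SERVICE_HOST_IP", "0.0.0.0")
EMBEDDING_PORT = int(os.getenv("EMBEDDING_SERVICE_PORT", 6000))
LLM_HOST = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))

megaservice = ServiceOrchestrator()

embedding = MicroService(
    name="embedding",
    host=EMBEDDING_HOST,
    port=EMBEDDING_PORT,
    endpoint="/v1/embeddings",
    use_remote_service=True,   # the service runs elsewhere (container / pod)
    service_type=ServiceType.EMBEDDING,
)
llm = MicroService(
    name="llm",
    host=LLM_HOST,
    port=LLM_PORT,
    endpoint="/v1/chat/completions",
    use_remote_service=True,
    service_type=ServiceType.LLM,
)

# Register the nodes and declare the routing graph: embedding -> llm.
megaservice.add(embedding).add(llm)
megaservice.flow_to(embedding, llm)
```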
I think they can, but it may require a bit of work. There are separate builds / container images for different accelerators (Gaudi devices, Intel, Nvidia and AMD GPUs). I think the only example using an Intel GPU (through OpenVINO) is this one: https://github.com/opea-project/GenAIExamples/tree/main/EdgeCraftRAG

The default models used by OPEA can be a bit too large to run on consumer GPU cards: the weights of a 7B / 8B FP16 model alone require 14-16 GB of VRAM, and more once KV-cache etc. is configured, so a smaller model would need to be used to fit into the VRAM available on consumer GPUs.

As for client CPUs, the issue is that CPU builds of the inferencing engines may be configured to use AVX-512, which is normally supported only on Xeons. One would need to do a CPU build of the engine with that disabled to run them on client CPUs. Here's the relevant build setting for vLLM: https://github.com/vllm-project/vllm/blob/main/Dockerfile.cpu#L52
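As a rough sanity check on those numbers, weight memory scales with parameter count times bytes per parameter; a back-of-the-envelope sketch (it deliberately ignores KV-cache, activations and runtime overhead, which add more on top):

```python
# Rough estimate of GPU memory needed just for the model weights.
def weight_memory_gb(n_params_billions: float, bytes_per_param: float = 2.0) -> float:
    """FP16/BF16 uses 2 bytes per parameter, INT8 uses 1, INT4 about 0.5."""
    return n_params_billions * bytes_per_param

print(weight_memory_gb(7))    # ~14 GB for a 7B FP16 model
print(weight_memory_gb(8))    # ~16 GB for an 8B FP16 model
print(weight_memory_gb(1.5))  # ~3 GB for e.g. a 1.5B FP16 model
```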
AFAIK, while those inference engines' network APIs are very similar, how you start them differs, for example. OPEA provides examples for them and tries to abstract the differences, but you still need to tell the OPEA services which inference engine / endpoint they should use and connect to.

PS. I would be interested to hear about experiences of running OPEA services on consumer HW. Making OPEA more usable for teaching & students sounds like a nice goal, and there's an RFC process for such things: https://github.com/opea-project/docs/blob/main/community/CONTRIBUTING.md
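To illustrate how similar those network APIs are: vLLM, TGI (via its Messages API) and llama.cpp's server can all expose an OpenAI-style /v1/chat/completions route, so from the client's side mostly the base URL changes. A hypothetical sketch, where the endpoint URL and model name are placeholders:

```python
# The same request body works against any backend that exposes an
# OpenAI-compatible /v1/chat/completions endpoint; only LLM_ENDPOINT changes.
import os

import requests

LLM_ENDPOINT = os.getenv("LLM_ENDPOINT", "http://localhost:8000")  # placeholder

resp = requests.post(
    f"{LLM_ENDPOINT}/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-1.5B-Instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```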
@omenking @eero-t The PR "a llama.cpp LLM Component" (#1052) is almost ready. With it I've been able to run a couple of small models using OPEA on my consumer-grade laptop, e.g. phi3.5, phi4 and Qwen2.5-1.5b, and I hope this will make OPEA more accessible for teaching and for students to learn on consumer hardware.
There are some code examples of how you could create a mega service from multiple micro-services.
These examples appear to be incomplete in explaining how to get them to work.
https://opea-project.github.io/latest/GenAIComps/README.html
pip install opea-comps
but my library can't find the comps directory.

Other Notes
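One possible source of the confusion (an assumption on my part, based on how the GenAIComps package is laid out): the opea-comps wheel installs a Python module named comps rather than creating a comps directory in your project, so after `pip install opea-comps` the import would look like this:

```python
# Sanity check after `pip install opea-comps`: the package is imported as
# "comps"; there is no local "comps" directory to look for.
import comps

print(comps.__file__)  # shows where the installed module lives

from comps import MicroService, ServiceOrchestrator
```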