
Provide a clear working example of how to orchestrate multiple services #1272

omenking opened this issue Feb 7, 2025 · 4 comments
omenking commented Feb 7, 2025

There are some code examples of how you could create a mega service from multiple micro-services.
These examples appear to be incomplete in explaining how to get them to work.

https://opea-project.github.io/latest/GenAIComps/README.html

  • It appears you can do pip install opea-comps, but my environment can't find the comps directory
  • It also shows cloning the repo, but it downloads as GenAIComps, so my assumption is that we have to reference it via GenAIComps
  • We have a megaservice example class. What do you do with it? How do you run it? Do I just create an instance of the class and it spins up the expected resources?

Other Notes

  • TGI suggests it's only for Xeon and Gaudi, but reviewing the code doesn't make clear whether it can run on consumer-grade Intel hardware or GPUs.
  • Ollama is lacking documentation. I thought maybe I should not use TGI locally for teaching, but then when I read about the LLMs comp it suggests you have to use vLLM or TGI.
  • Further investigation suggests that these three engines all follow the OpenAI API schema, so they will likely be interchangeable.
xiguiw (Collaborator) commented Feb 17, 2025

@omenking
For a clear working example of how to orchestrate multiple services, please refer to the GenAIExamples repo.
We have vLLM and TGI in GenAIExamples, and Ollama for ChatQnA on AIPC.

Is this what you want?

xiguiw self-assigned this Feb 17, 2025
eero-t (Contributor) commented Feb 24, 2025

(I'm not OPEA maintainer, but I've dabbled a bit with this project.)

There are some code examples of how you could create a mega service from multiple micro-services. These examples appear to be incomplete in explaining how to get them to work.

https://opea-project.github.io/latest/GenAIComps/README.html

  • It appears you can do pip install opea-comps, but my environment can't find the comps directory

It refers to this: https://pypi.org/project/opea-comps/
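For reference, a minimal sanity check might look like this (a sketch only; it assumes, as the PyPI page implies, that the package installed by pip install opea-comps exposes an importable module named comps rather than opea_comps):

```python
# Install first:  pip install opea-comps
# The distribution is named "opea-comps", but the importable module is "comps",
# which is why searching for an "opea_comps" directory comes up empty.
import comps

print(comps.__file__)  # where pip actually placed the module
print(dir(comps))      # should list classes such as MicroService / ServiceOrchestrator, if exposed
```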

  • It also shows cloning the repo, but it downloads as GenAIComps, so my assumption is that we have to reference it via GenAIComps

comps is a directory inside GenAIComps. If only the latter is in your Python search path (e.g. because you did a git clone of this repo), you need to prefix comps with it.
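If you work from a git clone rather than the pip package, one way to make import comps resolve is to put the clone's root directory on the search path (a sketch; the path below is hypothetical):

```python
import sys

# Hypothetical location of your checkout; adjust to wherever you ran `git clone`.
GENAICOMPS_DIR = "/home/user/GenAIComps"

# Add the repository root to the module search path so that `import comps`
# finds GenAIComps/comps/ instead of raising ModuleNotFoundError.
sys.path.insert(0, GENAICOMPS_DIR)

import comps  # noqa: E402  (imported after the sys.path tweak on purpose)
```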

  • We have a megaservice example class. What do you do with it? How do you run it? Do I just create an instance of the class and it spins up the expected resources?

You use it to set up the backend service routing graph in your AI service frontend. All the example applications in the GenAIExamples project use a megaservice for this, see e.g.: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/chatqna.py#L191
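A rough sketch of that pattern, condensed from the chatqna.py example linked above (class and argument names follow that file but may have changed since; hosts, ports and endpoints here are placeholders):

```python
from comps import MicroService, ServiceOrchestrator, ServiceType

# The megaservice is only a routing graph over backends that are already running;
# instantiating it does not spin up any containers.
megaservice = ServiceOrchestrator()

# Each MicroService below is a *reference* to a separately started service
# (Docker Compose / Helm takes care of actually starting them).
embedding = MicroService(
    name="embedding",
    host="embedding-svc", port=6000,          # placeholder host/port
    endpoint="/v1/embeddings",
    use_remote_service=True,
    service_type=ServiceType.EMBEDDING,
)
llm = MicroService(
    name="llm",
    host="llm-svc", port=9000,                # placeholder host/port
    endpoint="/v1/chat/completions",
    use_remote_service=True,
    service_type=ServiceType.LLM,
)

# Register the nodes and declare the edge: embedding output flows into the LLM.
megaservice.add(embedding).add(llm)
megaservice.flow_to(embedding, llm)

# The frontend (see chatqna.py) then wraps this orchestrator in its own
# MicroService endpoint and calls megaservice.schedule(...) for each request.
```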

(Debugging what the GenAIComps/comps/ code does is a bit hard because so much is done implicitly with decorators, and async is used a lot.)

The subservices need to be started separately, but there are example Docker Compose files and Kubernetes Helm charts for that. Compose files are in the GenAIExamples project, and Helm charts in the GenAIInfra project, see e.g.: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/chatqna/README.md

eero-t (Contributor) commented Feb 24, 2025

  • TGI suggests it's only for Xeon and Gaudi, but reviewing the code doesn't make clear whether it can run on consumer-grade Intel hardware or GPUs.

I think they can, but it may require a bit of work.

There are separate builds / container images for different accelerators (Gaudi devices, Intel, Nvidia and AMD GPUs). I think the only example of using an Intel GPU (through OpenVINO) is this one: https://github.com/opea-project/GenAIExamples/tree/main/EdgeCraftRAG

Default models used by OPEA can be a bit too large to run on consumer GPU cards, e.g. 7B / 8B FP16 model data will require 14-16 GB of VRAM, and more with KV-cache etc. configured, so a smaller model would need to be used to fit the amount of VRAM available on consumer GPUs.

As for client CPUs, the issue is that CPU builds of the inferencing engines may be configured to use AVX512, which is normally supported only on Xeons. One would need to do a CPU build of the engine with that disabled to run them on client CPUs. Here's the relevant build setting for vLLM: https://github.com/vllm-project/vllm/blob/main/Dockerfile.cpu#L52

  • Ollama is lacking documentation. I thought maybe I should not use TGI locally for teaching, but then when I read about the LLMs comp it suggests you have to use vLLM or TGI.

AFAIK:

  • Further investigation suggests that these three engines all follow the OpenAI API schema, so they will likely be interchangeable.

While those inference engines' network APIs are very similar, other things differ, e.g. how you start them. OPEA provides examples for them and tries to abstract the differences, but you still need to tell the OPEA services which inference engine/endpoint they should use / connect to.
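To illustrate the interchangeability on the client side (a sketch only; the port and model name are placeholders, and each engine still has its own startup command and flags):

```python
# vLLM, TGI and Ollama all expose an OpenAI-compatible /v1/chat/completions
# route, so the same client code can talk to any of them; only the base URL
# (and how you launched the server) changes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8008/v1",       # placeholder: your vLLM/TGI/Ollama endpoint
    api_key="not-needed-for-local-serving",    # local servers usually ignore the key
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",        # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```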


PS. I would be interested to hear about experiences on running OPEA services on consumer HW. Making OPEA more usable for teaching & students sounds like a nice goal, and there's an RFC process for such things: https://github.com/opea-project/docs/blob/main/community/CONTRIBUTING.md

edlee123 commented Feb 24, 2025

@omenking @eero-t the PR "a llama.cpp LLM Component" #1052 is almost ready. With it I've been able to run a couple of small models using OPEA on my consumer-grade laptop, e.g. phi3.5, phi4, Qwen2.5-1.5b, and I hope this will make OPEA more accessible for teaching and for students to learn on consumer hardware.

