
Provide a clear working example of how to orchestrate multiple services #1272

omenking opened this issue Feb 7, 2025 · 4 comments
omenking commented Feb 7, 2025

There are some code examples of how you could create a mega service from multiple micro-services.
These examples appear to be incomplete in explaining how to get them to work.

https://opea-project.github.io/latest/GenAIComps/README.html

  • It appears you can do pip install opea-comps, but my environment can't find the comps directory
  • It also shows cloning the repo, but it downloads as GenAIComps, so my assumption is that we have to reference it via GenAIComps
  • We have a megaservice example class. What do you do with it? How do you run it? Do I just create an instance of the class and it spins up the expected resources?

Other Notes

  • TGI suggests it's only for Xeon and Gaudi, but reviewing the code doesn't make clear whether it can run on consumer-grade Intel hardware or GPUs.
  • Ollama is lacking documentation. I thought maybe I should not use TGI locally for teaching, but then when I read about the LLMs comp it suggests you have to use vLLM or TGI.
  • Further investigation suggests that these three engines all follow the OpenAI API schema, so they will likely be interchangeable.
xiguiw (Collaborator) commented Feb 17, 2025

@omenking
For a clear working example of how to orchestrate multiple services, please refer to the GenAIExamples repo.
We have vLLM and TGI in GenAIExamples, and Ollama for ChatQnA on AIPC.

Is this what you want?

xiguiw self-assigned this Feb 17, 2025
eero-t (Contributor) commented Feb 24, 2025

(I'm not OPEA maintainer, but I've dabbled a bit with this project.)

There are some code examples of how you could create a mega service from multiple micro-services. These examples appear to be incomplete in explaining how to get them to work.

https://opea-project.github.io/latest/GenAIComps/README.html

  • It appears you can do pip install opea-comps, but my environment can't find the comps directory

It refers to this: https://pypi.org/project/opea-comps/
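For reference, a minimal sanity check might look like this (a sketch only; it assumes, as the PyPI page implies, that the package installed by pip install opea-comps exposes an importable module named comps rather than opea_comps):

```python
# Install first:  pip install opea-comps
# The distribution is named "opea-comps", but the importable module is "comps",
# which is why searching for an "opea_comps" directory comes up empty.
import comps

print(comps.__file__)  # where pip actually placed the module
print(dir(comps))      # should list classes such as MicroService / ServiceOrchestrator, if exposed
```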

  • It also shows cloning the repo, but it downloads as GenAIComps, so my assumption is that we have to reference it via GenAIComps

comps is a directory inside GenAIComps. If only the latter is in your Python search path (e.g. because you did a git clone of this repo), you need to prefix comps with it.
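If you work from a git clone rather than the pip package, one way to make import comps resolve is to put the clone's root directory on the search path (a sketch; the path below is hypothetical):

```python
import sys

# Hypothetical location of your checkout; adjust to wherever you ran `git clone`.
GENAICOMPS_DIR = "/home/user/GenAIComps"

# Add the repository root to the module search path so that `import comps`
# finds GenAIComps/comps/ instead of raising ModuleNotFoundError.
sys.path.insert(0, GENAICOMPS_DIR)

import comps  # noqa: E402  (imported after the sys.path tweak on purpose)
```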

  • We have a megaservice example class. What do you do with it? How do you run it? Do I just create an instance of the class and it spins up the expected resources?

You use it to set up the backend service routing graph in your AI service frontend. All the example applications in the GenAIExamples project use a megaservice for this, see e.g.: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/chatqna.py#L191
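A rough sketch of that pattern, condensed from the chatqna.py example linked above (class and argument names follow that file but may have changed since; hosts, ports and endpoints here are placeholders):

```python
from comps import MicroService, ServiceOrchestrator, ServiceType

# The megaservice is only a routing graph over backends that are already running;
# instantiating it does not spin up any containers.
megaservice = ServiceOrchestrator()

# Each MicroService below is a *reference* to a separately started service
# (Docker Compose / Helm takes care of actually starting them).
embedding = MicroService(
    name="embedding",
    host="embedding-svc", port=6000,          # placeholder host/port
    endpoint="/v1/embeddings",
    use_remote_service=True,
    service_type=ServiceType.EMBEDDING,
)
llm = MicroService(
    name="llm",
    host="llm-svc", port=9000,                # placeholder host/port
    endpoint="/v1/chat/completions",
    use_remote_service=True,
    service_type=ServiceType.LLM,
)

# Register the nodes and declare the edge: embedding output flows into the LLM.
megaservice.add(embedding).add(llm)
megaservice.flow_to(embedding, llm)

# The frontend (see chatqna.py) then wraps this orchestrator in its own
# MicroService endpoint and calls megaservice.schedule(...) for each request.
```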

(Debugging what the GenAIComps/comps/ code does is a bit hard because so much is done implicitly with decorators, and async is used a lot.)

The subservices need to be started separately, but there are example Docker Compose files and Kubernetes Helm charts for that. Compose files are in the GenAIExamples project, and Helm charts in the GenAIInfra project, see e.g.: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/chatqna/README.md

eero-t (Contributor) commented Feb 24, 2025

  • TGI suggests it's only for Xeon and Gaudi, but reviewing the code doesn't make clear whether it can run on consumer-grade Intel hardware or GPUs.

I think they can, but it may require a bit of work.

There are separate builds / container images for different accelerators (Gaudi devices, Intel, Nvidia and AMD GPUs). I think the only example of using an Intel GPU (through OpenVINO) is this one: https://github.com/opea-project/GenAIExamples/tree/main/EdgeCraftRAG

Default models used by OPEA can be a bit too large to run on consumer GPU cards, e.g. 7B / 8B FP16 model data will require 14-16 GB of VRAM, and more with KV-cache etc. configured, so a smaller model would need to be used to fit the amount of VRAM available on consumer GPUs.

As for client CPUs, the issue is that CPU builds of the inferencing engines may be configured to use AVX512, which is normally supported only on Xeons. One would need to do a CPU build of the engine with that disabled to run them on client CPUs. Here's the relevant build setting for vLLM: https://github.com/vllm-project/vllm/blob/main/Dockerfile.cpu#L52

  • Ollama is lacking documentation. I thought maybe I should not use TGI locally for teaching, but then when I read about the LLMs comp it suggests you have to use vLLM or TGI.

AFAIK:

  • Further investigation suggests that these three engines all follow the OpenAI API schema, so they will likely be interchangeable.

While those inference engines' network APIs are very similar, other things differ, e.g. how you start them. OPEA provides examples for them and tries to abstract the differences, but you still need to tell the OPEA services which inference engine/endpoint they should use / connect to.
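To illustrate the interchangeability on the client side (a sketch only; the port and model name are placeholders, and each engine still has its own startup command and flags):

```python
# vLLM, TGI and Ollama all expose an OpenAI-compatible /v1/chat/completions
# route, so the same client code can talk to any of them; only the base URL
# (and how you launched the server) changes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8008/v1",       # placeholder: your vLLM/TGI/Ollama endpoint
    api_key="not-needed-for-local-serving",    # local servers usually ignore the key
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",        # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```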


PS. I would be interested to hear about experiences on running OPEA services on consumer HW. Making OPEA more usable for teaching & students sounds like a nice goal, and there's an RFC process for such things: https://github.com/opea-project/docs/blob/main/community/CONTRIBUTING.md

edlee123 commented Feb 24, 2025

@omenking @eero-t the PR "a llama.cpp LLM Component" #1052 is almost ready. With it I've been able to run a couple of small models using OPEA on my consumer-grade laptop, e.g. phi3.5, phi4, Qwen2.5-1.5b, and I hope this will make OPEA more accessible for teaching and for students to learn on consumer hardware.

