Support streaming from multiple pipeline components #178
Merged
Changes from all 14 commits:
- a6e51c4 Support streaming from multiple pipeline components (mpangrazzi)
- 0e982d8 Lint (mpangrazzi)
- 9ae5b0f Fix example (mpangrazzi)
- 3b31206 Update example (cleaner) and docs (mpangrazzi)
- 38be9b0 Support for granular / 'all' / env var based streaming components con… (mpangrazzi)
- ac9d902 Update src/hayhooks/server/pipelines/utils.py (mpangrazzi)
- c3d7f7d Lint (mpangrazzi)
- 53b5b41 Fix type issues with Literal ; Make streaming concurrency-safe (mpangrazzi)
- 459c899 Refactoring (mpangrazzi)
- c708f0c Refine example README (mpangrazzi)
- abf2f2b Use a list instead of a dict for streaming_components (whitelist) (mpangrazzi)
- b9fca78 Reformat (mpangrazzi)
- a36221a remove unneeded 'template_variables' ; add required_variables * to tu… (mpangrazzi)
- d89bd0f Fix docs (mpangrazzi)
examples/pipeline_wrappers/multi_llm_streaming/README.md (107 additions, 0 deletions)
# Multi-LLM Streaming Example

This example demonstrates hayhooks' configurable multi-component streaming support.

## Overview

The pipeline contains **two LLM components in sequence**:

1. **LLM 1** (`gpt-5-nano` with `reasoning_effort: low`): Provides a short, concise initial answer to the user's question
2. **LLM 2** (`gpt-5-nano` with `reasoning_effort: medium`): Refines and expands the answer into a detailed, professional response

This example uses `streaming_components` to enable streaming for **both** LLMs. By default, only the last component would stream.



## How It Works

### Streaming Configuration

By default, hayhooks streams only the **last** streaming-capable component (in this case, LLM 2). However, this example demonstrates using the `streaming_components` parameter to enable streaming for both components:

```python
streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={...},
    streaming_components=["llm_1", "llm_2"]  # or streaming_components="all"
)
```

**Available options** (see the wrapper sketch after this list):

- **Default behavior** (no `streaming_components` or `None`): Only the last streaming component streams
- **Stream all components**: `streaming_components=["llm_1", "llm_2"]` (same as `streaming_components="all"`)
- **Stream only first**: `streaming_components=["llm_1"]`
- **Stream only last** (same as default): `streaming_components=["llm_2"]`
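
For orientation, here is a minimal sketch of a pipeline wrapper passing one of these options, following the usual hayhooks `BasePipelineWrapper` / `run_chat_completion` pattern. It is not the example's actual wrapper: the `pipeline.yml` filename and the loading step are assumptions for illustration, while the component names (`prompt_builder_1`, `llm_1`, `llm_2`) match this example.

```python
# Hypothetical pipeline_wrapper.py sketch (not the example's actual wrapper).
from pathlib import Path

from haystack import Pipeline
from hayhooks import BasePipelineWrapper, get_last_user_message, streaming_generator


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # Load the two-LLM pipeline from a serialized definition next to this file
        # (the "pipeline.yml" filename is an assumption for this sketch).
        pipeline_yaml = (Path(__file__).parent / "pipeline.yml").read_text()
        self.pipeline = Pipeline.loads(pipeline_yaml)

    def run_chat_completion(self, model: str, messages: list, body: dict):
        # Take the latest user message as the question for the first prompt builder.
        question = get_last_user_message(messages)
        return streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"prompt_builder_1": {"query": question}},
            # Stream both LLMs; leaving this out (or None) would stream only the last one.
            streaming_components=["llm_1", "llm_2"],
        )
```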

### Pipeline Architecture

The pipeline connects LLM 1's replies directly to the second prompt builder. Using Jinja2 template syntax, the second prompt builder can access the `ChatMessage` attributes directly: `{{previous_response[0].text}}`. This approach is simple and doesn't require any custom extraction components.

This example also demonstrates injecting a visual separator (`**[LLM 2 - Refining the response]**`) between the two LLM outputs using `StreamingChunk.component_info` to detect component transitions, as sketched below.
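
The README doesn't spell out the separator logic here, so the following is a hedged sketch of how such a transition check could look. The `stream_with_separator` helper is hypothetical, and it assumes `chunk.component_info.name` exposes the pipeline component name (available in recent Haystack `StreamingChunk` versions); treat that attribute access as an assumption.

```python
from hayhooks import streaming_generator


def stream_with_separator(pipeline, pipeline_run_args):
    """Yield text chunks, inserting a separator when streaming moves between components.

    Assumes chunk.component_info.name holds the pipeline component name
    (StreamingChunk.component_info exists in recent Haystack versions).
    """
    previous = None
    for chunk in streaming_generator(
        pipeline=pipeline,
        pipeline_run_args=pipeline_run_args,
        streaming_components=["llm_1", "llm_2"],
    ):
        name = chunk.component_info.name if chunk.component_info else None
        if name and previous and name != previous:
            # Component transition detected: announce the second LLM.
            yield "\n\n**[LLM 2 - Refining the response]**\n\n"
        if name:
            previous = name
        yield chunk.content
```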

## Usage

### Deploy with Hayhooks

```bash
# Set your OpenAI API key
export OPENAI_API_KEY=your_api_key_here

# Deploy the pipeline
hayhooks deploy examples/pipeline_wrappers/multi_llm_streaming

# Test it via OpenAI-compatible API
curl -X POST http://localhost:1416/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "multi_llm_streaming",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "stream": true
  }'
```

### Use Directly in Code

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from hayhooks import streaming_generator

# Create your pipeline with multiple streaming components
pipeline = Pipeline()
# ... add LLM 1 and prompt_builder_1 ...

# Add second prompt builder that accesses ChatMessage attributes via Jinja2
pipeline.add_component(
    "prompt_builder_2",
    ChatPromptBuilder(
        template=[
            ChatMessage.from_system("You are a helpful assistant."),
            ChatMessage.from_user("Previous: {{previous_response[0].text}}\n\nRefine this.")
        ]
    )
)
# ... add LLM 2 ...

# Connect: LLM 1 replies directly to prompt_builder_2
pipeline.connect("llm_1.replies", "prompt_builder_2.previous_response")

# Enable streaming for both LLMs (by default, only the last would stream)
for chunk in streaming_generator(
    pipeline=pipeline,
    pipeline_run_args={"prompt_builder_1": {"query": "Your question"}},
    streaming_components=["llm_1", "llm_2"]  # Stream both components
):
    print(chunk.content, end="", flush=True)
```

## Integration with OpenWebUI

This pipeline works seamlessly with OpenWebUI:

1. Configure OpenWebUI to connect to hayhooks (see [OpenWebUI Integration docs](https://deepset-ai.github.io/hayhooks/features/openwebui-integration))
2. Deploy this pipeline
3. Select it as a model in OpenWebUI
4. Watch both LLMs stream their responses in real-time