Support streaming from multiple pipeline components (#178)
* Support streaming from multiple pipeline components
* Lint
* Fix example
* Update example (cleaner) and docs
* Support for granular / 'all' / env var based streaming components configuration
* Update src/hayhooks/server/pipelines/utils.py
Co-authored-by: Stefano Fiorucci <[email protected]>
* Lint
* Fix type issues with Literal ; Make streaming concurrency-safe
* Refactoring
* Refine example README
* Use a list instead of a dict for streaming_components (whitelist)
* Reformat
* remove unneeded 'template_variables' ; add required_variables * to turn off warnings
* Fix docs
---------
Co-authored-by: Stefano Fiorucci <[email protected]>
By default, Hayhooks streams only the **last** streaming-capable component in your pipeline. This is usually what you want - the final output streaming to users.
For advanced use cases, you can control which components stream using the `streaming_components` parameter.
When your pipeline contains multiple components that support streaming (e.g., multiple LLMs), you can control which ones stream their outputs as the pipeline executes.
### Default Behavior: Stream Only the Last Component
By default, only the last streaming-capable component will stream:
```python
from haystack import Pipeline
from hayhooks import BasePipelineWrapper


class MultiLLMWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        from haystack.components.builders import ChatPromptBuilder
        from haystack.components.generators.chat import OpenAIChatGenerator
        from haystack.dataclasses import ChatMessage

        self.pipeline = Pipeline()

        # First LLM - initial answer
        self.pipeline.add_component(
            "prompt_1",
            ChatPromptBuilder(
                template=[
                    ChatMessage.from_system("You are a helpful assistant."),
                    # ... rest of the setup (second prompt builder, llm_1, llm_2,
                    # and the connections between them) is omitted in this excerpt
                ]
            ),
        )
```
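The run method is omitted from the excerpt above. As a minimal sketch (the `run_chat_completion` signature and the `"query"` template variable are assumptions based on Hayhooks' other streaming examples), the default behavior requires no extra arguments:

```python
    # Inside MultiLLMWrapper:
    def run_chat_completion(self, model: str, messages: list, body: dict):
        from hayhooks import get_last_user_message, streaming_generator

        question = get_last_user_message(messages)

        # No streaming_components argument: only the last streaming-capable
        # component (llm_2) streams token by token.
        return streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"prompt_1": {"query": question}},  # "query" is an assumed template variable
        )
```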
**What happens:** Only `llm_2` (the last streaming-capable component) streams its responses token by token. The first LLM (`llm_1`) executes normally without streaming, and only the final refined output streams to the user.
### Advanced: Stream Multiple Components with `streaming_components`
For advanced use cases where you want to see outputs from multiple components, use the `streaming_components` parameter:
```python
return streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={...},
    streaming_components=["llm_1", "llm_2"]  # Stream both components
)
```
**What happens:** Both LLMs stream their responses token by token. First you'll see the initial answer from `llm_1` streaming, then the refined answer from `llm_2` streaming.
You can also selectively enable streaming for specific components:
```python
# Stream only the first LLM
streaming_components=["llm_1"]

# Stream only the second LLM (same as default)
streaming_components=["llm_2"]

# Stream ALL capable components (shorthand)
streaming_components="all"

# Stream ALL capable components (specific list)
streaming_components=["llm_1", "llm_2"]
```
### Using the "all" Keyword
The `"all"` keyword is a convenient shorthand to enable streaming for all capable components:
```python
return streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={...},
    streaming_components="all"  # Enable all streaming components
)
```
This is equivalent to explicitly enabling every streaming-capable component in your pipeline.
### Global Configuration via Environment Variable
You can set a global default using the `HAYHOOKS_STREAMING_COMPONENTS` environment variable. This applies to all pipelines unless overridden:
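```bash
# Stream all streaming-capable components in every pipeline
export HAYHOOKS_STREAMING_COMPONENTS="all"

# Or enable specific components (comma-separated)
export HAYHOOKS_STREAMING_COMPONENTS="llm_1,llm_2"
```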
YAML configuration follows the same priority rules: YAML setting > environment variable > default.
See the [Multi-LLM Streaming Example](https://github.com/deepset-ai/hayhooks/tree/main/examples/pipeline_wrappers/multi_llm_streaming) for a complete working implementation.
## File Upload Support
Hayhooks can handle file uploads by adding a `files` parameter:
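A minimal sketch (the wrapper class name is illustrative; the `run_api` signature with FastAPI's `UploadFile` follows Hayhooks' file-upload examples):

```python
from typing import Optional

from fastapi import UploadFile
from hayhooks import BasePipelineWrapper


class FileUploadWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        ...

    def run_api(self, files: Optional[list[UploadFile]] = None, query: str = "") -> str:
        # Hayhooks passes uploaded files to the `files` parameter
        if files:
            names = [f.filename for f in files if f.filename]
            return f"Received {len(names)} file(s): {', '.join(names)}"
        return f"No files uploaded; query was: {query}"
```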
`docs/reference/environment-variables.md` (27 additions, 0 deletions)

Hayhooks can be configured via environment variables (loaded with prefix `HAYHOOKS_`).
### HAYHOOKS_SHOW_TRACEBACKS

- Default: `false`
- Description: Include tracebacks in error messages (server and MCP)
### HAYHOOKS_STREAMING_COMPONENTS
- Default: `""` (empty string)
- Description: Global configuration for which pipeline components should stream
- Options:
    - `""` (empty): Stream only the last capable component (default)
    - `"all"`: Stream all streaming-capable components
    - Comma-separated list: `"llm_1,llm_2"` to enable specific components
!!! note "Priority Order"
    Pipeline-specific settings (via `streaming_components` parameter or YAML) override this global default.
!!! tip "Component-Specific Control"
    Use the `streaming_components` parameter in your code or YAML configuration, rather than this environment variable, when you need to specify exactly which components should stream.
**Examples:**
```bash
# Stream all components globally
export HAYHOOKS_STREAMING_COMPONENTS="all"

# Stream specific components (comma-separated, spaces are trimmed)
export HAYHOOKS_STREAMING_COMPONENTS="llm_1, llm_2"
```
`examples/README.md`

|[multi_llm_streaming](./pipeline_wrappers/multi_llm_streaming/)| Multiple LLM components with automatic streaming | • Two sequential LLMs<br/>• Automatic multi-component streaming<br/>• No special configuration needed<br/>• Shows default streaming behavior | Demonstrating how hayhooks automatically streams from all components in a pipeline |
|[async_question_answer](./pipeline_wrappers/async_question_answer/)| Async question-answering pipeline with streaming support | • Async pipeline execution<br/>• Streaming responses<br/>• OpenAI Chat Generator<br/>• Both API and chat completion interfaces | Building conversational AI systems that need async processing and real-time streaming responses |
|[chat_with_website](./pipeline_wrappers/chat_with_website/)| Answer questions about website content | • Web content fetching<br/>• HTML to document conversion<br/>• Content-based Q&A<br/>• Configurable URLs | Creating AI assistants that can answer questions about specific websites or web-based documentation |
|[chat_with_website_mcp](./pipeline_wrappers/chat_with_website_mcp/)| MCP-compatible website chat pipeline | • MCP (Model Context Protocol) support<br/>• Website content analysis<br/>• API-only interface<br/>• Simplified deployment | Integrating website analysis capabilities into MCP-compatible AI systems and tools |

`examples/pipeline_wrappers/multi_llm_streaming/README.md`
By default, hayhooks streams only the **last** streaming-capable component (in this case, LLM 2). However, this example demonstrates using the `streaming_components` parameter to enable streaming for both components:
```python
streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={...},
    streaming_components=["llm_1", "llm_2"]  # or streaming_components="all"
)
```
**Available options:**
- **Default behavior** (no `streaming_components` or `None`): Only the last streaming component streams
- **Stream all components**: `streaming_components=["llm_1", "llm_2"]` (same as `streaming_components="all"`)
- **Stream only first**: `streaming_components=["llm_1"]`
- **Stream only last** (same as default): `streaming_components=["llm_2"]`
### Pipeline Architecture
The pipeline connects LLM 1's replies directly to the second prompt builder. Using Jinja2 template syntax, the second prompt builder can access the `ChatMessage` attributes directly: `{{previous_response[0].text}}`. This approach is simple and doesn't require any custom extraction components.
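A sketch of that connection (the component names `prompt_2` and `llm_1` follow the naming used in this example; the exact prompt text is illustrative):

```python
# Inside setup(), after "prompt_1" and "llm_1" have been added:
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

self.pipeline.add_component(
    "prompt_2",
    ChatPromptBuilder(
        template=[
            ChatMessage.from_system("You refine previous answers."),
            ChatMessage.from_user("Refine this answer: {{previous_response[0].text}}"),
        ],
        required_variables="*",  # silence warnings about unfilled variables
    ),
)

# LLM 1's replies feed the second prompt builder directly
self.pipeline.connect("llm_1.replies", "prompt_2.previous_response")
```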
This example also demonstrates injecting a visual separator (`**[LLM 2 - Refining the response]**`) between the two LLM outputs using `StreamingChunk.component_info` to detect component transitions.
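A sketch of that pattern, assuming `streaming_generator` yields `StreamingChunk` objects and that `component_info.name` identifies the emitting component (check the example's actual wrapper for the exact implementation):

```python
from haystack.dataclasses import StreamingChunk
from hayhooks import streaming_generator


def stream_with_separator(pipeline, pipeline_run_args):
    current_component = None
    for chunk in streaming_generator(
        pipeline=pipeline,
        pipeline_run_args=pipeline_run_args,
        streaming_components=["llm_1", "llm_2"],
    ):
        # Detect the transition from llm_1 to llm_2 via component_info
        if isinstance(chunk, StreamingChunk) and chunk.component_info is not None:
            name = chunk.component_info.name
            if current_component is not None and name != current_component:
                yield StreamingChunk(content="\n\n**[LLM 2 - Refining the response]**\n\n")
            current_component = name
        yield chunk
```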