
Commit 084042e

mpangrazzian and anakin87 authored
Support streaming from multiple pipeline components (#178)
* Support streaming from multiple pipeline components
* Lint
* Fix example
* Update example (cleaner) and docs
* Support for granular / 'all' / env var based streaming components configuration
* Update src/hayhooks/server/pipelines/utils.py

  Co-authored-by: Stefano Fiorucci <[email protected]>
* Lint
* Fix type issues with Literal ; Make streaming concurrency-safe
* Refactoring
* Refine example README
* Use a list instead of a dict for streaming_components (whitelist)
* Reformat
* remove unneeded 'template_variables' ; add required_variables
* to turn off warnings
* Fix docs

---------

Co-authored-by: Stefano Fiorucci <[email protected]>
1 parent 86d3003 commit 084042e

File tree

17 files changed: +1769 / -207 lines


docs/concepts/pipeline-wrapper.md

Lines changed: 186 additions & 0 deletions
@@ -176,6 +176,192 @@ async def run_chat_completion_async(self, model: str, messages: List[dict], body
    )
```

## Streaming from Multiple Components

!!! info "Smart Streaming Behavior"
    By default, Hayhooks streams only the **last** streaming-capable component in your pipeline. This is usually what you want - the final output streaming to users.

    For advanced use cases, you can control which components stream using the `streaming_components` parameter.

When your pipeline contains multiple components that support streaming (e.g., multiple LLMs), you can control which ones stream their outputs as the pipeline executes.

### Default Behavior: Stream Only the Last Component

By default, only the last streaming-capable component will stream:

```python
class MultiLLMWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        from haystack.components.builders import ChatPromptBuilder
        from haystack.components.generators.chat import OpenAIChatGenerator
        from haystack.dataclasses import ChatMessage

        self.pipeline = Pipeline()

        # First LLM - initial answer
        self.pipeline.add_component(
            "prompt_1",
            ChatPromptBuilder(
                template=[
                    ChatMessage.from_system("You are a helpful assistant."),
                    ChatMessage.from_user("{{query}}")
                ]
            )
        )
        self.pipeline.add_component("llm_1", OpenAIChatGenerator(model="gpt-4o-mini"))

        # Second LLM - refines the answer using Jinja2 to access ChatMessage attributes
        self.pipeline.add_component(
            "prompt_2",
            ChatPromptBuilder(
                template=[
                    ChatMessage.from_system("You are a helpful assistant that refines responses."),
                    ChatMessage.from_user(
                        "Previous response: {{previous_response[0].text}}\n\nRefine this."
                    )
                ]
            )
        )
        self.pipeline.add_component("llm_2", OpenAIChatGenerator(model="gpt-4o-mini"))

        # Connect components - LLM 1's replies go directly to prompt_2
        self.pipeline.connect("prompt_1.prompt", "llm_1.messages")
        self.pipeline.connect("llm_1.replies", "prompt_2.previous_response")
        self.pipeline.connect("prompt_2.prompt", "llm_2.messages")

    def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Generator:
        question = get_last_user_message(messages)

        # By default, only llm_2 (the last streaming component) will stream
        return streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"prompt_1": {"query": question}}
        )
```

**What happens:** Only `llm_2` (the last streaming-capable component) streams its responses token by token. The first LLM (`llm_1`) executes normally without streaming, and only the final refined output streams to the user.

### Advanced: Stream Multiple Components with `streaming_components`

For advanced use cases where you want to see outputs from multiple components, use the `streaming_components` parameter:

```python
def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Generator:
    question = get_last_user_message(messages)

    # Enable streaming for BOTH LLMs
    return streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"prompt_1": {"query": question}},
        streaming_components=["llm_1", "llm_2"]  # Stream both components
    )
```

**What happens:** Both LLMs stream their responses token by token. First you'll see the initial answer from `llm_1` streaming, then the refined answer from `llm_2` streaming.

You can also selectively enable streaming for specific components:

```python
# Stream only the first LLM
streaming_components=["llm_1"]

# Stream only the second LLM (same as default)
streaming_components=["llm_2"]

# Stream ALL capable components (shorthand)
streaming_components="all"

# Stream ALL capable components (specific list)
streaming_components=["llm_1", "llm_2"]
```

### Using the "all" Keyword

The `"all"` keyword is a convenient shorthand to enable streaming for all capable components:

```python
return streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={...},
    streaming_components="all"  # Enable all streaming components
)
```

This is equivalent to explicitly enabling every streaming-capable component in your pipeline.

### Global Configuration via Environment Variable

You can set a global default using the `HAYHOOKS_STREAMING_COMPONENTS` environment variable. This applies to all pipelines unless overridden:

```bash
# Stream all components by default
export HAYHOOKS_STREAMING_COMPONENTS="all"

# Stream specific components (comma-separated)
export HAYHOOKS_STREAMING_COMPONENTS="llm_1,llm_2"
```

**Priority order:**

1. Explicit `streaming_components` parameter (highest priority)
2. `HAYHOOKS_STREAMING_COMPONENTS` environment variable
3. Default behavior: stream only last component (lowest priority)

!!! tip "When to Use Each Approach"
    - **Default (last component only)**: Best for most use cases - users see only the final output
    - **"all" keyword**: Useful for debugging, demos, or transparent multi-step workflows
    - **List of components**: Enable multiple specific components by name
    - **Environment variable**: For deployment-wide defaults without code changes

!!! note "Async Streaming"
    All `streaming_components` options work identically with `async_streaming_generator()` for async pipelines.

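As a minimal sketch of the async variant (assuming `async_streaming_generator` and `get_last_user_message` are importable from `hayhooks` like `streaming_generator` is, and that `async_streaming_generator` accepts the same `pipeline`, `pipeline_run_args`, and `streaming_components` arguments as its sync counterpart):

```python
from typing import AsyncGenerator, List

from hayhooks import async_streaming_generator, get_last_user_message


class MultiLLMWrapperAsync(BasePipelineWrapper):
    # setup() builds the same two-LLM pipeline as in the sync example above

    async def run_chat_completion_async(
        self, model: str, messages: List[dict], body: dict
    ) -> AsyncGenerator:
        question = get_last_user_message(messages)

        # Same streaming_components semantics as with streaming_generator
        return async_streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"prompt_1": {"query": question}},
            streaming_components="all",
        )
```
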
### YAML Pipeline Streaming Configuration

You can also specify streaming configuration in YAML pipeline definitions:

```yaml
components:
  prompt_1:
    type: haystack.components.builders.PromptBuilder
    init_parameters:
      template: "Answer this question: {{query}}"
  llm_1:
    type: haystack.components.generators.OpenAIGenerator
  prompt_2:
    type: haystack.components.builders.PromptBuilder
    init_parameters:
      template: "Refine this response: {{previous_reply}}"
  llm_2:
    type: haystack.components.generators.OpenAIGenerator

connections:
  - sender: prompt_1.prompt
    receiver: llm_1.prompt
  - sender: llm_1.replies
    receiver: prompt_2.previous_reply
  - sender: prompt_2.prompt
    receiver: llm_2.prompt

inputs:
  query: prompt_1.query

outputs:
  replies: llm_2.replies

# Option 1: List specific components
streaming_components:
  - llm_1
  - llm_2

# Option 2: Stream all components
# streaming_components: all
```

YAML configuration follows the same priority rules: YAML setting > environment variable > default.

See the [Multi-LLM Streaming Example](https://github.com/deepset-ai/hayhooks/tree/main/examples/pipeline_wrappers/multi_llm_streaming) for a complete working implementation.
## File Upload Support

Hayhooks can handle file uploads by adding a `files` parameter:

docs/reference/environment-variables.md

Lines changed: 27 additions & 0 deletions
@@ -44,6 +44,32 @@ Hayhooks can be configured via environment variables (loaded with prefix `HAYHOO
- Default: `false`
- Description: Include tracebacks in error messages (server and MCP)

### HAYHOOKS_STREAMING_COMPONENTS

- Default: `""` (empty string)
- Description: Global configuration for which pipeline components should stream
- Options:
    - `""` (empty): Stream only the last capable component (default)
    - `"all"`: Stream all streaming-capable components
    - Comma-separated list: `"llm_1,llm_2"` to enable specific components

!!! note "Priority Order"
    Pipeline-specific settings (via the `streaming_components` parameter or YAML) override this global default.

!!! tip "Component-Specific Control"
    Use the `streaming_components` parameter in your code or YAML configuration, rather than the environment variable, when you need to specify exactly which components should stream for a given pipeline.

**Examples:**

```bash
# Stream all components globally
export HAYHOOKS_STREAMING_COMPONENTS="all"

# Stream specific components (comma-separated, spaces are trimmed)
export HAYHOOKS_STREAMING_COMPONENTS="llm_1,llm_2"
export HAYHOOKS_STREAMING_COMPONENTS="llm_1, llm_2, llm_3"
```

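To make the accepted values concrete, here is an illustrative sketch of how such a value can be interpreted. The helper name and logic are hypothetical, for documentation purposes only, and are not Hayhooks' internal implementation:

```python
from typing import List, Literal, Optional, Union


def parse_streaming_components(value: str) -> Optional[Union[Literal["all"], List[str]]]:
    """Interpret a HAYHOOKS_STREAMING_COMPONENTS-style value (illustrative only)."""
    value = value.strip()
    if not value:
        return None  # default: stream only the last streaming-capable component
    if value == "all":
        return "all"  # stream every streaming-capable component
    # Comma-separated component names; surrounding spaces are trimmed
    return [name.strip() for name in value.split(",") if name.strip()]


assert parse_streaming_components("llm_1, llm_2, llm_3") == ["llm_1", "llm_2", "llm_3"]
```
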
## MCP

### HAYHOOKS_MCP_HOST

@@ -154,6 +180,7 @@ HAYHOOKS_ADDITIONAL_PYTHON_PATH=./custom_code

```bash
HAYHOOKS_USE_HTTPS=false
HAYHOOKS_DISABLE_SSL=false
HAYHOOKS_SHOW_TRACEBACKS=false
HAYHOOKS_STREAMING_COMPONENTS=all
HAYHOOKS_CORS_ALLOW_ORIGINS=["*"]
LOG=INFO
```

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ This directory contains various examples demonstrating different use cases and f

| Example | Description | Key Features | Use Case |
|---------|-------------|--------------|----------|
| [multi_llm_streaming](./pipeline_wrappers/multi_llm_streaming/) | Two sequential LLMs with configurable multi-component streaming | • Two sequential LLMs<br/>• Multi-component streaming via `streaming_components`<br/>• Visual separator between LLM outputs<br/>• Contrasts with the default last-component streaming | Demonstrating how to stream from multiple streaming-capable components in a single pipeline |
| [async_question_answer](./pipeline_wrappers/async_question_answer/) | Async question-answering pipeline with streaming support | • Async pipeline execution<br/>• Streaming responses<br/>• OpenAI Chat Generator<br/>• Both API and chat completion interfaces | Building conversational AI systems that need async processing and real-time streaming responses |
| [chat_with_website](./pipeline_wrappers/chat_with_website/) | Answer questions about website content | • Web content fetching<br/>• HTML to document conversion<br/>• Content-based Q&A<br/>• Configurable URLs | Creating AI assistants that can answer questions about specific websites or web-based documentation |
| [chat_with_website_mcp](./pipeline_wrappers/chat_with_website_mcp/) | MCP-compatible website chat pipeline | • MCP (Model Context Protocol) support<br/>• Website content analysis<br/>• API-only interface<br/>• Simplified deployment | Integrating website analysis capabilities into MCP-compatible AI systems and tools |
examples/pipeline_wrappers/multi_llm_streaming/README.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
# Multi-LLM Streaming Example

This example demonstrates hayhooks' configurable multi-component streaming support.

## Overview

The pipeline contains **two LLM components in sequence**:

1. **LLM 1** (`gpt-5-nano` with `reasoning_effort: low`): Provides a short, concise initial answer to the user's question
2. **LLM 2** (`gpt-5-nano` with `reasoning_effort: medium`): Refines and expands the answer into a detailed, professional response

This example uses `streaming_components` to enable streaming for **both** LLMs. By default, only the last component would stream.

![Multi-LLM Streaming Example](./multi_stream.gif)

## How It Works

### Streaming Configuration

By default, hayhooks streams only the **last** streaming-capable component (in this case, LLM 2). However, this example demonstrates using the `streaming_components` parameter to enable streaming for both components:

```python
streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={...},
    streaming_components=["llm_1", "llm_2"]  # or streaming_components="all"
)
```

**Available options:**

- **Default behavior** (no `streaming_components` or `None`): Only the last streaming component streams
- **Stream all components**: `streaming_components=["llm_1", "llm_2"]` (same as `streaming_components="all"`)
- **Stream only first**: `streaming_components=["llm_1"]`
- **Stream only last** (same as default): `streaming_components=["llm_2"]`

### Pipeline Architecture

The pipeline connects LLM 1's replies directly to the second prompt builder. Using Jinja2 template syntax, the second prompt builder can access the `ChatMessage` attributes directly: `{{previous_response[0].text}}`. This approach is simple and doesn't require any custom extraction components.

This example also demonstrates injecting a visual separator (`**[LLM 2 - Refining the response]**`) between the two LLM outputs using `StreamingChunk.component_info` to detect component transitions.

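To illustrate the transition detection mentioned above, here is a minimal sketch of how a consumer of the chunk stream could inject such a separator. It assumes `StreamingChunk.component_info` exposes the emitting component's name via a `name` attribute; the actual wrapper in this example may differ in detail:

```python
def stream_with_separator(chunks):
    """Yield chunk text, inserting a separator when the emitting component changes.

    Illustrative sketch only: assumes each chunk may carry `component_info`
    with a `name` attribute identifying the component that produced it.
    """
    current_component = None
    for chunk in chunks:
        info = getattr(chunk, "component_info", None)
        name = getattr(info, "name", None) if info else None
        if name and name != current_component:
            if current_component == "llm_1" and name == "llm_2":
                yield "\n\n**[LLM 2 - Refining the response]**\n\n"
            current_component = name
        yield chunk.content
```
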
## Usage

### Deploy with Hayhooks

```bash
# Set your OpenAI API key
export OPENAI_API_KEY=your_api_key_here

# Deploy the pipeline
hayhooks deploy examples/pipeline_wrappers/multi_llm_streaming

# Test it via OpenAI-compatible API
curl -X POST http://localhost:1416/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "multi_llm_streaming",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "stream": true
  }'
```

### Use Directly in Code

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from hayhooks import streaming_generator

# Create your pipeline with multiple streaming components
pipeline = Pipeline()
# ... add LLM 1 and prompt_builder_1 ...

# Add second prompt builder that accesses ChatMessage attributes via Jinja2
pipeline.add_component(
    "prompt_builder_2",
    ChatPromptBuilder(
        template=[
            ChatMessage.from_system("You are a helpful assistant."),
            ChatMessage.from_user("Previous: {{previous_response[0].text}}\n\nRefine this.")
        ]
    )
)
# ... add LLM 2 ...

# Connect: LLM 1 replies directly to prompt_builder_2
pipeline.connect("llm_1.replies", "prompt_builder_2.previous_response")

# Enable streaming for both LLMs (by default, only the last would stream)
for chunk in streaming_generator(
    pipeline=pipeline,
    pipeline_run_args={"prompt_builder_1": {"query": "Your question"}},
    streaming_components=["llm_1", "llm_2"]  # Stream both components
):
    print(chunk.content, end="", flush=True)
```

## Integration with OpenWebUI

This pipeline works seamlessly with OpenWebUI:

1. Configure OpenWebUI to connect to hayhooks (see [OpenWebUI Integration docs](https://deepset-ai.github.io/hayhooks/features/openwebui-integration))
2. Deploy this pipeline
3. Select it as a model in OpenWebUI
4. Watch both LLMs stream their responses in real-time
examples/pipeline_wrappers/multi_llm_streaming/multi_stream.gif: binary file added (1.13 MB)
