
Commit cc416fb

committed
docs(streaming): update streaming configuration documentation
feature: #1538 Revise streaming docs to clarify usage of stream_async(), remove outdated global streaming config, and add CLI usage instructions. Explain output rails streaming requirements and deprecation of StreamingHandler. Improve examples and guidance for token usage tracking.
1 parent 5c0c461 commit cc416fb

1 file changed (31 additions, 50 deletions)

docs/configure-rails/yaml-schema/streaming/global-streaming.md

Lines changed: 31 additions & 50 deletions
````diff
@@ -1,50 +1,33 @@
 ---
-title: Global Streaming
-description: Enable streaming mode for LLM token generation in config.yml.
+title: Streaming
+description: Using streaming mode for LLM token generation in NeMo Guardrails.
 ---
 
-# Global Streaming
+# Streaming
 
-Enable streaming mode for the main LLM generation at the top level of `config.yml`.
+NeMo Guardrails supports streaming LLM responses via the `stream_async()` method. No configuration is required to enable streaming—simply use `stream_async()` instead of `generate_async()`.
 
-## Configuration
+## Basic Usage
 
-```yaml
-streaming: True
-```
-
-## What It Does
-
-When enabled, global streaming:
+```python
+from nemoguardrails import LLMRails, RailsConfig
 
-- Sets `streaming = True` on the underlying LLM model
-- Enables `stream_usage = True` for token usage tracking
-- Allows using the `stream_async()` method on `LLMRails`
-- Makes the LLM produce tokens incrementally instead of all at once
+config = RailsConfig.from_path("./config")
+rails = LLMRails(config)
 
-## Default
+messages = [{"role": "user", "content": "Hello!"}]
 
-`False`
+async for chunk in rails.stream_async(messages=messages):
+    print(chunk, end="", flush=True)
+```
 
 ---
 
-## When to Use
-
-### Streaming Without Output Rails
-
-If you do not have output rails configured, only global streaming is needed:
-
-```yaml
-streaming: True
-```
-
-### Streaming With Output Rails
+## Streaming With Output Rails
 
-When using output rails with streaming, you must also configure [output rail streaming](output-rail-streaming.md):
+When using output rails with streaming, you must configure [output rail streaming](output-rail-streaming.md):
 
 ```yaml
-streaming: True
-
 rails:
   output:
     flows:
````
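The new Basic Usage snippet uses `async for` at the top level, which only works in environments that allow top-level `await` (for example, a Jupyter notebook or the `python -m asyncio` REPL). In a plain script, the loop has to live inside a coroutine started with `asyncio.run()`. The sketch below shows that shape and also collects the chunks into the full response. `fake_stream_async` is a hypothetical stand-in for `rails.stream_async()`, so the example runs without a model or config.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream_async(messages: list[dict]) -> AsyncIterator[str]:
    """Hypothetical stand-in for rails.stream_async(); yields canned chunks.

    `messages` is accepted only to mirror the real call signature.
    """
    for chunk in ["Hello", "! ", "How can ", "I help?"]:
        await asyncio.sleep(0)  # hand control back to the loop, like a real stream
        yield chunk


async def main() -> str:
    messages = [{"role": "user", "content": "Hello!"}]
    parts: list[str] = []
    # With a real config, iterate over rails.stream_async(messages=messages) instead.
    async for chunk in fake_stream_async(messages=messages):
        print(chunk, end="", flush=True)  # show tokens as they arrive
        parts.append(chunk)               # keep them for the full response
    print()
    return "".join(parts)


if __name__ == "__main__":
    full_text = asyncio.run(main())
```

With a real configuration, build `rails` as in the snippet and replace the stand-in generator with `rails.stream_async(messages=messages)`. The rest of the coroutine stays the same.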
````diff
@@ -53,27 +36,15 @@ rails:
       enabled: True
 ```
 
----
+If output rails are configured but `rails.output.streaming.enabled` is not set to `True`, calling `stream_async()` will raise a `StreamingNotSupportedError`.
 
-## Python API Usage
+---
 
-### Simple Streaming
+## Streaming With Handler (Deprecated)
 
-```python
-from nemoguardrails import LLMRails, RailsConfig
-
-config = RailsConfig.from_path("./config")
-rails = LLMRails(config)
+> **Warning:** Using `StreamingHandler` directly is deprecated and will be removed in a future release. Use `stream_async()` instead.
 
-messages = [{"role": "user", "content": "Hello!"}]
-
-async for chunk in rails.stream_async(messages=messages):
-    print(chunk, end="", flush=True)
-```
-
-### Streaming With Handler
-
-For more control, use a `StreamingHandler`:
+For advanced use cases requiring more control over token processing, you can use a `StreamingHandler` with `generate_async()`:
 
 ```python
 from nemoguardrails import LLMRails, RailsConfig
````
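The rule added at the start of this hunk (output rails present but `rails.output.streaming.enabled` not `True` means `stream_async()` raises) can be mirrored as a pre-flight check on the parsed `config.yml`. This is a minimal sketch, not the library's implementation. `check_streaming_supported` is a hypothetical helper, the exception class is a local stand-in for the one the docs name, and the config is assumed to already be loaded into a plain dict (for example, by a YAML parser). `self check output` is used only as an illustrative flow name.

```python
class StreamingNotSupportedError(Exception):
    """Hypothetical local stand-in for the error named in the docs."""


def check_streaming_supported(config: dict) -> None:
    """Raise if output rails are configured without output rail streaming.

    `config` is assumed to be the parsed contents of config.yml.
    """
    rails = config.get("rails") or {}
    output = rails.get("output") or {}
    has_output_rails = bool(output.get("flows"))
    # Mirror the documented rule: streaming.enabled must be exactly True.
    streaming_enabled = (output.get("streaming") or {}).get("enabled") is True
    if has_output_rails and not streaming_enabled:
        raise StreamingNotSupportedError(
            "Output rails are configured but rails.output.streaming.enabled is not True"
        )


if __name__ == "__main__":
    misconfigured = {"rails": {"output": {"flows": ["self check output"]}}}
    try:
        check_streaming_supported(misconfigured)
    except StreamingNotSupportedError as err:
        print(f"Misconfigured: {err}")
```

Running a check like this at startup surfaces the misconfiguration before the first request instead of on the first `stream_async()` call. The library still performs its own check when streaming starts.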
````diff
@@ -113,9 +84,19 @@ Enable streaming in the request body by setting `stream` to `true`:
 
 ---
 
+## CLI Usage
+
+Use the `--streaming` flag with the chat command:
+
+```bash
+nemoguardrails chat path/to/config --streaming
+```
+
+---
+
 ## Token Usage Tracking
 
-When streaming is enabled, NeMo Guardrails automatically enables token usage tracking by setting `stream_usage = True` for the underlying LLM model.
+When using `stream_async()`, NeMo Guardrails automatically enables token usage tracking by setting `stream_usage = True` on the underlying LLM model.
 
 Access token usage through the `log` generation option:
 
````
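A single guarded request can trigger several LLM calls (for example, input rails, the main generation, and output rails). If usage is reported per call, you may want to total it per request. The diff does not show the structure returned by the `log` option, so the `LLMCallUsage` record below is hypothetical, its field names are assumptions, and the token counts are made-up sample values. The sketch only illustrates the aggregation step.

```python
from dataclasses import dataclass


@dataclass
class LLMCallUsage:
    """Hypothetical per-call usage record; field names are assumptions."""

    task: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def summarize_usage(calls: list[LLMCallUsage]) -> dict[str, int]:
    """Sum token counts across all LLM calls made for one request."""
    return {
        "prompt_tokens": sum(c.prompt_tokens for c in calls),
        "completion_tokens": sum(c.completion_tokens for c in calls),
        "total_tokens": sum(c.total_tokens for c in calls),
    }


# Made-up sample values for illustration only.
calls = [
    LLMCallUsage(task="self_check_input", prompt_tokens=120, completion_tokens=3),
    LLMCallUsage(task="general", prompt_tokens=45, completion_tokens=60),
]

if __name__ == "__main__":
    print(summarize_usage(calls))
```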
