Commit 7d6014d

Docs update to reflect return_failures changes
1 parent f867e6f commit 7d6014d

5 files changed

Lines changed: 73 additions & 4 deletions

README.md

Lines changed: 14 additions & 1 deletion
@@ -172,12 +172,17 @@ ingestor = (
 
 print("Starting ingestion..")
 t0 = time.time()
-results = ingestor.ingest(show_progress=True)
+# Return both successes and failures so you can inspect partial errors without stopping uploads
+results, failures = ingestor.ingest(show_progress=True, return_failures=True)
 t1 = time.time()
 print(f"Time taken: {t1 - t0} seconds")
 
 # results blob is directly inspectable
 print(ingest_json_results_to_blob(results[0]))
+
+# (optional) Review any failures that were returned
+if failures:
+    print(f"There were {len(failures)} failures. Sample: {failures[0]}")
 ```
 
 You can see the extracted text that represents the content of the ingested test document.
@@ -284,6 +289,14 @@ So, according to this whimsical analysis, both the **Giraffe** and the **Cat** a
 >
 > Please also checkout our [demo using a retrieval pipeline on build.nvidia.com](https://build.nvidia.com/nvidia/multimodal-pdf-data-extraction-for-enterprise-rag) to query over document content pre-extracted w/ NVIDIA Ingest.
 
+> [!IMPORTANT]
+> About `return_failures`
+>
+> - `ingestor.ingest(..., return_failures=False)` (default): returns only successful results. Any failed jobs are omitted from the return value and are logged. If you have configured `.vdb_upload(...)` and any failures are present, `ingest()` will raise a `RuntimeError` and will NOT upload. This preserves the previous all-or-nothing bulk upload behavior.
+> - `ingestor.ingest(..., return_failures=True)`: returns a tuple `(results, failures)`. If `.vdb_upload(...)` is configured and some jobs failed, `ingest()` will still proceed to upload only the successful results to your vector database and will not raise. You can then inspect `failures` to decide whether to retry or remediate.
+>
+> This makes `return_failures=True` ideal for large batches where you want successful chunks/pages to be committed while still collecting detailed diagnostics for anything that didn’t complete.
+
 
 ## GitHub Repository Structure
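The two `return_failures` modes described in the README note can be modeled in a few lines. This is a minimal sketch, not the nv-ingest implementation: `SketchIngestor`, its `jobs` mapping, the `uploader` callback, and the `(source, error)` failure shape are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class SketchIngestor:
    """Toy model of the documented behavior; NOT the real nv-ingest Ingestor."""

    def __init__(
        self,
        jobs: Dict[str, Callable[[], dict]],
        uploader: Optional[Callable[[List[dict]], None]] = None,
    ) -> None:
        self.jobs = jobs          # hypothetical: source name -> callable that returns a result or raises
        self.uploader = uploader  # hypothetical stand-in for a configured .vdb_upload(...)

    def ingest(self, return_failures: bool = False):
        results: List[dict] = []
        failures: List[Tuple[str, str]] = []  # assumed shape: (source, error message)
        for source, job in self.jobs.items():
            try:
                results.append(job())
            except Exception as exc:
                failures.append((source, str(exc)))

        if self.uploader is not None:
            if failures and not return_failures:
                # Default mode: all-or-nothing, so nothing is uploaded.
                raise RuntimeError(f"{len(failures)} job(s) failed; upload skipped")
            # Either there were no failures, or the caller opted in to partial uploads.
            self.uploader(results)

        return (results, failures) if return_failures else results


def good_job() -> dict:
    return {"text": "hello"}


def bad_job() -> dict:
    raise ValueError("corrupt PDF")


jobs = {"doc1.pdf": good_job, "doc2.pdf": bad_job}

# Opt-in mode: the success is uploaded and the failure is returned.
uploaded: List[dict] = []
results, failures = SketchIngestor(jobs, uploaded.extend).ingest(return_failures=True)
print(len(results), len(uploaded), failures)

# Default mode: the same batch raises and uploads nothing.
strict_uploaded: List[dict] = []
try:
    SketchIngestor(jobs, strict_uploaded.extend).ingest()
    strict_error = None
except RuntimeError as exc:
    strict_error = str(exc)
print(strict_error, strict_uploaded)
```

The only branch that differs between the two modes is the check before the upload call, which is why the opt-in mode can commit partial results while the default stays all-or-nothing.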

docs/docs/extraction/data-store.md

Lines changed: 19 additions & 0 deletions
@@ -21,6 +21,25 @@ It does not store the embeddings for images.
 NeMo Retriever extraction supports uploading data by using the [Ingestor.vdb_upload API](nv-ingest-python-api.md).
 Currently, data upload is not supported through the [NV Ingest CLI](nv-ingest_cli.md).
 
+### Partial Failures and Upload Semantics
+
+When chaining `.vdb_upload(...)` on an `Ingestor`, upload behavior depends on the `return_failures` flag passed to `ingest()`:
+
+- `return_failures=False` (default): If any ingestion jobs fail, `ingest()` raises a `RuntimeError` and no upload occurs (all-or-nothing).
+- `return_failures=True`: `ingest()` returns `(results, failures)` and uploads only the successful results; it does not raise. Inspect `failures` and retry as needed.
+
+Example:
+
+```python
+results, failures = (
+    Ingestor(client=client)
+    .files(["doc1.pdf", "doc2.pdf"]).extract().embed()
+    .vdb_upload(collection_name="my_collection", milvus_uri="milvus.db")
+    .ingest(return_failures=True)
+)
+print(f"Uploaded {len(results)} successes; {len(failures)} failures")
+```
+
 
 
 ## Upload to Milvus

docs/docs/extraction/nv-ingest-python-api.md

Lines changed: 24 additions & 1 deletion
@@ -115,7 +115,30 @@ For large document batches, you can enable a progress bar by setting `show_progr
 Use the following code.
 
 ```python
-result = ingestor.extract().ingest(show_progress=True)
+results, failures = ingestor.extract().ingest(show_progress=True, return_failures=True)
+print(len(results), "successful documents")
+if failures:
+    print("Failures:", failures[:1])
+```
+
+## Ingest Semantics with vdb_upload
+
+If you chain `.vdb_upload(...)` on the `Ingestor`, uploads are performed after ingestion completes. Behavior depends on `return_failures`:
+
+- `return_failures=False` (default): If any jobs fail, `ingest()` raises a `RuntimeError` and does not upload (all-or-nothing).
+- `return_failures=True`: `ingest()` returns `(results, failures)` and uploads only the successful results; it does not raise. You can inspect `failures` and retry selectively.
+
+Example:
+
+```python
+ingestor = (
+    Ingestor(client=client)
+    .files(["/path/doc1.pdf", "/path/doc2.pdf"]).extract().embed()
+    .vdb_upload(collection_name="my_collection", milvus_uri="milvus.db")
+)
+
+results, failures = ingestor.ingest(return_failures=True)
+print(f"Uploaded {len(results)} successful docs; {len(failures)} failures")
 ```
 
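One way to act on the returned `failures` is a bounded retry loop over only the sources that failed. The sketch below is an illustration under stated assumptions: the docs in this commit do not specify the structure of a failure entry, so it assumes `(source, error)` pairs, and `run_batch` is a hypothetical callable that would wrap building an `Ingestor` for the given files and calling `ingest(return_failures=True)`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Failure = Tuple[str, str]  # assumed shape: (source, error message)
RunBatch = Callable[[Sequence[str]], Tuple[List[dict], List[Failure]]]


def ingest_with_retries(
    run_batch: RunBatch,
    sources: Sequence[str],
    max_attempts: int = 3,
) -> Tuple[List[dict], Dict[str, str]]:
    """Re-run only the failed sources, up to max_attempts passes in total."""
    all_results: List[dict] = []
    last_errors: Dict[str, str] = {}
    pending = list(sources)
    for _ in range(max_attempts):
        if not pending:
            break
        results, failures = run_batch(pending)
        all_results.extend(results)
        last_errors = dict(failures)               # errors from the most recent pass only
        pending = [source for source, _ in failures]
    return all_results, last_errors


# Hypothetical batch function: doc2.pdf fails once, doc3.pdf always fails.
attempts: Dict[str, int] = {}


def flaky_batch(batch: Sequence[str]) -> Tuple[List[dict], List[Failure]]:
    results, failures = [], []
    for source in batch:
        attempts[source] = attempts.get(source, 0) + 1
        if source == "doc2.pdf" and attempts[source] == 1:
            failures.append((source, "timeout"))
        elif source == "doc3.pdf":
            failures.append((source, "unsupported format"))
        else:
            results.append({"source": source})
    return results, failures


final_results, final_errors = ingest_with_retries(
    flaky_batch, ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
)
print(final_results)
print(final_errors)
```

Capping the number of passes keeps a permanently failing document from looping forever; whatever is left in the returned error mapping needs manual remediation.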

docs/docs/extraction/quickstart-guide.md

Lines changed: 8 additions & 1 deletion
@@ -179,17 +179,24 @@ ingestor = (
 )
 print("Starting ingestion..")
 t0 = time.time()
-results = ingestor.ingest()
+results, failures = ingestor.ingest(return_failures=True)
 t1 = time.time()
 print(f"Time taken: {t1-t0} seconds")
 # results blob is directly inspectable
 print(ingest_json_results_to_blob(results[0]))
+if failures:
+    print(f"There were {len(failures)} failures. Sample: {failures[0]}")
 ```
 
 !!! note
 
     To use library mode with nemoretriever_parse, uncomment `extract_method="nemoretriever_parse"` in the previous code. For more information, refer to [Use Nemo Retriever Extraction with nemoretriever-parse](nemoretriever-parse.md).
 
+!!! important "About return_failures and vdb_upload"
+
+    - `ingestor.ingest(..., return_failures=False)` (default): returns only successful results. If `.vdb_upload(...)` is configured and any jobs fail, `ingest()` raises `RuntimeError` and does not upload (all-or-nothing).
+    - `ingestor.ingest(..., return_failures=True)`: returns `(results, failures)`. If `.vdb_upload(...)` is configured and some jobs fail, `ingest()` uploads only the successful results and does not raise; inspect `failures` for remediation.
+
 
 ```
 Starting ingestion..
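Because `return_failures=True` does not raise, `results` may be empty when every job fails, in which case indexing `results[0]` raises `IndexError`. A small guard keeps the inspection step from crashing. This is a sketch only: `summarize` is a hypothetical helper, and `preview` is a caller-supplied function (for example `ingest_json_results_to_blob`).

```python
from typing import Any, Callable, Sequence


def summarize(
    results: Sequence[Any],
    failures: Sequence[Any],
    preview: Callable[[Any], str] = str,
) -> str:
    """Build a short report without indexing into an empty sequence."""
    lines = [f"{len(results)} succeeded, {len(failures)} failed"]
    if results:
        lines.append(f"First result: {preview(results[0])}")
    if failures:
        lines.append(f"Sample failure: {failures[0]}")
    return "\n".join(lines)


# Every job failed: no IndexError, and the failure is still reported.
all_failed = summarize([], [("doc1.pdf", "timeout")])
print(all_failed)

# Mixed outcome with a custom preview function.
mixed = summarize(
    [{"text": "hello"}],
    [("doc2.pdf", "timeout")],
    preview=lambda result: result["text"],
)
print(mixed)
```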

docs/docs/extraction/quickstart-library-mode.md

Lines changed: 8 additions & 1 deletion
@@ -131,18 +131,25 @@ ingestor = (
 
 print("Starting ingestion..")
 t0 = time.time()
-results = ingestor.ingest(show_progress=True)
+results, failures = ingestor.ingest(show_progress=True, return_failures=True)
 t1 = time.time()
 print(f"Time taken: {t1 - t0} seconds")
 
 # results blob is directly inspectable
 print(ingest_json_results_to_blob(results[0]))
+if failures:
+    print(f"There were {len(failures)} failures. Sample: {failures[0]}")
 ```
 
 !!! note
 
     To use library mode with nemoretriever_parse, uncomment `extract_method="nemoretriever_parse"` in the previous code. For more information, refer to [Use Nemo Retriever Extraction with nemoretriever-parse](nemoretriever-parse.md).
 
+!!! important "About return_failures and vdb_upload"
+
+    - `ingestor.ingest(..., return_failures=False)` (default): returns only successful results. If `.vdb_upload(...)` is configured and any jobs fail, `ingest()` raises `RuntimeError` and does not upload (all-or-nothing).
+    - `ingestor.ingest(..., return_failures=True)`: returns `(results, failures)`. If `.vdb_upload(...)` is configured and some jobs fail, `ingest()` uploads only the successful results and does not raise; inspect `failures` for remediation.
+
 You can see the extracted text that represents the content of the ingested test document.
 
 ```shell
