# (optional) Review any failures that were returned
if failures:
    print(f"There were {len(failures)} failures. Sample: {failures[0]}")
```
You can see the extracted text that represents the content of the ingested test document.
@@ -284,6 +289,14 @@ So, according to this whimsical analysis, both the **Giraffe** and the **Cat** a
>
> Please also check out our [demo using a retrieval pipeline on build.nvidia.com](https://build.nvidia.com/nvidia/multimodal-pdf-data-extraction-for-enterprise-rag) to query over document content pre-extracted with NVIDIA Ingest.

> [!IMPORTANT]
> About `return_failures`
>
> - `ingestor.ingest(..., return_failures=False)` (default): returns only successful results. Any failed jobs are omitted from the return value and are logged. If you have configured `.vdb_upload(...)` and any failures are present, `ingest()` will raise a `RuntimeError` and will NOT upload. This preserves the previous all-or-nothing bulk upload behavior.
> - `ingestor.ingest(..., return_failures=True)`: returns a tuple `(results, failures)`. If `.vdb_upload(...)` is configured and some jobs failed, `ingest()` will still proceed to upload only the successful results to your vector database and will not raise. You can then inspect `failures` to decide whether to retry or remediate.
>
> This makes `return_failures=True` ideal for large batches where you want successful chunks/pages to be committed while still collecting detailed diagnostics for anything that didn’t complete.
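
The two modes described in the note can be illustrated with a small stand-in. This is a toy model of the documented semantics only, not the nv-ingest implementation: `simulate_ingest`, its parameters, and the `(source, error message)` failure shape are all hypothetical names chosen for this sketch.

```python
def simulate_ingest(jobs, upload_configured, return_failures=False):
    """Toy model (NOT the nv-ingest code) of the return_failures semantics.

    `jobs` maps a hypothetical source name to either a result payload or an
    Exception instance standing in for a failed job. Returns a pair
    (return_value, uploaded), where `uploaded` lists what a configured
    upload step would have sent to the vector database.
    """
    results = [r for r in jobs.values() if not isinstance(r, Exception)]
    failures = [(src, str(r)) for src, r in jobs.items() if isinstance(r, Exception)]

    uploaded = []
    if upload_configured:
        # Default mode: any failure aborts the upload entirely (all-or-nothing).
        if failures and not return_failures:
            raise RuntimeError(f"{len(failures)} job(s) failed; nothing was uploaded")
        # Otherwise only the successful results are uploaded.
        uploaded = list(results)

    if return_failures:
        return (results, failures), uploaded
    return results, uploaded


jobs = {"a.pdf": "text of a", "b.pdf": ValueError("corrupt file")}

# return_failures=True: the successful result is uploaded, the failure is reported.
(results, failures), uploaded = simulate_ingest(
    jobs, upload_configured=True, return_failures=True
)
print(results, failures, uploaded)

# return_failures=False (default): the same batch raises and uploads nothing.
try:
    simulate_ingest(jobs, upload_configured=True)
except RuntimeError as exc:
    print(f"RuntimeError: {exc}")
```

With the real client, the equivalent decision is made inside `ingest()`; the sketch only makes the branching explicit.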
docs/docs/extraction/data-store.md
19 additions & 0 deletions
@@ -21,6 +21,25 @@ It does not store the embeddings for images.
NeMo Retriever extraction supports uploading data by using the [Ingestor.vdb_upload API](nv-ingest-python-api.md).
Currently, data upload is not supported through the [NV Ingest CLI](nv-ingest_cli.md).

### Partial Failures and Upload Semantics

When chaining `.vdb_upload(...)` on an `Ingestor`, upload behavior depends on the `return_failures` flag passed to `ingest()`:

- `return_failures=False` (default): If any ingestion jobs fail, `ingest()` raises a `RuntimeError` and no upload occurs (all-or-nothing).
- `return_failures=True`: `ingest()` returns `(results, failures)` and uploads only the successful results; it does not raise. Inspect `failures` and retry as needed.
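
One way to act on the returned `failures` is to resubmit only the sources that failed. The sketch below is a generic pattern, not part of the nv-ingest API: `retry_failed` and `ingest_fn` are hypothetical, and it assumes each failure is a `(source, error message)` pair, which should be checked against the real return value.

```python
from typing import Callable, List, Tuple

Failure = Tuple[str, str]  # ASSUMED shape: (source identifier, error message)


def retry_failed(
    ingest_fn: Callable[[List[str]], Tuple[list, List[Failure]]],
    sources: List[str],
    max_attempts: int = 3,
) -> Tuple[list, List[Failure]]:
    """Call `ingest_fn` up to `max_attempts` times (expects max_attempts >= 1),
    resubmitting only the sources that failed on the previous attempt.

    `ingest_fn` is a hypothetical wrapper that ingests a list of sources
    with return_failures=True and returns (results, failures).
    """
    all_results: list = []
    failures: List[Failure] = []
    pending = list(sources)
    for _ in range(max_attempts):
        if not pending:
            break
        results, failures = ingest_fn(pending)
        all_results.extend(results)
        # Keep only the sources that still need another attempt.
        pending = [source for source, _error in failures]
    return all_results, failures
```

Because successful results are uploaded on every attempt in this mode, each retry only needs to cover the remaining sources; whatever is left in `failures` after the last attempt needs manual remediation.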

If you chain `.vdb_upload(...)` on the `Ingestor`, uploads are performed after ingestion completes. Behavior depends on `return_failures`:

- `return_failures=False` (default): If any jobs fail, `ingest()` raises a `RuntimeError` and does not upload (all-or-nothing).
- `return_failures=True`: `ingest()` returns `(results, failures)` and uploads only the successful results; it does not raise. You can inspect `failures` and retry selectively.
print(f"There were {len(failures)} failures. Sample: {failures[0]}")
```

!!! note

    To use library mode with nemoretriever_parse, uncomment `extract_method="nemoretriever_parse"` in the previous code. For more information, refer to [Use Nemo Retriever Extraction with nemoretriever-parse](nemoretriever-parse.md).

!!! important "About return_failures and vdb_upload"

    - `ingestor.ingest(..., return_failures=False)` (default): returns only successful results. If `.vdb_upload(...)` is configured and any jobs fail, `ingest()` raises `RuntimeError` and does not upload (all-or-nothing).
    - `ingestor.ingest(..., return_failures=True)`: returns `(results, failures)`. If `.vdb_upload(...)` is configured and some jobs fail, `ingest()` uploads only the successful results and does not raise; inspect `failures` for remediation.
print(f"There were {len(failures)} failures. Sample: {failures[0]}")
```

!!! note

    To use library mode with nemoretriever_parse, uncomment `extract_method="nemoretriever_parse"` in the previous code. For more information, refer to [Use Nemo Retriever Extraction with nemoretriever-parse](nemoretriever-parse.md).

!!! important "About return_failures and vdb_upload"

    - `ingestor.ingest(..., return_failures=False)` (default): returns only successful results. If `.vdb_upload(...)` is configured and any jobs fail, `ingest()` raises `RuntimeError` and does not upload (all-or-nothing).
    - `ingestor.ingest(..., return_failures=True)`: returns `(results, failures)`. If `.vdb_upload(...)` is configured and some jobs fail, `ingest()` uploads only the successful results and does not raise; inspect `failures` for remediation.

You can see the extracted text that represents the content of the ingested test document.