7 changes: 5 additions & 2 deletions docs/docs/extraction/chunking.md
@@ -48,13 +48,16 @@ The `split` task uses a tokenizer to count the number of tokens in the document,
and splits the document based on the desired maximum chunk size and chunk overlap.
We recommend that you use the `meta-llama/Llama-3.2-1B` tokenizer,
because it's the same tokenizer as the llama-3.2 embedding model that we use for embedding.
However, you can use any tokenizer from any HuggingFace model that includes a tokenizer file.

You can use any tokenizer from a Hugging Face model that provides a tokenizer file. Tokenizers run locally and can be downloaded directly from the [Hugging Face Hub](https://huggingface.co/models).

Use the `split` method to chunk large documents as shown in the following code.

!!! note

The default tokenizer (`meta-llama/Llama-3.2-1B`) requires a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens). You must set `hf_access_token": "hf_***` to authenticate.
The default tokenizer (`meta-llama/Llama-3.2-1B`) runs locally and requires a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens) for authentication. Set `"hf_access_token": "hf_***"` to provide your token.



```python
ingestor = ingestor.split(
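The token-window strategy that the `split` task applies can be sketched in plain Python. This is an illustration only: a whitespace split stands in for the Hugging Face tokenizer (such as `meta-llama/Llama-3.2-1B`), and the window sizes are examples, not library defaults.

```python
def chunk_tokens(text, chunk_size=8, chunk_overlap=2):
    """Split text into overlapping windows of at most chunk_size tokens.

    A whitespace split stands in for tokenizer.encode/decode; the real
    `split` task counts tokens with a Hugging Face tokenizer instead.
    """
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")
    tokens = text.split()  # stand-in for tokenizer.encode(text)
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))  # stand-in for tokenizer.decode
        if start + chunk_size >= len(tokens):
            break
    return chunks

chunks = chunk_tokens(
    "one two three four five six seven eight nine ten",
    chunk_size=4,
    chunk_overlap=1,
)
```

Note how each window re-uses the last `chunk_overlap` tokens of the previous one, so no sentence boundary is lost entirely between chunks.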
4 changes: 2 additions & 2 deletions docs/docs/extraction/overview.md
@@ -4,9 +4,9 @@ NeMo Retriever Library is a scalable, performance-oriented document content and
NeMo Retriever Library uses specialized NVIDIA NIM microservices
to find, contextualize, and extract text, tables, charts and infographics that you can use in downstream generative applications.

!!! note
!!! tip "Get Started Recommendation"

This library is the NeMo Retriever Library.
**[Deploy without containers (Library Mode)](quickstart-library-mode.md)** is the recommended approach for workloads with fewer than 100 PDFs. It’s best suited for local development, experimentation, and small-scale ingestion.

NeMo Retriever Library enables parallelization of splitting documents into pages where artifacts are classified (such as text, tables, charts, and infographics), extracted, and further contextualized through optical character recognition (OCR) into a well defined JSON schema.
From there, NeMo Retriever Library can optionally manage computation of embeddings for the extracted content,
2 changes: 1 addition & 1 deletion docs/docs/extraction/python-api-reference.md
@@ -459,7 +459,7 @@ ingestor = ingestor.embed()

!!! note

By default, `embed` uses the [llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2) model.
By default, `embed` uses the [llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2) model. Embedding supports **hosted NIM** (default), **local Hugging Face** models, or a **self-hosted** endpoint.

To use a different embedding model, such as [nv-embedqa-e5-v5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5), specify a different `model_name` and `endpoint_url`.
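The selection reduces to the two parameters named above, `model_name` and `endpoint_url`. As a hedged sketch (the helper below is illustrative only, not part of the library API, and the URL is a placeholder), the choice between the hosted default, a self-hosted NIM, and an alternative model might be assembled like this:

```python
# Default embedding model from the docs above; hosted NIM is used when
# no endpoint_url is supplied.
DEFAULT_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

def embed_config(model_name=None, endpoint_url=None):
    """Build keyword arguments for an `ingestor.embed(...)` call.

    Illustrative helper, not a library function: omitting both arguments
    selects the hosted default; overriding them targets a different model
    or a self-hosted endpoint.
    """
    return {
        "model_name": model_name or DEFAULT_MODEL,
        "endpoint_url": endpoint_url,  # None -> hosted NIM on build.nvidia.com
    }

# Self-hosted alternative model (URL is a placeholder assumption):
cfg = embed_config("nvidia/nv-embedqa-e5-v5", "http://localhost:8012/v1")
```

The resulting dictionary could then be splatted into the call, for example `ingestor.embed(**cfg)`.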

2 changes: 1 addition & 1 deletion docs/docs/extraction/quickstart-library-mode.md
@@ -15,7 +15,7 @@ In addition, you can use library mode, which is intended for the following cases

By default, library mode depends on NIMs that are hosted on build.nvidia.com.
In library mode you launch the main pipeline service directly within a Python process,
while all other services (such as embedding and storage) are hosted remotely in the cloud.
while embedding and reranking use hosted NIMs by default; you can also use local Hugging Face models or self-hosted endpoints by configuring custom NIM endpoints (see the [FAQ](faq.md)).

To get started using library mode, you need the following:

4 changes: 2 additions & 2 deletions docs/docs/extraction/support-matrix.md
@@ -12,7 +12,7 @@ Before you begin using [NeMo Retriever Library](overview.md), ensure that you ha
The NeMo Retriever Library core pipeline features run on a single A10G or better GPU.
The core pipeline features include the following:

- llama3.2-nv-embedqa-1b-v2 — Embedding model for converting text chunks into vectors.
- llama3.2-nv-embedqa-1b-v2 — Embedding model for converting text chunks into vectors. Embedding is available as **hosted NIM**, **local Hugging Face**, or **self-hosted**.
- nemoretriever-page-elements-v3 — Detects and classifies images on a page as a table, chart or infographic.
- nemoretriever-table-structure-v1 — Detects rows, columns, and cells within a table to preserve table structure and convert to Markdown format.
- nemoretriever-graphic-elements-v1 — Detects graphic elements within chart images such as titles, legends, axes, and numerical values.
@@ -30,7 +30,7 @@ This includes the following:

While nemotron-nano-12b-v2-vl is the default VLM, you can configure and use other vision language models for image captioning based on your specific use case requirements. For more information, refer to [Extract Captions from Images](python-api-reference.md#extract-captions-from-images).

- Reranker — Use [llama-3.2-nv-rerankqa-1b-v2](https://build.nvidia.com/nvidia/llama-3.2-nv-rerankqa-1b-v2) for improved retrieval accuracy.
- Reranker — Use [llama-3.2-nv-rerankqa-1b-v2](https://build.nvidia.com/nvidia/llama-3.2-nv-rerankqa-1b-v2) for improved retrieval accuracy. Reranking is available as **hosted NIM**, **local Hugging Face**, or **self-hosted**.
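What the reranking stage contributes can be illustrated with a toy sketch: re-order retrieved chunks by a query-relevance score. Here token overlap stands in for the cross-encoder score that `llama-3.2-nv-rerankqa-1b-v2` would produce; this is not the NIM API.

```python
def rerank(query, chunks):
    """Order chunks by a stand-in relevance score (shared-token count).

    A real reranker NIM scores each (query, chunk) pair with a
    cross-encoder; only the sort-by-score step is the same here.
    """
    query_tokens = set(query.lower().split())

    def score(chunk):
        return len(query_tokens & set(chunk.lower().split()))

    return sorted(chunks, key=score, reverse=True)

ranked = rerank(
    "gpu memory limits",
    ["cpu cache sizes", "gpu memory limits explained", "memory limits on gpu nodes"],
)
```

After reranking, the most query-relevant chunks sit at the top of the list, which is what improves downstream retrieval accuracy.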



2 changes: 1 addition & 1 deletion docs/docs/extraction/vlm-embed.md
@@ -1,6 +1,6 @@
# Use Multimodal Embedding with NeMo Retriever Library

This guide explains how to use the [NeMo Retriever Library](https://www.perplexity.ai/search/overview.md) with the multimodal embedding model [Llama Nemotron Embed VL 1B v2](https://build.nvidia.com/nvidia/llama-nemotron-embed-vl-1b-v2).
This guide explains how to use the [NeMo Retriever Library](overview.md) with the multimodal embedding model [Llama Nemotron Embed VL 1B v2](https://build.nvidia.com/nvidia/llama-nemotron-embed-vl-1b-v2). This page covers self-hosted deployment of the multimodal embedding NIM; text embedding and reranking also support hosted NIMs and local Hugging Face models.

The `Llama Nemotron Embed VL 1B v2` model is optimized for multimodal question-answering and retrieval tasks.
It can embed documents as text, images, or paired text-image combinations.
32 changes: 20 additions & 12 deletions docs/docs/index.md
@@ -1,13 +1,13 @@
# What is NVIDIA NeMo Retriever?
# What is NVIDIA NeMo Retriever Library?

NVIDIA NeMo Retriever is a collection of microservices
NVIDIA NeMo Retriever Library is a collection of microservices
for building and scaling multimodal data extraction, embedding, and reranking pipelines
with high accuracy and maximum data privacy – built with NVIDIA NIM.
NeMo Retriever, part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for managing the AI agent lifecycle,
NeMo Retriever Library, part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for managing the AI agent lifecycle,
ensures data privacy and seamlessly connects to proprietary data wherever it resides,
empowering secure, enterprise-grade retrieval.

NeMo Retriever provides the following:
NeMo Retriever Library provides the following:

- **Multimodal Data Extraction** — Quickly extract documents at scale that include text, tables, charts, and infographics.
- **Embedding + Indexing** — Embed all extracted text from text chunks and images, and then insert into LanceDB (default) or Milvus — accelerated with NVIDIA cuVS.
@@ -17,20 +17,26 @@ NeMo Retriever provides the following:
![Overview diagram](extraction/images/overview-retriever.png)


## Get Started

**[Deploy without containers (Library Mode)](extraction/quickstart-library-mode.md)** is the primary, recommended path for workloads under 100 PDFs. Use it for local development, experimentation, and small-scale ingestion.


## Enterprise-Ready Features

NVIDIA NeMo Retriever comes with enterprise-ready features, including the following:
NVIDIA NeMo Retriever Library comes with enterprise-ready features, including the following:

- **High Accuracy** — NeMo Retriever exhibits a high level of accuracy when retrieving across various modalities through enterprise documents.
- **High Throughput** — NeMo Retriever is capable of extracting, embedding, indexing and retrieving across hundreds of thousands of documents at scale with high throughput.
- **Decomposable/Customizable** — NeMo Retriever consists of modules that can be separately used and deployed in your own environment.
- **Enterprise-Grade Security** — NeMo Retriever NIMs come with security features such as the use of [safetensors](https://huggingface.co/docs/safetensors/index), continuous patching of CVEs, and more.
- **[World-class performance](extraction/benchmarking.md)** — See Benchmarks & Comparison for throughput and recall metrics.
- **High Accuracy** — NeMo Retriever Library exhibits a high level of accuracy when retrieving across various modalities through enterprise documents.
- **High Throughput** — NeMo Retriever Library is capable of extracting, embedding, indexing, and retrieving across hundreds of thousands of documents at scale with high throughput.
- **Decomposable/Customizable** — NeMo Retriever Library consists of modules that can be separately used and deployed in your own environment.
- **Enterprise-Grade Security** — NeMo Retriever Library NIMs come with security features such as the use of [safetensors](https://huggingface.co/docs/safetensors/index), continuous patching of CVEs, and more.



## Applications

The following are some applications that use NVIDIA NeMo Retriever:
The following are some applications that use NVIDIA NeMo Retriever Library:

- [AI Virtual Assistant for Customer Service](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant) (NVIDIA AI Blueprint)
- [Build an Enterprise RAG pipeline](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline/blueprintcard) (NVIDIA AI Blueprint)
@@ -43,7 +49,9 @@ The following are some applications that use NVIDIA NeMo Retriever:

## Related Topics

- [NeMo Retriever Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html)
- [NeMo Retriever Text Reranking NIM](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html)
Embedding and reranking support **hosted NIMs**, **local Hugging Face** models, and **self-hosted** deployment:

- [NeMo Retriever Library Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html) (hosted NIM, local HF, self-hosted)
- [NeMo Retriever Library Text Reranking NIM](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html) (hosted NIM, local HF, self-hosted)
- [NVIDIA NIM for Object Detection](https://docs.nvidia.com/nim/ingestion/object-detection/latest/overview.html)
- [NVIDIA NIM for Image OCR](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/overview.html)
7 changes: 4 additions & 3 deletions docs/mkdocs.yml
@@ -1,4 +1,4 @@
site_name: NeMo Retriever Documentation
site_name: NeMo Retriever Library Documentation
site_url: https://docs.nvidia.com/nemo/retriever/

repo_name: NVIDIA/nv-ingest
@@ -55,12 +55,13 @@ extra_css:


nav:
- NeMo Retriever:
- NeMo Retriever Library:
- Overview:
- Overview: index.md
- NeMo Retriever Extraction:
- Overview: extraction/overview.md
- Release Notes: extraction/releasenotes-nv-ingest.md
# Get Started CTA points to Library Mode QuickStart; Library Mode is the primary path for workloads <100 PDFs.
- Get Started:
- Prerequisites: extraction/prerequisites.md
- Support Matrix: extraction/support-matrix.md
@@ -85,7 +86,7 @@ nav:
- NimClient Usage: extraction/nimclient.md
- Resource Scaling Modes: extraction/scaling-modes.md
- Performance:
- Benchmarking: extraction/benchmarking.md
- Benchmarks & Comparison: extraction/benchmarking.md
- Telemetry: extraction/telemetry.md
- Throughput Is Dataset-Dependent: extraction/throughput-is-dataset-dependent.md
- Reference: