diff --git a/docs/en/notes/api/operators/pdf2vqa/generate/LLMOutputParser.md b/docs/en/notes/api/operators/pdf2vqa/generate/LLMOutputParser.md
index 249001a00..68662c931 100644
--- a/docs/en/notes/api/operators/pdf2vqa/generate/LLMOutputParser.md
+++ b/docs/en/notes/api/operators/pdf2vqa/generate/LLMOutputParser.md
@@ -1,7 +1,7 @@
---
title: LLMOutputParser
createTime: 2026/01/20 20:15:00
-permalink: /en/api/operators/core_text/parse/llmoutputparser/
+permalink: /en/api/operators/pdf2vqa/generate/llmoutputparser/
---
## 📘 Overview
@@ -16,8 +16,7 @@ The core functionalities of this operator include:
## `__init__` Function
```python
-def __init__(self,
- mode: Literal['question', 'answer'],
+def __init__(self,
output_dir: str,
intermediate_dir: str = "intermediate"
)
@@ -28,7 +27,6 @@ def __init__(self,
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
-| **mode** | str | Required | Parsing mode. Options are `'question'` or `'answer'`, which affects the output filename and the image subdirectory name. |
| **output_dir** | str | Required | The final root directory for structured data and images. |
| **intermediate_dir** | str | "intermediate" | The intermediate directory where original image resources processed by MinerU are located. |
@@ -76,7 +74,7 @@ Suppose the LLM returns: `1, 3`
The operator looks up entries with `id` 1 and 3 in the layout JSON:
* If `id: 1` is the text "What is AI?" and `id: 3` is the image `path/to/img.png`.
-* The restored content will be: `What is AI?\n`.
+* The restored content will be: `What is AI?\n`.
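The lookup can be sketched in Python as follows. This is a minimal illustration, not the operator's actual code. The layout entries and their field names (`id`, `type`, `text`, `img_path`) are assumed, `restore_content` is a hypothetical helper, and rendering images as Markdown image links is an assumption about the output format.

```python
import re

# Hypothetical re-indexed layout entries; field names are assumptions.
LAYOUT = [
    {"id": 1, "type": "text", "text": "What is AI?"},
    {"id": 2, "type": "text", "text": "An unrelated paragraph."},
    {"id": 3, "type": "image", "img_path": "path/to/img.png"},
]


def restore_content(llm_ids: str, layout: list) -> str:
    """Rebuild content from a comma-separated ID string such as "1, 3"."""
    by_id = {item["id"]: item for item in layout}
    parts = []
    for token in re.split(r"\s*,\s*", llm_ids.strip()):
        if not token.isdigit():
            continue  # skip empty or malformed tokens
        item = by_id.get(int(token))
        if item is None:
            continue  # ignore IDs that do not exist in the layout
        if item["type"] == "image":
            parts.append(f"![]({item['img_path']})")  # assumed image format
        else:
            parts.append(item["text"])
    return "\n".join(parts)


print(restore_content("1, 3", LAYOUT))
```

Because the LLM only returns IDs, the original text and image paths are copied from the layout verbatim instead of being regenerated, so the model cannot alter them.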
### 2. Output File Structure
@@ -86,7 +84,7 @@ After execution, the directory structure under `output_dir` (referenced as `cach
output_dir/
└── {name}/
├── extracted_questions.jsonl # Structured data
- └── question_images/ # Automatically synchronized images
+ └── vqa_images/ # Automatically synchronized images
├── img1.png
└── ...
@@ -96,7 +94,7 @@ output_dir/
```json
{
- "question": "Please analyze the image below:\n",
+ "question": "Please analyze the image below:\n",
"answer": "This is the parsed answer text.",
"solution": "Detailed step-by-step solution...",
"label": "1",
diff --git a/docs/en/notes/api/operators/pdf2vqa/generate/MineruToLLMInputOperator.md b/docs/en/notes/api/operators/pdf2vqa/generate/MineruToLLMInputOperator.md
index 6d056fe5a..523dbafee 100644
--- a/docs/en/notes/api/operators/pdf2vqa/generate/MineruToLLMInputOperator.md
+++ b/docs/en/notes/api/operators/pdf2vqa/generate/MineruToLLMInputOperator.md
@@ -1,7 +1,7 @@
---
title: MinerU2LLMInputOperator
createTime: 2026/01/20 20:10:00
-permalink: /en/api/operators/core_text/convert/mineru2llminputoperator/
+permalink: /en/api/operators/pdf2vqa/generate/mineru2llminputoperator/
---
## 📘 Overview
diff --git a/docs/en/notes/api/operators/pdf2vqa/generate/QAMerger.md b/docs/en/notes/api/operators/pdf2vqa/generate/QAMerger.md
index 3b7a70e01..6dfadb0d4 100644
--- a/docs/en/notes/api/operators/pdf2vqa/generate/QAMerger.md
+++ b/docs/en/notes/api/operators/pdf2vqa/generate/QAMerger.md
@@ -1,7 +1,7 @@
---
title: QA_Merger
createTime: 2026/01/20 20:25:00
-permalink: /en/api/operators/core_text/merge/qamerger/
+permalink: /en/api/operators/pdf2vqa/generate/qamerger/
---
## 📘 Overview
diff --git a/docs/en/notes/guide/quickstart/PDFVQAExtract.md b/docs/en/notes/guide/quickstart/PDFVQAExtract.md
index cedfea0f7..c80527992 100644
--- a/docs/en/notes/guide/quickstart/PDFVQAExtract.md
+++ b/docs/en/notes/guide/quickstart/PDFVQAExtract.md
@@ -22,7 +22,7 @@ Major stages:
## 2. Quick Start
-### Step 1: Install Dataflow (and MinerU)
+### Step 1: Install Dataflow
Install Dataflow:
```shell
pip install "open-dataflow[pdf2vqa]"
@@ -35,12 +35,6 @@ cd Dataflow
pip install -e ".[pdf2vqa]"
```
-Then install MinerU and download models:
-```shell
-pip install "mineru[vllm]>=2.5.0,<2.7.0"
-mineru-models-download
-```
-
### Step 2: Create a workspace
```shell
cd /your/working/directory
@@ -55,13 +49,18 @@ dataflow init
You can then add your pipeline script under `pipelines/` or any custom path.
### Step 4: Configure API credentials
+`DF_API_KEY` is used to call the LLM API, and `MINERU_API_KEY` is used to call MinerU for layout analysis.
+Get `MINERU_API_KEY` from https://mineru.net/apiManage/token and `DF_API_KEY` from your LLM provider (e.g., OpenAI or Google Gemini). Set both as environment variables:
+
Linux / macOS:
```shell
export DF_API_KEY="sk-xxxxx"
+export MINERU_API_KEY="sk2-xxxxx"
```
Windows PowerShell:
```powershell
$env:DF_API_KEY = "sk-xxxxx"
+$env:MINERU_API_KEY = "sk2-xxxxx"
```
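A quick pre-flight check can catch a missing key before the pipeline starts. The sketch below is a hypothetical helper, not part of Dataflow; it only checks that the two variables named above are set and non-empty.

```python
import os
from typing import List, Mapping, Optional

# Variable names come from the guide; the helper itself is hypothetical.
REQUIRED_KEYS = ("DF_API_KEY", "MINERU_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required API key names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```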
In the pipeline script, set your API endpoint:
```python
@@ -72,12 +71,7 @@ self.llm_serving = APILLMServing_request(
max_workers=100,
)
```
-and set MinerU backend ('vlm-vllm-engine' or 'vlm-transformers') and LLM max token length (recommended not to exceed 128000 to avoid LLM forgetting details).
-**Caution: The pipeline was only tested with the `vlm` backend; compatibility with the `pipeline` backend is uncertain due to format differences. Using the `vlm` backend is recommended.**
-The `vlm-vllm-engine` backend requires GPU support.
-```python
-self.mineru_executor = FileOrURLToMarkdownConverterBatch(intermediate_dir = "intermediate", mineru_backend="vlm-vllm-engine")
-```
+and set the LLM maximum token length (we recommend at most 128000 tokens; with longer inputs the LLM is more likely to miss details).
```python
self.vqa_extractor = ChunkedPromptedGenerator(
@@ -97,16 +91,12 @@ You can also import the operators into other workflows; the remainder of this do
### 1. Input data
-Each job is defined by a JSONL row. Two modes are supported:
+Each job is defined by one JSONL row. `input_pdf_paths` is either a single PDF path or a list of PDF paths; in a list, put question PDFs before answer PDFs. `name` identifies the job. Questions and answers can be interleaved or separated, and can come from the same PDF or from different PDFs.
-- **QA-Separated PDFs**
- ```jsonl
- {"question_pdf_path": "/abs/path/questions.pdf", "answer_pdf_path": "/abs/path/answers.pdf", "subject": "math", "output_dir": "./output/math"}
- ```
-- **QA-Interleaved PDFs**
- ```jsonl
- {"question_pdf_path": "/abs/path/qa.pdf", "answer_pdf_path": "/abs/path/qa.pdf", "name": "math2"}
- ```
+```jsonl
+{"input_pdf_paths": "./example_data/PDF2VQAPipeline/questionextract_test.pdf", "name": "math1"}
+{"input_pdf_paths": ["./example_data/PDF2VQAPipeline/math_question.pdf", "./example_data/PDF2VQAPipeline/math_answer.pdf"], "name": "math2"}
+```
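Because `input_pdf_paths` accepts either a string or a list, code that prepares or inspects job files may want to normalize it first. The following is a minimal sketch that assumes only the two fields shown above; `load_jobs` is a hypothetical helper and does not reproduce how `FileStorage` actually reads the file.

```python
import json

SAMPLE = """\
{"input_pdf_paths": "./example_data/PDF2VQAPipeline/questionextract_test.pdf", "name": "math1"}
{"input_pdf_paths": ["./example_data/PDF2VQAPipeline/math_question.pdf", "./example_data/PDF2VQAPipeline/math_answer.pdf"], "name": "math2"}
"""


def load_jobs(jsonl_text: str) -> list:
    """Parse job rows and normalize `input_pdf_paths` to a list of paths."""
    jobs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        paths = row["input_pdf_paths"]
        if isinstance(paths, str):
            paths = [paths]  # a single path becomes a one-element list
        # For multi-PDF jobs, order matters: question PDFs come first.
        row["input_pdf_paths"] = list(paths)
        jobs.append(row)
    return jobs


for job in load_jobs(SAMPLE):
    print(job["name"], job["input_pdf_paths"])
```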
`FileStorage` handles batching/cache management:
```python
@@ -120,15 +110,48 @@ self.storage = FileStorage(
### 2. Document layout extraction (MinerU)
-For each PDF (question, answer, or mixed), the pipeline calls `_parse_file_with_mineru` inside `FileOrURLToMarkdownConverterBatch`. MinerU outputs:
+For each PDF (question, answer, or mixed), the pipeline calls `_parse_file_with_mineru` inside `FileOrURLToMarkdownConverterAPI`. MinerU outputs:
-- `//_content_list.json`: structured layout tokens (texts, figures, tables, IDs)
-- `//images/`: cropped page images
+- `*_content_list.json`: structured layout tokens (texts, figures, tables, IDs)
+- `images/`: cropped page images
-The backend can be:
+---
+**Note**:
+To use a locally deployed MinerU model instead, replace the operator with `FileOrURLToMarkdownConverterLocal` (the original implementation from opendatalab) or `FileOrURLToMarkdownConverterFlash` (our accelerated version), and provide the corresponding model path and deployment parameters.
-- `vlm-transformers`: CPU/GPU compatible
-- `vlm-vllm-engine`: high-throughput GPU mode (requires CUDA)
+For example:
+
+```python
+self.mineru_executor = FileOrURLToMarkdownConverterAPI(intermediate_dir="intermediate")
+```
+
+can be replaced with
+
+```python
+self.mineru_executor = FileOrURLToMarkdownConverterLocal(
+    intermediate_dir="intermediate",
+    mineru_model_path="path/to/mineru/model",
+)
+```
+
+or
+
+```python
+self.mineru_executor = FileOrURLToMarkdownConverterFlash(
+    intermediate_dir="intermediate",
+    mineru_model_path="path/to/mineru/model",
+    batch_size=4,
+    replicas=1,
+    num_gpus_per_replica=1,
+    engine_gpu_util_rate_to_ray_cap=0.9,
+```
+
+See https://github.com/OpenDCAI/DataFlow/blob/main/dataflow/operators/knowledge_cleaning/generate/mineru_operators.py for the full list of parameters and usage details.
+
+---
+
+Afterwards, the `MinerU2LLMInputOperator` flattens list items and re-indexes them to create LLM-friendly input.
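The flattening step might look roughly like the sketch below. It is hypothetical: the shape of a list block (`{"type": "list", "list_items": [...]}`), the extra fields such as `page_idx`, and the zero-based IDs are all assumptions, and the real `MinerU2LLMInputOperator` may differ.

```python
from typing import Any, Dict, List


def flatten_and_reindex(content_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Split list blocks into one entry per item, then assign sequential IDs."""
    flat: List[Dict[str, Any]] = []
    for block in content_list:
        if block.get("type") == "list":
            # Assumed shape: {"type": "list", "list_items": ["a", "b", ...], ...}
            extra = {k: v for k, v in block.items() if k not in ("type", "list_items")}
            for item_text in block.get("list_items", []):
                flat.append({**extra, "type": "text", "text": item_text})
        else:
            flat.append(dict(block))  # copy so the input is left untouched
    for new_id, entry in enumerate(flat):
        entry["id"] = new_id  # zero-based here; the real start value may differ
    return flat


sample = [
    {"type": "text", "text": "1. Which option is correct?", "page_idx": 0},
    {"type": "list", "list_items": ["A. 1", "B. 2"], "page_idx": 0},
    {"type": "image", "img_path": "images/fig1.jpg", "page_idx": 1},
]
for entry in flatten_and_reindex(sample):
    print(entry)
```

Giving every list item its own ID lets the LLM cite individual options rather than a whole list block.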
### 3. QA extraction (VQAExtractor)
@@ -136,7 +159,7 @@ The backend can be:
- Grouping and pairing questions with answers, and inserting images at the proper positions.
- Supports both QA-separated and QA-interleaved PDFs.
-- Copies rendered images into `output_dir/question_images` and/or `answer_images`.
+- Copies rendered images into `cache_path/name/vqa_images`.
- Parses ``, ``, ``, ``, ``, `