
Sync MMKG pipeline docs with actual operator behavior #32

Open
W-RMSL wants to merge 1 commit into main from docs/mmkg-pipeline-update

Conversation


@W-RMSL W-RMSL commented May 11, 2026

No description provided.

Copilot AI review requested due to automatic review settings May 11, 2026 06:44

Copilot AI left a comment


Pull request overview

This PR updates the Multimodal KG pipeline guide (ZH/EN) to better reflect the current MMKG pipeline’s expected inputs/outputs and example data, including sample image paths and updated visual-triple examples.

Changes:

  • Add sample image references and update img_dict guidance for local image paths.
  • Update the documented visual triple (vis_triple) format and example IO payloads.
  • Document vis_url alongside the input schema and update example JSON accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File | Description
docs/zh/notes/kg_guide/kg_pipelines_by_types/multimodal_kg_pipeline.md | Updates the ZH multimodal pipeline guide’s sample assets, input schema, and visual triple examples.
docs/en/notes/kg_guide/kg_pipelines_by_types/multimodal_kg_pipeline.md | Updates the EN multimodal pipeline guide’s sample assets, input schema, and visual triple examples.


Comment on lines 74 to 79
This pipeline requires at least the following fields:

- **raw_chunk**: source text for entity and textual triple extraction
- **img_dict**: an image dictionary where keys are image IDs and values are local image paths
- **img_dict**: an image dictionary where keys are image IDs (free-form strings that appear in `vis_triple`) and values are local image paths
- **vis_url**: a list of image paths the VLM opens during step 5 QA generation. Its order must match the order in which image IDs first appear in `img_dict`.
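The documented input schema can be sketched as a minimal payload. The field names come from the doc; the chunk text is illustrative, and the image IDs and paths reuse the sample assets mentioned in this PR:

```python
# Minimal input record matching the documented schema.
# raw_chunk text is a placeholder; image IDs/paths follow the PR's examples.
record = {
    "raw_chunk": "Elon Musk unveiled the Cybertruck on stage.",
    "img_dict": {
        "img_cybertruck": "example_data/MultimodalKGPipeline/images/cyber.jpg",
        "img_musk_stage": "example_data/MultimodalKGPipeline/images/musk.jpg",
    },
    "vis_url": [
        "example_data/MultimodalKGPipeline/images/cyber.jpg",
        "example_data/MultimodalKGPipeline/images/musk.jpg",
    ],
}

# The doc requires vis_url to follow the order in which image IDs first
# appear in img_dict; since Python dicts preserve insertion order, that
# constraint can be checked directly:
assert record["vis_url"] == list(record["img_dict"].values())
```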


- combine images and candidate entities to extract visual facts
- typically output triples in the form `"<subj> entity <rel> depicted_in <obj> image_id"`
- output triples in the form `"<subj> entity <obj> image_id <rel> depicted_in "`
Comment on lines +174 to +176
"<subj> Cybertruck <obj> img_cybertruck <rel> depicted_in ",
"<subj> Elon Musk <obj> img_musk_stage <rel> depicted_in ",
"<subj> Cybertruck <obj> img_musk_stage <rel> depicted_in "
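A triple in the documented `"<subj> entity <obj> image_id <rel> depicted_in "` format can be split back into its components with a small helper. This parser is a sketch based on the format shown above, not part of the pipeline's own code:

```python
import re

# Matches the documented visual-triple string layout, tolerating the
# trailing space after "depicted_in" seen in the examples.
VIS_TRIPLE_RE = re.compile(r"<subj>\s*(.*?)\s*<obj>\s*(.*?)\s*<rel>\s*(\S+)")

def parse_vis_triple(triple: str) -> tuple[str, str, str]:
    """Return (subject, image_id, relation) from a vis_triple string."""
    m = VIS_TRIPLE_RE.match(triple)
    if m is None:
        raise ValueError(f"not a visual triple: {triple!r}")
    return m.groups()

print(parse_vis_triple("<subj> Cybertruck <obj> img_cybertruck <rel> depicted_in "))
# → ('Cybertruck', 'img_cybertruck', 'depicted_in')
```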
- Sample images: `example_data/MultimodalKGPipeline/images/cyber.jpg`, `example_data/MultimodalKGPipeline/images/musk.jpg`

For real image-text workloads, values in `img_dict` must be valid local image paths. The default JSON file is provided as a runnable input structure.
Values in `img_dict` are the actual paths the VLM serving layer opens. Only **local paths** are supported today: the serving layer calls `open(path, "rb")` and base64-encodes the bytes into the request, and it does not fetch remote URLs. The default data ships with runnable example images, and the paths are written relative to the `api_pipelines/` directory (the CWD when you run `python multimodal_kg_pipeline.py`).
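The local-path behavior described above can be sketched as follows. This is a minimal reimplementation of what such a serving layer typically does, not the project's actual code:

```python
import base64
import mimetypes

def encode_local_image(path: str) -> str:
    """Inline a local image file as a base64 data URL.

    Mirrors the behavior described in the docs: the serving layer opens
    the path with open(path, "rb") and does NOT fetch remote URLs, so
    http(s):// values in img_dict would fail here.
    """
    if path.startswith(("http://", "https://")):
        raise ValueError("remote URLs are not supported; use a local path")
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{data}"
```

Because the file is opened relative to the current working directory, the relative paths in the default data only resolve when the script is launched from `api_pipelines/`.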
Comment on lines 74 to 79
This pipeline requires at least the following fields:

- **raw_chunk**: the source text, used for entity and textual triple extraction.
- **img_dict**: an image dictionary where keys are image IDs and values are local image paths.
- **img_dict**: an image dictionary where keys are image IDs (free-form strings that appear in `vis_triple`) and values are local image paths.
- **vis_url**: a list of image paths opened and passed to the VLM during step 5 QA generation. The element order must match the order in which image IDs first appear in `img_dict`.


- combine images with candidate entities to extract visual facts
- triples are typically output in the form `"<subj> entity <rel> depicted_in <obj> image_id"`
- triples are output in the form `"<subj> entity <obj> image_id <rel> depicted_in "`
Comment on lines +174 to +176
"<subj> Cybertruck <obj> img_cybertruck <rel> depicted_in ",
"<subj> Elon Musk <obj> img_musk_stage <rel> depicted_in ",
"<subj> Cybertruck <obj> img_musk_stage <rel> depicted_in "
- Sample images: `example_data/MultimodalKGPipeline/images/cyber.jpg`, `example_data/MultimodalKGPipeline/images/musk.jpg`

For real image-text workloads, the values in `img_dict` must be locally accessible image paths; the default data is in JSON format and can be used directly as a structural example.
The values in `img_dict` are the image paths the VLM actually opens. Only **local paths** are supported today: the serving layer reads the bytes with `open(path, "rb")` and base64-encodes them inline into the request; it does not download remote URLs. The default data ships with runnable example images, and the paths are given relative to the `api_pipelines/` directory (the CWD when you run `python multimodal_kg_pipeline.py`).