It begins by constructing a fine-grained knowledge graph from the source text, then identifies knowledge gaps in LLMs using the expected calibration error (ECE) metric, prioritizing the generation of QA pairs that target high-value, long-tail knowledge.
Furthermore, GraphGen incorporates multi-hop neighborhood sampling to capture complex relational information and employs style-controlled generation to diversify the resulting QA data.
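
The calibration-based gap detection can be illustrated with a small, self-contained sketch. The function and the sample data below are hypothetical and only mirror the idea described above (they are not GraphGen's actual APIs): compute the trainee model's expected calibration error per knowledge point, then generate QA pairs first for the worst-calibrated, typically long-tail, entries.

```python
# Illustrative sketch (not GraphGen's implementation): rank knowledge points by the
# trainee LLM's expected calibration error (ECE) so QA generation can prioritize
# poorly calibrated, long-tail knowledge.
import numpy as np


def expected_calibration_error(confidences, correctness, n_bins=10):
    """Binned ECE: |accuracy - confidence| per bin, weighted by the bin's sample share."""
    confidences = np.asarray(confidences, dtype=float)
    correctness = np.asarray(correctness, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correctness[mask].mean()     # empirical accuracy in this bin
            conf = confidences[mask].mean()    # mean stated confidence in this bin
            ece += mask.mean() * abs(acc - conf)
    return ece


# Hypothetical data: each knowledge point carries the model's answer confidences
# and whether those answers were judged correct.
knowledge_points = {
    "enzyme_kinetics": ([0.9, 0.8, 0.95], [1, 1, 1]),    # well calibrated
    "rare_gene_variant": ([0.85, 0.9, 0.8], [0, 1, 0]),  # overconfident, long-tail
}
ranked = sorted(
    knowledge_points.items(),
    key=lambda kv: expected_calibration_error(*kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])  # generate QA pairs for the top entries first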
After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to fine-tune your LLM.
## 📌 Latest Updates
- **2025.10.30**: We support several new LLM clients and inference backends, including [Ollama_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/ollama_client.py), [http_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/http_client.py), [HuggingFace Transformers](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/hf_wrapper.py), and [SGLang](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/sglang_wrapper.py).
- **2025.10.23**: We now support VQA (Visual Question Answering) data generation. Run the script: `bash scripts/generate/generate_vqa.sh`.
- **2025.10.21**: We now support PDF as an input format for data generation via [MinerU](https://github.com/opendatalab/MinerU).
<details>
<summary>History</summary>
- **2025.09.29**: We auto-update the Gradio demo on [Hugging Face](https://huggingface.co/spaces/chenzihong/GraphGen) and [ModelScope](https://modelscope.cn/studios/chenzihong/GraphGen).
- **2025.08.14**: We have added support for community detection in knowledge graphs using the Leiden algorithm, enabling the synthesis of Chain-of-Thought (CoT) data.
- **2025.07.31**: We have added Google, Bing, Wikipedia, and UniProt as search backends.
- **2025.04.21**: We have released the initial version of GraphGen.
</details>
## ⚙️ Support List
We support various LLM inference servers, API servers, inference clients, input file formats, data modalities, output data formats, and output data types.
Users can flexibly configure these options according to their synthetic-data needs.
| Inference Server | API Server | Inference Client | Input File Format | Data Modality | Data Format | Data Type |
| --- | --- | --- | --- | --- | --- | --- |
Experience GraphGen through [Web](https://g-app-center-120612-6433-jpdvmvp.openxlab.space) or [Backup Web Entrance](https://openxlab.org.cn/apps/detail/chenzihonga/GraphGen)
For any questions, please check the [FAQ](https://github.com/open-sciencelab/GraphGen).
Pick the desired format and run the matching script: