1 change: 1 addition & 0 deletions README.md
@@ -62,6 +62,7 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 
 ## 📌 Latest Updates
 
+- **2025.12.26**: Added comprehensive knowledge graph evaluation metrics including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added [rocksdb](https://github.com/facebook/rocksdb) for key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) for graph database backend support.
 - **2025.12.16**: Added [vllm](https://github.com/vllm-project/vllm) for local inference backend support.
 - **2025.12.16**: Refactored the data generation pipeline using [ray](https://github.com/ray-project/ray) to improve the efficiency of distributed execution and resource management.
1 change: 1 addition & 0 deletions README_zh.md
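@@ -62,6 +62,7 @@ GraphGen first builds a fine-grained knowledge graph from the source text, and then uses
 After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [xtuner](https://github.com/InternLM/xtuner) to fine-tune large language models.
 
 ## 📌 Latest Updates
+- **2025.12.26**: Added knowledge graph evaluation metrics, including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added support for [rocksdb](https://github.com/facebook/rocksdb) as a key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) as a graph database backend.
 - **2025.12.16**: Added support for [vllm](https://github.com/vllm-project/vllm) as a local inference backend.
 - **2025.12.16**: Refactored the data generation pipeline with [ray](https://github.com/ray-project/ray), improving the efficiency of distributed execution and resource management.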
21 changes: 15 additions & 6 deletions graphgen/models/evaluator/kg/structure_evaluator.py
@@ -1,3 +1,4 @@
+from collections import Counter
 from typing import Any, Dict, Optional
 
 import numpy as np
@@ -81,14 +82,22 @@ def _calculate_powerlaw_r2(degree_map: Dict[str, int]) -> Optional[float]:
             return None
 
         try:
-            # Fit power law: log(y) = a * log(x) + b
-            log_degrees = np.log(degrees)
-            sorted_log_degrees = np.sort(log_degrees)
-            x = np.arange(1, len(sorted_log_degrees) + 1)
-            log_x = np.log(x)
+            degree_counts = Counter(degrees)
+            degree_values, frequencies = zip(*sorted(degree_counts.items()))
+
+            if len(degree_values) < 3:
+                logger.warning(
+                    f"Insufficient unique degrees ({len(degree_values)}) for power law fitting. "
+                    f"Graph may be too uniform."
+                )
+                return None
+
+            # Fit power law: log(frequency) = a * log(degree) + b
+            log_degrees = np.log(degree_values)
+            log_frequencies = np.log(frequencies)
 
             # Linear regression on log-log scale
-            r_value, *_ = stats.linregress(log_x, sorted_log_degrees)
+            r_value, *_ = stats.linregress(log_degrees, log_frequencies)
             r2 = r_value**2
 
             return float(r2)
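
For reviewers who want to check the new fitting logic in isolation, here is a minimal standalone sketch of the same idea: count how many nodes share each degree, then measure how well log(frequency) follows a straight line in log(degree). It is not the project's code. The function name `powerlaw_r2`, the filtering of zero-degree nodes, and the handling of a flat frequency profile are assumptions made for illustration, and the sketch swaps `stats.linregress` for a hand-written squared Pearson correlation (equal to R² in a one-variable linear fit) so that it runs on the standard library alone.

```python
import math
from collections import Counter
from typing import Dict, Optional


def powerlaw_r2(degree_map: Dict[str, int]) -> Optional[float]:
    """R^2 of a straight-line fit to log(frequency) versus log(degree).

    Hypothetical helper for illustration; not part of GraphGen.
    """
    # Assumption: zero-degree nodes are dropped because log(0) is undefined.
    degrees = [d for d in degree_map.values() if d > 0]
    counts = Counter(degrees)

    # With fewer than three distinct degrees a line fit says nothing useful.
    if len(counts) < 3:
        return None

    unique_degrees = sorted(counts)
    xs = [math.log(k) for k in unique_degrees]
    ys = [math.log(counts[k]) for k in unique_degrees]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Assumption: a flat frequency profile has no trend to explain, so report 0.0.
    if syy == 0.0:
        return 0.0

    # Squared Pearson correlation equals R^2 for simple linear regression.
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Synthetic graph whose degree frequencies follow 36 * k**-2 exactly.
    demo = {}
    for k, n in [(1, 36), (2, 9), (3, 4), (6, 1)]:
        for i in range(n):
            demo[f"node_{k}_{i}"] = k
    print(powerlaw_r2(demo))
```

Fitting frequency against degree tests the power-law relationship directly, whereas the removed code regressed sorted log-degrees against log-rank, which is a different curve.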