diff --git a/README.md b/README.md
index 99288683..5bff7e41 100644
--- a/README.md
+++ b/README.md
@@ -62,6 +62,7 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 
 ## 📌 Latest Updates
 
+- **2025.12.26**: Added comprehensive knowledge graph evaluation metrics including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added [rocksdb](https://github.com/facebook/rocksdb) for key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) for graph database backend support.
 - **2025.12.16**: Added [vllm](https://github.com/vllm-project/vllm) for local inference backend support.
 - **2025.12.16**: Refactored the data generation pipeline using [ray](https://github.com/ray-project/ray) to improve the efficiency of distributed execution and resource management.
diff --git a/README_zh.md b/README_zh.md
index f15f5523..6fef86bf 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -62,6 +62,7 @@ GraphGen 首先根据源文本构建细粒度的知识图谱,然后利用期
 在数据生成后,您可以使用[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) 和 [xtuner](https://github.com/InternLM/xtuner)对大语言模型进行微调。
 
 ## 📌 最新更新
+- **2025.12.26**: 新增知识图谱评估指标,包括准确度评估(实体/关系抽取质量)、一致性评估(冲突检测)和结构鲁棒性评估(噪声比、连通性、度分布)。
 - **2025.12.16**:新增 [rocksdb](https://github.com/facebook/rocksdb) 作为键值存储后端, [kuzudb](https://github.com/kuzudb/kuzu) 作为图数据库后端的支持。
 - **2025.12.16**:新增 [vllm](https://github.com/vllm-project/vllm) 作为本地推理后端的支持。
 - **2025.12.16**:使用 [ray](https://github.com/ray-project/ray) 重构了数据生成 pipeline,提升了分布式执行和资源管理的效率。
diff --git a/graphgen/models/evaluator/kg/structure_evaluator.py b/graphgen/models/evaluator/kg/structure_evaluator.py
index d9fa45a9..997639be 100644
--- a/graphgen/models/evaluator/kg/structure_evaluator.py
+++ b/graphgen/models/evaluator/kg/structure_evaluator.py
@@ -1,3 +1,4 @@
+from collections import Counter
 from typing import Any, Dict, Optional
 
 import numpy as np
@@ -81,14 +82,22 @@ def _calculate_powerlaw_r2(degree_map: Dict[str, int]) -> Optional[float]:
             return None
 
         try:
-            # Fit power law: log(y) = a * log(x) + b
-            log_degrees = np.log(degrees)
-            sorted_log_degrees = np.sort(log_degrees)
-            x = np.arange(1, len(sorted_log_degrees) + 1)
-            log_x = np.log(x)
+            degree_counts = Counter(degrees)
+            degree_values, frequencies = zip(*sorted(degree_counts.items()))
+
+            if len(degree_values) < 3:
+                logger.warning(
+                    f"Insufficient unique degrees ({len(degree_values)}) for power law fitting. "
+                    f"Graph may be too uniform."
+                )
+                return None
+
+            # Fit power law: log(frequency) = a * log(degree) + b
+            log_degrees = np.log(degree_values)
+            log_frequencies = np.log(frequencies)
 
             # Linear regression on log-log scale
-            r_value, *_ = stats.linregress(log_x, sorted_log_degrees)
+            r_value, *_ = stats.linregress(log_degrees, log_frequencies)
             r2 = r_value**2
 
             return float(r2)
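
Reviewer note: the last hunk changes what is regressed. The old code fit sorted log-degrees against log-rank, while the new code fits log(frequency) against log(degree), which is the usual log-log check for a power-law degree distribution. The sketch below illustrates that idea in isolation. It is not the project's implementation: the function name `powerlaw_r2`, the `min_unique` parameter, the skipping of zero-degree nodes, and the `None` return for constant frequencies are all assumptions made for illustration, and the regression is written out with the standard library rather than `scipy.stats.linregress`.

```python
import math
from collections import Counter
from typing import Dict, Optional


def powerlaw_r2(degree_map: Dict[str, int], min_unique: int = 3) -> Optional[float]:
    """R^2 of a straight-line fit to log(frequency) vs. log(degree).

    Hypothetical helper for illustration only.
    """
    # Assumption: zero-degree nodes are skipped because log(0) is undefined.
    degrees = [d for d in degree_map.values() if d > 0]

    # Count how many nodes have each degree.
    counts = Counter(degrees)
    if len(counts) < min_unique:
        return None  # too few distinct degrees for a meaningful fit

    # Assumption: if every degree occurs equally often, the correlation
    # is undefined, so no score is reported.
    if len(set(counts.values())) == 1:
        return None

    ordered = sorted(counts)
    xs = [math.log(d) for d in ordered]
    ys = [math.log(counts[d]) for d in ordered]

    # Squared Pearson correlation equals R^2 for simple linear regression.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Degrees 1, 2, 4 with frequencies 16, 4, 1 follow frequency ~ degree^-2.
example = {f"n{i}": d for i, d in enumerate([1] * 16 + [2] * 4 + [4])}
r2 = powerlaw_r2(example)
```

A graph whose nodes all share one degree yields `None` here, mirroring the "too uniform" early return added in the hunk.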