InternScience · ChenZiHong-Gavin · Dec 26, 2025
diff --git a/README.md b/README.md
@@ -62,6 +62,7 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 
 ## 📌 Latest Updates
 
+- **2025.12.26**: Added comprehensive knowledge graph evaluation metrics including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added [rocksdb](https://github.com/facebook/rocksdb) for key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) for graph database backend support.
 - **2025.12.16**: Added [vllm](https://github.com/vllm-project/vllm) for local inference backend support.
 - **2025.12.16**: Refactored the data generation pipeline using [ray](https://github.com/ray-project/ray) to improve the efficiency of distributed execution and resource management.

diff --git a/README_zh.md b/README_zh.md
@@ -62,6 +62,7 @@ GraphGen 首先根据源文本构建细粒度的知识图谱，然后利用期
 在数据生成后，您可以使用[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) 和 [xtuner](https://github.com/InternLM/xtuner)对大语言模型进行微调。
 
 ## 📌 最新更新
+- **2025.12.26**: 新增知识图谱评估指标，包括准确度评估（实体/关系抽取质量）、一致性评估（冲突检测）和结构鲁棒性评估（噪声比、连通性、度分布）。
 - **2025.12.16**:新增 [rocksdb](https://github.com/facebook/rocksdb) 作为键值存储后端, [kuzudb](https://github.com/kuzudb/kuzu) 作为图数据库后端的支持。
 - **2025.12.16**:新增 [vllm](https://github.com/vllm-project/vllm) 作为本地推理后端的支持。
 - **2025.12.16**:使用 [ray](https://github.com/ray-project/ray) 重构了数据生成 pipeline，提升了分布式执行和资源管理的效率。

diff --git a/graphgen/models/evaluator/kg/structure_evaluator.py b/graphgen/models/evaluator/kg/structure_evaluator.py
@@ -1,3 +1,4 @@
+from collections import Counter
 from typing import Any, Dict, Optional
 
 import numpy as np
@@ -81,14 +82,22 @@ def _calculate_powerlaw_r2(degree_map: Dict[str, int]) -> Optional[float]:
             return None
 
         try:
-            # Fit power law: log(y) = a * log(x) + b
-            log_degrees = np.log(degrees)
-            sorted_log_degrees = np.sort(log_degrees)
-            x = np.arange(1, len(sorted_log_degrees) + 1)
-            log_x = np.log(x)
+            degree_counts = Counter(degrees)
+            degree_values, frequencies = zip(*sorted(degree_counts.items()))
+
+            if len(degree_values) < 3:
+                logger.warning(
+                    f"Insufficient unique degrees ({len(degree_values)}) for power law fitting. "
+                    f"Graph may be too uniform."
+                )
+                return None
+
+            # Fit power law: log(frequency) = a * log(degree) + b
+            log_degrees = np.log(degree_values)
+            log_frequencies = np.log(frequencies)
 
             # Linear regression on log-log scale
-            r_value, *_ = stats.linregress(log_x, sorted_log_degrees)
+            r_value, *_ = stats.linregress(log_degrees, log_frequencies)
             r2 = r_value**2
 
             return float(r2)