Commit 18be127

feat: add KG quality evaluation module
1 parent fbc8d52 commit 18be127

File tree

12 files changed: +1043 -1 lines changed
Lines changed: 6 additions & 0 deletions

```bash
python3 -m graphgen.operators.evaluate_kg.evaluate_kg \
    --working_dir cache \
    --graph_backend kuzu \
    --kv_backend rocksdb \
    --sample_size 100 \
    --max_concurrent 10
```

graphgen/models/__init__.py

Lines changed: 7 additions & 1 deletion

```diff
@@ -1,4 +1,10 @@
-from .evaluator import LengthEvaluator, MTLDEvaluator, RewardEvaluator, UniEvaluator
+from .evaluator import (
+    KGQualityEvaluator,
+    LengthEvaluator,
+    MTLDEvaluator,
+    RewardEvaluator,
+    UniEvaluator,
+)
 from .generator import (
     AggregatedGenerator,
     AtomicGenerator,
```

graphgen/models/evaluator/__init__.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,3 +1,4 @@
+from .kg_quality_evaluator import KGQualityEvaluator
 from .length_evaluator import LengthEvaluator
 from .mtld_evaluator import MTLDEvaluator
 from .reward_evaluator import RewardEvaluator
```
Lines changed: 117 additions & 0 deletions

# KG Quality Evaluation Module

This module provides comprehensive quality evaluation for knowledge graphs built by GraphGen.

## Module Structure

The evaluation functionality has been split into modular components:

- **`accuracy_evaluator.py`**: Entity/relation/triple accuracy evaluation using LLM-as-judge
- **`consistency_evaluator.py`**: Attribute value conflict detection
- **`structure_evaluator.py`**: Graph structural robustness metrics
- **`utils.py`**: Utility functions (NetworkX conversion, text retrieval, sampling)
- **`kg_quality_evaluator.py`**: Main evaluator class that integrates all modules
## Features

### 1. Accuracy Assessment

- **Entity Recognition Accuracy**: Samples entities and validates them using an LLM
- **Relation Extraction Accuracy**: Samples relations and validates them using an LLM
- **Triple Validation (RLC)**: Samples triples and validates them using an LLM
- Calculates Precision, Recall, and F1 scores for each metric
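The precision/recall/F1 arithmetic behind each metric can be sketched as follows (a minimal illustration with hypothetical counts; how the module derives false negatives from the LLM judgments is not shown here):

```python
def prf1(true_positives: int, false_positives: int, false_negatives: int) -> dict:
    """Standard precision/recall/F1 from judgment counts."""
    p_denom = true_positives + false_positives
    r_denom = true_positives + false_negatives
    precision = true_positives / p_denom if p_denom else 0.0
    recall = true_positives / r_denom if r_denom else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# e.g. 80 of 100 sampled entities judged correct, 5 judged missing
scores = prf1(true_positives=80, false_positives=20, false_negatives=5)
```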
### 2. Consistency Assessment

- Detects attribute value conflicts (same entity, same attribute, different values)
- Calculates the conflict rate: `conflict_entities_count / total_entities`
- Returns detailed conflict information
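The conflict-rate calculation reduces to grouping attribute values per entity; a minimal sketch over a hypothetical `(entity, attribute, value)` triple layout (the module's internal representation may differ):

```python
from collections import defaultdict

def conflict_rate(triples) -> float:
    """triples: iterable of (entity, attribute, value).
    An entity is conflicting if any of its attributes maps to
    more than one distinct value."""
    values = defaultdict(set)  # (entity, attribute) -> set of values seen
    entities = set()
    for entity, attribute, value in triples:
        entities.add(entity)
        values[(entity, attribute)].add(value)
    conflicting = {e for (e, _), vs in values.items() if len(vs) > 1}
    return len(conflicting) / len(entities) if entities else 0.0

rate = conflict_rate([
    ("Paris", "country", "France"),
    ("Paris", "country", "Germany"),   # conflicting value for the same attribute
    ("Berlin", "country", "Germany"),
])
# 1 conflicting entity out of 2 -> 0.5
```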
### 3. Structural Robustness Assessment

- **Noise Ratio**: Isolated nodes / total nodes (threshold: < 15%)
- **Largest Connected Component Ratio**: Largest CC nodes / total nodes (threshold: > 90%)
- **Average Node Degree**: Average degree across all nodes (threshold: 2-5)
- **Power Law Distribution R²**: Degree distribution fit (threshold: > 0.75)
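The first three metrics follow directly from an adjacency list; a dependency-free sketch of how they could be computed (the module itself uses NetworkX, and the power-law R² fit via scipy is omitted here):

```python
from collections import deque

def structure_metrics(nodes, edges) -> dict:
    """nodes: list of node ids; edges: list of undirected (u, v) pairs."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = len(nodes)
    isolated = sum(1 for n in nodes if not adj[n])
    # Largest connected component via BFS over unvisited nodes
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            n = queue.popleft()
            size += 1
            for m in adj[n] - seen:
                seen.add(m)
                queue.append(m)
        largest = max(largest, size)
    return {
        "noise_ratio": isolated / total,
        "largest_cc_ratio": largest / total,
        "avg_degree": sum(len(adj[n]) for n in nodes) / total,
    }

m = structure_metrics(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
# "d" is isolated -> noise_ratio 0.25; largest CC {a, b, c} -> 0.75; avg_degree 1.0
```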
## Usage

### Command Line Usage

```bash
# Run all evaluations
python -m graphgen.operators.evaluate_kg.evaluate_kg --working_dir cache

# Run a specific evaluation
python -m graphgen.operators.evaluate_kg.evaluate_kg --working_dir cache --accuracy_only

# Custom configuration
python -m graphgen.operators.evaluate_kg.evaluate_kg \
    --working_dir cache \
    --sample_size 200 \
    --graph_backend networkx \
    --kv_backend json_kv
```
### Shell Script Usage

```bash
# Basic usage
bash examples/evaluate_kg/evaluate_kg.sh

# With custom options
bash examples/evaluate_kg/evaluate_kg.sh \
    --working_dir cache \
    --sample_size 200 \
    --accuracy_only
```
## Requirements

- **NetworkX**: Required for structural evaluation
- **scipy**: Required for power law distribution fitting
- **numpy**: Required for numerical calculations
- **LLM Client**: Required for accuracy evaluation (configure via `TRAINEE_*` env vars)
## Output Format

The evaluation returns a dictionary with the following structure:

```python
{
    "accuracy": {
        "entity_accuracy": {
            "precision": float,
            "recall": float,
            "f1": float,
            "true_positives": int,
            "false_positives": int,
            "sample_size": int
        },
        "relation_accuracy": { ... },
        "triple_accuracy": { ... }
    },
    "consistency": {
        "conflict_rate": float,
        "conflict_entities_count": int,
        "total_entities": int,
        "conflicts": [ ... ]
    },
    "structure": {
        "total_nodes": int,
        "total_edges": int,
        "noise_ratio": float,
        "largest_cc_ratio": float,
        "avg_degree": float,
        "powerlaw_r2": float | None,
        "thresholds": {
            "noise_ratio": { "value": float, "threshold": float, "pass": bool },
            ...
        }
    }
}
```
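A result dictionary of this shape is easy to consume downstream; a hypothetical consumer sketch (`failed_checks` is not part of the module) that collects the structural metrics failing their thresholds:

```python
def failed_checks(structure_result: dict) -> list:
    """Return names of structural metrics whose threshold check did not pass."""
    thresholds = structure_result.get("thresholds", {})
    return [name for name, check in thresholds.items() if not check.get("pass", True)]

result = {
    "thresholds": {
        "noise_ratio": {"value": 0.08, "threshold": 0.15, "pass": True},
        "largest_cc_ratio": {"value": 0.82, "threshold": 0.90, "pass": False},
    }
}
# -> ["largest_cc_ratio"]
```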
## Notes

- Accuracy evaluation requires LLM API access and may be slow for large sample sizes
- Structural evaluation automatically converts Kuzu storage to NetworkX for analysis
- All evaluations include error handling and return error messages on failure
- The evaluator automatically loads graph and chunk storage from the working directory
Lines changed: 14 additions & 0 deletions

```python
from .accuracy_evaluator import AccuracyEvaluator
from .consistency_evaluator import ConsistencyEvaluator
from .structure_evaluator import StructureEvaluator
from .utils import convert_to_networkx, get_relevant_text, get_source_text, sample_items

__all__ = [
    "AccuracyEvaluator",
    "ConsistencyEvaluator",
    "StructureEvaluator",
    "convert_to_networkx",
    "get_relevant_text",
    "get_source_text",
    "sample_items",
]
```
