VANILLA: Validated Knowledge Graph Completion

A Normalization-based Framework for Integrity, Link Prediction, and Logical Accuracy

🔍 Overview

VANILLA is a framework that enhances Knowledge Graph Completion (KGC) in two ways: it validates inferred facts against logical constraints derived from domain knowledge, and it normalizes existing anomalies in KGs. Unlike traditional methods that rely solely on vector embeddings, VANILLA integrates symbolic reasoning with numerical learning to ensure the logical validity, integrity, and accuracy of predictions. The framework uses SHACL constraints to validate predictions and supports symbolic rule mining, constraint checking, and KGC evaluation with state-of-the-art embedding models.

📁 Repository Structure

.
├── KG/                         # Benchmark knowledge graphs
│   ├── French_Royalty/
│   ├── SGKG/
│   ├── SynthLC-1000/
│   ├── SynthLC-10000/
│   ├── YAGO3-10/
│   └── DB100K/
│
├── Rules/                      # Symbolic horn rules for each benchmark
│   ├── French_Royalty/
│   ├── SGKG/
│   ├── SynthLC-1000/
│   ├── SynthLC-10000/
│   ├── YAGO3-10/
│   └── DB100K/
├── Constraints/                # SHACL constraints
│   ├── French_Royalty/
│   ├── SGKG/
│   ├── SynthLC-1000/
│   ├── SynthLC-10000/
│   ├── YAGO3-10/
│   └── DB100K/
│
├── Predictions/                # Output predictions
├── LICENSE.txt
├── README.md
├── input.json
├── requirements.txt
├── Symbolic_predictions_updated.py
├── transform_new.py
└── validation.py

📊 Benchmark Statistics

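| KG Size | Benchmark      | #Triples  | #Entities | #Relations |
|---------|----------------|----------:|----------:|-----------:|
| Large   | DB100K         |   695,572 |    99,604 |        470 |
| Large   | SynthLC-10000  |   106,549 |    10,000 |          9 |
| Medium  | YAGO3-10       | 1,080,264 |   123,086 |         37 |
| Medium  | SGKG           |    54,585 |    36,450 |          6 |
| Small   | French Royalty |    10,526 |     2,601 |         12 |
| Small   | SynthLC-1000   |    10,668 |     1,000 |          9 |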

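| KG Size | Benchmark      | #Constraints | #Valid  | #Invalid |
|---------|----------------|-------------:|--------:|---------:|
| Large   | DB100K         |            6 | 390,351 |   62,024 |
| Large   | SynthLC-10000  |           25 | 223,523 |   26,477 |
| Medium  | YAGO3-10       |            4 | 393,205 |   58,719 |
| Medium  | SGKG           |            5 | 156,965 |   12,150 |
| Small   | French Royalty |            2 |   1,922 |      298 |
| Small   | SynthLC-1000   |           25 |  22,335 |    2,665 |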
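
To compare benchmarks of very different sizes, it can help to express the invalid count as a share of all validated items. The sketch below does this with the counts copied from the second table. The table does not state what unit the counts are in (entities, triples, or constraint targets), so the result is only a relative indicator, and the helper name `invalid_share` is ours.

```python
# (valid, invalid) counts copied from the constraint statistics table.
stats = {
    "DB100K": (390_351, 62_024),
    "SynthLC-10000": (223_523, 26_477),
    "YAGO3-10": (393_205, 58_719),
    "SGKG": (156_965, 12_150),
    "French Royalty": (1_922, 298),
    "SynthLC-1000": (22_335, 2_665),
}


def invalid_share(valid, invalid):
    """Return the fraction of validated items that violate a constraint."""
    return invalid / (valid + invalid)


for name, (valid, invalid) in stats.items():
    print(f"{name:<15} {invalid_share(valid, invalid):6.2%}")
```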

⚙️ Setup Instructions

Clone the repository

git clone git@github.com:SDM-TIB/VANILLA.git

Install dependencies
```
pip install -r requirements.txt
```
Configure input

Modify input.json to select the benchmark KG and the rule/constraint files.
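
Before editing, it can help to see which top-level entries `input.json` actually contains. The following is a minimal sketch that assumes nothing about the file's schema: it only loads the JSON and lists each top-level key with the type of its value. The helper names `load_config` and `describe` are ours and are not part of VANILLA.

```python
import json
from pathlib import Path


def load_config(path="input.json"):
    """Load a JSON config file, failing early if the file is missing."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    with config_path.open(encoding="utf-8") as fh:
        return json.load(fh)


def describe(config):
    """Return one 'key: type' line per top-level entry, assuming no schema."""
    if not isinstance(config, dict):
        return [f"<top level>: {type(config).__name__}"]
    return [f"{key}: {type(value).__name__}" for key, value in config.items()]


# Only prints something when run from a directory that contains input.json.
if Path("input.json").is_file():
    for line in describe(load_config()):
        print(line)
```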

🚀 Running the Pipeline

1. Symbolic Predictions & Constraint Validation

Run the script to generate predictions and validate them:

python Symbolic_predictions_updated.py

This will:

- Generate inferred predictions using logical rules.
- Validate them against SHACL constraints.
- Output:
  - Transformed KGs
  - Constraint validation reports in Constraints/
  - Predictions in Predictions/
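
The two steps above (rule-based inference, then constraint validation) can be illustrated with a small, self-contained sketch. It is not VANILLA's implementation: the entities, the relation names `hasSpouse` and `hasChild`, the Horn rule, and the "at most two parents" constraint are all hypothetical, and the SHACL check is replaced by a plain Python count, standing in for a maximum-cardinality constraint.

```python
from collections import defaultdict

# Toy KG as (subject, predicate, object) triples; all names are hypothetical.
kg = {
    ("alice", "hasSpouse", "bob"),
    ("bob", "hasChild", "carol"),
    ("dave", "hasSpouse", "erin"),
    ("erin", "hasChild", "frank"),
    ("gina", "hasChild", "frank"),
}


def apply_rule(triples):
    """Apply the Horn rule hasSpouse(x, y) AND hasChild(y, z) => hasChild(x, z)."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == "hasChild":
            children[s].add(o)
    inferred = set()
    for s, p, o in triples:
        if p == "hasSpouse":
            for z in children.get(o, ()):
                candidate = (s, "hasChild", z)
                if candidate not in triples:  # keep only new facts
                    inferred.add(candidate)
    return inferred


def violates_max_parents(triple, triples, max_parents=2):
    """Stand-in for a max-cardinality constraint: a child has at most two parents."""
    s, _, o = triple
    parents = {x for x, q, z in triples if q == "hasChild" and z == o}
    parents.add(s)
    return len(parents) > max_parents


def validate(predictions, triples):
    """Split predictions into valid and invalid, each checked against the original KG."""
    valid, invalid = set(), set()
    for t in predictions:
        (invalid if violates_max_parents(t, triples) else valid).add(t)
    return valid, invalid


predictions = apply_rule(kg)
valid, invalid = validate(predictions, kg)
print("valid:  ", sorted(valid))
print("invalid:", sorted(invalid))
```

Here the rule infers that `alice` is a parent of `carol`, which is accepted, and that `dave` is a parent of `frank`, which is rejected because `frank` would then have three parents. Each prediction is checked in isolation against the original KG, which is a simplification chosen for brevity.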

📈 Evaluation Metrics

We evaluate KG completion using embedding models:

- TransE, TransH, TransD
- RotatE, ComplEx, TuckER
- CompGCN

Metrics reported:

- Hits@1, Hits@3, Hits@5, Hits@10
- Mean Reciprocal Rank (MRR)
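
For reference, both metrics are computed from the rank that a model assigns to the correct entity of each test triple. The sketch below uses the standard definitions and assumes the 1-based ranks are already available. The example ranks are made up, and the README does not say whether raw or filtered ranks are used.

```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for five test triples.
ranks = [1, 2, 4, 10, 50]

for k in (1, 3, 5, 10):
    print(f"Hits@{k}: {hits_at_k(ranks, k):.3f}")
print(f"MRR: {mean_reciprocal_rank(ranks):.3f}")
```

Higher is better for both metrics. MRR rewards top ranks more strongly than Hits@k, since a correct entity at rank 1 contributes 1.0 while one at rank 10 contributes only 0.1.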

🧠 Graphical Summary

The VANILLA framework integrates symbolic rules, domain constraints, and neural embeddings for high-quality knowledge graph completion. It identifies valid and invalid triples using evolving logical constraints and employs numerical models to infer missing links, ensuring semantic consistency and logical soundness in the normalized KG.

📄 License

This project is licensed under the terms stated in LICENSE.txt.

Authors

VANILLA has been developed by members of the Scientific Data Management Group at TIB as an ongoing research effort. The development is coordinated and supervised by Maria-Esther Vidal. VANILLA has been implemented in joint work by Disha Purohit and Yashrajsinh Chudasama. We strongly encourage you to report any issues you have with VANILLA; please use the GitHub issue tracker to do so.
