This project systematically evaluates Microsoft's GraphRAG framework on a German tax law dataset, combining LLM-based generation with graph-structured retrieval.
🔹 Built full GraphRAG pipelines (Indexing, Prompt Tuning, Baseline, Querying)
🔹 Used vLLM to serve a local LLaMA-3.1-8B-Instruct model behind an OpenAI-compatible API
🔹 Customized large language models (LLaMA-3.1-8B-Instruct) and embeddings (e5-mistral-7b-instruct)
🔹 Benchmarked retrieval effectiveness using RAGAS and GPT-4o-mini evaluations
🔹 Scalable to HPC clusters with Slurm job automation and optimized for A100 GPUs
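For reference, the vLLM server for the main LLM can be launched roughly as follows. This is a sketch: the port and dtype flags are illustrative assumptions, not the exact values used in this project.

```bash
# Serve LLaMA-3.1-8B-Instruct behind an OpenAI-compatible endpoint.
# Port and dtype are illustrative; adjust them to your cluster setup.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --port 8000 \
    --dtype bfloat16
```

GraphRAG can then be pointed at `http://localhost:8000/v1` as if it were the OpenAI API.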
Clone this repository and install the required packages:
```bash
pip install -r requirements.txt
```

Follow Microsoft's GraphRAG Installation Guide to configure GraphRAG correctly.
The GraphRAG workspace used in this project is named legalRAG.
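If you are setting up the workspace from scratch, GraphRAG's CLI can initialize it. This is a sketch assuming GraphRAG 1.0's `graphrag init` command; adjust the root path to your own layout.

```bash
# Create the legalRAG workspace with a default settings.yaml and prompts/ folder
graphrag init --root ./legalRAG
```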
| Purpose | Model Name |
|---|---|
| Main LLM | Llama-3.1-8B-Instruct |
| Embedding Model | e5-mistral-7b-instruct |
| Evaluation LLM (RAGAS) | gpt-4o-mini |
The following changes were made to improve performance:
| Parameter | Original | New | Description |
|---|---|---|---|
| LLM_MAX_TOKENS | 4000 | 1024 | Reduced maximum context size |
| LLM_TEMPERATURE | 0 | 0.3 | Introduced slight variability in responses |
| LLM_TOP_P | 1 | 0.8 | Limited token diversity |
| LLM_CONCURRENT_REQUESTS | 25 | 15 | Reduced concurrent requests for stability |
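In GraphRAG's `settings.yaml`, these values correspond roughly to the following `llm` block. This is a sketch: the surrounding keys (`type`, `model`, `api_base`) are assumptions for a local vLLM endpoint, not the project's exact configuration.

```yaml
llm:
  type: openai_chat
  model: meta-llama/Llama-3.1-8B-Instruct
  api_base: http://localhost:8000/v1   # assumed local vLLM endpoint
  max_tokens: 1024          # was 4000
  temperature: 0.3          # was 0
  top_p: 0.8                # was 1
  concurrent_requests: 15   # was 25
```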
All tasks (indexing, tuning, baseline creation, querying) are handled using a single Slurm job script:
run_graphrag_job.sh
```bash
sbatch run_graphrag_job.sh [index|tune|query|baseline]
```
To run indexing:

```bash
sbatch run_graphrag_job.sh index
```

To run prompt tuning:

```bash
sbatch run_graphrag_job.sh tune
```

To create a baseline (without GraphRAG retrieval):

```bash
sbatch run_graphrag_job.sh baseline
```

To run querying:

```bash
sbatch run_graphrag_job.sh query
```
Each job automatically:
- Sets the correct job name
- Redirects logs to the appropriate folders (`batch_logs/`, `logs/`)
- Activates the environment
- Executes the corresponding Python script
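The dispatch logic inside `run_graphrag_job.sh` can be sketched as follows. This is a simplification: the real script also contains `#SBATCH` directives, log redirection, and environment activation.

```shell
#!/bin/bash
# Map the task argument to the corresponding Python script (default: index).
TASK="${1:-index}"
case "$TASK" in
  index)    SCRIPT=scripts/indexing.py ;;
  tune)     SCRIPT=scripts/prompt_tune.py ;;
  baseline) SCRIPT=scripts/baseline_query.py ;;
  query)    SCRIPT=scripts/query_graphrag.py ;;
  *)        echo "Usage: $0 [index|tune|query|baseline]" >&2; exit 1 ;;
esac
echo "Running $SCRIPT for task '$TASK'"
# python "$SCRIPT"   # launched inside the Slurm allocation
```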
```
GraphRAG_Evaluation/
├── run_graphrag_job.sh      # Unified Slurm job submission script
├── scripts/
│   ├── indexing.py          # Indexing script
│   ├── prompt_tune.py       # Prompt tuning script
│   ├── query_graphrag.py    # GraphRAG querying script
│   └── baseline_query.py    # Baseline (LLM-only querying) script
├── requirements.txt         # Python package requirements
└── README.md
```

| Component | Description |
|---|---|
| vLLM | Version 0.6.5 |
| GraphRAG | Version 1.0.0 |
| Requirements | See requirements.txt (all libraries and versions listed) |
| Purpose | Model |
|---|---|
| Main LLM | meta-llama/Llama-3.1-8B-Instruct |
| Embedding Model | intfloat/e5-mistral-7b-instruct |
| Evaluation Model | gpt-4o-mini |
Evaluation metrics were computed using RAGAS to assess answer correctness and answer relevancy.
GraphRAG’s retrieval-augmented approach was observed to improve grounding and relevance of generated responses compared to querying the LLM alone.
Detailed evaluation findings will be made available upon official publication.
- This repository assumes that GraphRAG is properly installed and accessible.
- Ensure your compute environment has sufficient GPU memory.
- The workspace used is named `legalRAG`.
- All Python scripts (`indexing.py`, `prompt_tune.py`, `baseline_query.py`, `query_graphrag.py`) should be adapted when applied to different datasets.