feat: KGRag - Knowledge Graph-Enhanced RAG with Mellea #39
Open
ydzhu98 wants to merge 47 commits into generative-computing:main from ydzhu98:yzhu/missing_components
47 commits
e11a209
add kgrag layer 4
7dfd58d
fix some formatting issue
ec5e2d6
Layer 3 of knowledge graph
bd107db
layer 2 of the project
bebc66b
layer 1 of kgrag
4e30946
Fill in the most important functions of KGRag
36457f3
Fill in the core missing functions
411ba59
Finished the less important functions
1ebf39c
Adding testing files
44825be
Fix issues found in the testing
42964ec
Implement Phase 2: 8 runnable scripts for KG-RAG pipeline
6e119ba
Extract common functions from the scripts
35919aa
Add test function for the utility files
aa4fa5d
Add the final run.sh script so that one can call it directly.
7be0ce1
Add two read me files for the library and the example
4e994c9
fix the issue so that run.sh is runable. However further update are e…
3cf11b4
Rewrite preprocessing and embedding scripts for mellea-style pipeline
b130db0
Enhance run_kg_update.py to match mellea's architecture and functiona…
eb017a3
Add comprehensive KG-RAG pipeline architecture documentation
60073db
Add run_kg_update.py as Step 3 to run.sh pipeline
d8a603d
Enable run_kg_update.py to load LLM/Neo4j config from .env file
e40da15
Fix run_kg_update.py to properly use RITS model from environment
272de0d
Update KG-RAG example README with three-stage pipeline documentation
2b36a1e
Remove large data files from git tracking (keep locally)
fca3a15
Add dataset directory documentation for large data files
523b244
update the run.sh for step 0
6e08755
Fix the run_kg_update and run_qa
6c70efe
fix the run_eval.py and readme
fe0f99e
remove some intermediate files
e025a6e
remove the data from git
2ad1b85
Refactor the scripts to better utilize the library
8d75eab
Address some of the issue from comments
427f644
Additional changes found during review
f3af8ba
Address additional issues of the comments from the PR
7a15435
Fix the embeder after refactoring
6125e56
Update embedder
e5c2b16
Another round of update
89b1132
make layer 1 only talk to layer 4 via layer 3
670c25e
Clean readme and comments
932b75f
Update the two readme
941275a
Deep intergrate with the predefined classes
16c487e
Speed up the embedding step
6428ed8
Address some of the comment left in the PR.
b18be73
Refactor code to clean further
a714442
chore: remove CLAUDE.md from repository
7275226
refactor(test): rename test files to reflect correct layer numbers an…
49e6d3f
Merge branch 'main' into yzhu/missing_components
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| # Graph Database Configuration (Neo4j default; adjust for other graph DBs) | ||
| NEO4J_URI=bolt://localhost:7687 | ||
| NEO4J_USER=neo4j | ||
| NEO4J_PASSWORD=password | ||
| | ||
| # Data Directory | ||
| KG_BASE_DIRECTORY=./dataset | ||
| DATA_PATH=./data | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Primary LLM — any OpenAI-compatible endpoint | ||
| # --------------------------------------------------------------------------- | ||
| # Option A: OpenAI | ||
| # API_KEY=sk-... | ||
| # MODEL_NAME=gpt-4o-mini | ||
| | ||
| # Option B: Local Ollama | ||
| # API_BASE=http://localhost:11434/v1 | ||
| # API_KEY=ollama | ||
| # MODEL_NAME=llama3.2 | ||
| | ||
| # Option C: vLLM / self-hosted | ||
| # API_BASE=http://localhost:8000/v1 | ||
| # API_KEY=dummy | ||
| # MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct | ||
| | ||
| # Option D: Azure OpenAI | ||
| # API_BASE=https://<your-resource>.openai.azure.com/openai/deployments/<deployment>/ | ||
| # API_KEY=<azure-key> | ||
| # MODEL_NAME=gpt-4o-mini | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Optional: Separate evaluation model (defaults to primary LLM if unset) | ||
| # --------------------------------------------------------------------------- | ||
| # EVAL_API_BASE=... | ||
| # EVAL_API_KEY=... | ||
| # EVAL_MODEL_NAME=... | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Optional: Embedding model for vector entity alignment | ||
| # --------------------------------------------------------------------------- | ||
| # EMB_API_BASE=http://localhost:11434/v1 | ||
| # EMB_API_KEY=ollama | ||
| # EMB_MODEL_NAME=nomic-embed-text | ||
| # VECTOR_DIMENSIONS=768 | ||
| | ||
| # Request Configuration | ||
| MAX_RETRIES=3 | ||
| TIME_OUT=1800 | ||
| | ||
| # OpenTelemetry — disable if you don't have a collector running | ||
| OTEL_SDK_DISABLED=true |
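
The template above says the evaluation model "defaults to primary LLM if unset". A minimal Python sketch of one way a script could read these variables is shown here. It is not taken from the PR: the names `LLMConfig`, `load_llm_config`, `load_eval_config`, and `load_request_settings` are hypothetical, and the per-field fallback (each unset `EVAL_*` variable falls back to its primary counterpart individually) is an assumption about how the default might work. The fallback values `3` and `1800` mirror the template.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for one OpenAI-compatible endpoint."""
    api_base: Optional[str]
    api_key: Optional[str]
    model_name: Optional[str]


def load_llm_config(env: Mapping[str, str], prefix: str = "") -> LLMConfig:
    """Read <prefix>API_BASE, <prefix>API_KEY and <prefix>MODEL_NAME.

    Missing or empty variables become None.
    """
    return LLMConfig(
        api_base=env.get(f"{prefix}API_BASE") or None,
        api_key=env.get(f"{prefix}API_KEY") or None,
        model_name=env.get(f"{prefix}MODEL_NAME") or None,
    )


def load_eval_config(env: Mapping[str, str]) -> LLMConfig:
    """Use EVAL_* values where set, otherwise the primary LLM values.

    Assumption: the fallback is applied field by field.
    """
    primary = load_llm_config(env)
    override = load_llm_config(env, prefix="EVAL_")
    return LLMConfig(
        api_base=override.api_base or primary.api_base,
        api_key=override.api_key or primary.api_key,
        model_name=override.model_name or primary.model_name,
    )


def load_request_settings(env: Mapping[str, str]) -> Tuple[int, int]:
    """Return (MAX_RETRIES, TIME_OUT) as integers, using the template values as defaults."""
    return int(env.get("MAX_RETRIES", "3")), int(env.get("TIME_OUT", "1800"))


if __name__ == "__main__":
    # Print only the model name so the API key is never echoed.
    print(load_eval_config(os.environ).model_name)
    print(load_request_settings(os.environ))
```

Passing the environment as a mapping rather than reading `os.environ` inside each function keeps the helpers easy to exercise with a plain dictionary.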