[Issue]: Substantial Indexation Speed Degradation v0.3.2 -> v1.2.0 #1733
Labels
awaiting_response
Maintainers or community have suggested solutions or requested info, awaiting filer response
triage
Default label assignment, indicates new issue needs reviewed by a maintainer
Describe the issue
The time for creating a new graph index increased from 6.5 minutes in v0.3.2 to 33 minutes in v1.2.0.
The increase seems to come from the entity_extraction step, which is very slow for me in the latest version. I tried to replicate the environment variables of the old setup as closely as possible, but the slowdown persists.
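One way to check whether the two environments really match is to diff the `.env` files of the two project roots. This is a minimal sketch, assuming both files use plain `KEY=VALUE` lines; `parse_env` and `diff_env` are hypothetical helpers, not part of graphrag.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def diff_env(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key missing from one side shows up with None on that side.
    """
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Usage would look like `diff_env(parse_env(Path(old_root, ".env").read_text()), parse_env(Path(new_root, ".env").read_text()))`, where `old_root` and `new_root` are the two project directories. The result contains values verbatim, so redact API keys before posting it. Comparing `settings.yaml` would need a YAML parser and is not covered by this sketch.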
To check that the increased compute time isn’t due to different prompts from prompt tuning, I ran a pipeline where I copied the tuned prompt from entity_extraction.txt in v0.3.2 into a v1.2.0 pipeline (copied over after the tuning step, of course), but the entity extraction is still slow for me. This leads me to believe something might be broken in the library, or that I'm missing something.
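To rule out a copy mistake, the two `entity_extraction.txt` files can be compared directly. This is a minimal sketch: `prompt_digest` and `same_prompt` are hypothetical helpers that hash each file after normalizing Windows (CRLF) line endings, so only real content differences count.

```python
import hashlib
from pathlib import Path


def prompt_digest(path: Path) -> str:
    """SHA-256 of a prompt file, with CRLF line endings normalized to LF."""
    data = path.read_bytes().replace(b"\r\n", b"\n")
    return hashlib.sha256(data).hexdigest()


def same_prompt(path_a: Path, path_b: Path) -> bool:
    """True if both prompt files have identical content (ignoring CRLF vs LF)."""
    return prompt_digest(path_a) == prompt_digest(path_b)
```

For example, `same_prompt(Path(old_root, "prompts", "entity_extraction.txt"), Path(new_root, "prompts", "entity_extraction.txt"))`, assuming the tuned prompts were written to a `prompts/` folder in each root (adjust the paths to your layout).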
Steps to reproduce
v1.2.0
```python
# ----------- imports -----------
import subprocess
from pathlib import Path

# Placeholders: the actual values are not shown in this report.
ROOT_PATH = "./ragtest"
domain = "your domain here"
capture_output = True

# ----------- graphrag init -----------
command_list = [
    "python",
    "-m",
    "graphrag",
    "init",
    "--root", ROOT_PATH,
]
result = subprocess.run(command_list, capture_output=capture_output, text=True)

# Modify settings / .env file (step omitted here)...

# ----------- Prompt Tune -----------
cmd = [
    "python", "-m", "graphrag", "prompt-tune",
    "--root", ROOT_PATH,
    "--config", str(Path(ROOT_PATH, "settings.yaml")),
    "--domain", domain,
    "--no-discover-entity-types",
]
subprocess.run(cmd, check=True)

# ----------- Indexing -----------
cmd = [
    "python",
    "-m",
    "graphrag", "index",
    "--root", ROOT_PATH,
    "--output", str(Path(ROOT_PATH, "output")),
    "--verbose",
]
subprocess.run(cmd, check=True)
```
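To confirm how the extra time is distributed across the steps, each CLI call could be timed on its own. This is a minimal sketch: `timed_run` and `slowdown` are hypothetical helpers, and they measure whole-command wall-clock time only, not individual workflows such as entity extraction.

```python
import subprocess
import time


def timed_run(cmd: list[str]) -> float:
    """Run a CLI command (raising CalledProcessError on failure) and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def slowdown(old_seconds: float, new_seconds: float) -> float:
    """How many times longer the new run takes than the old one."""
    return new_seconds / old_seconds
```

For example, `elapsed = timed_run(cmd)` can replace the final `subprocess.run(cmd, check=True)` in each script. With the figures reported in this issue, `slowdown(6.5 * 60, 33 * 60)` is roughly 5.1.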
v0.3.2
```python
# ----------- imports -----------
import subprocess
from pathlib import Path

# Placeholder: the actual value is not shown in this report.
ROOT_PATH = "./ragtest"

# ----------- graphrag init -----------
command_setup = [
    "python",
    "-m",
    "graphrag.index",
    "--init",
    "--root",
    ROOT_PATH,
]
subprocess.run(command_setup, capture_output=False, text=False)

# Modify settings / .env file (step omitted here)...

# ----------- Prompt Tune -----------
cmd = [
    "python", "-m", "graphrag.prompt_tune",
    "--root", ROOT_PATH,
    "--config", str(Path(ROOT_PATH, "settings.yaml")),
    "--no-entity-types",
]
subprocess.run(cmd, check=True)

# ----------- Indexing -----------
cmd = [
    "python",
    "-m",
    "graphrag.index",
    "--root",
    ROOT_PATH,
]
subprocess.run(cmd, check=True)
```
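Because both scripts launch `"python"` from the `PATH`, the subprocess may not run in the same interpreter or virtual environment as the calling script. This is a minimal sketch for checking which graphrag release an interpreter actually has installed, using the standard-library `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "graphrag") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Run it with the same interpreter the commands use. Alternatively, passing `sys.executable` instead of `"python"` in the command lists pins each subprocess to the current interpreter.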
GraphRAG Config Used
Logs and screenshots
No response
Additional Information