Skip to content

feat: kg#780

Open
mykhailobuleshnyi wants to merge 4 commits into
mainfrom
feat/kg
Open

feat: kg#780
mykhailobuleshnyi wants to merge 4 commits into
mainfrom
feat/kg

Conversation

@mykhailobuleshnyi

@mykhailobuleshnyi mykhailobuleshnyi commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Note

Medium Risk
New ingestion path writes to Neo4j with LLM output and heuristic entity linking; edge-only ACL metadata and ontology filtering reduce leak risk but wrong merges or bad extractions could still pollute the graph.

Overview
Adds LLM-driven knowledge graph ingestion to Dynamiq: documents can be turned into Neo4j-ready node/relationship payloads and upserted in workflows alongside vector stores.

EntityExtractor runs extraction per document and emits shapes compatible with Neo4jGraphStore.write_graph. Optional Ontology / Triple schemas constrain the LLM in the prompt and hard-filter results; ontology attributes become HAS_ATTRIBUTEAttributeValue edges instead of node properties. Document metadata and provenance are applied only on relationship properties (entities stay identity-only).

KnowledgeGraphWriter subclasses the extractor to add optional trigram-based entity resolution against the live graph, then write_graph (Neo4j today; other backends error until implemented). Neo4jGraphWriter is a separate thin writer for pre-built payloads.

New GraphRAG examples (kg_ingestion.py, kg_question_answering.py) wire parallel Qdrant + Neo4j ingestion and an agent with vector search + CypherExecutor. Unit tests cover sanitization, JSON parsing, and stub-LLM execution paths.

Reviewed by Cursor Bugbot for commit 3148884. Bugbot is set up for automated code reviews on this repo. Configure here.

@mykhailobuleshnyi mykhailobuleshnyi requested a review from a team as a code owner June 11, 2026 13:28
verify_ssl=self.connection.verify_ssl,
timeout=self.connection.timeout,
)
return Neo4jGraphStore(connection=self.connection, client=client, database=self.database)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bypasses ConnectionManager for graph client

Medium Severity

KnowledgeGraph builds its graph store with self.connection.connect() instead of the shared client from ConnectionManager via ConnectionNode. That skips pooling/reuse, can open extra Neo4j drivers for the same connection, and never gets ensure_client() reconnection during run().

Fix in Cursor Fix in Web

Triggered by project rule: Bugbot Rules for Dynamiq

Reviewed by Cursor Bugbot for commit 83dea14. Configure here.

raise NotImplementedError(
f"{type(self).__name__}: write is not supported for backend "
f"{type(self._graph_store).__name__}. Only Neo4j currently implements write_graph."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uninitialized store misreported as unsupported

Medium Severity

When _graph_store is still None (e.g. init_components was not run), execute raises NotImplementedError naming NoneType as an unsupported backend instead of indicating the graph store was never initialized.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 83dea14. Configure here.

"keys": [],
}

result = self._graph_store.write_graph(nodes=nodes, relationships=relationships, database=self.database)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neo4j writer missing store guard

Medium Severity

Neo4jGraphWriter.execute calls self._graph_store.write_graph without verifying _graph_store was created in init_components, so a direct or misconfigured run can raise AttributeError instead of a controlled error.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 83dea14. Configure here.

continue
existing = self._existing_nodes(label)
if any(ex_id == new_id for ex_id, _ in existing):
continue # same id already in graph -> write_graph MERGE handles it

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graph id type mismatch skips dedup

Medium Severity

Duplicate resolution compares existing graph id values to extracted string ids with == without normalizing types, so the same logical id can fail the “already in graph” check and trigger redundant trigram linking or duplicate nodes.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 83dea14. Configure here.

relationship_types: list[str] | None = None
ontology: Ontology | None = None
response_format: dict[str, Any] | None = None
input_schema: ClassVar[type[EntityExtractorInputSchema]] = EntityExtractorInputSchema

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing YAML roundtrip tests

Medium Severity

New nodes EntityExtractor, KnowledgeGraph, and Neo4jGraphWriter (nested llm and connection serialization) ship without workflow YAML serialize/deserialize roundtrip coverage required for this project.

Additional Locations (2)
Fix in Cursor Fix in Web

Triggered by project rule: Bugbot Rules for Dynamiq

Reviewed by Cursor Bugbot for commit 83dea14. Configure here.

@github-actions

Copy link
Copy Markdown

Coverage

Coverage Report •
FileStmtsMissCoverMissing
dynamiq/nodes/extractors
   entity_extractor.py1975174%116, 119–121, 128, 157, 186–187, 198–199, 204–208, 212, 214, 225–228, 238–244, 246, 250–258, 260–262, 266–268, 272, 296–298, 301, 343, 381
   knowledge_graph.py1117829%22–23, 27–30, 76, 79–81, 85–88, 92–94, 100–101, 108, 112–113, 118, 121–122, 124–125, 127–128, 132, 140, 145, 156–160, 163–165, 171–180, 182–189, 194–195, 197–205, 208–216
dynamiq/nodes/writers
   graph.py462252%51–53, 57–64, 73–75, 77–78, 80–82, 90–91, 95
TOTAL33661983770% 

Tests Skipped Failures Errors Time
2467 2 💤 0 ❌ 0 🔥 2m 50s ⏱️

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 3148884. Configure here.

if rel["start_identity"] in id_remap:
rel["start_identity"] = id_remap[rel["start_identity"]]
if rel["end_identity"] in id_remap:
rel["end_identity"] = id_remap[rel["end_identity"]]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attribute IDs not remapped

Medium Severity

When resolve_duplicates remaps an extracted entity id onto an existing graph node, only exact id matches in id_remap are updated. AttributeValue node ids and relationship identities use the {entity_id}::{attr_key} form, so they keep the old entity prefix while HAS_ATTRIBUTE starts point at the canonical id, leaving mismatched attribute nodes and duplicate attribute values on re-ingestion.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 3148884. Configure here.

@mykhailobuleshnyi mykhailobuleshnyi changed the title feat/kg feat: kg Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant