
Add KG Library Layer 4: Graph Backend Abstraction#2

Open
ydzhu98 wants to merge 7 commits into main from yzhu/kg-layer4-clean

Conversation

@ydzhu98 (Owner) commented Jan 25, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

This PR implements Layer 4 of the Knowledge Graph Query Library for Mellea: Graph Backend Abstraction. This layer provides the foundation for executing graph queries across different database systems, starting with Neo4j support.

What's Included

1. Core Data Structures (base.py)

Pure dataclasses for representing graph data:

  • GraphNode: Represents a node with id, label, and properties
  • GraphEdge: Represents an edge connecting two nodes
  • GraphPath: Represents a path through the graph

Key features:

  • Simple dataclasses (not Components)
  • Factory methods for creating from Neo4j objects
  • Full type safety with type hints
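
A minimal sketch of what such dataclasses might look like. Only `id`, `label`, and `properties` on `GraphNode` come from the description above; the edge and path fields (`source`, `target`, `nodes`, `edges`) are assumptions for illustration and may differ from what `base.py` actually defines.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class GraphNode:
    """A node with an id, a label, and arbitrary properties."""

    id: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphEdge:
    """An edge connecting two nodes (field names are hypothetical)."""

    id: str
    source: GraphNode
    label: str
    target: GraphNode
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphPath:
    """A path as parallel lists of nodes and edges (layout is hypothetical)."""

    nodes: list[GraphNode] = field(default_factory=list)
    edges: list[GraphEdge] = field(default_factory=list)


alice = GraphNode(id="n1", label="Person", properties={"name": "Alice"})
movie = GraphNode(id="n2", label="Movie", properties={"title": "Example"})
acted_in = GraphEdge(id="e1", source=alice, label="ACTED_IN", target=movie)
path = GraphPath(nodes=[alice, movie], edges=[acted_in])
```

Because these are plain dataclasses, equality is field-by-field, which is what makes the "Equality testing" cases in the test suite straightforward to write.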

2. Abstract Backend Interface (graph_dbs/base.py)

GraphBackend ABC following Mellea's Backend pattern:

class GraphBackend(ABC):
    - execute_query(query) -> GraphResult
    - get_schema() -> dict
    - validate_query(query) -> (bool, error)
    - supports_query_type(type) -> bool
    - close()

Design principles:

  • Follows Mellea's Backend(model_id, model_options) pattern
  • Abstract methods for core operations
  • Backend-agnostic interface
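
The outline above could be written as a real ABC roughly as follows. This is a sketch, not the code in `graph_dbs/base.py`: the parameter and return types are assumptions, as is the choice of which methods are `async` (inferred from the `await` calls in the API examples), and `EchoBackend` is a hypothetical toy subclass.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any


class GraphBackend(ABC):
    """Sketch of a backend-agnostic graph interface (signatures are assumed)."""

    def __init__(self, backend_id: str, backend_options: dict[str, Any] | None = None):
        self.backend_id = backend_id
        self.backend_options = backend_options or {}

    @abstractmethod
    async def execute_query(self, query: Any) -> Any: ...

    @abstractmethod
    async def get_schema(self) -> dict[str, Any]: ...

    @abstractmethod
    async def validate_query(self, query: Any) -> tuple[bool, str | None]: ...

    @abstractmethod
    def supports_query_type(self, query_type: str) -> bool: ...

    @abstractmethod
    async def close(self) -> None: ...


class EchoBackend(GraphBackend):
    """Hypothetical toy backend that returns the query unchanged."""

    async def execute_query(self, query: Any) -> Any:
        return query

    async def get_schema(self) -> dict[str, Any]:
        return {"node_types": [], "edge_types": []}

    async def validate_query(self, query: Any) -> tuple[bool, str | None]:
        return True, None

    def supports_query_type(self, query_type: str) -> bool:
        return True

    async def close(self) -> None:
        return None
```

Instantiating `GraphBackend` directly raises `TypeError`, so a new backend (Neptune, RDF, and so on) has to implement every abstract method before it can be used.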

3. Neo4j Backend Implementation (graph_dbs/neo4j.py)

Full Neo4j integration:

Features:

  • ✅ Execute Cypher queries with parameters
  • ✅ Parse Neo4j results into GraphNode/GraphEdge/GraphPath
  • ✅ Retrieve graph schema (node types, edge types, properties)
  • ✅ Validate Cypher syntax using EXPLAIN
  • ✅ Automatic deduplication of nodes and edges
  • ✅ Support for paths
  • ✅ Async/await throughout
  • ✅ Proper connection management

Key implementation details:

  • Uses both sync and async Neo4j drivers
  • Caches nodes during parsing to handle relationships
  • Deduplicates results across multiple records
  • Handles Neo4j-specific types (Node, Relationship, Path)
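
To illustrate the node cache and deduplication, here is a sketch that works on plain dictionaries rather than Neo4j driver objects. The record layout (`"nodes"`, `"edges"`, `"source"`, `"target"` keys) and the function name `collect_unique` are hypothetical, and the sketch assumes each record lists an edge's endpoint nodes before the edge itself.

```python
from __future__ import annotations

from typing import Any


def collect_unique(
    records: list[dict[str, Any]],
) -> tuple[dict[str, dict], dict[str, dict]]:
    """Merge nodes and edges from many records, keeping one entry per id."""
    nodes: dict[str, dict] = {}  # node cache, keyed by node id
    edges: dict[str, dict] = {}

    for record in records:
        for node in record.get("nodes", []):
            # First occurrence wins; later duplicates are ignored.
            nodes.setdefault(node["id"], node)
        for edge in record.get("edges", []):
            if edge["id"] in edges:
                continue
            # Resolve endpoints through the cache so every edge
            # points at the same shared node objects.
            edges[edge["id"]] = {
                **edge,
                "source": nodes[edge["source"]],
                "target": nodes[edge["target"]],
            }
    return nodes, edges


records = [
    {
        "nodes": [{"id": "a"}, {"id": "b"}],
        "edges": [{"id": "e1", "source": "a", "target": "b"}],
    },
    {
        "nodes": [{"id": "a"}, {"id": "c"}],
        "edges": [
            {"id": "e1", "source": "a", "target": "b"},
            {"id": "e2", "source": "a", "target": "c"},
        ],
    },
]
unique_nodes, unique_edges = collect_unique(records)
```

Keying both collections by id means a node returned in several records, or by both branches of a `UNION`, is stored once.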

4. Mock Backend for Testing (graph_dbs/mock.py)

Testing utility:

Features:

  • Predefined mock data
  • Query history tracking
  • Always validates queries as valid
  • Supports all query types
  • No database connection required

Use cases:

  • Unit testing without database
  • Development without Neo4j
  • CI/CD pipelines
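
The idea behind the mock can be sketched in a few lines. `TinyMockBackend` and its attribute names (`mock_nodes`, `query_history`) are hypothetical and stand in for `MockGraphBackend`, whose actual constructor and return types are not shown in this description.

```python
import asyncio
from typing import Any, Optional


class TinyMockBackend:
    """Hypothetical stand-in illustrating the mock-backend idea."""

    def __init__(self, mock_nodes: Optional[list] = None):
        self.mock_nodes = list(mock_nodes or [])
        self.query_history: list = []

    async def execute_query(self, query: Any) -> dict:
        # Record the query, then return the predefined data untouched.
        self.query_history.append(query)
        return {"nodes": list(self.mock_nodes), "edges": []}

    async def validate_query(self, query: Any) -> tuple:
        # The mock treats every query as valid.
        return True, None

    def supports_query_type(self, query_type: str) -> bool:
        return True


mock = TinyMockBackend(mock_nodes=[{"id": "n1", "label": "Person"}])
result = asyncio.run(mock.execute_query("MATCH (n) RETURN n"))
```

A test can then assert on `mock.query_history` to check which queries the code under test issued, without a database connection.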

5. Minimal Component Stubs

Temporary implementations for testing Layer 4:

  • components/query.py: Minimal GraphQuery class
  • components/result.py: Minimal GraphResult class
  • components/traversal.py: Minimal GraphTraversal class

Note: These will be replaced with full Component implementations in Layer 2.

Testing

Test Structure

test/contribs/kg/
├── test_base.py              # Data structure tests (9 tests)
├── test_mock_backend.py      # Mock backend tests (7 tests)
└── test_neo4j_backend.py     # Neo4j integration tests (14 tests)

Test Coverage

Total: 30 tests

  • 18 passing (without Neo4j)
  • 12 skipped (require running Neo4j instance)

Test Categories

  1. Base Data Structures (9 tests)

    • Node/Edge/Path creation
    • Property handling
    • Equality testing
  2. Mock Backend (7 tests)

    • Creation and configuration
    • Schema retrieval
    • Query validation
    • History tracking
  3. Neo4j Backend (14 tests)

    • Connection management
    • Query execution
    • Parameter binding
    • Result parsing
    • Schema retrieval
    • Query validation
    • Error handling

Running Tests

# All tests (mocks pass, Neo4j tests skip if no instance)
uv run pytest test/contribs/kg/ -v

# With Neo4j running:
docker run --rm -p 7687:7687 -p 7474:7474 \
    -e NEO4J_AUTH=neo4j/testpassword \
    neo4j:latest

uv run pytest test/contribs/kg/test_neo4j_backend.py -v
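
One way integration tests can decide whether to skip is to probe the Bolt port before running. The helper below is a sketch under that assumption; the name `neo4j_reachable` is hypothetical, and the description does not say how the PR's tests actually detect a running instance.

```python
import socket


def neo4j_reachable(
    host: str = "localhost", port: int = 7687, timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to the Bolt port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

With pytest, a check like this could feed `pytest.mark.skipif(not neo4j_reachable(), reason="Neo4j not running")` on the integration tests.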

Module Structure

Following the design document requirements:

mellea/contribs/kg/
├── __init__.py                # Public API exports
├── base.py                    # Core data structures
├── graph_dbs/                 # Backend implementations
│   ├── __init__.py
│   ├── base.py                # GraphBackend ABC
│   ├── neo4j.py               # Neo4jBackend
│   └── mock.py                # MockGraphBackend
├── components/                # Minimal stubs for Layer 4
│   ├── __init__.py
│   ├── query.py               # GraphQuery (minimal)
│   ├── result.py              # GraphResult (minimal)
│   └── traversal.py           # GraphTraversal (minimal)
├── sampling/                  # Empty (Layer 3)
│   └── __init__.py
├── requirements/              # Empty (Layer 3)
│   └── __init__.py
└── README.md                  # Documentation

Design Decisions

1. Data Structures vs Components

  • GraphNode, GraphEdge, GraphPath are dataclasses, not Components
  • Simple, pure data representation
  • Components (with format_for_llm()) come in Layer 2

2. Backend Pattern

  • Follows Mellea's Backend abstraction for LLMs
  • backend_id and backend_options similar to model_id and model_options
  • Abstract methods for core operations
  • Easy to add new backends (Neptune, RDF, etc.)

3. Neo4j Element IDs

  • Uses element_id instead of deprecated id property
  • Compatible with the 5.x and 6.x releases of the Neo4j Python driver
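
A sketch of a factory that reads `element_id`. The helper name `node_from_neo4j` and the label choice (first label in sorted order) are assumptions, not necessarily what the PR does; `FakeNode` is a test double exposing the same `element_id`, `labels`, and `items()` members that the driver's `Node` class provides.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GraphNode:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


def node_from_neo4j(node: Any) -> GraphNode:
    """Build a GraphNode from an object shaped like neo4j.graph.Node."""
    labels = sorted(node.labels)  # sorted only to make the choice deterministic
    return GraphNode(
        id=node.element_id,  # string id; used instead of the deprecated integer `id`
        label=labels[0] if labels else "",
        properties=dict(node.items()),
    )


class FakeNode:
    """Test double exposing only the members used above."""

    def __init__(self, element_id: str, labels: list, props: dict):
        self.element_id = element_id
        self.labels = frozenset(labels)
        self._props = props

    def items(self):
        return self._props.items()


person = node_from_neo4j(FakeNode("4:abc:0", ["Person"], {"name": "Alice"}))
```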

4. Result Deduplication

  • Automatically deduplicates nodes and edges
  • Handles UNION queries correctly
  • Maintains node cache for efficient edge creation

5. Async-First

  • All I/O operations are async
  • Uses Neo4j's AsyncGraphDatabase
  • Maintains sync driver for compatibility

API Examples

Basic Usage

import asyncio

from mellea.contribs.kg.graph_dbs import Neo4jBackend
from mellea.contribs.kg.components import GraphQuery


async def main() -> None:
    # Connect
    backend = Neo4jBackend(
        connection_uri="bolt://localhost:7687",
        auth=("neo4j", "password"),
    )
    try:
        # Query
        query = GraphQuery(
            query_string="MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p, m",
            parameters={},
        )
        result = await backend.execute_query(query)

        # Results
        for node in result.nodes:
            print(f"{node.label}: {node.properties}")
    finally:
        # Release the connection even if the query fails
        await backend.close()


asyncio.run(main())

Validation

# Inside an async function, with `backend` created as in Basic Usage
query = GraphQuery(query_string="MATCH (n) RETURN n")
is_valid, error = await backend.validate_query(query)

if not is_valid:
    print(f"Invalid: {error}")
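
For context, EXPLAIN-based validation asks the database to plan a query without executing it, so a syntax error surfaces as an exception and no data is touched. The sketch below shows the pattern with a pluggable `run` coroutine; `validate_with_explain` and `fake_run` are hypothetical names, and a real backend would pass the statement to the driver and catch its specific syntax-error exception rather than a bare `Exception`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


async def validate_with_explain(
    run: Callable[[str], Awaitable[object]], cypher: str
) -> Tuple[bool, Optional[str]]:
    """Ask the database to plan, but not execute, the query."""
    try:
        await run(f"EXPLAIN {cypher}")
    except Exception as exc:  # stand-in for the driver's syntax error type
        return False, str(exc)
    return True, None


async def fake_run(statement: str) -> None:
    """Hypothetical runner that rejects anything without a RETURN clause."""
    if "RETURN" not in statement:
        raise ValueError("query has no RETURN clause")


ok = asyncio.run(validate_with_explain(fake_run, "MATCH (n) RETURN n"))
bad = asyncio.run(validate_with_explain(fake_run, "MATCH (n)"))
```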

Schema

# Inside an async function, with `backend` created as in Basic Usage
schema = await backend.get_schema()
print(f"Node types: {schema['node_types']}")
print(f"Edge types: {schema['edge_types']}")

Documentation

Next Steps

After this PR is merged:

  1. Layer 2 PR: Implement full Graph Query Components

    • Convert minimal stubs to full Components
    • Add format_for_llm() implementations
    • Implement fluent Cypher query builder
    • Add multiple result format styles
  2. Layer 3 PR: Add LLM-Guided Query Construction

    • Implement @generative functions
    • Add validation strategies
    • Create query requirements
  3. Layer 1 PR: Application Examples

    • End-to-end usage examples
    • KGRag integration
    • Best practices documentation

Commits

This PR contains the following components:

  1. Core data structures (base.py)
  2. Abstract backend interface
  3. Neo4j backend implementation
  4. Mock backend for testing
  5. Minimal component stubs
  6. Comprehensive test suite
  7. Documentation

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Implements Layer 4 of the Knowledge Graph Query Library with Neo4j support:

Core Components:
- GraphNode, GraphEdge, GraphPath dataclasses (base.py)
- GraphBackend abstract interface (graph_dbs/base.py)
- Neo4jBackend full implementation (graph_dbs/neo4j.py)
- MockGraphBackend for testing (graph_dbs/mock.py)

Features:
✓ Execute Cypher queries with parameters
✓ Parse Neo4j results into graph structures
✓ Retrieve graph schema (node/edge types, properties)
✓ Validate Cypher syntax
✓ Automatic result deduplication
✓ Async/await throughout
✓ Proper connection management

Testing:
- 30 comprehensive tests (18 passing, 12 skip without Neo4j)
- Unit tests for data structures
- Mock backend tests
- Neo4j integration tests

Documentation:
- Complete README with examples
- Design document included
- Implementation summary for reviewers

No new dependencies (neo4j>=6.0.3 already in project).

Follows Mellea conventions and Backend pattern.
@github-actions

The PR description has been updated. Please fill out the template for your PR to be reviewed.

yzhu added 3 commits January 25, 2026 10:04
- Replace list slice with next(iter()) for RUF015
- Add docstrings to all __init__.py files for D104
- Add type ignore comments for neo4j imports (no type stubs)
- Add assertions for Neo4j relationship nodes (mypy union-attr)
- Run ruff formatter on all files

All tests still pass (18 passing, 12 skipped).

- Convert text-based architecture overview to Mermaid flowchart
- Add color coding for each layer
- Improve visual clarity and readability
- Maintains same 4-layer structure
@ydzhu98 force-pushed the yzhu/kg-layer4-clean branch from 17d640e to ef05884 on January 25, 2026 22:20