CONTRIBUTING.md (24 changes: 15 additions & 9 deletions)
@@ -83,7 +83,7 @@ deriva/
├── common/ (Shared utilities)
│   ├── types.py - Shared TypedDicts, Protocols, ProgressReporter
- │   ├── logging.py - Pipeline logging (JSON Lines)
+ │   ├── logging.py - Pipeline logging with structlog (JSON Lines output)
│   ├── chunking.py - File chunking with overlap support
│   └── utils.py - File encoding, helpers
@@ -719,13 +719,14 @@ def add_node(self, node: GraphNode, node_id: str | None = None) -> str:

### Overview

- The LLM adapter (`adapters/llm/`) provides a unified interface for multiple LLM providers with caching and structured output support.
+ The LLM adapter (`adapters/llm/`) provides a unified interface for multiple LLM providers, using **pydantic-ai** for agent-based interactions with automatic retries, caching, and structured output support.

**Supported Providers:**

- **Azure OpenAI** - Enterprise Azure deployments
- **OpenAI** - Direct OpenAI API
- **Anthropic** - Claude models
+ - **Mistral** - Mistral AI models
- **Ollama** - Local LLM inference (no API key required)

### Basic Usage
@@ -744,9 +745,9 @@ else:
print(f"Error: {response.error}")
```
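Since the diff collapses most of this example, here is a hedged sketch of the full basic-usage flow. The `LLMManager` name, import path, and constructor arguments are assumptions; only `query()`, `response.content`, `response.error`, and `response.response_type` are visible in this diff:

```python
# Hedged sketch: LLMManager and its constructor arguments are assumptions;
# only query() and the response fields shown above come from the diff.
from deriva.adapters.llm import LLMManager  # assumed export name

llm = LLMManager(provider="ollama")  # assumed signature; Ollama needs no API key

response = llm.query(prompt="Summarize this repository in one sentence.")
if response.error:
    print(f"Error: {response.error}")
else:
    # response.response_type distinguishes "live" calls from cached replies
    print(response.content)
```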

- ### Structured Output with Pydantic
+ ### Structured Output with pydantic-ai

- Use `response_model` to get validated, type-safe responses:
+ Use `response_model` to get validated, type-safe responses via pydantic-ai agents:

```python
from pydantic import BaseModel, Field
@@ -1035,23 +1036,28 @@ def classify_files(
- Registry comes from DatabaseManager (passed as data, not manager)

#### `logging.py`
- **Goal:** JSON Lines logging for pipeline runs with configurable verbosity.
+ **Goal:** JSON Lines logging for pipeline runs using structlog with configurable verbosity.

```python
class LogLevel(int, Enum):
    PHASE = 1   # High-level: classification, extraction, derivation
    STEP = 2    # Steps: Repository, Directory, File, etc.
    DETAIL = 3  # Item-level: each file, node, edge

- class PipelineLogger:
-     def log(self, level: int, phase: str, status: str, ...) -> None
+ class RunLogger:
+     """Structured logger using structlog for JSON Lines output."""
+     def phase_start(self, phase: str, message: str = "") -> None
+     def phase_end(self, phase: str, message: str = "") -> None
+     def step(self, step_name: str) -> StepContext  # Context manager
+     def get_entries(self, min_level: int = 1) -> List[LogEntry]
```

**Rules:**
- - Logs stored in `logs/run_{id}/log_{datetime}.jsonl`
+
+ - Uses structlog for structured logging with JSON Lines output
+ - Logs stored in `workspace/logs/run_{id}/`
- Use level 1 for phase start/end, level 2 for steps, level 3 for details
- - Logger instance created in app.py, passed to extraction functions
+ - Logger instance created in services, passed to extraction/derivation functions
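A hedged usage sketch of the new `RunLogger` API: the method names come from the interface sketch above, while the import path, constructor arguments, and the exact `get_entries` filtering semantics are assumptions:

```python
# Hedged sketch: RunLogger's constructor is an assumption; phase_start,
# phase_end, step, and get_entries are the methods shown in the diff above.
from deriva.common.logging import LogLevel, RunLogger  # assumed import path

logger = RunLogger(run_id="run_42")  # assumed constructor arguments

logger.phase_start("extraction", "Scanning repository")
with logger.step("Directory"):  # step() returns a context manager
    pass                        # per-item work is logged at DETAIL level
logger.phase_end("extraction", "Done")

# Retrieve collected entries filtered by level (1=PHASE, 2=STEP, 3=DETAIL);
# exact filtering semantics are per the implementation.
entries = logger.get_entries(min_level=LogLevel.STEP)
```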

#### `utils.py`
**Goal:** Shared utility functions for file handling and data processing.
deriva/adapters/llm/README.md (29 changes: 17 additions & 12 deletions)
@@ -1,12 +1,12 @@
# LLM Adapter

- Multi-provider LLM abstraction with caching and structured output support.
+ Multi-provider LLM abstraction using pydantic-ai with caching and structured output support.

- **Version:** 1.0.0
+ **Version:** 2.0.0

## Purpose

- The LLM adapter provides a unified interface for querying multiple LLM providers (Azure OpenAI, OpenAI, Anthropic, Ollama, LM Studio) with automatic caching and Pydantic-based structured output parsing.
+ The LLM adapter provides a unified interface for querying multiple LLM providers (Azure OpenAI, OpenAI, Anthropic, Mistral, Ollama, LM Studio) using **pydantic-ai** for agent-based interactions with automatic retries and Pydantic-based structured output parsing.

## Key Exports

@@ -56,7 +56,9 @@ if response.response_type == "live":
    print(response.content)
```

- ## Structured Output with Pydantic
+ ## Structured Output with pydantic-ai
+
+ Uses pydantic-ai agents for type-safe, validated responses:

```python
from pydantic import BaseModel, Field
@@ -72,7 +74,7 @@ result = llm.query(
prompt="Extract the main business concept from this code...",
response_model=BusinessConcept
)
- # result is a validated BusinessConcept instance
+ # result is a validated BusinessConcept instance (via pydantic-ai agent)
print(result.name)
```
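Because the hunk above collapses the middle of this example, here is a hedged reconstruction of the full flow. The `BusinessConcept` fields are illustrative assumptions; only the import, the shape of the `llm.query(...)` call, and `result.name` appear in the diff:

```python
from pydantic import BaseModel, Field

# Illustrative model: the field names and descriptions are assumptions,
# not taken from this diff.
class BusinessConcept(BaseModel):
    name: str = Field(description="Name of the business concept")
    description: str = Field(description="One-sentence summary")

# Assumes an `llm` manager instance as in Basic Usage above.
result = llm.query(
    prompt="Extract the main business concept from this code...",
    response_model=BusinessConcept,
)
# result is a validated BusinessConcept instance (via pydantic-ai agent)
print(result.name)
```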

Expand Down Expand Up @@ -111,13 +113,16 @@ LLM_LMSTUDIO_LOCAL_URL=http://localhost:1234/v1/chat/completions

## Providers

- | Provider | Class | Description |
- |----------|-------|-------------|
- | Azure OpenAI | `AzureOpenAIProvider` | Azure-hosted OpenAI models |
- | OpenAI | `OpenAIProvider` | OpenAI API direct |
- | Anthropic | `AnthropicProvider` | Claude models |
- | Ollama | `OllamaProvider` | Local Ollama models |
- | LM Studio | `LMStudioProvider` | Local LM Studio (OpenAI-compatible) |
+ All providers are implemented via pydantic-ai's model abstraction:
+
+ | Provider | pydantic-ai Model | Description |
+ |----------|-------------------|-------------|
+ | Azure OpenAI | `AzureOpenAIModel` | Azure-hosted OpenAI models |
+ | OpenAI | `OpenAIModel` | OpenAI API direct |
+ | Anthropic | `AnthropicModel` | Claude models |
+ | Mistral | `MistralModel` | Mistral AI models |
+ | Ollama | `OllamaModel` | Local Ollama models |
+ | LM Studio | `OpenAIModel` | Local LM Studio (OpenAI-compatible) |
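To make the mapping concrete, a hedged sketch of how a configured provider name might be resolved to a pydantic-ai model. The import paths below exist in pydantic-ai, but constructor arguments (especially base-URL handling for local servers) vary across pydantic-ai versions; the table's `AzureOpenAIModel` and `OllamaModel` are taken on faith from this README and are not imported here:

```python
# Hedged sketch: treat as pseudocode, not the adapter's actual wiring.
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.mistral import MistralModel
from pydantic_ai.models.openai import OpenAIModel

def resolve_model(provider: str, model_name: str):
    """Map a provider key from config to a pydantic-ai model instance."""
    if provider == "anthropic":
        return AnthropicModel(model_name)
    if provider == "mistral":
        return MistralModel(model_name)
    if provider in ("openai", "lmstudio"):
        # LM Studio exposes an OpenAI-compatible endpoint, hence OpenAIModel
        # in the table; a local base URL would be supplied via provider settings.
        return OpenAIModel(model_name)
    raise ValueError(f"Unsupported provider: {provider!r}")
```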

## Response Types
