262 changes: 261 additions & 1 deletion README.md
@@ -1,2 +1,262 @@
# MarketInference

A multi-agent parallel inference orchestrator for managing and coordinating AI model interactions. This library provides a robust framework for handling parallel AI completions, tool management, and structured outputs across multiple AI providers.

## Features

### Multi-Provider Support
- OpenAI (GPT models)
- Anthropic (Claude models)
- VLLM
- LiteLLM
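
The provider is selected per thread through the `LLMClient` enum. A minimal sketch of provider selection (only the `openai` and `anthropic` members appear elsewhere in this README; treat other member names as assumptions):

```python
from minference.lite.models import LLMClient, LLMConfig

# Pick a provider per chat thread; swap in LLMClient.anthropic etc. as needed
config = LLMConfig(
    client=LLMClient.openai,
    model="gpt-4o-mini",
    max_tokens=1000,
    temperature=0
)
```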

### Core Capabilities
- Parallel request processing with intelligent rate limiting
- Tool registration and execution framework
- Structured message handling and history tracking
- Entity management with immutable patterns
- Comprehensive logging and monitoring
- Type-safe implementations using Pydantic

### Advanced Features
- Automatic schema validation for tools and outputs
- Asynchronous and synchronous execution support
- UUID-based entity tracking
- Configurable rate limiting per provider
- Message history management
- Tool execution orchestration

## Dependencies

Key dependencies and their purposes:
- **AI/ML**:
- `anthropic`: Anthropic Claude API integration
- `openai`: OpenAI API integration
- `tiktoken`: Token counting for rate limiting
- **Web/API**:
- `fastapi`: API framework support
- `uvicorn`: ASGI server
- `aiohttp`: Async HTTP client
- **Data Handling**:
- `pydantic`: Data validation and settings management
- `sqlmodel`: SQL database integration
- `polars`: Data manipulation
- **Development**:
- `pytest`: Testing framework
- `pytest-asyncio`: Async test support
- `pytest-cov`: Test coverage
- `pytest-benchmark`: Performance testing

## Installation

### Using pip
```bash
# Clone the repository
git clone https://github.com/marketagents-ai/MarketInference.git
cd MarketInference

# Install dependencies
pip install -r requirements.txt

# Optional: Install in development mode
pip install -e .
```

### Using Conda
```bash
# Clone the repository
git clone https://github.com/marketagents-ai/MarketInference.git
cd MarketInference

# Create and activate conda environment
conda create -n market-inference python=3.10
conda activate market-inference

# Install dependencies
pip install -r requirements.txt

# Optional: Install in development mode
pip install -e .
```

## Environment Setup

1. Copy the example environment file:
```bash
cp .env.example .env
```

2. Configure the following environment variables in `.env`:
```bash
# OpenAI Configuration
OPENAI_API_KEY=your_openai_key
OPENAI_CONTEXT_LENGTH=8192

# Anthropic Configuration
ANTHROPIC_API_KEY=your_anthropic_key
ANTHROPIC_CONTEXT_LENGTH=100000

# Azure OpenAI (Optional)
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=your_azure_endpoint
AZURE_OPENAI_CONTEXT_LENGTH=8192

# Rate Limiting
MAX_REQUESTS_PER_MINUTE=50
MAX_TOKENS_PER_MINUTE=100000
```
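
These values can then be read with the standard library; a minimal sketch (the variable names match the `.env` template above):

```python
import os

# Names match the .env template above
openai_key = os.environ["OPENAI_API_KEY"]
max_rpm = int(os.environ.get("MAX_REQUESTS_PER_MINUTE", "50"))
max_tpm = int(os.environ.get("MAX_TOKENS_PER_MINUTE", "100000"))
```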

## Quick Start

1. Basic usage example:
```python
from minference.lite.inference import InferenceOrchestrator
from minference.lite.models import ChatThread, LLMClient, LLMConfig, ResponseFormat

# Initialize the orchestrator (see Configuration below for request limits)
orchestrator = InferenceOrchestrator()

# Create a chat thread (weather_tool is built with CallableTool.from_callable,
# as shown in the "Using tools" section below)
chat_thread = ChatThread(
    llm_config=LLMConfig(
        client=LLMClient.openai,
        model="gpt-4",
        response_format=ResponseFormat.tool,
        max_tokens=1000,
        temperature=0
    ),
    tools=[weather_tool]
)

# Run inference
result = await orchestrator.run_inference(chat_thread)
print(result.content)
```
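
Note that `await` only works inside an async function: wrap the call in a coroutine and execute it with `asyncio.run(main())`, as shown in `examples/parallel_inference_demo.py` below.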

2. Using tools:
```python
from minference.caregistry import CallableRegistry
from minference.lite.models import CallableTool, ChatThread, LLMConfig, LLMClient, ResponseFormat

# Initialize the callable registry
CallableRegistry()

def calculate_sum(a: int, b: int) -> int:
    """Calculate the sum of two numbers."""
    return a + b

addition_tool = CallableTool.from_callable(calculate_sum)

# Use in a chat thread with tool calling
chat_thread = ChatThread(
    new_message="What is 5 + 3?",
    llm_config=LLMConfig(
        client=LLMClient.anthropic,
        model="claude-3-opus-20240229",
        response_format=ResponseFormat.tool,
        max_tokens=1000,
        temperature=0
    ),
    tools=[addition_tool]
)
```
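
Here the tool's schema is derived automatically from `calculate_sum`'s type hints and docstring (the automatic schema validation listed under Features), so keeping both accurate directly improves tool-call quality.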

## Example Files

The `examples/` directory contains several example implementations:

1. `lite_inference_example.py`: Basic inference usage
```python
# Example of basic inference with a single model
from minference.lite.inference import InferenceOrchestrator
# ... (see examples/lite_inference_example.py)
```

2. `lite_sequential_tool_inference.py`: Sequential tool execution
```python
# Example of sequential tool execution
from minference.lite.models import ChatThread
# ... (see examples/lite_sequential_tool_inference.py)
```

3. `lite_tools_inference_example.py`: Tool integration examples
```python
# Example of tool integration
from minference.caregistry import CallableRegistry
# ... (see examples/lite_tools_inference_example.py)
```

4. `tools_setup_example.py`: Complex tool setup
```python
# Example of advanced tool configuration
from minference.lite.models import StructuredTool
# ... (see examples/tools_setup_example.py)
```

5. `pathfinder.ipynb`: Interactive notebook with examples

## Advanced Usage

### Parallel Processing

```python
# Process multiple chat threads in parallel
chat_threads = [
    ChatThread(...),
    ChatThread(...),
    ChatThread(...)
]

results = await orchestrator.run_parallel_ai_completion(chat_threads)
```
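
Results are returned positionally, one per input thread, which is why the demo below pairs `results[i]` with `chat_threads[i]`.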

### Entity Management

```python
from typing import Any, Dict

from minference.lite.models import Entity

class CustomEntity(Entity):
    name: str
    data: Dict[str, Any]

    def process(self) -> None:
        # Process entity data
        pass
```
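
A sketch of how such an entity might be used; the `id` attribute name is an assumption based on the "UUID-based entity tracking" feature above:

```python
# Hypothetical usage: the `id` attribute name is assumed
entity = CustomEntity(name="example", data={"key": "value"})
print(entity.id)  # UUID assigned by the Entity base class
entity.process()
```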

## Configuration

The library can be configured through environment variables or programmatically:

```python
from minference.lite.models import RequestLimits

# Configure rate limits
limits = RequestLimits(
    max_requests_per_minute=50,
    max_tokens_per_minute=100000,
    provider="openai"
)
```
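
The limits object can then be handed to the orchestrator via the `oai_request_limits` keyword of `InferenceOrchestrator`:

```python
from minference.lite.inference import InferenceOrchestrator

orchestrator = InferenceOrchestrator(oai_request_limits=limits)
```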


## Project Structure

```
MarketInference/
├── minference/
│   ├── lite/                  # Lightweight implementation
│   │   ├── inference.py       # Core inference logic
│   │   ├── models.py          # Data models
│   │   └── requests.py        # Request handling
│   ├── core/                  # Core functionality
│   │   └── clients_models.py  # Client implementations
│   ├── base_registry.py       # Base registry implementation
│   ├── caregistry.py          # Callable registry
│   ├── enregistry.py          # Entity registry
│   └── utils.py               # Utility functions
├── tests/                     # Test suite
│   ├── test_fixtures/         # Test fixtures
│   ├── test_registration.py
│   ├── test_schemas.py
│   ├── test_execution.py
│   └── ...                    # Additional tests
├── examples/                  # Usage examples
└── requirements.txt           # Dependencies
```

106 changes: 106 additions & 0 deletions examples/parallel_inference_demo.py
@@ -0,0 +1,106 @@
"""
Example demonstrating parallel inference with multiple tools.
"""
from typing import Dict, Any
import asyncio
import random
import logging
from datetime import datetime

from minference.lite.inference import InferenceOrchestrator
from minference.lite.models import LLMConfig, LLMClient, ResponseFormat, ChatThread, MessageRole, CallableTool
from minference.caregistry import CallableRegistry
from minference.enregistry import EntityRegistry

# Set up logging
logging.basicConfig(level=logging.INFO)
EntityRegistry._logger = logging.getLogger(__name__)
CallableRegistry._logger = logging.getLogger(__name__)

# Initialize registries
EntityRegistry()
CallableRegistry()

def get_current_weather(location: str, unit: str = "fahrenheit") -> Dict[str, Any]:
"""Get the current weather in a given location."""
# Simulate API call

temperature = random.randint(0, 100)
return {
"location": location,
"temperature": temperature,
"unit": unit,
"timestamp": datetime.now().isoformat()
}

def get_stock_price(symbol: str) -> Dict[str, Any]:
"""Get the current stock price for a given symbol."""
# Simulate API call
price = random.uniform(10, 1000)
return {
"symbol": symbol,
"price": round(price, 2),
"currency": "USD",
"timestamp": datetime.now().isoformat()
}

# Register tools
CallableRegistry.register("get_current_weather", get_current_weather)
CallableRegistry.register("get_stock_price", get_stock_price)

# Create CallableTools
weather_tool = CallableTool.from_callable(get_current_weather)
stock_tool = CallableTool.from_callable(get_stock_price)

# Create orchestrator
orchestrator = InferenceOrchestrator()

# Run parallel inference
async def main():
chat_threads = [
ChatThread(
llm_config=LLMConfig(
client=LLMClient.openai,
model="gpt-4o-mini",
response_format=ResponseFormat.auto_tools, # Changed back to tool for OpenAI
max_tokens=1000,
temperature=0
),
tools=[weather_tool]
),
ChatThread(
llm_config=LLMConfig(
client=LLMClient.openai,
model="gpt-4o-mini",
response_format=ResponseFormat.auto_tools, # Changed back to tool for OpenAI
max_tokens=1000,
temperature=0
),
tools=[stock_tool]
)
]

# Add messages to threads with clear instructions
weather_message = """Please use the get_current_weather tool to check the weather in New York and Tokyo.
I need actual temperature data for both cities. Make sure to call the tool twice, once for each city.
Please provide your response in a clear, structured format."""
chat_threads[0].new_message = weather_message
chat_threads[0].add_user_message()

stock_message = """Please use the get_stock_price tool to check the current stock prices for AAPL and GOOGL.
I need the actual current prices for both stocks. Make sure to call the tool twice, once for each stock symbol.
Please provide your response in a clear, structured format."""
chat_threads[1].new_message = stock_message
chat_threads[1].add_user_message()

results = await orchestrator.run_parallel_ai_completion(chat_threads)
for i, result in enumerate(results):
print(f"\nQuery: {chat_threads[i].history[-1].content}")
print(f"Response: {result.content}")
if result.json_object:
print(f"Tool Call: {result.json_object.name}")
print(f"Arguments: {result.json_object.object}")
print("---")

if __name__ == "__main__":
asyncio.run(main())