153 changes: 153 additions & 0 deletions README.md
@@ -110,6 +110,159 @@ Additional functionality like cost modelling and MLFlow experiment tracking is e

For more details, check out our selection of end-to-end code examples in the [examples](https://github.com/awslabs/llmeter/tree/main/examples) folder!

## 🖼️ Multi-Modal Payload Support

LLMeter supports creating payloads with multi-modal content including images, videos, audio, and documents alongside text. This enables testing of modern multi-modal AI models.

### Installation for Multi-Modal Support

For enhanced format detection from file content (recommended), install the optional `multimodal` extra:

```terminal
pip install 'llmeter[multimodal]'
```

Or with uv:

```terminal
uv pip install 'llmeter[multimodal]'
```

This installs the `puremagic` library for content-based format detection using magic bytes. Without it, format detection falls back to file extensions.
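The difference between the two detection modes can be sketched with the standard library alone (this illustrates the general idea of extension-based vs. content-based detection; it is not LLMeter's internal code):

```python
import mimetypes

# Extension-based fallback: without puremagic, the format can only be
# guessed from the file name, not from the actual bytes.
mime_type, _ = mimetypes.guess_type("photo.jpg")
print(mime_type)  # image/jpeg

# Content-based detection (puremagic) instead inspects the file's leading
# "magic bytes"; a PNG, for example, always starts with this 8-byte signature:
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
```

Content-based detection is the more robust option because it still works for raw `bytes` payloads and misnamed files.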

### Basic Multi-Modal Usage

```python
from llmeter.endpoints import BedrockConverse

# Single image from file
payload = BedrockConverse.create_payload(
    user_message="What is in this image?",
    images=["photo.jpg"],
    max_tokens=256
)

# Multiple images
payload = BedrockConverse.create_payload(
    user_message="Compare these images:",
    images=["image1.jpg", "image2.png"],
    max_tokens=512
)

# Image from bytes (requires puremagic for format detection)
with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

payload = BedrockConverse.create_payload(
    user_message="What is in this image?",
    images=[image_bytes],
    max_tokens=256
)

# Mixed content types
payload = BedrockConverse.create_payload(
    user_message="Analyze this presentation and supporting materials",
    documents=["slides.pdf"],
    images=["chart.png"],
    max_tokens=1024
)

# Video analysis
payload = BedrockConverse.create_payload(
    user_message="Describe what happens in this video",
    videos=["clip.mp4"],
    max_tokens=1024
)
```

### Supported Content Types

- **Images**: JPEG, PNG, GIF, WebP
- **Documents**: PDF
- **Videos**: MP4, MOV, AVI
- **Audio**: MP3, WAV, OGG

Format support varies by model. The library detects formats automatically and lets the API endpoint validate compatibility.

### Endpoint-Specific Format Handling

Different endpoints expect different format strings:

- **Bedrock**: Uses short format strings (e.g., `"jpeg"`, `"png"`, `"pdf"`)
- **OpenAI**: Uses full MIME types (e.g., `"image/jpeg"`, `"image/png"`)
- **SageMaker**: Uses Bedrock format by default (model-dependent)

The library handles these differences automatically based on the endpoint you're using.
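As a rough illustration of what this translation involves (the mapping and function names below are hypothetical, not LLMeter's internal API):

```python
# Illustrative only: how a short Bedrock-style format string might map to
# the full MIME type that an OpenAI-style endpoint expects.
SHORT_TO_MIME = {
    "jpeg": "image/jpeg",
    "png": "image/png",
    "gif": "image/gif",
    "webp": "image/webp",
    "pdf": "application/pdf",
}

def to_mime(short_format: str) -> str:
    """Map a short format string (e.g. 'jpeg') to a full MIME type."""
    return SHORT_TO_MIME[short_format]

print(to_mime("jpeg"))  # image/jpeg
```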

### ⚠️ Security Warning: Format Detection Is NOT Input Validation

**IMPORTANT**: The format detection in this library is for testing and development convenience ONLY. It is NOT a security mechanism and MUST NOT be used with untrusted files without proper validation.

#### What This Library Does

- Detects likely file format from magic bytes (puremagic) or extension (mimetypes)
- Reads binary content from files
- Packages content for API endpoints
- Provides type checking (bytes vs strings)

#### What This Library Does NOT Do

- ❌ Validate file content safety or integrity
- ❌ Scan for malicious content or malware
- ❌ Sanitize or clean file data
- ❌ Protect against malformed or exploited files
- ❌ Guarantee format correctness beyond detection heuristics
- ❌ Validate file size or prevent memory exhaustion
- ❌ Check for embedded scripts or exploits
- ❌ Verify file authenticity or source

#### Intended Use Cases

This format detection is designed for:

- **Testing and development**: Loading known-safe test files during development
- **Internal tools**: Processing files from trusted internal sources
- **Prototyping**: Quick experimentation with multi-modal models
- **Controlled environments**: Scenarios where file sources are fully trusted

#### NOT Intended For

This format detection should NOT be used for:

- **Production user uploads**: Files uploaded by end users through web forms or APIs
- **External file sources**: Files from untrusted URLs, email attachments, or third-party systems
- **Security-sensitive applications**: Any application where file safety is critical
- **Public-facing services**: Services that accept files from the internet

#### Recommended Security Practices for Untrusted Files

When working with untrusted files (user uploads, external sources, etc.), you MUST implement proper security measures:

1. **Validate file sources**: Only accept files from trusted, authenticated sources
2. **Scan for malware**: Use antivirus/malware scanning (e.g., ClamAV) before processing
3. **Validate file integrity**: Verify checksums, digital signatures, or other integrity mechanisms
4. **Sanitize content**: Use specialized libraries to validate and sanitize file content:
- Images: Re-encode with PIL/Pillow to strip metadata and validate structure
- PDFs: Use PDF sanitization libraries to remove scripts and validate structure
- Videos: Re-encode with ffmpeg to validate and sanitize
5. **Limit file sizes**: Enforce maximum file size limits before reading into memory
6. **Sandbox processing**: Process untrusted files in isolated environments (containers, VMs)
7. **Validate API responses**: Check that API endpoints successfully processed the content
8. **Implement rate limiting**: Prevent abuse through excessive file uploads
9. **Log and monitor**: Track file processing for security auditing
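For example, item 5 (file-size limits) can be enforced before a file is ever read fully into memory. A minimal stdlib-only sketch, not part of LLMeter:

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MiB cap; tune for your use case

def read_capped(path: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Read a file, refusing up front if it exceeds the size cap."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(
            f"{path} is {size} bytes, exceeding the {max_bytes}-byte cap"
        )
    with open(path, "rb") as f:
        return f.read(max_bytes)
```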

### Backward Compatibility

Text-only payloads continue to work exactly as before:

```python
# Still works - no changes needed
payload = BedrockConverse.create_payload(
    user_message="Hello, world!",
    max_tokens=256
)
```

## Analyze and compare results

You can analyze the results of a single run or a load test by generating interactive charts. You can find examples in the [examples](examples) folder.
1 change: 1 addition & 0 deletions docs/reference/json_utils.md
@@ -0,0 +1 @@
::: llmeter.json_utils
182 changes: 151 additions & 31 deletions llmeter/callbacks/base.py
@@ -4,13 +4,21 @@

from __future__ import annotations

import importlib
import json
import logging
from abc import ABC
from typing import Any, final

from upath.types import ReadablePathLike, WritablePathLike

from ..endpoints.base import InvocationResponse
from ..json_utils import LLMeterEncoder
from ..results import Result
from ..runner import _RunConfig
from ..utils import ensure_path

logger = logging.getLogger(__name__)


class Callback(ABC):
@@ -21,8 +29,17 @@ class Callback(ABC):
associated with test runs or individual model invocations.

A Callback object may implement multiple of the defined lifecycle hooks (such as
`before_invoke`, `after_run`, etc). Callbacks support serializing their configuration via
``to_dict()`` / ``from_dict()`` (and the convenience wrappers ``save_to_file()`` /
``load_from_file()``).

Serialization uses a ``_callback_type`` marker (``"module:ClassName"``) so that
``Callback.from_dict()`` can dynamically import and reconstruct the correct subclass
without a hardcoded registry. This means third-party callbacks round-trip through JSON
automatically, as long as the defining module is importable at load time.

Subclasses with complex nested state (like ``CostModel``) can override ``to_dict()`` and
``from_dict()`` while preserving the type marker by calling ``super()``.
"""

async def before_invoke(self, payload: dict) -> None:
@@ -70,46 +87,149 @@ async def after_run(self, result: Result) -> None:
"""
pass

# -- Serialization -----------------------------------------------------------------

def to_dict(self) -> dict:
"""Serialize this callback's configuration to a JSON-safe dict.

The returned dict includes a ``_callback_type`` key with the fully-qualified
class path (``"module:ClassName"``), enabling ``Callback.from_dict`` to
reconstruct the correct subclass without a hardcoded registry.

By default, all public (non-underscore-prefixed) instance attributes are
included. Subclasses with richer state should override this method and call
``super().to_dict()`` to preserve the type marker.

Returns:
dict: A JSON-serializable dictionary representation of this callback.

Example::

>>> from llmeter.callbacks import CostModel
>>> from llmeter.callbacks.cost.dimensions import InputTokens
>>> model = CostModel(request_dims=[InputTokens(price_per_million=3.0)])
>>> d = model.to_dict()
>>> d["_callback_type"]
'llmeter.callbacks.cost.model:CostModel'
"""
cls = self.__class__
data: dict[str, Any] = {
"_callback_type": f"{cls.__module__}:{cls.__qualname__}",
}
data.update({k: v for k, v in vars(self).items() if not k.startswith("_")})
return data

@classmethod
def from_dict(cls, raw: dict, **kwargs: Any) -> Callback:
"""Reconstruct a Callback from a dict produced by ``to_dict()``.

Uses the ``_callback_type`` field to dynamically import and instantiate
the correct subclass. If called on a concrete subclass (e.g.
``CostModel.from_dict(...)``), the ``_callback_type`` is still respected
so that the dict always controls which class is created.

Args:
raw: A dictionary previously produced by ``to_dict()`` (or loaded from
JSON). Must contain a ``_callback_type`` key.
**kwargs: Extra keyword arguments forwarded to the resolved class
constructor (or its own ``from_dict`` if it overrides this method).

Returns:
Callback: An instance of the appropriate Callback subclass.

Raises:
ValueError: If ``_callback_type`` is missing from *raw*.
ImportError: If the module referenced by ``_callback_type`` cannot be
imported.
AttributeError: If the class name cannot be found in the referenced
module.

Example::

>>> from llmeter.callbacks.base import Callback
>>> d = {
... "_callback_type": "llmeter.callbacks.mlflow:MlflowCallback",
... "step": 1,
... "nested": False,
... }
>>> cb = Callback.from_dict(d) # returns an MlflowCallback instance
"""
raw = dict(raw) # shallow copy — don't mutate caller's dict
callback_type = raw.pop("_callback_type", None)
if callback_type is None:
raise ValueError(
"Cannot deserialize Callback: dict is missing '_callback_type' key. "
f"Got keys: {list(raw.keys())}"
)

module_path, class_name = callback_type.rsplit(":", 1)
module = importlib.import_module(module_path)
callback_cls = getattr(module, class_name)

# If the resolved class has its own from_dict (e.g. CostModel), delegate to it
# so that subclass-specific deserialization logic is honoured.
if callback_cls is not cls and "from_dict" in callback_cls.__dict__:
# Re-inject _callback_type so the subclass from_dict can pop it if needed
return callback_cls.from_dict(raw, **kwargs)

# Remove any keys the constructor doesn't expect (e.g. _type from JSONableBase)
raw.pop("_type", None)
return callback_cls(**raw, **kwargs)

def to_json(self, **kwargs: Any) -> str:
"""Serialize this callback to a JSON string.

Args:
**kwargs: Extra keyword arguments forwarded to ``json.dumps``
(e.g. ``indent``).

Returns:
str: JSON representation of this callback.
"""
kwargs.setdefault("cls", LLMeterEncoder)
return json.dumps(self.to_dict(), **kwargs)

@classmethod
def from_json(cls, json_string: str, **kwargs: Any) -> Callback:
"""Reconstruct a Callback from a JSON string produced by ``to_json()``.

Args:
json_string: A valid JSON string.
**kwargs: Extra keyword arguments forwarded to ``from_dict``.

Returns:
Callback: An instance of the appropriate Callback subclass.
"""
return cls.from_dict(json.loads(json_string), **kwargs)

def save_to_file(self, path: WritablePathLike) -> None:
"""Save this Callback's configuration to a JSON file.

The file can be loaded back with ``Callback.load_from_file(path)``.

Args:
path: (Local or Cloud) path where the callback is saved
path: (Local or Cloud) path where the callback should be saved.
"""
path = ensure_path(path)
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w") as f:
f.write(self.to_json(indent=4))

@staticmethod
@final
def load_from_file(path: ReadablePathLike) -> Callback:
"""Load (any type of) Callback from a JSON file.

The ``_callback_type`` field inside the file determines which subclass is
instantiated, so callers don't need to know the concrete type in advance.

Args:
path: (Local or Cloud) path to a JSON file previously created by
``save_to_file()``.

Returns:
Callback: The deserialized callback instance.
"""
path = ensure_path(path)
with path.open("r") as f:
return Callback.from_dict(json.load(f))
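The `_callback_type` round-trip implemented above can be demonstrated in isolation with a toy re-implementation (this sketch mimics the pattern with standalone classes; it is not LLMeter's actual `Callback`):

```python
import importlib
import json

class Base:
    def to_dict(self) -> dict:
        # Record "module:ClassName" so deserialization can find the subclass.
        cls = self.__class__
        data = {"_callback_type": f"{cls.__module__}:{cls.__qualname__}"}
        data.update({k: v for k, v in vars(self).items() if not k.startswith("_")})
        return data

    @classmethod
    def from_dict(cls, raw: dict) -> "Base":
        raw = dict(raw)  # shallow copy so the caller's dict is not mutated
        module_path, class_name = raw.pop("_callback_type").rsplit(":", 1)
        target = getattr(importlib.import_module(module_path), class_name)
        return target(**raw)

class Greeter(Base):
    def __init__(self, name: str):
        self.name = name

# Round-trip through JSON: the dict alone determines the class to rebuild.
cb = Greeter(name="llmeter")
restored = Base.from_dict(json.loads(json.dumps(cb.to_dict())))
print(type(restored).__name__, restored.name)  # Greeter llmeter
```

Because the type marker travels with the data, the base class can reconstruct any importable subclass without a hardcoded registry.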