Skip to content

feat: Multi-modal payload support, binary-safe serialization, and cross-platform file operations#40

Open
acere wants to merge 8 commits intoawslabs:mainfrom
acere:fix/image-payload-serialization
Open

feat: Multi-modal payload support, binary-safe serialization, and cross-platform file operations#40
acere wants to merge 8 commits intoawslabs:mainfrom
acere:fix/image-payload-serialization

Conversation

@acere
Copy link
Copy Markdown
Collaborator

@acere acere commented Mar 24, 2026

Summary

Addresses #20 by implementing binary-safe serialization for payloads and results containing images, and adds comprehensive multi-modal content support across endpoints. Also supersedes the approach in #21 with a more complete solution.

Related: #7 (serialization standardization)

Changes

Binary-safe image serialization (closes #20)

Payloads and results containing images were being double base64-encoded during serialization, causing significant memory overhead. Serialization now handles binary content natively via LLMeterBytesEncoder and InvocationResponseEncoder, avoiding the redundant encoding.

Multi-modal payload support

create_payload() on BedrockConverse, OpenAI, and SageMaker endpoints now accepts optional images, documents, videos, and audio keyword arguments (as file paths or raw bytes). Format detection uses puremagic (optional multimodal extra) with fallback to file extensions. Each endpoint maps formats to its expected convention automatically (Bedrock short names, OpenAI MIME types).

Text-only payloads are fully backward compatible — no changes needed for existing callers.

UPath standardization

All open() calls replaced with UPath.open() for consistent cross-platform and cloud storage support. Cost model save_to_file() now auto-creates parent directories.

JSON-safe Result stats

Result.to_dict() now converts datetime fields (start_time, end_time) via utc_datetime_serializer, so json.dumps(result.stats) works without a custom serializer.

Testing

  • 575 unit tests passing (including property-based tests via Hypothesis)
  • New test suites for multi-modal serialization, properties, and per-endpoint behavior
  • Integration tests updated with multi-modal payload examples

Closes #20

@acere acere requested a review from athewsey March 25, 2026 01:52
@acere acere force-pushed the fix/image-payload-serialization branch from 365abe5 to 63be212 Compare March 25, 2026 13:44
acere added 4 commits March 30, 2026 09:58
Addresses awslabs#20 by implementing binary-safe serialization for payloads
and results containing images. This prevents double base64 encoding
and significantly reduces memory usage and serialization time.

Changes:
- Add binary serialization support to prompt_utils and results
- Update endpoints to use binary-safe serialization
- Add property-based tests for serialization correctness
- Update integration tests to verify image handling
…s endpoints

- Add multi-modal content handling (images, videos, audio, documents) to BedrockConverse, OpenAI, and SageMaker endpoints
- Implement automatic format detection using puremagic with fallback to file extensions
- Add multimodal utility functions for format conversion and content serialization
- Support both file paths and raw bytes for multi-modal content
- Add endpoint-specific format string handling (Bedrock short format, OpenAI MIME types)
- Implement comprehensive unit tests for multi-modal serialization and properties across all endpoints
- Add detailed README documentation with usage examples and security warnings for format detection
- Fix Path serialization in JSONableBase to handle os.PathLike objects
- Update pyproject.toml with optional multimodal extra for puremagic dependency
- Improve integration tests with multi-modal payload examples
- Enhance prompt utilities with multi-modal content handling
… compatibility

- Replace built-in open() calls with UPath.open() across all file operations
- Add UPath import to cost model for consistent path handling
- Update cost model save_to_file() to create parent directories automatically
- Standardize file reading in prompt_utils, results, runner, and tokenizers modules
- Improves cross-platform file system support and enables cloud storage integration
Result.stats included raw datetime objects (start_time, end_time)
from to_dict(), causing TypeError when users called json.dumps()
without a custom serializer. Now to_dict() converts datetime fields
via utc_datetime_serializer so stats is always directly serializable.
@acere acere force-pushed the fix/image-payload-serialization branch from 63be212 to e0ecacb Compare March 30, 2026 17:34
@acere acere changed the title Fix image payload serialization and add multi-modal support feat: Multi-modal payload support, binary-safe serialization, and cross-platform file operations Mar 30, 2026
acere added 4 commits March 30, 2026 14:13
…path helper

Replace scattered `os.PathLike | str` type annotations with the proper
UPath type aliases from `upath.types`:
- `ReadablePathLike` for parameters used to load/read data
- `WritablePathLike` for parameters used to save/write data

Add `ensure_path()` utility to `llmeter.utils` to centralize the
`Path(x)` normalization boilerplate that was duplicated at the top of
nearly every function accepting a path argument. The helper handles
None passthrough and uses a lazy UPath import.

Runtime `isinstance(obj, os.PathLike)` checks in serialization helpers
are left unchanged since TypeAliases cannot be used for runtime checks.
…nstances

Cloud-backed UPath instances (e.g. S3Path) do not implement os.PathLike,
so isinstance checks against os.PathLike alone would miss them. This caused:

- Serialization: cloud UPaths skipped the path branch in JSON serializers
- runner.py: cloud UPath payloads not recognized as path references,
  leading to unnecessary re-saving or failure to load from path

Fix by checking isinstance(obj, (os.PathLike, Path)) where Path is UPath,
which catches both plain pathlib.Path (via os.PathLike) and cloud UPaths
(via UPath). Serialization keeps .as_posix() for cross-platform safety.
Rationalize scattered serialization utilities into a single unified module:

- Create llmeter/json_utils.py with LLMeterEncoder (handles bytes, datetime,
  date, time, PathLike, to_dict() objects, str() fallback) and
  llmeter_bytes_decoder (restores __llmeter_bytes__ markers to bytes).

- Remove redundant encoders: LLMeterBytesEncoder (prompt_utils),
  InvocationResponseEncoder (results), utc_datetime_serializer (results),
  and inline _default_serializer lambdas (runner, endpoints/base).

- Slim down callbacks/cost/serde.py to cost-specific helpers (JSONableBase,
  ISerializable, from_dict_with_class, from_dict_with_class_map,
  to_dict_recursive_generic). Update to Python 3.10 typing (dict/type
  builtins instead of Dict/Type).

- Standardize all to_json() methods to default to cls=LLMeterEncoder via
  kwargs.setdefault(), ensuring consistent encoding across InvocationResponse,
  Result, and JSONableBase.

- Remove serializer/deserializer/cls customization params from save_payloads,
  load_payloads, _load_data_file — hardcode LLMeterEncoder and
  llmeter_bytes_decoder since custom encoders produce files that can't be
  loaded back without metadata.

- LLMeterEncoder.default() delegates to to_dict() for objects that implement
  it, enabling json.dump(self, f, cls=LLMeterEncoder) without manual
  to_dict() calls (used in Endpoint.save).

- Convert all changed files to relative imports, run ruff check + format +
  import sorting.

- Clean up llmeter/utils.py: move upath imports to top level (it's a hard
  dependency), remove unnecessary from __future__ import annotations.

- Add docs/reference/json_utils.md and mkdocs.yml nav entry.

- Add property-based tests for datetime, date, time, PathLike, and to_dict()
  encoding (TestDatetimeSerializationProperties, TestPathSerializationProperties,
  TestToDictSerializationProperties).

- Update existing tests to use LLMeterEncoder/llmeter_bytes_decoder from
  llmeter.json_utils instead of old aliases.

All 581 unit tests pass.
Move path and datetime string conversion out of to_dict() and
to_dict_recursive_generic() — these are Python dict builders, not
serializers. Type coercion to strings is now exclusively handled by
LLMeterEncoder at JSON serialization time.

- Endpoint.to_dict(): remove PathLike → as_posix() coercion, simplify
  to a dict comprehension. Remove unused os import.
- to_dict_recursive_generic(): remove PathLike → as_posix() and
  datetime/date/time → isoformat() coercions. Keep structural recursion
  (nested dicts, lists, to_dict() delegation). Remove unused os, datetime
  imports.
@@ -201,7 +203,9 @@ async def after_run(self, result: Result) -> None:

def save_to_file(self, path: str) -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this str typing of path is wrong as something UPath-like should also be accepted right? Seems like there are multiple inconsistent path typings in our exposed APIs at the moment.

Comment on lines +63 to +64
if isinstance(v, os.PathLike):
result[k] = Path(v).as_posix()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check only works when v is a local path - fails with e.g. UPath("s3://...")

>>> original_bytes == restored_bytes
True
"""
from llmeter.results import InvocationResponseEncoder
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import not at top level of file, and it's local import so not sure there's a good reason?

There are a couple of other instances of this in other files too

Comment on lines 222 to 231
def create_payload(
user_message: str | list[str], max_tokens: int = 256, **kwargs: Any
user_message: str | list[str] | None = None,
max_tokens: int | None = None,
*,
images: list[bytes] | list[str] | None = None,
documents: list[bytes] | list[str] | None = None,
videos: list[bytes] | list[str] | None = None,
audio: list[bytes] | list[str] | None = None,
**kwargs: Any,
) -> dict:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach doesn't give users control over the ordering of the different types of block.

Might it be more flexible and still about-as-usable to instead export utility classes for the different types of content, and have this function take an ordered list of different types of content?

class ImageContent:
    @classmethod
    def from_path(cls, file_path):
        ...

class BedrockBase(Endpoint):
    @staticmethod
    def create_payload(
        user_message: str | list[str | AudioContent | DocumentContent | ImageContent | VideoContent]
        ...
    ):
        ...

BedrockBase.create_payload(
    [
        "What's the title of the graph?,
        ImageContent.from_path("my-cool-graph.png")
    ],
    ...
)

Same feedback applies to other connectors e.g. openai

import importlib
from dataclasses import dataclass, field

from llmeter.utils import ensure_path
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily a problem to solve for the entire codebase here, but this is mixing absolute & local relative imports (see local dependencies section below).

As discussed I think we prefer to standardize on relative, so let's at least ensure we're not introducing new absolute imports in this PR.


try:
req_body = json.dumps(payload).encode("utf-8")
req_body = json.dumps(payload, cls=LLMeterEncoder).encode("utf-8")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't work correctly because Bedrock endpoints don't want to receive __llmeter_bytes__ wrappers?

from upath import UPath as Path


class LLMeterEncoder(json.JSONEncoder):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we have a good reason not to, I'd suggest to just expose a (json.dump default-compatible) function interface, like we did before?

  1. My guess from StackOverflow/etc is that it's more common for developers to override default than the whole cls in json.dump - so people would be more familiar with that interface.
  2. We're not currently using any of the added flexibility of providing a JSONEncoder class versus just a default function.
    1. It seems unlikely that we would want to get in to the complexities of re-implementing JSON {iter}encode from scratch.
    2. If our serialization methods were receiving an encoder object, rather than a class, then at least there'd be the potential benefit of users configuring params like indent only once and re-using the encoder - but if we pass it in as a class then that statefulness is lost and we still have duplication of indent/etc parameters in lots of places around the codebase.
  3. We're already providing a function-based interface for decoding with llmeter_bytes_decoder below (instead of e.g. providing a JSONDecoder), so it seems inconsistent

if tokenizer_path is None:
return DummyTokenizer()
with open(tokenizer_path, "r") as f:
tokenizer_path = UPath(tokenizer_path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ensure_path right?

@@ -123,7 +124,7 @@ def save_tokenizer(tokenizer: Any, output_path: UPath | str) -> UPath:

output_path = UPath(output_path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be ensure_path?



def ensure_path(
path: ReadablePathLike | WritablePathLike | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nullable yes, but I'm not sure there's any need for this argument to be optional? You'd never want to call ensure_path().

Unless there's some reason it doesn't work, I'd suggest the cleanest typing would be:

@overload
def ensure_path(path: ReadablePathLike | WritablePathLike) -> UPath: ...


@overload
def ensure_path(path: None) -> None: ...


def ensure_path(path):
    {implementation}

...Which I believe should propagate the fact that output is None if and only if input is None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can't save Bedrock test payloads/results with images

2 participants