feat: Multi-modal payload support, binary-safe serialization, and cross-platform file operations by acere · Pull Request #40 · awslabs/llmeter

acere · 2026-03-24T13:13:09Z

Summary

Addresses #20 by implementing binary-safe serialization for payloads and results containing images, and adds comprehensive multi-modal content support across endpoints. Also supersedes the approach in #21 with a more complete solution.

Related: #7 (serialization standardization)

Changes

Binary-safe image serialization (closes #20)

Payloads and results containing images were being double base64-encoded during serialization, causing significant memory overhead. Serialization now handles binary content natively via LLMeterBytesEncoder and InvocationResponseEncoder, avoiding the redundant encoding.
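As a rough illustration of the marker-based approach, the sketch below base64-encodes raw bytes exactly once at JSON serialization time and restores them on load. The `__llmeter_bytes__` marker name appears later in this PR; the helper names `encode_default` and `decode_hook` and the exact wrapper layout are assumptions made for this example, not the library's confirmed implementation.

```python
import base64
import json

MARKER = "__llmeter_bytes__"  # marker name from the PR; wrapper layout is assumed


def encode_default(obj):
    """json.dumps `default` hook: wrap raw bytes in a single base64 marker."""
    if isinstance(obj, bytes):
        return {MARKER: base64.b64encode(obj).decode("ascii")}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_hook(d):
    """json.loads `object_hook`: turn marker objects back into bytes."""
    if set(d) == {MARKER}:
        return base64.b64decode(d[MARKER])
    return d


# The payload keeps raw bytes in memory; encoding happens only when writing JSON
payload = {"messages": [{"image": {"bytes": b"\x89PNG\r\n\x1a\n fake image data"}}]}
text = json.dumps(payload, default=encode_default)
restored = json.loads(text, object_hook=decode_hook)
```

Because the in-memory payload holds bytes rather than a base64 string, there is no second encoding pass when the payload is saved.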

Multi-modal payload support

create_payload() on BedrockConverse, OpenAI, and SageMaker endpoints now accepts optional images, documents, videos, and audio keyword arguments (as file paths or raw bytes). Format detection uses puremagic (optional multimodal extra) with fallback to file extensions. Each endpoint maps formats to its expected convention automatically (Bedrock short names, OpenAI MIME types).
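The detection order described above (content sniffing first, file extension second) could look roughly like the following. This is a sketch under assumptions: `detect_mime` and `to_bedrock_format` are hypothetical names, the fallback here uses the standard-library `mimetypes` module, and the actual mapping tables in the PR may differ.

```python
import mimetypes
from typing import Optional

try:
    import puremagic  # optional dependency (the `multimodal` extra)
except ImportError:
    puremagic = None

# PNG signature followed by the start of an IHDR chunk
PNG_HEADER = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"


def detect_mime(data: bytes, filename: Optional[str] = None) -> Optional[str]:
    """Guess a MIME type from the content first, then from the file extension."""
    if puremagic is not None:
        try:
            mime = puremagic.from_string(data, mime=True)
            if mime:
                return mime
        except (puremagic.PureError, ValueError):
            pass  # no match or empty input: fall through to the extension
    if filename:
        return mimetypes.guess_type(filename)[0]
    return None


def to_bedrock_format(mime: str) -> str:
    """Hypothetical mapping from a MIME type ('image/png') to a short name ('png')."""
    return mime.split("/", 1)[1]


mime = detect_mime(PNG_HEADER, "chart.png")
```

An OpenAI-style endpoint would pass the MIME type through unchanged, while a Bedrock-style endpoint would apply a short-name mapping such as `to_bedrock_format`.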

Text-only payloads are fully backward compatible — no changes needed for existing callers.

UPath standardization

All open() calls replaced with UPath.open() for consistent cross-platform and cloud storage support. Cost model save_to_file() now auto-creates parent directories.
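A minimal sketch of the pattern, assuming a hypothetical `save_json` helper rather than the real `save_to_file()` body. It falls back to `pathlib.Path` so it runs without `universal_pathlib` installed; for local paths the two expose the same `parent`, `mkdir`, and `open` methods used here.

```python
import json
import tempfile

try:
    from upath import UPath as Path  # universal_pathlib, as used in the PR
except ImportError:  # fallback so the sketch runs without the dependency
    from pathlib import Path


def save_json(data: dict, path) -> Path:
    """Write `data` as JSON, creating missing parent directories first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(data, f)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # The "nested/dir" directories do not exist before the call
    target = save_json({"cost": 0.01}, Path(tmp) / "nested" / "dir" / "model.json")
    with target.open("r") as f:
        loaded = json.load(f)
```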

JSON-safe Result stats

Result.to_dict() now converts datetime fields (start_time, end_time) via utc_datetime_serializer, so json.dumps(result.stats) works without a custom serializer.
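The failure mode and the fix can be reproduced with the standard library alone. In this sketch, `to_json_safe` is a hypothetical stand-in for the conversion step; the exact string format produced by `utc_datetime_serializer` is not shown in this PR and may differ from `isoformat()`.

```python
import json
from datetime import datetime, timezone

stats = {
    "start_time": datetime(2026, 3, 24, 13, 0, 0, tzinfo=timezone.utc),
    "total_requests": 10,
}

try:
    json.dumps(stats)
    raised = False
except TypeError:
    raised = True  # datetime objects are not JSON serializable by default


def to_json_safe(d: dict) -> dict:
    """Hypothetical helper: convert datetime values to ISO 8601 strings."""
    return {k: v.isoformat() if isinstance(v, datetime) else v for k, v in d.items()}


safe_text = json.dumps(to_json_safe(stats))
```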

Testing

575 unit tests passing (including property-based tests via Hypothesis)
New test suites for multi-modal serialization, properties, and per-endpoint behavior
Integration tests updated with multi-modal payload examples

Closes #20

Addresses awslabs#20 by implementing binary-safe serialization for payloads and results containing images. This prevents double base64 encoding and significantly reduces memory usage and serialization time.

Changes:
- Add binary serialization support to prompt_utils and results
- Update endpoints to use binary-safe serialization
- Add property-based tests for serialization correctness
- Update integration tests to verify image handling

…s endpoints

- Add multi-modal content handling (images, videos, audio, documents) to BedrockConverse, OpenAI, and SageMaker endpoints
- Implement automatic format detection using puremagic with fallback to file extensions
- Add multimodal utility functions for format conversion and content serialization
- Support both file paths and raw bytes for multi-modal content
- Add endpoint-specific format string handling (Bedrock short format, OpenAI MIME types)
- Implement comprehensive unit tests for multi-modal serialization and properties across all endpoints
- Add detailed README documentation with usage examples and security warnings for format detection
- Fix Path serialization in JSONableBase to handle os.PathLike objects
- Update pyproject.toml with optional multimodal extra for puremagic dependency
- Improve integration tests with multi-modal payload examples
- Enhance prompt utilities with multi-modal content handling

… compatibility

- Replace built-in open() calls with UPath.open() across all file operations
- Add UPath import to cost model for consistent path handling
- Update cost model save_to_file() to create parent directories automatically
- Standardize file reading in prompt_utils, results, runner, and tokenizers modules
- Improves cross-platform file system support and enables cloud storage integration

Result.stats included raw datetime objects (start_time, end_time) from to_dict(), causing TypeError when users called json.dumps() without a custom serializer. Now to_dict() converts datetime fields via utc_datetime_serializer so stats is always directly serializable.

…path helper

Replace scattered `os.PathLike | str` type annotations with the proper UPath type aliases from `upath.types`:
- `ReadablePathLike` for parameters used to load/read data
- `WritablePathLike` for parameters used to save/write data

Add `ensure_path()` utility to `llmeter.utils` to centralize the `Path(x)` normalization boilerplate that was duplicated at the top of nearly every function accepting a path argument. The helper handles None passthrough and uses a lazy UPath import.

Runtime `isinstance(obj, os.PathLike)` checks in serialization helpers are left unchanged since TypeAliases cannot be used for runtime checks.

…nstances

Cloud-backed UPath instances (e.g. S3Path) do not implement os.PathLike, so isinstance checks against os.PathLike alone would miss them. This caused:
- Serialization: cloud UPaths skipped the path branch in JSON serializers
- runner.py: cloud UPath payloads not recognized as path references, leading to unnecessary re-saving or failure to load from path

Fix by checking isinstance(obj, (os.PathLike, Path)) where Path is UPath, which catches both plain pathlib.Path (via os.PathLike) and cloud UPaths (via UPath). Serialization keeps .as_posix() for cross-platform safety.
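The widened check from this commit can be sketched as follows. The function names `is_path_reference` and `serialize_path` are hypothetical, and the `pathlib` fallback exists only so the snippet runs without `universal_pathlib`; with the fallback, the second tuple member adds nothing, and the cloud-path case described above is not exercised.

```python
import os
import pathlib

try:
    from upath import UPath as Path  # as imported in the PR
except ImportError:  # fallback so the sketch runs without the dependency
    from pathlib import Path


def is_path_reference(obj) -> bool:
    """True for local paths (via os.PathLike) and for UPath instances."""
    return isinstance(obj, (os.PathLike, Path))


def serialize_path(obj) -> str:
    """Serialize with forward slashes regardless of the host platform."""
    return Path(obj).as_posix()


local = pathlib.Path("outputs") / "run1" / "payload.jsonl"
```

A plain string is deliberately not treated as a path reference here, since a string payload field may simply be text.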

Rationalize scattered serialization utilities into a single unified module:

- Create llmeter/json_utils.py with LLMeterEncoder (handles bytes, datetime, date, time, PathLike, to_dict() objects, str() fallback) and llmeter_bytes_decoder (restores __llmeter_bytes__ markers to bytes).
- Remove redundant encoders: LLMeterBytesEncoder (prompt_utils), InvocationResponseEncoder (results), utc_datetime_serializer (results), and inline _default_serializer lambdas (runner, endpoints/base).
- Slim down callbacks/cost/serde.py to cost-specific helpers (JSONableBase, ISerializable, from_dict_with_class, from_dict_with_class_map, to_dict_recursive_generic). Update to Python 3.10 typing (dict/type builtins instead of Dict/Type).
- Standardize all to_json() methods to default to cls=LLMeterEncoder via kwargs.setdefault(), ensuring consistent encoding across InvocationResponse, Result, and JSONableBase.
- Remove serializer/deserializer/cls customization params from save_payloads, load_payloads, _load_data_file; hardcode LLMeterEncoder and llmeter_bytes_decoder since custom encoders produce files that can't be loaded back without metadata.
- LLMeterEncoder.default() delegates to to_dict() for objects that implement it, enabling json.dump(self, f, cls=LLMeterEncoder) without manual to_dict() calls (used in Endpoint.save).
- Convert all changed files to relative imports, run ruff check + format + import sorting.
- Clean up llmeter/utils.py: move upath imports to top level (it's a hard dependency), remove unnecessary from __future__ import annotations.
- Add docs/reference/json_utils.md and mkdocs.yml nav entry.
- Add property-based tests for datetime, date, time, PathLike, and to_dict() encoding (TestDatetimeSerializationProperties, TestPathSerializationProperties, TestToDictSerializationProperties).
- Update existing tests to use LLMeterEncoder/llmeter_bytes_decoder from llmeter.json_utils instead of old aliases.

All 581 unit tests pass.
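Two of the points above, `to_dict()` delegation inside `default()` and `kwargs.setdefault()` in `to_json()`, can be sketched together. `SketchEncoder` and `Response` are hypothetical stand-ins written for this example; they cover only a subset of what the commit message attributes to `LLMeterEncoder`.

```python
import json
from datetime import date, datetime, time


class SketchEncoder(json.JSONEncoder):
    """Hypothetical stand-in for LLMeterEncoder; handles only a few cases."""

    def default(self, o):
        if isinstance(o, (datetime, date, time)):
            return o.isoformat()
        if callable(getattr(o, "to_dict", None)):
            return o.to_dict()  # json encodes the returned dict recursively
        return str(o)  # last-resort fallback, as listed in the commit message


class Response:
    def __init__(self, text, created):
        self.text = text
        self.created = created

    def to_dict(self):
        # Plain dict builder: datetime values are left untouched
        return {"text": self.text, "created": self.created}

    def to_json(self, **kwargs):
        kwargs.setdefault("cls", SketchEncoder)  # callers may still override
        return json.dumps(self, **kwargs)


out = Response("hi", datetime(2026, 4, 6, 9, 0)).to_json()
```

Passing `self` directly to `json.dumps` works because the encoder's `default()` is invoked for the unknown object and then again for the `datetime` inside the dict it returns.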

Move path and datetime string conversion out of to_dict() and to_dict_recursive_generic(); these are Python dict builders, not serializers. Type coercion to strings is now exclusively handled by LLMeterEncoder at JSON serialization time.

- Endpoint.to_dict(): remove PathLike → as_posix() coercion, simplify to a dict comprehension. Remove unused os import.
- to_dict_recursive_generic(): remove PathLike → as_posix() and datetime/date/time → isoformat() coercions. Keep structural recursion (nested dicts, lists, to_dict() delegation). Remove unused os, datetime imports.

athewsey · 2026-03-30T20:55:08Z

llmeter/callbacks/cost/model.py

@@ -201,7 +203,9 @@ async def after_run(self, result: Result) -> None:

    def save_to_file(self, path: str) -> None:


I think this str typing of path is wrong, as something UPath-like should also be accepted, right? Seems like there are multiple inconsistent path typings in our exposed APIs at the moment.

athewsey · 2026-03-30T20:57:43Z

llmeter/callbacks/cost/serde.py

+        if isinstance(v, os.PathLike):
+            result[k] = Path(v).as_posix()


This check only works when v is a local path - fails with e.g. UPath("s3://...")

athewsey · 2026-03-30T21:03:44Z

llmeter/endpoints/base.py

+            >>> original_bytes == restored_bytes
+            True
+        """
+        from llmeter.results import InvocationResponseEncoder


This import is not at the top level of the file, and it's a local import, so I'm not sure there's a good reason?

There are a couple of other instances of this in other files too

athewsey · 2026-03-30T21:20:51Z

llmeter/endpoints/bedrock.py

    def create_payload(
-        user_message: str | list[str], max_tokens: int = 256, **kwargs: Any
+        user_message: str | list[str] | None = None,
+        max_tokens: int | None = None,
+        *,
+        images: list[bytes] | list[str] | None = None,
+        documents: list[bytes] | list[str] | None = None,
+        videos: list[bytes] | list[str] | None = None,
+        audio: list[bytes] | list[str] | None = None,
+        **kwargs: Any,
    ) -> dict:


This approach doesn't give users control over the ordering of the different types of block.

Might it be more flexible and still about-as-usable to instead export utility classes for the different types of content, and have this function take an ordered list of different types of content?

    class ImageContent:
        @classmethod
        def from_path(cls, file_path): ...

    class BedrockBase(Endpoint):
        @staticmethod
        def create_payload(
            user_message: str | list[str | AudioContent | DocumentContent | ImageContent | VideoContent]
            ...
        ): ...

    BedrockBase.create_payload(
        [
            "What's the title of the graph?",
            ImageContent.from_path("my-cool-graph.png")
        ],
        ...
    )

Same feedback applies to other connectors e.g. openai

athewsey · 2026-03-30T21:26:41Z

llmeter/callbacks/cost/model.py

 import importlib
+from dataclasses import dataclass, field
+
+from llmeter.utils import ensure_path


Not necessarily a problem to solve for the entire codebase here, but this is mixing absolute & local relative imports (see local dependencies section below).

As discussed I think we prefer to standardize on relative, so let's at least ensure we're not introducing new absolute imports in this PR.

athewsey · 2026-04-06T07:55:21Z

llmeter/endpoints/bedrock_invoke.py


        try:
-            req_body = json.dumps(payload).encode("utf-8")
+            req_body = json.dumps(payload, cls=LLMeterEncoder).encode("utf-8")


This can't work correctly because Bedrock endpoints don't want to receive __llmeter_bytes__ wrappers?

athewsey · 2026-04-06T09:12:10Z

llmeter/json_utils.py

+from upath import UPath as Path
+
+
+class LLMeterEncoder(json.JSONEncoder):


Unless we have a good reason not to, I'd suggest to just expose a (json.dump default-compatible) function interface, like we did before?

My guess from StackOverflow/etc is that it's more common for developers to override default than the whole cls in json.dump - so people would be more familiar with that interface.

We're not currently using any of the added flexibility of providing a JSONEncoder class versus just a default function.

It seems unlikely that we would want to get into the complexities of re-implementing JSON {iter}encode from scratch.

If our serialization methods were receiving an encoder object, rather than a class, then at least there'd be the potential benefit of users configuring params like indent only once and re-using the encoder - but if we pass it in as a class then that statefulness is lost and we still have duplication of indent/etc parameters in lots of places around the codebase.

We're already providing a function-based interface for decoding with llmeter_bytes_decoder below (instead of e.g. providing a JSONDecoder), so it seems inconsistent.
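To make the comparison in this comment concrete, here is a small sketch of the two alternatives it mentions: a `default`-compatible function passed per call, and a configured encoder instance that is reused. `sketch_default` is a hypothetical name and handles only `datetime`; it is not the function the library exposed before.

```python
import json
from datetime import datetime


def sketch_default(o):
    """Hypothetical `default`-compatible hook for json.dump / json.dumps."""
    if isinstance(o, datetime):
        return o.isoformat()
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


data = {"end_time": datetime(2026, 4, 6, 9, 12, 10)}

# Function interface: the hook is passed on each call
compact = json.dumps(data, default=sketch_default)

# Encoder instance: indent and default are configured once, then reused
encoder = json.JSONEncoder(indent=2, default=sketch_default)
pretty = encoder.encode(data)
```

With the class-based `cls=` interface, `json.dumps` constructs a fresh encoder on every call, so settings such as `indent` still have to be repeated at each call site.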

athewsey · 2026-04-06T09:21:57Z

llmeter/tokenizers.py

        if tokenizer_path is None:
            return DummyTokenizer()
-        with open(tokenizer_path, "r") as f:
+        tokenizer_path = UPath(tokenizer_path)


Should be ensure_path right?

athewsey · 2026-04-06T09:22:18Z

llmeter/tokenizers.py

@@ -123,7 +124,7 @@ def save_tokenizer(tokenizer: Any, output_path: UPath | str) -> UPath:

    output_path = UPath(output_path)


Should this also be ensure_path?

athewsey · 2026-04-06T09:27:57Z

llmeter/utils.py

+
+
+def ensure_path(
+    path: ReadablePathLike | WritablePathLike | None = None,


Nullable yes, but I'm not sure there's any need for this argument to be optional? You'd never want to call ensure_path().

Unless there's some reason it doesn't work, I'd suggest the cleanest typing would be:

    @overload
    def ensure_path(path: ReadablePathLike | WritablePathLike) -> UPath: ...

    @overload
    def ensure_path(path: None) -> None: ...

    def ensure_path(path):
        {implementation}

...Which I believe should propagate the fact that output is None if and only if input is None.
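A runnable variant of the suggested overloads is sketched below. The function body is an assumption based on the "None passthrough" behavior described in the commit message, `PathInput` is a simplified stand-in for the `upath.types` aliases, and the `pathlib` fallback is only there so the snippet runs without `universal_pathlib`.

```python
import os
from typing import Union, overload

try:
    from upath import UPath
except ImportError:  # stand-in so the sketch runs without the dependency
    from pathlib import Path as UPath

# Simplified stand-in for ReadablePathLike | WritablePathLike
PathInput = Union[str, os.PathLike]


@overload
def ensure_path(path: PathInput) -> UPath: ...
@overload
def ensure_path(path: None) -> None: ...
def ensure_path(path):
    """Assumed body: pass None through, otherwise normalize to a path object."""
    return None if path is None else UPath(path)


normalized = ensure_path("results/run1/stats.json")
missing = ensure_path(None)
```

With these overloads, a type checker infers `UPath` for the first call and `None` for the second, so callers that always pass a real path do not need a `None` check on the result.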

acere requested a review from athewsey March 25, 2026 01:52

acere assigned acere and athewsey Mar 25, 2026

acere force-pushed the fix/image-payload-serialization branch from 365abe5 to 63be212 Compare March 25, 2026 13:44

acere added 4 commits March 30, 2026 09:58

acere force-pushed the fix/image-payload-serialization branch from 63be212 to e0ecacb Compare March 30, 2026 17:34

acere changed the title ~~Fix image payload serialization and add multi-modal support~~ feat: Multi-modal payload support, binary-safe serialization, and cross-platform file operations Mar 30, 2026

acere added 4 commits March 30, 2026 14:13

athewsey requested changes Apr 6, 2026

acere wants to merge 8 commits into awslabs:main from acere:fix/image-payload-serialization
