feat: Multi-modal payload support, binary-safe serialization, and cross-platform file operations #40

acere wants to merge 8 commits into awslabs:main
Conversation
Force-pushed from 365abe5 to 63be212
Addresses awslabs#20 by implementing binary-safe serialization for payloads and results containing images. This prevents double base64 encoding and significantly reduces memory usage and serialization time.

Changes:
- Add binary serialization support to prompt_utils and results
- Update endpoints to use binary-safe serialization
- Add property-based tests for serialization correctness
- Update integration tests to verify image handling
…s endpoints

- Add multi-modal content handling (images, videos, audio, documents) to BedrockConverse, OpenAI, and SageMaker endpoints
- Implement automatic format detection using puremagic, with fallback to file extensions
- Add multimodal utility functions for format conversion and content serialization
- Support both file paths and raw bytes for multi-modal content
- Add endpoint-specific format string handling (Bedrock short format, OpenAI MIME types)
- Implement comprehensive unit tests for multi-modal serialization and properties across all endpoints
- Add detailed README documentation with usage examples and security warnings for format detection
- Fix Path serialization in JSONableBase to handle os.PathLike objects
- Update pyproject.toml with an optional multimodal extra for the puremagic dependency
- Improve integration tests with multi-modal payload examples
- Enhance prompt utilities with multi-modal content handling
… compatibility

- Replace built-in open() calls with UPath.open() across all file operations
- Add a UPath import to the cost model for consistent path handling
- Update cost model save_to_file() to create parent directories automatically
- Standardize file reading in the prompt_utils, results, runner, and tokenizers modules
- Improve cross-platform file system support and enable cloud storage integration
Result.stats included raw datetime objects (start_time, end_time) from to_dict(), causing TypeError when users called json.dumps() without a custom serializer. Now to_dict() converts datetime fields via utc_datetime_serializer so stats is always directly serializable.
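A minimal sketch of the described fix; the serializer body below is an assumption (only the `utc_datetime_serializer` name comes from the commit message):

```python
import datetime
import json


def utc_datetime_serializer(dt: datetime.datetime) -> str:
    # Assumed behavior: normalize to UTC and render as an ISO-8601 string
    return dt.astimezone(datetime.timezone.utc).isoformat()


start = datetime.datetime(2024, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
# With datetimes pre-converted to strings, stats is directly serializable:
stats = {"start_time": utc_datetime_serializer(start), "total_requests": 3}
json.dumps(stats)  # no custom serializer needed
```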
Force-pushed from 63be212 to e0ecacb
…path helper

Replace scattered `os.PathLike | str` type annotations with the proper UPath type aliases from `upath.types`:

- `ReadablePathLike` for parameters used to load/read data
- `WritablePathLike` for parameters used to save/write data

Add an `ensure_path()` utility to `llmeter.utils` to centralize the `Path(x)` normalization boilerplate that was duplicated at the top of nearly every function accepting a path argument. The helper handles None passthrough and uses a lazy UPath import.

Runtime `isinstance(obj, os.PathLike)` checks in serialization helpers are left unchanged, since type aliases cannot be used for runtime checks.
…nstances

Cloud-backed UPath instances (e.g. S3Path) do not implement os.PathLike, so isinstance checks against os.PathLike alone would miss them. This caused:

- Serialization: cloud UPaths skipped the path branch in JSON serializers
- runner.py: cloud UPath payloads were not recognized as path references, leading to unnecessary re-saving or failure to load from a path

Fix by checking isinstance(obj, (os.PathLike, Path)) where Path is UPath, which catches both plain pathlib.Path (via os.PathLike) and cloud UPaths (via UPath). Serialization keeps .as_posix() for cross-platform safety.
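The failure mode can be illustrated with a hypothetical stand-in for a cloud path class (the `FakeCloudPath` name is invented for illustration):

```python
import os


class FakeCloudPath:
    """Stand-in for a cloud-backed UPath (e.g. S3Path) that does NOT implement os.PathLike."""

    def __init__(self, raw: str):
        self._raw = raw

    def as_posix(self) -> str:
        return self._raw


p = FakeCloudPath("s3://bucket/payloads.jsonl")

# An os.PathLike-only check misses the cloud path (no __fspath__ method)...
assert not isinstance(p, os.PathLike)

# ...while also checking the concrete (U)Path base class catches it,
# analogous to isinstance(obj, (os.PathLike, Path)) with Path = UPath.
assert isinstance(p, (os.PathLike, FakeCloudPath))
```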
Rationalize scattered serialization utilities into a single unified module:

- Create llmeter/json_utils.py with LLMeterEncoder (handles bytes, datetime, date, time, PathLike, to_dict() objects, and a str() fallback) and llmeter_bytes_decoder (restores __llmeter_bytes__ markers to bytes).
- Remove redundant encoders: LLMeterBytesEncoder (prompt_utils), InvocationResponseEncoder (results), utc_datetime_serializer (results), and inline _default_serializer lambdas (runner, endpoints/base).
- Slim down callbacks/cost/serde.py to cost-specific helpers (JSONableBase, ISerializable, from_dict_with_class, from_dict_with_class_map, to_dict_recursive_generic). Update to Python 3.10 typing (dict/type builtins instead of Dict/Type).
- Standardize all to_json() methods to default to cls=LLMeterEncoder via kwargs.setdefault(), ensuring consistent encoding across InvocationResponse, Result, and JSONableBase.
- Remove serializer/deserializer/cls customization params from save_payloads, load_payloads, and _load_data_file; hardcode LLMeterEncoder and llmeter_bytes_decoder, since custom encoders produce files that can't be loaded back without metadata.
- LLMeterEncoder.default() delegates to to_dict() for objects that implement it, enabling json.dump(self, f, cls=LLMeterEncoder) without manual to_dict() calls (used in Endpoint.save).
- Convert all changed files to relative imports; run ruff check + format + import sorting.
- Clean up llmeter/utils.py: move upath imports to top level (it's a hard dependency) and remove the unnecessary from __future__ import annotations.
- Add docs/reference/json_utils.md and an mkdocs.yml nav entry.
- Add property-based tests for datetime, date, time, PathLike, and to_dict() encoding (TestDatetimeSerializationProperties, TestPathSerializationProperties, TestToDictSerializationProperties).
- Update existing tests to use LLMeterEncoder/llmeter_bytes_decoder from llmeter.json_utils instead of the old aliases.

All 581 unit tests pass.
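An illustrative sketch of the encoder/decoder pair described above, assuming the `__llmeter_bytes__` marker holds a base64 payload (the actual implementation may differ in detail):

```python
import base64
import datetime
import json
import os
from pathlib import Path


class LLMeterEncoder(json.JSONEncoder):
    """Sketch: bytes -> marker dict, datetimes -> ISO strings, paths -> POSIX strings."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return {"__llmeter_bytes__": base64.b64encode(obj).decode("ascii")}
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        if isinstance(obj, os.PathLike):
            return Path(obj).as_posix()
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        return str(obj)  # last-resort fallback


def llmeter_bytes_decoder(d: dict):
    """Sketch: object_hook that restores __llmeter_bytes__ markers to bytes."""
    if set(d) == {"__llmeter_bytes__"}:
        return base64.b64decode(d["__llmeter_bytes__"])
    return d


# Binary content round-trips without double base64 encoding of the whole payload:
blob = json.dumps({"image": b"\x89PNG\r\n"}, cls=LLMeterEncoder)
restored = json.loads(blob, object_hook=llmeter_bytes_decoder)
```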
Move path and datetime string conversion out of to_dict() and to_dict_recursive_generic() — these are Python dict builders, not serializers. Type coercion to strings is now exclusively handled by LLMeterEncoder at JSON serialization time.

- Endpoint.to_dict(): remove the PathLike → as_posix() coercion and simplify to a dict comprehension. Remove the unused os import.
- to_dict_recursive_generic(): remove the PathLike → as_posix() and datetime/date/time → isoformat() coercions. Keep structural recursion (nested dicts, lists, to_dict() delegation). Remove the unused os and datetime imports.
```diff
@@ -201,7 +203,9 @@ async def after_run(self, result: Result) -> None:
     def save_to_file(self, path: str) -> None:
```
I think this `str` typing of `path` is wrong, as something UPath-like should also be accepted, right? It seems like there are multiple inconsistent path typings in our exposed APIs at the moment.
llmeter/callbacks/cost/serde.py (Outdated)

```python
if isinstance(v, os.PathLike):
    result[k] = Path(v).as_posix()
```
This check only works when v is a local path; it fails with e.g. `UPath("s3://...")`.
llmeter/endpoints/base.py (Outdated)

```python
>>> original_bytes == restored_bytes
True
"""
from llmeter.results import InvocationResponseEncoder
```
This import isn't at the top level of the file, and it's a local import, so I'm not sure there's a good reason for it? There are a couple of other instances of this in other files too.
```diff
 def create_payload(
-    user_message: str | list[str], max_tokens: int = 256, **kwargs: Any
+    user_message: str | list[str] | None = None,
+    max_tokens: int | None = None,
+    *,
+    images: list[bytes] | list[str] | None = None,
+    documents: list[bytes] | list[str] | None = None,
+    videos: list[bytes] | list[str] | None = None,
+    audio: list[bytes] | list[str] | None = None,
+    **kwargs: Any,
 ) -> dict:
```
This approach doesn't give users control over the ordering of the different types of block.

Might it be more flexible, and still about as usable, to instead export utility classes for the different types of content, and have this function take an ordered list of the different content types?

```python
class ImageContent:
    @classmethod
    def from_path(cls, file_path):
        ...


class BedrockBase(Endpoint):
    @staticmethod
    def create_payload(
        user_message: str | list[str | AudioContent | DocumentContent | ImageContent | VideoContent],
        ...
    ):
        ...


BedrockBase.create_payload(
    [
        "What's the title of the graph?",
        ImageContent.from_path("my-cool-graph.png"),
    ],
    ...
)
```

Same feedback applies to other connectors, e.g. openai.
```python
import importlib
from dataclasses import dataclass, field

from llmeter.utils import ensure_path
```
Not necessarily a problem to solve for the entire codebase here, but this is mixing absolute & local relative imports (see local dependencies section below).
As discussed, I think we prefer to standardize on relative imports, so let's at least ensure we're not introducing new absolute imports in this PR.
```diff
 try:
-    req_body = json.dumps(payload).encode("utf-8")
+    req_body = json.dumps(payload, cls=LLMeterEncoder).encode("utf-8")
```
This can't work correctly, because Bedrock endpoints don't want to receive `__llmeter_bytes__` wrappers?
```python
from upath import UPath as Path


class LLMeterEncoder(json.JSONEncoder):
```
Unless we have a good reason not to, I'd suggest to just expose a (`json.dump` `default`-compatible) function interface, like we did before?

- My guess from StackOverflow/etc is that it's more common for developers to override `default` than the whole `cls` in `json.dump`, so people would be more familiar with that interface.
- We're not currently using any of the added flexibility of providing a `JSONEncoder` class versus just a `default` function.
  - It seems unlikely that we would want to get into the complexities of re-implementing JSON `{iter}encode` from scratch.
  - If our serialization methods were receiving an encoder object, rather than a class, then at least there'd be the potential benefit of users configuring params like `indent` only once and re-using the encoder; but if we pass it in as a class, that statefulness is lost and we still have duplication of indent/etc. parameters in lots of places around the codebase.
- We're already providing a function-based interface for decoding with `llmeter_bytes_decoder` below (instead of e.g. providing a `JSONDecoder`), so it seems inconsistent.
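A sketch of the function-based interface being suggested; the `llmeter_default` name is hypothetical:

```python
import datetime
import json


def llmeter_default(obj):
    """Hypothetical json.dump-compatible `default` hook, instead of a JSONEncoder subclass."""
    if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Callers use the familiar `default=` parameter rather than `cls=`:
out = json.dumps({"when": datetime.date(2024, 1, 1)}, default=llmeter_default)
```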
```diff
 if tokenizer_path is None:
     return DummyTokenizer()
-with open(tokenizer_path, "r") as f:
+tokenizer_path = UPath(tokenizer_path)
```
Should be `ensure_path`, right?
```diff
@@ -123,7 +124,7 @@ def save_tokenizer(tokenizer: Any, output_path: UPath | str) -> UPath:
     output_path = UPath(output_path)
```
Should this also be ensure_path?
```python
def ensure_path(
    path: ReadablePathLike | WritablePathLike | None = None,
```
Nullable, yes, but I'm not sure there's any need for this argument to be optional? You'd never want to call ensure_path().

Unless there's some reason it doesn't work, I'd suggest the cleanest typing would be:

```python
@overload
def ensure_path(path: ReadablePathLike | WritablePathLike) -> UPath: ...
@overload
def ensure_path(path: None) -> None: ...
def ensure_path(path):
    {implementation}...
```

Which I believe should propagate the fact that output is None if and only if input is None.
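A runnable version of that suggestion, with plain `pathlib.Path` standing in for `UPath` and a trivial hypothetical body filling the implementation placeholder:

```python
from pathlib import Path
from typing import overload


@overload
def ensure_path(path: str) -> Path: ...
@overload
def ensure_path(path: None) -> None: ...
def ensure_path(path):
    # Hypothetical body: None passthrough, otherwise normalize to a Path.
    return None if path is None else Path(path)


# Type checkers can now infer: output is None if and only if input is None.
assert ensure_path(None) is None
assert ensure_path("x.txt") == Path("x.txt")
```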
Summary
Addresses #20 by implementing binary-safe serialization for payloads and results containing images, and adds comprehensive multi-modal content support across endpoints. Also supersedes the approach in #21 with a more complete solution.
Related: #7 (serialization standardization)
Changes
Binary-safe image serialization (closes #20)
Payloads and results containing images were being double base64-encoded during serialization, causing significant memory overhead. Serialization now handles binary content natively via `LLMeterBytesEncoder` and `InvocationResponseEncoder`, avoiding the redundant encoding.

Multi-modal payload support
`create_payload()` on the BedrockConverse, OpenAI, and SageMaker endpoints now accepts optional `images`, `documents`, `videos`, and `audio` keyword arguments (as file paths or raw bytes). Format detection uses `puremagic` (optional `multimodal` extra) with fallback to file extensions. Each endpoint maps formats to its expected convention automatically (Bedrock short names, OpenAI MIME types).

Text-only payloads are fully backward compatible — no changes needed for existing callers.
UPath standardization
All `open()` calls replaced with `UPath.open()` for consistent cross-platform and cloud storage support. The cost model's `save_to_file()` now auto-creates parent directories.

JSON-safe Result stats
`Result.to_dict()` now converts `datetime` fields (`start_time`, `end_time`) via `utc_datetime_serializer`, so `json.dumps(result.stats)` works without a custom serializer.

Testing
Closes #20