feat(endpoints): Add OpenAI Responses API endpoint with fixes and integration tests#43

Open
acere wants to merge 2 commits into awslabs:main from acere:ResponseAPI
Conversation


@acere acere commented Mar 25, 2026

Summary

Adds OpenAI Responses API endpoint support to LLMeter, with fixes to align the implementation with the API's actual behavior.

Changes

Endpoint fixes (llmeter/endpoints/openai_response.py)

  • Rename max_tokens to max_output_tokens in create_payload (Response API parameter name)
  • Fix _parse_response to handle usage=None (Bedrock Mantle doesn't always return it) and use input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
  • Rewrite _parse_stream_response to process typed events (response.output_text.delta, response.completed) instead of the old chunk-with-output-array format
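The two parsing fixes above can be sketched roughly as follows. This is an illustrative standalone sketch, not LLMeter's actual `_parse_response`/`_parse_stream_response` code: the helper names and the plain-dict event shape are assumptions for demonstration, and real Responses API stream events are typed objects rather than dicts.

```python
from types import SimpleNamespace


def parse_usage(usage):
    """Return (input_tokens, output_tokens), tolerating usage=None
    (Bedrock Mantle doesn't always return it) and falling back to the
    legacy prompt_tokens/completion_tokens field names."""
    if usage is None:
        return None, None
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", None)
    return input_tokens, output_tokens


def parse_stream(events):
    """Accumulate text from typed Responses API stream events instead of
    the old chunk-with-output-array format. Events here are modeled as
    dicts with a "type" key for simplicity."""
    chunks = []
    final_response = None
    for event in events:
        if event["type"] == "response.output_text.delta":
            chunks.append(event["delta"])  # incremental text fragment
        elif event["type"] == "response.completed":
            final_response = event["response"]  # full terminal response
    return "".join(chunks), final_response


# Demo: modern field names take priority; legacy names are the fallback.
modern = SimpleNamespace(input_tokens=3, output_tokens=5)
legacy = SimpleNamespace(prompt_tokens=2, completion_tokens=4)
```

The key behavioral points are that a missing `usage` no longer raises, and that streamed text is assembled only from `response.output_text.delta` events, with `response.completed` supplying the final response object.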

Integration tests

  • Add tests/integ/test_response_endpoint.py — integration tests for ResponseEndpoint and ResponseStreamEndpoint wrappers against Bedrock Mantle
  • Fix tests/integ/test_response_bedrock.py to use ResponseUsage attribute names (input_tokens/output_tokens)

Unit test updates

  • Update all unit test mocks across 5 test files to use spec-based usage mocks (input_tokens/output_tokens) and event-based streaming mocks
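A spec-based usage mock of the kind described above might look like the sketch below. `ResponseUsage` here is a stand-in dataclass, not the real OpenAI type, and the attribute values are illustrative; the point is that `spec=` makes the mock reject attribute reads that don't exist on the real class, so stale `prompt_tokens` accesses fail loudly in tests.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class ResponseUsage:  # stand-in for the real Responses API usage type
    input_tokens: int = 0
    output_tokens: int = 0


# spec= constrains attribute *reads* to names defined on ResponseUsage,
# so e.g. mock_usage.prompt_tokens raises AttributeError.
mock_usage = MagicMock(spec=ResponseUsage)
mock_usage.input_tokens = 12
mock_usage.output_tokens = 34
```

Using `spec` (rather than a bare `MagicMock`) is what catches tests that silently read the old `prompt_tokens`/`completion_tokens` names after the rename.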

Example notebook

  • Add examples/LLMeter with OpenAI Response API on Bedrock.ipynb demonstrating non-streaming and streaming usage with Runner and plotting

Testing

  • All 527 unit tests pass
  • Ruff lint clean

acere added 2 commits March 24, 2026 21:41
… test suite

- Add ResponseEndpoint and ResponseStreamEndpoint classes for OpenAI Responses API support
- Implement non-streaming and streaming response handling with proper error management
- Add structured output support with response format validation and serialization
- Create comprehensive unit test suite covering response parsing, error handling, format validation, model parameters, payload parsing, properties, and serialization
- Add integration tests for Bedrock response endpoint functionality
- Export new response endpoint classes from endpoints module
- Update integration test configuration with response endpoint fixtures
- Rename max_tokens to max_output_tokens in create_payload (Response API
  parameter name)
- Fix _parse_response to handle usage=None (Bedrock Mantle) and use
  input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
- Rewrite _parse_stream_response to process typed events
  (response.output_text.delta, response.completed) instead of the old
  chunk-with-output-array format
- Fix test_response_bedrock.py to use ResponseUsage attribute names
  (input_tokens/output_tokens)
- Add integration tests for ResponseEndpoint and ResponseStreamEndpoint
- Add example notebook for Response API on Bedrock
- Update all unit test mocks to match new behavior
@acere acere requested a review from athewsey March 25, 2026 01:50
@acere acere self-assigned this Mar 25, 2026