@enyst enyst commented Dec 6, 2025

Summary

  • Exclude mini variants from prompt_cache_retention support in model features
  • Piggyback on existing tests to validate that mini variants are excluded

Context
Evaluation surfaced failures from passing prompt_cache_retention to mini variants (e.g. gpt-5-mini / gpt-5.1-codex-mini), which caused litellm BadRequest errors. The intended behavior is to avoid sending prompt_cache_retention for these mini models.

Changes

  • openhands-sdk/openhands/sdk/llm/utils/model_features.py
    • supports_prompt_cache_retention now requires the model to match GPT-5/GPT-4.1 patterns AND not contain "mini" (see the sketch after this list).
  • tests/sdk/llm/test_responses_parsing_and_kwargs.py
    • Updated test_chat_and_responses_options_prompt_cache_retention_gpt_5_plus_and_non_gpt to assert no prompt_cache_retention for mini variants.
  • tests/sdk/llm/test_model_features.py
    • Updated test_prompt_cache_retention_support expectations for mini variants to False.
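
A minimal sketch of the gate described in the first item (the function name, signature, and normalization are assumed from the file paths above; the real implementation may differ):

def supports_prompt_cache_retention(model: str) -> bool:
    # Strip provider prefixes such as "openai/" before matching
    name = model.lower().split("/")[-1]
    # Only the GPT-5* / GPT-4.1* families are eligible for extended retention
    if not (name.startswith("gpt-5") or name.startswith("gpt-4.1")):
        return False
    # Mini variants (gpt-5-mini, gpt-5.1-codex-mini, ...) reject the parameter
    return "mini" not in name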

Validation

  • Ran pre-commit on changed files: all hooks passed.
  • Ran targeted tests for the modified areas: passing.
    • tests/sdk/llm/test_responses_parsing_and_kwargs.py::test_chat_and_responses_options_prompt_cache_retention_gpt_5_plus_and_non_gpt
    • tests/sdk/llm/test_model_features.py::test_prompt_cache_retention_support

Notes

  • Full test suite has an unrelated import error in tests/github_workflows/test_resolve_model_config.py due to missing helper module; unrelated to this change.

Co-authored-by: openhands [email protected]



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:53e95c9-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-53e95c9-python \
  ghcr.io/openhands/agent-server:53e95c9-python

All tags pushed for this build

ghcr.io/openhands/agent-server:53e95c9-golang-amd64
ghcr.io/openhands/agent-server:53e95c9-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:53e95c9-golang-arm64
ghcr.io/openhands/agent-server:53e95c9-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:53e95c9-java-amd64
ghcr.io/openhands/agent-server:53e95c9-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:53e95c9-java-arm64
ghcr.io/openhands/agent-server:53e95c9-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:53e95c9-python-amd64
ghcr.io/openhands/agent-server:53e95c9-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:53e95c9-python-arm64
ghcr.io/openhands/agent-server:53e95c9-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:53e95c9-golang
ghcr.io/openhands/agent-server:53e95c9-java
ghcr.io/openhands/agent-server:53e95c9-python

About Multi-Architecture Support

  • Each variant tag (e.g., 53e95c9-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 53e95c9-python-amd64) are also available if needed (see the commands below)
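
To confirm what a tag contains, or to pull one architecture explicitly, standard Docker commands work against this build's tags (shown here for the python variant):

# Inspect the multi-arch manifest and its amd64/arm64 entries
docker manifest inspect ghcr.io/openhands/agent-server:53e95c9-python

# Force a specific architecture instead of relying on auto-selection
docker pull --platform linux/arm64 ghcr.io/openhands/agent-server:53e95c9-python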

…djust tests

- Update model_features.get_features to skip mini variants
- Update tests to piggyback existing coverage and validate mini excluded

Co-authored-by: openhands <[email protected]>

github-actions bot commented Dec 6, 2025

Coverage

Coverage Report

File                                                      Stmts   Miss  Cover  Missing
openhands-sdk/openhands/sdk/llm/utils/model_features.py     35      1    97%  161
TOTAL                                                     12426   5653    54%

enyst and others added 2 commits December 6, 2025 15:04
…atterns + mini exclusions

- Patterns: ['gpt-5', 'gpt-4.1'] with inline doc reference of actual listed models
- Exclude all '*mini' in feature gate (covers gpt-5-mini, gpt-5.1-mini, codex-mini)
- Extend tests to include explicit gpt-5.1-mini exclusion

Co-authored-by: openhands <[email protected]>
… docs; keep other minis excluded

- Update feature gate to carve out 'gpt-5.1-codex-mini'
- Update tests to expect retention for 5.1-codex-mini

Co-authored-by: openhands <[email protected]>
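
A hedged sketch of how this carve-out could look, continuing the gate sketch from the PR description (the exact condition and its placement are assumed):

def supports_prompt_cache_retention(model: str) -> bool:
    name = model.lower().split("/")[-1]
    if not (name.startswith("gpt-5") or name.startswith("gpt-4.1")):
        return False
    # gpt-5.1-codex-mini is the one mini variant that accepts the parameter
    if "gpt-5.1-codex-mini" in name:
        return True
    # All other mini variants remain excluded
    return "mini" not in name
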
enyst and others added 3 commits December 6, 2025 16:13
…ix E501

- Provide find_models_by_id for tests expecting resolve_model_configs
- Wrap long error message to satisfy Ruff E501

Co-authored-by: openhands <[email protected]>
- Test failure was local-only; CI doesn’t run tests/github_workflows in tests.yml
- run-eval workflow uses resolve_model_config.py (singular) directly

Co-authored-by: openhands <[email protected]>

enyst commented Dec 6, 2025

PASS (200) for all documented positives:
openai/gpt-5.1
openai/gpt-5.1-codex
openai/gpt-5.1-codex-mini
openai/gpt-5.1-chat-latest
openai/gpt-5
openai/gpt-5-codex
openai/gpt-4.1

Negative controls:
openai/gpt-5-mini: UNEXPECTED-PASS (200), but the response shows prompt_cache_retention: null and status=incomplete with incomplete_details.reason = max_output_tokens. So the parameter did not cause an error, but the model produced no output and retention was not applied.
openai/gpt-5.1-mini: 400 model_not_found (as before)

- Define llm_51_codex_mini before use

Co-authored-by: openhands <[email protected]>
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 6, 2025
@enyst enyst marked this pull request as ready for review December 6, 2025 16:30
@enyst enyst requested a review from xingyaoww December 6, 2025 16:30

enyst commented Dec 6, 2025

@xingyaoww Re: the failure in the agent behavior PR: it looks like the issue is that some mini models don't support extended cache retention, while one does (gpt-5.1-codex-mini).

I verified a list of models: all of those above that should support it, plus a few that don't; the unsupported one used in integration tests is now excluded as well.

Behavior tests

@enyst enyst changed the title Exclude '*mini' models from prompt_cache_retention support and adjust tests Exclude '*mini' models from prompt_cache_retention Dec 6, 2025