# Add Codex Responses prompt normalization (#5)
**Merged:** krystophny merged 1 commit into `main` on Mar 24, 2026.
## Summary
- Fold `developer` content into a single leading `system` message.
- Merge `instructions` with developer/system content in the shape Codex expects (see the sketch below).
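A minimal sketch of what that normalization can look like, assuming a Responses-style payload of `instructions` plus a list of input items. The function and field names here are illustrative, not the PR's actual code:

```python
from typing import Any


def normalize_codex_prompt(
    instructions: str | None,
    items: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Fold instructions and developer/system items into one leading
    system message, keeping the remaining turns in order."""
    system_parts: list[str] = []
    if instructions:
        system_parts.append(instructions)

    rest: list[dict[str, Any]] = []
    for item in items:
        # Codex emits "developer" turns; most chat templates only
        # understand "system", so both roles merge into the header.
        if item.get("role") in ("developer", "system"):
            system_parts.append(str(item.get("content", "")))
        else:
            rest.append(item)

    messages: list[dict[str, Any]] = []
    if system_parts:
        messages.append(
            {"role": "system", "content": "\n\n".join(system_parts)}
        )
    messages.extend(rest)
    return messages
```

Collapsing everything into one leading system message matters because most chat templates render only a single system slot; a separate `developer` turn would otherwise be dropped or mis-templated.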
## Stack position

Stacked on krystophny/vllm-mlx#1 ("Add OpenAI Responses API core", #1), which provides the `/v1/responses` endpoint; this PR does not touch its transport or persistence logic.

### Why this is independently deployable on top of #1
The change is confined to prompt assembly for `/v1/responses`, so it layers cleanly on top of #1 without altering anything else.

## Related upstream context
### ggml-org/llama.cpp

This PR follows the same class of fix that already showed up in llama.cpp:
- llama.cpp#20079 translated `developer` -> `system` and merged prompt pieces for Codex/template compatibility: "fix: translate "developer" role to "system" for chat template compatibility" (ggml-org/llama.cpp#20079)
- llama.cpp#18486 and llama.cpp#19720 provide the broader Responses background there: "server: /v1/responses (partial)" (ggml-org/llama.cpp#18486), "server: add OpenAI Responses API compliance" (ggml-org/llama.cpp#19720)

### vllm-project/vllm

Related open Responses prompt/state fixes upstream include:
- vllm#37727: `instructions` leaking with `previous_response_id`: "[Bugfix] Fix Responses API instructions leaking through previous_response_id" (vllm-project/vllm#37727)
- vllm#37739: default chat-template kwargs handling in Responses: "[Frontend] Fix default_chat_template_kwargs handling in Responses API" (vllm-project/vllm#37739)

Our scope here is narrower: normalize the prompt shape so Codex local turns render cleanly against vllm-mlx.
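To make the target shape concrete, here is an illustrative before/after using the `normalize_codex_prompt` sketch from the Summary section above; the payload values are made up:

```python
# Illustrative input: the shape Codex sends to /v1/responses, with both
# top-level instructions and a "developer" turn.
payload_items = [
    {"role": "developer", "content": "Prefer small, focused diffs."},
    {"role": "user", "content": "Rename the helper and update call sites."},
]

messages = normalize_codex_prompt(
    instructions="You are Codex, a coding agent.",
    items=payload_items,
)
# messages[0] is now a single leading system message:
# {"role": "system",
#  "content": "You are Codex, a coding agent.\n\nPrefer small, focused diffs."}
# followed by the user turn, unchanged.
```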
## Validation

- `PYTHONPATH=/Users/ert/code/vllm-mlx /Users/ert/code/.venv/bin/python -m pytest tests/test_responses_api.py -q`
- Verified against vllm-mlx in the FortBench MLX rerun.
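For reference, a hypothetical test in the spirit of that suite (names are illustrative; the real assertions live in `tests/test_responses_api.py`):

```python
# Hypothetical pytest sketch of the invariant the command above exercises,
# reusing the normalize_codex_prompt sketch from the Summary section.
def test_developer_content_folds_into_leading_system():
    messages = normalize_codex_prompt(
        instructions="Be terse.",
        items=[
            {"role": "developer", "content": "No emoji."},
            {"role": "user", "content": "hi"},
        ],
    )
    assert messages[0]["role"] == "system"
    assert "Be terse." in messages[0]["content"]
    assert "No emoji." in messages[0]["content"]
    # Exactly one system message survives normalization.
    assert sum(m["role"] == "system" for m in messages) == 1
```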
## What could still improve

- Handling of `previous_response_id` (cf. vllm-project/vllm#37727 above).