fix(server): strip volatile billing header on all cache paths (OpenAI + codex)#360
Merged
Merged
Conversation
… + codex)
- Pure normalizer normalize_system_for_cache() in prompt_normalize.{h,cpp}: no IO, no globals
- OpenAI /v1/chat/completions + codex /v1/responses now call the pure fn before tokenize/hash
- DRY: anthropic path's inline strip collapsed to thin caller (single normalization path)
- 6 pure-function tests: strips billing header (Anthropic array + OpenAI msg0), idempotent across turn change, preserves legit content, handles leading-whitespace header, cache-key stable
Contributor
There was a problem hiding this comment.
3 issues found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/src/server/http_server.cpp">
<violation number="1" location="server/src/server/http_server.cpp:1346">
P2: OpenAI system header normalization is skipped when `messages[0].content` is an array, leaving a cache-miss path unnormalized.</violation>
</file>
<file name="server/src/server/prompt_normalize.cpp">
<violation number="1" location="server/src/server/prompt_normalize.cpp:48">
P1: `messages[0]["content"]` is accessed without verifying the key exists, which can throw and fail request handling for malformed OpenAI payloads.
(Based on your team's feedback about guarding JSON string reads and key/type checks.) [FEEDBACK_USED].</violation>
</file>
Tip: cubic used a learning from your PR history. Let your coding agent read cubic learnings directly with the cubic MCP.
Re-trigger cubic
| if (first.is_object() && first.contains("role")) { | ||
| // OpenAI messages array: strip billing-header lines from messages[0]. | ||
| if (first.value("role", "") == "system") { | ||
| const auto & content = first["content"]; |
Contributor
There was a problem hiding this comment.
P1: messages[0]["content"] is accessed without verifying the key exists, which can throw and fail request handling for malformed OpenAI payloads.
(Based on your team's feedback about guarding JSON string reads and key/type checks.) .
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/prompt_normalize.cpp, line 48:
<comment>`messages[0]["content"]` is accessed without verifying the key exists, which can throw and fail request handling for malformed OpenAI payloads.
(Based on your team's feedback about guarding JSON string reads and key/type checks.) .</comment>
<file context>
@@ -0,0 +1,82 @@
+ if (first.is_object() && first.contains("role")) {
+ // OpenAI messages array: strip billing-header lines from messages[0].
+ if (first.value("role", "") == "system") {
+ const auto & content = first["content"];
+ if (content.is_string()) {
+ return strip_billing_header_lines(content.get<std::string>());
</file context>
| if (req.messages.is_array() && !req.messages.empty()) { | ||
| auto & m0 = req.messages[0]; | ||
| if (m0.is_object() && m0.value("role", "") == "system" && | ||
| m0.contains("content") && m0["content"].is_string()) { |
Contributor
There was a problem hiding this comment.
P2: OpenAI system header normalization is skipped when messages[0].content is an array, leaving a cache-miss path unnormalized.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/http_server.cpp, line 1346:
<comment>OpenAI system header normalization is skipped when `messages[0].content` is an array, leaving a cache-miss path unnormalized.</comment>
<file context>
@@ -1355,6 +1339,14 @@ bool HttpServer::route_request(int fd, const HttpRequest & hr) {
+ if (req.messages.is_array() && !req.messages.empty()) {
+ auto & m0 = req.messages[0];
+ if (m0.is_object() && m0.value("role", "") == "system" &&
+ m0.contains("content") && m0["content"].is_string()) {
+ m0["content"] = dflash::common::normalize_system_for_cache(req.messages);
+ }
</file context>
Suggested change
| m0.contains("content") && m0["content"].is_string()) { | |
| m0.contains("content") && (m0["content"].is_string() || m0["content"].is_array())) { |
cbac5f1 to
8f65f5c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Re-carved from #274 (commit
4ec8d17). A volatile per-turn header that clients prepend to the system prompt was poisoning the prefix cache (miss every turn) on the OpenAI path; the Anthropic path was already protected inline.Extracts a pure, IO-free
normalize_system_for_cache()(server/src/server/prompt_normalize.{h,cpp}) and routes all three cache paths through it before the prefix key is computed: Anthropic/v1/messages(refactored from the inline strip — single normalization path, DRY), OpenAI/v1/chat/completionsmessages[0](was unprotected — the core fix), and codex/v1/responsesinstructions. Known-pattern strip only (x-anthropic-billing-header+ the claude-code volatile block), position/whitespace-insensitive; legit system content is never stripped.Unit-tested (6 cases: strips on both shapes, idempotent across a changing header, preserves legit content, leading-whitespace, prefix-key stable). The 1620 prior assertions are unaffected. Branch A/B measured 0%→97.1% warm hit-rate on the OpenAI path; end-to-end warm-hit re-validation on the carved code is a batched GPU gate.
5 files, +228/-23.