
Conversation

@caozhiyuan caozhiyuan commented Nov 19, 2025

This pull request introduces support for "thinking" blocks and signatures in the Anthropic message translation pipeline, allowing richer reasoning metadata to be passed between Anthropic and OpenAI message formats. The main changes include updates to the message interfaces, translation logic, and streaming event handling to properly process and emit reasoning text and opaque signatures. The tokenizer is also updated to exclude reasoning metadata from token counts.

Anthropic "thinking" block and signature support:

  • Added signature field to the AnthropicThinkingBlock interface, and updated all relevant translation logic to extract and emit reasoning_text and reasoning_opaque from OpenAI messages into Anthropic "thinking" blocks. [1] [2] [3] [4] [5] [6] [7] [8] [9]

  • Updated streaming translation logic to emit "thinking" blocks and their signatures as separate content blocks, including handling for starting, updating, and stopping these blocks in the event stream. [1] [2]
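As a rough illustration of the shape described above (a sketch mirroring the fields named in this summary; the real interface lives in src/routes/messages/anthropic-types.ts and may differ):

```typescript
// Sketch only: mirrors the fields named in the PR summary, not the
// actual source of src/routes/messages/anthropic-types.ts.
interface AnthropicThinkingBlock {
  type: "thinking"
  thinking: string // reasoning_text extracted from the OpenAI message
  signature: string // reasoning_opaque, newly carried through translation
}

// Hypothetical helper showing how both reasoning fields map onto one block.
function toThinkingBlock(
  reasoningText: string,
  reasoningOpaque = "",
): AnthropicThinkingBlock {
  return {
    type: "thinking",
    thinking: reasoningText,
    signature: reasoningOpaque,
  }
}
```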

State and event handling improvements:

  • Added thinkingBlockOpen state to AnthropicStreamState and ensured correct block lifecycle management for both text and thinking blocks in streaming translation. [1] [2] [3] [4]

Message content mapping and translation fixes:

  • Refactored content mapping to separate text and thinking blocks, and ensured reasoning metadata is correctly attached to assistant messages for both streaming and non-streaming translation. [1] [2] [3] [4] [5]

Tokenizer exclusion for reasoning metadata:

  • Updated the tokenizer to skip counting tokens for the reasoning_opaque field, preventing it from affecting token usage calculations.
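A minimal sketch of that exclusion (the real logic lives in src/lib/tokenizer.ts; countTokens here is a stand-in for the actual encoder):

```typescript
// Sketch: skip reasoning_opaque when accumulating per-field token counts.
// countTokens stands in for the real tokenizer's encoder.
function calculateMessageTokensSketch(
  message: Record<string, unknown>,
  countTokens: (text: string) => number,
): number {
  let total = 0
  for (const [key, value] of Object.entries(message)) {
    if (key === "reasoning_opaque") continue // excluded from token accounting
    if (typeof value === "string") total += countTokens(value)
  }
  return total
}
```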

Miscellaneous and infrastructure:

  • Set Bun server idleTimeout to 0 for improved server stability.

Summary by CodeRabbit

  • New Features

    • Added explicit reasoning/thinking fields and live thinking blocks in assistant messages and streams for improved transparency.
    • Streaming now emits reasoning-related deltas allowing reasoning_text and opaque reasoning data alongside content.
  • Chores

    • Adjusted server idle timeout to improve connection handling.
    • Token calculation updated to ignore opaque reasoning entries.
  • Tests

    • Updated tests to validate new reasoning fields and streaming state.



coderabbitai bot commented Nov 19, 2025

Walkthrough

Adds thinking-block metadata and streaming state, separates reasoning into text and opaque signature fields, refactors stream translation into modular handlers with explicit block lifecycle, exposes optional reasoning fields in copilot types, skips tokenizing reasoning_opaque, updates tests, and sets Bun idleTimeout to 0.

Changes

  • Anthropic types (src/routes/messages/anthropic-types.ts): Added signature: string to AnthropicThinkingBlock and thinkingBlockOpen: boolean to AnthropicStreamState.
  • Stream translation (src/routes/messages/stream-translation.ts): Large refactor: translateChunkToAnthropicEvents decomposed into handlers (handleMessageStart, handleThinkingText, handleContent, handleToolCalls, handleFinish, handleReasoningOpaque, closeThinkingBlockIfOpen, etc.), with reasoning/signature delta flows, explicit block lifecycle, and thinkingBlockOpen state tracking.
  • Non-stream translation (src/routes/messages/non-stream-translation.ts): Separates reasoning into reasoning_text and reasoning_opaque (signature) via getAnthropicThinkBlocks; thinking blocks are removed from the content stream; assistant responses now include reasoning fields when applicable.
  • Handler initialization (src/routes/messages/handler.ts): Initializes streamState with thinkingBlockOpen: false.
  • Copilot types / public API (src/services/copilot/create-chat-completions.ts): Exported Delta, Choice, ResponseMessage, and Message; added optional reasoning_text and reasoning_opaque to deltas/messages and logprobs to Choice.
  • Tokenizer (src/lib/tokenizer.ts): Skips token calculation for reasoning_opaque via an early continue in calculateMessageTokens.
  • Tests (tests/anthropic-request.test.ts, tests/anthropic-response.test.ts): Updated to include signature in thinking blocks, initialize thinkingBlockOpen, and assert reasoning_text where applicable.
  • Server config (src/start.ts): Sets bun: { idleTimeout: 0 } in the server serve configuration.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler
    participant Translator as StreamTranslator
    participant Emitter as EventEmitter

    Client->>Handler: send streaming chunk
    Handler->>Translator: translateChunkToAnthropicEvents(chunk)

    alt chunk contains reasoning_text (thinking)
        Translator->>Emitter: content_block_start (thinking)
        Translator->>Emitter: thinking_delta (reasoning_text)
        Note right of Translator: thinkingBlockOpen = true
    end

    alt chunk contains reasoning_opaque (signature)
        Translator->>Translator: closeThinkingBlockIfOpen()
        Translator->>Emitter: signature_delta (reasoning_opaque)
        Translator->>Emitter: content_block_stop (thinking)
        Note right of Translator: thinkingBlockOpen = false
    end

    alt chunk contains text or tool_calls
        Translator->>Emitter: handleContent() / handleToolCalls() -> text_delta / tool_use deltas
    end

    alt stream finishes
        Translator->>Translator: handleFinish() (close blocks, emit stop_reason)
        Translator->>Emitter: message_delta + message_stop
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

  • Review focus:
    • src/routes/messages/stream-translation.ts — handler ordering, delta sequencing, and block lifecycle.
    • src/routes/messages/non-stream-translation.ts — reasoning field emission and thinking block removal from content.
    • src/services/copilot/create-chat-completions.ts — exported type changes and optional fields.
    • src/lib/tokenizer.ts — verify reasoning_opaque exclusion from token accounting.
    • Tests in tests/* — ensure assertions match new fields and state.

Poem

🐇 I nibble at deltas, tuck signatures neat,
Thinking blocks open, then close on quick feet,
Streams hum in order, reasoning split in two,
Tokens skip the secret that only signatures knew,
— A rabbit's tidy hop through the codebase meadow 🌿

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Title check — ⚠️ Warning: The title is vague and partially misleading. It mentions 'gemini-3-pro', but the PR exclusively implements Anthropic thinking blocks and reasoning support; gemini-3-pro support is not evident in the changeset. Resolution: revise the title to reflect the actual primary change, e.g., 'Add Anthropic thinking blocks and reasoning metadata support' or 'Support Anthropic reasoning blocks in chat completions translation'.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 5.26%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfb40d2 and 7657d87.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
src/services/copilot/create-chat-completions.ts (3)
  • Choice (88-93)
  • ChatCompletionChunk (51-70)
  • Delta (72-86)
src/routes/messages/anthropic-types.ts (2)
  • AnthropicStreamState (196-208)
  • AnthropicStreamEventData (185-193)
src/routes/messages/utils.ts (1)
  • mapOpenAIStopReasonToAnthropic (3-16)
🔇 Additional comments (8)
src/routes/messages/stream-translation.ts (8)

250-288: LGTM!

The function correctly emits a complete thinking block with signature and properly increments contentBlockIndex after closing the block (line 286).


290-336: LGTM!

The function correctly handles reasoning_text deltas and closes the thinking block when reasoning_opaque arrives in the same delta. The block lifecycle management (open at line 305, close at line 333, index increment at line 332) is correct.


338-360: LGTM!

The helper correctly closes open thinking blocks by emitting an empty signature_delta followed by content_block_stop, then increments the index and resets the state flag. The implementation is sound.


176-215: LGTM!

The function correctly closes thinking blocks (line 182) and tool blocks (lines 184-192) before emitting text content, ensuring proper block lifecycle management.


98-158: LGTM!

The function correctly closes thinking blocks (line 104) before processing tool calls and properly delegates reasoning opaque handling to the dedicated helper.


160-174: LGTM!

The function correctly closes non-tool content blocks before delegating to handleReasoningOpaque, ensuring proper block sequencing when tool calls include reasoning opaque signatures.


217-248: LGTM!

The function correctly emits the message_start event once and properly initializes token usage accounting for prompt cache hits.


1-21: LGTM!

The imports correctly include the necessary types from the copilot service, and the isToolBlockOpen helper correctly identifies tool blocks by checking if the current block index matches any registered tool call.



@caozhiyuan
Contributor Author

@ericc-ch This feature adds chat completions reasoning support; gemini-3-pro works OK with it. Please help review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
tests/anthropic-response.test.ts (1)

250-256: AnthropicStreamState initialization updated correctly; consider DRY helper

Adding thinkingBlockOpen: false keeps the test state in sync with the new AnthropicStreamState shape. You now duplicate the full initial state literal in multiple tests; consider a small helper like createInitialStreamState() to avoid future drift when adding more fields.

Also applies to: 351-357
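Such a helper might look like this (a sketch: the field list is assumed from the state fields mentioned in this review, not the full AnthropicStreamState shape):

```typescript
// Hypothetical test helper; fields are those referenced in this review,
// not necessarily the complete AnthropicStreamState.
interface StreamStateSketch {
  messageStartSent: boolean
  contentBlockIndex: number
  contentBlockOpen: boolean
  thinkingBlockOpen: boolean
}

// Centralizes the initial state literal so tests don't drift when fields
// are added to the stream state.
function createInitialStreamState(): StreamStateSketch {
  return {
    messageStartSent: false,
    contentBlockIndex: 0,
    contentBlockOpen: false,
    thinkingBlockOpen: false,
  }
}
```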

src/routes/messages/anthropic-types.ts (1)

56-60: Thinking block signature and signature_delta wiring looks consistent, but type is stricter than usage

Making AnthropicThinkingBlock carry a signature: string and adding the signature_delta variant to AnthropicContentBlockDeltaEvent matches how the stream translator now emits signatures and how non‑stream translation constructs thinking blocks.

One nuance: elsewhere you defensively treat signature as optional/possibly empty (e.g., checking b.signature && b.signature.length > 0 and sometimes emitting signature: ""). If the upstream protocol allows thinking blocks without a signature, you might want signature?: string to better reflect the wire shape, instead of relying on empty strings as a sentinel.

Also applies to: 140-148

tests/anthropic-request.test.ts (1)

128-158: Tests exercise reasoning_text but not reasoning_opaque mapping

The two thinking‑block tests now correctly:

  • Include signature on Anthropic thinking blocks.
  • Assert that reasoning_text is populated while content only carries user‑visible text.

However, neither test asserts that the Anthropic signature ends up on assistantMessage.reasoning_opaque. Adding something like:

expect(assistantMessage?.reasoning_opaque).toBe("abc123")

and

expect(assistantMessage?.reasoning_opaque).toBe("def456")

would close the coverage gap for the opaque reasoning metadata as well. You might also tweak the test comments to clarify that thinking content is surfaced via reasoning_text, not merged into content.

Also applies to: 160-201

src/services/copilot/create-chat-completions.ts (1)

72-86: Reasoning fields and exported streaming types are wired consistently

Exporting Delta and Choice and adding reasoning_text / reasoning_opaque to both streaming deltas and non‑stream messages matches how the translation layers consume them. The translator code only treats these as optional and nullable, which is consistent with the type definitions here.

If you want to tighten typings later, you could replace object for logprobs with a more specific shape or Record<string, unknown>, but that’s not blocking.

Also applies to: 88-93, 114-120, 166-175

src/routes/messages/non-stream-translation.ts (2)

274-296: getAnthropicThinkBlocks drops opaque signature when both text and opaque are present

In translateToAnthropic, thinking blocks are reconstructed from:

  • choice.message.reasoning_text
  • choice.message.reasoning_opaque

via getAnthropicThinkBlocks. Right now that helper does:

if (reasoningText) {
  return [{ type: "thinking", thinking: reasoningText, signature: "" }]
}
if (reasoningOpaque) {
  return [{ type: "thinking", thinking: "", signature: reasoningOpaque }]
}

So when both reasoning_text and reasoning_opaque are present, the opaque portion is silently discarded for the non‑stream Anthropic response. If upstream ever sends both fields together, you likely want to preserve the signature as well, e.g., by returning a single block with both populated or two blocks (one text, one signature).

Suggestion: adjust getAnthropicThinkBlocks to handle the “both present” case explicitly instead of treating them as mutually exclusive.

Also applies to: 320-359
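One way the helper could handle the "both present" case explicitly (a sketch assuming a single block may carry both fields, not the actual fix):

```typescript
interface ThinkingBlockSketch {
  type: "thinking"
  thinking: string
  signature: string
}

// Sketch: keep reasoning text and opaque signature together instead of
// treating them as mutually exclusive.
function getAnthropicThinkBlocksSketch(
  reasoningText?: string,
  reasoningOpaque?: string,
): Array<ThinkingBlockSketch> {
  if (!reasoningText && !reasoningOpaque) return []
  return [
    {
      type: "thinking",
      thinking: reasoningText ?? "",
      signature: reasoningOpaque ?? "",
    },
  ]
}
```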


274-296: Multi-choice stop reason precedence is a bit opaque

stopReason is initialized from the first choice and then updated with:

if (choice.finish_reason === "tool_calls" || stopReason === "stop") {
  stopReason = choice.finish_reason
}

This causes a later non‑tool reason (e.g., "length") to overwrite an earlier "stop" reason, which may not be what you intend. If the goal is simply “prefer tool_calls over everything else,” a more direct rule (e.g., only overwrite when choice.finish_reason === "tool_calls") would be easier to reason about.

Not a blocker, but worth double‑checking the intended precedence.
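The simpler precedence rule suggested above could be sketched as (hypothetical helper, not the existing code):

```typescript
// Sketch: prefer "tool_calls" over every other finish reason across
// choices; otherwise keep the first choice's reason.
function pickStopReason(
  finishReasons: Array<string | null>,
): string | null {
  let stopReason = finishReasons[0] ?? null
  for (const reason of finishReasons) {
    if (reason === "tool_calls") {
      stopReason = reason
    }
  }
  return stopReason
}
```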

src/routes/messages/stream-translation.ts (1)

227-264: Consider advancing contentBlockIndex after handleReasoningOpaque emits a full thinking block

handleReasoningOpaque emits a complete thinking block lifecycle:

  • content_block_start (type "thinking")
  • a dummy thinking_delta
  • a signature_delta
  • content_block_stop

but it never updates state.contentBlockIndex or any open/closed flags. As a result, the next text or tool block may reuse the same index as this ephemeral thinking block, which can make the event stream harder to reason about for consumers that treat indices as unique per block.

A low‑impact improvement would be to increment state.contentBlockIndex at the end of handleReasoningOpaque (and keep contentBlockOpen / thinkingBlockOpen false), so each emitted block gets a distinct index.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0ea08fe and a2467d3.

📒 Files selected for processing (7)
  • src/routes/messages/anthropic-types.ts (2 hunks)
  • src/routes/messages/handler.ts (1 hunks)
  • src/routes/messages/non-stream-translation.ts (5 hunks)
  • src/routes/messages/stream-translation.ts (3 hunks)
  • src/services/copilot/create-chat-completions.ts (4 hunks)
  • tests/anthropic-request.test.ts (4 hunks)
  • tests/anthropic-response.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (1)
src/routes/messages/anthropic-types.ts (3)
  • AnthropicTextBlock (28-31)
  • AnthropicAssistantContentBlock (67-70)
  • AnthropicThinkingBlock (56-60)
🔇 Additional comments (4)
src/routes/messages/anthropic-types.ts (1)

195-208: New thinkingBlockOpen flag in stream state aligns with streaming helpers

Adding thinkingBlockOpen: boolean to AnthropicStreamState matches how stream-translation.ts now manages open thinking blocks separately from regular content blocks. The field is initialized in both the handler and tests, so state shape is consistent across call sites.

src/routes/messages/handler.ts (1)

56-64: Streaming state now correctly initializes thinkingBlockOpen

Including thinkingBlockOpen: false in the handler’s streamState keeps the runtime state in sync with AnthropicStreamState and the streaming translator’s expectations. No issues here.

src/routes/messages/non-stream-translation.ts (1)

126-180: Assistant reasoning is cleanly separated from visible content

handleAssistantMessage + the updated mapContent now:

  • Aggregate all thinking blocks into allThinkingContent and expose it via reasoning_text.
  • Select a single non‑empty signature into reasoning_opaque.
  • Ensure content only contains text (and images when present), with thinking and tool blocks filtered out.

This achieves the PR goal of keeping reasoning separate from user‑visible text and structuring tool calls via tool_calls. The mapContent filter to only include text blocks for non‑image content is particularly nice for avoiding accidental leakage of non‑display blocks.

Also applies to: 182-222

src/routes/messages/stream-translation.ts (1)

23-47: Streaming translator refactor improves readability and separation of concerns

The split of translateChunkToAnthropicEvents into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, and handleFinish makes the streaming pipeline much easier to follow. State mutations (messageStartSent, contentBlockIndex, contentBlockOpen, thinkingBlockOpen, toolCalls) are now localized, which should reduce future bugs around mixing text/tool/thinking content.

Also applies to: 194-225


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/start.ts (1)

120-122: Disabling Bun idle timeout is helpful for streaming but has resource trade‑offs

Setting bun.idleTimeout = 0 will keep idle keep‑alive connections open indefinitely, which is good for long‑running reasoning/streaming but can retain dead or abandoned connections and increase resource usage under load.

Consider:

  • Making the idle timeout configurable via an env var (e.g., BUN_IDLE_TIMEOUT_SECONDS, defaulting to 0 or a high but finite value), and/or
  • Documenting that 0 is intentional for long‑lived streaming so operators know the impact if this is ever used in a shared/production setting.
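A sketch of the env-var approach (BUN_IDLE_TIMEOUT_SECONDS is a hypothetical variable name taken from the suggestion above, not an existing setting):

```typescript
// Hypothetical: resolve the idle timeout from an env var, defaulting to 0
// (no timeout) to keep long-lived streaming connections open.
function resolveIdleTimeout(
  env: Record<string, string | undefined>,
): number {
  const raw = env.BUN_IDLE_TIMEOUT_SECONDS
  if (raw === undefined || raw === "") return 0
  const parsed = Number(raw)
  // Fall back to 0 on non-numeric or negative values.
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : 0
}
```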
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between a2467d3 and 6c385b7.

📒 Files selected for processing (2)
  • src/routes/messages/non-stream-translation.ts (5 hunks)
  • src/start.ts (1 hunks)
🧰 Additional context used
🔇 Additional comments (4)
src/routes/messages/non-stream-translation.ts (4)

146-153: LGTM!

The extraction of thinking content and signature from multiple thinking blocks is well-handled. Concatenating thinking content and taking the first non-empty signature is a reasonable approach for the translation.


155-179: LGTM!

The reasoning fields are correctly added to assistant messages in both code paths (with and without tool calls), ensuring consistent handling of thinking block data.


194-200: LGTM!

The explicit filter for text blocks ensures that thinking blocks are excluded from regular content, which is correct since they're now handled separately via reasoning fields.


278-296: LGTM!

The refactor to use a unified assistantContentBlocks array that includes text, thinking, and tool use blocks is cleaner and correctly preserves the stop reason logic.

@caozhiyuan caozhiyuan force-pushed the feature/chat-completions-reasoning branch from 6c385b7 to 65a3568 on November 19, 2025 at 13:56

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/routes/messages/non-stream-translation.ts (1)

278-291: Anthropic response reconstruction is correct; consider ordering and naming nits

Combining text, thinking, and tool_use blocks into assistantContentBlocks using the new reasoning fields is logically sound. Two optional cleanups you might consider:

  • Put thingBlocks (thinking) before textBlocks to better mirror common Anthropic ordering (thinking preceding final text), if that matches how other paths structure content.
  • Rename thingBlocks to thinkingBlocks for clarity.

Both are non-blocking and mostly about readability/consistency.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 6c385b7 and 65a3568.

📒 Files selected for processing (1)
  • src/routes/messages/non-stream-translation.ts (5 hunks)
🧰 Additional context used
🔇 Additional comments (3)
src/routes/messages/non-stream-translation.ts (3)

142-179: Reasoning metadata extraction in assistant messages looks solid; confirm multi-signature expectations

The separation of thinking content into reasoning_text/reasoning_opaque while keeping content free of thinking blocks is correct and symmetric with the Anthropic→OpenAI path. One behavioral detail: when multiple thinking blocks carry different signature values, only the first non-empty signature is propagated. If upstream could ever emit multiple distinct signatures per message, that information would be collapsed here; otherwise this is fine as-is.


194-201: mapContent correctly strips non-text (including thinking) from visible content

The updated non-image path returning only joined text blocks is a good change: thinking and other non-text blocks are no longer mixed into the user-visible content, which aligns with the new dedicated reasoning fields and keeps Anthropic internals out of the OpenAI surface.


336-359: getAnthropicThinkBlocks now preserves both reasoning fields as intended

This helper correctly emits a single AnthropicThinkingBlock whenever either reasoningText or reasoningOpaque is non-empty, and preserves both when both are present (thinking: reasoningText, signature: reasoningOpaque || ""). That resolves the earlier data-loss concern around dropping reasoning_opaque and keeps the types aligned with AnthropicThinkingBlock.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/routes/messages/stream-translation.ts (1)

23-47: Standalone reasoning_opaque deltas are not processed

If a delta contains only reasoning_opaque (no reasoning_text, tool_calls, content, or finish_reason), it will be silently ignored:

  • handleThinkingText only processes reasoning_opaque when reasoning_text is present (line 300)
  • handleToolCalls calls handleReasoningOpaque only when tool_calls exist (line 105)
  • handleFinish calls handleReasoningOpaque only when finish_reason exists (line 68)

Per the streaming protocol, deltas may carry only opaque reasoning tokens without other fields (as noted in the previous review's web search).

Option 1: Unconditionally process reasoning_opaque in handleThinkingText, regardless of whether reasoning_text is present:

 function handleThinkingText(
   delta: Delta,
   state: AnthropicStreamState,
   events: Array<AnthropicStreamEventData>,
 ) {
   if (delta.reasoning_text && delta.reasoning_text.length > 0) {
     if (!state.thinkingBlockOpen) {
       events.push({
         type: "content_block_start",
         index: state.contentBlockIndex,
         content_block: {
           type: "thinking",
           thinking: "",
         },
       })
       state.thinkingBlockOpen = true
     }

     events.push({
       type: "content_block_delta",
       index: state.contentBlockIndex,
       delta: {
         type: "thinking_delta",
         thinking: delta.reasoning_text,
       },
     })
+  }

-    if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+  // Handle reasoning_opaque even if there's no reasoning_text
+  if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+    if (state.thinkingBlockOpen) {
+      // Close open thinking block with signature
       events.push(
         {
           type: "content_block_delta",
           index: state.contentBlockIndex,
           delta: {
             type: "signature_delta",
             signature: delta.reasoning_opaque,
           },
         },
         {
           type: "content_block_stop",
           index: state.contentBlockIndex,
         },
       )
       state.contentBlockIndex++
       state.thinkingBlockOpen = false
+    } else {
+      // No thinking block open - create complete reasoning block
+      handleReasoningOpaque(delta, events, state)
     }
   }
 }

Option 2: Add standalone check in main function before handleFinish:

   handleContent(delta, state, events)

   handleToolCalls(delta, state, events)

+  // Handle standalone reasoning_opaque not caught by other handlers
+  if (
+    delta.reasoning_opaque &&
+    !state.thinkingBlockOpen &&
+    !state.contentBlockOpen
+  ) {
+    handleReasoningOpaque(delta, events, state)
+  }

   handleFinish(choice, state, { events, chunk })

I recommend Option 1 as it centralizes all reasoning handling in one place.

♻️ Duplicate comments (1)
src/routes/messages/stream-translation.ts (1)

49-94: Thinking blocks must be closed before message finishes

If state.thinkingBlockOpen is true when finish_reason arrives, the thinking block is never explicitly closed before message_stop. Per streaming protocol, thinking/content blocks must receive an explicit content_block_stop event before the message finishes.

Scenario:

  1. Delta with reasoning_text opens thinking block (thinkingBlockOpen = true)
  2. Final delta with only finish_reason (no reasoning_opaque, no other content)
  3. handleFinish closes contentBlockOpen but leaves thinkingBlockOpen = true
  4. message_stop is emitted without closing the thinking block

Apply this diff:

 function handleFinish(
   choice: Choice,
   state: AnthropicStreamState,
   context: {
     events: Array<AnthropicStreamEventData>
     chunk: ChatCompletionChunk
   },
 ) {
   const { events, chunk } = context
   if (choice.finish_reason && choice.finish_reason.length > 0) {
+    closeThinkingBlockIfOpen(state, events)
+
     if (state.contentBlockOpen) {
       context.events.push({
         type: "content_block_stop",
         index: state.contentBlockIndex,
       })
       state.contentBlockOpen = false
       state.contentBlockIndex++
     }

     handleReasoningOpaque(choice.delta, events, state)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 65a3568 and ec66480.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🔇 Additional comments (4)
src/routes/messages/stream-translation.ts (4)

1-47: Refactoring improves code organization

The modular handler approach (splitting into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, handleFinish) makes the streaming translation logic clearer and easier to maintain. The imports of Choice and Delta types are appropriate.


96-199: Content and tool call handling looks correct

The handleToolCalls and handleContent functions properly invoke closeThinkingBlockIfOpen before processing their respective content types, ensuring thinking blocks are closed at the right boundaries. The block lifecycle management (opening, emitting deltas, closing, incrementing indices) follows the expected pattern.


201-232: Message start handling is correct

The handleMessageStart function correctly emits the message_start event once with proper initialization of token counters and cache metadata.


321-343: Helper function correctly manages thinking block closure

The closeThinkingBlockIfOpen helper properly emits signature_delta with an empty string and content_block_stop, then increments the index and resets the state flag. This is the correct pattern for closing thinking blocks.
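The lifecycle described above can be sketched as follows (a minimal sketch with simplified stand-ins for the real `AnthropicStreamState` and `AnthropicStreamEventData` types from `anthropic-types.ts`; only the fields relevant to thinking-block closure are modeled):

```typescript
// Simplified stand-ins for the real stream state and event types.
type StreamEvent = Record<string, unknown>

interface StreamState {
  thinkingBlockOpen: boolean
  contentBlockIndex: number
}

// Emits signature_delta + content_block_stop, bumps the index, resets the flag.
function closeThinkingBlockIfOpen(
  state: StreamState,
  events: Array<StreamEvent>,
): void {
  if (!state.thinkingBlockOpen) return
  events.push(
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: { type: "signature_delta", signature: "" },
    },
    { type: "content_block_stop", index: state.contentBlockIndex },
  )
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}

const state: StreamState = { thinkingBlockOpen: true, contentBlockIndex: 0 }
const events: Array<StreamEvent> = []
closeThinkingBlockIfOpen(state, events)
console.log(events.length, state.contentBlockIndex, state.thinkingBlockOpen)
// → 2 1 false
```

Calling it again on the same state is a no-op, since the flag is already reset.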

@caozhiyuan force-pushed the feature/chat-completions-reasoning branch from ec66480 to 9adc6e0 on November 19, 2025 at 15:44
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/routes/messages/stream-translation.ts (2)

269-315: Thinking text streaming generally coherent, but tightly coupled to finish logic

The handleThinkingText logic for opening/extending a thinking block and optionally attaching a signature_delta when reasoning_opaque is present is consistent on its own, and it correctly bumps contentBlockIndex and resets thinkingBlockOpen on closure.

The main caveats are its interaction with handleFinish (thinking block not closed when only finish_reason arrives, and potential double-signature when both reasoning_text and reasoning_opaque appear on the final chunk), which I’ve detailed in the handleFinish comment. Any changes there should be coordinated with this function to avoid duplicated or missing thinking closures.


49-94: Thinking block may remain open at finish and reasoning_opaque can be double-emitted

Two intertwined issues around finish handling:

  1. Open thinking block not closed on finish

    • handleFinish only closes state.contentBlockOpen.
    • If you’ve streamed one or more reasoning_text chunks (so state.thinkingBlockOpen === true) and then receive a final chunk that carries only finish_reason (no content, no tool_calls, no reasoning_opaque), the thinking block never gets a content_block_stop before message_stop. closeThinkingBlockIfOpen is never called from here.
  2. Double processing of reasoning_opaque on final chunks

    • handleThinkingText can already consume reasoning_opaque when it appears together with reasoning_text, emitting a signature_delta and content_block_stop and advancing contentBlockIndex.
    • handleFinish then unconditionally calls handleReasoningOpaque(choice.delta, …). For a final delta that contains both reasoning_text and reasoning_opaque, this opens a second thinking block with an empty thinking_delta plus another signature_delta, effectively duplicating the signature on a new block.

These behaviors match a previously raised concern and still look fragile with respect to upstream reasoning delta shapes.

A safer approach would be along the lines of:

  • In handleFinish, before emitting the message_delta/message_stop, explicitly close any open thinking block, ideally attaching choice.delta.reasoning_opaque to that block (if present) rather than starting a new one.
  • Only fall back to handleReasoningOpaque when there was no open thinking block for this delta (e.g., a reasoning-opaque-only final chunk).
  • Ensure that chunks which carry only finish_reason still cause a thinking block to be closed if state.thinkingBlockOpen is true.

This keeps every thinking block explicitly terminated and avoids emitting duplicate signature-only thinking blocks for the same reasoning payload.
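A rough sketch of that finish-time closure, under the assumption that the final `Delta` may carry a `reasoning_opaque` signature (the helper name `closeThinkingAtFinish` and the simplified types are hypothetical, not from the actual file):

```typescript
type StreamEvent = Record<string, unknown>

interface FinishState {
  thinkingBlockOpen: boolean
  contentBlockIndex: number
}

interface FinishDelta {
  reasoning_opaque?: string
}

// Close any open thinking block at finish time, attaching the final opaque
// signature to that block instead of opening a second one; fall back to a
// signature-only block for opaque-only final chunks.
function closeThinkingAtFinish(
  state: FinishState,
  events: Array<StreamEvent>,
  delta: FinishDelta,
): void {
  if (state.thinkingBlockOpen) {
    events.push(
      {
        type: "content_block_delta",
        index: state.contentBlockIndex,
        delta: {
          type: "signature_delta",
          signature: delta.reasoning_opaque ?? "",
        },
      },
      { type: "content_block_stop", index: state.contentBlockIndex },
    )
    state.contentBlockIndex++
    state.thinkingBlockOpen = false
  } else if (delta.reasoning_opaque) {
    events.push(
      {
        type: "content_block_start",
        index: state.contentBlockIndex,
        content_block: { type: "thinking", thinking: "" },
      },
      {
        type: "content_block_delta",
        index: state.contentBlockIndex,
        delta: {
          type: "signature_delta",
          signature: delta.reasoning_opaque,
        },
      },
      { type: "content_block_stop", index: state.contentBlockIndex },
    )
    state.contentBlockIndex++
  }
}

// Final chunk arrives with a thinking block still open and a signature:
const state: FinishState = { thinkingBlockOpen: true, contentBlockIndex: 0 }
const events: Array<StreamEvent> = []
closeThinkingAtFinish(state, events, { reasoning_opaque: "sig" })
console.log(events.length, state.contentBlockIndex) // → 2 1
```

With this shape, a finish-only chunk (no `reasoning_opaque`) still closes the open block with an empty signature, and a double-emitted signature block is impossible because only one branch runs per call.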

🧹 Nitpick comments (2)
src/routes/messages/stream-translation.ts (2)

230-267: Reasoning-opaque helper currently assumes finish-only usage

handleReasoningOpaque emits a complete thinking block (start → empty thinking_delta → signature_delta → stop) at the current contentBlockIndex but never increments the index afterward. With the current code it is only called from handleFinish, so no subsequent blocks will reuse that index in the same message and this is functionally safe.

However, if this helper is ever reused earlier in the stream (e.g., from handleToolCalls or handleContent), not incrementing state.contentBlockIndex after the content_block_stop will cause index collisions with later blocks. It would be safer to either:

  • Increment state.contentBlockIndex++ after the stop event, or
  • Document that this helper is strictly for “final signature at finish” and should not be used mid-stream.
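The first option can be sketched like this (simplified hypothetical types; the real helper takes the Copilot `Delta` rather than a plain string):

```typescript
type Event = Record<string, unknown>

interface BlockState {
  contentBlockIndex: number
}

// Emits a complete signature-only thinking block (start → signature_delta →
// stop), then advances the index so any later block gets a fresh index.
function handleReasoningOpaque(
  reasoningOpaque: string | undefined,
  events: Array<Event>,
  state: BlockState,
): void {
  if (!reasoningOpaque) return
  events.push(
    {
      type: "content_block_start",
      index: state.contentBlockIndex,
      content_block: { type: "thinking", thinking: "" },
    },
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: { type: "signature_delta", signature: reasoningOpaque },
    },
    { type: "content_block_stop", index: state.contentBlockIndex },
  )
  state.contentBlockIndex++ // suggested fix: avoid index collisions mid-stream
}

const st: BlockState = { contentBlockIndex: 2 }
const evs: Array<Event> = []
handleReasoningOpaque("opaque-sig", evs, st)
console.log(evs.length, st.contentBlockIndex) // → 3 3
```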

317-338: Thinking-block closer matches state model, but signature payload is hard-coded

closeThinkingBlockIfOpen correctly:

  • Emits a signature_delta followed by content_block_stop.
  • Increments contentBlockIndex.
  • Resets thinkingBlockOpen.

The only questionable part is that it always uses signature: "". If upstream can ever provide a final reasoning_opaque token on the same chunk that triggers closure, you may eventually want a variant that takes the current Delta and uses the real opaque signature here rather than an empty string. For now this is acceptable as a fallback closer, but it’s another place that will need adjustment if you rework finish-time reasoning handling.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec66480 and 9adc6e0.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)
🔇 Additional comments (5)
src/routes/messages/stream-translation.ts (5)

1-5: Typed imports for streaming deltas look correct

Importing ChatCompletionChunk, Choice, and Delta from the shared Copilot types keeps this translator aligned with the upstream chunk shape; no issues here.


23-47: Single-choice assumption and handler ordering

translateChunkToAnthropicEvents always takes chunk.choices[0] and ignores any additional choices, and processes the delta in the fixed order: thinking → content → tools → finish. If Copilot ever streams multiple choices, they’ll be dropped here, but if the API contract guarantees a single streaming choice this is fine. The handler ordering also seems consistent with “reasoning first, then content/tools, then finish”.


96-154: Tool call streaming handling is consistent with Anthropic tool_use blocks

Tool call handling looks solid:

  • Closes any open thinking block before tools via closeThinkingBlockIfOpen.
  • Closes an existing content block before starting a new tool_use block.
  • Tracks tool calls by OpenAI tool_calls[index] and maps them to Anthropic content_block_start/input_json_delta events, supporting incremental arguments deltas.

Given the current state model, this should generate valid tool_use blocks without index collisions.


156-195: Content block handling correctly separates text from tools/thinking

The content handler:

  • Closes any open thinking block before text, preserving the “reasoning first, then visible text” invariant.
  • Detects and closes an open tool block via isToolBlockOpen before starting a text block.
  • Lazily starts a text block and appends text_delta events for delta.content.

This matches Anthropic’s expectations for text content blocks and keeps block indices advancing correctly.


197-228: message_start initialization and usage wiring look good

handleMessageStart emits a single message_start per stream, initializes usage with effective input tokens (excluding cached tokens), and defers non-zero output_tokens to the final message_delta. This matches the intended pattern of “lightweight start, accurate usage at finish”.

…g order when stream=false and exclude reasoning_opaque from token calculation in calculateMessageTokens
@caozhiyuan force-pushed the feature/chat-completions-reasoning branch from 3f52575 to 3fa5519 on November 19, 2025 at 22:49
@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/routes/messages/stream-translation.ts (1)

49-96: Critical: Thinking blocks are not explicitly closed before message finishes

A previous review flagged this issue and marked it as addressed (commit ec66480), but the fix is not present in the current code.

The bug: When reasoning_text deltas are followed by a chunk containing only finish_reason (no content, tool_calls, or reasoning_text), state.thinkingBlockOpen remains true, but handleFinish only closes contentBlockOpen (line 59). The thinking block never receives an explicit content_block_stop before message_stop (lines 91-93), violating Anthropic's streaming protocol.

Example flow:

  1. Delta 1: { reasoning_text: "thinking..." } → opens thinking block, sets thinkingBlockOpen = true
  2. Delta 2: { finish_reason: "stop" } → handleFinish emits message_stop without closing the thinking block

Anthropic's protocol requires all content blocks (including thinking blocks) to be explicitly terminated with content_block_stop before message_stop.

Apply this diff to close any open thinking block before emitting the final message events:

 function handleFinish(
   choice: Choice,
   state: AnthropicStreamState,
   context: {
     events: Array<AnthropicStreamEventData>
     chunk: ChatCompletionChunk
   },
 ) {
   const { events, chunk } = context
   if (choice.finish_reason && choice.finish_reason.length > 0) {
+    // Close any open thinking block before finishing
+    closeThinkingBlockIfOpen(state, events)
+
     if (state.contentBlockOpen) {
       const toolBlockOpen = isToolBlockOpen(state)
       context.events.push({
         type: "content_block_stop",
         index: state.contentBlockIndex,
       })
       state.contentBlockOpen = false
       state.contentBlockIndex++
       if (!toolBlockOpen) {
         handleReasoningOpaque(choice.delta, events, state)
       }
     }

     events.push(
       {
         type: "message_delta",
         delta: {
           stop_reason: mapOpenAIStopReasonToAnthropic(choice.finish_reason),
           stop_sequence: null,
         },
         usage: {
           input_tokens:
             (chunk.usage?.prompt_tokens ?? 0)
             - (chunk.usage?.prompt_tokens_details?.cached_tokens ?? 0),
           output_tokens: chunk.usage?.completion_tokens ?? 0,
           ...(chunk.usage?.prompt_tokens_details?.cached_tokens
             !== undefined && {
             cache_read_input_tokens:
               chunk.usage.prompt_tokens_details.cached_tokens,
           }),
         },
       },
       {
         type: "message_stop",
       },
     )
   }
 }
🧹 Nitpick comments (1)
src/routes/messages/stream-translation.ts (1)

250-288: Simplify signature-only thinking blocks by removing unnecessary empty thinking_delta

Lines 269-271 emit a thinking_delta with an empty string before the signature_delta, but Anthropic's streaming protocol does not require this. A thinking block may contain only content_block_start, signature_delta, and content_block_stop without any thinking_delta event.

Remove the empty thinking_delta (lines 269-271) to reduce unnecessary events in the stream:

{
  type: "content_block_start",
  index: state.contentBlockIndex,
  content_block: {
    type: "thinking",
    thinking: "",
  },
},
-{
-  type: "content_block_delta",
-  index: state.contentBlockIndex,
-  delta: {
-    type: "thinking_delta",
-    thinking: "",
-  },
-},
{
  type: "content_block_delta",
  index: state.contentBlockIndex,
  delta: {
    type: "signature_delta",
    signature: delta.reasoning_opaque,
  },
},
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa5519 and dfb40d2.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)

Comment on lines +36 to +46
handleMessageStart(state, events, chunk)

handleThinkingText(delta, state, events)

handleContent(delta, state, events)

handleToolCalls(delta, state, events)

handleFinish(choice, state, { events, chunk })

return events

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify handling of standalone reasoning_opaque deltas

The previous review flagged that deltas containing only reasoning_opaque (without content, tool_calls, reasoning_text, or finish_reason) may be sent by upstream APIs and could be dropped entirely.

Current flow: The main sequence (lines 36-44) invokes handlers that each check for specific fields:

  • handleThinkingText → requires reasoning_text (line 295)
  • handleContent → requires content (line 181)
  • handleToolCalls → requires tool_calls (line 103)
  • handleFinish → requires finish_reason (line 58)

If a delta contains only reasoning_opaque with none of the above, no handler processes it and the data is silently lost.

Expected scenarios where this might occur:

  1. Opaque signature arrives in a separate delta after reasoning_text deltas
  2. Signature-only delta without any preceding reasoning text

Please verify whether Copilot/OpenAI can emit deltas with only reasoning_opaque (no other fields). If so, consider adding explicit handling:

If standalone reasoning_opaque deltas are possible, you may need to add a dedicated handler or extend handleThinkingText to process reasoning_opaque even when reasoning_text is absent.


🏁 Script executed:

#!/bin/bash
# Search for any documentation or usage patterns related to reasoning_opaque handling
rg -n -C3 -P '\breasoning_opaque\b' --type=ts

Length of output: 5685


The web search is returning unrelated results about i18n translation libraries. Let me search the codebase directly for the handler functions and their invocations.



🏁 Script executed:

#!/bin/bash
# Find the function names that contain lines 255 and 317
echo "=== Finding function containing line 255 ==="
sed -n '240,270p' src/routes/messages/stream-translation.ts

echo ""
echo "=== Finding function containing line 317 ==="
sed -n '290,330p' src/routes/messages/stream-translation.ts

echo ""
echo "=== Lines 36-46 to show main flow ==="
sed -n '36,46p' src/routes/messages/stream-translation.ts

Length of output: 2165




🏁 Script executed:

#!/bin/bash
# Search for where handleReasoningOpaque is called
echo "=== Checking if handleReasoningOpaque is called ==="
rg -n 'handleReasoningOpaque' src/routes/messages/stream-translation.ts

echo ""
echo "=== Full context of lines 250-270 to see full function ==="
sed -n '250,270p' src/routes/messages/stream-translation.ts

echo ""
echo "=== Check if there's another main flow or if this is the only one ==="
sed -n '1,60p' src/routes/messages/stream-translation.ts

Length of output: 2492




🏁 Script executed:

#!/bin/bash
# Get context around line 68 to see when handleReasoningOpaque is called
echo "=== Context around line 68 (handleReasoningOpaque call) ==="
sed -n '55,80p' src/routes/messages/stream-translation.ts

echo ""
echo "=== Full handleFinish function ==="
sed -n '47,120p' src/routes/messages/stream-translation.ts

Length of output: 2860


Missing handler for standalone reasoning_opaque deltas in main flow

Verification confirms the original concern: standalone reasoning_opaque deltas (containing only reasoning_opaque without reasoning_text, tool_calls, content, or finish_reason) are silently dropped.

Current invocation paths for handleReasoningOpaque:

  • Line 68: Called only when choice.finish_reason exists (end-of-message only)
  • Line 173: Called only when delta.tool_calls exist (tool handling only)
  • Line 317: Embedded in handleThinkingText, only when reasoning_text exists

Problem scenario: If a delta contains only { reasoning_opaque: "..." } with no other fields:

  • handleThinkingText skips it (no reasoning_text)
  • handleToolCalls skips it (no tool_calls)
  • handleFinish skips it (no finish_reason)
  • The reasoning_opaque is lost

Add explicit handling in main flow (after line 44, before handleFinish):

handleReasoningOpaque(delta, events, state)

Or add a conditional check in the main flow to process standalone reasoning_opaque deltas that were not handled by other handlers.
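That conditional check could be sketched as a small predicate (the simplified `Delta` shape and the helper name `isStandaloneReasoningOpaque` are hypothetical, chosen to mirror the handlers described above):

```typescript
interface Delta {
  content?: string
  reasoning_text?: string
  reasoning_opaque?: string
  tool_calls?: Array<unknown>
}

// True when some existing handler will already consume this delta.
function handledElsewhere(delta: Delta, finishReason: string | null): boolean {
  return Boolean(
    delta.reasoning_text
      || delta.content
      || delta.tool_calls
      || (finishReason && finishReason.length > 0),
  )
}

// A standalone reasoning_opaque delta is one no other handler will touch.
function isStandaloneReasoningOpaque(
  delta: Delta,
  finishReason: string | null,
): boolean {
  return Boolean(delta.reasoning_opaque) && !handledElsewhere(delta, finishReason)
}

console.log(isStandaloneReasoningOpaque({ reasoning_opaque: "sig" }, null)) // → true
console.log(
  isStandaloneReasoningOpaque({ reasoning_opaque: "sig", content: "x" }, null),
) // → false
```

The main flow would then call handleReasoningOpaque only when the predicate is true, so deltas already covered by the thinking, content, tool, or finish handlers are not processed twice.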

🤖 Prompt for AI Agents
In src/routes/messages/stream-translation.ts around lines 36 to 46, standalone
deltas that only contain reasoning_opaque are currently dropped because existing
handlers skip them; add an explicit call or conditional to invoke
handleReasoningOpaque for deltas that include delta.reasoning_opaque after
handleContent and before handleFinish so these deltas are processed, or
alternatively add a conditional branch in the main flow that checks if
delta.reasoning_opaque is present (and not already handled by other handlers)
and calls handleReasoningOpaque(delta, events, state).
