plans/343/model-facade-overhaul-plan-step-1.md
Lines changed: 18 additions & 1 deletion
@@ -455,6 +455,10 @@ Implementation expectations:
  - Fallback mode: chat-completion image extraction for autoregressive models
  4. Parse usage from provider response if present.
  5. Normalize tool calls and reasoning fields.
+ - Reasoning extraction must check both `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical) and `message.reasoning_content` (legacy/LiteLLM-normalized fallback), with `reasoning` taking precedence when both are present.
+ - This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
+ - Internal canonical field remains `reasoning_content`; no downstream contract change.
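The dual-field check added above could look roughly like the following. This is a minimal sketch, not the plan's implementation: the function name `extract_reasoning_content` is hypothetical, the message is assumed to arrive as a plain mapping, and treating an empty string as "absent" is an assumption the plan does not state.

```python
from typing import Any, Mapping, Optional


def extract_reasoning_content(message: Mapping[str, Any]) -> Optional[str]:
    """Return the reasoning text from a provider message, or None if absent.

    Hypothetical helper (name and dict-based input are assumptions).
    `reasoning` is checked first, so it wins when both fields are present;
    `reasoning_content` is the legacy fallback.
    """
    for field in ("reasoning", "reasoning_content"):
        value = message.get(field)
        # Assumption: only non-empty strings count as "present".
        if isinstance(value, str) and value:
            return value
    return None


# The adapter keeps writing to the internal canonical field,
# so downstream consumers see no contract change.
provider_message = {"reasoning": "step A", "reasoning_content": "step B"}
normalized = {"reasoning_content": extract_reasoning_content(provider_message)}
```

Keeping the field order in a single tuple makes the precedence rule visible in one place, which is the main reason to centralize the check rather than repeat it per adapter.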
+ - response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
+ - response with only `message.reasoning_content` still works (backward compat)
+ - response with both fields uses `reasoning` (precedence rule)
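The three cases listed above map naturally onto a table-driven test. The sketch below is illustrative only: `_extract_reasoning` is a stand-in for whatever helper ends up in `parsing.py`, and `normalize_message` is a hypothetical adapter step, neither taken from the plan.

```python
from typing import Any, Dict, Mapping, Optional


def _extract_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    # Stand-in for the shared helper (assumption): `reasoning` first,
    # then the legacy `reasoning_content` field.
    return message.get("reasoning") or message.get("reasoning_content")


def normalize_message(message: Mapping[str, Any]) -> Dict[str, Optional[str]]:
    # Hypothetical adapter step: always populate the internal
    # canonical field `reasoning_content`.
    return {"reasoning_content": _extract_reasoning(message)}


# (label, provider message, expected canonical value)
CASES = [
    ("only reasoning", {"reasoning": "r"}, "r"),
    ("only reasoning_content", {"reasoning_content": "rc"}, "rc"),
    ("both present", {"reasoning": "r", "reasoning_content": "rc"}, "r"),
]


def test_reasoning_cases() -> None:
    for label, message, expected in CASES:
        got = normalize_message(message)["reasoning_content"]
        assert got == expected, f"{label}: expected {expected!r}, got {got!r}"
```

A test runner such as pytest would collect `test_reasoning_cases` by name; calling it directly also works, since it only uses plain assertions.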
  Tools:
@@ -1516,6 +1524,15 @@ Mitigation:
  1. central retry module with deterministic tests
  2. preserve current defaults from `LiteLLMRouterDefaultKwargs`

+ ### Risk: silent reasoning trace loss after LiteLLM removal
+
+ Mitigation:
+
+ 1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
+ 2. Shared reasoning extraction helper in `parsing.py` checks `reasoning` first (canonical), falling back to `reasoning_content` (legacy).
+ 3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).