Commit e78be09

docs: account for vLLM reasoning field migration in plan 343 (#377)
1 parent 3f8d735 commit e78be09

1 file changed

Lines changed: 18 additions & 1 deletion

plans/343/model-facade-overhaul-plan-step-1.md

@@ -455,6 +455,10 @@ Implementation expectations:
    - Fallback mode: chat-completion image extraction for autoregressive models
 4. Parse usage from provider response if present.
 5. Normalize tool calls and reasoning fields.
+   - Reasoning extraction must check both `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical) and `message.reasoning_content` (legacy/LiteLLM-normalized fallback), with `reasoning` taking precedence when both are present.
+   - This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
+   - Internal canonical field remains `reasoning_content`; no downstream contract change.
+   - See: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
 6. Normalize image outputs from either `b64_json`, data URI, or URL download.
 
 ### Image routing ownership contract
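
The dual-field rule added in the hunk above could be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `extract_reasoning` is hypothetical, the message is assumed to be a plain mapping, and treating an explicit `None` as "absent" is an assumption the plan does not state.

```python
from typing import Any, Mapping, Optional


def extract_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    """Return reasoning text from an OpenAI-compatible message, if any.

    Hypothetical helper: `reasoning` is checked first, then the legacy
    `reasoning_content` field.
    """
    for field_name in ("reasoning", "reasoning_content"):
        value = message.get(field_name)
        # Assumption: a missing key and an explicit None both mean "absent".
        if value is not None:
            return value
    return None


# Both fields present: `reasoning` takes precedence.
both = extract_reasoning({"reasoning": "new", "reasoning_content": "old"})
# Only the legacy field present: the fallback is used.
legacy = extract_reasoning({"reasoning_content": "old"})
# Neither field present: no reasoning trace.
neither = extract_reasoning({"content": "hello"})
```

Callers would store the result under the internal canonical name `reasoning_content`, so the downstream contract stays unchanged.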
@@ -947,7 +951,7 @@ OpenAI-compatible response parsing:
 
 1. `choices[0].message.content` -> canonical `message.content`
 2. `choices[0].message.tool_calls[*]` -> canonical `ToolCall`
-3. `choices[0].message.reasoning_content` if present -> canonical `reasoning_content`
+3. `choices[0].message.reasoning` **or** `choices[0].message.reasoning_content` -> canonical `reasoning_content` (`reasoning` takes precedence as the vLLM >= 0.16.0 canonical field; `reasoning_content` is the legacy/LiteLLM fallback)
 4. `usage.prompt_tokens/completion_tokens` -> canonical `Usage`
 
 ### Canonical -> Anthropic messages payload
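
The four mapping rules in the hunk above could be sketched as a single parsing function. The plan names the canonical types `ToolCall` and `Usage`, but this diff does not show their fields, so the dataclass shapes below, the `CanonicalResponse` container, and the function name `parse_openai_response` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass
class ToolCall:  # hypothetical shape
    id: str
    name: str
    arguments: str  # JSON-encoded string, as in OpenAI-style payloads


@dataclass
class Usage:  # hypothetical shape
    prompt_tokens: int
    completion_tokens: int


@dataclass
class CanonicalResponse:  # hypothetical container
    content: Optional[str]
    tool_calls: List[ToolCall] = field(default_factory=list)
    reasoning_content: Optional[str] = None
    usage: Optional[Usage] = None


def parse_openai_response(payload: Mapping[str, Any]) -> CanonicalResponse:
    message = payload["choices"][0]["message"]

    # Rule 2: each tool call becomes a canonical ToolCall.
    tool_calls = [
        ToolCall(
            id=tc["id"],
            name=tc["function"]["name"],
            arguments=tc["function"]["arguments"],
        )
        for tc in message.get("tool_calls") or []
    ]

    # Rule 3: `reasoning` first, `reasoning_content` as the fallback.
    reasoning = message.get("reasoning")
    if reasoning is None:
        reasoning = message.get("reasoning_content")

    # Rule 4: usage is optional in the provider response.
    raw_usage = payload.get("usage")
    usage = None
    if raw_usage:
        usage = Usage(
            prompt_tokens=raw_usage.get("prompt_tokens", 0),
            completion_tokens=raw_usage.get("completion_tokens", 0),
        )

    # Rule 1: content maps straight through.
    return CanonicalResponse(
        content=message.get("content"),
        tool_calls=tool_calls,
        reasoning_content=reasoning,
        usage=usage,
    )
```

Whichever provider field carried the trace, the result exposes it only as `reasoning_content`.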
@@ -1347,6 +1351,10 @@ Per adapter:
 5. retry behavior tests
 6. adaptive throttling behavior tests (drop on 429, gradual recovery)
 7. auth status mapping tests (`401 -> AUTHENTICATION`, `403 -> PERMISSION_DENIED`)
+8. reasoning field migration tests:
+   - response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
+   - response with only `message.reasoning_content` still works (backward compat)
+   - response with both fields uses `reasoning` (precedence rule)
 
 Tools:
 
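
The three migration cases listed in the hunk above could be covered by a table-driven test along these lines. This is only a sketch: `canonical_reasoning` is a hypothetical stand-in for the shared helper planned for `parsing.py`, included here so the example runs on its own.

```python
from typing import Any, Mapping, Optional


def canonical_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    # Hypothetical stand-in for the shared helper under test.
    value = message.get("reasoning")
    if value is None:
        value = message.get("reasoning_content")
    return value


# (case name, provider message, expected canonical reasoning_content)
CASES = [
    ("only reasoning", {"reasoning": "r"}, "r"),
    ("only reasoning_content", {"reasoning_content": "rc"}, "rc"),
    ("both present", {"reasoning": "r", "reasoning_content": "rc"}, "r"),
]


def test_reasoning_field_migration() -> None:
    for name, message, expected in CASES:
        actual = canonical_reasoning(message)
        assert actual == expected, f"{name}: got {actual!r}"
```

A test runner such as pytest would collect `test_reasoning_field_migration` by name; a real suite would exercise each adapter's response parser rather than the stand-in.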
@@ -1516,6 +1524,15 @@ Mitigation:
 1. central retry module with deterministic tests
 2. preserve current defaults from `LiteLLMRouterDefaultKwargs`
 
+### Risk: silent reasoning trace loss after LiteLLM removal
+
+Mitigation:
+
+1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
+2. Shared reasoning extraction helper in `parsing.py` checks `reasoning` first (canonical), falling back to `reasoning_content` (legacy).
+3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).
+4. Ref: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
+
 ### Risk: throttle oscillation or starvation under bursty load
 
 Mitigation:
