fix: slipping <tool_calls_end> tool call parsing for DeepSeek v3.1 #3995
base: main
Conversation
Signed-off-by: Guan Luo <[email protected]>
Signed-off-by: Guan Luo <[email protected]>
Signed-off-by: Guan Luo <[email protected]>
@CodeRabbit full review
Walkthrough
Refines deepseek_v3_1 parsing to recognize wrapper start/end tokens, adds an early-return when no end token is present, updates extraction calls to use token slice references, renames two detector tests, and adds two integration tests exercising single-chunk and chunked inputs.
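To make the early-return behavior concrete, here is a minimal sketch, not the PR's actual code; the function name, signature, and return shape are hypothetical stand-ins for the parser's real API:

```rust
// Hypothetical illustration: only when both wrapper tokens are present is the
// section between them handed to tool-call extraction; otherwise return early.
fn split_tool_call_section<'a>(
    input: &'a str,
    start_token: &str,
    end_token: &str,
) -> Option<(&'a str, &'a str)> {
    let start = input.find(start_token)?;
    // Early return: no wrapper end token yet, so nothing is parsed as complete.
    let end = input.find(end_token)?;
    if end < start + start_token.len() {
        return None;
    }
    Some((
        &input[..start],                        // normal content before the wrapper
        &input[start + start_token.len()..end], // body between the wrapper tokens
    ))
}

fn main() {
    let begin = "<|tool▁calls▁begin|>";
    let end = "<|tool▁calls▁end|>";
    let complete = "before <|tool▁calls▁begin|>call body<|tool▁calls▁end|>";
    assert!(split_tool_call_section(complete, begin, end).is_some());
    // Streamed prefix without the end token: early return, nothing extracted yet.
    let partial = "before <|tool▁calls▁begin|>call bo";
    assert!(split_tool_call_section(partial, begin, end).is_none());
}
```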
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
144-147: Consider simplifying the token handling approach.
The current approach clones the config tokens and extends them with hardcoded inner tokens. While functional, this could be clearer. Consider either:
- Including all necessary tokens directly in the config
- Adding a clear helper method that documents which tokens are used for what purpose (a rough sketch of this option follows below)
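A hedged sketch of the second option, assuming hypothetical names (DeepseekTokens, extraction_tokens) that are not part of this PR:

```rust
// Illustrative only: group the DeepSeek token strings in one place and document
// which ones are config-level wrapper boundaries vs. parser-internal call tokens.
struct DeepseekTokens {
    // Authoritative boundaries advertised in the parser config (detection/jailing).
    wrapper: (&'static str, &'static str),
    // Inner per-call delimiters the parser appends when splitting the wrapper body.
    per_call: (&'static str, &'static str),
}

impl DeepseekTokens {
    const V3_1: DeepseekTokens = DeepseekTokens {
        wrapper: ("<|tool▁calls▁begin|>", "<|tool▁calls▁end|>"),
        per_call: ("<|tool▁call▁begin|>", "<|tool▁call▁end|>"),
    };

    // All start/end tokens that block extraction should recognize.
    fn extraction_tokens(&self) -> (Vec<String>, Vec<String>) {
        (
            vec![self.wrapper.0.to_string(), self.per_call.0.to_string()],
            vec![self.wrapper.1.to_string(), self.per_call.1.to_string()],
        )
    }
}

fn main() {
    let (starts, ends) = DeepseekTokens::V3_1.extraction_tokens();
    assert_eq!(starts.len(), 2);
    assert_eq!(ends.len(), 2);
}
```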
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- lib/llm/tests/test_jail.rs (1 hunks)
- lib/parsers/src/tool_calling/config.rs (1 hunks)
- lib/parsers/src/tool_calling/json/deepseek_parser.rs (3 hunks)
- lib/parsers/src/tool_calling/parsers.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.
Applied to files:
- lib/parsers/src/tool_calling/json/deepseek_parser.rs
- lib/parsers/src/tool_calling/config.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.
Applied to files:
- lib/parsers/src/tool_calling/json/deepseek_parser.rs
- lib/parsers/src/tool_calling/config.rs
- lib/llm/tests/test_jail.rs
- lib/parsers/src/tool_calling/parsers.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.
Applied to files:
- lib/llm/tests/test_jail.rs
🧬 Code graph analysis (1)
lib/llm/tests/test_jail.rs (2)
lib/llm/src/protocols/openai/chat_completions/jail.rs (4)
builder(427-429)
new(103-110)
new(377-379)
new(760-767)
lib/llm/src/utils/prefix_matcher.rs (2)
new(56-75)
new(168-200)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: operator (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: clippy (.)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
lib/parsers/src/tool_calling/config.rs (1)
155-178: LGTM! Clear rationale for wrapper-only token configuration.
The approach of using only the wrapper tokens (<|tool▁calls▁begin|> and <|tool▁calls▁end|>) as the authoritative boundaries is well documented. The comment clearly explains the design decision and links to the DeepSeek documentation for reference.
lib/parsers/src/tool_calling/parsers.rs (1)
2415-2428: LGTM! Test naming and assertions align with new wrapper-based detection.
The test renaming clearly distinguishes between incomplete tool-call detection (which should now fail) and complete detection with wrapper tokens (which should succeed). The inverted assertion at Line 2420 correctly validates that incomplete tool calls are no longer detected.
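A rough illustration of the behavior these tests pin down; the detection function below is a stand-in for illustration, not the crate's actual API:

```rust
// Stand-in detection logic: the wrapper-begin token is the trigger; a lone
// per-call token is not.
fn looks_like_deepseek_tool_call(chunk: &str) -> bool {
    chunk.contains("<|tool▁calls▁begin|>")
}

fn main() {
    // Incomplete: per-call tokens only, no wrapper -> not detected (assert false).
    assert!(!looks_like_deepseek_tool_call(
        "<|tool▁call▁begin|>get_current_weather<|tool▁sep|>{\"location\": \"Berlin\"}<|tool▁call▁end|>"
    ));
    // Wrapper-begin present, end still streaming -> detected (assert true).
    assert!(looks_like_deepseek_tool_call(
        "<|tool▁calls▁begin|><|tool▁call▁begin|>get_current_weather<|tool▁sep|>"
    ));
}
```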
✅ Actions performed: Full review triggered.
Actionable comments posted: 0
♻️ Duplicate comments (4)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (2)
129-142: Comment typo.
“if we may all call(s) tokens” -> “if we mark all call(s) tokens”.
- // This is because if we may all call(s) tokens, we are not properly grouping
+ // This is because if we mark all call(s) tokens, we are not properly grouping
275-277: Remove debug print in test.
Drop the print! to keep tests quiet.
- print!("**** {}", content.clone().unwrap());
lib/llm/tests/test_jail.rs (2)
1813-1816: Remove debug prints from the test.
The println! lines add noise to CI output; please delete them.
- for result in results.iter() {
-     println!("Result chunk: {:?}", result);
- }
1891-1893: Remove debug prints from the chunked test as well.
Same as above; delete the noisy println! calls.
- for result in results.iter() {
-     println!("Result chunk: {:?}", result);
- }
🧹 Nitpick comments (6)
lib/parsers/src/tool_calling/config.rs (1)
156-177: Switch to wrapper tokens is correct; consider centralizing DeepSeek tokens.
Using only <|tool▁calls▁begin|>/<|tool▁calls▁end|> in config reduces false positives; the parser augments individual call tokens internally. Suggest promoting the tokens to shared consts to avoid drift/typos across modules.
+// Consider near top of this module:
+pub const DS_CALLS_BEGIN: &str = "<|tool▁calls▁begin|>";
+pub const DS_CALLS_END: &str = "<|tool▁calls▁end|>";
+pub const DS_CALL_BEGIN: &str = "<|tool▁call▁begin|>";
+pub const DS_CALL_END: &str = "<|tool▁call▁end|>";
+pub const DS_SEP: &str = "<|tool▁sep|>";
 pub fn deepseek_v3_1() -> Self {
     Self {
         format: ToolCallParserType::Json,
         json: JsonParserConfig {
-            tool_call_start_tokens: vec![
-                "<|tool▁calls▁begin|>".to_string(),
-                // "<|tool▁call▁begin|>".to_string(),
-            ],
-            tool_call_end_tokens: vec![
-                "<|tool▁calls▁end|>".to_string(),
-                // "<|tool▁call▁end|>".to_string(),
-            ],
-            tool_call_separator_tokens: vec!["<|tool▁sep|>".to_string()],
+            tool_call_start_tokens: vec![DS_CALLS_BEGIN.to_string()],
+            tool_call_end_tokens: vec![DS_CALLS_END.to_string()],
+            tool_call_separator_tokens: vec![DS_SEP.to_string()],
             parser_type: JsonParserType::DeepseekV31,
             ..Default::default()
         },
     }
 }
lib/llm/tests/test_jail.rs (3)
1823-1839: Validate all DeepSeek calls, not just the first.
Add assertions for get_weather_forecast and get_air_quality so regressions don’t slip by when parallel calls are present.
 let tool_call_idx = results
     .iter()
     .position(test_utils::has_tool_call)
     .expect("Should have a tool call result");
-test_utils::assert_tool_call(
-    &results[tool_call_idx],
-    "get_current_weather",
-    json!({"location": "Berlin", "units": "metric"}),
-);
+// Expect 3 tool calls in the same emitted chunk
+let tc = &results[tool_call_idx];
+let calls = tc.data.as_ref().unwrap().choices[0]
+    .delta.tool_calls.as_ref().unwrap();
+assert_eq!(calls.len(), 3, "Should emit 3 tool calls");
+test_utils::assert_tool_call(tc, "get_current_weather", json!({"location": "Berlin", "units": "metric"}));
+// Order is model/config dependent; match by name to be robust
+let args: Vec<_> = calls.iter().map(|c| serde_json::from_str::<serde_json::Value>(c.function.as_ref().unwrap().arguments.as_ref().unwrap()).unwrap()).collect();
+assert!(args.iter().any(|a| a["days"] == 7 && a["units"] == "imperial"), "Missing forecast call with days=7");
+assert!(args.iter().any(|a| a["radius"] == 50), "Missing air_quality call with radius=50");
1847-1873: Extract token-aware splitter into a helper for reuse and clarity.
Move the angle-bracket token splitter into test_utils to reduce duplication and improve readability.
-// Split text into words, treating angle-bracketed tokens as one word
-let mut words = Vec::new();
-...
-let chunks = words
-    .into_iter()
-    .map(|word| create_mock_response_chunk(word, 0))
+let chunks = test_utils::split_tokens_as_words(text)
+    .into_iter()
+    .map(|word| create_mock_response_chunk(word, 0))
     .collect::<Vec<_>>();

// In test_utils:
pub fn split_tokens_as_words(text: &str) -> Vec<String> {
    let mut words = Vec::new();
    let chars: Vec<char> = text.chars().collect();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '<' {
            if let Some(end) = chars[i..].iter().position(|&c| c == '>') {
                words.push(chars[i..=i + end].iter().collect());
                i += end + 1;
            } else {
                words.push(chars[i..].iter().collect());
                break;
            }
        } else if chars[i].is_whitespace() {
            i += 1;
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '<' {
                i += 1;
            }
            words.push(chars[start..i].iter().collect());
        }
    }
    words
}
1832-1838: Also assert wrapper-begin tokens don’t leak into content.
You already assert end-token absence; add begin-token absence for symmetry.
-assert!(
-    !content.contains("<|tool▁calls▁end|>"),
-    "Should not contain deepseek special tokens in content"
-);
+assert!(!content.contains("<|tool▁calls▁end|>"));
+assert!(!content.contains("<|tool▁calls▁begin|>"));

Also applies to: 1910-1916
lib/parsers/src/tool_calling/json/deepseek_parser.rs (2)
144-148: Bound extraction to the wrapper block to avoid stray matches.
Early-exit on missing end tokens is good. However, extraction runs over the whole input; constrain it to <|tool▁calls▁begin|>…<|tool▁calls▁end|> to prevent accidental matches outside the wrapper.
-// Extract individual tool call blocks
-let blocks = extract_tool_call_blocks(trimmed, &tool_call_start_tokens, &tool_call_end_tokens);
+// Extract individual tool call blocks within the wrapper bounds when available
+let wrapper_begin = config.tool_call_start_tokens.iter()
+    .find(|t| t.contains("tool_calls_begin") || t.contains("tool▁calls▁begin"))
+    .and_then(|t| trimmed.find(t.as_str()).map(|idx| (idx, t.len())));
+let wrapper_end = config.tool_call_end_tokens.iter()
+    .find(|t| t.contains("tool_calls_end") || t.contains("tool▁calls▁end"))
+    .and_then(|t| trimmed.rfind(t.as_str()));
+
+let bounded = if let (Some((start, start_len)), Some(end)) = (wrapper_begin, wrapper_end) {
+    if start + start_len < end {
+        &trimmed[start + start_len..end]
+    } else {
+        trimmed
+    }
+} else {
+    trimmed
+};
+
+let blocks = extract_tool_call_blocks(bounded, &tool_call_start_tokens, &tool_call_end_tokens);

Also applies to: 186-187
216-250: Optional: avoid repeated String allocations in partial-detection.
Collecting prefixes each iteration is fine here, but you can micro-opt by building once and slicing by char indices.
-config.tool_call_start_tokens.iter().any(|token| {
+config.tool_call_start_tokens.iter().any(|token| {
     if token.is_empty() {
         return false;
     }
-    // Check if the chunk could be a prefix of this start token
-    // Handle Unicode character boundaries properly
-    for i in 1..=token.chars().count() {
-        if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-            let prefix_str = &prefix[..prefix.len()];
-            if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                return true;
-            }
-        }
-    }
+    let chars: Vec<char> = token.chars().collect();
+    for i in 1..=chars.len() {
+        let prefix: String = chars[..i].iter().collect();
+        if trimmed == prefix || trimmed.ends_with(&prefix) {
+            return true;
+        }
+    }
     false
 })
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- lib/llm/tests/test_jail.rs (1 hunks)
- lib/parsers/src/tool_calling/config.rs (1 hunks)
- lib/parsers/src/tool_calling/json/deepseek_parser.rs (3 hunks)
- lib/parsers/src/tool_calling/parsers.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.
Applied to files:
- lib/parsers/src/tool_calling/config.rs
- lib/parsers/src/tool_calling/json/deepseek_parser.rs
- lib/parsers/src/tool_calling/parsers.rs
- lib/llm/tests/test_jail.rs
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.
Applied to files:
- lib/parsers/src/tool_calling/config.rs
- lib/parsers/src/tool_calling/json/deepseek_parser.rs
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
PR: ai-dynamo/dynamo#3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.
Applied to files:
- lib/parsers/src/tool_calling/json/deepseek_parser.rs
- lib/llm/tests/test_jail.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.
Applied to files:
- lib/llm/tests/test_jail.rs
📚 Learning: 2025-09-19T07:32:44.210Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#0
File: :0-0
Timestamp: 2025-09-19T07:32:44.210Z
Learning: The SGLang backend has debug print statements in _get_input_param() and _process_text_stream() methods that should be removed for production as they cause synchronous I/O in async contexts.
Applied to files:
- lib/llm/tests/test_jail.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (2)
lib/parsers/src/tool_calling/parsers.rs (2)
2416-2421: DeepSeek v3.1: correct inversion for incomplete single-call token.
Asserting false when only <|tool▁call▁begin|>…<|tool▁call▁end|> appears (no wrapper) matches the new wrapper-first detection. LGTM.
2424-2428: Detection aligned to wrapper-start token.
Asserting true when the wrapper begin token is present (even without the wrapper end yet) matches streaming jailing behavior. Good.
Signed-off-by: Guan Luo <[email protected]>
Signed-off-by: Guan Luo <[email protected]>
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Bug Fixes
Tests