Skip to content

Conversation

@GuanLuo
Copy link
Contributor

@GuanLuo GuanLuo commented Oct 30, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes

    • Improved Deepseek V3.1 tool-call parsing to correctly group wrapped tool-call blocks and avoid misgrouping when end markers are missing.
  • Tests

    • Added end-to-end tests covering single- and multi-chunk tool-call inputs and refined detector tests to validate incomplete vs. multi-start detection.

@GuanLuo GuanLuo requested a review from a team as a code owner October 30, 2025 18:18
@GuanLuo GuanLuo changed the title ix: fix slipping <tool_calls_end> tool call parsing for DeepSeek v3.1 fix: slipping <tool_calls_end> tool call parsing for DeepSeek v3.1 Oct 30, 2025
@github-actions github-actions bot added the fix label Oct 30, 2025
@GuanLuo GuanLuo linked an issue Oct 30, 2025 that may be closed by this pull request
Signed-off-by: Guan Luo <[email protected]>
@GuanLuo
Copy link
Contributor Author

GuanLuo commented Oct 30, 2025

@CodeRabbit full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 30, 2025

Walkthrough

Refines deepseek_v3_1 parsing to recognize wrapper start/end tokens, adds an early-return when no end token is present, updates extraction calls to use token slice references, renames two detector tests, and adds two integration tests exercising single-chunk and chunked inputs.

Changes

Cohort / File(s) Change Summary
Token configuration
lib/parsers/src/tool_calling/config.rs
Switched deepseek_v3_1 start/end tokens to wrapper markers <|tool▁calls▁begin|> / <|tool▁calls▁end|> (previous per-call tokens commented out); added explanatory comment about whole-block wrapping.
Deepseek JSON parser
lib/parsers/src/tool_calling/json/deepseek_parser.rs
Added check for absence of end token to return early with trimmed content; extended start/end token lists to include wrapper tokens; changed call to extract_tool_call_blocks to pass token slices by reference; added/adjusted assertions in tests ensuring content present and wrapper end token absent.
Detector tests (parsers)
lib/parsers/src/tool_calling/parsers.rs
Renamed test_e2e_detect_tool_call_start_deepseek_v3_1test_e2e_detect_incomplete_tool_call_start_deepseek_v3_1 and inverted its assertion to expect no detection; renamed another test to test_e2e_detect_tool_call_start_deepseek_v3_1 preserving multi-start detection behavior.
Integration tests (llm/jail)
lib/llm/tests/test_jail.rs
Added two async tests: test_deepseek_v3_1 (single text chunk with multiple encoded tool calls) and test_deepseek_v3_1_chunk (word-like chunked input); both validate parsed tool calls, argument values, and that emitted content excludes wrapper end tokens.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Pay attention to the early-return behavior in deepseek_parser.rs to ensure it doesn't regress multi-call extraction in edge cases.
  • Verify token configuration in config.rs matches parser expectations and test inputs.
  • Confirm the renamed/inverted detector test accurately reflects intended detection semantics.
  • Review new integration tests in lib/llm/tests/test_jail.rs for reliability and correct mocking of chunked responses.

Poem

🐰
I nibble tokens, one by one,
Wrapped calls now gather, all as one.
I hop through chunks and stitch the stream,
No stray end-markers break the dream.
Hooray — parsed calls in tidy team! ✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description provided is essentially a template skeleton with no substantive content. All required sections—Overview, Details, Where should the reviewer start, and Related Issues—contain only placeholder text or HTML comments without any actual information about the changes, rationale, or related issues. The Related Issues section references "closes GitHub issue: #xxx" which is a placeholder that was never filled in. For a pull request addressing multiple files and implementing a non-trivial fix, this level of incompleteness fails to meet the minimum expectations for a code review document. The author should complete the PR description by filling in each section with substantive information: provide an overview of the tool call parsing fix, detail the specific changes across the four files (config, parser, tests), explain why the token update was necessary, highlight which files deserve closest review attention, and reference the actual GitHub issue number being resolved. Even a brief but complete description would significantly improve clarity for reviewers.
Docstring Coverage ⚠️ Warning Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "fix: slipping <tool_calls_end> tool call parsing for DeepSeek v3.1" is directly related to the main changes in the pull request. The changeset focuses on fixing DeepSeek v3.1 tool call parsing by updating token configurations (changing from singular to plural forms: "tool_call" → "tool_calls"), enhancing the parser logic to properly handle wrapper tokens, and adding tests to validate the fix. The title correctly captures this fix and references the end token issue, making it clear and specific enough for a teammate reviewing the history. While the word "slipping" could be slightly more precise in phrasing, it adequately conveys that this addresses a parsing issue related to the end token.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)

144-147: Consider simplifying token handling approach.

The current approach clones the config tokens and extends them with hardcoded inner tokens. While functional, this could be clearer. Consider either:

  1. Including all necessary tokens directly in the config
  2. Adding a clear helper method that documents which tokens are used for what purpose
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1da9d70 and fe48e86.

📒 Files selected for processing (4)
  • lib/llm/tests/test_jail.rs (1 hunks)
  • lib/parsers/src/tool_calling/config.rs (1 hunks)
  • lib/parsers/src/tool_calling/json/deepseek_parser.rs (3 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.

Applied to files:

  • lib/parsers/src/tool_calling/json/deepseek_parser.rs
  • lib/parsers/src/tool_calling/config.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/parsers/src/tool_calling/json/deepseek_parser.rs
  • lib/parsers/src/tool_calling/config.rs
  • lib/llm/tests/test_jail.rs
  • lib/parsers/src/tool_calling/parsers.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.

Applied to files:

  • lib/llm/tests/test_jail.rs
🧬 Code graph analysis (1)
lib/llm/tests/test_jail.rs (2)
lib/llm/src/protocols/openai/chat_completions/jail.rs (4)
  • builder (427-429)
  • new (103-110)
  • new (377-379)
  • new (760-767)
lib/llm/src/utils/prefix_matcher.rs (2)
  • new (56-75)
  • new (168-200)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: operator (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
lib/parsers/src/tool_calling/config.rs (1)

155-178: LGTM! Clear rationale for wrapper-only token configuration.

The approach of using only the wrapper tokens (<|tool▁calls▁begin|> and <|tool▁calls▁end|>) as the authoritative boundaries is well-documented. The comment clearly explains the design decision and links to the DeepSeek documentation for reference.

lib/parsers/src/tool_calling/parsers.rs (1)

2415-2428: LGTM! Test naming and assertions align with new wrapper-based detection.

The test renaming clearly distinguishes between incomplete tool-call detection (which should now fail) and complete detection with wrapper tokens (which should succeed). The inverted assertion at Line 2420 correctly validates that incomplete tool calls are no longer detected.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 30, 2025

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (2)

129-142: Comment typo.

“if we may all call(s) tokens” -> “if we mark all call(s) tokens”.

-    // This is because if we may all call(s) tokens, we are not properly grouping
+    // This is because if we mark all call(s) tokens, we are not properly grouping

275-277: Remove debug print in test.

Drop the print! to keep tests quiet.

-        print!("**** {}", content.clone().unwrap());
lib/llm/tests/test_jail.rs (2)

1813-1816: Remove debug prints from the test.

The println! lines add noise to CI output; please delete them.

-        for result in results.iter() {
-            println!("Result chunk: {:?}", result);
-        }

1891-1893: Remove debug prints from the chunked test as well.

Same as above; delete noisy println!.

-        for result in results.iter() {
-            println!("Result chunk: {:?}", result);
-        }
🧹 Nitpick comments (6)
lib/parsers/src/tool_calling/config.rs (1)

156-177: Switch to wrapper tokens is correct; consider centralizing DeepSeek tokens.

Using only <|tool▁calls▁begin|>/<|tool▁calls▁end|> in config reduces false positives; parser augments individual call tokens internally. Suggest promoting tokens to shared consts to avoid drift/typos across modules.

+// Consider near top of this module:
+pub const DS_CALLS_BEGIN: &str = "<|tool▁calls▁begin|>";
+pub const DS_CALLS_END: &str = "<|tool▁calls▁end|>";
+pub const DS_CALL_BEGIN: &str = "<|tool▁call▁begin|>";
+pub const DS_CALL_END: &str = "<|tool▁call▁end|>";
+pub const DS_SEP: &str = "<|tool▁sep|>";

 pub fn deepseek_v3_1() -> Self {
     Self {
         format: ToolCallParserType::Json,
         json: JsonParserConfig {
-            tool_call_start_tokens: vec![
-                "<|tool▁calls▁begin|>".to_string(),
-                // "<|tool▁call▁begin|>".to_string(),
-            ],
-            tool_call_end_tokens: vec![
-                "<|tool▁calls▁end|>".to_string(),
-                // "<|tool▁call▁end|>".to_string(),
-            ],
-            tool_call_separator_tokens: vec!["<|tool▁sep|>".to_string()],
+            tool_call_start_tokens: vec![DS_CALLS_BEGIN.to_string()],
+            tool_call_end_tokens: vec![DS_CALLS_END.to_string()],
+            tool_call_separator_tokens: vec![DS_SEP.to_string()],
             parser_type: JsonParserType::DeepseekV31,
             ..Default::default()
         },
     }
 }
lib/llm/tests/test_jail.rs (3)

1823-1839: Validate all DeepSeek calls, not just the first.

Add assertions for get_weather_forecast and get_air_quality so regressions don’t slip by when parallel calls are present.

         let tool_call_idx = results
             .iter()
             .position(test_utils::has_tool_call)
             .expect("Should have a tool call result");
-        test_utils::assert_tool_call(
-            &results[tool_call_idx],
-            "get_current_weather",
-            json!({"location": "Berlin", "units": "metric"}),
-        );
+        // Expect 3 tool calls in the same emitted chunk
+        let tc = &results[tool_call_idx];
+        let calls = tc.data.as_ref().unwrap().choices[0]
+            .delta.tool_calls.as_ref().unwrap();
+        assert_eq!(calls.len(), 3, "Should emit 3 tool calls");
+        test_utils::assert_tool_call(tc, "get_current_weather", json!({"location": "Berlin", "units": "metric"}));
+        // Order is model/config dependent; match by name to be robust
+        let args: Vec<_> = calls.iter().map(|c| serde_json::from_str::<serde_json::Value>(c.function.as_ref().unwrap().arguments.as_ref().unwrap()).unwrap()).collect();
+        assert!(args.iter().any(|a| a["days"] == 7 && a["units"] == "imperial"), "Missing forecast call with days=7");
+        assert!(args.iter().any(|a| a["radius"] == 50), "Missing air_quality call with radius=50");

1847-1873: Extract token-aware splitter into a helper for reuse and clarity.

Move the angle-bracket token splitter into test_utils to reduce duplication and improve readability.

-        // Split text into words, treating angle-bracketed tokens as one word
-        let mut words = Vec::new();
-        ...
-        let chunks = words
-            .into_iter()
-            .map(|word| create_mock_response_chunk(word, 0))
+        let chunks = test_utils::split_tokens_as_words(text)
+            .into_iter()
+            .map(|word| create_mock_response_chunk(word, 0))
             .collect::<Vec<_>>();
// In test_utils:
pub fn split_tokens_as_words(text: &str) -> Vec<String> {
    let mut words = Vec::new();
    let chars: Vec<char> = text.chars().collect();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '<' {
            if let Some(end) = chars[i..].iter().position(|&c| c == '>') {
                words.push(chars[i..=i + end].iter().collect());
                i += end + 1;
            } else {
                words.push(chars[i..].iter().collect());
                break;
            }
        } else if chars[i].is_whitespace() {
            i += 1;
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '<' {
                i += 1;
            }
            words.push(chars[start..i].iter().collect());
        }
    }
    words
}

1832-1838: Also assert wrapper-begin tokens don’t leak into content.

You already assert end-token absence; add begin-token absence for symmetry.

-                    assert!(
-                        !content.contains("<|tool▁calls▁end|>"),
-                        "Should not contain deepseek special tokens in content"
-                    );
+                    assert!(!content.contains("<|tool▁calls▁end|>"));
+                    assert!(!content.contains("<|tool▁calls▁begin|>"));

Also applies to: 1910-1916

lib/parsers/src/tool_calling/json/deepseek_parser.rs (2)

144-148: Bound extraction to the wrapper block to avoid stray matches.

Early-exit on missing end tokens is good. However, extraction runs over the whole input; constrain it to <|tool▁calls▁begin|>…<|tool▁calls▁end|> to prevent accidental matches outside the wrapper.

-    // Extract individual tool call blocks
-    let blocks = extract_tool_call_blocks(trimmed, &tool_call_start_tokens, &tool_call_end_tokens);
+    // Extract individual tool call blocks within the wrapper bounds when available
+    let wrapper_begin = config.tool_call_start_tokens.iter()
+        .find(|t| t.contains("tool_calls_begin") || t.contains("tool▁calls▁begin"))
+        .and_then(|t| trimmed.find(t.as_str()).map(|idx| (idx, t.len())));
+    let wrapper_end = config.tool_call_end_tokens.iter()
+        .find(|t| t.contains("tool_calls_end") || t.contains("tool▁calls▁end"))
+        .and_then(|t| trimmed.rfind(t.as_str()));
+
+    let bounded = if let (Some((start, start_len)), Some(end)) = (wrapper_begin, wrapper_end) {
+        if start + start_len < end {
+            &trimmed[start + start_len..end]
+        } else {
+            trimmed
+        }
+    } else {
+        trimmed
+    };
+
+    let blocks = extract_tool_call_blocks(bounded, &tool_call_start_tokens, &tool_call_end_tokens);

Also applies to: 186-187


216-250: Optional: avoid repeated String allocations in partial-detection.

Collecting prefixes each iteration is fine here, but you can micro-opt by building once and slicing by char indices.

-    config.tool_call_start_tokens.iter().any(|token| {
+    config.tool_call_start_tokens.iter().any(|token| {
         if token.is_empty() {
             return false;
         }
-        // Check if the chunk could be a prefix of this start token
-        // Handle Unicode character boundaries properly
-        for i in 1..=token.chars().count() {
-            if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-                let prefix_str = &prefix[..prefix.len()];
-                if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                    return true;
-                }
-            }
-        }
+        let chars: Vec<char> = token.chars().collect();
+        for i in 1..=chars.len() {
+            let prefix: String = chars[..i].iter().collect();
+            if trimmed == prefix || trimmed.ends_with(&prefix) {
+                return true;
+            }
+        }
         false
     })
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1da9d70 and c56dded.

📒 Files selected for processing (4)
  • lib/llm/tests/test_jail.rs (1 hunks)
  • lib/parsers/src/tool_calling/config.rs (1 hunks)
  • lib/parsers/src/tool_calling/json/deepseek_parser.rs (3 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/parsers/src/tool_calling/config.rs
  • lib/parsers/src/tool_calling/json/deepseek_parser.rs
  • lib/parsers/src/tool_calling/parsers.rs
  • lib/llm/tests/test_jail.rs
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.

Applied to files:

  • lib/parsers/src/tool_calling/config.rs
  • lib/parsers/src/tool_calling/json/deepseek_parser.rs
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
PR: ai-dynamo/dynamo#3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.

Applied to files:

  • lib/parsers/src/tool_calling/json/deepseek_parser.rs
  • lib/llm/tests/test_jail.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.

Applied to files:

  • lib/llm/tests/test_jail.rs
📚 Learning: 2025-09-19T07:32:44.210Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#0
File: :0-0
Timestamp: 2025-09-19T07:32:44.210Z
Learning: The SGLang backend has debug print statements in _get_input_param() and _process_text_stream() methods that should be removed for production as they cause synchronous I/O in async contexts.

Applied to files:

  • lib/llm/tests/test_jail.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (2)
lib/parsers/src/tool_calling/parsers.rs (2)

2416-2421: DeepSeek v3.1: correct inversion for incomplete single-call token.

Asserting false when only <|tool▁call▁begin|>…<|tool▁call▁end|> appears (no wrapper) matches the new wrapper-first detection. LGTM.


2424-2428: Detection aligned to wrapper-start token.

Asserting true when the wrapper begin token is present (even without the wrapper end yet) matches streaming jailing behavior. Good.

Signed-off-by: Guan Luo <[email protected]>
Signed-off-by: Guan Luo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: tool_call_end is not handled for DeepSeek V3.1 parser

2 participants