
Conversation

@dominicshanshan
Collaborator

@dominicshanshan dominicshanshan commented Nov 19, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced logprobs_mode parameter to configure log probability computation
    • Log probabilities now apply sampling transformations (temperature, top-k, top-p filtering) to logits before processing
    • New "processed_logprobs" mode enabled by default with full backward compatibility
  • Tests

    • Added comprehensive test coverage for processed logprobs across various sampling strategies and temperature settings
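
A minimal usage sketch of the new parameter, assuming the standard tensorrt_llm LLM API (the model path and prompt are placeholders; "processed_logprobs" is shown explicitly even though it is the default):

from tensorrt_llm import LLM, SamplingParams

# Illustrative only: request top-3 logprobs per generated token and ask for
# them to reflect the sampling transformations (temperature, top-k, top-p).
llm = LLM(model="/path/to/hf_or_trtllm_model")
sampling_params = SamplingParams(
    max_tokens=16,
    temperature=0.8,
    top_k=10,
    logprobs=3,
    logprobs_mode="processed_logprobs",
)
outputs = llm.generate(["The capital of France is"], sampling_params)
print(outputs[0].outputs[0].logprobs)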

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
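
For example, an illustrative invocation combining flags documented above (the stage name is the example value from this help text):

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"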

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can cause the top of tree to break.

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

📝 Walkthrough

This pull request introduces a new "processed_logprobs" log probability computation mode for the PyTorch backend. It adds a logprobs_mode configuration parameter to the sampling and executor request pipeline, implements a process_logits function to apply sampling transformations before log probability calculation, and threads logprob parameters through the executor request chain. New test cases validate the processed logprobs behavior across various sampling configurations.

Changes

  • Type system and sampling configuration (tensorrt_llm/sampling_params.py): Introduces LogprobsMode type alias for "processed_logprobs" and adds logprobs_mode field with default value to both LogprobParams NamedTuple and SamplingParams dataclass.
  • PyTorch executor request handling (tensorrt_llm/_torch/pyexecutor/llm_request.py): Conditionally transfers private _logprob_params attribute from executor request to constructed llm_request object.
  • PyTorch sampler implementation (tensorrt_llm/_torch/pyexecutor/sampler.py): Adds conditional branching on logprobs_mode to compute log probabilities. For "processed_logprobs" mode, applies sampling transformations (via process_logits) to logits before log_softmax; otherwise applies log_softmax directly to raw logits.
  • PyTorch sampling utilities (tensorrt_llm/_torch/pyexecutor/sampling_utils.py): Adds new process_logits(strategy, logits) function to mask logits with -inf and apply temperature scaling. Updates sample() return type signature to indicate three-tuple (though implementation still returns two items).
  • Executor request pipeline (tensorrt_llm/executor/base_worker.py): In _enqueue_request, retrieves logprob parameters via _get_logprob_params for PyTorch backends and attaches them as _logprob_params to the generated executor request.
  • Executor logprob configuration (tensorrt_llm/executor/executor.py): In _get_logprob_params, passes logprobs_mode from request.sampling_params.logprobs_mode to the LogprobParams constructor.
  • Result computation documentation (tensorrt_llm/executor/result.py): Adds explicit Args section to compute_logprobs docstring and includes minor formatting adjustment in _topk_logprobs.
  • Integration tests (tests/unittest/llmapi/test_llm_pytorch.py): Adds five new test methods validating processed_logprobs behavior: test_llm_logprobs_modes_basic, test_llm_processed_logprobs_with_temperature, test_llm_processed_logprobs_with_greedy_sampling, test_llm_logprobs_mode_backward_compatibility, and test_llm_processed_logprobs_with_top_p. Tests verify non-positivity of logprobs and correct masking behavior across varying sampling configurations.
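
To make the process_logits step concrete, here is a minimal sketch of the kind of transformation described above for a top-k strategy. This is an illustration written for this summary, not the TensorRT-LLM implementation; the (name, *params) strategy layout follows the review comment further below.

import torch

def process_logits_sketch(strategy: tuple, logits: torch.Tensor) -> torch.Tensor:
    # Mask logits outside the sampling support with -inf and apply temperature
    # scaling, so a downstream log_softmax sees what the sampler actually samples from.
    name, *params = strategy
    if name == "top_k":
        top_k, temperature = params
        kth_largest = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = torch.where(logits < kth_largest,
                             torch.full_like(logits, float("-inf")),
                             logits)
        logits = logits / max(temperature, 1e-5)
    return logits

# Processed logprobs are then log_softmax over the transformed logits:
# logprobs = torch.nn.functional.log_softmax(process_logits_sketch(strategy, logits), dim=-1)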

Sequence Diagram

sequenceDiagram
    actor User
    participant LLM as LLM API
    participant Executor as Executor
    participant BaseWorker as BaseWorker
    participant PyTorch as PyTorch Sampler
    
    User->>LLM: Request with SamplingParams.logprobs_mode
    LLM->>Executor: GenerateRequest
    Executor->>Executor: _get_logprob_params(request)
    Note over Executor: Extract logprobs_mode from<br/>SamplingParams
    Executor->>BaseWorker: _enqueue_request
    BaseWorker->>BaseWorker: _get_logprob_params(request)
    Note over BaseWorker: Attach LogprobParams<br/>as _logprob_params
    BaseWorker->>PyTorch: executor_request with<br/>_logprob_params
    PyTorch->>PyTorch: Check logprobs_mode
    alt logprobs_mode == "processed_logprobs"
        PyTorch->>PyTorch: process_logits(strategy, logits)
        Note over PyTorch: Apply temperature scaling<br/>and top-k/top-p masking
        PyTorch->>PyTorch: log_softmax(processed)
    else other modes
        PyTorch->>PyTorch: log_softmax(raw_logits)
    end
    PyTorch->>User: Response with logprobs
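
The branch in the diagram corresponds roughly to the following sketch (names follow the walkthrough and the sketch above; this is not the actual TorchSampler code):

import torch
import torch.nn.functional as F
from typing import Optional

def compute_logprobs_sketch(raw_logits: torch.Tensor,
                            strategy: tuple,
                            logprobs_mode: Optional[str]) -> torch.Tensor:
    if logprobs_mode == "processed_logprobs":
        # Apply the same temperature / top-k / top-p transforms used for sampling.
        processed = process_logits_sketch(strategy, raw_logits)
        return F.log_softmax(processed, dim=-1)
    # Backward-compatible path: logprobs over the raw logits.
    return F.log_softmax(raw_logits, dim=-1)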

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Areas requiring extra attention:
    • tensorrt_llm/_torch/pyexecutor/sampling_utils.py: Return type signature mismatch between docstring (tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]]) and actual implementation (returns two items). Verify if this is intentional or if the implementation needs updating.
    • tensorrt_llm/_torch/pyexecutor/sampler.py: Logic branch for processed_logprobs mode—ensure masking and log_softmax operations are applied in correct order and match expected behavior across all sampling strategies.
    • Test coverage: Verify that test cases exercise the boundary between processed_logprobs and other modes, and that the -inf masking behavior is correctly validated across all strategy combinations (top_k, top_p, temperature, greedy).

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Description check (⚠️ Warning). Explanation: The PR description consists entirely of the repository template and bot help documentation without providing a concrete explanation of changes, test coverage, or completed checklist items. Resolution: Replace the template content with an actual description explaining the processed logprobs implementation, list the test cases added (test_llm_logprobs_modes_basic, etc.), and confirm completion of the checklist items.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title '[None][feat] Add processed logprobs' clearly describes the main feature addition in the changeset - implementing processed logprobs functionality across the PyTorch executor.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/pyexecutor/sampling_utils.py (1)

322-366: Fix inconsistent return type annotation.

The function signature at line 322 declares a return type of tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]] (3 elements), but the implementation at line 366 returns only tokens, softmax (2 elements). The docstring at line 333 also describes a 2-tuple return value.

This inconsistency will cause type checking errors and mislead callers about the function's actual return value.

Apply this diff to correct the return type annotation:

-) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]]:
+) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
     """
     Sample from logits using the specified strategy.
 
     Args:
         strategy: Sampling strategy tuple (strategy_name, *params)
         logits: Input logits tensor
         generator: Optional random generator
         return_probs: If True, return softmax probabilities
 
     Returns:
         Tuple of (sampled_tokens, softmax_probs)
     """
🧹 Nitpick comments (7)
tensorrt_llm/sampling_params.py (1)

5-5: Logprobs mode wiring is consistent; consider light validation & doc alignment

The introduction of LogprobsMode, the logprobs_mode field on LogprobParams, and the corresponding SamplingParams.logprobs_mode defaulting to "processed_logprobs" is internally consistent and threads the new mode cleanly through the config objects.

Two small follow‑ups you may want to consider:

  • The SamplingParams docstring documents logprobs_mode as str, while the field is typed as LogprobsMode = Literal["processed_logprobs"]. Either broaden the type alias or tighten the docstring so they agree.
  • There’s currently no runtime validation of logprobs_mode; if/when more modes are added, adding a simple check in _validate() (or normalizing unknown values to a default) would make the API failure mode clearer.

Also applies to: 13-17, 185-187, 263-265
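
A minimal sketch of the suggested validation (the allowed-mode set and its placement inside _validate() are assumptions, not part of this PR):

_ALLOWED_LOGPROBS_MODES = ("processed_logprobs",)

def _validate(self):
    ...  # existing checks
    if self.logprobs_mode not in _ALLOWED_LOGPROBS_MODES:
        raise ValueError(
            f"Unsupported logprobs_mode {self.logprobs_mode!r}; "
            f"expected one of {_ALLOWED_LOGPROBS_MODES}")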

tests/unittest/llmapi/test_llm_pytorch.py (2)

969-1003: Fix unused loop variable in temperature test (B007)

token_id is unused in the loop over first_token_logprobs.items(), which triggers Ruff B007 and slightly obscures intent. Renaming it to _token_id keeps the API clear and silences the warning without behavior change.

-    for token_id, logprob_obj in first_token_logprobs.items():
+    for _token_id, logprob_obj in first_token_logprobs.items():
         assert logprob_obj.logprob <= 0.0, (
             f"processed_logprobs should have non-positive values, got {logprob_obj.logprob}"
         )

1080-1124: Strengthen top‑p processed_logprobs test to actually check masking

The current test_llm_processed_logprobs_with_top_p only inspects the last iteration’s logprob_values, and it never asserts that any -inf values are present, despite the comment. As written, it mainly re‑checks that non‑-inf values are ≤ 0.0, which other tests already cover.

You can tighten this without changing semantics by aggregating across all positions and asserting -inf masking only when top_p < 1.0:

-    all_logprobs = outputs[0].outputs[0].logprobs
-    for token_idx, token_logprobs in enumerate(all_logprobs):
-        logprob_values = [obj.logprob for obj in token_logprobs.values()]
-        if token_idx == 0:
-            print(f"First token processed_logprobs values: {logprob_values}")
-        if any(val == float("-inf") for val in logprob_values):
-            break
-    # All non-inf values should be non-positive (log probabilities)
-    non_inf_values = [v for v in logprob_values if v != float("-inf")]
-    if non_inf_values:
-        assert all(v <= 0.0 for v in non_inf_values), (
-            "processed_logprobs non-inf values should be non-positive")
+    all_logprobs = outputs[0].outputs[0].logprobs
+    has_neg_inf = False
+    non_inf_values: list[float] = []
+    for token_idx, token_logprobs in enumerate(all_logprobs):
+        logprob_values = [obj.logprob for obj in token_logprobs.values()]
+        if token_idx == 0:
+            print(f"First token processed_logprobs values: {logprob_values}")
+        if any(val == float("-inf") for val in logprob_values):
+            has_neg_inf = True
+        non_inf_values.extend(v for v in logprob_values if v != float("-inf"))
+
+    # With top_p < 1.0 we expect at least some logits to be masked to -inf.
+    if top_p < 1.0:
+        assert has_neg_inf, "expected some processed_logprobs to be -inf when top_p < 1.0"
+
+    # All non-inf values should be non-positive (log probabilities)
+    if non_inf_values:
+        assert all(v <= 0.0 for v in non_inf_values), (
+            "processed_logprobs non-inf values should be non-positive"
+        )
tensorrt_llm/executor/base_worker.py (1)

570-574: Correctly threading logprob params into PyTorch executor requests

Attaching _logprob_params to executor_request for the PyTorch backend lines up with how executor_request_to_llm_request() picks it up and exposes it to TorchSampler. This keeps TRT behavior unchanged and preserves the existing _get_logprob_params usage for GenerationResult.

The extra _get_logprob_params call here duplicates the one in submit(), but the cost is negligible; if this becomes hot, you could later pass the precomputed logprob_params into _enqueue_request() instead.

tensorrt_llm/executor/result.py (1)

1019-1025: Docstring update is helpful; consider reflecting Optional[int] usage

The expanded Args section on compute_logprobs makes the function contract much clearer. One small mismatch: callers pass k_prompt_logprobs / k_logprobs as Optional[int] (and the implementation does if k_prompt_logprobs and ...), so documenting these as "int" only is slightly misleading. If you touch this again, calling them out as "int or None" (or Optional[int]) would more accurately reflect how they're used.
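
For instance, the Args entries could read along these lines (a suggestion only; the None/0 behavior follows the truthiness check mentioned above):

    Args:
        k_prompt_logprobs (int or None): number of top logprobs to return per
            prompt token; None or 0 disables prompt logprobs.
        k_logprobs (int or None): number of top logprobs to return per generated
            token; None or 0 disables generation logprobs.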

tensorrt_llm/_torch/pyexecutor/sampler.py (1)

62-74: Processed‑logprobs implementation in TorchSampler is consistent with sampling; clarify comment & future‑proofing

The new processed_logprobs path in _process_requests looks sound:

  • It uses _request_strategy + process_logits per request to apply the same temperature / top‑k / top‑p transformations that sampling sees, then applies log_softmax, so the returned logprobs are truly “post‑sampling‑transform” while still using the original logits tensor.
  • Row ordering of logprobs_cuda remains consistent with the existing logprobs_req_indices / req_num_steps splitting logic, since you build processed_logits_list in the same request/step order that _PackedStepIndexer uses.
  • When _logprob_params is absent (older flows or TRT), logprobs_mode stays None and you fall back to the old behavior computing logprobs from raw_logits_cuda, preserving backward compatibility.

Two small improvements to consider:

  • The comment above this block still says logprobs “are specified to not reflect temperature scaling, top‑k/top‑p masking, etc.” — that’s no longer true in processed_logprobs mode. It’d be good to explicitly distinguish the two modes in that comment.
  • Right now you pick logprobs_mode from the first request with py_num_logprobs. That’s fine while there’s only "processed_logprobs", but if additional modes are added later and mixed in a batch, this will silently treat them all the same. A brief assertion or TODO about assuming uniform logprobs_mode across batched requests would make that constraint explicit.

Also applies to: 1743-1788
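
A sketch of the suggested guard (the per-request mode attribute name is hypothetical; py_num_logprobs follows the comment above):

def assert_uniform_logprobs_mode(requests):
    # Hypothetical guard: ensure a batch does not mix logprobs modes once more
    # modes than "processed_logprobs" exist.
    modes = {
        getattr(req, "py_logprobs_mode", None)
        for req in requests
        if req.py_num_logprobs
    }
    assert len(modes) <= 1, (
        f"Mixed logprobs_mode values within a batch are not supported: {modes}")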

tensorrt_llm/_torch/pyexecutor/sampling_utils.py (1)

251-313: Consider refactoring to reduce code duplication.

The process_logits function duplicates significant logic from the existing sampling functions (top_k_sampling_batch, top_p_sampling_batch, top_k_top_p_sampling_batch). This duplication creates a maintenance burden where changes to sampling logic must be mirrored in both places.

Additionally, for the top_p and top_k_top_p cases (lines 281 and 300), computing cumulative_probs via torch.softmax during logit processing is computationally expensive and may impact performance.

Consider one of these approaches:

Option 1: Extract shared logic into helper functions that both process_logits and the sampling functions can use.

Option 2: Refactor existing sampling functions to use process_logits internally:

def top_k_sampling_batch(logits, *, top_k: int, temperature: float, generator=None):
    processed_logits = process_logits(("top_k", top_k, temperature), logits)
    softmax = torch.softmax(processed_logits, dim=-1)
    next_tokens = torch.multinomial(softmax, num_samples=1, generator=generator).squeeze(-1)
    return next_tokens, softmax

This would eliminate duplication and ensure consistency between logit processing and sampling.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46dd988 and a24df40.

📒 Files selected for processing (8)
  • tensorrt_llm/_torch/pyexecutor/llm_request.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/sampler.py (3 hunks)
  • tensorrt_llm/_torch/pyexecutor/sampling_utils.py (1 hunks)
  • tensorrt_llm/executor/base_worker.py (1 hunks)
  • tensorrt_llm/executor/executor.py (1 hunks)
  • tensorrt_llm/executor/result.py (2 hunks)
  • tensorrt_llm/sampling_params.py (4 hunks)
  • tests/unittest/llmapi/test_llm_pytorch.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-08-21T00:16:56.457Z
Learnt from: farshadghodsian
Repo: NVIDIA/TensorRT-LLM PR: 7101
File: docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md:36-36
Timestamp: 2025-08-21T00:16:56.457Z
Learning: TensorRT-LLM container release tags in documentation should only reference published NGC container images. The README badge version may be ahead of the actual published container versions.

Applied to files:

  • tensorrt_llm/executor/result.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/llmapi/test_llm_pytorch.py
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

Applied to files:

  • tests/unittest/llmapi/test_llm_pytorch.py
📚 Learning: 2025-08-28T10:25:22.370Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/pyexecutor/sampler.py:887-891
Timestamp: 2025-08-28T10:25:22.370Z
Learning: In tensorrt_llm/_torch/pyexecutor/sampler.py, the draft_probs and target_probs tensors have shapes [1, steps] not [steps, vocab_size] as might be expected, making the .squeeze(0) operations appropriate for removing the batch dimension of size 1.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/sampler.py
  • tensorrt_llm/executor/executor.py
📚 Learning: 2025-08-19T12:45:11.997Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 7033
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:0-0
Timestamp: 2025-08-19T12:45:11.997Z
Learning: In tensorrt_llm/_torch/pyexecutor/model_engine.py, DoRA (Delta Orthogonal Rank Adaptation) functionality was removed from the PyTorch flow to eliminate issues with inverted DoRA detection logic. The original is_dora condition was checking if scaling_vec_pointer == 0, which was potentially incorrect.

Applied to files:

  • tensorrt_llm/executor/executor.py
  • tensorrt_llm/executor/base_worker.py
  • tensorrt_llm/_torch/pyexecutor/llm_request.py
📚 Learning: 2025-08-14T15:38:01.771Z
Learnt from: MatthiasKohl
Repo: NVIDIA/TensorRT-LLM PR: 6904
File: cpp/tensorrt_llm/pybind/thop/bindings.cpp:55-57
Timestamp: 2025-08-14T15:38:01.771Z
Learning: In TensorRT-LLM Python bindings, tensor parameter collections like mla_tensor_params and spec_decoding_tensor_params are kept as required parameters without defaults to maintain API consistency, even when it might affect backward compatibility.

Applied to files:

  • tensorrt_llm/sampling_params.py
🧬 Code graph analysis (3)
tests/unittest/llmapi/test_llm_pytorch.py (3)
tensorrt_llm/llmapi/llm.py (3)
  • LLM (1101-1117)
  • generate (259-341)
  • prompt (86-87)
tensorrt_llm/llmapi/llm_args.py (1)
  • KvCacheConfig (1261-1405)
tensorrt_llm/sampling_params.py (1)
  • SamplingParams (120-553)
tensorrt_llm/_torch/pyexecutor/sampler.py (1)
tensorrt_llm/_torch/pyexecutor/sampling_utils.py (1)
  • process_logits (251-313)
tensorrt_llm/executor/base_worker.py (1)
tensorrt_llm/executor/executor.py (1)
  • _get_logprob_params (223-240)
🪛 Ruff (0.14.5)
tests/unittest/llmapi/test_llm_pytorch.py

998-998: Loop control variable token_id not used within loop body

Rename unused token_id to _token_id

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/_torch/pyexecutor/llm_request.py (1)

781-783: Correctly propagating logprob params into LlmRequest

Copying _logprob_params from executor_request to llm_request ensures the sampler can access logprob settings (including logprobs_mode) on the PyTorch path while remaining a no-op for older callers that don't set this attribute. Looks good and is backward compatible.

tests/unittest/llmapi/test_llm_pytorch.py (1)

918-967: New processed_logprobs tests cover key modes and look consistent with API

The new tests for basic behavior, greedy sampling, and backward compatibility (test_llm_logprobs_modes_basic, test_llm_processed_logprobs_with_greedy_sampling, test_llm_logprobs_mode_backward_compatibility) exercise the processed‑logprobs path across different temperatures/top‑k and confirm that the default logprobs_mode matches the explicit setting. The structure matches existing patterns in this file and looks correct.

Also applies to: 1006-1077

tensorrt_llm/executor/executor.py (1)

237-238: LGTM!

The addition of logprobs_mode parameter propagation is correct and consistent with the new processed logprobs feature. The parameter is safely passed from request.sampling_params.logprobs_mode to LogprobParams, and this code path is only executed when logprobs are requested (guarded by the conditional at line 227).

@dominicshanshan dominicshanshan marked this pull request as draft November 19, 2025 10:15
@dominicshanshan
Collaborator Author

dominicshanshan commented Nov 19, 2025

The logprobs API refactor and support for multiple logprobs_mode return options will be filed in a separate PR; this PR only implements processed_logprobs for RL usage.

@dominicshanshan dominicshanshan force-pushed the user/shanshan/add_processed_logprobs branch from 81c25d6 to c9bade2 Compare November 20, 2025 09:56
@dominicshanshan dominicshanshan marked this pull request as ready for review November 20, 2025 10:02
@dominicshanshan
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #25196 [ run ] triggered by Bot. Commit: c9bade2

@tensorrt-cicd
Collaborator

PR_Github #25196 [ run ] completed with state SUCCESS. Commit: c9bade2
/LLM/main/L0_MergeRequest_PR pipeline #19052 completed with status: 'FAILURE'

@dominicshanshan dominicshanshan requested a review from a team as a code owner November 21, 2025 02:25
@dominicshanshan
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #25278 [ run ] triggered by Bot. Commit: 7913e82

@tensorrt-cicd
Collaborator

PR_Github #25278 [ run ] completed with state SUCCESS. Commit: 7913e82
/LLM/main/L0_MergeRequest_PR pipeline #19123 completed with status: 'FAILURE'
