
Conversation


@Tabrizian Tabrizian commented Oct 28, 2025

Summary by CodeRabbit

  • New Features
    • Added Indexer KCache support for optimized key-value cache management
    • Enabled configurable indexer KCache block size and dimension settings
    • Enhanced multi-GPU cache transfer capabilities with indexer KCache option
    • Improved MLA (Multi-Head Latent Attention) cache handling with new transfer paths

gsm8k accuracy for disagg:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 95.7544 | ± 0.5554 |
| gsm8k | 3 | strict-match | 5 | exact_match | 95.6785 | ± 0.5601 |

[10/30/2025-21:00:20] [TRT-LLM] [I] lm-eval gsm8k average accuracy: 95.72
[10/30/2025-21:00:20] [TRT-LLM] [I] Hypothesis testing report:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gpqa_diamond_cot_zeroshot_aa | 1 | strict-match | 0 | exact_match | 79.798 | ± 2.8606 |

[11/04/2025-23:53:54] [TRT-LLM] [I] lm-eval gpqa_diamond_cot_zeroshot_aa average accuracy: 79.80

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
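For illustration, a few comment invocations composed from the flags documented above (the stage and GPU names are the placeholder examples from the help text, not a recommendation for this PR):

```
/bot run --disable-fail-fast
/bot run --stage-list "A10-PyTorch-1"
/bot run --reuse-test --extra-stage "H100_PCIe-TensorRT-Post-Merge-1"
/bot skip --comment "Docs-only change"
```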

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@Tabrizian Tabrizian changed the title User/imant/move indexer2 Add support for disagg in DSv3.2 Oct 28, 2025
@Tabrizian Tabrizian changed the title Add support for disagg in DSv3.2 [TRTLLM-8540][feat] Add support for disagg in DSv3.2 Oct 28, 2025
@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch from 9ce0552 to 7586763 Compare October 31, 2025 22:08
@Tabrizian (Member Author)

/bot run

@Tabrizian Tabrizian marked this pull request as ready for review October 31, 2025 22:18
@Tabrizian Tabrizian requested a review from a team as a code owner October 31, 2025 22:18
@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch from 7586763 to 864914e Compare October 31, 2025 22:19
@Tabrizian (Member Author)

/bot run --disable-fail-fast

@coderabbitai (Contributor)

coderabbitai bot commented Oct 31, 2025

📝 Walkthrough

This PR introduces comprehensive support for an "Indexer K-Cache" feature across the TensorRT-LLM batch manager and executor. Changes include exposing new indexer K-cache configuration accessors through the manager class hierarchy, extending cache state to track indexer K-cache settings, modifying cache transfer buffer management to support indexer pools, and updating cache split/concat operations with isIndexerKCache parameters. Serialization of cache state is extended to persist these new fields, and MLA cache formatting is refactored to handle multiple transfer buffer managers.

Changes

Cohort / File(s) Summary
Indexer K-Cache Configuration Accessors
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
Added three new public accessor methods (isEnableIndexerKCache(), getIndexerKCacheQuantBlockSize(), getIndexerKCacheIndexHeadDim()) to WindowBlockManager, BlockManager, and KVCacheManager classes; added corresponding pure virtual methods to BaseKVCacheManager base class to standardize queries across the hierarchy.
Cache State Extension
cpp/include/tensorrt_llm/executor/dataTransceiverState.h
Extended CacheState with three new configuration fields: hasIndexerKCache (bool), indexerDimPerHead (SizeType32), and indexerKCacheQuantBlockSize (SizeType32, default 128); updated three constructors to accept these parameters and added corresponding getter methods.
Cache Utilities with Indexer Pool Support
cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
Modified BlockRange to support optional indexer K-cache pool selection via updated getBlockRangeForWindow() with new useIndexerKCache parameter; updated pool count queries to use explicit flags; added mIndexerKCachePool member initialization.
Cache Transfer Buffer Configuration
cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h, cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
Extended CacheTransBufferManager constructor with transferIndexerKCache boolean parameter; added data type selection logic conditional on this flag; added public getMaxNumTokens() accessor.
Cache Manager Integration
cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp, cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
Updated getNumPools() calls to use explicit boolean flags; modified MLACacheFormatter construction to accept vector of CacheTransBufferManager pointers instead of single pointer; updated CacheState instantiation to pass indexer K-cache parameters.
MLA Cache Formatting Refactor
cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h, cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
Changed MLACacheFormatter constructor signature from single CacheTransBufferManager* to std::vector<CacheTransBufferManager*>; refactored transfer logic to support per-buffer-manager paths with dynamic buffer allocation, zero-copy handling, and per-transferer timing measurements.
Cache Split/Concat Operations
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h, cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
Added optional isIndexerKCache parameter (default false) to splitKVCacheDispatch(), concatKvCacheV2Dispatch(), and related functions; conditional data type handling switches to UINT8 for indexer caches; dynamic per-head dimension adjustment for indexer-specific formats.
Serialization Support
cpp/tensorrt_llm/executor/serialization.cpp
Extended CacheState serialization and deserialization to handle three new fields: hasIndexerKCache, indexerDimPerHead, and indexerKCacheQuantBlockSize; updated serialized size calculation and constructor invocation.
Comprehensive Test Updates
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
Added three new test parameters (isIndexerKCache, indexerDimPerHead, indexerKCacheQuantBlockSize) to test tuple; extended setUpCacheManager() and helper methods (fillBlockData, verifyBlockData, generateExpectedValue) to handle indexer K-cache scenarios; expanded test instantiation macros to cover indexer K-cache combinations.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Setup
    participant CacheMgr as CacheManager
    participant KVState as CacheState
    participant MLAFormatter as MLACacheFormatter
    participant TransBuffer as CacheTransBufferManager

    Test->>CacheMgr: Initialize with indexerKCache config
    CacheMgr->>KVState: Create CacheState(hasIndexerKCache, indexerDimPerHead, ...)
    KVState-->>CacheMgr: Store configuration
    
    Test->>MLAFormatter: Create MLACacheFormatter(cacheManager, vector<TransBuffer*>)
    Note over MLAFormatter: Multiple buffers for primary + indexer paths
    
    MLAFormatter->>TransBuffer: Initialize primary buffer (transferIndexerKCache=false)
    MLAFormatter->>TransBuffer: Initialize indexer buffer (transferIndexerKCache=true)
    
    Test->>MLAFormatter: Transfer cache
    alt Use IndexerKCache Path
        MLAFormatter->>TransBuffer: getOrAllocateRecvBuffers (indexer buffer)
        TransBuffer-->>MLAFormatter: Buffer handles (UINT8 dtype)
    else Use Primary Path
        MLAFormatter->>TransBuffer: getOrAllocateRecvBuffers (primary buffer)
        TransBuffer-->>MLAFormatter: Buffer handles (original dtype)
    end
    
    MLAFormatter->>CacheMgr: Query isEnableIndexerKCache(), getIndexerKCacheQuantBlockSize()
    CacheMgr-->>MLAFormatter: Configuration flags

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • mlaCacheFormatter.cpp: Significant refactoring with per-buffer-manager logic paths, zero-copy handling, and dynamic buffer management; requires careful tracing of data flow across multiple transfer scenarios
  • cacheSplitConcat.cu: Complex conditional data type handling and per-head dimension adjustments based on isIndexerKCache flag; multiple function signature changes propagating through call chain
  • cacheTransceiverTest.cpp: Extensive test parameter matrix expansion with new indexer K-cache combinations; helper method updates affect multiple test paths
  • kvCacheUtils.h: Subtle changes to pool selection logic with new conditional branching for indexer cache pools
  • Interconnected parameter threading: New parameters propagate through multiple abstraction layers (CacheState → MLACacheFormatter → CacheTransBufferManager), requiring verification of correct plumbing across all call sites

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 8.16%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Description check — ⚠️ Warning: The PR description lacks a detailed explanation of the changes, implementation approach, and test coverage despite extensive code modifications. Add a clear description of what the indexer K-cache feature does, why it is needed for disagg in DSv3.2, specific test results for this feature, and a link to the relevant issue/ticket.
  • Title Check — ❓ Inconclusive: The title "[TRTLLM-8540][feat] Add support for disagg in DSv3.2" follows the required format with a valid JIRA ticket and type indicator, but it is vague: the changeset primarily adds indexer K-cache support infrastructure across the cache manager classes and related components, so the title describes the end goal (disagg support) rather than the primary technical mechanism, making it less informative for developers scanning commit history.


@tensorrt-cicd (Collaborator)

PR_Github #23228 [ run ] triggered by Bot. Commit: 864914e

@tensorrt-cicd (Collaborator)

PR_Github #23229 [ run ] triggered by Bot. Commit: 864914e

@tensorrt-cicd (Collaborator)

PR_Github #23228 [ run ] completed with state ABORTED. Commit: 864914e


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu (1)

1060-1073: Variable shadowing bug: conditional data type assignment is overwritten.

Lines 1060-1068 conditionally set cacheDataType based on isIndexerKCache, but line 1073 shadows this variable by declaring a new local auto cacheDataType, effectively discarding the conditional logic. This will cause incorrect data type handling when isIndexerKCache is true.

Apply this diff to fix the shadowing:

     for (auto const& [window, blocks] : kVCacheBlocksPerWindow)
     {
         auto cacheBlockSize = blocks.front()->getSize();
-        auto cacheDataType = blocks.front()->getDataType();
+        auto blockDataType = blocks.front()->getDataType();
         windowSizes.push_back(window);

Then ensure validation uses the outer cacheDataType:

         for (auto&& kvCacheBlock : blocks)
         {
-            TLLM_CHECK(kvCacheBlock->getDataType() == cacheDataType);
+            TLLM_CHECK(kvCacheBlock->getDataType() == blockDataType);
             TLLM_CHECK(kvCacheBlock->getSize() == cacheBlockSize);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0dc746 and 864914e.

📒 Files selected for processing (13)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (4 hunks)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h (4 hunks)
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h (5 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp (6 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp (4 hunks)
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu (11 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h (1 hunks)
  • cpp/tensorrt_llm/executor/serialization.cpp (3 hunks)
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (24 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh}: Namespace closing braces must include a trailing comment with the namespace name (e.g., '} // namespace foo').
Prefer const or constexpr variables over #define for constants.
Declare variables that are not modified after initialization as const.
Avoid magic literals in code; except for 0, nullptr, true, false. Use named constants for comparisons and logic.
Use Allman brace style for formatting.
Place the semicolon of an empty for/while loop on a new line.
Bodies of switch/while/do-while/for must be compound statements (brace-delimited), and if/else must always be followed by brace-delimited statements.
Type names (e.g., classes) must be CamelCase starting with an uppercase letter (e.g., FooBar).
Local variables, methods, and namespaces use lowerCamelCase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not in an anonymous namespace must be lowerCamelCase prefixed with 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number globals that are static or in an anonymous namespace use lowerCamelCase prefixed with 's' (e.g., sMutableStaticGlobal).
Locally visible static variables use lowerCamelCase with 's' prefix (e.g., static std::once_flag sFlag).
Private/protected member variables use 'm' prefix with CamelCase (e.g., mNbFooValues). Public members may omit, but 'm' is encouraged for clarity.
Constants (enums, global constants, static constants, and function-scope magic/literal constants) use uppercase SNAKE_CASE with 'k' prefix (e.g., kDIGIT_NUM).
Function-scope constants that are not magic numbers or literals are named like non-constant variables (e.g., bool const pass = a && b).
If macros are necessary, name them in UPPER_SNAKE_CASE (e.g., FOO_VERSION) and prefer constants over #define.
Use LLVM clang-format; wrap lines at a maximum of 120 columns; use '// clang-format off/on' sparingly with justification.
Use smart pointers for heap allocations; prefer unique_ptr for sole ownership, shared_ptr for shared...

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
**/*.{cpp,cxx,cc,cu,h,hpp,hh,hxx,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

C++ filenames should be lowerCamelCase (first letter lowercase) and must be case-insensitive unique within a compilation target.

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
**/*.{h,hpp,hh,hxx}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Document new class interfaces and function prototypes with Doxygen; use //! for single-line and //!< for members.

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
**/*.{h,hpp,hh,hxx,cpp,cxx,cc}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{h,hpp,hh,hxx,cpp,cxx,cc}: Prefer anonymous namespaces over 'static' for internal linkage of functions.
All templates (class/function/member/static) must be instantiated at least once; non-POD classes should have private data members.

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
**/*.{h,hpp,hh,hxx,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use include guards named 'TRTLLM_<FILE_NAME_IN_CAPS_WITH_UNDERSCORES>_H' (no leading or trailing underscore; directory names excluded).

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
🧠 Learnings (12)
📓 Common learnings
Learnt from: thorjohnsen
Repo: NVIDIA/TensorRT-LLM PR: 6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:2010-2045
Timestamp: 2025-08-21T09:41:49.347Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is specifically for updating bookkeeping when blocks are added during the context phase, not for refreshing offsets after detach operations. During detach operations, GenerationRequest::removeFrontBlock handles the necessary cache block bookkeeping internally.
📚 Learning: 2025-08-20T06:56:02.889Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:577-579
Timestamp: 2025-08-20T06:56:02.889Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, maxSequenceLength is now enforced as a non-optional argument in the BlockManager constructor, so concerns about std::nullopt defaulting to 0 are not applicable. When windowSize > maxSequenceLength, a warning should be added instead of handling optional parameter cases.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-08-21T09:41:49.347Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:2010-2045
Timestamp: 2025-08-21T09:41:49.347Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is specifically for updating bookkeeping when blocks are added during the context phase, not for refreshing offsets after detach operations. During detach operations, GenerationRequest::removeFrontBlock handles the necessary cache block bookkeeping internally.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-08-15T06:46:54.897Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6767
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-15T06:46:54.897Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp addToken function, newly allocated blocks are unshared by design. The beam search path in addToken (when sequence.getNumTokens() > windowSize) is currently broken/non-functional with SWA, so the block allocation doesn't follow a shared-then-unshared pattern.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-08-20T06:48:45.368Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h:0-0
Timestamp: 2025-08-20T06:48:45.368Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is only called when adding a sequence, not during detach operations. During detach, the cache block bookkeeping is handled by GenerationRequest::removeFrontBlock.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-08-06T08:18:28.669Z
Learnt from: zhengd-nv
Repo: NVIDIA/TensorRT-LLM PR: 6633
File: cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp:145-155
Timestamp: 2025-08-06T08:18:28.669Z
Learning: In cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, the existing `mMtxForMap` mutex in DataSenderImpl is sufficient to synchronize measurement file operations in the `release` method, as all file operations occur within the same critical section that protects the `mRequestToSession` map access.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
📚 Learning: 2025-08-14T21:04:50.248Z
Learnt from: thorjohnsen
Repo: NVIDIA/TensorRT-LLM PR: 6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-08-20T06:48:45.368Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h:0-0
Timestamp: 2025-08-20T06:48:45.368Z
Learning: There is a planned refactoring to move cache block bookkeeping utilities from BlockManager/WindowBlockManager into the GenerationRequest class itself to improve code organization and make responsibilities clearer.

Applied to files:

  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-09-23T14:58:05.372Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:42-49
Timestamp: 2025-09-23T14:58:05.372Z
Learning: In TensorRT-LLM NCCL device kernels (cpp/tensorrt_llm/kernels/nccl_device/), the token partitioning intentionally uses ceil-like distribution (same token_per_rank for all ranks) to ensure all ranks launch the same number of blocks. This is required for optimal NCCL device API barrier performance, even though it may launch extra blocks for non-existent tokens on later ranks. Runtime bounds checking in the kernel (blockID validation) handles the overshoot cases.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-09-29T15:14:28.503Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 8063
File: tensorrt_llm/lora_manager.py:1080-1112
Timestamp: 2025-09-29T15:14:28.503Z
Learning: In tensorrt_llm/lora_manager.py, when calculating part_sizes for attn_qkv fused LoRA modules, the sizes are correctly multiplied by tp_size because model_config.num_heads and model_config.num_kv_heads are already divided by tp_size (per-TP-rank values), so multiplication is needed to get the original full concatenated dimension size. The interleave_fused_lora_weights_for_tp function provides proper validation.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
📚 Learning: 2025-09-29T15:14:28.503Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 8063
File: tensorrt_llm/lora_manager.py:1080-1112
Timestamp: 2025-09-29T15:14:28.503Z
Learning: In tensorrt_llm/lora_manager.py, when calculating part_sizes for attn_qkv fused LoRA modules, the sizes are correctly multiplied by tp_size because model_config.num_heads and model_config.num_kv_heads are already divided by tp_size (per-TP-rank values), so multiplication is needed to get the original full concatenated dimension size. The interleave_fused_lora_weights_for_tp function provides proper validation with asserts for total size and TP divisibility.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels, the <sstream> header is not needed as an explicit include in config.cu because it's provided transitively through other headers. Local compilation testing confirms this works without the explicit include.

Applied to files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp
🧬 Code graph analysis (12)
cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.h (2)
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (2)
  • BaseKVCacheManager (1425-2008)
  • BaseKVCacheManager (1432-1432)
cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp (1)
  • CacheTransBufferManager (191-253)
cpp/tensorrt_llm/batch_manager/cacheTransBuffer.h (1)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (1)
  • maxNumTokens (293-307)
cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp (1)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (1)
  • maxNumTokens (293-307)
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (1)
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp (3)
  • nodiscard (1140-1147)
  • nodiscard (1693-1696)
  • nodiscard (2971-2989)
cpp/tensorrt_llm/executor/serialization.cpp (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (6)
  • deserialize (253-261)
  • deserialize (253-253)
  • serialize (244-251)
  • serialize (244-244)
  • serializedSize (263-272)
  • serializedSize (263-263)
cpp/tensorrt_llm/executor/requestImpl.h (1)
  • serialize (99-104)
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu (1)
cpp/include/tensorrt_llm/common/dataType.h (1)
  • getDTypeSize (26-44)
cpp/include/tensorrt_llm/batch_manager/kvCacheUtils.h (2)
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (2)
  • `` (1232-1235)
  • `` (1786-1789)
cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp (2)
  • poolIdx (220-224)
  • poolIdx (220-220)
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.h (2)
cpp/include/tensorrt_llm/executor/dataTransceiverState.h (1)
  • CacheState (40-102)
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu (2)
  • concatKvCacheV2Dispatch (1687-1726)
  • concatKvCacheV2Dispatch (1687-1690)
cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp (1)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (1)
  • maxNumTokens (293-307)
cpp/tensorrt_llm/batch_manager/mlaCacheFormatter.cpp (4)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (9)
  • llmRequest (322-361)
  • llmRequest (322-322)
  • llmRequest (818-861)
  • llmRequest (818-818)
  • llmRequest (863-870)
  • llmRequest (863-863)
  • llmRequest (885-930)
  • llmRequest (885-885)
  • future (167-176)
cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp (2)
  • getBlockRangeForSending (45-86)
  • getBlockRangeForSending (45-46)
cpp/tensorrt_llm/common/envUtils.cpp (4)
  • getEnvTryZCopyForKVCacheTransfer (339-343)
  • getEnvTryZCopyForKVCacheTransfer (339-339)
  • getEnvEnableReceiveKVCacheParallel (333-337)
  • getEnvEnableReceiveKVCacheParallel (333-333)
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu (4)
  • splitKVCacheDispatch (1342-1388)
  • splitKVCacheDispatch (1342-1345)
  • concatKvCacheV2Dispatch (1687-1726)
  • concatKvCacheV2Dispatch (1687-1690)
cpp/include/tensorrt_llm/executor/dataTransceiverState.h (1)
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (12)
  • nodiscard (255-1354)
  • nodiscard (667-675)
  • nodiscard (677-682)
  • nodiscard (689-697)
  • nodiscard (699-707)
  • nodiscard (709-712)
  • nodiscard (719-722)
  • nodiscard (724-734)
  • nodiscard (848-859)
  • nodiscard (883-887)
  • nodiscard (1137-1140)
  • nodiscard (1142-1164)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (2)
cpp/tests/unit_tests/batch_manager/cacheTransBufferTest.cpp (4)
  • numLayers (32-69)
  • numLayers (32-33)
  • numLayers (77-88)
  • numLayers (77-77)
cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp (2)
  • createCacheFormatter (972-986)
  • createCacheFormatter (972-973)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
cpp/tensorrt_llm/executor/cache_transmission/cacheSplitConcat.cu (1)

1028-1029: LGTM: isIndexerKCache parameter added consistently.

The new isIndexerKCache parameter with default value false is properly threaded through the split/concat function hierarchy, maintaining backward compatibility while enabling specialized indexer KCache handling.

Also applies to: 1342-1345, 1390-1394, 1687-1690
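The backward-compatibility pattern praised above can be sketched generically. This is not the real `splitKVCacheDispatch`/`concatKvCacheV2Dispatch` signature (those are C++/CUDA); it is a hypothetical Python analogue showing how a defaulted flag keeps existing call sites unchanged while enabling a specialized path for new callers.

```python
def split_kv_cache(blocks, num_splits, is_indexer_k_cache=False):
    """Split cache blocks across ranks; illustrative only.

    Existing callers omit the flag and get the original contiguous
    split; indexer-KCache callers opt into a (hypothetical)
    specialized layout. The two layouts here are stand-ins.
    """
    if is_indexer_k_cache:
        # Hypothetical specialized path: strided assignment.
        return [blocks[i::num_splits] for i in range(num_splits)]
    # Original path: contiguous chunks, unchanged for legacy callers.
    chunk = len(blocks) // num_splits
    return [blocks[i * chunk:(i + 1) * chunk] for i in range(num_splits)]
```

The design point is that the default value makes the new parameter invisible to every pre-existing call site, so the change is source-compatible throughout the split/concat hierarchy.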

cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (1)

598-611: LGTM: Indexer KCache accessors added consistently across the hierarchy.

The three new accessor methods (isEnableIndexerKCache(), getIndexerKCacheQuantBlockSize(), getIndexerKCacheIndexHeadDim()) are properly defined across the class hierarchy:

  • WindowBlockManager provides concrete implementations
  • BlockManager delegates to WindowBlockManager
  • BaseKVCacheManager declares pure virtuals
  • KVCacheManager overrides delegate to BlockManager

All methods follow const-correctness and naming conventions.

Also applies to: 1032-1045, 1518-1520, 1855-1868

cpp/include/tensorrt_llm/executor/dataTransceiverState.h (2)

51-66: LGTM: CacheState constructors consistently extended with indexer KCache parameters.

All three constructor overloads are updated uniformly to accept and initialize the new indexer KCache configuration: hasIndexerKCache, indexerDimPerHead, and indexerKCacheQuantBlockSize. Member initialization follows the established pattern.

Also applies to: 68-84, 86-102


189-202: LGTM: Indexer KCache state properly exposed and serialized.

The three new getters follow existing patterns, toString() is extended to include the new fields for debugging, and private members have appropriate default values (false, 0, 128).

Also applies to: 224-226, 237-239
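The extended state described in this comment can be mirrored in a minimal sketch. The real class is C++ (cpp/include/tensorrt_llm/executor/dataTransceiverState.h); this hypothetical Python dataclass only illustrates the three new fields, their defaults (false, 0, 128) noted in the review, and a serialize/deserialize round trip.

```python
from dataclasses import dataclass, asdict

@dataclass
class IndexerKCacheState:
    # Defaults match the review note: disabled, zero head dim,
    # quant block size 128. Field names are illustrative.
    has_indexer_k_cache: bool = False
    indexer_dim_per_head: int = 0
    indexer_k_cache_quant_block_size: int = 128

    def serialize(self):
        # Plain dict stands in for the binary serialization path.
        return asdict(self)

    @classmethod
    def deserialize(cls, payload):
        return cls(**payload)
```

A round trip preserves the configuration, which is the property the serialization changes in serialization.cpp need to guarantee for disaggregated cache transfer.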

@tensorrt-cicd
Copy link
Collaborator

PR_Github #23229 [ run ] completed with state FAILURE. Commit: 864914e
/LLM/main/L0_MergeRequest_PR pipeline #17508 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch 3 times, most recently from d27895c to 72af7bf Compare November 1, 2025 23:59
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #23265 [ run ] triggered by Bot. Commit: 72af7bf

@tensorrt-cicd
Copy link
Collaborator

PR_Github #23265 [ run ] completed with state FAILURE. Commit: 72af7bf
/LLM/main/L0_MergeRequest_PR pipeline #17531 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch from 72af7bf to 765cbf4 Compare November 2, 2025 03:50
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #23272 [ run ] triggered by Bot. Commit: 765cbf4

@tensorrt-cicd
Copy link
Collaborator

PR_Github #23272 [ run ] completed with state FAILURE. Commit: 765cbf4
/LLM/main/L0_MergeRequest_PR pipeline #17536 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch 2 times, most recently from e8be846 to 1efdcd2 Compare November 3, 2025 07:13
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24221 [ run ] triggered by Bot. Commit: 340536f

@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch 2 times, most recently from 5a470de to 5bafac4 Compare November 12, 2025 02:09
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch from 5bafac4 to 94cb31b Compare November 12, 2025 02:12
@tensorrt-cicd
Copy link
Collaborator

PR_Github #24231 [ run ] triggered by Bot. Commit: 94cb31b

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24221 [ run ] completed with state ABORTED. Commit: 340536f
LLM/main/L0_MergeRequest_PR #18267 (Blue Ocean) completed with status: ABORTED

@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch from 94cb31b to 044d3bc Compare November 12, 2025 02:18
Signed-off-by: Iman Tabrizian <[email protected]>

Fixes for block size

Signed-off-by: Iman Tabrizian <[email protected]>

don't use unravel index

Signed-off-by: Iman Tabrizian <[email protected]>

Review commit

Signed-off-by: Iman Tabrizian <[email protected]>

minor fix

Signed-off-by: Iman Tabrizian <[email protected]>

Fix compile errors after rebase

Signed-off-by: Iman Tabrizian <[email protected]>

Bug fixes

Signed-off-by: Iman Tabrizian <[email protected]>

Remove print

Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/moveIndexer2 branch from 044d3bc to fc3e7f6 Compare November 12, 2025 02:19
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24234 [ run ] triggered by Bot. Commit: fc3e7f6

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24231 [ run ] completed with state ABORTED. Commit: 94cb31b
LLM/main/L0_MergeRequest_PR #18277 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24234 [ run ] completed with state SUCCESS. Commit: fc3e7f6
/LLM/main/L0_MergeRequest_PR pipeline #18280 completed with status: 'FAILURE'

@Tabrizian
Copy link
Member Author

/bot skip --comment "unrelated flaky test failures"

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24318 [ skip ] triggered by Bot. Commit: fc3e7f6

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24318 [ skip ] completed with state SUCCESS. Commit: fc3e7f6
Skipping testing for commit fc3e7f6

@Tabrizian Tabrizian merged commit cdde15b into NVIDIA:main Nov 12, 2025
5 checks passed
@Tabrizian Tabrizian deleted the user/imant/moveIndexer2 branch November 13, 2025 17:32