4 changes: 4 additions & 0 deletions tensorrt_llm/_torch/auto_deploy/models/patches/starcoder.py
@@ -0,0 +1,4 @@
from transformers.models.starcoder2.configuration_starcoder2 import Starcoder2Config
⚠️ Potential issue | 🟠 Major

Add the required NVIDIA Apache-2.0 copyright header.

Per coding guidelines, all Python source files must include the NVIDIA Apache-2.0 copyright header with the current year at the top of the file.

Add this header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from transformers.models.starcoder2.configuration_starcoder2 import Starcoder2Config

As per coding guidelines.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from transformers.models.starcoder2.configuration_starcoder2 import Starcoder2Config
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from transformers.models.starcoder2.configuration_starcoder2 import Starcoder2Config
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/auto_deploy/models/patches/starcoder.py around line 1,
the file is missing the required NVIDIA Apache-2.0 copyright header; add the
standard NVIDIA Apache-2.0 header block (including the current year) at the very
top of the file before any imports, ensuring the exact header text and license
notice per project guidelines is used.


# Remove this patch after TRT-LLM upgrades to the HF transformers version >= 4.57
Starcoder2Config.base_model_tp_plan["layers.*.mlp.c_proj"] = "rowwise"
Comment on lines +3 to +4
🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify transformers version 4.57 availability and add tracking for patch removal.

The comment references transformers >= 4.57 as the removal threshold. However, version 4.56.0 was released in August 2025, and no 4.57 release has been announced yet. Please verify the actual version that includes the fix and confirm the timeline.

Additionally, consider adding a TODO or FIXME comment with an issue reference to ensure this temporary patch is tracked and removed appropriately.

Run the following script to check the latest transformers version and search for related PRs:


🏁 Script executed:

#!/bin/bash
# Check latest transformers version
curl -s https://pypi.org/pypi/transformers/json | jq -r '.info.version'

# Search for Starcoder2 TP plan fixes in transformers repo
gh api graphql -f query='
{
  search(query: "repo:huggingface/transformers starcoder2 base_model_tp_plan", type: ISSUE, first: 5) {
    nodes {
      ... on Issue {
        number
        title
        state
        createdAt
      }
      ... on PullRequest {
        number
        title
        state
        createdAt
        merged
      }
    }
  }
}'



Update patch removal comment for correct version and add tracking

Replace the existing line with:

# TODO: Remove this patch after upgrading to HuggingFace transformers ≥4.56.2 (fix included in HF PR #40814)
🤖 Prompt for AI Agents
tensorrt_llm/_torch/auto_deploy/models/patches/starcoder.py around lines 3-4:
the current removal comment is outdated; replace it with a TODO that specifies
removing the patch after upgrading to HuggingFace transformers ≥4.56.2 and
references the HF PR #40814 so the change is tracked (i.e., update the comment
text to mention the exact version and PR number and leave the patch assignment
line unchanged).
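Beyond updating the comment, the patch could guard itself with a runtime version check so it self-disables once the fixed transformers release is installed. A minimal sketch, assuming a plain "X.Y.Z" version string; the `needs_patch` helper and the `4.56.2` threshold are illustrative (real code could use `packaging.version` for robust comparison):

```python
# Sketch: gate a temporary monkey-patch on the installed library version so it
# becomes a no-op once the upstream fix ships. The parse helper is simplified
# and assumes plain "X.Y.Z" version strings (no rc/dev suffixes).
def needs_patch(installed: str, fixed_in: str = "4.56.2") -> bool:
    """Return True when `installed` predates the release containing the fix."""
    def parse(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))
    return parse(installed) < parse(fixed_in)

if __name__ == "__main__":
    # The patch would only be applied when needs_patch(transformers.__version__)
    print(needs_patch("4.55.0"))  # → True: patch still required
    print(needs_patch("4.57.0"))  # → False: upstream fix present
```

This keeps the patch from silently lingering after the upgrade, and the failing case makes the removal condition explicit in code rather than only in a comment.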

18 changes: 15 additions & 3 deletions tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py
@@ -292,7 +292,7 @@ def detect_sharding_from_factory_config(
num_simple_shards = 0
num_row_col_shards = 0

for lin_node in filtered_nodes(gm.graph.nodes, is_linear_op):
for lin_node in filtered_nodes(gm.graph.nodes, [is_linear_op, is_fake_quantized_linear_op]):
⚠️ Potential issue | 🔴 Critical

Duplicate nodes will be processed due to filtered_nodes bug.

The call to filtered_nodes with a list of predicates [is_linear_op, is_fake_quantized_linear_op] will yield duplicate nodes because the implementation in node_utils.py (lines 242-246) is missing a break statement after yielding. This will cause nodes that match both predicates to be processed multiple times, creating duplicate shard transforms.

This issue will be resolved once the bug in node_utils.py (lines 242-246) is fixed by adding a break statement after yield node.

🤖 Prompt for AI Agents
In tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py around line
295, duplicate nodes are being processed because filtered_nodes was called with
two predicates; fix the root cause in
tensorrt_llm/_torch/auto_deploy/transform/node_utils.py (around lines 242-246)
by adding a break immediately after the "yield node" inside the loop so a node
that matches the first predicate is not yielded again for subsequent predicates,
then run tests to confirm no duplicate shard transforms are produced.

# use node's weight name to get the module name
module_name = lin_node.args[1].target

@@ -368,7 +368,7 @@ def detect_sharding_from_factory_config(
)
num_row_col_shards += 1
else:
ad_logger.warning("Invalid sharding config. Skipping.")
ad_logger.warning(f"Unsupported sharding action {config}. Skipping.")
else:
# TODO: local refers to hybrid EP+TP parallelism. Not supported yet.
ad_logger.warning("Local EP+TP sharding is not supported yet. Skipping.")
@@ -387,7 +387,19 @@ def detect_sharding_from_factory_config(
)
num_simple_shards += 1
else:
ad_logger.warning("Invalid sharding config. Skipping.")
ad_logger.warning(
f"Unsupported sharding action {config}. Fallback to simple shard"
)
sharding_config.tp_transforms.append(
TPShardingInfo.from_node(
lin_node,
split_dim=SplitDimension.COLUMN,
rank=rank,
world_size=world_size,
dist_op="all_gather",
min_local_shape=1,
)
)
# after successful match, break the loop
break

5 changes: 5 additions & 0 deletions tensorrt_llm/_torch/auto_deploy/utils/node_utils.py
@@ -239,6 +239,11 @@ def filtered_nodes(
for node in nodes:
if target(node):
yield node
elif isinstance(target, Iterable) and all(isinstance(t, Callable) for t in target):
for node in nodes:
for t in target:
if t(node):
yield node
Comment on lines +242 to +246
⚠️ Potential issue | 🔴 Critical

Add break after yielding to prevent duplicate nodes.

The inner loop at line 244-246 will yield the same node multiple times if more than one predicate matches. This creates duplicates in the iteration results.

Apply this diff to add a break statement:

 elif isinstance(target, Iterable) and all(isinstance(t, Callable) for t in target):
     for node in nodes:
         for t in target:
             if t(node):
                 yield node
+                break
📝 Committable suggestion


Suggested change
elif isinstance(target, Iterable) and all(isinstance(t, Callable) for t in target):
for node in nodes:
for t in target:
if t(node):
yield node
elif isinstance(target, Iterable) and all(isinstance(t, Callable) for t in target):
for node in nodes:
for t in target:
if t(node):
yield node
break
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/auto_deploy/utils/node_utils.py around lines 242 to 246,
the inner loop yields the same node multiple times when multiple predicates
match; after yielding a node inside the inner for-loop, add a break to stop
checking further predicates for that node so each node is produced at most once.
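The fixed generator behavior can be illustrated with a standalone sketch. The integer predicates below are hypothetical stand-ins for `is_linear_op` and `is_fake_quantized_linear_op`; the point is that the `break` caps each node at one yield even when predicates overlap:

```python
def filtered_nodes(nodes, target):
    """Yield each node at most once if any predicate in `target` matches.

    Accepts a single callable or an iterable of callables. The `break` after
    `yield` is the fix: without it, a node matching several predicates would
    be yielded once per matching predicate, producing duplicates.
    """
    predicates = [target] if callable(target) else list(target)
    for node in nodes:
        for pred in predicates:
            if pred(node):
                yield node
                break  # stop checking further predicates for this node

# Hypothetical overlapping predicates: every multiple of 4 is also even.
is_even = lambda n: n % 2 == 0
is_multiple_of_4 = lambda n: n % 4 == 0

print(list(filtered_nodes([1, 2, 3, 4], [is_even, is_multiple_of_4])))  # → [2, 4]
```

Without the `break`, the node `4` would match both predicates and appear twice in the output, which mirrors how a linear op that also satisfies `is_fake_quantized_linear_op` would receive duplicate shard transforms.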

else:
# Handle the case where target or ops contains operations
operations = ops if ops is not None else target