[None] [feat] [Do Not Review] Add C++ dependency scanning system #8662
base: main
Conversation
📝 Walkthrough

Introduces a new dependency scanning system for TensorRT-LLM C++ builds. A CMakeLists.txt modification enables header dependency tracking. A Python scanner collects build artifacts from .d files, link.txt, and wheels, then resolves them via dpkg-query or YAML pattern matching, generating known/unknown YAML reports. Metadata YAML files define library patterns and path components.
Sequence Diagram

sequenceDiagram
participant CLI as CLI/main()
participant AC as ArtifactCollector
participant DR as DpkgResolver
participant PM as PatternMatcher
participant OG as OutputGenerator
CLI->>AC: collect_all() from build_dir
activate AC
AC->>AC: parse .d files (headers)
AC->>AC: parse link.txt (libraries)
AC->>AC: scan .whl files
AC->>AC: readelf -d (NEEDED)
AC-->>CLI: List[Artifact]
deactivate AC
CLI->>DR: resolve artifacts (primary: dpkg)
activate DR
DR->>DR: dpkg-query for library paths
DR-->>CLI: Mapping[success]
deactivate DR
CLI->>PM: match remaining (fallback: YAML)
activate PM
PM->>PM: load metadata/*.yml
PM->>PM: exact pattern match
alt match found
PM-->>CLI: Mapping[high/medium/low confidence]
else no match
PM-->>CLI: unresolved Artifact
end
deactivate PM
CLI->>OG: generate(mappings, artifacts)
activate OG
OG->>OG: group by dependency → known.yml
OG->>OG: collect unmapped → unknown.yml
OG->>OG: compute coverage & confidence stats
OG-->>CLI: (known.yml, unknown.yml)
deactivate OG
CLI->>CLI: print summary report
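To make the diagram concrete, here is a hedged, self-contained sketch of the same collect → resolve → report flow. It is not the PR's implementation (the real classes are ArtifactCollector, DpkgResolver, PatternMatcher, and OutputGenerator); the pattern table below is invented for illustration.

```python
# Minimal sketch of the collect -> resolve -> report flow above. This is NOT
# the PR's implementation; the pattern table and paths are invented.
from pathlib import Path
from typing import Dict, List, Set, Tuple

def collect_headers(build_dir: Path) -> Set[str]:
    """Gather header paths from Make-style .d files ('target.o: hdr1 hdr2 ...')."""
    headers: Set[str] = set()
    for d_file in build_dir.rglob("*.d"):
        body = d_file.read_text().replace("\\\n", " ")
        _, _, deps = body.partition(":")
        # Skip object targets and the -MP phony rules (tokens ending in ':').
        headers.update(p for p in deps.split() if not p.endswith((":", ".o")))
    return headers

def resolve(artifacts: Set[str], patterns: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map each artifact to a dependency via substring patterns; else mark unknown."""
    known: Dict[str, str] = {}
    unknown: List[str] = []
    for art in sorted(artifacts):
        for needle, dep in patterns.items():
            if needle in art:
                known[art] = dep
                break
        else:
            unknown.append(art)
    return known, unknown

if __name__ == "__main__":
    arts = collect_headers(Path("build"))
    known, unknown = resolve(arts, {"cuda": "cuda", "torch": "pytorch"})
    print(f"known={len(known)} unknown={len(unknown)}")  # summary report
```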
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 15
🧹 Nitpick comments (18)
cpp/dependency_scan/metadata/README.md (1)
1-760: Excellent comprehensive documentation. The README provides thorough guidance on the dependency metadata system, including clear examples, troubleshooting steps, and best practices. The structure flows well from quick start through advanced usage. This will significantly aid contributors in understanding and maintaining dependency metadata.
Consider extracting code block language specifiers as mentioned in individual comments above to align with markdown linting standards.
cpp/dependency_scan/metadata/pytorch.yml (1)
11-12: Broaden PyTorch linker flags for better matching. Add common flags used in practice so -l matches don’t fall back to generic inference.

 linker_flags:
-  - -ltorch_python
+  - -ltorch_python
+  - -ltorch
+  - -lc10
+  - -ltorch_cpu
+  - -ltorch_cuda

Optionally add license/homepage/source fields if known to aid attribution.
Also applies to: 13-18
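As a side note on `-l` flags generally: the stdlib's `ctypes.util.find_library` does a similar name-to-file lookup, which can help sanity-check proposed flags on a given system (illustrative only; the scanner uses its own search-path logic).

```python
# Illustrative only: map "-lfoo" style flags to a resolvable library name using
# the stdlib. The scanner's DpkgResolver does its own search; this is just a
# quick way to check that a flag is resolvable on the current system.
from ctypes import util

for flag in ("-ltorch_python", "-lc10", "-lm"):
    name = flag[2:]                                # "-lfoo" -> "foo"
    print(flag, "->", util.find_library(name))     # e.g. "libm.so.6" or None
```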
cpp/dependency_scan/metadata/tensorrt-llm.yml (2)
19-31: Reduce maintenance: use stable substring patterns for Python extensions. Replacing many ABI-specific entries with a stable prefix cuts churn across Python versions.

 # Python bindings
-- bindings.cpython-312-x86_64-linux-gnu.so
-- bindings.cpython-310-x86_64-linux-gnu.so
-- bindings.cpython-311-x86_64-linux-gnu.so
+- bindings.cpython-
 # Deep learning integration bindings
-- deep_ep_cpp_tllm.cpython-312-x86_64-linux-gnu.so
-- deep_ep_cpp_tllm.cpython-310-x86_64-linux-gnu.so
-- deep_ep_cpp_tllm.cpython-311-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-312-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-310-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-311-x86_64-linux-gnu.so
+- deep_ep_cpp_tllm.cpython-
+- deep_gemm_cpp_tllm.cpython-

The matcher already does substring checks. Add “-ltensorrt_llm” if the project links it via -l.

-linker_flags: []
+linker_flags:
+  - -ltensorrt_llm

Please confirm the exact shared object names (e.g., libnvinfer_plugin_tensorrt_llm.so) align with produced artifacts and whether Python 3.8/3.9 need coverage.
2-2: Version source of truth. Hardcoding 1.2.0rc1 can drift. Prefer generating this from the project build/version to avoid stale metadata.
cpp/dependency_scan/metadata/_template.yml (1)
58-65: Clarify path_components constraint. Document that components must be single segments (no “/”) to match PatternMatcher behavior.

 # Path components that indicate this dependency
 # When these strings appear in a binary path, map to this dependency
+# Note: use single directory/file name segments only (no slashes),
+# because the matcher splits paths by "/" and compares components.

cpp/dependency_scan/metadata/ucx.yml (1)
11-13: Optional: include UCX linker flags for completeness. Add commonly used flags so -l resolution can map without dpkg.

-linker_flags: []
+linker_flags:
+  - -lucp
+  - -lucs
+  - -luct
+  - -lucm

cpp/dependency_scan/metadata/_schema.yml (1)
79-83: Harden schema with additionalProperties. Prevent typos/unexpected fields from silently passing.

 anyOf:
   - required: ["patterns"]
   - required: ["linker_flags"]
   - required: ["path_components"]
+additionalProperties: false

README currently lists “aliases” as part of “at least one required” sections; schema does not. Please align docs (don’t include aliases) or extend schema if that’s intended.
cpp/dependency_scan/README.md (6)
1-1: Use project-neutral title. Replace “Claude Code’s” with project naming.

-# Claude Code's Minimal Artifact Scanner
+# TensorRT-LLM Minimal Artifact Scanner
21-21: Fix markdownlint MD040: specify languages for fenced code blocks. Add appropriate languages to silence the linter and improve rendering: change the bare ``` openers to ```text (repeat for the ASCII diagrams/outputs above).

Also applies to: 442-442, 598-598

49-55: Correct dependencies statement. The tool imports PyYAML and optionally jsonschema; also relies on external tools (dpkg-query, readelf).

```diff
-## Installation
-
-No installation required! The script is self-contained with only Python stdlib dependencies.
+## Installation
+
+Requires Python 3.8+, PyYAML, and optionally jsonschema for schema validation.
+System tools: dpkg-query (Debian/Ubuntu) and readelf (binutils).
+Install: `pip install pyyaml jsonschema` (jsonschema optional).
```
53-56: Avoid hardcoded absolute paths. Use repo-relative commands to improve portability.

-cd /home/gvenkatarama/tensorrt_llm/cpp/dependency_scan
+cd cpp/dependency_scan
@@
-# After building TensorRT-LLM
-cd /home/gvenkatarama/tensorrt_llm
+# After building TensorRT-LLM
+cd <repo-root>

Also applies to: 91-99
589-591: Align docs with schema: aliases aren’t a required matching section. Remove “aliases” from the “at least one required” list or update schema accordingly.

-**Pattern sections (at least one required):**
-- `patterns`, `linker_flags`, `path_components`, `aliases`
+**Pattern sections (at least one required):**
+- `patterns`, `linker_flags`, `path_components`
645-646: Remove AI-generation footer. Drop “Generated by Claude Code” branding from project docs.

-**Generated by Claude Code** | **Phase 1 Complete** | **Target Coverage: ≥98%**
+**Phase 1 Complete** | **Target Coverage: ≥98%**

Also ensure the referenced metadata/README.md exists or update the link to _template.yml and _schema.yml.

cpp/dependency_scan/scan_build_artifacts.py (5)
400-421: Avoid external 'find'; use pathlib.rglob within build_dir. Reduces subprocess usage (S603/S607) and improves portability.

-        import subprocess
-        try:
-            result = subprocess.run([
-                'find',
-                str(self.build_dir), '-name', basename, '-type', 'f'
-            ],
-                                    capture_output=True,
-                                    text=True,
-                                    timeout=5)
-            if result.returncode == 0 and result.stdout.strip():
-                matches = result.stdout.strip().split('\n')
-                # Try to find match with similar relative path structure
-                for match in matches:
-                    if header_path in match or match.endswith(header_path):
-                        canonical_path = os.path.realpath(match)
-                        break
-                # If no exact match, use first match
-                if not os.path.exists(canonical_path) and matches:
-                    canonical_path = os.path.realpath(matches[0])
-        except Exception:
-            pass
+        try:
+            matches = [str(p) for p in self.build_dir.rglob(basename)]
+            if matches:
+                # Prefer closest path-structure match
+                for match in matches:
+                    if header_path in match or match.endswith(header_path):
+                        canonical_path = os.path.realpath(match)
+                        break
+                if not os.path.exists(canonical_path):
+                    canonical_path = os.path.realpath(matches[0])
+        except OSError:
+            pass
542-551: Stabilize wheel artifact paths for dedup/reporting. Prefix with wheel identifier to avoid collisions and ephemeral temp paths.

-            artifacts.append(
-                Artifact(path=str(so_file.relative_to(temp_path)),
+            artifacts.append(
+                Artifact(path=f"wheel://{wheel_file.name}/{so_file.relative_to(temp_path)}",
                          type='binary',
                          source=str(wheel_file),
                          metadata={
                              'wheel': wheel_file.name,
                              'needed': needed_libs
                          }))
@@
-            artifacts.append(
-                Artifact(path=needed_lib,
+            artifacts.append(
+                Artifact(path=needed_lib,
                          type='library',
                          source=str(wheel_file),
                          metadata={
-                             'from_binary': str(so_file.relative_to(temp_path)),
+                             'from_binary': f"wheel://{wheel_file.name}/{so_file.relative_to(temp_path)}",
                              'dynamic_dependency': True
                          }))

Also applies to: 553-564
223-227: Replace broad `except Exception` and silent `pass`/`continue` with specific exceptions + logging. Catching everything hides real failures; add `logging` and handle `OSError`, `subprocess.TimeoutExpired`, `FileNotFoundError` explicitly. Keep try bodies minimal. I can post a focused diff wiring `logging.getLogger(__name__)` and updating these sites in one pass. Based on coding guidelines.

Also applies to: 337-338, 420-421, 430-431, 459-460, 498-499, 565-566, 606-607, 693-695, 1089-1091
971-1007: Prefer yaml.safe_dump for emitting reports. Avoids emitting arbitrary Python tags; preserves current structure.

-    with open(known_file, 'w') as f:
-        yaml.dump(
+    with open(known_file, 'w') as f:
+        yaml.safe_dump(
@@
-    with open(unknown_file, 'w') as f:
-        yaml.dump(
+    with open(unknown_file, 'w') as f:
+        yaml.safe_dump(
616-631: Unify terminology: “metadata” vs “dependencies”. The code uses `--metadata-dir` but the text mentions “dependencies/”. Standardize on “metadata” to reduce confusion. Apply small string edits in docstrings/help/prints to consistently reference the metadata directory.

Also applies to: 1000-1006, 1128-1130
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
cpp/CMakeLists.txt (1 hunks)
cpp/dependency_scan/README.md (1 hunks)
cpp/dependency_scan/metadata/README.md (1 hunks)
cpp/dependency_scan/metadata/_schema.yml (1 hunks)
cpp/dependency_scan/metadata/_template.yml (1 hunks)
cpp/dependency_scan/metadata/cppzmq.yml (1 hunks)
cpp/dependency_scan/metadata/cuda.yml (1 hunks)
cpp/dependency_scan/metadata/cutlass.yml (1 hunks)
cpp/dependency_scan/metadata/deep-ep.yml (1 hunks)
cpp/dependency_scan/metadata/deepgemm.yml (1 hunks)
cpp/dependency_scan/metadata/dpkg.yml (1 hunks)
cpp/dependency_scan/metadata/nanobind.yml (1 hunks)
cpp/dependency_scan/metadata/nlohmann-json.yml (1 hunks)
cpp/dependency_scan/metadata/nvshmem.yml (1 hunks)
cpp/dependency_scan/metadata/openmpi.yml (1 hunks)
cpp/dependency_scan/metadata/pybind11.yml (1 hunks)
cpp/dependency_scan/metadata/pytorch.yml (1 hunks)
cpp/dependency_scan/metadata/tensorrt-llm.yml (1 hunks)
cpp/dependency_scan/metadata/tensorrt.yml (1 hunks)
cpp/dependency_scan/metadata/ucx.yml (1 hunks)
cpp/dependency_scan/metadata/ucxx.yml (1 hunks)
cpp/dependency_scan/metadata/xgrammar.yml (1 hunks)
cpp/dependency_scan/scan_build_artifacts.py (1 hunks)
cpp/dependency_scan/scan_output/unknown.yml (1 hunks)
cpp/dependency_scan/tests/test_scan_build_artifacts.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
cpp/dependency_scan/tests/test_scan_build_artifacts.py
cpp/dependency_scan/scan_build_artifacts.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
cpp/dependency_scan/tests/test_scan_build_artifacts.py
cpp/dependency_scan/scan_build_artifacts.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
cpp/dependency_scan/tests/test_scan_build_artifacts.py
cpp/dependency_scan/scan_build_artifacts.py
🧠 Learnings (3)
📚 Learning: 2025-08-21T21:48:35.135Z
Learnt from: djns99
PR: NVIDIA/TensorRT-LLM#7104
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:399-417
Timestamp: 2025-08-21T21:48:35.135Z
Learning: CUTLASS extensions in TensorRT-LLM (located under cpp/tensorrt_llm/cutlass_extensions/) are designed to integrate with and extend functionality in the external CUTLASS repository. When analyzing these extensions, their consumers and functionality wiring may exist in the CUTLASS codebase rather than within TensorRT-LLM itself.
Applied to files:
cpp/dependency_scan/metadata/cutlass.yml
📚 Learning: 2025-09-16T09:30:09.716Z
Learnt from: tongyuantongyu
PR: NVIDIA/TensorRT-LLM#7763
File: cpp/tensorrt_llm/CMakeLists.txt:297-301
Timestamp: 2025-09-16T09:30:09.716Z
Learning: In the TensorRT-LLM project, NCCL libraries are loaded earlier by PyTorch libraries or the bindings library, so the main shared library doesn't need NCCL paths in its RPATH - the libraries will already be available in the process address space when needed.
Applied to files:
cpp/dependency_scan/metadata/tensorrt-llm.yml
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
cpp/dependency_scan/metadata/tensorrt-llm.yml
🧬 Code graph analysis (1)
cpp/dependency_scan/tests/test_scan_build_artifacts.py (1)
cpp/dependency_scan/scan_build_artifacts.py (21)
Artifact (47-57), ArtifactCollector (261-609), DpkgResolver (81-253), Mapping (61-73), OutputGenerator (910-1008), PatternMatcher (617-902), to_dict (55-57), to_dict (69-73), find_library_path (128-164), get_package (166-227), _normalize_cuda_package (230-253), _parse_d_file (311-433), _parse_link_file (435-501), _get_needed_libraries (571-609), _scan_wheel (503-568), collect_all (274-309), _match_exact_library (792-836), _match_path_alias (838-870), _match_generic_library (872-902), match (765-790), generate (920-1008)
🪛 LanguageTool
cpp/dependency_scan/metadata/README.md
[grammar] ~148-~148: Use a hyphen to join words.
Context: ...ir validation_output/ ``` ## Pattern Matching Behavior The scanner tries mat...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
cpp/dependency_scan/metadata/README.md
21-21: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
442-442: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
598-598: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
cpp/dependency_scan/README.md
21-21: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
442-442: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
598-598: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.1)
cpp/dependency_scan/tests/test_scan_build_artifacts.py
1-1: Shebang is present but file is not executable
(EXE001)
229-229: f-string without any placeholders
Remove extraneous f prefix
(F541)
260-260: f-string without any placeholders
Remove extraneous f prefix
(F541)
284-284: Avoid equality comparisons to True; use `if artifacts[0].metadata.get('static'):` for truth checks
Replace with artifacts[0].metadata.get('static')
(E712)
cpp/dependency_scan/scan_build_artifacts.py
200-200: subprocess call: check for execution of untrusted input
(S603)
200-200: Starting a process with a partial executable path
(S607)
223-224: try-except-pass detected, consider logging the exception
(S110)
223-223: Do not catch blind exception: Exception
(BLE001)
337-337: Do not catch blind exception: Exception
(BLE001)
402-402: subprocess call: check for execution of untrusted input
(S603)
402-405: Starting a process with a partial executable path
(S607)
420-421: try-except-pass detected, consider logging the exception
(S110)
420-420: Do not catch blind exception: Exception
(BLE001)
430-431: try-except-continue detected, consider logging the exception
(S112)
430-430: Do not catch blind exception: Exception
(BLE001)
459-459: Do not catch blind exception: Exception
(BLE001)
498-499: try-except-continue detected, consider logging the exception
(S112)
498-498: Do not catch blind exception: Exception
(BLE001)
565-566: try-except-pass detected, consider logging the exception
(S110)
565-565: Do not catch blind exception: Exception
(BLE001)
592-592: subprocess call: check for execution of untrusted input
(S603)
593-593: Starting a process with a partial executable path
(S607)
606-607: try-except-pass detected, consider logging the exception
(S110)
606-606: Do not catch blind exception: Exception
(BLE001)
693-693: Do not catch blind exception: Exception
(BLE001)
1089-1089: Do not catch blind exception: Exception
(BLE001)
1241-1241: f-string without any placeholders
Remove extraneous f prefix
(F541)
1306-1306: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (13)
cpp/dependency_scan/metadata/nlohmann-json.yml (1)
1-17: Well-formed metadata with complete field coverage. This metadata file is well-structured with all expected fields populated. The information aligns with the actual nlohmann-json project (correct license, copyright, homepage). Empty `patterns` and `linker_flags` are appropriate for a header-only library.

cpp/dependency_scan/metadata/cuda.yml (1)
1-18: Reasonable metadata for system dependencies; version pinning may be restrictive. The metadata is well-formed for CUDA runtime dependencies. However, version `13.0.48-1` is pinned to a specific CUDA release. If builds target multiple CUDA versions, consider whether this metadata should be versioned or support version ranges. Confirm that the scanner gracefully handles CUDA installations with different versions.

cpp/dependency_scan/metadata/nanobind.yml (1)
1-17: Well-formed bundled dependency metadata. This metadata is complete and consistent with other bundled dependencies. Empty patterns and linker_flags are appropriate for a header-only binding library.
cpp/dependency_scan/metadata/deepgemm.yml (1)
12-12: Path component casing may be inconsistent. The `path_components` value is `["DeepGEMM"]` (PascalCase), while `deep-ep.yml` uses snake_case (`["deep_ep_download-src"]`). Verify that this casing matches the actual directory name in the repository to ensure proper pattern matching during scanning.

cpp/dependency_scan/metadata/pybind11.yml (1)
1-17: Well-formed bundled dependency metadata. Complete and consistent with other properly-defined bundled dependencies. Metadata appears accurate for the pybind11 project.
cpp/dependency_scan/metadata/tensorrt.yml (1)
1-15: Well-formed system dependency metadata with version-aware patterns. The patterns and path components provide good coverage for detecting TensorRT installations across multiple versions. The approach of including version-specific path variants (tensorrt-8, -9, -10) is reasonable for supporting multiple TensorRT versions.
cpp/dependency_scan/metadata/deep-ep.yml (1)
1-14: The original review comment is based on an incorrect premise. The schema file (`_schema.yml`) explicitly defines `license`, `copyright`, and `homepage` as optional fields—they are not in the `required` array. The only required fields are `name`, `version`, and `description`, all of which `deep-ep.yml` contains. Moreover, this pattern of omitting optional metadata is prevalent across the codebase: 10 other dependency files (cuda, pytorch, tensorrt, ucx, ucxx, etc.) also lack these fields. The inconsistency exists but does not constitute a schema violation or deficiency in `deep-ep.yml`. The file meets all schema requirements and should be approved as-is.
Likely an incorrect or invalid review comment.
cpp/dependency_scan/metadata/xgrammar.yml (1)
1-15: Structure and content look good. The metadata is well-formed and appropriate for a bundled library that doesn't require direct linking. Version "unknown" is acceptable per documentation, and the single `path_component` will capture inclusion-based artifacts correctly.

cpp/dependency_scan/metadata/cppzmq.yml (1)
1-17: Well-formed metadata with good attribution. The metadata correctly identifies cppzmq as a bundled, header-only C++ library. The copyright attribution and homepage reference are helpful for provenance tracking.
cpp/dependency_scan/metadata/openmpi.yml (1)
1-11: Structure is sound, but verify linker_flags. The metadata correctly captures OpenMPI patterns and common path components. However, the empty `linker_flags` is unusual—OpenMPI typically provides the `-lmpi` linker flag. Verify whether this is intentional (e.g., if OpenMPI is always dynamically loaded in this codebase) or if `-lmpi` should be added. If OpenMPI is explicitly linked in C++ builds, add the linker flag:

linker_flags:
  - -lmpi

cpp/dependency_scan/metadata/ucxx.yml (1)
1-15: Metadata structure is appropriate. The entry correctly identifies UCXX as a bundled C++ bindings library with no direct patterns or linking requirements, which aligns with the header-only nature of C++ bindings.
cpp/dependency_scan/metadata/cutlass.yml (1)
1-17: Comprehensive metadata with proper attribution. The CUTLASS metadata includes appropriate licensing (BSD-3-Clause), copyright attribution, and correctly identifies it as a bundled template library. The empty patterns and linker flags are appropriate for header-only template code.
cpp/dependency_scan/metadata/dpkg.yml (1)
1-29: System library metadata is well-structured. The dpkg.yml entries correctly follow the documented list format and include appropriate linker flags for common system libraries. The version strings match Ubuntu package versioning conventions. Empty `patterns` is correct as these libraries are resolved via `dpkg-query`.
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
🧩 Analysis chain
Verify .gitignore coverage for generated .d files.
The -MD -MP flags will generate many .d (dependency) files during builds. These should not be committed to version control. Confirm that a .gitignore rule exists to exclude *.d files (or a more specific pattern like **/*.d).
Run the following command to verify:
🏁 Script executed:
#!/bin/bash
# Check if .gitignore covers .d files
git check-ignore -v **/*.d 2>/dev/null | head -20 || echo "No .gitignore match found for .d files"

Length of output: 0
I need to verify this differently since .d files are generated during builds and don't currently exist in the repository. Let me check the .gitignore file directly for patterns that would exclude .d files.
---
🏁 Script executed:
#!/bin/bash
# Find and examine .gitignore files for .d file patterns
echo "=== Checking for .gitignore files ==="
fd -name ".gitignore" -type f
echo -e "\n=== Examining .gitignore content for .d file patterns ==="
if [ -f .gitignore ]; then
echo "--- Root .gitignore ---"
cat .gitignore | head -50
echo -e "\n--- Searching for .d patterns ---"
grep -n "\.d\|\.d/" .gitignore || echo "No .d patterns found in root .gitignore"
fi
echo -e "\n=== Checking cpp directory for .gitignore ==="
if [ -f cpp/.gitignore ]; then
echo "--- cpp/.gitignore ---"
cat cpp/.gitignore
fi
Length of output: 1345
Let me verify where CMake places build artifacts and if the existing cpp/build* gitignore pattern is sufficient.
---
🏁 Script executed:
#!/bin/bash
# Check CMakeLists.txt for build output configuration
echo "=== Checking CMakeLists.txt for build output directory setup ==="
rg -n "set\(CMAKE_BINARY_DIR|set\(CMAKE_BUILD_TYPE|CMAKE_RUNTIME_OUTPUT_DIRECTORY|CMAKE_LIBRARY_OUTPUT_DIRECTORY|CMAKE_ARCHIVE_OUTPUT_DIRECTORY" cpp/CMakeLists.txt -A2 | head -30
echo -e "\n=== Checking for other potential build locations ==="
rg -n "add_executable|add_library" cpp/CMakeLists.txt | head -10
echo -e "\n=== Verifying what cpp/build* pattern covers ==="
echo "Pattern 'cpp/build*' will match:"
echo " - cpp/build"
echo " - cpp/build/"
echo " - cpp/build_debug"
echo " - cpp/build_release"
echo " - etc."
echo -e "\n=== Checking if .gitignore patterns are recursive ==="
grep -n "\.d\|\*\.o\|\*\.a\|\*\.so" .gitignore || echo "No object file patterns found"
Length of output: 660
Add explicit *.d pattern to .gitignore to ensure comprehensive coverage.
The review comment is correct. While the existing .gitignore includes cpp/build* (which covers standard build directories), this pattern is directory-specific and does not guarantee coverage if build artifacts are generated elsewhere. The -MD -MP flags add dependency tracking globally across CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS, so .d files could potentially be generated in various locations depending on the build configuration.
Adding an explicit *.d pattern to .gitignore provides robust, convention-independent coverage for all generated dependency files, regardless of build directory location.
🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 321 to 324, the project enables generation of
.d dependency files via -MD -MP but there is no explicit .gitignore entry for
them; update the repository .gitignore (root or relevant cpp/.gitignore) to add
a single line with the pattern *.d so generated dependency files are ignored
regardless of build directory, then commit that change.
🧩 Analysis chain
Platform compatibility concern: -MD -MP flags may not work on Windows MSVC builds.
The -MD -MP compiler flags are GCC/Clang specific. Windows MSVC uses /showIncludes instead, which produces a different output format. The code lacks a platform guard (e.g., if(NOT WIN32) or compiler ID check), which could cause build failures or silently skip dependency tracking on Windows.
Please verify that this works on all supported platforms. If Windows builds are supported, consider:
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
-set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+if(NOT WIN32)
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
+ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+endif()

Alternatively, if MSVC support is planned, please generate a separate code path using /showIncludes.
I'll verify the review comment by examining the codebase and confirming the compiler flag claims.
🏁 Script executed:
#!/bin/bash
# Examine the actual code at lines 321-324 in cpp/CMakeLists.txt
cat cpp/CMakeLists.txt | sed -n '310,335p'

Length of output: 1150
🏁 Script executed:
#!/bin/bash
# Search for platform guards and WIN32 checks in CMakeLists.txt
rg -n "WIN32|CMAKE_CXX_COMPILER_ID|MSVC" cpp/CMakeLists.txt -A2 -B2Length of output: 1490
🏁 Script executed:
#!/bin/bash
# Check .gitignore for .d files handling
cat .gitignore | grep -E '\.d$|\.d\s' || echo "No .d pattern found in .gitignore"

Length of output: 33
🏁 Script executed:
#!/bin/bash
# Search for build documentation or platform support statements
fd -e md -e txt -e rst | xargs grep -l -i "platform\|windows\|msvc" | head -20

Length of output: 838
🌐 Web query:
GCC Clang -MD -MP compiler flags Windows MSVC compatibility
💡 Result:
Short answer
-
On GCC/Clang, -MD tells the driver to emit a Make-style dependency file (.d) as a side-effect of compilation; -MMD is like -MD but omits system headers; -MP adds phony targets for each header (avoids “missing file” errors). Use -MD (or -MMD) together with -MF or accept the default .d name. [1][2]
-
On MSVC, /MD means something completely different: it selects the DLL C runtime (multithreaded DLL). It is not a dependency-file option. Do not confuse GCC/Clang’s -MD with MSVC’s /MD. [3]
-
clang-cl (the MSVC-compatible Clang driver) aims to be compatible with MSVC options and emit MSVC-style include notes (/showIncludes), but the GCC-style -MD/-MMD/-MP depfile behavior is what you should use when you want Make/Ninja-style .d files from clang/gcc. Build tools (CMake/Ninja) handle the different formats (MSVC /showIncludes vs GCC depfiles). [4][1]
Practical recommendations
- If you want GCC/Clang to produce Make/Ninja dependency files use:
- gcc/clang: -MMD -MF .d -MP (or -MD instead of -MMD if you want system headers included). [1][2]
- If you want MSVC-style include parsing (MSBuild, cl.exe workflows) use cl.exe /showIncludes (or clang-cl which emits a similar “Note: including file:” prefix). Let your build system parse the output appropriately. [4]
- Remember: don’t mix meanings — /MD (MSVC) ≠ -MD (gcc/clang). Ensure your build rules use the right flag for the toolchain.
Sources
- GCC manual: -MD, -MMD, -MF, -MP descriptions. [1]
- Practical explanation and examples (Stack Overflow). [2]
- MSVC /MD meaning (runtime selection). [3]
- clang-cl /showIncludes / MSVC compatibility and build-tool handling. [4]
References
[1] GCC manual (dependency options).
[2] Stack Overflow — using -MD/-MMD with -o.
[3] MSVC docs / Stack Overflow explanation of /MD vs /MT.
[4] CMake/clang-cl /showIncludes compatibility / clang-cl blog posts.
Add platform guard to prevent MSVC build failures on Windows.
The -MD -MP flags are GCC/Clang specific for generating .d dependency files. On MSVC, /MD means something completely different: it selects the DLL C runtime (multithreaded DLL) and is not a dependency-file option. The codebase already demonstrates this awareness—see line 356-358 which wraps similar CMAKE_CXX_FLAGS modifications with if(NOT WIN32). Without a guard, Windows MSVC builds will either fail or silently skip dependency tracking.
Additionally, .d files are not listed in .gitignore, risking accidental commits.
Wrap lines 321-324 with a platform guard:
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
+if(NOT WIN32)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+endif()

And add .d files to .gitignore:

+*.d

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:

# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")

After:

# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
if(NOT WIN32)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
endif()
🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 321 to 324, the flags `-MD -MP` are being
unconditionally appended to CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS which will
break or misconfigure MSVC on Windows; wrap those two set(...) lines in an
`if(NOT WIN32)` / `endif()` guard (matching the style used later at lines
~356-358) so GCC/Clang-only dependency flags are only applied on non-Windows
platforms, and also add a `.d` pattern (e.g. `*.d`) to the repository .gitignore
so generated dependency files are not accidentally committed.
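For context on what the scanner consumes here: with `-MD -MP`, gcc/clang emit a Make-style `.d` file per object, where the first rule lists every included header and `-MP` appends an empty phony rule per header. A sketch with invented file contents:

```python
# What a gcc/clang "-MD -MP" dependency file looks like (contents invented),
# plus a minimal parse. The empty rules at the bottom are the -MP phony
# targets that keep builds from failing when a header is deleted.
SAMPLE_D = """\
foo.o: foo.cpp /usr/include/stdio.h include/foo.h

/usr/include/stdio.h:

include/foo.h:
"""

def headers_from_d(text: str):
    first_rule = text.split("\n\n", 1)[0].replace("\\\n", " ")
    _, _, deps = first_rule.partition(":")
    return [p for p in deps.split() if p.endswith((".h", ".hpp"))]

print(headers_from_d(SAMPLE_D))  # ['/usr/include/stdio.h', 'include/foo.h']
```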
name: deepgemm
version: unknown
description: DeepGEMM optimized matrix multiplication library

source: bundled

patterns: []

linker_flags: []

path_components:
- DeepGEMM

aliases: []
Incomplete metadata: missing license, copyright, and homepage fields.
This file lacks license, copyright, and homepage fields that are present in other bundled dependency metadata (e.g., nlohmann-json.yml, pybind11.yml). For consistency and discoverability, these fields should be added if available.
🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/deepgemm.yml lines 1-14, the metadata is missing
the standard fields `license`, `copyright`, and `homepage`; add these three keys
to match other bundled metadata files (e.g., nlohmann-json.yml, pybind11.yml).
Populate them with the actual values if available (preferred), otherwise set
them to a clear placeholder such as "unknown" for license and copyright and
"https://unknown" or "unknown" for homepage so the file structure remains
consistent.
path_components:
- nvshmem-build
- nvshmem_project-prefix
- libs/nvshmem
🧩 Analysis chain
Fix path_components: remove slashed entry.
Matcher compares individual components; “libs/nvshmem” won’t match.
 path_components:
 - nvshmem-build
 - nvshmem_project-prefix
-- libs/nvshmem
+- nvshmem

Consider adding additional patterns (e.g., libnvshmem.so, libnvshmem_host.so) if present in your builds.
Is “license: NVIDIA” the correct SPDX form (e.g., “Proprietary”/“NVIDIA-proprietary”)? Is “source: bundled” accurate for this repo?
I'll verify the review comment by examining the file structure, the path_components matching logic, and license conventions.
---
🏁 Script executed:
#!/bin/bash
# Read the full nvshmem.yml file to understand context
cat cpp/dependency_scan/metadata/nvshmem.yml
Length of output: 362
🏁 Script executed:
#!/bin/bash
# Search for how path_components is used/matched in the codebase
rg -n "path_components" --type yaml -C 5 | head -100Length of output: 5213
🏁 Script executed:
#!/bin/bash
# Look for matcher/parser logic that processes path_components
rg -n "path_component" --type py --type ts --type js -C 3 | head -150Length of output: 12639
🏁 Script executed:
#!/bin/bash
# Find other similar metadata yml files to understand naming conventions
fd -e yml cpp/dependency_scan/metadata | head -20

Length of output: 417
🏁 Script executed:
#!/bin/bash
# Check for examples of license field in yml files
rg -n "^license:" cpp/dependency_scan/metadata/ --type yaml -A 1 | head -50Length of output: 1220
---
🏁 Script executed:
#!/bin/bash
# Find the actual path matching implementation
rg -n "rightmost\|path_alias" cpp/dependency_scan/scan_build_artifacts.py -A 10 -B 2
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for how path_components matching is used
rg -n "def.*match\|path_component" cpp/dependency_scan/scan_build_artifacts.py -C 5Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for test cases that show path_components usage patterns
rg -n "path_components.*/" cpp/dependency_scan/ -C 2Length of output: 495
🏁 Script executed:
#!/bin/bash
# Verify SPDX license formats
rg -n "^license:" cpp/dependency_scan/metadata/ --type yaml | grep -E "(MIT|Apache|BSD|Proprietary|LicenseRef|NVIDIA)"Length of output: 440
---
🏁 Script executed:
#!/bin/bash
# Search for path_alias related code and matching logic
rg -n "path_alias" cpp/dependency_scan/scan_build_artifacts.py -C 8
Length of output: 6150
🏁 Script executed:
#!/bin/bash
# Look for how paths are matched in the dependency resolution
rg -n "def.*resolve\|def.*match\|def.*find" cpp/dependency_scan/scan_build_artifacts.py -A 15 | head -100Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there's a split or path parsing function
rg -n "split.*path\|/\|separator" cpp/dependency_scan/scan_build_artifacts.py -C 3 | head -80Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look at the full context around line 743-745 where path_components are processed
sed -n '740,800p' cpp/dependency_scan/scan_build_artifacts.py

Length of output: 2672
---
🌐 Web query:
SPDX license identifier for NVIDIA proprietary software
💡 Result:
Use a LicenseRef identifier. NVIDIA commonly uses "SPDX-License-Identifier: LicenseRef-NvidiaProprietary". (docs.nvidia.com)
Fix path_components and license format.
The path matcher splits artifact paths by / and checks individual components right-to-left. The entry libs/nvshmem is stored as a composite key and will never match.
Additionally, NVIDIA is not a valid SPDX license identifier. Use LicenseRef-NvidiaProprietary instead.
-license: NVIDIA
+license: LicenseRef-NvidiaProprietary
path_components:
- nvshmem-build
- nvshmem_project-prefix
-- libs/nvshmem
+- nvshmem

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/nvshmem.yml around lines 15–19, split the
composite path_components entry "libs/nvshmem" into two separate components in
right-to-left order (i.e., "nvshmem" then "libs") so the path matcher can match
individual path segments, and replace the non-SPDX "NVIDIA" license identifier
with "LicenseRef-NvidiaProprietary"; update the YAML values accordingly.
├── tensorrt-llm.yml        # Individual dependency
├── pytorch.yml
├── cuda-cudart-12.yml
└── ... (62+ dependencies)
Add language specifier to fenced code block.
Markdown best practices require specifying the language for syntax highlighting. This block appears to be YAML or shell—please add the appropriate identifier.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
21-21: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/README.md at line 21, the fenced code block
lacks a language specifier; update the opening fence to include the correct
language (e.g., ```yaml or ```bash depending on the block contents) so Markdown
renders syntax highlighting, ensuring the chosen identifier matches the block
content.
## Pattern Matching Behavior

The scanner tries matches in this order:

1. **Exact match on basename**: `libfoo.so` == `libfoo.so`
2. **Exact match on full path**: `/path/to/libfoo.so` == `/path/to/libfoo.so`
3. **Substring match**: `foo.cpython` in `foo.cpython-312-x86_64.so`

**No need to specify match type** - the scanner tries all of them automatically!
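As an aside, the listed order can be sketched in a few lines (assumed semantics based on the list above; the actual matcher also assigns confidence levels to each kind of match):

```python
# Sketch of the three-step match order documented above; assumed semantics,
# not the scanner's actual code (which also tracks match confidence).
import os
from typing import List, Optional

def try_match(artifact: str, patterns: List[str]) -> Optional[str]:
    base = os.path.basename(artifact)
    for check in (lambda p: p == base,        # 1. exact basename
                  lambda p: p == artifact,    # 2. exact full path
                  lambda p: p in base):       # 3. substring
        for pat in patterns:
            if check(pat):
                return pat
    return None

assert try_match("/lib/libfoo.so", ["libfoo.so"]) == "libfoo.so"
assert try_match("x/foo.cpython-312-x86_64.so", ["foo.cpython"]) == "foo.cpython"
assert try_match("/lib/libbar.so", ["libfoo.so"]) is None
```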
Minor grammar: Use hyphen for compound modifier.
"Pattern Matching Behavior" should be hyphenated when used as a compound modifier, though as a section header this is stylistic. More critically, ensure consistency with other headers in the document.
🧰 Tools
🪛 LanguageTool
[grammar] ~148-~148: Use a hyphen to join words.
Context: ...ir validation_output/ ``` ## Pattern Matching Behavior The scanner tries mat...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/README.md around lines 148 to 156, the section
header "Pattern Matching Behavior" should use a hyphen when used as a compound
modifier to be consistent with other headers; change the header to
"Pattern-Matching Behavior" (or update other headers to match if you prefer no
hyphens) and ensure the same hyphenation style is applied consistently across
the document.
def __init__(self):
    self._cache: Dict[str, Optional[str]] = {}
    self._lib_search_paths = self._get_library_search_paths()
Make dpkg resolution robust: pre-check binary and resolve non-absolute lib basenames.
- Avoid repeated FileNotFoundError by resolving dpkg-query once.
- Handle inputs like "libcudart.so.12" (from wheel NEEDED) by searching standard lib paths (reuse -l logic).
@@
- def __init__(self):
- self._cache: Dict[str, Optional[str]] = {}
- self._lib_search_paths = self._get_library_search_paths()
+ def __init__(self):
+ import shutil
+ self._cache: Dict[str, Optional[str]] = {}
+ self._lib_search_paths = self._get_library_search_paths()
+ self._dpkg_query = shutil.which("dpkg-query")
+ self._dpkg_available = self._dpkg_query is not None
@@
- # Handle linker flags
+ # Handle linker flags
if file_path.startswith("-l"):
resolved_path = self.find_library_path(file_path)
if not resolved_path:
self._cache[file_path] = None
return None
file_path = resolved_path
+
+ # Handle non-absolute library basenames like "libfoo.so" / "libfoo.so.6" / "libfoo.a"
+ if not os.path.isabs(file_path):
+ base = os.path.basename(file_path)
+ # Try mapping to -l<name> and reuse search paths
+ m = re.match(r'^lib([A-Za-z0-9_\-\+]+)\.(?:so(?:\.\d+)*)$', base)
+ if m:
+ name = m.group(1)
+ resolved_path = self.find_library_path(f"-l{name}")
+ if resolved_path:
+ file_path = resolved_path
+ elif base.startswith("lib") and base.endswith(".a"):
+ name = base[3:-2]
+ resolved_path = self.find_library_path(f"-l{name}")
+ if resolved_path:
+ file_path = resolved_path
@@
- # Query dpkg
+ # Short-circuit if dpkg-query is unavailable
+ if not getattr(self, "_dpkg_available", False):
+ self._cache[file_path] = None
+ return None
+
+ # Query dpkg
try:
- result = subprocess.run(["dpkg-query", "-S", file_path],
+ result = subprocess.run([self._dpkg_query, "-S", file_path],
capture_output=True,
text=True,
timeout=5)

Also applies to: 166-228
🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 98 to 101 (and apply
same changes to 166–228), the dpkg resolution and library basename handling are
brittle: resolve the dpkg-query binary once during initialization and cache its
path or mark it unavailable to avoid repeated FileNotFoundError attempts; when a
library name is not an absolute path (e.g., "libcudart.so.12"), treat it as a
basename and attempt to locate it by searching the collected _lib_search_paths
(reusing the existing -l style search logic) before falling back to dpkg-query;
update logic to normalize inputs, try search paths first for non-absolute names,
and only call dpkg-query once (using cached path) if search fails, handling and
logging the case where dpkg-query is not present.
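For reference, the dpkg step being hardened boils down to one query per path. A hedged sketch of that call and its output parsing (not the PR's DpkgResolver, which adds caching and -l handling):

```python
# Hedged sketch of the dpkg step the diff above hardens: `dpkg-query -S <path>`
# prints "package[:arch]: /path"; the package name is everything before the
# first ": ". Resolving the binary once up front avoids repeated
# FileNotFoundError on non-Debian systems.
import shutil
import subprocess
from typing import Optional

def dpkg_owner(file_path: str) -> Optional[str]:
    dpkg_query = shutil.which("dpkg-query")
    if dpkg_query is None:
        return None  # not a Debian/Ubuntu system
    try:
        out = subprocess.run([dpkg_query, "-S", file_path],
                             capture_output=True, text=True, timeout=5)
    except subprocess.TimeoutExpired:
        return None
    if out.returncode != 0 or not out.stdout:
        return None  # path not owned by any installed package
    return out.stdout.splitlines()[0].split(": ", 1)[0]

print(dpkg_owner("/usr/lib/x86_64-linux-gnu/libc.so.6"))  # e.g. "libc6:amd64"
```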
print(
    f" Resolved {len(dpkg_mappings)} artifacts via dpkg ({len(dpkg_mappings) / len(artifacts) * 100:.1f}%)"
)
print()
Guard division by zero in percentage prints.
When no artifacts are found, current f-strings raise ZeroDivisionError.
- print(
- f" Resolved {len(dpkg_mappings)} artifacts via dpkg ({len(dpkg_mappings) / len(artifacts) * 100:.1f}%)"
- )
+ total_artifacts = len(artifacts)
+ dpkg_pct = (len(dpkg_mappings) / total_artifacts * 100) if total_artifacts else 0.0
+ print(f" Resolved {len(dpkg_mappings)} artifacts via dpkg ({dpkg_pct:.1f}%)")
@@
- print(
- f" Resolved {len(pattern_mappings)} additional artifacts via patterns ({len(pattern_mappings) / len(artifacts) * 100:.1f}%)"
- )
+ pattern_pct = (len(pattern_mappings) / total_artifacts * 100) if total_artifacts else 0.0
+ print(f" Resolved {len(pattern_mappings)} additional artifacts via patterns ({pattern_pct:.1f}%)")
@@
- print(
- f" Mapped (dpkg): {dpkg_known} ({dpkg_known / len(artifacts) * 100:.1f}%)"
- )
- print(
- f" Mapped (patterns): {pattern_known} ({pattern_known / len(artifacts) * 100:.1f}%)"
- )
- print(
- f" Unknown: {total_unknown} ({total_unknown / len(artifacts) * 100:.1f}%)"
- )
+ print(f" Mapped (dpkg): {dpkg_known} ({(dpkg_known / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")
+ print(f" Mapped (patterns): {pattern_known} ({(pattern_known / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")
+ print(f" Unknown: {total_unknown} ({(total_unknown / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")Also applies to: 1230-1233, 1265-1272
🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 1210-1213 (and
similarly at 1230-1233 and 1265-1272), the f-strings calculate a percentage
using len(artifacts) as the denominator which will raise ZeroDivisionError when
artifacts is empty; update each print to guard the division by checking if
len(artifacts) is zero and use 0.0% (or format 0.0) in that case, e.g. compute
total = len(artifacts) and pct = (len(dpkg_mappings) / total * 100) if total
else 0.0, then print using pct to avoid dividing by zero.
| print(f" Reports written to:") | ||
| print(f" - {known_file}") | ||
| print(f" - {unknown_file}") |
Remove redundant f-strings (no placeholders).
Minor cleanup flagged by Ruff F541.
- print(f" Reports written to:")
+ print(" Reports written to:")
@@
- print(f" Re-run scanner to improve coverage")
+ print(" Re-run scanner to improve coverage")Also applies to: 1306-1306
🧰 Tools
🪛 Ruff (0.14.1)
1241-1241: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 1241-1243 (and also
at line 1306), some print statements use f-strings without any placeholders
(e.g. print(f" Reports written to:")), which Ruff flags as redundant F541;
replace these f-strings with plain string literals (remove the leading 'f') so
the prints become regular strings and no formatting is attempted.
summary:
  count: 0
  action_required: Add patterns to YAML files in dependencies/ for these artifacts
artifacts: []
🧩 Analysis chain
Correct path reference in action_required message.
Line 3 references dependencies/ directory, but the actual metadata directory is metadata/ throughout the codebase and documentation. This will confuse users trying to follow the guidance.
Update the action_required message to reference the correct directory:
- action_required: Add patterns to YAML files in dependencies/ for these artifacts
+ action_required: Add patterns to YAML files in metadata/ for these artifacts

Verify that the scanner code generating this file outputs the correct path reference. If the scanner hardcodes "dependencies/", that should be corrected to match the actual directory structure.
🏁 Script executed:
#!/bin/bash
# Verify the actual content of unknown.yml and directory structure
echo "=== Checking unknown.yml content ==="
find . -name "unknown.yml" -type f -exec cat {} \;
echo -e "\n=== Checking directory structure for 'dependencies' and 'metadata' ==="
find . -type d \( -name "dependencies" -o -name "metadata" \) | head -20
echo -e "\n=== Searching for 'dependencies/' references in documentation ==="
rg -n "dependencies/" --type=md --type=txt | head -20
echo -e "\n=== Searching for 'metadata/' references in documentation ==="
rg -n "metadata/" --type=md --type=txt | head -20
echo -e "\n=== Checking README.md for path references ==="
find . -name "README.md" -exec grep -n -E "(dependencies|metadata)" {} + | head -20Length of output: 4447
🏁 Script executed:
#!/bin/bash
# Find scanner code that generates unknown.yml and the "dependencies/" message
echo "=== Searching for files that generate unknown.yml ==="
rg -n "unknown\.yml" --type=py --type=sh | head -20
echo -e "\n=== Searching for 'dependencies/' string in Python/code files ==="
rg -n "dependencies/" --type=py --type=sh | head -30
echo -e "\n=== Searching for 'action_required' in code ==="
rg -n "action_required" --type=py --type=sh | head -30
echo -e "\n=== Find all Python files in cpp/dependency_scan ==="
find cpp/dependency_scan -name "*.py" -type f

Length of output: 2022
🏁 Script executed:
#!/bin/bash
# Examine the actual code around line 1000 where the bug is
echo "=== Code context around line 1000 in scan_build_artifacts.py ==="
sed -n '995,1005p' cpp/dependency_scan/scan_build_artifacts.py
echo -e "\n=== Check the parameter definition for dependencies/metadata ==="
rg -n "dependencies.dir|metadata.dir" --type=py -A2 -B2
echo -e "\n=== Check PatternMatcher initialization and metadata_dir parameter ==="
sed -n '619,670p' cpp/dependency_scan/scan_build_artifacts.py

Length of output: 14794
Correct hardcoded path reference in action_required message and update stale docstrings.
The output message in cpp/dependency_scan/scan_build_artifacts.py at line 1000 references the incorrect directory. The actual implementation uses metadata/ (parameter --metadata-dir, default ./metadata/), but the message tells users to add patterns to dependencies/.
Update line 1000:
- 'Add patterns to YAML files in dependencies/ for these artifacts'
+ 'Add patterns to YAML files in metadata/ for these artifacts'

Also update stale docstring references to dependencies/ in lines 619, 626, 636, and 657 to consistently reference metadata/ instead.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around line 1000 (and docstrings
at lines 619, 626, 636, 657), the user-facing message incorrectly references
"dependencies/" instead of the actual metadata directory; update the
action_required message at line 1000 to reference the --metadata-dir (default
./metadata/) and replace all stale docstring occurrences of "dependencies/" with
"metadata/" so wording is consistent and mentions the --metadata-dir flag and
default path where appropriate.
@@ -0,0 +1,952 @@
#!/usr/bin/env python3
Add NVIDIA header; drop shebang; fix lint nits (F541, E712).
Complies with repo guidelines and cleans Ruff warnings.
-#!/usr/bin/env python3
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
@@
- d_file.write_text(f"build/foo.o: ../include/test.h\n")
+ d_file.write_text("build/foo.o: ../include/test.h\n")
@@
- link_file.write_text(f"/usr/bin/c++ -o foo @response.rsp\n")
+ link_file.write_text("/usr/bin/c++ -o foo @response.rsp\n")
@@
- assert artifacts[0].metadata.get('static') == True
+ assert artifacts[0].metadata.get('static') is True

Import style: prefer module namespace per guidelines.
-from scan_build_artifacts import (Artifact, ArtifactCollector, DpkgResolver,
- Mapping, OutputGenerator, PatternMatcher)
+import scan_build_artifacts as sba

and reference as sba.Artifact, etc. Apply only if it doesn’t overcomplicate the test changes.
Also applies to: 229-229, 260-260, 284-284
🧰 Tools
🪛 Ruff (0.14.1)
1-1: Shebang is present but file is not executable
(EXE001)
🤖 Prompt for AI Agents
In cpp/dependency_scan/tests/test_scan_build_artifacts.py around lines 1 (and
also apply similar changes at lines 229, 260, 284): remove the leading shebang,
add the standard NVIDIA file header/license block at the top, fix Ruff lint
issues by replacing bare f-strings with regular strings where there are no
interpolations (F541) and avoid comparisons like "== True/False" by using direct
boolean checks or "is True"/"is False" as appropriate (E712), and adjust imports
to the module namespace style (e.g., import the module as sba and reference
types/functions as sba.Artifact) unless doing so would overcomplicate the tests.
Ensure changes are minimal and only alter imports/strings/boolean comparisons
and header/shebang as described.
Force-pushed from dd4c6ca to dc413c7.
Signed-off-by: Venky Ganesh <[email protected]>
Force-pushed from dc413c7 to 22a3372.
@venkywonka Hi Venky, can you briefly explain the principle of this scanning approach? And how long it can take to finish one round of scanning? Thanks
@@ -0,0 +1,4 @@
summary:
We should probably add these files to the gitignore, right?
yep! planning to add it
Usage:
    python scan_build_artifacts.py --build-dir build/ --output-dir scan_output/
    python scan_build_artifacts.py --validate  # Validate YAML files
"""
There's a lot of code here I'll want to review, but I can do that as I work on integrating it with the dependency database. What do you think about adding an "in development" or "not ready for use" disclaimer to the README for now? That way you can submit your work now, and I can remove the disclaimer once I'm done reviewing and integrating the code. I'll also patch this script as necessary if I find any issues, so you can review the patches instead of implementing them.
sure @tburt-nv - I agree with that too - i wanted to put this out here even though its very much WIP so that there's concreteness on how this looks to you, so that your async dev can be informed of the same - while i continue to smoothen the edges... I don't intend to merge this immediately, but if you think merging it with a "not ready for use" is better I'm good with that too.
Signed-off-by: Venky Ganesh <[email protected]>
Signed-off-by: Venky Ganesh <[email protected]>
hey @juney-nvidia, I have changed this to draft. On top of that, the code also takes a built wheel, looks at the bundled binaries we ship with it, and reads the runtime dependencies mandated by it. As for how long it takes to finish one round of scanning, it takes maybe a minute max.
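For reviewers, the wheel pass described above amounts to unzipping the .whl and reading each bundled ELF's DT_NEEDED entries. A self-contained sketch (function name and structure are illustrative, not the PR's):

```python
# Sketch of the wheel step: unzip the .whl, then read the DT_NEEDED entries
# of each bundled .so with `readelf -d` (binutils). Non-ELF files simply
# yield an empty list because readelf prints nothing usable for them.
import re
import subprocess
import tempfile
import zipfile
from pathlib import Path

def wheel_needed_libs(wheel: str):
    needed = {}
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(wheel) as zf:
            zf.extractall(tmp)
        for so in Path(tmp).rglob("*.so*"):
            out = subprocess.run(["readelf", "-d", str(so)],
                                 capture_output=True, text=True)
            # readelf prints e.g.:  0x...(NEEDED)  Shared library: [libcudart.so.12]
            needed[so.name] = re.findall(r"\(NEEDED\).*\[(.+?)\]", out.stdout)
    return needed
```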
- Added a section on recent improvements, highlighting the achievement of 100% artifact coverage through parser bug fixes and pattern enhancements.
- Updated usage instructions for the script, including command-line arguments.
- Improved metadata structure by introducing new fields and updating existing ones to better reflect dependency sources and matching patterns.
- Removed the deprecated file and consolidated system libraries into .
- Enhanced test coverage for artifact collection, including handling of malformed paths and CMakeFiles artifacts.

This commit aims to improve the clarity and functionality of the dependency scanning process.

Signed-off-by: Venky Ganesh <[email protected]>
Signed-off-by: Venky Ganesh <[email protected]>
Summary by CodeRabbit
New Features
Documentation
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

`kill`: Kill all running builds associated with pull request.
skip
`skip --comment COMMENT`: Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`: Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.