
Conversation

@venkywonka
Collaborator

@venkywonka venkywonka commented Oct 25, 2025

Summary by CodeRabbit

  • New Features

    • Added automated build artifact dependency scanning for C++ and CUDA projects.
    • Implemented header dependency tracking during compilation.
    • Created extensible metadata system for managing and resolving dependencies.
  • Documentation

    • Added comprehensive guides for the dependency scanner, metadata management, and workflow integration.
    • Included schema validation and templates for dependency definitions.
  • Tests

    • Added comprehensive test suite for artifact collection, dependency resolution, and report generation.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@venkywonka venkywonka marked this pull request as ready for review October 25, 2025 06:43
@venkywonka venkywonka requested a review from a team as a code owner October 25, 2025 06:43
@coderabbitai coderabbitai bot changed the title [None] @coderabbitai title [None] [feat] Add C++ dependency scanning system Oct 25, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 25, 2025

📝 Walkthrough


Introduces a new dependency scanning system for TensorRT-LLM C++ builds. A CMakeLists.txt modification enables header dependency tracking. A Python scanner collects build artifacts from .d files, link.txt, and wheels, then resolves them via dpkg-query or YAML pattern matching, generating known/unknown YAML reports. Metadata YAML files define library patterns and path components.
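
For orientation, here is a minimal sketch of the two-tier resolution described above. The function and parameter names are illustrative only, not the module's actual API:

```python
from typing import Callable, Dict, List, Optional

def resolve_artifacts(
    artifacts: List[str],
    dpkg_lookup: Callable[[str], Optional[str]],  # e.g. a wrapper around `dpkg-query -S`
    yaml_patterns: Dict[str, List[str]],          # dependency name -> substring patterns
) -> Dict[str, Optional[str]]:
    """Illustrative two-tier resolution: dpkg is PRIMARY, YAML patterns are FALLBACK."""
    resolved: Dict[str, Optional[str]] = {}
    for artifact in artifacts:
        owner = dpkg_lookup(artifact)  # tier 1: ask the system package database
        if owner is None:              # tier 2: fall back to metadata patterns
            owner = next(
                (dep for dep, patterns in yaml_patterns.items()
                 if any(p in artifact for p in patterns)),
                None,
            )
        resolved[artifact] = owner     # None entries end up in the unknown report
    return resolved
```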

Changes

Cohort / File(s) Summary
Build configuration
cpp/CMakeLists.txt
Appends -MD -MP compiler flags to CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS for automatic dependency tracking file (.d) generation.
Scanning documentation
cpp/dependency_scan/README.md, cpp/dependency_scan/metadata/README.md
Comprehensive guides documenting the artifact scanner architecture, two-tier resolution strategy (dpkg PRIMARY / YAML FALLBACK), workflow, metadata structure, installation, usage, and troubleshooting.
Metadata schema and template
cpp/dependency_scan/metadata/_schema.yml, cpp/dependency_scan/metadata/_template.yml
JSON Schema for validating dependency YAML files (required fields: name, version, description; constraints on naming, lengths, linker flags). YAML template providing reusable blueprint with inline documentation for creating dependency manifests.
Dependency metadata
cpp/dependency_scan/metadata/{cuda,cutlass,deep-ep,deepgemm,cppzmq,nanobind,nlohmann-json,nvshmem,openmpi,pybind11,pytorch,tensorrt,tensorrt-llm,ucx,ucxx,xgrammar}.yml, cpp/dependency_scan/metadata/dpkg.yml
16 new YAML metadata files defining dependencies with fields: name, version, description, license, patterns (binary/library names), linker_flags, path_components, and aliases. Includes bundled libraries (PyTorch, CUTLASS, Pybind11) and system packages (libc6, libzmq5, libnuma1).
Core scanner implementation
cpp/dependency_scan/scan_build_artifacts.py
New Python module implementing a multi-stage artifact scanner with 6 classes: Artifact (data model), Mapping (artifact→dependency binding), DpkgResolver (library→package resolution via dpkg-query), ArtifactCollector (parses .d files, link.txt, wheels; extracts NEEDED libraries via readelf), PatternMatcher (YAML-driven fallback resolution with exact/path/generic strategies), OutputGenerator (produces known.yml and unknown.yml reports). CLI with --build-dir, --output-dir, --metadata-dir, --validate options.
Testing
cpp/dependency_scan/tests/test_scan_build_artifacts.py
Extensive test suite with 100+ test cases covering: data model serialization, DpkgResolver caching and CUDA normalization, ArtifactCollector D/link file parsing with continuations and response files, PatternMatcher exact/substring/path alias matching and YAML schema validation, OutputGenerator report structure, and end-to-end integration workflows with mocked filesystem and subprocess calls.
Sample output
cpp/dependency_scan/scan_output/unknown.yml
Example report template with empty artifacts list and summary showing count=0.
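
As background for the .d-file parsing mentioned in the table above: -MD -MP emits Make-style dependency files with backslash line continuations and phony header targets. A hedged parsing sketch (not the PR's actual _parse_d_file):

```python
from pathlib import Path
from typing import List

def parse_d_file(d_file: Path) -> List[str]:
    """Extract dependency paths from a Make-style .d file.

    Example input produced by -MD -MP:
        foo.o: foo.cpp foo.h \\
         bar.h
        foo.h:
        bar.h:
    The phony 'foo.h:' targets added by -MP have empty right-hand sides
    and contribute nothing. Paths with escaped spaces are not handled here.
    """
    text = d_file.read_text()
    joined = text.replace("\\\n", " ")  # fold backslash line continuations
    deps: List[str] = []
    for line in joined.splitlines():
        if ":" not in line:
            continue
        _, _, rhs = line.partition(":")  # drop the "target.o:" prefix
        deps.extend(rhs.split())
    return deps
```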

Sequence Diagram

sequenceDiagram
    participant CLI as CLI/main()
    participant AC as ArtifactCollector
    participant DR as DpkgResolver
    participant PM as PatternMatcher
    participant OG as OutputGenerator

    CLI->>AC: collect_all() from build_dir
    activate AC
    AC->>AC: parse .d files (headers)
    AC->>AC: parse link.txt (libraries)
    AC->>AC: scan .whl files
    AC->>AC: readelf -d (NEEDED)
    AC-->>CLI: List[Artifact]
    deactivate AC

    CLI->>DR: resolve artifacts (primary: dpkg)
    activate DR
    DR->>DR: dpkg-query for library paths
    DR-->>CLI: Mapping[success]
    deactivate DR

    CLI->>PM: match remaining (fallback: YAML)
    activate PM
    PM->>PM: load metadata/*.yml
    PM->>PM: exact pattern match
    alt match found
        PM-->>CLI: Mapping[high/medium/low confidence]
    else no match
        PM-->>CLI: unresolved Artifact
    end
    deactivate PM

    CLI->>OG: generate(mappings, artifacts)
    activate OG
    OG->>OG: group by dependency → known.yml
    OG->>OG: collect unmapped → unknown.yml
    OG->>OG: compute coverage & confidence stats
    OG-->>CLI: (known.yml, unknown.yml)
    deactivate OG

    CLI->>CLI: print summary report

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • scan_build_artifacts.py: Six new classes with intricate parsing logic (D files with line continuations, link.txt response file handling, readelf dynamic section parsing, multi-stage resolution pipeline with three matching strategies). Requires careful review of regex patterns, subprocess calls, and error handling around file I/O.
  • test_scan_build_artifacts.py: Over 20 test functions with heavy mocking of subprocess, filesystem, and YAML loading. Validates cascading fallback logic and edge cases (CUDA package normalization, duplicate patterns, underscore-file filtering). Complexity in test fixture setup and assertion chains.
  • Metadata YAML files: While individually simple and repetitive (15+ dependency files), the collective addition is substantial. Verify schema compliance, pattern correctness, and no cross-dependency conflicts.
  • CMakeLists.txt & documentation: Straightforward additions; low review burden.

Areas requiring extra attention:

  • Path canonicalization and LD_LIBRARY_PATH handling in DpkgResolver (security/correctness)
  • Regex patterns in .d file parsing for line continuations and special path cases
  • Subprocess call robustness (readelf, dpkg-query) and error message handling
  • YAML schema validation logic and dpkg.yml format parsing (nested dependency lists)
  • Confidence scoring strategy and coverage calculation in PatternMatcher and OutputGenerator

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is entirely incomplete. It contains only the template structure with no substantive content filled in across any of the required sections. The Description section is empty, the Test Coverage section is empty, the PR Checklist items are all unchecked with no supporting details, and there is no explanation of what the changes accomplish or why they are necessary. The PR body appears to be submitted as-is from the template without any meaningful information provided by the author. The author must complete the PR description by: (1) providing a clear explanation of what the C++ dependency scanning system does and why it is needed in the Description section; (2) documenting the test coverage by referencing the test file (cpp/dependency_scan/tests/test_scan_build_artifacts.py) and explaining what test cases validate the new functionality; and (3) addressing the PR Checklist items, particularly ensuring that the changes follow coding guidelines, test coverage is adequate, and any necessary documentation updates have been made.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 96.55% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The pull request title "Add C++ dependency scanning system" is directly and clearly related to the main change in the changeset. The raw summary confirms that the PR introduces a comprehensive dependency scanning system consisting of CMakeLists.txt modifications, README documentation, multiple metadata YAML files, a Python scanning script (scan_build_artifacts.py), and corresponding tests. The title accurately captures this primary objective, though it does contain formatting deviations from the template (extra spacing between tags and the non-standard "[Do Not Review]" annotation that adds noise).



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (18)
cpp/dependency_scan/metadata/README.md (1)

1-760: Excellent comprehensive documentation.

The README provides thorough guidance on the dependency metadata system, including clear examples, troubleshooting steps, and best practices. The structure flows well from quick start through advanced usage. This will significantly aid contributors in understanding and maintaining dependency metadata.

Consider adding code block language specifiers, as noted in the individual comments above, to align with markdown linting standards.

cpp/dependency_scan/metadata/pytorch.yml (1)

11-12: Broaden PyTorch linker flags for better matching.

Add common flags used in practice so -l matches don’t fall back to generic inference.

 linker_flags:
-  - -ltorch_python
+  - -ltorch_python
+  - -ltorch
+  - -lc10
+  - -ltorch_cpu
+  - -ltorch_cuda

Optionally add license/homepage/source fields if known to aid attribution.

Also applies to: 13-18

cpp/dependency_scan/metadata/tensorrt-llm.yml (2)

19-31: Reduce maintenance: use stable substring patterns for Python extensions.

Replacing many ABI-specific entries with a stable prefix cuts churn across Python versions.

 # Python bindings
-- bindings.cpython-312-x86_64-linux-gnu.so
-- bindings.cpython-310-x86_64-linux-gnu.so
-- bindings.cpython-311-x86_64-linux-gnu.so
+- bindings.cpython-

 # Deep learning integration bindings
-- deep_ep_cpp_tllm.cpython-312-x86_64-linux-gnu.so
-- deep_ep_cpp_tllm.cpython-310-x86_64-linux-gnu.so
-- deep_ep_cpp_tllm.cpython-311-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-312-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-310-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-311-x86_64-linux-gnu.so
+- deep_ep_cpp_tllm.cpython-
+- deep_gemm_cpp_tllm.cpython-

The matcher already does substring checks. Add “-ltensorrt_llm” if the project links it via -l.

-linker_flags: []
+linker_flags:
+  - -ltensorrt_llm

Please confirm the exact shared object names (e.g., libnvinfer_plugin_tensorrt_llm.so) align with produced artifacts and whether Python 3.8/3.9 need coverage.


2-2: Version source of truth.

Hardcoding 1.2.0rc1 can drift. Prefer generating this from the project build/version to avoid stale metadata.

cpp/dependency_scan/metadata/_template.yml (1)

58-65: Clarify path_components constraint.

Document that components must be single segments (no “/”) to match PatternMatcher behavior.

 # Path components that indicate this dependency
 # When these strings appear in a binary path, map to this dependency
+# Note: use single directory/file name segments only (no slashes),
+# because the matcher splits paths by "/" and compares components.
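
A small sketch of the component-wise comparison this note describes (a hypothetical helper mirroring the documented behavior, not the PR's code):

```python
from typing import List

def matches_path_components(artifact_path: str, components: List[str]) -> bool:
    """True if any metadata component equals one segment of the path.

    Because the path is split on "/", a slashed entry such as
    "libs/nvshmem" is compared against single segments and never matches.
    """
    segments = artifact_path.split("/")
    return any(component in segments for component in components)

# matches_path_components("/build/libs/nvshmem/libnvshmem.so", ["nvshmem"])      -> True
# matches_path_components("/build/libs/nvshmem/libnvshmem.so", ["libs/nvshmem"]) -> False
```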
cpp/dependency_scan/metadata/ucx.yml (1)

11-13: Optional: include UCX linker flags for completeness.

Add commonly used flags so -l resolution can map without dpkg.

-linker_flags: []
+linker_flags:
+  - -lucp
+  - -lucs
+  - -luct
+  - -lucm
cpp/dependency_scan/metadata/_schema.yml (1)

79-83: Harden schema with additionalProperties.

Prevent typos/unexpected fields from silently passing.

 anyOf:
   - required: ["patterns"]
   - required: ["linker_flags"]
   - required: ["path_components"]
+additionalProperties: false

README currently lists “aliases” as part of “at least one required” sections; schema does not. Please align docs (don’t include aliases) or extend schema if that’s intended.

cpp/dependency_scan/README.md (6)

1-1: Use project-neutral title.

Replace “Claude Code’s” with project naming.

-# Claude Code's Minimal Artifact Scanner
+# TensorRT-LLM Minimal Artifact Scanner

21-21: Fix markdownlint MD040: specify languages for fenced code blocks.

Add appropriate languages to silence the linter and improve rendering.

-```
+```text

(repeat for the ASCII diagrams/outputs above)



Also applies to: 442-442, 598-598

---

`49-55`: **Correct dependencies statement.**

The tool imports PyYAML and optionally jsonschema; also relies on external tools (dpkg-query, readelf).

-## Installation
-
-No installation required! The script is self-contained with only Python stdlib dependencies.
+## Installation
+
+Requires Python 3.8+, PyYAML, and optionally jsonschema for schema validation.
+System tools: dpkg-query (Debian/Ubuntu) and readelf (binutils).
+Install: `pip install pyyaml jsonschema` (jsonschema optional).

53-56: Avoid hardcoded absolute paths.

Use repo-relative commands to improve portability.

-cd /home/gvenkatarama/tensorrt_llm/cpp/dependency_scan
+cd cpp/dependency_scan
@@
-# After building TensorRT-LLM
-cd /home/gvenkatarama/tensorrt_llm
+# After building TensorRT-LLM
+cd <repo-root>

Also applies to: 91-99


589-591: Align docs with schema: aliases aren’t a required matching section.

Remove “aliases” from the “at least one required” list or update schema accordingly.

-**Pattern sections (at least one required):**
-- `patterns`, `linker_flags`, `path_components`, `aliases`
+**Pattern sections (at least one required):**
+- `patterns`, `linker_flags`, `path_components`

645-646: Remove AI-generation footer.

Drop “Generated by Claude Code” branding from project docs.

-**Generated by Claude Code** | **Phase 1 Complete** | **Target Coverage: ≥98%**
+**Phase 1 Complete** | **Target Coverage: ≥98%**

Also ensure referenced metadata/README.md exists or update link to _template.yml and _schema.yml.

cpp/dependency_scan/scan_build_artifacts.py (5)

400-421: Avoid external 'find'; use pathlib.rglob within build_dir.

Reduces subprocess usage (S603/S607) and improves portability.

-                    import subprocess
-                    try:
-                        result = subprocess.run([
-                            'find',
-                            str(self.build_dir), '-name', basename, '-type', 'f'
-                        ],
-                                                capture_output=True,
-                                                text=True,
-                                                timeout=5)
-                        if result.returncode == 0 and result.stdout.strip():
-                            matches = result.stdout.strip().split('\n')
-                            # Try to find match with similar relative path structure
-                            for match in matches:
-                                if header_path in match or match.endswith(
-                                        header_path):
-                                    canonical_path = os.path.realpath(match)
-                                    break
-                            # If no exact match, use first match
-                            if not os.path.exists(canonical_path) and matches:
-                                canonical_path = os.path.realpath(matches[0])
-                    except Exception:
-                        pass
+                    try:
+                        matches = [str(p) for p in self.build_dir.rglob(basename)]
+                        if matches:
+                            # Prefer closest path-structure match
+                            for match in matches:
+                                if header_path in match or match.endswith(header_path):
+                                    canonical_path = os.path.realpath(match)
+                                    break
+                            if not os.path.exists(canonical_path):
+                                canonical_path = os.path.realpath(matches[0])
+                    except OSError:
+                        pass

542-551: Stabilize wheel artifact paths for dedup/reporting.

Prefix with wheel identifier to avoid collisions and ephemeral temp paths.

-                    artifacts.append(
-                        Artifact(path=str(so_file.relative_to(temp_path)),
+                    artifacts.append(
+                        Artifact(path=f"wheel://{wheel_file.name}/{so_file.relative_to(temp_path)}",
                                  type='binary',
                                  source=str(wheel_file),
                                  metadata={
                                      'wheel': wheel_file.name,
                                      'needed': needed_libs
                                  }))
@@
-                        artifacts.append(
-                            Artifact(path=needed_lib,
+                        artifacts.append(
+                            Artifact(path=needed_lib,
                                     type='library',
                                     source=str(wheel_file),
                                     metadata={
-                                        'from_binary':
-                                        str(so_file.relative_to(temp_path)),
+                                        'from_binary': f"wheel://{wheel_file.name}/{so_file.relative_to(temp_path)}",
                                         'dynamic_dependency':
                                         True
                                     }))

Also applies to: 553-564


223-227: Replace broad except Exception and silent pass/continue with specific exceptions + logging.

Catching everything hides real failures; add logging and handle OSError, subprocess.TimeoutExpired, FileNotFoundError explicitly. Keep try bodies minimal.

I can post a focused diff wiring logging.getLogger(__name__) and updating these sites in one pass. Based on coding guidelines.

Also applies to: 337-338, 420-421, 430-431, 459-460, 498-499, 565-566, 606-607, 693-695, 1089-1091
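
A minimal sketch of the suggested pattern (logger and function names are placeholders, not the module's actual code):

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger(__name__)

def get_needed_libraries(binary_path: str) -> List[str]:
    """Run readelf -d and return NEEDED entries, logging specific failures."""
    try:
        result = subprocess.run(
            ["readelf", "-d", binary_path],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Specific exceptions plus logging, instead of a silent `except Exception: pass`.
        logger.warning("readelf failed for %s: %s", binary_path, exc)
        return []
    else:
        # Keep the try body minimal; parse in the else branch.
        # Lines look like: " 0x...01 (NEEDED)  Shared library: [libc.so.6]"
        return [line.split("[")[1].rstrip("]")
                for line in result.stdout.splitlines()
                if "(NEEDED)" in line and "[" in line]
```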


971-1007: Prefer yaml.safe_dump for emitting reports.

Avoids emitting arbitrary Python tags; preserves current structure.

-        with open(known_file, 'w') as f:
-            yaml.dump(
+        with open(known_file, 'w') as f:
+            yaml.safe_dump(
@@
-        with open(unknown_file, 'w') as f:
-            yaml.dump(
+        with open(unknown_file, 'w') as f:
+            yaml.safe_dump(

616-631: Unify terminology: “metadata” vs “dependencies”.

The code uses --metadata-dir but texts mention “dependencies/”. Standardize on “metadata” to reduce confusion.

Apply small string edits in docstrings/help/prints to consistently reference the metadata directory.

Also applies to: 1000-1006, 1128-1130

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b27810 and dd4c6ca.

📒 Files selected for processing (25)
  • cpp/CMakeLists.txt (1 hunks)
  • cpp/dependency_scan/README.md (1 hunks)
  • cpp/dependency_scan/metadata/README.md (1 hunks)
  • cpp/dependency_scan/metadata/_schema.yml (1 hunks)
  • cpp/dependency_scan/metadata/_template.yml (1 hunks)
  • cpp/dependency_scan/metadata/cppzmq.yml (1 hunks)
  • cpp/dependency_scan/metadata/cuda.yml (1 hunks)
  • cpp/dependency_scan/metadata/cutlass.yml (1 hunks)
  • cpp/dependency_scan/metadata/deep-ep.yml (1 hunks)
  • cpp/dependency_scan/metadata/deepgemm.yml (1 hunks)
  • cpp/dependency_scan/metadata/dpkg.yml (1 hunks)
  • cpp/dependency_scan/metadata/nanobind.yml (1 hunks)
  • cpp/dependency_scan/metadata/nlohmann-json.yml (1 hunks)
  • cpp/dependency_scan/metadata/nvshmem.yml (1 hunks)
  • cpp/dependency_scan/metadata/openmpi.yml (1 hunks)
  • cpp/dependency_scan/metadata/pybind11.yml (1 hunks)
  • cpp/dependency_scan/metadata/pytorch.yml (1 hunks)
  • cpp/dependency_scan/metadata/tensorrt-llm.yml (1 hunks)
  • cpp/dependency_scan/metadata/tensorrt.yml (1 hunks)
  • cpp/dependency_scan/metadata/ucx.yml (1 hunks)
  • cpp/dependency_scan/metadata/ucxx.yml (1 hunks)
  • cpp/dependency_scan/metadata/xgrammar.yml (1 hunks)
  • cpp/dependency_scan/scan_build_artifacts.py (1 hunks)
  • cpp/dependency_scan/scan_output/unknown.yml (1 hunks)
  • cpp/dependency_scan/tests/test_scan_build_artifacts.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • cpp/dependency_scan/tests/test_scan_build_artifacts.py
  • cpp/dependency_scan/scan_build_artifacts.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • cpp/dependency_scan/tests/test_scan_build_artifacts.py
  • cpp/dependency_scan/scan_build_artifacts.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • cpp/dependency_scan/tests/test_scan_build_artifacts.py
  • cpp/dependency_scan/scan_build_artifacts.py
🧠 Learnings (3)
📚 Learning: 2025-08-21T21:48:35.135Z
Learnt from: djns99
PR: NVIDIA/TensorRT-LLM#7104
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:399-417
Timestamp: 2025-08-21T21:48:35.135Z
Learning: CUTLASS extensions in TensorRT-LLM (located under cpp/tensorrt_llm/cutlass_extensions/) are designed to integrate with and extend functionality in the external CUTLASS repository. When analyzing these extensions, their consumers and functionality wiring may exist in the CUTLASS codebase rather than within TensorRT-LLM itself.

Applied to files:

  • cpp/dependency_scan/metadata/cutlass.yml
📚 Learning: 2025-09-16T09:30:09.716Z
Learnt from: tongyuantongyu
PR: NVIDIA/TensorRT-LLM#7763
File: cpp/tensorrt_llm/CMakeLists.txt:297-301
Timestamp: 2025-09-16T09:30:09.716Z
Learning: In the TensorRT-LLM project, NCCL libraries are loaded earlier by PyTorch libraries or the bindings library, so the main shared library doesn't need NCCL paths in its RPATH - the libraries will already be available in the process address space when needed.

Applied to files:

  • cpp/dependency_scan/metadata/tensorrt-llm.yml
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • cpp/dependency_scan/metadata/tensorrt-llm.yml
🧬 Code graph analysis (1)
cpp/dependency_scan/tests/test_scan_build_artifacts.py (1)
cpp/dependency_scan/scan_build_artifacts.py (21)
  • Artifact (47-57)
  • ArtifactCollector (261-609)
  • DpkgResolver (81-253)
  • Mapping (61-73)
  • OutputGenerator (910-1008)
  • PatternMatcher (617-902)
  • to_dict (55-57)
  • to_dict (69-73)
  • find_library_path (128-164)
  • get_package (166-227)
  • _normalize_cuda_package (230-253)
  • _parse_d_file (311-433)
  • _parse_link_file (435-501)
  • _get_needed_libraries (571-609)
  • _scan_wheel (503-568)
  • collect_all (274-309)
  • _match_exact_library (792-836)
  • _match_path_alias (838-870)
  • _match_generic_library (872-902)
  • match (765-790)
  • generate (920-1008)
🪛 LanguageTool
cpp/dependency_scan/metadata/README.md

[grammar] ~148-~148: Use a hyphen to join words.
Context: ...ir validation_output/ ``` ## Pattern Matching Behavior The scanner tries mat...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
cpp/dependency_scan/metadata/README.md

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


442-442: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


598-598: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

cpp/dependency_scan/README.md

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


442-442: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


598-598: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.1)
cpp/dependency_scan/tests/test_scan_build_artifacts.py

1-1: Shebang is present but file is not executable

(EXE001)


229-229: f-string without any placeholders

Remove extraneous f prefix

(F541)


260-260: f-string without any placeholders

Remove extraneous f prefix

(F541)


284-284: Avoid equality comparisons to True; use artifacts[0].metadata.get('static'): for truth checks

Replace with artifacts[0].metadata.get('static')

(E712)

cpp/dependency_scan/scan_build_artifacts.py

200-200: subprocess call: check for execution of untrusted input

(S603)


200-200: Starting a process with a partial executable path

(S607)


223-224: try-except-pass detected, consider logging the exception

(S110)


223-223: Do not catch blind exception: Exception

(BLE001)


337-337: Do not catch blind exception: Exception

(BLE001)


402-402: subprocess call: check for execution of untrusted input

(S603)


402-405: Starting a process with a partial executable path

(S607)


420-421: try-except-pass detected, consider logging the exception

(S110)


420-420: Do not catch blind exception: Exception

(BLE001)


430-431: try-except-continue detected, consider logging the exception

(S112)


430-430: Do not catch blind exception: Exception

(BLE001)


459-459: Do not catch blind exception: Exception

(BLE001)


498-499: try-except-continue detected, consider logging the exception

(S112)


498-498: Do not catch blind exception: Exception

(BLE001)


565-566: try-except-pass detected, consider logging the exception

(S110)


565-565: Do not catch blind exception: Exception

(BLE001)


592-592: subprocess call: check for execution of untrusted input

(S603)


593-593: Starting a process with a partial executable path

(S607)


606-607: try-except-pass detected, consider logging the exception

(S110)


606-606: Do not catch blind exception: Exception

(BLE001)


693-693: Do not catch blind exception: Exception

(BLE001)


1089-1089: Do not catch blind exception: Exception

(BLE001)


1241-1241: f-string without any placeholders

Remove extraneous f prefix

(F541)


1306-1306: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (13)
cpp/dependency_scan/metadata/nlohmann-json.yml (1)

1-17: Well-formed metadata with complete field coverage.

This metadata file is well-structured with all expected fields populated. The information aligns with the actual nlohmann-json project (correct license, copyright, homepage). Empty patterns and linker_flags are appropriate for a header-only library.

cpp/dependency_scan/metadata/cuda.yml (1)

1-18: Reasonable metadata for system dependencies; version pinning may be restrictive.

The metadata is well-formed for CUDA runtime dependencies. However, version 13.0.48-1 is pinned to a specific CUDA release. If builds target multiple CUDA versions, consider whether this metadata should be versioned or support version ranges. Confirm that the scanner gracefully handles CUDA installations with different versions.

cpp/dependency_scan/metadata/nanobind.yml (1)

1-17: Well-formed bundled dependency metadata.

This metadata is complete and consistent with other bundled dependencies. Empty patterns and linker_flags are appropriate for a header-only binding library.

cpp/dependency_scan/metadata/deepgemm.yml (1)

12-12: Path component casing may be inconsistent.

The path_components value is ["DeepGEMM"] (PascalCase), while deep-ep.yml uses snake_case (["deep_ep_download-src"]). Verify that this casing matches the actual directory name in the repository to ensure proper pattern matching during scanning.

cpp/dependency_scan/metadata/pybind11.yml (1)

1-17: Well-formed bundled dependency metadata.

Complete and consistent with other properly-defined bundled dependencies. Metadata appears accurate for the pybind11 project.

cpp/dependency_scan/metadata/tensorrt.yml (1)

1-15: Well-formed system dependency metadata with version-aware patterns.

The patterns and path components provide good coverage for detecting TensorRT installations across multiple versions. The approach of including version-specific path variants (tensorrt-8, -9, -10) is reasonable for supporting multiple TensorRT versions.

cpp/dependency_scan/metadata/deep-ep.yml (1)

1-14:

The original review comment is based on an incorrect premise. The schema file (_schema.yml) explicitly defines license, copyright, and homepage as optional fields—they are not in the required array. The only required fields are name, version, and description, all of which deep-ep.yml contains.

Moreover, this pattern of omitting optional metadata is prevalent across the codebase: 10 other dependency files (cuda, pytorch, tensorrt, ucx, ucxx, etc.) also lack these fields. The inconsistency exists but does not constitute a schema violation or deficiency in deep-ep.yml.

The file meets all schema requirements and should be approved as-is.

Likely an incorrect or invalid review comment.

cpp/dependency_scan/metadata/xgrammar.yml (1)

1-15: Structure and content look good.

The metadata is well-formed and appropriate for a bundled library that doesn't require direct linking. Version "unknown" is acceptable per documentation, and the single path_component will capture inclusion-based artifacts correctly.

cpp/dependency_scan/metadata/cppzmq.yml (1)

1-17: Well-formed metadata with good attribution.

The metadata correctly identifies cppzmq as a bundled, header-only C++ library. The copyright attribution and homepage reference are helpful for provenance tracking.

cpp/dependency_scan/metadata/openmpi.yml (1)

1-11: Structure is sound, but verify linker_flags.

The metadata correctly captures OpenMPI patterns and common path components. However, the empty linker_flags is unusual—OpenMPI typically provides -lmpi linker flag. Verify whether this is intentional (e.g., if OpenMPI is always dynamically loaded in this codebase) or if -lmpi should be added.

If OpenMPI is explicitly linked in C++ builds, add the linker flag:

linker_flags:
  - -lmpi
cpp/dependency_scan/metadata/ucxx.yml (1)

1-15: Metadata structure is appropriate.

The entry correctly identifies UCXX as a bundled C++ bindings library with no direct patterns or linking requirements, which aligns with the header-only nature of C++ bindings.

cpp/dependency_scan/metadata/cutlass.yml (1)

1-17: Comprehensive metadata with proper attribution.

The CUTLASS metadata includes appropriate licensing (BSD-3-Clause), copyright attribution, and correctly identifies it as a bundled template library. The empty patterns and linker flags are appropriate for header-only template code.

cpp/dependency_scan/metadata/dpkg.yml (1)

1-29: System library metadata is well-structured.

The dpkg.yml entries correctly follow the documented list format and include appropriate linker flags for common system libraries. The version strings match Ubuntu package versioning conventions. Empty patterns is correct as these libraries are resolved via dpkg-query.
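
For reference, `dpkg-query -S <path>` prints lines of the form `package[:arch]: /path`; a hedged sketch of extracting the owning package name from such a line:

```python
def package_from_dpkg_line(line: str) -> str:
    """'libc6:amd64: /lib/x86_64-linux-gnu/libc.so.6' -> 'libc6'."""
    owner = line.split(": ", 1)[0]   # "libc6:amd64"
    return owner.split(":", 1)[0]    # drop the architecture qualifier
```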

Comment on lines +321 to +334
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify .gitignore coverage for generated .d files.

The -MD -MP flags will generate many .d (dependency) files during builds. These should not be committed to version control. Confirm that a .gitignore rule exists to exclude *.d files (or a more specific pattern like **/*.d).

Run the following command to verify:


🏁 Script executed:

#!/bin/bash
# Check if .gitignore covers .d files
git check-ignore -v **/*.d 2>/dev/null | head -20 || echo "No .gitignore match found for .d files"

Length of output: 0


I need to verify this differently since .d files are generated during builds and don't currently exist in the repository. Let me check the .gitignore file directly for patterns that would exclude .d files.


---

🏁 Script executed:

#!/bin/bash
# Find and examine .gitignore files for .d file patterns
echo "=== Checking for .gitignore files ==="
fd -name ".gitignore" -type f

echo -e "\n=== Examining .gitignore content for .d file patterns ==="
if [ -f .gitignore ]; then
  echo "--- Root .gitignore ---"
  cat .gitignore | head -50
  echo -e "\n--- Searching for .d patterns ---"
  grep -n "\.d\|\.d/" .gitignore || echo "No .d patterns found in root .gitignore"
fi

echo -e "\n=== Checking cpp directory for .gitignore ==="
if [ -f cpp/.gitignore ]; then
  echo "--- cpp/.gitignore ---"
  cat cpp/.gitignore
fi

Length of output: 1345


Let me verify where CMake places build artifacts and if the existing cpp/build* gitignore pattern is sufficient.


---

🏁 Script executed:

#!/bin/bash
# Check CMakeLists.txt for build output configuration
echo "=== Checking CMakeLists.txt for build output directory setup ==="
rg -n "set\(CMAKE_BINARY_DIR|set\(CMAKE_BUILD_TYPE|CMAKE_RUNTIME_OUTPUT_DIRECTORY|CMAKE_LIBRARY_OUTPUT_DIRECTORY|CMAKE_ARCHIVE_OUTPUT_DIRECTORY" cpp/CMakeLists.txt -A2 | head -30

echo -e "\n=== Checking for other potential build locations ==="
rg -n "add_executable|add_library" cpp/CMakeLists.txt | head -10

echo -e "\n=== Verifying what cpp/build* pattern covers ==="
echo "Pattern 'cpp/build*' will match:"
echo "  - cpp/build"
echo "  - cpp/build/"
echo "  - cpp/build_debug"
echo "  - cpp/build_release"
echo "  - etc."

echo -e "\n=== Checking if .gitignore patterns are recursive ==="
grep -n "\.d\|\*\.o\|\*\.a\|\*\.so" .gitignore || echo "No object file patterns found"

Length of output: 660


Add explicit *.d pattern to .gitignore to ensure comprehensive coverage.

The review comment is correct. While the existing .gitignore includes cpp/build* (which covers standard build directories), this pattern is directory-specific and does not guarantee coverage if build artifacts are generated elsewhere. The -MD -MP flags add dependency tracking globally across CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS, so .d files could potentially be generated in various locations depending on the build configuration.

Adding an explicit *.d pattern to .gitignore provides robust, convention-independent coverage for all generated dependency files, regardless of build directory location.

🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 321 to 324, the project enables generation of
.d dependency files via -MD -MP but there is no explicit .gitignore entry for
them; update the repository .gitignore (root or relevant cpp/.gitignore) to add
a single line with the pattern *.d so generated dependency files are ignored
regardless of build directory, then commit that change.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Platform compatibility concern: -MD -MP flags may not work on Windows MSVC builds.

The -MD -MP compiler flags are GCC/Clang specific. Windows MSVC uses /showIncludes instead, which produces a different output format. The code lacks a platform guard (e.g., if(NOT WIN32) or compiler ID check), which could cause build failures or silently skip dependency tracking on Windows.

Please verify that this works on all supported platforms. If Windows builds are supported, consider:

 # Generate dependency files (.d) to track all header dependencies This creates
 # .d files alongside .o files showing all headers used
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
-set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+if(NOT WIN32)
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
+  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+endif()

Alternatively, if MSVC support is planned, please generate a separate code path using /showIncludes.


I'll verify the review comment by examining the codebase and confirming the compiler flag claims.





🏁 Script executed:

#!/bin/bash
# Examine the actual code at lines 321-324 in cpp/CMakeLists.txt
cat cpp/CMakeLists.txt | sed -n '310,335p'

Length of output: 1150


🏁 Script executed:

#!/bin/bash
# Search for platform guards and WIN32 checks in CMakeLists.txt
rg -n "WIN32|CMAKE_CXX_COMPILER_ID|MSVC" cpp/CMakeLists.txt -A2 -B2

Length of output: 1490


🏁 Script executed:

#!/bin/bash
# Check .gitignore for .d files handling
cat .gitignore | grep -E '\.d$|\.d\s' || echo "No .d pattern found in .gitignore"

Length of output: 33


🏁 Script executed:

#!/bin/bash
# Search for build documentation or platform support statements
fd -e md -e txt -e rst | xargs grep -l -i "platform\|windows\|msvc" | head -20

Length of output: 838


🌐 Web query:

GCC Clang -MD -MP compiler flags Windows MSVC compatibility

💡 Result:

Short answer

  • On GCC/Clang, -MD tells the driver to emit a Make-style dependency file (.d) as a side-effect of compilation; -MMD is like -MD but omits system headers; -MP adds phony targets for each header (avoids “missing file” errors). Use -MD (or -MMD) together with -MF or accept the default .d name. [1][2]

  • On MSVC, /MD means something completely different: it selects the DLL C runtime (multithreaded DLL). It is not a dependency-file option. Do not confuse GCC/Clang’s -MD with MSVC’s /MD. [3]

  • clang-cl (the MSVC-compatible Clang driver) aims to be compatible with MSVC options and emit MSVC-style include notes (/showIncludes), but the GCC-style -MD/-MMD/-MP depfile behavior is what you should use when you want Make/Ninja-style .d files from clang/gcc. Build tools (CMake/Ninja) handle the different formats (MSVC /showIncludes vs GCC depfiles). [4][1]

Practical recommendations

  • If you want GCC/Clang to produce Make/Ninja dependency files use:
    • gcc/clang: -MMD -MF <file>.d -MP (or -MD instead of -MMD if you want system headers included). [1][2]
  • If you want MSVC-style include parsing (MSBuild, cl.exe workflows) use cl.exe /showIncludes (or clang-cl which emits a similar “Note: including file:” prefix). Let your build system parse the output appropriately. [4]
  • Remember: don’t mix meanings — /MD (MSVC) ≠ -MD (gcc/clang). Ensure your build rules use the right flag for the toolchain.

Sources

  • GCC manual: -MD, -MMD, -MF, -MP descriptions. [1]
  • Practical explanation and examples (Stack Overflow). [2]
  • MSVC /MD meaning (runtime selection). [3]
  • clang-cl /showIncludes / MSVC compatibility and build-tool handling. [4]

References
[1] GCC manual (dependency options).
[2] Stack Overflow — using -MD/-MMD with -o.
[3] MSVC docs / Stack Overflow explanation of /MD vs /MT.
[4] CMake/clang-cl /showIncludes compatibility / clang-cl blog posts.


Add platform guard to prevent MSVC build failures on Windows.

The -MD -MP flags are GCC/Clang specific for generating .d dependency files. On MSVC, /MD means something completely different: it selects the DLL C runtime (multithreaded DLL) and is not a dependency-file option. The codebase already demonstrates this awareness—see line 356-358 which wraps similar CMAKE_CXX_FLAGS modifications with if(NOT WIN32). Without a guard, Windows MSVC builds will either fail or silently skip dependency tracking.

Additionally, .d files are not listed in .gitignore, risking accidental commits.

Wrap lines 321-324 with a platform guard:

 # Generate dependency files (.d) to track all header dependencies This creates
 # .d files alongside .o files showing all headers used
+if(NOT WIN32)
   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
   set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+endif()

And add .d files to .gitignore:

+*.d
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
if(NOT WIN32)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
endif()
🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 321 to 324, the flags `-MD -MP` are being
unconditionally appended to CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS which will
break or misconfigure MSVC on Windows; wrap those two set(...) lines in an
`if(NOT WIN32)` / `endif()` guard (matching the style used later at lines
~356-358) so GCC/Clang-only dependency flags are only applied on non-Windows
platforms, and also add a `.d` pattern (e.g. `*.d`) to the repository .gitignore
so generated dependency files are not accidentally committed.

Comment on lines 1 to 14
name: deepgemm
version: unknown
description: DeepGEMM optimized matrix multiplication library

source: bundled

patterns: []

linker_flags: []

path_components:
- DeepGEMM

aliases: []

⚠️ Potential issue | 🟡 Minor

Incomplete metadata: missing license, copyright, and homepage fields.

This file lacks license, copyright, and homepage fields that are present in other bundled dependency metadata (e.g., nlohmann-json.yml, pybind11.yml). For consistency and discoverability, these fields should be added if available.

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/deepgemm.yml lines 1-14, the metadata is missing
the standard fields `license`, `copyright`, and `homepage`; add these three keys
to match other bundled metadata files (e.g., nlohmann-json.yml, pybind11.yml).
Populate them with the actual values if available (preferred), otherwise set
them to a clear placeholder such as "unknown" for license and copyright and
"https://unknown" or "unknown" for homepage so the file structure remains
consistent.

Comment on lines 15 to 19
path_components:
- nvshmem-build
- nvshmem_project-prefix
- libs/nvshmem


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix path_components: remove slashed entry.

Matcher compares individual components; “libs/nvshmem” won’t match.

 path_components:
   - nvshmem-build
   - nvshmem_project-prefix
-  - libs/nvshmem
+  - nvshmem

Consider adding additional patterns (e.g., libnvshmem.so, libnvshmem_host.so) if present in your builds.

Is “license: NVIDIA” the correct SPDX form (e.g., “Proprietary”/“NVIDIA-proprietary”)? Is “source: bundled” accurate for this repo?


I'll verify the review comment by examining the file structure, the path_components matching logic, and license conventions.


---

🏁 Script executed:

#!/bin/bash
# Read the full nvshmem.yml file to understand context
cat cpp/dependency_scan/metadata/nvshmem.yml

Length of output: 362


🏁 Script executed:

#!/bin/bash
# Search for how path_components is used/matched in the codebase
rg -n "path_components" --type yaml -C 5 | head -100

Length of output: 5213


🏁 Script executed:

#!/bin/bash
# Look for matcher/parser logic that processes path_components
rg -n "path_component" --type py --type ts --type js -C 3 | head -150

Length of output: 12639


🏁 Script executed:

#!/bin/bash
# Find other similar metadata yml files to understand naming conventions
fd -e yml cpp/dependency_scan/metadata | head -20

Length of output: 417


🏁 Script executed:

#!/bin/bash
# Check for examples of license field in yml files
rg -n "^license:" cpp/dependency_scan/metadata/ --type yaml -A 1 | head -50

Length of output: 1220



---

🏁 Script executed:

#!/bin/bash
# Find the actual path matching implementation
rg -n "rightmost\|path_alias" cpp/dependency_scan/scan_build_artifacts.py -A 10 -B 2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for how path_components matching is used
rg -n "def.*match\|path_component" cpp/dependency_scan/scan_build_artifacts.py -C 5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for test cases that show path_components usage patterns
rg -n "path_components.*/" cpp/dependency_scan/ -C 2

Length of output: 495


🏁 Script executed:

#!/bin/bash
# Verify SPDX license formats
rg -n "^license:" cpp/dependency_scan/metadata/ --type yaml | grep -E "(MIT|Apache|BSD|Proprietary|LicenseRef|NVIDIA)"

Length of output: 440



---

🏁 Script executed:

#!/bin/bash
# Search for path_alias related code and matching logic
rg -n "path_alias" cpp/dependency_scan/scan_build_artifacts.py -C 8

Length of output: 6150


🏁 Script executed:

#!/bin/bash
# Look for how paths are matched in the dependency resolution
rg -n "def.*resolve\|def.*match\|def.*find" cpp/dependency_scan/scan_build_artifacts.py -A 15 | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's a split or path parsing function
rg -n "split.*path\|/\|separator" cpp/dependency_scan/scan_build_artifacts.py -C 3 | head -80

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at the full context around line 743-745 where path_components are processed
sed -n '740,800p' cpp/dependency_scan/scan_build_artifacts.py

Length of output: 2672



---

🌐 Web query:

SPDX license identifier for NVIDIA proprietary software

💡 Result:

Use a LicenseRef identifier. NVIDIA commonly uses "SPDX-License-Identifier: LicenseRef-NvidiaProprietary". (docs.nvidia.com)



Fix path_components and license format.

The path matcher splits artifact paths by / and checks individual components right-to-left. The entry libs/nvshmem is stored as a composite key and will never match.

Additionally, NVIDIA is not a valid SPDX license identifier. Use LicenseRef-NvidiaProprietary instead.

-license: NVIDIA
+license: LicenseRef-NvidiaProprietary

 path_components:
 - nvshmem-build
 - nvshmem_project-prefix
-- libs/nvshmem
+- nvshmem

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/nvshmem.yml around lines 15–19, split the
composite path_components entry "libs/nvshmem" into two separate components in
right-to-left order (i.e., "nvshmem" then "libs") so the path matcher can match
individual path segments, and replace the non-SPDX "NVIDIA" license identifier
with "LicenseRef-NvidiaProprietary"; update the YAML values accordingly.

├── tensorrt-llm.yml # Individual dependency
├── pytorch.yml
├── cuda-cudart-12.yml
└── ... (62+ dependencies)

⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code block.

Markdown best practices require specifying the language for syntax highlighting. This block appears to be YAML or shell—please add the appropriate identifier.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/README.md at line 21, the fenced code block
lacks a language specifier; update the opening fence to include the correct
language (e.g., ```yaml or ```bash depending on the block contents) so Markdown
renders syntax highlighting, ensuring the chosen identifier matches the block
content.

Comment on lines 148 to 156
## Pattern Matching Behavior

The scanner tries matches in this order:

1. **Exact match on basename**: `libfoo.so` == `libfoo.so`
2. **Exact match on full path**: `/path/to/libfoo.so` == `/path/to/libfoo.so`
3. **Substring match**: `foo.cpython` in `foo.cpython-312-x86_64.so`

**No need to specify match type** - the scanner tries both automatically!

⚠️ Potential issue | 🟡 Minor

Minor grammar: Use hyphen for compound modifier.

"Pattern Matching Behavior" should be hyphenated when used as a compound modifier, though as a section header this is stylistic. More critically, ensure consistency with other headers in the document.

🧰 Tools
🪛 LanguageTool

[grammar] ~148-~148: Use a hyphen to join words.
Context: ...ir validation_output/ ``` ## Pattern Matching Behavior The scanner tries mat...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/README.md around lines 148 to 156, the section
header "Pattern Matching Behavior" should use a hyphen when used as a compound
modifier to be consistent with other headers; change the header to
"Pattern-Matching Behavior" (or update other headers to match if you prefer no
hyphens) and ensure the same hyphenation style is applied consistently across
the document.

Comment on lines +98 to +101
def __init__(self):
self._cache: Dict[str, Optional[str]] = {}
self._lib_search_paths = self._get_library_search_paths()


⚠️ Potential issue | 🟠 Major

Make dpkg resolution robust: pre-check binary and resolve non-absolute lib basenames.

  • Avoid repeated FileNotFoundError by resolving dpkg-query once.
  • Handle inputs like "libcudart.so.12" (from wheel NEEDED) by searching standard lib paths (reuse -l logic).
@@
-    def __init__(self):
-        self._cache: Dict[str, Optional[str]] = {}
-        self._lib_search_paths = self._get_library_search_paths()
+    def __init__(self):
+        import shutil
+        self._cache: Dict[str, Optional[str]] = {}
+        self._lib_search_paths = self._get_library_search_paths()
+        self._dpkg_query = shutil.which("dpkg-query")
+        self._dpkg_available = self._dpkg_query is not None
@@
-        # Handle linker flags
+        # Handle linker flags
         if file_path.startswith("-l"):
             resolved_path = self.find_library_path(file_path)
             if not resolved_path:
                 self._cache[file_path] = None
                 return None
             file_path = resolved_path
+
+        # Handle non-absolute library basenames like "libfoo.so" / "libfoo.so.6" / "libfoo.a"
+        if not os.path.isabs(file_path):
+            base = os.path.basename(file_path)
+            # Try mapping to -l<name> and reuse search paths
+            m = re.match(r'^lib([A-Za-z0-9_\-\+]+)\.(?:so(?:\.\d+)*)$', base)
+            if m:
+                name = m.group(1)
+                resolved_path = self.find_library_path(f"-l{name}")
+                if resolved_path:
+                    file_path = resolved_path
+            elif base.startswith("lib") and base.endswith(".a"):
+                name = base[3:-2]
+                resolved_path = self.find_library_path(f"-l{name}")
+                if resolved_path:
+                    file_path = resolved_path
@@
-        # Query dpkg
+        # Short-circuit if dpkg-query is unavailable
+        if not getattr(self, "_dpkg_available", False):
+            self._cache[file_path] = None
+            return None
+
+        # Query dpkg
         try:
-            result = subprocess.run(["dpkg-query", "-S", file_path],
+            result = subprocess.run([self._dpkg_query, "-S", file_path],
                                     capture_output=True,
                                     text=True,
                                     timeout=5)

Also applies to: 166-228

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 98 to 101 (and apply
same changes to 166–228), the dpkg resolution and library basename handling are
brittle: resolve the dpkg-query binary once during initialization and cache its
path or mark it unavailable to avoid repeated FileNotFoundError attempts; when a
library name is not an absolute path (e.g., "libcudart.so.12"), treat it as a
basename and attempt to locate it by searching the collected _lib_search_paths
(reusing the existing -l style search logic) before falling back to dpkg-query;
update logic to normalize inputs, try search paths first for non-absolute names,
and only call dpkg-query once (using cached path) if search fails, handling and
logging the case where dpkg-query is not present.
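
Taken together, the suggested flow looks roughly like the following standalone sketch; the class and helper names are illustrative, not the PR's actual code:

```python
import os
import re
import shutil
import subprocess
from typing import Dict, List, Optional


class DpkgResolverSketch:
    """Illustrative resolver: locate dpkg-query once, normalize bare basenames."""

    def __init__(self, lib_search_paths: List[str]):
        self._cache: Dict[str, Optional[str]] = {}
        self._lib_search_paths = lib_search_paths
        # Resolve the binary once; None means dpkg is unavailable, so we
        # never hit FileNotFoundError on every lookup.
        self._dpkg_query = shutil.which("dpkg-query")

    def _search_lib_paths(self, basename: str) -> Optional[str]:
        for directory in self._lib_search_paths:
            candidate = os.path.join(directory, basename)
            if os.path.exists(candidate):
                return candidate
        return None

    def resolve(self, file_path: str) -> Optional[str]:
        if file_path in self._cache:
            return self._cache[file_path]
        # Inputs like "libcudart.so.12" (wheel NEEDED entries) are bare
        # basenames: try the standard library search paths first.
        base = os.path.basename(file_path)
        if not os.path.isabs(file_path) and re.match(
                r"^lib[\w\-+]+\.(?:so(?:\.\d+)*|a)$", base):
            found = self._search_lib_paths(base)
            if found:
                file_path = found
        if self._dpkg_query is None:
            self._cache[file_path] = None
            return None
        result = subprocess.run([self._dpkg_query, "-S", file_path],
                                capture_output=True, text=True, timeout=5)
        # "dpkg-query -S" prints "package[:arch]: /path" on success.
        package = result.stdout.split(":", 1)[0] if result.returncode == 0 else None
        self._cache[file_path] = package
        return package
```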

Comment on lines +1210 to +1213
    print(
        f"  Resolved {len(dpkg_mappings)} artifacts via dpkg ({len(dpkg_mappings) / len(artifacts) * 100:.1f}%)"
    )
    print()
Contributor

⚠️ Potential issue | 🔴 Critical

Guard division by zero in percentage prints.

When no artifacts are found, the current f-strings raise ZeroDivisionError.

-    print(
-        f"  Resolved {len(dpkg_mappings)} artifacts via dpkg ({len(dpkg_mappings) / len(artifacts) * 100:.1f}%)"
-    )
+    total_artifacts = len(artifacts)
+    dpkg_pct = (len(dpkg_mappings) / total_artifacts * 100) if total_artifacts else 0.0
+    print(f"  Resolved {len(dpkg_mappings)} artifacts via dpkg ({dpkg_pct:.1f}%)")
@@
-    print(
-        f"  Resolved {len(pattern_mappings)} additional artifacts via patterns ({len(pattern_mappings) / len(artifacts) * 100:.1f}%)"
-    )
+    pattern_pct = (len(pattern_mappings) / total_artifacts * 100) if total_artifacts else 0.0
+    print(f"  Resolved {len(pattern_mappings)} additional artifacts via patterns ({pattern_pct:.1f}%)")
@@
-    print(
-        f"  Mapped (dpkg): {dpkg_known} ({dpkg_known / len(artifacts) * 100:.1f}%)"
-    )
-    print(
-        f"  Mapped (patterns): {pattern_known} ({pattern_known / len(artifacts) * 100:.1f}%)"
-    )
-    print(
-        f"  Unknown: {total_unknown} ({total_unknown / len(artifacts) * 100:.1f}%)"
-    )
+    print(f"  Mapped (dpkg): {dpkg_known} ({(dpkg_known / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")
+    print(f"  Mapped (patterns): {pattern_known} ({(pattern_known / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")
+    print(f"  Unknown: {total_unknown} ({(total_unknown / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")

Also applies to: 1230-1233, 1265-1272

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 1210-1213 (and
similarly at 1230-1233 and 1265-1272), the f-strings calculate a percentage
using len(artifacts) as the denominator which will raise ZeroDivisionError when
artifacts is empty; update each print to guard the division by checking if
len(artifacts) is zero and use 0.0% (or format 0.0) in that case, e.g. compute
total = len(artifacts) and pct = (len(dpkg_mappings) / total * 100) if total
else 0.0, then print using pct to avoid dividing by zero.
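
A small helper keeps the guard in one place; this is an illustrative refactor of the diff above, with pct as a hypothetical name:

```python
def pct(part: int, total: int) -> float:
    """Percentage of part over total, returning 0.0 when total is zero."""
    return (part / total * 100.0) if total else 0.0


# With an empty artifact list this prints "(0.0%)" instead of raising.
artifacts, dpkg_mappings = [], []
print(f"  Resolved {len(dpkg_mappings)} artifacts via dpkg "
      f"({pct(len(dpkg_mappings), len(artifacts)):.1f}%)")
```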

Comment on lines +1241 to +1243
print(f" Reports written to:")
print(f" - {known_file}")
print(f" - {unknown_file}")
Contributor

⚠️ Potential issue | 🟡 Minor

Remove redundant f-strings (no placeholders).

Minor cleanup flagged by Ruff F541.

-    print(f"  Reports written to:")
+    print("  Reports written to:")
@@
-        print(f"  Re-run scanner to improve coverage")
+        print("  Re-run scanner to improve coverage")

Also applies to: 1306-1306

🧰 Tools
🪛 Ruff (0.14.1)

1241-1241: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 1241-1243 (and also
at line 1306), some print statements use f-strings without any placeholders
(e.g. print(f"  Reports written to:")), which Ruff flags as redundant F541;
replace these f-strings with plain string literals (remove the leading 'f') so
the prints become regular strings and no formatting is attempted.

Comment on lines 1 to 4
summary:
  count: 0
  action_required: Add patterns to YAML files in dependencies/ for these artifacts
artifacts: []
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Correct path reference in action_required message.

Line 3 references the dependencies/ directory, but the actual metadata directory is metadata/ throughout the codebase and documentation. This will confuse users trying to follow the guidance.

Update the action_required message to reference the correct directory:

- action_required: Add patterns to YAML files in dependencies/ for these artifacts
+ action_required: Add patterns to YAML files in metadata/ for these artifacts

Verify that the scanner code generating this file outputs the correct path reference. If the scanner hardcodes "dependencies/", that should be corrected to match the actual directory structure.


🏁 Script executed:

#!/bin/bash
# Verify the actual content of unknown.yml and directory structure
echo "=== Checking unknown.yml content ==="
find . -name "unknown.yml" -type f -exec cat {} \;

echo -e "\n=== Checking directory structure for 'dependencies' and 'metadata' ==="
find . -type d \( -name "dependencies" -o -name "metadata" \) | head -20

echo -e "\n=== Searching for 'dependencies/' references in documentation ==="
rg -n "dependencies/" --type=md --type=txt | head -20

echo -e "\n=== Searching for 'metadata/' references in documentation ==="
rg -n "metadata/" --type=md --type=txt | head -20

echo -e "\n=== Checking README.md for path references ==="
find . -name "README.md" -exec grep -n -E "(dependencies|metadata)" {} + | head -20

Length of output: 4447



🏁 Script executed:

#!/bin/bash
# Find scanner code that generates unknown.yml and the "dependencies/" message

echo "=== Searching for files that generate unknown.yml ==="
rg -n "unknown\.yml" --type=py --type=sh | head -20

echo -e "\n=== Searching for 'dependencies/' string in Python/code files ==="
rg -n "dependencies/" --type=py --type=sh | head -30

echo -e "\n=== Searching for 'action_required' in code ==="
rg -n "action_required" --type=py --type=sh | head -30

echo -e "\n=== Find all Python files in cpp/dependency_scan ==="
find cpp/dependency_scan -name "*.py" -type f

Length of output: 2022



🏁 Script executed:

#!/bin/bash
# Examine the actual code around line 1000 where the bug is

echo "=== Code context around line 1000 in scan_build_artifacts.py ==="
sed -n '995,1005p' cpp/dependency_scan/scan_build_artifacts.py

echo -e "\n=== Check the parameter definition for dependencies/metadata ==="
rg -n "dependencies.dir|metadata.dir" --type=py -A2 -B2

echo -e "\n=== Check PatternMatcher initialization and metadata_dir parameter ==="
sed -n '619,670p' cpp/dependency_scan/scan_build_artifacts.py

Length of output: 14794


Correct hardcoded path reference in action_required message and update stale docstrings.

The output message in cpp/dependency_scan/scan_build_artifacts.py at line 1000 references the incorrect directory. The actual implementation uses metadata/ (parameter --metadata-dir, default ./metadata/), but the message tells users to add patterns to dependencies/.

Update line 1000:

- 'Add patterns to YAML files in dependencies/ for these artifacts'
+ 'Add patterns to YAML files in metadata/ for these artifacts'

Also update stale docstring references to dependencies/ in lines 619, 626, 636, and 657 to consistently reference metadata/ instead.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around line 1000 (and docstrings
at lines 619, 626, 636, 657), the user-facing message incorrectly references
"dependencies/" instead of the actual metadata directory; update the
action_required message at line 1000 to reference the --metadata-dir (default
./metadata/) and replace all stale docstring occurrences of "dependencies/" with
"metadata/" so wording is consistent and mentions the --metadata-dir flag and
default path where appropriate.
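
One way to keep the message and reality in sync is to thread the resolved directory into the report writer; a hypothetical sketch (write_unknown_report and its signature are illustrative, not the scanner's actual function):

```python
from pathlib import Path

import yaml


def write_unknown_report(unknown_artifacts, metadata_dir: Path,
                         out_path: Path) -> None:
    """Write unknown.yml, naming the directory the scanner actually read."""
    report = {
        "summary": {
            "count": len(unknown_artifacts),
            "action_required":
            f"Add patterns to YAML files in {metadata_dir}/ for these artifacts",
        },
        "artifacts": sorted(unknown_artifacts),
    }
    out_path.write_text(yaml.safe_dump(report, sort_keys=False))
```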

@@ -0,0 +1,952 @@
#!/usr/bin/env python3
Contributor

⚠️ Potential issue | 🟠 Major

Add NVIDIA header; drop shebang; fix lint nits (F541, E712).

Complies with repo guidelines and cleans Ruff warnings.

-#!/usr/bin/env python3
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
@@
-        d_file.write_text(f"build/foo.o: ../include/test.h\n")
+        d_file.write_text("build/foo.o: ../include/test.h\n")
@@
-        link_file.write_text(f"/usr/bin/c++ -o foo @response.rsp\n")
+        link_file.write_text("/usr/bin/c++ -o foo @response.rsp\n")
@@
-            assert artifacts[0].metadata.get('static') == True
+            assert artifacts[0].metadata.get('static') is True

Import style: prefer module namespace per guidelines.

-from scan_build_artifacts import (Artifact, ArtifactCollector, DpkgResolver,
-                                  Mapping, OutputGenerator, PatternMatcher)
+import scan_build_artifacts as sba

and reference as sba.Artifact, etc. Apply only if it doesn’t overcomplicate the test changes.

Also applies to: 229-229, 260-260, 284-284

🧰 Tools
🪛 Ruff (0.14.1)

1-1: Shebang is present but file is not executable

(EXE001)

🤖 Prompt for AI Agents
In cpp/dependency_scan/tests/test_scan_build_artifacts.py around lines 1 (and
also apply similar changes at lines 229, 260, 284): remove the leading shebang,
add the standard NVIDIA file header/license block at the top, fix Ruff lint
issues by replacing bare f-strings with regular strings where there are no
interpolations (F541) and avoid comparisons like "== True/False" by using direct
boolean checks or "is True"/"is False" as appropriate (E712), and adjust imports
to the module namespace style (e.g., import the module as sba and reference
types/functions as sba.Artifact) unless doing so would overcomplicate the tests.
Ensure changes are minimal and only alter imports/strings/boolean comparisons
and header/shebang as described.
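
Put together, the top of the test module might look like this; the header is as in the diff above, while the test body is a made-up illustration of the E712 and import-style points (it assumes scan_build_artifacts is importable from the tests):

```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# (Apache-2.0 header as shown in the diff above; no shebang.)

import scan_build_artifacts as sba  # module-namespace import per guidelines


def test_static_flag_uses_identity_check():
    # Illustrative metadata dict shaped like Artifact.metadata; the real
    # tests construct sba.Artifact instances instead.
    metadata = {"static": True}
    assert metadata.get("static") is True  # "is True", not "== True" (E712)
```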

@venkywonka venkywonka force-pushed the venky/gen-dependency-artifacts branch from dd4c6ca to dc413c7 on October 25, 2025 06:53
Signed-off-by: Venky Ganesh <[email protected]>
@venkywonka venkywonka force-pushed the venky/gen-dependency-artifacts branch from dc413c7 to 22a3372 on October 25, 2025 06:59
@venkywonka venkywonka marked this pull request as draft October 25, 2025 07:00
@juney-nvidia
Collaborator

@venkywonka Hi Venky, can you briefly explain the principle of this scanning approach? And how long does it take to finish one round of scanning?

Thanks
June

@@ -0,0 +1,4 @@
summary:
Collaborator

We should probably add these files to .gitignore, right?

Collaborator Author

yep! planning to add it

Usage:
    python scan_build_artifacts.py --build-dir build/ --output-dir scan_output/
    python scan_build_artifacts.py --validate  # Validate YAML files
"""
Collaborator

There's a lot of code here I'll want to review, but I can do that as I work on integrating it with the dependency database. What do you think about adding an "in development" or "not ready for use" disclaimer to the README for now? That way you can submit your work now, and I can remove the disclaimer once I'm done reviewing and integrating the code. I'll also patch this script as necessary if I find any issues, so you can review the patches instead of implementing them.

Collaborator Author
@venkywonka venkywonka Oct 27, 2025

Sure @tburt-nv, I agree with that too. I wanted to put this out here, even though it's very much WIP, so that there's concreteness on how this looks to you and your async dev can be informed of the same, while I continue to smooth out the edges. I don't intend to merge this immediately, but if you think merging it with a "not ready for use" disclaimer is better, I'm good with that too.

@venkywonka venkywonka changed the title [None] [feat] Add C++ dependency scanning system [None] [feat] [Do Not Review] Add C++ dependency scanning system Oct 27, 2025
@venkywonka
Collaborator Author

@venkywonka Hi Venky, can you briefly explain the principle of this scanning approach? And how long does it take to finish one round of scanning?

Thanks June

hey @juney-nvidia, I have changed this to draft.
I have tried to outline things in the attached README, but basically: to find the dependencies that end up in the binaries we build and ship, the scanner takes a first-principles approach of tracking them from the information CMake gives us. That means the .d depfiles for the headers that actually get included, plus the link.txt files for any precompiled libraries that get linked in.

On top of that, the code also takes a built wheel, looks at the bundled binaries we ship with it, and reads the runtime dependencies they mandate.

As for how long it takes, one round of scanning takes maybe a minute max.
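
To make the principle concrete, here is a minimal sketch of the two extraction steps described above (helper names are illustrative, not the scanner's actual code):

```python
import re
from pathlib import Path


def headers_from_depfile(path: Path) -> set[str]:
    """Collect header paths from a Make-style 'target.o: dep1 dep2 ...' depfile."""
    text = path.read_text().replace("\\\n", " ")  # join line continuations
    deps: set[str] = set()
    for line in text.splitlines():
        _, sep, rhs = line.partition(":")
        if sep:
            deps.update(t for t in rhs.split()
                        if t.endswith((".h", ".hpp", ".cuh")))
    return deps


def libs_from_link_txt(path: Path) -> set[str]:
    """Collect libraries named on a CMake link.txt command line."""
    libs: set[str] = set()
    for tok in path.read_text().split():
        if tok.startswith("-l") or re.search(r"\.(so(\.\d+)*|a)$", tok):
            libs.add(tok)
    return libs
```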

- Added a section on recent improvements, highlighting the achievement of 100% artifact coverage through parser bug fixes and pattern enhancements.
- Updated usage instructions for the  script, including command-line arguments.
- Improved metadata structure by introducing new fields and updating existing ones to better reflect dependency sources and matching patterns.
- Removed the deprecated  file and consolidated system libraries into .
- Enhanced test coverage for artifact collection, including handling of malformed paths and CMakeFiles artifacts.

This commit aims to improve the clarity and functionality of the dependency scanning process.

Signed-off-by: Venky Ganesh <[email protected]>
Signed-off-by: Venky Ganesh <[email protected]>