
Conversation

@venkywonka
Collaborator

@venkywonka venkywonka commented Oct 25, 2025

Summary by CodeRabbit

  • New Features

    • Added automated build artifact dependency scanning for C++ and CUDA projects.
    • Implemented header dependency tracking during compilation.
    • Created extensible metadata system for managing and resolving dependencies.
  • Documentation

    • Added comprehensive guides for the dependency scanner, metadata management, and workflow integration.
    • Included schema validation and templates for dependency definitions.
  • Tests

    • Added comprehensive test suite for artifact collection, dependency resolution, and report generation.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@venkywonka venkywonka marked this pull request as ready for review October 25, 2025 06:43
@venkywonka venkywonka requested a review from a team as a code owner October 25, 2025 06:43
@coderabbitai coderabbitai bot changed the title [None] @coderabbitai title [None] [feat] Add C++ dependency scanning system Oct 25, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 25, 2025

📝 Walkthrough


Introduces a new dependency scanning system for TensorRT-LLM C++ builds. A CMakeLists.txt modification enables header dependency tracking. A Python scanner collects build artifacts from .d files, link.txt, and wheels, then resolves them via dpkg-query or YAML pattern matching, generating known/unknown YAML reports. Metadata YAML files define library patterns and path components.
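
For orientation, here is a minimal sketch of the two-tier resolution described above. The function and parameter names are illustrative only, not the module's actual API:

```python
from typing import Callable, Dict, List, Optional

def resolve_artifacts(
    artifacts: List[str],
    dpkg_lookup: Callable[[str], Optional[str]],  # e.g. a wrapper around `dpkg-query -S`
    yaml_patterns: Dict[str, List[str]],          # dependency name -> substring patterns
) -> Dict[str, Optional[str]]:
    """Illustrative two-tier resolution: dpkg is PRIMARY, YAML patterns are FALLBACK."""
    resolved: Dict[str, Optional[str]] = {}
    for artifact in artifacts:
        owner = dpkg_lookup(artifact)  # tier 1: ask the system package database
        if owner is None:              # tier 2: fall back to metadata patterns
            owner = next(
                (dep for dep, patterns in yaml_patterns.items()
                 if any(p in artifact for p in patterns)),
                None,
            )
        resolved[artifact] = owner     # None entries end up in the unknown report
    return resolved
```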

Changes

Cohort / File(s) Summary
Build configuration
cpp/CMakeLists.txt
Appends -MD -MP compiler flags to CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS for automatic dependency tracking file (.d) generation.
Scanning documentation
cpp/dependency_scan/README.md, cpp/dependency_scan/metadata/README.md
Comprehensive guides documenting the artifact scanner architecture, two-tier resolution strategy (dpkg PRIMARY / YAML FALLBACK), workflow, metadata structure, installation, usage, and troubleshooting.
Metadata schema and template
cpp/dependency_scan/metadata/_schema.yml, cpp/dependency_scan/metadata/_template.yml
JSON Schema for validating dependency YAML files (required fields: name, version, description; constraints on naming, lengths, linker flags). YAML template providing reusable blueprint with inline documentation for creating dependency manifests.
Dependency metadata
cpp/dependency_scan/metadata/{cuda,cutlass,deep-ep,deepgemm,cppzmq,nanobind,nlohmann-json,nvshmem,openmpi,pybind11,pytorch,tensorrt,tensorrt-llm,ucx,ucxx,xgrammar}.yml, cpp/dependency_scan/metadata/dpkg.yml
16 new YAML metadata files defining dependencies with fields: name, version, description, license, patterns (binary/library names), linker_flags, path_components, and aliases. Includes bundled libraries (PyTorch, CUTLASS, Pybind11) and system packages (libc6, libzmq5, libnuma1).
Core scanner implementation
cpp/dependency_scan/scan_build_artifacts.py
New Python module implementing a multi-stage artifact scanner with 6 classes: Artifact (data model), Mapping (artifact→dependency binding), DpkgResolver (library→package resolution via dpkg-query), ArtifactCollector (parses .d files, link.txt, wheels; extracts NEEDED libraries via readelf), PatternMatcher (YAML-driven fallback resolution with exact/path/generic strategies), OutputGenerator (produces known.yml and unknown.yml reports). CLI with --build-dir, --output-dir, --metadata-dir, --validate options.
Testing
cpp/dependency_scan/tests/test_scan_build_artifacts.py
Extensive test suite with 100+ test cases covering: data model serialization, DpkgResolver caching and CUDA normalization, ArtifactCollector D/link file parsing with continuations and response files, PatternMatcher exact/substring/path alias matching and YAML schema validation, OutputGenerator report structure, and end-to-end integration workflows with mocked filesystem and subprocess calls.
Sample output
cpp/dependency_scan/scan_output/unknown.yml
Example report template with empty artifacts list and summary showing count=0.
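
As background for the .d-file parsing mentioned in the table above: -MD -MP emits Make-style dependency files with backslash line continuations and phony header targets. A hedged parsing sketch (not the PR's actual _parse_d_file):

```python
from pathlib import Path
from typing import List

def parse_d_file(d_file: Path) -> List[str]:
    """Extract dependency paths from a Make-style .d file.

    Example input produced by -MD -MP:
        foo.o: foo.cpp foo.h \\
         bar.h
        foo.h:
        bar.h:
    The phony 'foo.h:' targets added by -MP have empty right-hand sides
    and contribute nothing. Paths with escaped spaces are not handled here.
    """
    text = d_file.read_text()
    joined = text.replace("\\\n", " ")  # fold backslash line continuations
    deps: List[str] = []
    for line in joined.splitlines():
        if ":" not in line:
            continue
        _, _, rhs = line.partition(":")  # drop the "target.o:" prefix
        deps.extend(rhs.split())
    return deps
```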

Sequence Diagram

sequenceDiagram
    participant CLI as CLI/main()
    participant AC as ArtifactCollector
    participant DR as DpkgResolver
    participant PM as PatternMatcher
    participant OG as OutputGenerator

    CLI->>AC: collect_all() from build_dir
    activate AC
    AC->>AC: parse .d files (headers)
    AC->>AC: parse link.txt (libraries)
    AC->>AC: scan .whl files
    AC->>AC: readelf -d (NEEDED)
    AC-->>CLI: List[Artifact]
    deactivate AC

    CLI->>DR: resolve artifacts (primary: dpkg)
    activate DR
    DR->>DR: dpkg-query for library paths
    DR-->>CLI: Mapping[success]
    deactivate DR

    CLI->>PM: match remaining (fallback: YAML)
    activate PM
    PM->>PM: load metadata/*.yml
    PM->>PM: exact pattern match
    alt match found
        PM-->>CLI: Mapping[high/medium/low confidence]
    else no match
        PM-->>CLI: unresolved Artifact
    end
    deactivate PM

    CLI->>OG: generate(mappings, artifacts)
    activate OG
    OG->>OG: group by dependency → known.yml
    OG->>OG: collect unmapped → unknown.yml
    OG->>OG: compute coverage & confidence stats
    OG-->>CLI: (known.yml, unknown.yml)
    deactivate OG

    CLI->>CLI: print summary report

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • scan_build_artifacts.py: Six new classes with intricate parsing logic (D files with line continuations, link.txt response file handling, readelf dynamic section parsing, multi-stage resolution pipeline with three matching strategies). Requires careful review of regex patterns, subprocess calls, and error handling around file I/O.
  • test_scan_build_artifacts.py: Over 20 test functions with heavy mocking of subprocess, filesystem, and YAML loading. Validates cascading fallback logic and edge cases (CUDA package normalization, duplicate patterns, underscore-file filtering). Complexity in test fixture setup and assertion chains.
  • Metadata YAML files: While individually simple and repetitive (15+ dependency files), the collective addition is substantial. Verify schema compliance, pattern correctness, and no cross-dependency conflicts.
  • CMakeLists.txt & documentation: Straightforward additions; low review burden.

Areas requiring extra attention:

  • Path canonicalization and LD_LIBRARY_PATH handling in DpkgResolver (security/correctness)
  • Regex patterns in .d file parsing for line continuations and special path cases
  • Subprocess call robustness (readelf, dpkg-query) and error message handling
  • YAML schema validation logic and dpkg.yml format parsing (nested dependency lists)
  • Confidence scoring strategy and coverage calculation in PatternMatcher and OutputGenerator

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is entirely incomplete. It contains only the template structure with no substantive content filled in across any of the required sections. The Description section is empty, the Test Coverage section is empty, the PR Checklist items are all unchecked with no supporting details, and there is no explanation of what the changes accomplish or why they are necessary. The PR body appears to be submitted as-is from the template without any meaningful information provided by the author. The author must complete the PR description by: (1) providing a clear explanation of what the C++ dependency scanning system does and why it is needed in the Description section; (2) documenting the test coverage by referencing the test file (cpp/dependency_scan/tests/test_scan_build_artifacts.py) and explaining what test cases validate the new functionality; and (3) addressing the PR Checklist items, particularly ensuring that the changes follow coding guidelines, test coverage is adequate, and any necessary documentation updates have been made.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 96.55% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The pull request title "Add C++ dependency scanning system" is directly and clearly related to the main change in the changeset. The raw summary confirms that the PR introduces a comprehensive dependency scanning system consisting of CMakeLists.txt modifications, README documentation, multiple metadata YAML files, a Python scanning script (scan_build_artifacts.py), and corresponding tests. The title accurately captures this primary objective, though it does contain formatting deviations from the template (extra spacing between tags and the non-standard "[Do Not Review]" annotation that adds noise).



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (18)
cpp/dependency_scan/metadata/README.md (1)

1-760: Excellent comprehensive documentation.

The README provides thorough guidance on the dependency metadata system, including clear examples, troubleshooting steps, and best practices. The structure flows well from quick start through advanced usage. This will significantly aid contributors in understanding and maintaining dependency metadata.

Consider adding code block language specifiers, as noted in the individual comments above, to align with markdown linting standards.

cpp/dependency_scan/metadata/pytorch.yml (1)

11-12: Broaden PyTorch linker flags for better matching.

Add common flags used in practice so -l matches don’t fall back to generic inference.

 linker_flags:
-  - -ltorch_python
+  - -ltorch_python
+  - -ltorch
+  - -lc10
+  - -ltorch_cpu
+  - -ltorch_cuda

Optionally add license/homepage/source fields if known to aid attribution.

Also applies to: 13-18

cpp/dependency_scan/metadata/tensorrt-llm.yml (2)

19-31: Reduce maintenance: use stable substring patterns for Python extensions.

Replacing many ABI-specific entries with a stable prefix cuts churn across Python versions.

 # Python bindings
-- bindings.cpython-312-x86_64-linux-gnu.so
-- bindings.cpython-310-x86_64-linux-gnu.so
-- bindings.cpython-311-x86_64-linux-gnu.so
+- bindings.cpython-

 # Deep learning integration bindings
-- deep_ep_cpp_tllm.cpython-312-x86_64-linux-gnu.so
-- deep_ep_cpp_tllm.cpython-310-x86_64-linux-gnu.so
-- deep_ep_cpp_tllm.cpython-311-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-312-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-310-x86_64-linux-gnu.so
-- deep_gemm_cpp_tllm.cpython-311-x86_64-linux-gnu.so
+- deep_ep_cpp_tllm.cpython-
+- deep_gemm_cpp_tllm.cpython-

The matcher already does substring checks. Add “-ltensorrt_llm” if the project links it via -l.

-linker_flags: []
+linker_flags:
+  - -ltensorrt_llm

Please confirm the exact shared object names (e.g., libnvinfer_plugin_tensorrt_llm.so) align with produced artifacts and whether Python 3.8/3.9 need coverage.


2-2: Version source of truth.

Hardcoding 1.2.0rc1 can drift. Prefer generating this from the project build/version to avoid stale metadata.

cpp/dependency_scan/metadata/_template.yml (1)

58-65: Clarify path_components constraint.

Document that components must be single segments (no “/”) to match PatternMatcher behavior.

 # Path components that indicate this dependency
 # When these strings appear in a binary path, map to this dependency
+# Note: use single directory/file name segments only (no slashes),
+# because the matcher splits paths by "/" and compares components.
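
A small sketch of the component-wise comparison this note describes (a hypothetical helper mirroring the documented behavior, not the PR's code):

```python
from typing import List

def matches_path_components(artifact_path: str, components: List[str]) -> bool:
    """True if any metadata component equals one segment of the path.

    Because the path is split on "/", a slashed entry such as
    "libs/nvshmem" is compared against single segments and never matches.
    """
    segments = artifact_path.split("/")
    return any(component in segments for component in components)

# matches_path_components("/build/libs/nvshmem/libnvshmem.so", ["nvshmem"])      -> True
# matches_path_components("/build/libs/nvshmem/libnvshmem.so", ["libs/nvshmem"]) -> False
```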
cpp/dependency_scan/metadata/ucx.yml (1)

11-13: Optional: include UCX linker flags for completeness.

Add commonly used flags so -l resolution can map without dpkg.

-linker_flags: []
+linker_flags:
+  - -lucp
+  - -lucs
+  - -luct
+  - -lucm
cpp/dependency_scan/metadata/_schema.yml (1)

79-83: Harden schema with additionalProperties.

Prevent typos/unexpected fields from silently passing.

 anyOf:
   - required: ["patterns"]
   - required: ["linker_flags"]
   - required: ["path_components"]
+additionalProperties: false

README currently lists “aliases” as part of “at least one required” sections; schema does not. Please align docs (don’t include aliases) or extend schema if that’s intended.

cpp/dependency_scan/README.md (6)

1-1: Use project-neutral title.

Replace “Claude Code’s” with project naming.

-# Claude Code's Minimal Artifact Scanner
+# TensorRT-LLM Minimal Artifact Scanner

21-21: Fix markdownlint MD040: specify languages for fenced code blocks.

Add appropriate languages to silence the linter and improve rendering.

-```
+```text

(repeat for the ASCII diagrams/outputs above)



Also applies to: 442-442, 598-598

---

`49-55`: **Correct dependencies statement.**

The tool imports PyYAML and optionally jsonschema; also relies on external tools (dpkg-query, readelf).

-## Installation
-
-No installation required! The script is self-contained with only Python stdlib dependencies.
+## Installation
+
+Requires Python 3.8+, PyYAML, and optionally jsonschema for schema validation.
+System tools: dpkg-query (Debian/Ubuntu) and readelf (binutils).
+Install: `pip install pyyaml jsonschema` (jsonschema optional).

53-56: Avoid hardcoded absolute paths.

Use repo-relative commands to improve portability.

-cd /home/gvenkatarama/tensorrt_llm/cpp/dependency_scan
+cd cpp/dependency_scan
@@
-# After building TensorRT-LLM
-cd /home/gvenkatarama/tensorrt_llm
+# After building TensorRT-LLM
+cd <repo-root>

Also applies to: 91-99


589-591: Align docs with schema: aliases aren’t a required matching section.

Remove “aliases” from the “at least one required” list or update schema accordingly.

-**Pattern sections (at least one required):**
-- `patterns`, `linker_flags`, `path_components`, `aliases`
+**Pattern sections (at least one required):**
+- `patterns`, `linker_flags`, `path_components`

645-646: Remove AI-generation footer.

Drop “Generated by Claude Code” branding from project docs.

-**Generated by Claude Code** | **Phase 1 Complete** | **Target Coverage: ≥98%**
+**Phase 1 Complete** | **Target Coverage: ≥98%**

Also ensure referenced metadata/README.md exists or update link to _template.yml and _schema.yml.

cpp/dependency_scan/scan_build_artifacts.py (5)

400-421: Avoid external 'find'; use pathlib.rglob within build_dir.

Reduces subprocess usage (S603/S607) and improves portability.

-                    import subprocess
-                    try:
-                        result = subprocess.run([
-                            'find',
-                            str(self.build_dir), '-name', basename, '-type', 'f'
-                        ],
-                                                capture_output=True,
-                                                text=True,
-                                                timeout=5)
-                        if result.returncode == 0 and result.stdout.strip():
-                            matches = result.stdout.strip().split('\n')
-                            # Try to find match with similar relative path structure
-                            for match in matches:
-                                if header_path in match or match.endswith(
-                                        header_path):
-                                    canonical_path = os.path.realpath(match)
-                                    break
-                            # If no exact match, use first match
-                            if not os.path.exists(canonical_path) and matches:
-                                canonical_path = os.path.realpath(matches[0])
-                    except Exception:
-                        pass
+                    try:
+                        matches = [str(p) for p in self.build_dir.rglob(basename)]
+                        if matches:
+                            # Prefer closest path-structure match
+                            for match in matches:
+                                if header_path in match or match.endswith(header_path):
+                                    canonical_path = os.path.realpath(match)
+                                    break
+                            if not os.path.exists(canonical_path):
+                                canonical_path = os.path.realpath(matches[0])
+                    except OSError:
+                        pass

542-551: Stabilize wheel artifact paths for dedup/reporting.

Prefix with wheel identifier to avoid collisions and ephemeral temp paths.

-                    artifacts.append(
-                        Artifact(path=str(so_file.relative_to(temp_path)),
+                    artifacts.append(
+                        Artifact(path=f"wheel://{wheel_file.name}/{so_file.relative_to(temp_path)}",
                                  type='binary',
                                  source=str(wheel_file),
                                  metadata={
                                      'wheel': wheel_file.name,
                                      'needed': needed_libs
                                  }))
@@
-                        artifacts.append(
-                            Artifact(path=needed_lib,
+                        artifacts.append(
+                            Artifact(path=needed_lib,
                                     type='library',
                                     source=str(wheel_file),
                                     metadata={
-                                        'from_binary':
-                                        str(so_file.relative_to(temp_path)),
+                                        'from_binary': f"wheel://{wheel_file.name}/{so_file.relative_to(temp_path)}",
                                         'dynamic_dependency':
                                         True
                                     }))

Also applies to: 553-564


223-227: Replace broad except Exception and silent pass/continue with specific exceptions + logging.

Catching everything hides real failures; add logging and handle OSError, subprocess.TimeoutExpired, FileNotFoundError explicitly. Keep try bodies minimal.

I can post a focused diff wiring logging.getLogger(__name__) and updating these sites in one pass. Based on coding guidelines.

Also applies to: 337-338, 420-421, 430-431, 459-460, 498-499, 565-566, 606-607, 693-695, 1089-1091
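
A minimal sketch of the suggested pattern (logger and function names are placeholders, not the module's actual code):

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger(__name__)

def get_needed_libraries(binary_path: str) -> List[str]:
    """Run readelf -d and return NEEDED entries, logging specific failures."""
    try:
        result = subprocess.run(
            ["readelf", "-d", binary_path],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Specific exceptions plus logging, instead of a silent `except Exception: pass`.
        logger.warning("readelf failed for %s: %s", binary_path, exc)
        return []
    else:
        # Keep the try body minimal; parse in the else branch.
        # Lines look like: " 0x...01 (NEEDED)  Shared library: [libc.so.6]"
        return [line.split("[")[1].rstrip("]")
                for line in result.stdout.splitlines()
                if "(NEEDED)" in line and "[" in line]
```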


971-1007: Prefer yaml.safe_dump for emitting reports.

Avoids emitting arbitrary Python tags; preserves current structure.

-        with open(known_file, 'w') as f:
-            yaml.dump(
+        with open(known_file, 'w') as f:
+            yaml.safe_dump(
@@
-        with open(unknown_file, 'w') as f:
-            yaml.dump(
+        with open(unknown_file, 'w') as f:
+            yaml.safe_dump(

616-631: Unify terminology: “metadata” vs “dependencies”.

The code uses --metadata-dir but texts mention “dependencies/”. Standardize on “metadata” to reduce confusion.

Apply small string edits in docstrings/help/prints to consistently reference the metadata directory.

Also applies to: 1000-1006, 1128-1130

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b27810 and dd4c6ca.

📒 Files selected for processing (25)
  • cpp/CMakeLists.txt (1 hunks)
  • cpp/dependency_scan/README.md (1 hunks)
  • cpp/dependency_scan/metadata/README.md (1 hunks)
  • cpp/dependency_scan/metadata/_schema.yml (1 hunks)
  • cpp/dependency_scan/metadata/_template.yml (1 hunks)
  • cpp/dependency_scan/metadata/cppzmq.yml (1 hunks)
  • cpp/dependency_scan/metadata/cuda.yml (1 hunks)
  • cpp/dependency_scan/metadata/cutlass.yml (1 hunks)
  • cpp/dependency_scan/metadata/deep-ep.yml (1 hunks)
  • cpp/dependency_scan/metadata/deepgemm.yml (1 hunks)
  • cpp/dependency_scan/metadata/dpkg.yml (1 hunks)
  • cpp/dependency_scan/metadata/nanobind.yml (1 hunks)
  • cpp/dependency_scan/metadata/nlohmann-json.yml (1 hunks)
  • cpp/dependency_scan/metadata/nvshmem.yml (1 hunks)
  • cpp/dependency_scan/metadata/openmpi.yml (1 hunks)
  • cpp/dependency_scan/metadata/pybind11.yml (1 hunks)
  • cpp/dependency_scan/metadata/pytorch.yml (1 hunks)
  • cpp/dependency_scan/metadata/tensorrt-llm.yml (1 hunks)
  • cpp/dependency_scan/metadata/tensorrt.yml (1 hunks)
  • cpp/dependency_scan/metadata/ucx.yml (1 hunks)
  • cpp/dependency_scan/metadata/ucxx.yml (1 hunks)
  • cpp/dependency_scan/metadata/xgrammar.yml (1 hunks)
  • cpp/dependency_scan/scan_build_artifacts.py (1 hunks)
  • cpp/dependency_scan/scan_output/unknown.yml (1 hunks)
  • cpp/dependency_scan/tests/test_scan_build_artifacts.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • cpp/dependency_scan/tests/test_scan_build_artifacts.py
  • cpp/dependency_scan/scan_build_artifacts.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • cpp/dependency_scan/tests/test_scan_build_artifacts.py
  • cpp/dependency_scan/scan_build_artifacts.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • cpp/dependency_scan/tests/test_scan_build_artifacts.py
  • cpp/dependency_scan/scan_build_artifacts.py
🧠 Learnings (3)
📚 Learning: 2025-08-21T21:48:35.135Z
Learnt from: djns99
PR: NVIDIA/TensorRT-LLM#7104
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:399-417
Timestamp: 2025-08-21T21:48:35.135Z
Learning: CUTLASS extensions in TensorRT-LLM (located under cpp/tensorrt_llm/cutlass_extensions/) are designed to integrate with and extend functionality in the external CUTLASS repository. When analyzing these extensions, their consumers and functionality wiring may exist in the CUTLASS codebase rather than within TensorRT-LLM itself.

Applied to files:

  • cpp/dependency_scan/metadata/cutlass.yml
📚 Learning: 2025-09-16T09:30:09.716Z
Learnt from: tongyuantongyu
PR: NVIDIA/TensorRT-LLM#7763
File: cpp/tensorrt_llm/CMakeLists.txt:297-301
Timestamp: 2025-09-16T09:30:09.716Z
Learning: In the TensorRT-LLM project, NCCL libraries are loaded earlier by PyTorch libraries or the bindings library, so the main shared library doesn't need NCCL paths in its RPATH - the libraries will already be available in the process address space when needed.

Applied to files:

  • cpp/dependency_scan/metadata/tensorrt-llm.yml
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • cpp/dependency_scan/metadata/tensorrt-llm.yml
🧬 Code graph analysis (1)
cpp/dependency_scan/tests/test_scan_build_artifacts.py (1)
cpp/dependency_scan/scan_build_artifacts.py (21)
  • Artifact (47-57)
  • ArtifactCollector (261-609)
  • DpkgResolver (81-253)
  • Mapping (61-73)
  • OutputGenerator (910-1008)
  • PatternMatcher (617-902)
  • to_dict (55-57)
  • to_dict (69-73)
  • find_library_path (128-164)
  • get_package (166-227)
  • _normalize_cuda_package (230-253)
  • _parse_d_file (311-433)
  • _parse_link_file (435-501)
  • _get_needed_libraries (571-609)
  • _scan_wheel (503-568)
  • collect_all (274-309)
  • _match_exact_library (792-836)
  • _match_path_alias (838-870)
  • _match_generic_library (872-902)
  • match (765-790)
  • generate (920-1008)
🪛 LanguageTool
cpp/dependency_scan/metadata/README.md

[grammar] ~148-~148: Use a hyphen to join words.
Context: ...ir validation_output/ ``` ## Pattern Matching Behavior The scanner tries mat...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
cpp/dependency_scan/metadata/README.md

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


442-442: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


598-598: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

cpp/dependency_scan/README.md

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


442-442: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


598-598: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.1)
cpp/dependency_scan/tests/test_scan_build_artifacts.py

1-1: Shebang is present but file is not executable

(EXE001)


229-229: f-string without any placeholders

Remove extraneous f prefix

(F541)


260-260: f-string without any placeholders

Remove extraneous f prefix

(F541)


284-284: Avoid equality comparisons to True; use artifacts[0].metadata.get('static'): for truth checks

Replace with artifacts[0].metadata.get('static')

(E712)

cpp/dependency_scan/scan_build_artifacts.py

200-200: subprocess call: check for execution of untrusted input

(S603)


200-200: Starting a process with a partial executable path

(S607)


223-224: try-except-pass detected, consider logging the exception

(S110)


223-223: Do not catch blind exception: Exception

(BLE001)


337-337: Do not catch blind exception: Exception

(BLE001)


402-402: subprocess call: check for execution of untrusted input

(S603)


402-405: Starting a process with a partial executable path

(S607)


420-421: try-except-pass detected, consider logging the exception

(S110)


420-420: Do not catch blind exception: Exception

(BLE001)


430-431: try-except-continue detected, consider logging the exception

(S112)


430-430: Do not catch blind exception: Exception

(BLE001)


459-459: Do not catch blind exception: Exception

(BLE001)


498-499: try-except-continue detected, consider logging the exception

(S112)


498-498: Do not catch blind exception: Exception

(BLE001)


565-566: try-except-pass detected, consider logging the exception

(S110)


565-565: Do not catch blind exception: Exception

(BLE001)


592-592: subprocess call: check for execution of untrusted input

(S603)


593-593: Starting a process with a partial executable path

(S607)


606-607: try-except-pass detected, consider logging the exception

(S110)


606-606: Do not catch blind exception: Exception

(BLE001)


693-693: Do not catch blind exception: Exception

(BLE001)


1089-1089: Do not catch blind exception: Exception

(BLE001)


1241-1241: f-string without any placeholders

Remove extraneous f prefix

(F541)


1306-1306: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (13)
cpp/dependency_scan/metadata/nlohmann-json.yml (1)

1-17: Well-formed metadata with complete field coverage.

This metadata file is well-structured with all expected fields populated. The information aligns with the actual nlohmann-json project (correct license, copyright, homepage). Empty patterns and linker_flags are appropriate for a header-only library.

cpp/dependency_scan/metadata/cuda.yml (1)

1-18: Reasonable metadata for system dependencies; version pinning may be restrictive.

The metadata is well-formed for CUDA runtime dependencies. However, version 13.0.48-1 is pinned to a specific CUDA release. If builds target multiple CUDA versions, consider whether this metadata should be versioned or support version ranges. Confirm that the scanner gracefully handles CUDA installations with different versions.

cpp/dependency_scan/metadata/nanobind.yml (1)

1-17: Well-formed bundled dependency metadata.

This metadata is complete and consistent with other bundled dependencies. Empty patterns and linker_flags are appropriate for a header-only binding library.

cpp/dependency_scan/metadata/deepgemm.yml (1)

12-12: Path component casing may be inconsistent.

The path_components value is ["DeepGEMM"] (PascalCase), while deep-ep.yml uses snake_case (["deep_ep_download-src"]). Verify that this casing matches the actual directory name in the repository to ensure proper pattern matching during scanning.

cpp/dependency_scan/metadata/pybind11.yml (1)

1-17: Well-formed bundled dependency metadata.

Complete and consistent with other properly-defined bundled dependencies. Metadata appears accurate for the pybind11 project.

cpp/dependency_scan/metadata/tensorrt.yml (1)

1-15: Well-formed system dependency metadata with version-aware patterns.

The patterns and path components provide good coverage for detecting TensorRT installations across multiple versions. The approach of including version-specific path variants (tensorrt-8, -9, -10) is reasonable for supporting multiple TensorRT versions.

cpp/dependency_scan/metadata/deep-ep.yml (1)

1-14:

The original review comment is based on an incorrect premise. The schema file (_schema.yml) explicitly defines license, copyright, and homepage as optional fields—they are not in the required array. The only required fields are name, version, and description, all of which deep-ep.yml contains.

Moreover, this pattern of omitting optional metadata is prevalent across the codebase: 10 other dependency files (cuda, pytorch, tensorrt, ucx, ucxx, etc.) also lack these fields. The inconsistency exists but does not constitute a schema violation or deficiency in deep-ep.yml.

The file meets all schema requirements and should be approved as-is.

Likely an incorrect or invalid review comment.

cpp/dependency_scan/metadata/xgrammar.yml (1)

1-15: Structure and content look good.

The metadata is well-formed and appropriate for a bundled library that doesn't require direct linking. Version "unknown" is acceptable per documentation, and the single path_component will capture inclusion-based artifacts correctly.

cpp/dependency_scan/metadata/cppzmq.yml (1)

1-17: Well-formed metadata with good attribution.

The metadata correctly identifies cppzmq as a bundled, header-only C++ library. The copyright attribution and homepage reference are helpful for provenance tracking.

cpp/dependency_scan/metadata/openmpi.yml (1)

1-11: Structure is sound, but verify linker_flags.

The metadata correctly captures OpenMPI patterns and common path components. However, the empty linker_flags is unusual—OpenMPI typically provides -lmpi linker flag. Verify whether this is intentional (e.g., if OpenMPI is always dynamically loaded in this codebase) or if -lmpi should be added.

If OpenMPI is explicitly linked in C++ builds, add the linker flag:

linker_flags:
  - -lmpi
cpp/dependency_scan/metadata/ucxx.yml (1)

1-15: Metadata structure is appropriate.

The entry correctly identifies UCXX as a bundled C++ bindings library with no direct patterns or linking requirements, which aligns with the header-only nature of C++ bindings.

cpp/dependency_scan/metadata/cutlass.yml (1)

1-17: Comprehensive metadata with proper attribution.

The CUTLASS metadata includes appropriate licensing (BSD-3-Clause), copyright attribution, and correctly identifies it as a bundled template library. The empty patterns and linker flags are appropriate for header-only template code.

cpp/dependency_scan/metadata/dpkg.yml (1)

1-29: System library metadata is well-structured.

The dpkg.yml entries correctly follow the documented list format and include appropriate linker flags for common system libraries. The version strings match Ubuntu package versioning conventions. Empty patterns is correct as these libraries are resolved via dpkg-query.
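
For reference, `dpkg-query -S <path>` prints lines of the form `package[:arch]: /path`; a hedged sketch of extracting the owning package name from such a line:

```python
def package_from_dpkg_line(line: str) -> str:
    """'libc6:amd64: /lib/x86_64-linux-gnu/libc.so.6' -> 'libc6'."""
    owner = line.split(": ", 1)[0]   # "libc6:amd64"
    return owner.split(":", 1)[0]    # drop the architecture qualifier
```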

Comment on lines +321 to +334
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify .gitignore coverage for generated .d files.

The -MD -MP flags will generate many .d (dependency) files during builds. These should not be committed to version control. Confirm that a .gitignore rule exists to exclude *.d files (or a more specific pattern like **/*.d).

Run the following command to verify:


🏁 Script executed:

#!/bin/bash
# Check if .gitignore covers .d files
git check-ignore -v **/*.d 2>/dev/null | head -20 || echo "No .gitignore match found for .d files"

Length of output: 0


I need to verify this differently since .d files are generated during builds and don't currently exist in the repository. Let me check the .gitignore file directly for patterns that would exclude .d files.


---

🏁 Script executed:

#!/bin/bash
# Find and examine .gitignore files for .d file patterns
echo "=== Checking for .gitignore files ==="
fd -name ".gitignore" -type f

echo -e "\n=== Examining .gitignore content for .d file patterns ==="
if [ -f .gitignore ]; then
  echo "--- Root .gitignore ---"
  cat .gitignore | head -50
  echo -e "\n--- Searching for .d patterns ---"
  grep -n "\.d\|\.d/" .gitignore || echo "No .d patterns found in root .gitignore"
fi

echo -e "\n=== Checking cpp directory for .gitignore ==="
if [ -f cpp/.gitignore ]; then
  echo "--- cpp/.gitignore ---"
  cat cpp/.gitignore
fi

Length of output: 1345


Let me verify where CMake places build artifacts and if the existing cpp/build* gitignore pattern is sufficient.


---

🏁 Script executed:

#!/bin/bash
# Check CMakeLists.txt for build output configuration
echo "=== Checking CMakeLists.txt for build output directory setup ==="
rg -n "set\(CMAKE_BINARY_DIR|set\(CMAKE_BUILD_TYPE|CMAKE_RUNTIME_OUTPUT_DIRECTORY|CMAKE_LIBRARY_OUTPUT_DIRECTORY|CMAKE_ARCHIVE_OUTPUT_DIRECTORY" cpp/CMakeLists.txt -A2 | head -30

echo -e "\n=== Checking for other potential build locations ==="
rg -n "add_executable|add_library" cpp/CMakeLists.txt | head -10

echo -e "\n=== Verifying what cpp/build* pattern covers ==="
echo "Pattern 'cpp/build*' will match:"
echo "  - cpp/build"
echo "  - cpp/build/"
echo "  - cpp/build_debug"
echo "  - cpp/build_release"
echo "  - etc."

echo -e "\n=== Checking if .gitignore patterns are recursive ==="
grep -n "\.d\|\*\.o\|\*\.a\|\*\.so" .gitignore || echo "No object file patterns found"

Length of output: 660


Add explicit *.d pattern to .gitignore to ensure comprehensive coverage.

The review comment is correct. While the existing .gitignore includes cpp/build* (which covers standard build directories), this pattern is directory-specific and does not guarantee coverage if build artifacts are generated elsewhere. The -MD -MP flags add dependency tracking globally across CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS, so .d files could potentially be generated in various locations depending on the build configuration.

Adding an explicit *.d pattern to .gitignore provides robust, convention-independent coverage for all generated dependency files, regardless of build directory location.

🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 321 to 324, the project enables generation of
.d dependency files via -MD -MP but there is no explicit .gitignore entry for
them; update the repository .gitignore (root or relevant cpp/.gitignore) to add
a single line with the pattern *.d so generated dependency files are ignored
regardless of build directory, then commit that change.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Platform compatibility concern: -MD -MP flags may not work on Windows MSVC builds.

The -MD -MP compiler flags are GCC/Clang specific. Windows MSVC uses /showIncludes instead, which produces a different output format. The code lacks a platform guard (e.g., if(NOT WIN32) or compiler ID check), which could cause build failures or silently skip dependency tracking on Windows.

Please verify that this works on all supported platforms. If Windows builds are supported, consider:

 # Generate dependency files (.d) to track all header dependencies This creates
 # .d files alongside .o files showing all headers used
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
-set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+if(NOT WIN32)
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
+  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+endif()

Alternatively, if MSVC support is planned, please generate a separate code path using /showIncludes.


I'll verify the review comment by examining the codebase and confirming the compiler flag claims.





🏁 Script executed:

#!/bin/bash
# Examine the actual code at lines 321-324 in cpp/CMakeLists.txt
cat cpp/CMakeLists.txt | sed -n '310,335p'

Length of output: 1150


🏁 Script executed:

#!/bin/bash
# Search for platform guards and WIN32 checks in CMakeLists.txt
rg -n "WIN32|CMAKE_CXX_COMPILER_ID|MSVC" cpp/CMakeLists.txt -A2 -B2

Length of output: 1490


🏁 Script executed:

#!/bin/bash
# Check .gitignore for .d files handling
cat .gitignore | grep -E '\.d$|\.d\s' || echo "No .d pattern found in .gitignore"

Length of output: 33


🏁 Script executed:

#!/bin/bash
# Search for build documentation or platform support statements
fd -e md -e txt -e rst | xargs grep -l -i "platform\|windows\|msvc" | head -20

Length of output: 838


🌐 Web query:

GCC Clang -MD -MP compiler flags Windows MSVC compatibility

💡 Result:

Short answer

  • On GCC/Clang, -MD tells the driver to emit a Make-style dependency file (.d) as a side-effect of compilation; -MMD is like -MD but omits system headers; -MP adds phony targets for each header (avoids “missing file” errors). Use -MD (or -MMD) together with -MF or accept the default .d name. [1][2]

  • On MSVC, /MD means something completely different: it selects the DLL C runtime (multithreaded DLL). It is not a dependency-file option. Do not confuse GCC/Clang’s -MD with MSVC’s /MD. [3]

  • clang-cl (the MSVC-compatible Clang driver) aims to be compatible with MSVC options and emit MSVC-style include notes (/showIncludes), but the GCC-style -MD/-MMD/-MP depfile behavior is what you should use when you want Make/Ninja-style .d files from clang/gcc. Build tools (CMake/Ninja) handle the different formats (MSVC /showIncludes vs GCC depfiles). [4][1]

Practical recommendations

  • If you want GCC/Clang to produce Make/Ninja dependency files use:
    • gcc/clang: -MMD -MF <file>.d -MP (or -MD instead of -MMD if you want system headers included). [1][2]
  • If you want MSVC-style include parsing (MSBuild, cl.exe workflows) use cl.exe /showIncludes (or clang-cl which emits a similar “Note: including file:” prefix). Let your build system parse the output appropriately. [4]
  • Remember: don’t mix meanings — /MD (MSVC) ≠ -MD (gcc/clang). Ensure your build rules use the right flag for the toolchain.

Sources

  • GCC manual: -MD, -MMD, -MF, -MP descriptions. [1]
  • Practical explanation and examples (Stack Overflow). [2]
  • MSVC /MD meaning (runtime selection). [3]
  • clang-cl /showIncludes / MSVC compatibility and build-tool handling. [4]

References
[1] GCC manual (dependency options).
[2] Stack Overflow — using -MD/-MMD with -o.
[3] MSVC docs / Stack Overflow explanation of /MD vs /MT.
[4] CMake/clang-cl /showIncludes compatibility / clang-cl blog posts.


Add platform guard to prevent MSVC build failures on Windows.

The -MD -MP flags are GCC/Clang specific for generating .d dependency files. On MSVC, /MD means something completely different: it selects the DLL C runtime (multithreaded DLL) and is not a dependency-file option. The codebase already demonstrates this awareness—see line 356-358 which wraps similar CMAKE_CXX_FLAGS modifications with if(NOT WIN32). Without a guard, Windows MSVC builds will either fail or silently skip dependency tracking.

Additionally, .d files are not listed in .gitignore, risking accidental commits.

Wrap lines 321-324 with a platform guard:

 # Generate dependency files (.d) to track all header dependencies This creates
 # .d files alongside .o files showing all headers used
+if(NOT WIN32)
   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
   set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
+endif()

And add .d files to .gitignore:

+*.d
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
# Generate dependency files (.d) to track all header dependencies This creates
# .d files alongside .o files showing all headers used
if(NOT WIN32)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -MD -MP")
  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -MD -MP")
endif()
🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 321 to 324, the flags `-MD -MP` are being
unconditionally appended to CMAKE_CXX_FLAGS and CMAKE_CUDA_FLAGS which will
break or misconfigure MSVC on Windows; wrap those two set(...) lines in an
`if(NOT WIN32)` / `endif()` guard (matching the style used later at lines
~356-358) so GCC/Clang-only dependency flags are only applied on non-Windows
platforms, and also add a `.d` pattern (e.g. `*.d`) to the repository .gitignore
so generated dependency files are not accidentally committed.

Comment on lines 1 to 14
name: deepgemm
version: unknown
description: DeepGEMM optimized matrix multiplication library

source: bundled

patterns: []

linker_flags: []

path_components:
- DeepGEMM

aliases: []

⚠️ Potential issue | 🟡 Minor

Incomplete metadata: missing license, copyright, and homepage fields.

This file lacks license, copyright, and homepage fields that are present in other bundled dependency metadata (e.g., nlohmann-json.yml, pybind11.yml). For consistency and discoverability, these fields should be added if available.

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/deepgemm.yml lines 1-14, the metadata is missing
the standard fields `license`, `copyright`, and `homepage`; add these three keys
to match other bundled metadata files (e.g., nlohmann-json.yml, pybind11.yml).
Populate them with the actual values if available (preferred), otherwise set
them to a clear placeholder such as "unknown" for license and copyright and
"https://unknown" or "unknown" for homepage so the file structure remains
consistent.

Comment on lines 15 to 19
path_components:
- nvshmem-build
- nvshmem_project-prefix
- libs/nvshmem


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix path_components: remove slashed entry.

Matcher compares individual components; “libs/nvshmem” won’t match.

 path_components:
   - nvshmem-build
   - nvshmem_project-prefix
-  - libs/nvshmem
+  - nvshmem

Consider adding additional patterns (e.g., libnvshmem.so, libnvshmem_host.so) if present in your builds.

Is “license: NVIDIA” the correct SPDX form (e.g., “Proprietary”/“NVIDIA-proprietary”)? Is “source: bundled” accurate for this repo?


I'll verify the review comment by examining the file structure, the path_components matching logic, and license conventions.


---

🏁 Script executed:

#!/bin/bash
# Read the full nvshmem.yml file to understand context
cat cpp/dependency_scan/metadata/nvshmem.yml

Length of output: 362


🏁 Script executed:

#!/bin/bash
# Search for how path_components is used/matched in the codebase
rg -n "path_components" --type yaml -C 5 | head -100

Length of output: 5213


🏁 Script executed:

#!/bin/bash
# Look for matcher/parser logic that processes path_components
rg -n "path_component" --type py --type ts --type js -C 3 | head -150

Length of output: 12639


🏁 Script executed:

#!/bin/bash
# Find other similar metadata yml files to understand naming conventions
fd -e yml cpp/dependency_scan/metadata | head -20

Length of output: 417


🏁 Script executed:

#!/bin/bash
# Check for examples of license field in yml files
rg -n "^license:" cpp/dependency_scan/metadata/ --type yaml -A 1 | head -50

Length of output: 1220



---

🏁 Script executed:

#!/bin/bash
# Find the actual path matching implementation
rg -n "rightmost\|path_alias" cpp/dependency_scan/scan_build_artifacts.py -A 10 -B 2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for how path_components matching is used
rg -n "def.*match\|path_component" cpp/dependency_scan/scan_build_artifacts.py -C 5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for test cases that show path_components usage patterns
rg -n "path_components.*/" cpp/dependency_scan/ -C 2

Length of output: 495


🏁 Script executed:

#!/bin/bash
# Verify SPDX license formats
rg -n "^license:" cpp/dependency_scan/metadata/ --type yaml | grep -E "(MIT|Apache|BSD|Proprietary|LicenseRef|NVIDIA)"

Length of output: 440



---

🏁 Script executed:

#!/bin/bash
# Search for path_alias related code and matching logic
rg -n "path_alias" cpp/dependency_scan/scan_build_artifacts.py -C 8

Length of output: 6150


🏁 Script executed:

#!/bin/bash
# Look for how paths are matched in the dependency resolution
rg -n "def.*resolve\|def.*match\|def.*find" cpp/dependency_scan/scan_build_artifacts.py -A 15 | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's a split or path parsing function
rg -n "split.*path\|/\|separator" cpp/dependency_scan/scan_build_artifacts.py -C 3 | head -80

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at the full context around line 743-745 where path_components are processed
sed -n '740,800p' cpp/dependency_scan/scan_build_artifacts.py

Length of output: 2672



---

🌐 Web query:

SPDX license identifier for NVIDIA proprietary software

💡 Result:

Use a LicenseRef identifier. NVIDIA commonly uses "SPDX-License-Identifier: LicenseRef-NvidiaProprietary". (docs.nvidia.com)



Fix path_components and license format.

The path matcher splits artifact paths by / and checks individual components right-to-left. The entry libs/nvshmem is stored as a composite key and will never match.

Additionally, NVIDIA is not a valid SPDX license identifier. Use LicenseRef-NvidiaProprietary instead.

-license: NVIDIA
+license: LicenseRef-NvidiaProprietary

 path_components:
 - nvshmem-build
 - nvshmem_project-prefix
-- libs/nvshmem
+- nvshmem

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/nvshmem.yml around lines 15–19, split the
composite path_components entry "libs/nvshmem" into two separate components in
right-to-left order (i.e., "nvshmem" then "libs") so the path matcher can match
individual path segments, and replace the non-SPDX "NVIDIA" license identifier
with "LicenseRef-NvidiaProprietary"; update the YAML values accordingly.

├── tensorrt-llm.yml # Individual dependency
├── pytorch.yml
├── cuda-cudart-12.yml
└── ... (62+ dependencies)

⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code block.

Markdown best practices require specifying the language for syntax highlighting. This block appears to be YAML or shell—please add the appropriate identifier.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/README.md at line 21, the fenced code block
lacks a language specifier; update the opening fence to include the correct
language (e.g., ```yaml or ```bash depending on the block contents) so Markdown
renders syntax highlighting, ensuring the chosen identifier matches the block
content.

Comment on lines 148 to 156
## Pattern Matching Behavior

The scanner tries matches in this order:

1. **Exact match on basename**: `libfoo.so` == `libfoo.so`
2. **Exact match on full path**: `/path/to/libfoo.so` == `/path/to/libfoo.so`
3. **Substring match**: `foo.cpython` in `foo.cpython-312-x86_64.so`

**No need to specify match type** - the scanner tries both automatically!

⚠️ Potential issue | 🟡 Minor

Minor grammar: Use hyphen for compound modifier.

"Pattern Matching Behavior" should be hyphenated when used as a compound modifier, though as a section header this is stylistic. More critically, ensure consistency with other headers in the document.

🧰 Tools
🪛 LanguageTool

[grammar] ~148-~148: Use a hyphen to join words.
Context: ...ir validation_output/ ``` ## Pattern Matching Behavior The scanner tries mat...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In cpp/dependency_scan/metadata/README.md around lines 148 to 156, the section
header "Pattern Matching Behavior" should use a hyphen when used as a compound
modifier to be consistent with other headers; change the header to
"Pattern-Matching Behavior" (or update other headers to match if you prefer no
hyphens) and ensure the same hyphenation style is applied consistently across
the document.

Comment on lines +98 to +101
def __init__(self):
self._cache: Dict[str, Optional[str]] = {}
self._lib_search_paths = self._get_library_search_paths()


⚠️ Potential issue | 🟠 Major

Make dpkg resolution robust: pre-check binary and resolve non-absolute lib basenames.

  • Avoid repeated FileNotFoundError by resolving dpkg-query once.
  • Handle inputs like "libcudart.so.12" (from wheel NEEDED) by searching standard lib paths (reuse -l logic).
@@
-    def __init__(self):
-        self._cache: Dict[str, Optional[str]] = {}
-        self._lib_search_paths = self._get_library_search_paths()
+    def __init__(self):
+        import shutil
+        self._cache: Dict[str, Optional[str]] = {}
+        self._lib_search_paths = self._get_library_search_paths()
+        self._dpkg_query = shutil.which("dpkg-query")
+        self._dpkg_available = self._dpkg_query is not None
@@
-        # Handle linker flags
+        # Handle linker flags
         if file_path.startswith("-l"):
             resolved_path = self.find_library_path(file_path)
             if not resolved_path:
                 self._cache[file_path] = None
                 return None
             file_path = resolved_path
+
+        # Handle non-absolute library basenames like "libfoo.so" / "libfoo.so.6" / "libfoo.a"
+        if not os.path.isabs(file_path):
+            base = os.path.basename(file_path)
+            # Try mapping to -l<name> and reuse search paths
+            m = re.match(r'^lib([A-Za-z0-9_\-\+]+)\.(?:so(?:\.\d+)*)$', base)
+            if m:
+                name = m.group(1)
+                resolved_path = self.find_library_path(f"-l{name}")
+                if resolved_path:
+                    file_path = resolved_path
+            elif base.startswith("lib") and base.endswith(".a"):
+                name = base[3:-2]
+                resolved_path = self.find_library_path(f"-l{name}")
+                if resolved_path:
+                    file_path = resolved_path
@@
-        # Query dpkg
+        # Short-circuit if dpkg-query is unavailable
+        if not getattr(self, "_dpkg_available", False):
+            self._cache[file_path] = None
+            return None
+
+        # Query dpkg
         try:
-            result = subprocess.run(["dpkg-query", "-S", file_path],
+            result = subprocess.run([self._dpkg_query, "-S", file_path],
                                     capture_output=True,
                                     text=True,
                                     timeout=5)

Also applies to: 166-228

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 98 to 101 (and apply
same changes to 166–228), the dpkg resolution and library basename handling are
brittle: resolve the dpkg-query binary once during initialization and cache its
path or mark it unavailable to avoid repeated FileNotFoundError attempts; when a
library name is not an absolute path (e.g., "libcudart.so.12"), treat it as a
basename and attempt to locate it by searching the collected _lib_search_paths
(reusing the existing -l style search logic) before falling back to dpkg-query;
update logic to normalize inputs, try search paths first for non-absolute names,
and only call dpkg-query once (using cached path) if search fails, handling and
logging the case where dpkg-query is not present.
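
Taken together, the suggested flow looks roughly like the following standalone sketch; the class and helper names are illustrative, not the PR's actual code:

```python
import os
import re
import shutil
import subprocess
from typing import Dict, List, Optional


class DpkgResolverSketch:
    """Illustrative resolver: locate dpkg-query once, normalize bare basenames."""

    def __init__(self, lib_search_paths: List[str]):
        self._cache: Dict[str, Optional[str]] = {}
        self._lib_search_paths = lib_search_paths
        # Resolve the binary once; None means dpkg is unavailable, so we
        # never hit FileNotFoundError on every lookup.
        self._dpkg_query = shutil.which("dpkg-query")

    def _search_lib_paths(self, basename: str) -> Optional[str]:
        for directory in self._lib_search_paths:
            candidate = os.path.join(directory, basename)
            if os.path.exists(candidate):
                return candidate
        return None

    def resolve(self, file_path: str) -> Optional[str]:
        if file_path in self._cache:
            return self._cache[file_path]
        # Inputs like "libcudart.so.12" (wheel NEEDED entries) are bare
        # basenames: try the standard library search paths first.
        base = os.path.basename(file_path)
        if not os.path.isabs(file_path) and re.match(
                r"^lib[\w\-+]+\.(?:so(?:\.\d+)*|a)$", base):
            found = self._search_lib_paths(base)
            if found:
                file_path = found
        if self._dpkg_query is None:
            self._cache[file_path] = None
            return None
        result = subprocess.run([self._dpkg_query, "-S", file_path],
                                capture_output=True, text=True, timeout=5)
        # "dpkg-query -S" prints "package[:arch]: /path" on success.
        package = result.stdout.split(":", 1)[0] if result.returncode == 0 else None
        self._cache[file_path] = package
        return package
```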

Comment on lines +1210 to +1213
    print(
        f"  Resolved {len(dpkg_mappings)} artifacts via dpkg ({len(dpkg_mappings) / len(artifacts) * 100:.1f}%)"
    )
    print()
Contributor

⚠️ Potential issue | 🔴 Critical

Guard division by zero in percentage prints.

When no artifacts are found, the current f-strings raise ZeroDivisionError.

-    print(
-        f"  Resolved {len(dpkg_mappings)} artifacts via dpkg ({len(dpkg_mappings) / len(artifacts) * 100:.1f}%)"
-    )
+    total_artifacts = len(artifacts)
+    dpkg_pct = (len(dpkg_mappings) / total_artifacts * 100) if total_artifacts else 0.0
+    print(f"  Resolved {len(dpkg_mappings)} artifacts via dpkg ({dpkg_pct:.1f}%)")
@@
-    print(
-        f"  Resolved {len(pattern_mappings)} additional artifacts via patterns ({len(pattern_mappings) / len(artifacts) * 100:.1f}%)"
-    )
+    pattern_pct = (len(pattern_mappings) / total_artifacts * 100) if total_artifacts else 0.0
+    print(f"  Resolved {len(pattern_mappings)} additional artifacts via patterns ({pattern_pct:.1f}%)")
@@
-    print(
-        f"  Mapped (dpkg): {dpkg_known} ({dpkg_known / len(artifacts) * 100:.1f}%)"
-    )
-    print(
-        f"  Mapped (patterns): {pattern_known} ({pattern_known / len(artifacts) * 100:.1f}%)"
-    )
-    print(
-        f"  Unknown: {total_unknown} ({total_unknown / len(artifacts) * 100:.1f}%)"
-    )
+    print(f"  Mapped (dpkg): {dpkg_known} ({(dpkg_known / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")
+    print(f"  Mapped (patterns): {pattern_known} ({(pattern_known / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")
+    print(f"  Unknown: {total_unknown} ({(total_unknown / total_artifacts * 100) if total_artifacts else 0.0:.1f}%)")

Also applies to: 1230-1233, 1265-1272

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 1210-1213 (and
similarly at 1230-1233 and 1265-1272), the f-strings calculate a percentage
using len(artifacts) as the denominator which will raise ZeroDivisionError when
artifacts is empty; update each print to guard the division by checking if
len(artifacts) is zero and use 0.0% (or format 0.0) in that case, e.g. compute
total = len(artifacts) and pct = (len(dpkg_mappings) / total * 100) if total
else 0.0, then print using pct to avoid dividing by zero.
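
A small helper keeps the guard in one place; this is an illustrative refactor of the diff above, with pct as a hypothetical name:

```python
def pct(part: int, total: int) -> float:
    """Percentage of part over total, returning 0.0 when total is zero."""
    return (part / total * 100.0) if total else 0.0


# With an empty artifact list this prints "(0.0%)" instead of raising.
artifacts, dpkg_mappings = [], []
print(f"  Resolved {len(dpkg_mappings)} artifacts via dpkg "
      f"({pct(len(dpkg_mappings), len(artifacts)):.1f}%)")
```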

Comment on lines +1241 to +1243
print(f" Reports written to:")
print(f" - {known_file}")
print(f" - {unknown_file}")
Contributor

⚠️ Potential issue | 🟡 Minor

Remove redundant f-strings (no placeholders).

Minor cleanup flagged by Ruff F541.

-    print(f"  Reports written to:")
+    print("  Reports written to:")
@@
-        print(f"  Re-run scanner to improve coverage")
+        print("  Re-run scanner to improve coverage")

Also applies to: 1306-1306

🧰 Tools
🪛 Ruff (0.14.1)

1241-1241: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around lines 1241-1243 (and also
at line 1306), some print statements use f-strings without any placeholders
(e.g. print(f"  Reports written to:")), which Ruff flags as redundant F541;
replace these f-strings with plain string literals (remove the leading 'f') so
the prints become regular strings and no formatting is attempted.

Comment on lines 1 to 4
summary:
  count: 0
  action_required: Add patterns to YAML files in dependencies/ for these artifacts
artifacts: []
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Correct path reference in action_required message.

Line 3 references the dependencies/ directory, but the actual metadata directory is metadata/ throughout the codebase and documentation. This will confuse users trying to follow the guidance.

Update the action_required message to reference the correct directory:

- action_required: Add patterns to YAML files in dependencies/ for these artifacts
+ action_required: Add patterns to YAML files in metadata/ for these artifacts

Verify that the scanner code generating this file outputs the correct path reference. If the scanner hardcodes "dependencies/", that should be corrected to match the actual directory structure.


🏁 Script executed:

#!/bin/bash
# Verify the actual content of unknown.yml and directory structure
echo "=== Checking unknown.yml content ==="
find . -name "unknown.yml" -type f -exec cat {} \;

echo -e "\n=== Checking directory structure for 'dependencies' and 'metadata' ==="
find . -type d \( -name "dependencies" -o -name "metadata" \) | head -20

echo -e "\n=== Searching for 'dependencies/' references in documentation ==="
rg -n "dependencies/" --type=md --type=txt | head -20

echo -e "\n=== Searching for 'metadata/' references in documentation ==="
rg -n "metadata/" --type=md --type=txt | head -20

echo -e "\n=== Checking README.md for path references ==="
find . -name "README.md" -exec grep -n -E "(dependencies|metadata)" {} + | head -20

Length of output: 4447



🏁 Script executed:

#!/bin/bash
# Find scanner code that generates unknown.yml and the "dependencies/" message

echo "=== Searching for files that generate unknown.yml ==="
rg -n "unknown\.yml" --type=py --type=sh | head -20

echo -e "\n=== Searching for 'dependencies/' string in Python/code files ==="
rg -n "dependencies/" --type=py --type=sh | head -30

echo -e "\n=== Searching for 'action_required' in code ==="
rg -n "action_required" --type=py --type=sh | head -30

echo -e "\n=== Find all Python files in cpp/dependency_scan ==="
find cpp/dependency_scan -name "*.py" -type f

Length of output: 2022



🏁 Script executed:

#!/bin/bash
# Examine the actual code around line 1000 where the bug is

echo "=== Code context around line 1000 in scan_build_artifacts.py ==="
sed -n '995,1005p' cpp/dependency_scan/scan_build_artifacts.py

echo -e "\n=== Check the parameter definition for dependencies/metadata ==="
rg -n "dependencies.dir|metadata.dir" --type=py -A2 -B2

echo -e "\n=== Check PatternMatcher initialization and metadata_dir parameter ==="
sed -n '619,670p' cpp/dependency_scan/scan_build_artifacts.py

Length of output: 14794


Correct hardcoded path reference in action_required message and update stale docstrings.

The output message in cpp/dependency_scan/scan_build_artifacts.py at line 1000 references the incorrect directory. The actual implementation uses metadata/ (parameter --metadata-dir, default ./metadata/), but the message tells users to add patterns to dependencies/.

Update line 1000:

- 'Add patterns to YAML files in dependencies/ for these artifacts'
+ 'Add patterns to YAML files in metadata/ for these artifacts'

Also update stale docstring references to dependencies/ in lines 619, 626, 636, and 657 to consistently reference metadata/ instead.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/dependency_scan/scan_build_artifacts.py around line 1000 (and docstrings
at lines 619, 626, 636, 657), the user-facing message incorrectly references
"dependencies/" instead of the actual metadata directory; update the
action_required message at line 1000 to reference the --metadata-dir (default
./metadata/) and replace all stale docstring occurrences of "dependencies/" with
"metadata/" so wording is consistent and mentions the --metadata-dir flag and
default path where appropriate.
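
One way to keep the message and reality in sync is to thread the resolved directory into the report writer; a hypothetical sketch (write_unknown_report and its signature are illustrative, not the scanner's actual function):

```python
from pathlib import Path

import yaml


def write_unknown_report(unknown_artifacts, metadata_dir: Path,
                         out_path: Path) -> None:
    """Write unknown.yml, naming the directory the scanner actually read."""
    report = {
        "summary": {
            "count": len(unknown_artifacts),
            "action_required":
            f"Add patterns to YAML files in {metadata_dir}/ for these artifacts",
        },
        "artifacts": sorted(unknown_artifacts),
    }
    out_path.write_text(yaml.safe_dump(report, sort_keys=False))
```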

@@ -0,0 +1,952 @@
#!/usr/bin/env python3
Contributor

⚠️ Potential issue | 🟠 Major

Add NVIDIA header; drop shebang; fix lint nits (F541, E712).

Complies with repo guidelines and cleans Ruff warnings.

-#!/usr/bin/env python3
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
@@
-        d_file.write_text(f"build/foo.o: ../include/test.h\n")
+        d_file.write_text("build/foo.o: ../include/test.h\n")
@@
-        link_file.write_text(f"/usr/bin/c++ -o foo @response.rsp\n")
+        link_file.write_text("/usr/bin/c++ -o foo @response.rsp\n")
@@
-            assert artifacts[0].metadata.get('static') == True
+            assert artifacts[0].metadata.get('static') is True

Import style: prefer module namespace per guidelines.

-from scan_build_artifacts import (Artifact, ArtifactCollector, DpkgResolver,
-                                  Mapping, OutputGenerator, PatternMatcher)
+import scan_build_artifacts as sba

and reference as sba.Artifact, etc. Apply only if it doesn’t overcomplicate the test changes.

Also applies to: 229-229, 260-260, 284-284

🧰 Tools
🪛 Ruff (0.14.1)

1-1: Shebang is present but file is not executable

(EXE001)

🤖 Prompt for AI Agents
In cpp/dependency_scan/tests/test_scan_build_artifacts.py around lines 1 (and
also apply similar changes at lines 229, 260, 284): remove the leading shebang,
add the standard NVIDIA file header/license block at the top, fix Ruff lint
issues by replacing bare f-strings with regular strings where there are no
interpolations (F541) and avoid comparisons like "== True/False" by using direct
boolean checks or "is True"/"is False" as appropriate (E712), and adjust imports
to the module namespace style (e.g., import the module as sba and reference
types/functions as sba.Artifact) unless doing so would overcomplicate the tests.
Ensure changes are minimal and only alter imports/strings/boolean comparisons
and header/shebang as described.
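
Put together, the top of the test module might look like this; the header is as in the diff above, while the test body is a made-up illustration of the E712 and import-style points (it assumes scan_build_artifacts is importable from the tests):

```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# (Apache-2.0 header as shown in the diff above; no shebang.)

import scan_build_artifacts as sba  # module-namespace import per guidelines


def test_static_flag_uses_identity_check():
    # Illustrative metadata dict shaped like Artifact.metadata; the real
    # tests construct sba.Artifact instances instead.
    metadata = {"static": True}
    assert metadata.get("static") is True  # "is True", not "== True" (E712)
```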

@venkywonka venkywonka force-pushed the venky/gen-dependency-artifacts branch from dd4c6ca to dc413c7 on October 25, 2025 06:53
Signed-off-by: Venky Ganesh <[email protected]>
@venkywonka venkywonka force-pushed the venky/gen-dependency-artifacts branch from dc413c7 to 22a3372 on October 25, 2025 06:59
@venkywonka venkywonka marked this pull request as draft October 25, 2025 07:00
@juney-nvidia
Collaborator

@venkywonka Hi Venky, can you briefly explain the principle of this scanning approach? And how long does it take to finish one round of scanning?

Thanks
June

@@ -0,0 +1,4 @@
summary:
Collaborator

We should probably add these files to .gitignore, right?

Collaborator Author

yep! planning to add it

Usage:
    python scan_build_artifacts.py --build-dir build/ --output-dir scan_output/
    python scan_build_artifacts.py --validate  # Validate YAML files
"""
Collaborator

There's a lot of code here I'll want to review, but I can do that as I work on integrating it with the dependency database. What do you think about adding an "in development" or "not ready for use" disclaimer to the README for now? That way you can submit your work now, and I can remove the disclaimer once I'm done reviewing and integrating the code. I'll also patch this script as necessary if I find any issues, so you can review the patches instead of implementing them.

Collaborator Author
@venkywonka venkywonka Oct 27, 2025

Sure @tburt-nv, I agree with that too. I wanted to put this out here, even though it's very much WIP, so that there's concreteness on how this looks to you and your async dev can be informed of the same, while I continue to smooth out the edges. I don't intend to merge this immediately, but if you think merging it with a "not ready for use" disclaimer is better, I'm good with that too.

@venkywonka venkywonka changed the title [None] [feat] Add C++ dependency scanning system [None] [feat] [Do Not Review] Add C++ dependency scanning system Oct 27, 2025
@venkywonka
Collaborator Author

@venkywonka Hi Venky, can you briefly explain the principle of this scanning approach? And how long does it take to finish one round of scanning?

Thanks June

hey @juney-nvidia, I have changed this to draft.
I have tried to outline things in the attached README, but basically: to find the dependencies that end up in the binaries we build and ship, the scanner takes a first-principles approach of tracking them from the information CMake gives us. That means the .d depfiles for the headers that actually get included, plus the link.txt files for any precompiled libraries that get linked in.

On top of that, the code also takes a built wheel, looks at the bundled binaries we ship with it, and reads the runtime dependencies they mandate.

As for how long it takes, one round of scanning takes maybe a minute max.
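
To make the principle concrete, here is a minimal sketch of the two extraction steps described above (helper names are illustrative, not the scanner's actual code):

```python
import re
from pathlib import Path


def headers_from_depfile(path: Path) -> set[str]:
    """Collect header paths from a Make-style 'target.o: dep1 dep2 ...' depfile."""
    text = path.read_text().replace("\\\n", " ")  # join line continuations
    deps: set[str] = set()
    for line in text.splitlines():
        _, sep, rhs = line.partition(":")
        if sep:
            deps.update(t for t in rhs.split()
                        if t.endswith((".h", ".hpp", ".cuh")))
    return deps


def libs_from_link_txt(path: Path) -> set[str]:
    """Collect libraries named on a CMake link.txt command line."""
    libs: set[str] = set()
    for tok in path.read_text().split():
        if tok.startswith("-l") or re.search(r"\.(so(\.\d+)*|a)$", tok):
            libs.add(tok)
    return libs
```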

- Added a section on recent improvements, highlighting the achievement of 100% artifact coverage through parser bug fixes and pattern enhancements.
- Updated usage instructions for the  script, including command-line arguments.
- Improved metadata structure by introducing new fields and updating existing ones to better reflect dependency sources and matching patterns.
- Removed the deprecated  file and consolidated system libraries into .
- Enhanced test coverage for artifact collection, including handling of malformed paths and CMakeFiles artifacts.

This commit aims to improve the clarity and functionality of the dependency scanning process.

Signed-off-by: Venky Ganesh <[email protected]>
Signed-off-by: Venky Ganesh <[email protected]>