feat(scanner): identify vendored open source in C/C++ sources via SCANOSS #168
Merged
…NOSS
C/C++ embedded source with no package manager yields an almost-empty SBOM (cdxgen lists each file as pkg:generic). Add an opt-in step that fingerprints the sources against the public OSSKB, recording copied-in (vendored) open source as real components with name, version, PURL and, where the library has an NVD record, a CPE so the existing Trivy step reports its CVEs.
- identify-vendored.sh: SCANOSS file-match only (snippets off), dependency/build folders excluded; emits CycloneDX with bomlens provenance.
- normalize-sbom.sh + vendored-purl-map.json: map pkg:github matches to a CPE so the identify->CVE chain completes (Trivy ignores pkg:github/pkg:generic).
- reconcile-vendored.sh: drop matches a package-manager component already covers, so enabling this on a managed project does not duplicate known deps or inflate CVEs. merge-sbom.sh unchanged.
- suggest-vendored.sh: off-by-default discovery; nudge only for C/C++ with no manifest and a near-empty scan; sets a metadata flag for the web UI banner.
- CLI --identify-vendored + SCANOSS_API_URL/KEY; Dockerfile SBOM_SCANOSS opt-in (scanoss.py is MIT; GPL engine not bundled).
- Web UI: Advanced toggle, result banner, vendored badge with read-only match confidence. Stateful match audit stays out of scope (TRUSCA).
- Tests: 40 No-Docker unit assertions incl. identify->merge->CPE chain and over-detection reconciliation; gated SCANOSS_E2E integration checks.
- Docs + THIRD_PARTY_LICENSES (scanoss MIT, OSSKB terms).
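The PURL-to-CPE mapping described above could look roughly like the following Python sketch. It is not the code in normalize-sbom.sh: the shape of the map (PURL without version mapped to a vendor/product pair), the function name `cpe_for`, and the sample entry are assumptions made for illustration. Only the CPE 2.3 formatted-string layout is standard.

```python
from typing import Optional

# Hypothetical map shape; the real vendored-purl-map.json may differ.
PURL_MAP = {
    "pkg:github/openssl/openssl": {"vendor": "openssl", "product": "openssl"},
}


def cpe_for(purl: str, version: str, purl_map: dict = PURL_MAP) -> Optional[str]:
    """Return a CPE 2.3 string for a mapped PURL, or None if unmapped."""
    # Drop the "@version", "?qualifiers" and "#subpath" parts of the PURL.
    base = purl.split("#", 1)[0].split("?", 1)[0].split("@", 1)[0]
    entry = purl_map.get(base)
    if entry is None:
        # No NVD-backed entry: the component stays identified, without a CPE.
        return None
    # part, vendor, product, version, then seven wildcard fields.
    return (
        f"cpe:2.3:a:{entry['vendor']}:{entry['product']}:{version}"
        ":*:*:*:*:*:*:*"
    )


print(cpe_for("pkg:github/openssl/openssl@openssl-3.0.0", "3.0.0"))
print(cpe_for("pkg:github/example/unmapped-lib@1.0.0", "1.0.0"))
```

A library absent from the map returns None here, which mirrors the case where a component is identified but carries no CVE data.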
OSSKB spike (real api.osskb.org, openssl 3.0.0 sources) confirmed two things:
matches carry NO cpe field — so vendored-purl-map.json is the required path,
not a fallback — and the version arrives as a git tag (e.g. "openssl-3.0.0",
"v1.2.13"). Feeding that raw into the CPE produced a malformed
cpe:2.3:a:openssl:openssl:openssl-3.0.0:... that Trivy could never match,
silently breaking the identify->CVE chain on real data.
- identify-vendored.sh: strip a leading "<component>-"/"<component>_" and a
leading "v" before a digit, so the version is the bare release (3.0.0).
- test fixture now uses a git-tag version ("openssl-3.0.0"); a new assertion
pins the normalized output, and the CPE-chain test expects ...:3.0.0:...
- guides: note that file-match version precision is approximate (a file's
match reports the release where its content first appeared).
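The version normalization described above can be sketched in Python as follows. This is an illustration of the stated rule only, not the logic in identify-vendored.sh; the function name `normalize_version`, the case-sensitive prefix comparison, and the component name "examplelib" are assumptions.

```python
import re


def normalize_version(component: str, tag: str) -> str:
    """Reduce a git-tag style version to the bare release string."""
    version = tag
    # Strip a leading "<component>-" or "<component>_" prefix, if present.
    for sep in ("-", "_"):
        prefix = component + sep
        if version.startswith(prefix):
            version = version[len(prefix):]
            break
    # Strip a leading "v" only when it is directly followed by a digit.
    return re.sub(r"^v(?=\d)", "", version)


print(normalize_version("openssl", "openssl-3.0.0"))  # 3.0.0
print(normalize_version("examplelib", "v1.2.13"))     # 1.2.13
```

The lookahead keeps versions that merely start with the letter "v" (for example a tag named "vendor-1.0") untouched.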
OSSKB CPE spike: done ✅
Ran the pre-merge spike against the real Findings:
No remaining blockers from the spike.
This was referenced Jun 22, 2026
Why
C/C++ embedded source with no package manager (raw CMake/Make) yields an almost-empty SBOM: cdxgen lists each file as an unidentified pkg:generic entry. The open source actually lives in files copied (vendored) into the tree (openssl, liblfds, djbdns, …). This adds an opt-in step that identifies those and feeds them into the existing license/CVE pipeline. Motivated by C/C++ supplier scans (e.g. tRelay 26.4.0) where the submitted SBOM identified zero open source.
What
- identify-vendored.sh fingerprints sources against the public OSSKB (SCANOSS), file-match only (snippets off), with dependency/build folders excluded. Emits CycloneDX components tagged bomlens:layer=vendored / identifiedBy=scanoss.
- normalize-sbom.sh + vendored-purl-map.json map SCANOSS pkg:github/* matches to a CPE, since Trivy ignores pkg:github / pkg:generic. Libraries with an NVD record (e.g. openssl, djbdns) then surface CVEs; niche ones (liblfds, libaes) are identified without CVEs (a data limit, not a tool one).
- reconcile-vendored.sh drops matches a package-manager component already covers, so enabling this on a normal managed project does not duplicate known deps or inflate the CVE count. Generic merge-sbom.sh unchanged.
- suggest-vendored.sh nudges (one line + web banner) only for C/C++ with no manifest and a near-empty scan. The user is never required to know the feature up front.
- --identify-vendored (+ SCANOSS_API_URL / SCANOSS_API_KEY); Dockerfile SBOM_SCANOSS opt-in (scanoss.py is MIT; the GPL engine is not bundled); web UI Advanced toggle, result banner, and a vendored badge showing read-only match confidence. A stateful accept/reject match-audit workflow is intentionally out of scope (that is TRUSCA's role).
Testing
- bash tests/test-postprocess.sh → 40 passed (No-Docker), including the identify→merge→CPE chain, snippet exclusion, the suggestion logic, and the reconciliation/over-detection guard.
- bash -n + shellcheck on new/changed scripts; frontend tsc --noEmit + vite build clean.
- Integration checks in tests/test-e2e.sh are gated behind SCANOSS_E2E=1 (network + opt-in image), including a managed-project over-detection comparison (baseline vs --identify-vendored).
Before merge
- Run scanoss-py scan --format cyclonedx and confirm whether matches carry a CPE. The code handles both cases (CPE present → kept; absent → map lookup); this just confirms which path dominates and tunes vendored-purl-map.json coverage.
Notes
OSSKB is free, rate-limited, and identification-only; for high-volume or air-gapped use point SCANOSS_API_URL at a SCANOSS commercial/self-hosted endpoint. Terms and license notes are in THIRD_PARTY_LICENSES.md.
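For reference, the reconciliation step listed under What (dropping vendored matches that a package-manager component already covers) can be sketched in Python as below. The matching key is an assumption: this sketch compares lowercased component names, whereas reconcile-vendored.sh may match on PURL or other fields. The function name and the sample components are hypothetical.

```python
def reconcile(vendored: list, managed: list) -> list:
    """Keep only vendored components that no managed component covers."""
    # Assumption: two components are "the same" if their names match,
    # ignoring case. The real script may use a different key.
    managed_names = {c["name"].lower() for c in managed}
    return [c for c in vendored if c["name"].lower() not in managed_names]


# Hypothetical inputs: one package-manager dependency, two SCANOSS matches.
managed = [{"name": "openssl", "purl": "pkg:conan/openssl@3.0.0"}]
vendored = [
    {"name": "OpenSSL", "purl": "pkg:github/openssl/openssl@3.0.0"},
    {"name": "liblfds", "purl": "pkg:github/example/liblfds@7.1.1"},
]

kept = reconcile(vendored, managed)
print([c["name"] for c in kept])  # ['liblfds']
```

Filtering before the merge is what keeps a managed project's component list and CVE count from being inflated by duplicate identifications.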