
feat(scanner): identify vendored open source in C/C++ sources via SCANOSS #168

Merged: haksungjang merged 2 commits into main from feat/identify-vendored-scanoss on Jun 22, 2026

@haksungjang

Member

Why

C/C++ embedded source with no package manager (raw CMake/Make) yields an almost-empty SBOM — cdxgen lists each file as an unidentified pkg:generic entry. The open source actually lives in files copied (vendored) into the tree (openssl, liblfds, djbdns, …). This adds an opt-in step that identifies those and feeds them into the existing license/CVE pipeline. Motivated by C/C++ supplier scans (e.g. tRelay 26.4.0) where the submitted SBOM identified zero open source.

What

  • Identification: identify-vendored.sh fingerprints sources against the public OSSKB (SCANOSS), file-match only (snippets off), with dependency/build folders excluded. Emits CycloneDX components tagged bomlens:layer=vendored / identifiedBy=scanoss.
  • Identify → CVE chain: normalize-sbom.sh + vendored-purl-map.json map SCANOSS pkg:github/* matches to a CPE, since Trivy ignores pkg:github/pkg:generic. Libraries with an NVD record (e.g. openssl, djbdns) then surface CVEs; niche ones (liblfds, libaes) are identified without CVEs (a data limit, not a tool one).
  • No over-detection: reconcile-vendored.sh drops matches a package-manager component already covers, so enabling this on a normal managed project does not duplicate known deps or inflate the CVE count. Generic merge-sbom.sh unchanged.
  • Off-by-default discovery: suggest-vendored.sh nudges (one line + web banner) only for C/C++ with no manifest and a near-empty scan. The user is never required to know the feature up front.
  • Surfaces: CLI --identify-vendored (+ SCANOSS_API_URL/SCANOSS_API_KEY); Dockerfile SBOM_SCANOSS opt-in (scanoss.py is MIT; the GPL engine is not bundled); web UI Advanced toggle, result banner, and a vendored badge showing read-only match confidence. A stateful accept/reject match-audit workflow is intentionally out of scope (that is TRUSCA's role).
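
The over-detection guard in the list above can be pictured with a small sketch. This is an illustration in Python, not the contents of reconcile-vendored.sh: the `purl_name` and `reconcile` helpers are hypothetical, and comparing components by the lowercase package name taken from the PURL is an assumed matching rule; the real script may compare differently.

```python
from __future__ import annotations

from typing import Iterable


def purl_name(purl: str) -> str:
    """Return the lowercase package name of a PURL, ignoring version and qualifiers."""
    # Strip qualifiers ("?...") and subpath ("#..."), then the "@version" suffix.
    path = purl.split("?", 1)[0].split("#", 1)[0].split("@", 1)[0]
    # The last path segment is the package name, e.g. ".../openssl/openssl" -> "openssl".
    return path.rsplit("/", 1)[-1].lower()


def reconcile(managed: Iterable[dict], vendored: Iterable[dict]) -> list[dict]:
    """Keep only vendored matches that no package-manager component already covers.

    `managed` is assumed to contain package-manager components only
    (not the unidentified pkg:generic file entries).
    """
    covered = {purl_name(c["purl"]) for c in managed if c.get("purl")}
    return [v for v in vendored if purl_name(v.get("purl", "")) not in covered]


if __name__ == "__main__":
    managed = [{"purl": "pkg:conan/openssl@3.0.0"}]
    vendored = [
        {"purl": "pkg:github/openssl/openssl@3.0.0"},
        {"purl": "pkg:github/liblfds/liblfds"},
    ]
    for component in reconcile(managed, vendored):
        print(component["purl"])  # prints pkg:github/liblfds/liblfds
```

On a managed project the openssl match is dropped because the package manager already reports it, so only the unmanaged library remains and the CVE count is not inflated.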

Testing

  • bash tests/test-postprocess.sh: 40 passed (No-Docker), including the identify→merge→CPE chain, snippet exclusion, the suggestion logic, and the reconciliation/over-detection guard.
  • bash -n + shellcheck on new/changed scripts; frontend tsc --noEmit + vite build clean.
  • Integration checks in tests/test-e2e.sh are gated behind SCANOSS_E2E=1 (network + opt-in image), including a managed-project over-detection comparison (baseline vs --identify-vendored).

Before merge

  • Run the OSSKB spike once on a network-connected host: scan an openssl-vendored sample with scanoss-py scan --format cyclonedx and confirm whether matches carry a CPE. The code handles both cases (CPE present → kept; absent → map lookup); this just confirms which path dominates and tunes vendored-purl-map.json coverage.
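
The two paths mentioned above (CPE present → kept; absent → map lookup) might look roughly like this. It is a sketch under assumptions: the shape of `PURL_MAP` (versionless PURL keys mapping to a vendor and product) and the `resolve_cpe` helper are hypothetical, since the actual schema of vendored-purl-map.json is not shown here.

```python
from __future__ import annotations

# Hypothetical map shape; the real vendored-purl-map.json schema may differ.
PURL_MAP = {
    "pkg:github/openssl/openssl": {"vendor": "openssl", "product": "openssl"},
}


def resolve_cpe(component: dict, purl_map: dict = PURL_MAP) -> str | None:
    """Return a CPE for a matched component, or None if none can be derived."""
    # Path 1: the match already carries a CPE, so keep it untouched.
    if component.get("cpe"):
        return component["cpe"]
    # Path 2: look up the versionless PURL and synthesize a CPE 2.3 string.
    key = component.get("purl", "").split("@", 1)[0]
    entry = purl_map.get(key)
    version = component.get("version")
    if entry is None or not version:
        return None  # identified, but no CPE, so no CVE lookup
    return f"cpe:2.3:a:{entry['vendor']}:{entry['product']}:{version}:*:*:*:*:*:*:*"


if __name__ == "__main__":
    match = {"purl": "pkg:github/openssl/openssl@3.0.0", "version": "3.0.0"}
    print(resolve_cpe(match))  # cpe:2.3:a:openssl:openssl:3.0.0:*:*:*:*:*:*:*
```

A component whose PURL is missing from the map returns `None`, which corresponds to the "identified without CVEs" outcome described for niche libraries.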

Notes

OSSKB is free, rate-limited, and identification-only; for high-volume or air-gapped use point SCANOSS_API_URL at a SCANOSS commercial/self-hosted endpoint. Terms and license notes are in THIRD_PARTY_LICENSES.md.

…NOSS

C/C++ embedded source with no package manager yields an almost-empty SBOM
(cdxgen lists each file as pkg:generic). Add an opt-in step that fingerprints
the sources against the public OSSKB, recording copied-in (vendored) open
source as real components with name, version, PURL — and, where the library
has an NVD record, a CPE so the existing Trivy step reports its CVEs.

- identify-vendored.sh: SCANOSS file-match only (snippets off), dependency/
  build folders excluded; emits CycloneDX with bomlens provenance.
- normalize-sbom.sh + vendored-purl-map.json: map pkg:github matches to a CPE
  so the identify->CVE chain completes (Trivy ignores pkg:github/pkg:generic).
- reconcile-vendored.sh: drop matches a package-manager component already
  covers, so enabling this on a managed project does not duplicate known deps
  or inflate CVEs. merge-sbom.sh unchanged.
- suggest-vendored.sh: off-by-default discovery — nudge only for C/C++ with no
  manifest and a near-empty scan; sets a metadata flag for the web UI banner.
- CLI --identify-vendored + SCANOSS_API_URL/KEY; Dockerfile SBOM_SCANOSS opt-in
  (scanoss.py is MIT; GPL engine not bundled).
- Web UI: Advanced toggle, result banner, vendored badge with read-only match
  confidence. Stateful match audit stays out of scope (TRUSCA).
- Tests: 40 No-Docker unit assertions incl. identify->merge->CPE chain and
  over-detection reconciliation; gated SCANOSS_E2E integration checks.
- Docs + THIRD_PARTY_LICENSES (scanoss MIT, OSSKB terms).

OSSKB spike (real api.osskb.org, openssl 3.0.0 sources) confirmed two things:
matches carry NO cpe field — so vendored-purl-map.json is the required path,
not a fallback — and the version arrives as a git tag (e.g. "openssl-3.0.0",
"v1.2.13"). Feeding that raw into the CPE produced a malformed
cpe:2.3:a:openssl:openssl:openssl-3.0.0:... that Trivy could never match,
silently breaking the identify->CVE chain on real data.

- identify-vendored.sh: strip a leading "<component>-"/"<component>_" and a
  leading "v" before a digit, so the version is the bare release (3.0.0).
- test fixture now uses a git-tag version ("openssl-3.0.0"); a new assertion
  pins the normalized output, and the CPE-chain test expects ...:3.0.0:...
- guides: note that file-match version precision is approximate (a file's
  match reports the release where its content first appeared).
@haksungjang

Member Author

OSSKB CPE spike — done ✅

Ran the pre-merge spike against the real api.osskb.org (scanoss-py 1.53.1, five openssl-3.0.0 source files, --skip-snippets).

Findings:

  • All 5 files returned full-file (id: file) matches.
  • No cpe field in the result → vendored-purl-map.json is the required path to CVEs, not a fallback. Design confirmed.
  • PURL is pkg:github/openssl/openssl (also conan/conda/maven/npm/nuget) → Trivy can't match it directly → the map's CPE synthesis is needed. Map key matches.
  • Version arrives as a git tag (openssl-3.0.0-beta2, openssl-3.0.0, …). Fed raw, that produced a malformed CPE and broke the chain on real data. Fixed in 5d3d8f8 (strip <component>-/v prefix → 3.0.0), now pinned by a test using a git-tag fixture.
  • Per-file matches resolved to different pre-release tags of the same release → file-match version precision is inherently approximate (now documented in the guide; status: pending on each match reinforces the review-first posture).
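
The version fix described in the findings (strip a leading `<component>-` or `<component>_`, then a leading `v` before a digit) can be illustrated as follows. This is a Python sketch for readability, not the shell code in identify-vendored.sh; the `normalize_version` helper is hypothetical, and case-sensitive prefix matching is an assumption.

```python
import re


def normalize_version(component: str, raw: str) -> str:
    """Reduce a git-tag style version to the bare release string."""
    version = raw
    # Strip a leading "<component>-" or "<component>_" (case-sensitive by assumption).
    for sep in ("-", "_"):
        prefix = component + sep
        if version.startswith(prefix):
            version = version[len(prefix):]
            break
    # Drop a leading "v" only when a digit follows, so "v1.2.13" -> "1.2.13"
    # while a name that merely starts with "v" is left alone.
    return re.sub(r"^v(?=\d)", "", version)


if __name__ == "__main__":
    print(normalize_version("openssl", "openssl-3.0.0"))        # 3.0.0
    print(normalize_version("openssl", "openssl-3.0.0-beta2"))  # 3.0.0-beta2
    print(normalize_version("example", "v1.2.13"))              # 1.2.13
```

With the bare release in hand, the synthesized CPE ends in `...:3.0.0:...` instead of the malformed `...:openssl-3.0.0:...`. Pre-release suffixes such as `-beta2` are left in place here, which matches the approximate per-file precision noted in the last finding.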

No remaining blockers from the spike.

@haksungjang haksungjang merged commit 9444f93 into main Jun 22, 2026
24 checks passed
@haksungjang haksungjang deleted the feat/identify-vendored-scanoss branch June 22, 2026 13:12