From a939a52cf322c627016658956cfdf0e6ccbb65cb Mon Sep 17 00:00:00 2001
From: Haksung Jang
Date: Mon, 22 Jun 2026 21:50:48 +0900
Subject: [PATCH 1/2] feat(scanner): identify vendored open source in C/C++ sources via SCANOSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

C/C++ embedded source with no package manager yields an almost-empty SBOM
(cdxgen lists each file as pkg:generic). Add an opt-in step that fingerprints
the sources against the public OSSKB, recording copied-in (vendored) open
source as real components with name, version, and PURL, plus, where the
library has an NVD record, a CPE so the existing Trivy step reports its CVEs.

- identify-vendored.sh: SCANOSS file-match only (snippets off),
  dependency/build folders excluded; emits CycloneDX with bomlens provenance.
- normalize-sbom.sh + vendored-purl-map.json: map pkg:github matches to a CPE
  so the identify->CVE chain completes (Trivy ignores
  pkg:github/pkg:generic).
- reconcile-vendored.sh: drop matches a package-manager component already
  covers, so enabling this on a managed project does not duplicate known deps
  or inflate CVEs. merge-sbom.sh unchanged.
- suggest-vendored.sh: off-by-default discovery: nudge only for C/C++ with no
  manifest and a near-empty scan; sets a metadata flag for the web UI banner.
- CLI --identify-vendored + SCANOSS_API_URL/KEY; Dockerfile SBOM_SCANOSS
  opt-in (scanoss.py is MIT; GPL engine not bundled).
- Web UI: Advanced toggle, result banner, vendored badge with read-only match
  confidence. Stateful match audit stays out of scope (TRUSCA).
- Tests: 40 No-Docker unit assertions incl. identify->merge->CPE chain and
  over-detection reconciliation; gated SCANOSS_E2E integration checks.
- Docs + THIRD_PARTY_LICENSES (scanoss MIT, OSSKB terms).
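
Reviewer note: the PURL->CPE step described above can be sketched roughly as
follows. This is an illustrative simplification, not the patch's code: the
patch does the lookup in jq against vendored-purl-map.json and reads the
version from the component's version field, whereas this sketch uses a
hypothetical two-entry table and takes the version from the PURL itself. The
function names purl_to_cpe and lookup_vendor_product are assumptions made for
the example.

```shell
#!/bin/bash
# Hypothetical sketch of the PURL -> CPE synthesis idea (names are illustrative).

# Stand-in for vendored-purl-map.json: PURL coordinate -> "vendor:product".
lookup_vendor_product() {
  case "$1" in
    pkg:github/openssl/openssl) echo "openssl:openssl" ;;
    pkg:github/madler/zlib)     echo "zlib:zlib" ;;
    *) return 1 ;;
  esac
}

# Print a cpe:2.3 string for a versioned PURL whose coordinate is in the table.
# Returns non-zero (and prints nothing) when there is no version or no entry,
# which corresponds to "identified, but not vulnerability-matched".
purl_to_cpe() {
  local purl="$1" coord version vp
  coord="${purl%%@*}"        # drop "@version" and everything after it
  coord="${coord%%\?*}"      # drop qualifiers when no version was present
  case "$purl" in
    *@*) version="${purl#*@}"; version="${version%%\?*}" ;;
    *)   version="" ;;
  esac
  [ -n "$version" ] || return 1
  vp="$(lookup_vendor_product "$coord")" || return 1
  printf 'cpe:2.3:a:%s:%s:*:*:*:*:*:*:*\n' "$vp" "$version"
}
```

Calling purl_to_cpe with pkg:github/madler/zlib@1.2.11 yields a zlib CPE for
version 1.2.11, while a coordinate missing from the table produces no output,
so such a component keeps only its PURL.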
--- .github/workflows/ci.yml | 2 +- THIRD_PARTY_LICENSES.md | 13 ++ docker/Dockerfile | 17 ++ docker/entrypoint.sh | 40 +++++ docker/lib/identify-vendored.sh | 161 ++++++++++++++++++ docker/lib/normalize-sbom.sh | 27 ++- docker/lib/reconcile-vendored.sh | 45 +++++ docker/lib/suggest-vendored.sh | 79 +++++++++ docker/lib/vendored-purl-map.json | 21 +++ .../src/components/ComponentsTable.tsx | 12 ++ .../src/components/ResultDashboard.tsx | 9 + .../web/frontend/src/components/ScanForm.tsx | 33 ++++ docker/web/frontend/src/lib/api.ts | 15 +- .../web/frontend/src/locales/en/common.json | 10 +- .../web/frontend/src/locales/ko/common.json | 10 +- docker/web/server.py | 30 ++++ docs/guides/identify-vendored.ko.md | 62 +++++++ docs/guides/identify-vendored.md | 62 +++++++ mkdocs.yml | 1 + scripts/scan-sbom.sh | 17 +- tests/fixtures/cdxgen-cpp-sparse.json | 14 ++ tests/fixtures/cdxgen-node-managed.json | 13 ++ tests/fixtures/scanoss-raw-managed.json | 24 +++ tests/fixtures/scanoss-raw.json | 33 ++++ tests/test-e2e.sh | 74 ++++++++ tests/test-postprocess.sh | 130 ++++++++++++++ 26 files changed, 945 insertions(+), 9 deletions(-) create mode 100644 docker/lib/identify-vendored.sh create mode 100644 docker/lib/reconcile-vendored.sh create mode 100644 docker/lib/suggest-vendored.sh create mode 100644 docker/lib/vendored-purl-map.json create mode 100644 docs/guides/identify-vendored.ko.md create mode 100644 docs/guides/identify-vendored.md create mode 100644 tests/fixtures/cdxgen-cpp-sparse.json create mode 100644 tests/fixtures/cdxgen-node-managed.json create mode 100644 tests/fixtures/scanoss-raw-managed.json create mode 100644 tests/fixtures/scanoss-raw.json diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 2233e1b..cb31dd0 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -82,7 +82,7 @@ jobs: shellcheck docker/lib/validate-sbom.sh docker/lib/convert-to-cdx.sh docker/lib/generate-risk-report.sh || true echo "" echo "Checking 
docker/lib post-process scripts..." - shellcheck docker/lib/normalize-sbom.sh docker/lib/generate-notice.sh docker/lib/stamp-metadata.sh docker/lib/source-detect.sh docker/lib/build-prep.sh || true + shellcheck docker/lib/normalize-sbom.sh docker/lib/generate-notice.sh docker/lib/stamp-metadata.sh docker/lib/source-detect.sh docker/lib/build-prep.sh docker/lib/identify-vendored.sh docker/lib/suggest-vendored.sh docker/lib/reconcile-vendored.sh || true echo "" echo "Checking tests/test-scan.sh..." shellcheck tests/test-scan.sh || true diff --git a/THIRD_PARTY_LICENSES.md b/THIRD_PARTY_LICENSES.md index bda955f..edc367b 100644 --- a/THIRD_PARTY_LICENSES.md +++ b/THIRD_PARTY_LICENSES.md @@ -19,10 +19,23 @@ | trivy-db | 취약점 DB | Apache-2.0 | https://github.com/aquasecurity/trivy-db | | cosign | SBOM 서명 | Apache-2.0 | https://github.com/sigstore/cosign | | scancode-toolkit | 정밀 라이선스(opt-in) | Apache-2.0 (데이터셋 일부 CC-BY-4.0 등) | https://github.com/aboutcode-org/scancode-toolkit | +| scanoss (scanoss.py) | vendored 오픈소스 식별(opt-in `SBOM_SCANOSS`) | MIT (동봉 데이터셋 `osadl-copyleft.json`은 CC-BY-4.0) | https://github.com/scanoss/scanoss.py | | jq | SBOM 가공(헬퍼) | MIT (일부 컴포넌트 BSD/ICU/Lucent) | https://github.com/jqlang/jq | > 데이터: NVD(취약점 출처)는 public domain이며 "NIST/NVD" 출처 표시가 요구됩니다. +### vendored 오픈소스 식별과 OSSKB API (opt-in) + +`--identify-vendored`(빌드: `docker build --build-arg SBOM_SCANOSS=true`)는 클라이언트 `scanoss.py`(MIT)만 번들합니다. SBOM 매칭을 수행하는 SCANOSS Engine(GPL-2.0)은 **포함하지 않으며**, 호스팅 OSSKB API(`api.osskb.org`)를 호출합니다. 그래서 firmware 이미지의 GPL 도구와 달리 base 이미지에 둘 수 있습니다(MIT). 동봉 데이터셋 `osadl-copyleft.json`은 코드가 아닌 CC-BY-4.0 데이터로, 출처 표기만 요구됩니다. + +OSSKB API(운영: Software Transparency Foundation) 이용 시 약관 제약: + +- 전송되는 것은 소스 코드가 아니라 **파일 지문(해시)**뿐입니다. +- 반환 데이터는 **소프트웨어 식별 목적으로만** 사용할 수 있고, OSSKB 데이터를 **재배포·별도 DB로 캐싱하는 것은 금지**됩니다. `sbom-tools`는 스캔별 SBOM 컴포넌트로만 결과를 내보내므로 이 범위 안입니다. +- 무료·best-effort이며 **요청 빈도 제한(rate limit)**이 있습니다. 
대량·전사 운용이나 에어갭 환경에서는 `SCANOSS_API_URL`/`SCANOSS_API_KEY`로 SCANOSS 상용 서비스나 자체 호스팅 엔드포인트를 지정하세요. +- 결과는 "사람 검토가 필요한 식별 힌트"로 제공됩니다(정확도 무보증). +- 약관 원문: https://www.softwaretransparency.org/terms + ## 펌웨어 이미지 — `ghcr.io/sktelecom/bomlens-firmware` (GPL 포함, opt-in) > 무거운 언팩·바이너리 분석 도구와 GPL 컴포넌트를 격리하기 위한 별도 opt-in 이미지입니다. diff --git a/docker/Dockerfile b/docker/Dockerfile index 1bea8e8..1f3d741 100644 --- a/docker/Dockerfile +++ b/docker/Dockerfile @@ -40,6 +40,11 @@ ARG TRIVY_VERSION=v0.70.0 ARG COSIGN_VERSION=v2.4.1 ARG SCANCODE_VERSION=32.3.0 ARG SBOM_DEEP_LICENSE=false +# Vendored-OSS identification (opt-in: --build-arg SBOM_SCANOSS=true). scanoss.py +# is MIT, so unlike the firmware GPL tools it can live in the base image; kept +# opt-in only to keep the default image lean and free of outbound API calls. +ARG SBOM_SCANOSS=false +ARG SCANOSS_VERSION=1.25.2 ARG TARGETARCH=amd64 # docker CLI (client only) — lets the web UI source scan launch cdxgen language # images as sibling containers via the mounted host socket (transitive deps). @@ -105,6 +110,17 @@ RUN if [ "$SBOM_DEEP_LICENSE" = "true" ]; then \ echo "[build] scancode-toolkit skipped (SBOM_DEEP_LICENSE=false)"; \ fi +# scanoss.py (opt-in vendored-OSS identification: --build-arg SBOM_SCANOSS=true). +# MIT-licensed client; talks to the hosted OSSKB API (or SCANOSS_API_URL). It does +# NOT bundle the GPL-2.0 SCANOSS engine. See identify-vendored.sh and +# THIRD_PARTY_LICENSES.md for the OSSKB terms (identification-only, no redistribution). +RUN if [ "$SBOM_SCANOSS" = "true" ]; then \ + pip3 install --no-cache-dir "scanoss==${SCANOSS_VERSION}" \ + && scanoss-py --version; \ + else \ + echo "[build] scanoss skipped (SBOM_SCANOSS=false)"; \ + fi + # Firmware unpack + binary identification (opt-in: --build-arg SBOM_FIRMWARE=true). # unblob = MIT (primary unpacker); cve-bin-tool/ubi_reader = GPL (binary ID, UBI). # squashfs-tools/e2fsprogs/p7zip/unar/jefferson/... 
are the extractor binaries @@ -131,6 +147,7 @@ RUN if [ "$SBOM_FIRMWARE" = "true" ]; then \ # Reflects the firmware build flavor. When SBOM_FIRMWARE=true the image bundles GPL # tools (cve-bin-tool, ubi_reader); the source-offer label points to the inventory. LABEL com.sktelecom.sbom.firmware-tools="${SBOM_FIRMWARE}" \ + com.sktelecom.sbom.scanoss-tools="${SBOM_SCANOSS}" \ com.sktelecom.sbom.gpl-source-offer="https://github.com/sktelecom/sbom-tools/blob/main/THIRD_PARTY_LICENSES.md" COPY entrypoint.sh /usr/local/bin/run-scan diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh index 72eb182..2c99956 100644 --- a/docker/entrypoint.sh +++ b/docker/entrypoint.sh @@ -232,6 +232,46 @@ if command -v jq >/dev/null 2>&1; then fi fi +# ======================================================== +# Vendored open source (opt-in, SCANOSS) — only meaningful for a source tree. +# Runs for both the CLI source scan (MODE=POSTPROCESS, tree mounted at /src) and +# the web-UI source scan (SOURCE mode, SOURCE_ROOT). When enabled, identify the +# open source copied straight into the sources and merge it into the SBOM before +# stamping/normalizing, so the PURL->CPE fix and the security scan pick it up. +# When disabled, suggest-vendored.sh decides whether to nudge the user (C/C++, +# no package manager, near-empty scan) — off-by-default discovery. +# ======================================================== +VENDORED_SRC="${SOURCE_ROOT:-/src}" +if [ "${IDENTIFY_VENDORED:-false}" = "true" ] && [ -d "$VENDORED_SRC" ]; then + echo "[INFO] Identifying vendored open source (SCANOSS)..." + VEND_SBOM="${OUT_PREFIX}_vendored.cdx.json" + if bash "$LIBDIR/identify-vendored.sh" "$VENDORED_SRC" "$VEND_SBOM" "$PROJECT_VERSION"; then + VEND_N=$(jq '[.components[]?] | length' "$VEND_SBOM" 2>/dev/null || echo 0) + # Reconcile against the package-manager scan before merging: drop vendored + # matches whose name a cdxgen/syft component already carries (see + # reconcile-vendored.sh). 
Prevents duplicate pkg:github components / false + # CVEs when this option is enabled on a normal managed project. + if [ "${VEND_N:-0}" -gt 0 ]; then + DROPPED_N=$(bash "$LIBDIR/reconcile-vendored.sh" "$OUTPUT_FILE" "$VEND_SBOM") + [ "${DROPPED_N:-0}" -gt 0 ] && echo "[INFO] vendored: reconciled ${DROPPED_N} match(es) already covered by the package-manager scan." + VEND_N=$(jq '[.components[]?] | length' "$VEND_SBOM" 2>/dev/null || echo 0) + fi + if [ "${VEND_N:-0}" -gt 0 ]; then + echo "[INFO] vendored components identified: $VEND_N — merging into SBOM." + if bash "$LIBDIR/merge-sbom.sh" "${OUTPUT_FILE}.merged" "$PROJECT_NAME" "$PROJECT_VERSION" "$OUTPUT_FILE" "$VEND_SBOM"; then + mv "${OUTPUT_FILE}.merged" "$OUTPUT_FILE" + else + echo "[WARN] merge of vendored components failed; keeping the original SBOM." >&2 + rm -f "${OUTPUT_FILE}.merged" + fi + else + echo "[INFO] no new vendored open source to add (after reconciliation)." + fi + fi +elif [ -d "$VENDORED_SRC" ]; then + bash "$LIBDIR/suggest-vendored.sh" "$OUTPUT_FILE" "$VENDORED_SRC" || true +fi + # ======================================================== # Common pipeline: normalize / deep-license / notice / security / sign # ======================================================== diff --git a/docker/lib/identify-vendored.sh b/docker/lib/identify-vendored.sh new file mode 100644 index 0000000..dcf1a83 --- /dev/null +++ b/docker/lib/identify-vendored.sh @@ -0,0 +1,161 @@ +#!/bin/bash +# Copyright 2026 SK Telecom Co., Ltd. +# Licensed under the Apache License, Version 2.0. +# +# identify-vendored.sh — identify open source copied (vendored) into a source tree. +# +# Usage: identify-vendored.sh +# produces (CycloneDX 1.6) whose components are the open-source +# files SCANOSS matched against its public knowledge base (OSSKB). 
+# +# Why this exists: a C/C++ embedded source tree with no package manager (raw +# CMake/Make) yields an almost-empty SBOM — cdxgen lists each source file as a +# pkg:generic component with no name/version. The real open source lives in files +# copied straight into the tree (liblfds, djbdns, libaes, openssl, …). SCANOSS +# winnowing fingerprints those files and matches them to a known release, so we +# can record them as proper components (name + version + purl). +# +# Precision: only FULL-FILE matches (id == "file") are promoted to components. +# Snippet matches (a few lines copied from elsewhere) are noisy and are skipped +# here, so the SBOM that feeds the security/notice pipeline stays clean. +# +# Privacy: SCANOSS sends file FINGERPRINTS (hashes), not source code, to the +# OSSKB API. Endpoint/credentials are overridable via SCANOSS_API_URL / +# SCANOSS_API_KEY (default: the free api.osskb.org). +# +# Best-effort: a missing tool, no network, or no match degrades to an empty +# components array rather than aborting — the caller always gets a valid SBOM. +set -e + +SRC="$1" +OUTPUT="$2" +VERSION="${3:-unknown}" + +if [ -z "$SRC" ] || [ ! -d "$SRC" ]; then + echo "[vendored] source directory not found: $SRC" >&2 + exit 1 +fi +if [ -z "$OUTPUT" ]; then + echo "[vendored] output path is required (usage: identify-vendored.sh )" >&2 + exit 1 +fi + +GEN_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ") + +# Always emit a valid (possibly empty) CycloneDX envelope, used on every +# graceful-degrade path below so the caller never sees a missing/half file. +write_empty() { + jq -n --arg version "$VERSION" --arg ts "$GEN_AT" ' + { + bomFormat: "CycloneDX", specVersion: "1.6", version: 1, + metadata: { + timestamp: $ts, + tools: { components: [ { type: "application", name: "scanoss" } ] }, + component: { type: "application", name: "vendored", version: $version } + }, + components: [] + }' > "$OUTPUT" +} + +if ! 
command -v scanoss-py >/dev/null 2>&1; then + echo "[vendored] scanoss-py not installed in this image; skipping vendored identification." >&2 + echo "[vendored] Rebuild with: docker build --build-arg SBOM_SCANOSS=true -t bomlens ./docker" >&2 + write_empty + exit 0 +fi +if ! command -v jq >/dev/null 2>&1; then + echo "[vendored] ERROR: jq not installed in this image." >&2 + exit 1 +fi + +WORK="$(mktemp -d)" +trap 'rm -rf "$WORK"' EXIT +RAW="$WORK/scanoss-raw.json" + +# Folders owned by a package manager or a build (their contents are already +# declared by the cdxgen scan, or are generated output). Excluding them keeps +# SCANOSS from re-identifying known dependencies as duplicate pkg:github +# components — the main over-detection risk when this option is enabled on a +# normal, package-managed project. (Name reconciliation in entrypoint.sh is the +# second line of defence for anything that slips through.) +SKIP_FOLDERS="node_modules vendor dist build target out .venv venv \ +__pycache__ .gradle .m2 .git bower_components Pods .next .tox .cargo .bundle" +SKIP_ARGS=() +for d in $SKIP_FOLDERS; do SKIP_ARGS+=(--skip-folder "$d"); done +# Ignore tiny files: too little content to identify reliably, a common source of +# spurious file matches (boilerplate headers, empty stubs). +SKIP_ARGS+=(--skip-size 256) + +# Run SCANOSS. --skip-snippets keeps it to full-file matching (precision) and is +# faster/lighter on the API. We take the RAW result (default JSON, keyed by file +# path) rather than scanoss' own CycloneDX so we fully control which matches are +# promoted and which provenance properties are attached. +echo "[vendored] SCANOSS: fingerprinting $SRC (file hashes only; source stays local)..." +# shellcheck disable=SC2086 +if ! 
scanoss-py scan "$SRC" --skip-snippets "${SKIP_ARGS[@]}" --output "$RAW" \ + ${SCANOSS_API_URL:+--apiurl "$SCANOSS_API_URL"} \ + ${SCANOSS_API_KEY:+--key "$SCANOSS_API_KEY"} >/dev/null 2>&1; then + echo "[vendored] WARN: SCANOSS scan failed (no network / rate limit / bad endpoint); no vendored components." >&2 + write_empty + exit 0 +fi +if [ ! -s "$RAW" ] || ! jq empty "$RAW" >/dev/null 2>&1; then + echo "[vendored] WARN: SCANOSS produced no usable result; no vendored components." >&2 + write_empty + exit 0 +fi + +# Transform raw SCANOSS JSON -> CycloneDX components. +# - keep only full-file matches (.id == "file") +# - carry SCANOSS' cpe through when present (lets Trivy match CVEs directly; +# normalize-sbom.sh fills the gap for libraries SCANOSS gives no cpe for) +# - tag provenance: bomlens:layer=vendored, identifiedBy=scanoss, match %, source file +# - dedupe by purl (fallback name@version), matching merge-sbom.sh +COMPS=$(jq -c ' + [ to_entries[] + | .key as $file + | .value[]? + | select((.id // "") == "file") + | { + type: "library", + name: (.component // ((.purl[0] // "") | sub("^pkg:[^/]+/"; ""))), + version: (.version // ""), + purl: (.purl[0] // null), + cpe: (.cpe[0]? // null), + licenses: ( [ .licenses[]?.name // empty ] + | map(select(. != null and . != "")) | unique + | map({ license: { name: . 
} }) ), + properties: ( [ + { name: "bomlens:layer", value: "vendored" }, + { name: "bomlens:identifiedBy", value: "scanoss" }, + { name: "bomlens:scanoss:match", value: (.matched // "") }, + { name: "bomlens:scanoss:file", value: $file }, + { name: "bomlens:scanoss:purl", value: (.purl[0] // "") } + ] | map(select((.value // "") != "")) ) + } + | with_entries(select(.value != null and .value != "" and .value != [])) + | select((.name // "") != "") + ] + | group_by(.purl // ((.name // "") + "@" + (.version // ""))) + | map(.[0]) + | sort_by(.purl // ((.name // "") + "@" + (.version // ""))) +' "$RAW" 2>/dev/null || echo '[]') + +NCOMP=$(echo "$COMPS" | jq 'length' 2>/dev/null || echo 0) + +jq -n \ + --argjson comps "$COMPS" \ + --arg version "$VERSION" \ + --arg ts "$GEN_AT" ' +{ + bomFormat: "CycloneDX", + specVersion: "1.6", + version: 1, + metadata: { + timestamp: $ts, + tools: { components: [ { type: "application", name: "scanoss" } ] } + }, + components: $comps +}' > "$OUTPUT" + +echo "[vendored] SBOM written: $OUTPUT (vendored components=${NCOMP})" diff --git a/docker/lib/normalize-sbom.sh b/docker/lib/normalize-sbom.sh index 571373e..c30f29d 100755 --- a/docker/lib/normalize-sbom.sh +++ b/docker/lib/normalize-sbom.sh @@ -45,6 +45,28 @@ SORT_FILTER='(.components) |= (if type=="array" then sort_by(.purl // ((.name // # component name/version are retained); valid namespaced swift purls are untouched. PURL_FIX='(.metadata.component) |= (if (has("purl") and (.purl|test("^pkg:swift/[^/]+@"))) then with_entries(select(.key!="purl")) else . end) | (.components) |= (if type=="array" then map(if (has("purl") and (.purl|test("^pkg:swift/[^/]+@"))) then with_entries(select(.key!="purl")) else . end) else . end)' +# Make vendored (SCANOSS-identified) components reachable by the security scan. +# SCANOSS labels C/C++ matches with pkg:github// PURLs, which Trivy +# does NOT use for CVE matching — it matches OS/language PURLs and CPEs. 
Without a +# CPE these components are identified but carry no vulnerabilities, breaking the +# identify->CVE chain. For components SCANOSS already gave a cpe we leave it alone; +# otherwise we look the version-stripped PURL coordinate up in vendored-purl-map.json +# and synthesize a cpe:2.3 (NVD). Coordinates not in the map (niche libraries with +# no NVD record) keep their PURL and are simply identified, not vuln-matched. +VMAP_JSON='{}' +[ -f "$SCRIPT_DIR/vendored-purl-map.json" ] && VMAP_JSON=$(cat "$SCRIPT_DIR/vendored-purl-map.json") +VENDORED_CPE_FIX='(.components) |= (if type=="array" then map( + if ( ((.properties // []) | map(select(.name=="bomlens:identifiedBy" and .value=="scanoss")) | length) > 0 ) + and (.cpe == null) and (.purl != null) and ((.version // "") != "") + then + ( .purl | split("@")[0] | split("?")[0] ) as $coord + | ($vmap[$coord]) as $m + | (if ($m != null) + then . + { cpe: ("cpe:2.3:a:" + $m.cpe_vendor + ":" + $m.cpe_product + ":" + .version + ":*:*:*:*:*:*:*") } + else . end) + else . end +) else . end)' + # Always: normalize component license aliases to SPDX ids. cdxgen records some # licenses as non-SPDX free text ("Expat license", "Apache License 2.0"); the v1.3 # web UI surfaces (license filter, distribution card, dependency tree) read these @@ -75,10 +97,11 @@ if [ "$MODE" = "--stable" ]; then # cdxgen further leaks the random name of the temp virtualenv it builds to # resolve python deps (cdxgen-venv-XXXXXX) into component evidence values, so # the same input yields a different byte stream each run; pin that suffix too. - jq -S " + jq -S --argjson vmap "$VMAP_JSON" " ${NORMALIZE_DEF} ${NULL_FIX} | ${PURL_FIX} + | ${VENDORED_CPE_FIX} | ${LICENSE_FIX} | ${SORT_FILTER} | walk(if type==\"object\" and has(\"timestamp\") then .timestamp = \"1970-01-01T00:00:00Z\" else . 
end) @@ -91,7 +114,7 @@ if [ "$MODE" = "--stable" ]; then | del(.serialNumber) " "$SBOM" > "$TMP" else - jq -S "${NORMALIZE_DEF} ${NULL_FIX} | ${PURL_FIX} | ${LICENSE_FIX} | ${SORT_FILTER}" "$SBOM" > "$TMP" + jq -S --argjson vmap "$VMAP_JSON" "${NORMALIZE_DEF} ${NULL_FIX} | ${PURL_FIX} | ${VENDORED_CPE_FIX} | ${LICENSE_FIX} | ${SORT_FILTER}" "$SBOM" > "$TMP" fi mv "$TMP" "$SBOM" diff --git a/docker/lib/reconcile-vendored.sh b/docker/lib/reconcile-vendored.sh new file mode 100644 index 0000000..ae4f8d6 --- /dev/null +++ b/docker/lib/reconcile-vendored.sh @@ -0,0 +1,45 @@ +#!/bin/bash +# Copyright 2026 SK Telecom Co., Ltd. +# Licensed under the Apache License, Version 2.0. +# +# reconcile-vendored.sh — drop vendored matches the package-manager scan already covers. +# +# Usage: reconcile-vendored.sh +# Rewrites in place, removing every component whose name (case- +# insensitive) already appears in . Prints the number dropped. +# +# Why: when --identify-vendored runs on a normal package-managed project, SCANOSS +# may file-match a declared dependency (e.g. node_modules/lodash). That would land +# as a duplicate pkg:github component with a possibly-wrong CPE — over-detection +# and false CVEs. The authoritative package-manager identity wins; only genuinely +# new finds (real vendored source) survive. The generic merge-sbom.sh dedup cannot +# do this (different PURL ecosystems never match) and must stay unchanged — layered +# server SBOMs legitimately repeat names across layers. +# +# Best-effort: any error leaves the vendored SBOM untouched and reports 0 dropped. +set -e + +BASE="$1" +VEND="$2" + +if [ -z "$BASE" ] || [ -z "$VEND" ] || [ ! -f "$BASE" ] || [ ! -f "$VEND" ]; then + echo 0; exit 0 +fi +if ! command -v jq >/dev/null 2>&1; then + echo 0; exit 0 +fi + +before=$(jq '[.components[]?] 
| length' "$VEND" 2>/dev/null || echo 0) +known=$(jq -c '[.components[]?.name // empty | ascii_downcase] | unique' "$BASE" 2>/dev/null || echo '[]') + +TMP="$(mktemp)" +if jq --argjson known "$known" \ + '.components |= map(select((((.name // "") | ascii_downcase) as $n | ($known | index($n))) | not))' \ + "$VEND" > "$TMP" 2>/dev/null; then + mv "$TMP" "$VEND" +else + rm -f "$TMP" +fi + +after=$(jq '[.components[]?] | length' "$VEND" 2>/dev/null || echo "$before") +echo $((before - after)) diff --git a/docker/lib/suggest-vendored.sh b/docker/lib/suggest-vendored.sh new file mode 100644 index 0000000..a431aba --- /dev/null +++ b/docker/lib/suggest-vendored.sh @@ -0,0 +1,79 @@ +#!/bin/bash +# Copyright 2026 SK Telecom Co., Ltd. +# Licensed under the Apache License, Version 2.0. +# +# suggest-vendored.sh — surface the --identify-vendored option only when it helps. +# +# Usage: suggest-vendored.sh +# +# Vendored-OSS identification (SCANOSS) is needed almost exclusively for C/C++ +# embedded source with no package manager — a small slice of users. So the option +# is off by default and hidden; this helper detects the one situation where it +# matters and tells the user, in one plain line, to switch it on. It never runs +# the scan itself (that sends fingerprints to an external API — the user decides). +# +# Trigger = no package-manager manifest + C/C++ source present + the scan found +# almost nothing (few components, or mostly cdxgen pkg:generic file entries). +# When it fires it also records `bomlens:suggest-identify-vendored=true` on the +# SBOM metadata so the web UI can show the same hint as a result banner. +# +# Best-effort and silent on anything unexpected: it must never break a scan. +set -e + +SBOM="$1" +SRC="$2" + +[ -n "$SBOM" ] && [ -f "$SBOM" ] || exit 0 +[ -n "$SRC" ] && [ -d "$SRC" ] || exit 0 +command -v jq >/dev/null 2>&1 || exit 0 + +# Already enabled? Nothing to suggest. 
+[ "${IDENTIFY_VENDORED:-false}" = "true" ] && exit 0 + +# C/C++ source present in the tree? +has_c=$(find "$SRC" -type f \( \ + -name '*.c' -o -name '*.cc' -o -name '*.cpp' -o -name '*.cxx' \ + -o -name '*.h' -o -name '*.hpp' -o -name '*.hh' \) 2>/dev/null | head -1) +[ -n "$has_c" ] || exit 0 + +# No package manager? Reuse the shared language detector. It returns "unknown" +# precisely when no manifest (pom.xml/package.json/go.mod/Conan/vcpkg/…) is found, +# which is the raw-CMake/Make C/C++ case this feature targets. With a manifest, +# cdxgen already resolves dependencies and the hint would be noise. +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +# shellcheck source=docker/lib/source-detect.sh +. "$SCRIPT_DIR/source-detect.sh" +[ "$(detect_lang "$SRC")" = "unknown" ] || exit 0 + +# Did the scan come up nearly empty, or mostly pkg:generic file noise? +total=$(jq '[.components[]?] | length' "$SBOM" 2>/dev/null || echo 0) +generic=$(jq '[.components[]? | select((.purl // "") | startswith("pkg:generic"))] | length' "$SBOM" 2>/dev/null || echo 0) +total=${total:-0}; generic=${generic:-0} + +sparse=0 +if [ "$total" -le 3 ]; then + sparse=1 +elif [ "$total" -gt 0 ] && [ "$((generic * 100 / total))" -ge 60 ]; then + sparse=1 +fi +[ "$sparse" = 1 ] || exit 0 + +cat >&2 <<'EOF' +[hint] This looks like a C/C++ source tree with no package manager, and the scan + found little. Open source is often copied (vendored) straight into such + sources and a normal scan cannot see it. To identify it, re-run with: + --identify-vendored + Only file fingerprints (hashes) are sent to the OSSKB service — your source + code stays local. See docs/guides/identify-vendored.md +EOF + +# Record the suggestion on the SBOM so the web UI can show a matching banner. 
+TMP="$(mktemp)" +if jq '(.metadata.properties) = ((.metadata.properties // []) + + [{ name: "bomlens:suggest-identify-vendored", value: "true" }])' \ + "$SBOM" > "$TMP" 2>/dev/null; then + mv "$TMP" "$SBOM" +else + rm -f "$TMP" +fi +exit 0 diff --git a/docker/lib/vendored-purl-map.json b/docker/lib/vendored-purl-map.json new file mode 100644 index 0000000..c6d017e --- /dev/null +++ b/docker/lib/vendored-purl-map.json @@ -0,0 +1,21 @@ +{ + "_comment": "PURL -> CPE map for vendored (statically-copied) open source. SCANOSS labels C/C++ matches with pkg:github// PURLs, which Trivy does NOT use for CVE matching (it matches OS/language PURLs and CPEs). normalize-sbom.sh looks up the version-stripped PURL coordinate here and attaches a cpe:2.3 so the identify->CVE chain completes. Keyed by PURL coordinate (no @version). cpe_vendor/cpe_product follow the NVD CPE dictionary. Only libraries that actually carry CVEs need an entry; niche libraries with no NVD record (liblfds, libdjbresolv, libaes) are identified by name+version without a CPE. Extend as needed. 
Entries starting with '_' are ignored.", + "pkg:github/openssl/openssl": { "cpe_vendor": "openssl", "cpe_product": "openssl" }, + "pkg:github/madler/zlib": { "cpe_vendor": "zlib", "cpe_product": "zlib" }, + "pkg:github/curl/curl": { "cpe_vendor": "haxx", "cpe_product": "curl" }, + "pkg:github/libexpat/libexpat": { "cpe_vendor": "libexpat_project", "cpe_product": "libexpat" }, + "pkg:github/glennrp/libpng": { "cpe_vendor": "libpng", "cpe_product": "libpng" }, + "pkg:github/sqlite/sqlite": { "cpe_vendor": "sqlite", "cpe_product": "sqlite" }, + "pkg:github/mirror/busybox": { "cpe_vendor": "busybox", "cpe_product": "busybox" }, + "pkg:github/mkj/dropbear": { "cpe_vendor": "dropbear_ssh_project", "cpe_product": "dropbear_ssh" }, + "pkg:github/Mbed-TLS/mbedtls": { "cpe_vendor": "arm", "cpe_product": "mbed_tls" }, + "pkg:github/ARMmbed/mbedtls": { "cpe_vendor": "arm", "cpe_product": "mbed_tls" }, + "pkg:github/nghttp2/nghttp2": { "cpe_vendor": "nghttp2", "cpe_product": "nghttp2" }, + "pkg:github/libssh2/libssh2": { "cpe_vendor": "libssh2", "cpe_product": "libssh2" }, + "pkg:github/PCRE2Project/pcre2": { "cpe_vendor": "pcre", "cpe_product": "pcre2" }, + "pkg:github/json-c/json-c": { "cpe_vendor": "json-c", "cpe_product": "json-c" }, + "pkg:github/facebook/zstd": { "cpe_vendor": "facebook", "cpe_product": "zstandard" }, + "pkg:github/GNOME/libxml2": { "cpe_vendor": "xmlsoft", "cpe_product": "libxml2" }, + "pkg:github/openssh/openssh-portable": { "cpe_vendor": "openbsd", "cpe_product": "openssh" }, + "pkg:github/the-tcpdump-group/libpcap": { "cpe_vendor": "tcpdump", "cpe_product": "libpcap" } +} diff --git a/docker/web/frontend/src/components/ComponentsTable.tsx b/docker/web/frontend/src/components/ComponentsTable.tsx index c1a88fa..e5feb77 100644 --- a/docker/web/frontend/src/components/ComponentsTable.tsx +++ b/docker/web/frontend/src/components/ComponentsTable.tsx @@ -202,6 +202,18 @@ export function ComponentsTable({ items, total, truncated }: Props) { {c.group ? 
`${c.group} / ` : ""} {c.name} + {c.vendored && ( + + {t("result.vendoredBadge")} + + )} diff --git a/docker/web/frontend/src/components/ResultDashboard.tsx b/docker/web/frontend/src/components/ResultDashboard.tsx index ae7eb9f..9342697 100644 --- a/docker/web/frontend/src/components/ResultDashboard.tsx +++ b/docker/web/frontend/src/components/ResultDashboard.tsx @@ -43,6 +43,15 @@ export function ResultDashboard({ result }: { result: DoneEvent }) { + {/* Off-by-default discovery: the scan looked like C/C++ embedded source + with no package manager and found little. Nudge — don't auto-run. */} + {result.sbom?.suggestIdentifyVendored && ( +
+
{t("result.vendoredHintTitle")}
+

{t("result.vendoredHintBody")}

+
+ )} + (null); const [uploading, setUploading] = useState(false); @@ -82,6 +83,10 @@ export function ScanForm({ running, capabilities, onRun }: Props) { const textInput = TEXT_INPUT[source]; const isText = textInput !== undefined; const isAnalyze = source === "sbom-upload"; + // Vendored-OSS identification only applies to a scanned source tree. + const isSourceScan = + source === "current-dir" || source === "git-url" || source === "zip-upload"; + const showVendored = Boolean(capabilities.scanoss) && isSourceScan; const busy = running || uploading; const submit = async () => { @@ -139,6 +144,7 @@ export function ScanForm({ running, capabilities, onRun }: Props) { notice: isAnalyze ? true : notice, security: isAnalyze ? true : security, deepLicense, + identifyVendored: showVendored ? identifyVendored : false, // Byte-stable (reproducible) output is a CI concern; not exposed in the UI // so the default deliverable keeps a real timestamp + serialNumber. byteStable: false, @@ -298,6 +304,33 @@ export function ScanForm({ running, capabilities, onRun }: Props) { + {/* Advanced: vendored-OSS identification. Hidden by default — only shown + when the running image supports it (SBOM_SCANOSS) and the input is a + source tree, since it sends file fingerprints to an external service. */} + {showVendored && ( +
+ + {t("form.advanced")} + + +
+ )} + {invalid && (

{uploadKind && !file ? t("validation.file") : t("validation.required")} diff --git a/docker/web/frontend/src/lib/api.ts b/docker/web/frontend/src/lib/api.ts index 8169113..a7e9546 100644 --- a/docker/web/frontend/src/lib/api.ts +++ b/docker/web/frontend/src/lib/api.ts @@ -20,6 +20,10 @@ export interface ComponentItem { purl: string; type: string; licenses: string[]; + /** Identified by SCANOSS as open source copied (vendored) into the sources. */ + vendored?: boolean; + /** SCANOSS file-match confidence (e.g. "100%"), shown read-only on vendored rows. */ + matchConfidence?: string; } export interface SbomSummary { @@ -28,6 +32,9 @@ export interface SbomSummary { componentList?: ComponentItem[]; /** True when the SBOM has more components than the server returned. */ truncated?: boolean; + /** Set when the scan looks like C/C++ embedded source with no package manager, + * hinting the user to re-run with --identify-vendored. Drives a result banner. */ + suggestIdentifyVendored?: boolean; } export const SEVERITY_ORDER = [ @@ -110,6 +117,7 @@ export interface ScanParams { notice: boolean; security: boolean; deepLicense: boolean; + identifyVendored: boolean; byteStable: boolean; } @@ -121,6 +129,8 @@ export interface ScanHandlers { export interface Capabilities { firmware: boolean; + /** scanoss-py present (built with SBOM_SCANOSS) — enables --identify-vendored. 
*/ + scanoss?: boolean; docker: boolean; firmwareImage?: string; hostDir?: string; // the host folder the UI was launched from (mounted as /src) @@ -130,10 +140,10 @@ export interface Capabilities { export async function getCapabilities(): Promise<Capabilities> { try { const res = await fetch("/capabilities"); - if (!res.ok) return { firmware: false, docker: true }; + if (!res.ok) return { firmware: false, scanoss: false, docker: true }; return (await res.json()) as Capabilities; } catch { - return { firmware: false, docker: true }; + return { firmware: false, scanoss: false, docker: true }; } } @@ -222,6 +232,7 @@ export function startScan(params: ScanParams, handlers: ScanHandlers): EventSour notice: String(params.notice), security: String(params.security), deep_license: String(params.deepLicense), + identify_vendored: String(params.identifyVendored), byte_stable: String(params.byteStable), }); diff --git a/docker/web/frontend/src/locales/en/common.json b/docker/web/frontend/src/locales/en/common.json index 3eeec1b..0b26a36 100644 --- a/docker/web/frontend/src/locales/en/common.json +++ b/docker/web/frontend/src/locales/en/common.json @@ -10,6 +10,7 @@ "version": "Version", "versionPlaceholder": "1.0.0", "options": "Generation options", + "advanced": "Advanced", "run": "Run scan", "running": "Scanning…" }, @@ -45,7 +46,9 @@ "security": "Security report", "securityHint": "Generate a Trivy vulnerability report", "deepLicense": "Deep license", - "deepLicenseHint": "ScanCode deep license detection (slow)" + "deepLicenseHint": "ScanCode deep license detection (slow)", + "identifyVendored": "Identify bundled open source", + "identifyVendoredHint": "For C/C++ source with no package manager. Sends file fingerprints (not source) to the OSSKB service." 
}, "progress": { "title": "Run log", @@ -58,6 +61,11 @@ "components": "Components", "vulnerabilities": "Vulnerabilities", "noSecurity": "Security report was not generated", + "vendoredHintTitle": "Few components found — is this C/C++ embedded source?", + "vendoredHintBody": "Open source is often copied (vendored) into C/C++ sources that have no package manager, where a normal scan can't see it. Turn on \"Identify bundled open source\" under Advanced and run again. Only file fingerprints are sent — your source stays local.", + "vendoredBadge": "vendored", + "vendoredBadgeHint": "Identified by SCANOSS as open source copied into the sources", + "vendoredMatch": "match {{pct}}", "severityTitle": "Severity distribution", "noVulns": "No vulnerabilities found", "artifacts": "Generated artifacts", diff --git a/docker/web/frontend/src/locales/ko/common.json b/docker/web/frontend/src/locales/ko/common.json index 0a90950..c1081a4 100644 --- a/docker/web/frontend/src/locales/ko/common.json +++ b/docker/web/frontend/src/locales/ko/common.json @@ -10,6 +10,7 @@ "version": "버전", "versionPlaceholder": "1.0.0", "options": "생성 옵션", + "advanced": "고급", "run": "스캔 실행", "running": "스캔 중…" }, @@ -45,7 +46,9 @@ "security": "보안 보고서", "securityHint": "Trivy 취약점 보고서 생성", "deepLicense": "정밀 라이선스", - "deepLicenseHint": "ScanCode 심화 라이선스 탐지 (느림)" + "deepLicenseHint": "ScanCode 심화 라이선스 탐지 (느림)", + "identifyVendored": "내장 오픈소스 식별", + "identifyVendoredHint": "패키지 매니저가 없는 C/C++ 소스용. 소스가 아니라 파일 지문만 OSSKB 서비스로 전송합니다." }, "progress": { "title": "실행 로그", @@ -58,6 +61,11 @@ "components": "컴포넌트", "vulnerabilities": "취약점", "noSecurity": "보안 보고서를 생성하지 않았습니다", + "vendoredHintTitle": "컴포넌트가 거의 없습니다 — C/C++ 임베디드 소스인가요?", + "vendoredHintBody": "패키지 매니저가 없는 C/C++ 소스에는 오픈소스가 소스째 복사(vendored)돼 있는 경우가 많아, 일반 스캔으로는 보이지 않습니다. 고급에서 \"내장 오픈소스 식별\"을 켜고 다시 실행하세요. 
소스가 아니라 파일 지문만 전송됩니다.", + "vendoredBadge": "vendored", + "vendoredBadgeHint": "소스에 복사된 오픈소스로 SCANOSS가 식별", + "vendoredMatch": "일치도 {{pct}}", "severityTitle": "심각도 분포", "noVulns": "발견된 취약점이 없습니다", "artifacts": "생성된 결과물", diff --git a/docker/web/server.py b/docker/web/server.py index d4d6580..be077ba 100644 --- a/docker/web/server.py +++ b/docker/web/server.py @@ -126,6 +126,11 @@ def firmware_capable(): return shutil.which("unblob") is not None +def scanoss_capable(): + """Vendored-OSS identification (scanoss-py) is only built in with SBOM_SCANOSS.""" + return shutil.which("scanoss-py") is not None + + def docker_capable(): return os.path.exists("/var/run/docker.sock") @@ -233,6 +238,17 @@ def sbom_summary(project, version): comps = data.get("components") or [] rows = [] for c in comps[:MAX_COMPONENT_ROWS]: + props = c.get("properties") or [] + vendored = any( + p.get("name") == "bomlens:layer" and p.get("value") == "vendored" + for p in props + ) + # SCANOSS match confidence, surfaced read-only so a reviewer can eyeball it + # (no accept/reject workflow — match triage belongs to TRUSCA). + match = next( + (p.get("value") for p in props if p.get("name") == "bomlens:scanoss:match"), + "", + ) rows.append({ "name": c.get("name") or "", "version": c.get("version") or "", @@ -240,11 +256,21 @@ def sbom_summary(project, version): "purl": c.get("purl") or "", "type": c.get("type") or "", "licenses": _component_licenses(c), + "vendored": vendored, + "matchConfidence": match, }) + # suggest-identify-vendored: set by suggest-vendored.sh when the scan looks like + # C/C++ embedded source with no package manager. Drives the result banner. 
+ meta_props = (data.get("metadata") or {}).get("properties") or [] + suggest = any( + p.get("name") == "bomlens:suggest-identify-vendored" and p.get("value") == "true" + for p in meta_props + ) return { "components": len(comps), "componentList": rows, "truncated": len(comps) > MAX_COMPONENT_ROWS, + "suggestIdentifyVendored": suggest, } @@ -427,6 +453,7 @@ def do_GET(self): elif path == "/capabilities": self._send(200, json.dumps({ "firmware": firmware_capable(), + "scanoss": scanoss_capable(), "docker": docker_capable(), "firmwareImage": FIRMWARE_IMAGE, "hostDir": os.environ.get("SBOM_UI_HOST_DIR", ""), @@ -633,6 +660,9 @@ def fail(msg): "GENERATE_SECURITY": "true" if g("security", "true") == "true" else "false", "GENERATE_REPORT": "true", # 오픈소스위험분석보고서: default-on (mirrors CLI) "DEEP_LICENSE": "true" if g("deep_license") == "true" else "false", + # Vendored-OSS identification (SCANOSS). SCANOSS_API_URL/KEY, if set in + # the server's environment, pass through via env.copy() above. + "IDENTIFY_VENDORED": "true" if g("identify_vendored") == "true" else "false", "BYTE_STABLE": "true" if g("byte_stable") == "true" else "false", }) cwd = OUTPUT_DIR diff --git a/docs/guides/identify-vendored.ko.md b/docs/guides/identify-vendored.ko.md new file mode 100644 index 0000000..9ea5701 --- /dev/null +++ b/docs/guides/identify-vendored.ko.md @@ -0,0 +1,62 @@ +--- +description: 패키지 매니저가 없는 C/C++ 임베디드 소스에 소스째 포함(vendored)된 오픈소스를 식별합니다. 거의 빈 SBOM이 버전·CVE를 갖춘 컴포넌트 목록으로 바뀝니다. +--- + +# 내장 오픈소스 식별 (C/C++) + +C/C++ 임베디드 소스를 스캔했는데 BomLens가 거의 아무것도 못 찾을 때 사용합니다. + +## 언제 필요한가 + +일반 스캔은 패키지 매니저(npm, Maven, pip, Go, Conan 등)를 읽어 프로젝트가 어떤 오픈소스를 쓰는지 파악합니다. C/C++ 임베디드 펌웨어에는 대개 패키지 매니저가 없고, 오픈소스가 소스 트리에 그대로 복사돼 있습니다. 예를 들어 `third_party/` 아래에 openssl·zlib·liblfds 사본이 들어가는 식인데, 이를 소스째 포함(vendored)이라고 합니다. cdxgen은 이런 파일의 이름을 알 수 없어, SBOM이 거의 비고 각 파일이 식별 안 된 `pkg:generic` 항목으로만 나옵니다. + +이 상황이 되면 BomLens가 이 옵션을 권하는 한 줄 안내를 출력하고, 웹 UI도 스캔 후 같은 안내를 보여줍니다. 사용자가 직접 상황을 알아챌 필요는 없습니다. 
+ +`--identify-vendored`는 소스 파일의 지문을 공개 OSSKB 지식 베이스와 대조해, 일치한 항목을 이름·버전·PURL을 갖춘 컴포넌트로 기록합니다. 그러면 복사돼 들어간 오픈소스가 SBOM에 드러나고, 알려진 CVE가 있는 라이브러리는 보안 보고서에도 나타납니다. + +## 무엇이 전송되나 + +OSSKB 서비스로는 파일 **지문(해시)**만 전송됩니다. 소스 코드는 기기를 떠나지 않습니다. 공급사는 계약 전에 자기 환경에서 그대로 실행할 수 있습니다. + +## 패키지 매니저가 있는 프로젝트에서는 + +이 옵션은 패키지 매니저가 없는 소스를 위한 것입니다. npm·Maven·pip·Go 등을 쓰는 프로젝트라면 일반 스캔이 이미 의존성을 해석하므로 필요하지 않습니다. 그래도 켜면 BomLens가 결과를 정합화합니다. 의존성·빌드 디렉터리(`node_modules`, `vendor`, `dist` 등)는 건너뛰고, 패키지 매니저 컴포넌트가 이미 가진 이름과 겹치는 매치는 그 권위 있는 식별을 우선해 제거합니다. 그래서 관리 프로젝트에서 켜도 알려진 의존성이 중복되거나 취약점 수가 부풀지 않으며, 기껏해야 패키지 매니저가 못 본 진짜 복사된 소스만 추가됩니다. + +매치는 출처와 신뢰도가 태깅된 채 읽기 전용으로 기록됩니다. BomLens는 accept/reject 같은 audit 워크플로를 제공하지 않습니다. 매치를 확정하거나 triage해야 하면 SBOM을 TRUSCA에 올려 거기서 처리하세요. + +## 준비 + +SCANOSS 클라이언트를 포함한 이미지를 빌드(또는 pull)합니다. + +```bash +docker build --build-arg SBOM_SCANOSS=true -t bomlens ./docker +``` + +## 실행 + +```bash +scan-sbom.sh --project trelay --version 26.4.0 --target ./src \ + --identify-vendored --all --generate-only +``` + +웹 UI에서는 **고급**을 펼쳐 **내장 오픈소스 식별**을 켭니다. 이 옵션은 소스 스캔이면서 이미지가 지원할 때만 보입니다. + +## 결과 + +- 복사된 오픈소스가 버전을 가진 컴포넌트로 SBOM에 나타나며, 각 항목에 `vendored` 표시(`bomlens:layer=vendored` 속성)가 붙습니다. +- 알려진 제품으로 매핑되는 컴포넌트에는 CPE가 붙어, Trivy 보안 보고서에 해당 CVE가 나열됩니다. 예를 들어 vendored된 `openssl 1.1.1w`는 관련 취약점과 함께 나타납니다. +- 취약점 데이터베이스에 기록이 없는 흔치 않은 라이브러리(예: `liblfds`, `libaes`, `djbdns`)는 이름과 버전까지 식별됩니다. 보고할 CVE가 없을 뿐이며, 이는 스캔이 아니라 공개 데이터의 한계입니다. + +파일 단위 전체 일치만 컴포넌트가 됩니다. 부분(스니펫) 일치는 노이즈가 커서 제외하므로 보고서가 깔끔하게 유지됩니다. + +## 엔드포인트와 제한 + +기본 엔드포인트는 무료 OSSKB API로, 요청 빈도 제한이 있고 식별 전용입니다. 대량 사용이나 에어갭 환경에서는 SCANOSS 상용·자체 호스팅 엔드포인트를 지정하세요. + +```bash +SCANOSS_API_URL=https://your-scanoss-endpoint \ +SCANOSS_API_KEY=your-key \ +scan-sbom.sh --project trelay --version 26.4.0 --target ./src --identify-vendored --all --generate-only +``` + +결과는 사람 검토가 도움이 되는 best-effort 추정입니다. OSSKB 약관과 라이선스 설명은 [THIRD_PARTY_LICENSES.md](https://github.com/sktelecom/sbom-tools/blob/main/THIRD_PARTY_LICENSES.md)를 참조하세요. 
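The reconciliation rule the guide describes (drop a SCANOSS match when a package-manager component already carries its name) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not reconcile-vendored.sh itself, whose body this section does not show: the function name `reconcile` is hypothetical, and case-insensitive name equality is assumed as the matching rule.

```python
from typing import Dict, List, Tuple

Component = Dict[str, str]


def reconcile(managed: List[Component], vendored: List[Component]) -> Tuple[List[Component], int]:
    """Return the vendored components to keep and the number dropped.

    Hypothetical helper: assumes a vendored match is a duplicate when its
    name equals a package-manager component's name, ignoring case.
    """
    # Names the package manager has already identified.
    known = {c.get("name", "").lower() for c in managed if c.get("name")}
    # Keep only matches whose name the package manager does not already carry.
    kept = [v for v in vendored if v.get("name", "").lower() not in known]
    return kept, len(vendored) - len(kept)


if __name__ == "__main__":
    # Data shaped after the fixtures in this patch (managed npm SBOM + SCANOSS matches).
    managed = [
        {"name": "lodash", "purl": "pkg:npm/lodash@4.17.21"},
        {"name": "express", "purl": "pkg:npm/express@4.18.2"},
    ]
    vendored = [
        {"name": "lodash", "purl": "pkg:github/lodash/lodash"},
        {"name": "liblfds", "purl": "pkg:github/liblfds/liblfds"},
    ]
    kept, dropped = reconcile(managed, vendored)
    print(dropped, [c["name"] for c in kept])
```

With data shaped like the fixtures in this patch, the duplicate lodash match is dropped and liblfds is kept, which mirrors what tests/test-postprocess.sh asserts for the real script.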
diff --git a/docs/guides/identify-vendored.md b/docs/guides/identify-vendored.md new file mode 100644 index 0000000..66f46da --- /dev/null +++ b/docs/guides/identify-vendored.md @@ -0,0 +1,62 @@ +--- +description: Identify open source copied (vendored) into C/C++ embedded source that has no package manager, so a normal BomLens scan that finds almost nothing turns into a real component list with versions and CVEs. +--- + +# Identify bundled open source (C/C++) + +Use this when you scan a C/C++ embedded source tree and BomLens finds almost nothing. + +## When you need it + +A normal scan reads a package manager (npm, Maven, pip, Go, Conan, and so on) to learn what open source a project uses. C/C++ embedded firmware usually has no package manager: the open source is copied straight into the source tree (this is called *vendored* source — for example a copy of openssl, zlib, or liblfds under `third_party/`). cdxgen cannot name those files, so the SBOM comes back almost empty, with each file listed as an unidentified `pkg:generic` entry. + +When that happens, BomLens prints a one-line hint suggesting this option, and the web UI shows the same hint after the scan. You do not need to recognize the situation yourself. + +`--identify-vendored` matches the file fingerprints of your sources against the public OSSKB knowledge base and records each match as a real component (name, version, PURL), so the copied-in open source shows up in the SBOM — and, where the library has known CVEs, in the security report. + +## What is sent + +Only file **fingerprints** (hashes) are sent to the OSSKB service. Your source code never leaves the machine. The supplier can run this in their own environment before any contract. + +## On a package-managed project + +This option is for source with no package manager. If your project uses npm, Maven, pip, Go, and so on, the normal scan already resolves your dependencies and you do not need it. 
If you turn it on anyway, BomLens reconciles the results: dependency and build directories (`node_modules`, `vendor`, `dist`, and the like) are skipped, and any match whose name a package-manager component already carries is dropped in favor of that authoritative identity. So enabling it on a managed project does not duplicate known dependencies or inflate the vulnerability count — at most it adds genuinely copied-in source the package manager could not see. + +Matches are recorded read-only, tagged with their source and confidence. BomLens does not provide an accept/reject audit workflow; if you need to confirm or triage matches, upload the SBOM to TRUSCA and do it there. + +## Prerequisites + +Build (or pull) an image that includes the SCANOSS client: + +```bash +docker build --build-arg SBOM_SCANOSS=true -t bomlens ./docker +``` + +## Run it + +```bash +scan-sbom.sh --project trelay --version 26.4.0 --target ./src \ + --identify-vendored --all --generate-only +``` + +In the web UI, open **Advanced** and turn on **Identify bundled open source**. The option appears only for a source scan when the image supports it. + +## What you get + +- Copied-in open source appears in the SBOM as named components with versions, each tagged `vendored` (a `bomlens:layer=vendored` property). +- Components that map to a known product get a CPE, so the Trivy security report lists their CVEs. For example a vendored `openssl 1.1.1w` shows up with its advisories. +- Niche libraries with no entry in the vulnerability databases (for example `liblfds`, `libaes`, `djbdns`) are still identified by name and version; they simply have no CVEs to report, which is a limit of the public data, not of the scan. + +Only full-file matches become components. Partial (snippet) matches are noisy and are left out, so the report stays clean. + +## Endpoint and limits + +The default endpoint is the free OSSKB API, which is rate-limited and intended for identification only. 
For high-volume or air-gapped use, point at a SCANOSS commercial or self-hosted endpoint: + +```bash +SCANOSS_API_URL=https://your-scanoss-endpoint \ +SCANOSS_API_KEY=your-key \ +scan-sbom.sh --project trelay --version 26.4.0 --target ./src --identify-vendored --all --generate-only +``` + +Results are a best-effort estimate that benefits from human review. See the OSSKB terms and license notes in [THIRD_PARTY_LICENSES.md](https://github.com/sktelecom/sbom-tools/blob/main/THIRD_PARTY_LICENSES.md). diff --git a/mkdocs.yml b/mkdocs.yml index b8d0b3e..14cf6e8 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -147,6 +147,7 @@ nav: - Reports: guides/reports.md - Supplier SBOM: guides/supplier-sbom.md - Firmware: guides/firmware.md + - Identify bundled OSS (C/C++): guides/identify-vendored.md - Upload: guides/upload.md - CI/CD: guides/ci-cd.md - Reference: diff --git a/scripts/scan-sbom.sh b/scripts/scan-sbom.sh index 97b9466..29b11ec 100755 --- a/scripts/scan-sbom.sh +++ b/scripts/scan-sbom.sh @@ -44,6 +44,8 @@ GENERATE_ONLY="false"; TARGET=""; PROJECT_NAME=""; PROJECT_VERSION="" GENERATE_NOTICE="false"; GENERATE_SECURITY="false"; DEEP_LICENSE="false" SIGN_SBOM="false"; BYTE_STABLE="false"; UI_MODE="false"; UI_PORT="${UI_PORT:-8080}" FORCE_FIRMWARE="false"; ANALYZE_SBOM="" +IDENTIFY_VENDORED="false" +SCANOSS_API_URL="${SCANOSS_API_URL:-}"; SCANOSS_API_KEY="${SCANOSS_API_KEY:-}" GIT_URL=""; GIT_REF=""; NO_REPORT="false"; GENERATE_REPORT="false" INGEST_SOURCE="false"; SCAN_INPUT_DIR=""; CLEANUP_DIRS=() MERGE_FILES=() @@ -76,6 +78,7 @@ while [[ "$#" -gt 0 ]]; do --security) GENERATE_SECURITY="true" ;; --all) GENERATE_NOTICE="true"; GENERATE_SECURITY="true" ;; --deep-license) DEEP_LICENSE="true" ;; + --identify-vendored) IDENTIFY_VENDORED="true" ;; --sign) SIGN_SBOM="true" ;; --byte-stable) BYTE_STABLE="true" ;; --firmware) FORCE_FIRMWARE="true" ;; @@ -113,6 +116,10 @@ Options: the risk report (+notice+security) is generated in every mode; --no-report opts out. 
--deep-license scancode deep license (opt-in image) + --identify-vendored Identify open source copied (vendored) into C/C++ source + that has no package manager. Matches file fingerprints + against the OSSKB service (opt-in image; sends hashes, + not source). See docs/guides/identify-vendored.md --byte-stable Deterministic SBOM output --sign cosign sign (requires COSIGN_KEY) --ui Launch local web UI @@ -127,6 +134,10 @@ Environment: COSIGN_KEY Signing key for --sign SBOM_SCANNER_IMAGE Override the scanner image SBOM_FIRMWARE_IMAGE Override the firmware image + SCANOSS_API_URL Vendored-OSS endpoint for --identify-vendored + (default: the free OSSKB API; set to a self-hosted + SCANOSS endpoint for air-gapped or high-volume use) + SCANOSS_API_KEY Credential for SCANOSS_API_URL (if the endpoint needs one) API_URL Upload server base URL (DT server, or TRUSCA base) API_KEY Upload credential (DT: X-Api-Key; TRUSCA: Bearer token) UPLOAD_TARGET dependency-track (default) | trusca @@ -191,8 +202,8 @@ trap cleanup EXIT INT TERM # HOST_UID/HOST_GID let the (root) container chown artifacts back to the calling # user, so Linux hosts/CI runners can read them (macOS Docker maps UIDs already). 
pp_env() { - printf ' -e GENERATE_NOTICE=%s -e GENERATE_SECURITY=%s -e GENERATE_REPORT=%s -e DEEP_LICENSE=%s -e SIGN_SBOM=%s -e BYTE_STABLE=%s -e UPLOAD_ENABLED=%s -e PROJECT_NAME=%q -e PROJECT_VERSION=%q -e HOST_OUTPUT_DIR=/host-output -e HOST_UID=%s -e HOST_GID=%s -e API_KEY=%q -e API_URL=%q -e UPLOAD_TARGET=%q -e TRUSCA_PROJECT_ID=%q -e TRUSCA_REF=%q -e TRUSCA_RELEASE=%q' \ - "$GENERATE_NOTICE" "$GENERATE_SECURITY" "$GENERATE_REPORT" "$DEEP_LICENSE" "$SIGN_SBOM" "$BYTE_STABLE" "$UPLOAD_VAR" "$PROJECT_NAME" "$PROJECT_VERSION" "$(id -u)" "$(id -g)" "$DEFAULT_API_KEY" "$SERVER_URL" "$UPLOAD_TARGET" "$TRUSCA_PROJECT_ID" "$TRUSCA_REF" "$TRUSCA_RELEASE" + printf ' -e GENERATE_NOTICE=%s -e GENERATE_SECURITY=%s -e GENERATE_REPORT=%s -e DEEP_LICENSE=%s -e IDENTIFY_VENDORED=%s -e SCANOSS_API_URL=%q -e SCANOSS_API_KEY=%q -e SIGN_SBOM=%s -e BYTE_STABLE=%s -e UPLOAD_ENABLED=%s -e PROJECT_NAME=%q -e PROJECT_VERSION=%q -e HOST_OUTPUT_DIR=/host-output -e HOST_UID=%s -e HOST_GID=%s -e API_KEY=%q -e API_URL=%q -e UPLOAD_TARGET=%q -e TRUSCA_PROJECT_ID=%q -e TRUSCA_REF=%q -e TRUSCA_RELEASE=%q' \ + "$GENERATE_NOTICE" "$GENERATE_SECURITY" "$GENERATE_REPORT" "$DEEP_LICENSE" "$IDENTIFY_VENDORED" "$SCANOSS_API_URL" "$SCANOSS_API_KEY" "$SIGN_SBOM" "$BYTE_STABLE" "$UPLOAD_VAR" "$PROJECT_NAME" "$PROJECT_VERSION" "$(id -u)" "$(id -g)" "$DEFAULT_API_KEY" "$SERVER_URL" "$UPLOAD_TARGET" "$TRUSCA_PROJECT_ID" "$TRUSCA_REF" "$TRUSCA_RELEASE" } # cosign key mount + env, only when --sign is set with a real key. The private @@ -423,9 +434,11 @@ if [ "$MODE" = "SOURCE" ]; then if [ "$LANG_DET" = "cpp" ]; then echo "[WARN] C/C++: dependencies resolve only via a package manager (Conan/vcpkg)." echo "[WARN] Raw CMake/Make sources yield a sparse SBOM; add --deep-license for 1st-party license headers." + echo "[WARN] For open source copied (vendored) into the sources, add --identify-vendored (opt-in image)." 
fi if [ "$LANG_DET" = "unknown" ]; then echo "[WARN] No package manifest detected; using cdxgen all-in-one (results may be sparse)." + echo "[WARN] If this is C/C++ embedded source, --identify-vendored finds open source copied in (opt-in image)." fi fi echo "[1/2] Generating SBOM (cdxgen)..." diff --git a/tests/fixtures/cdxgen-cpp-sparse.json b/tests/fixtures/cdxgen-cpp-sparse.json new file mode 100644 index 0000000..6046409 --- /dev/null +++ b/tests/fixtures/cdxgen-cpp-sparse.json @@ -0,0 +1,14 @@ +{ + "bomFormat": "CycloneDX", + "specVersion": "1.6", + "version": 1, + "metadata": { + "component": { "type": "application", "name": "trelay", "version": "26.4.0" } + }, + "components": [ + { "type": "file", "name": "ssl_lib.c", "purl": "pkg:generic/ssl_lib.c" }, + { "type": "file", "name": "lfds.c", "purl": "pkg:generic/lfds.c" }, + { "type": "file", "name": "main.c", "purl": "pkg:generic/main.c" }, + { "type": "file", "name": "config.c", "purl": "pkg:generic/config.c" } + ] +} diff --git a/tests/fixtures/cdxgen-node-managed.json b/tests/fixtures/cdxgen-node-managed.json new file mode 100644 index 0000000..57e4b77 --- /dev/null +++ b/tests/fixtures/cdxgen-node-managed.json @@ -0,0 +1,13 @@ +{ + "bomFormat": "CycloneDX", + "specVersion": "1.6", + "version": 1, + "metadata": { + "component": { "type": "application", "name": "webapp", "version": "1.0.0" } + }, + "components": [ + { "type": "library", "name": "lodash", "version": "4.17.21", "purl": "pkg:npm/lodash@4.17.21" }, + { "type": "library", "name": "express", "version": "4.18.2", "purl": "pkg:npm/express@4.18.2" }, + { "type": "library", "name": "axios", "version": "1.6.0", "purl": "pkg:npm/axios@1.6.0" } + ] +} diff --git a/tests/fixtures/scanoss-raw-managed.json b/tests/fixtures/scanoss-raw-managed.json new file mode 100644 index 0000000..1aafb24 --- /dev/null +++ b/tests/fixtures/scanoss-raw-managed.json @@ -0,0 +1,24 @@ +{ + "node_modules_copy/lodash/lodash.js": [ + { + "id": "file", + "component": "lodash", + 
"vendor": "lodash", + "version": "4.17.21", + "purl": ["pkg:github/lodash/lodash"], + "licenses": [{ "name": "MIT" }], + "matched": "100%" + } + ], + "src/liblfds/lfds.c": [ + { + "id": "file", + "component": "liblfds", + "vendor": "liblfds", + "version": "6.1.1", + "purl": ["pkg:github/liblfds/liblfds"], + "licenses": [{ "name": "Unlicense" }], + "matched": "100%" + } + ] +} diff --git a/tests/fixtures/scanoss-raw.json b/tests/fixtures/scanoss-raw.json new file mode 100644 index 0000000..490dfa1 --- /dev/null +++ b/tests/fixtures/scanoss-raw.json @@ -0,0 +1,33 @@ +{ + "src/openssl/ssl_lib.c": [ + { + "id": "file", + "component": "openssl", + "vendor": "openssl", + "version": "1.1.1w", + "purl": ["pkg:github/openssl/openssl"], + "licenses": [{ "name": "Apache-2.0" }], + "matched": "100%" + } + ], + "src/liblfds/lfds.c": [ + { + "id": "file", + "component": "liblfds", + "vendor": "liblfds", + "version": "6.1.1", + "purl": ["pkg:github/liblfds/liblfds"], + "licenses": [{ "name": "Unlicense" }], + "matched": "100%" + } + ], + "src/util/helpers.c": [ + { + "id": "snippet", + "component": "somelib", + "version": "2.0.0", + "purl": ["pkg:github/foo/somelib"], + "matched": "31%" + } + ] +} diff --git a/tests/test-e2e.sh b/tests/test-e2e.sh index dad16ba..31428d4 100755 --- a/tests/test-e2e.sh +++ b/tests/test-e2e.sh @@ -541,6 +541,80 @@ EOF rm -rf "$w" fi +# -------------------------------------------------------- +# Group 4b2: vendored-OSS identification (SCANOSS). Network + opt-in image, so +# it is gated behind SCANOSS_E2E=1 and never part of the default/CI run (OSSKB +# is rate-limited and identification-only). The deterministic half (the +# off-by-default suggestion) runs whenever the image is available. 
+# -------------------------------------------------------- +section "Vendored-OSS identification E2E" + +have_scanoss=0 +if [ "$have_image" = 1 ] && \ + docker run --rm --entrypoint sh "$SCANNER_IMG" -c 'command -v scanoss-py' >/dev/null 2>&1; then + have_scanoss=1 +fi + +if [ "$have_image" != 1 ]; then + skip "vendored-OSS identification (scanner image not available)" +else + # Deterministic: a C/C++ tree with no package manager and a near-empty scan + # must record the off-by-default suggestion (no SCANOSS needed for this). + w="$(mktemp -d "$WORK_ROOT/vend.XXXXXX")" + mkdir -p "$w/src" + cat > "$w/src/main.c" <<'EOF' +int main(void) { return 0; } +EOF + ( cd "$w" && bash "$SCAN" --project "vendtest" --version "1.0" \ + --generate-only ) > "$w/_suggest.log" 2>&1 || true + sbom="$w/vendtest_1.0_bom.json" + if [ -f "$sbom" ] && jq -e '.metadata.properties[]? | select(.name=="bomlens:suggest-identify-vendored" and .value=="true")' "$sbom" >/dev/null 2>&1; then + pass "C/C++ source with no manifest records the identify-vendored suggestion" + else + fail "vendored suggestion not recorded for a bare C/C++ tree" "$(tail -5 "$w/_suggest.log" 2>/dev/null)"; show_log_if_verbose "$w" + fi + + # Real identification needs scanoss-py in the image + OSSKB reachability. + if [ "$have_scanoss" != 1 ]; then + skip "vendored identification scan (image lacks scanoss-py — build --build-arg SBOM_SCANOSS=true)" + elif [ "${SCANOSS_E2E:-0}" != "1" ]; then + skip "vendored identification scan (set SCANOSS_E2E=1 to hit the OSSKB API)" + else + ( cd "$w" && bash "$SCAN" --project "vendtest2" --version "1.0" \ + --identify-vendored --all --generate-only ) > "$w/_identify.log" 2>&1 || true + # Wiring check is deterministic; a specific match is not (depends on OSSKB). 
+ if grep -q "Identifying vendored open source" "$w/_identify.log" 2>/dev/null; then + pass "--identify-vendored runs the SCANOSS step inside the container" + else + fail "--identify-vendored did not invoke the SCANOSS step" "$(tail -8 "$w/_identify.log" 2>/dev/null)"; show_log_if_verbose "$w" + fi + sbom2="$w/vendtest2_1.0_bom.json" + if [ -f "$sbom2" ] && jq -e '.bomFormat=="CycloneDX"' "$sbom2" >/dev/null 2>&1; then + pass "--identify-vendored produces a valid CycloneDX SBOM" + else + fail "--identify-vendored SBOM invalid/missing" + fi + + # Over-detection guard (the scenario raised in review): enabling + # --identify-vendored on a normal package-managed project must NOT balloon + # the component count. Reconciliation drops SCANOSS matches that the npm + # scan already declared, so the count stays ~equal to baseline. + if [ -d "$EXAMPLES/nodejs" ]; then + wb="$(run_source_scan "$EXAMPLES/nodejs" --all)" + base_n=$(jq '[.components[]?]|length' "$wb/testapp_1.0_bom.json" 2>/dev/null || echo 0) + wv="$(run_source_scan "$EXAMPLES/nodejs" --all --identify-vendored)" + vend_n=$(jq '[.components[]?]|length' "$wv/testapp_1.0_bom.json" 2>/dev/null || echo 0) + if [ "${base_n:-0}" -gt 0 ] && [ "${vend_n:-0}" -le "$((base_n + 3))" ]; then + pass "managed project: --identify-vendored does not over-detect (base=$base_n, with=$vend_n)" + else + fail "managed project over-detection" "base=$base_n, with=$vend_n (expected with <= base+3)" + fi + rm -rf "$wb" "$wv" + fi + fi + rm -rf "$w" +fi + # -------------------------------------------------------- # Group 4c: supplier SBOM analysis E2E through the container (requires image) # -------------------------------------------------------- diff --git a/tests/test-postprocess.sh b/tests/test-postprocess.sh index f18c731..76fa65d 100755 --- a/tests/test-postprocess.sh +++ b/tests/test-postprocess.sh @@ -139,6 +139,136 @@ date_expr=$(jq -r '.components[] | select(.name=="python-dateutil") | .licenses[ pkg_expr=$(jq -r '.components[] | 
select(.name=="packaging") | .licenses[0].expression // "ABSENT"' "$WORK/c.json") [ "$pkg_expr" = "Apache-2.0 OR BSD-2-Clause" ] && pass "compound expression left untouched" || fail "packaging expression='$pkg_expr'" +echo "== vendored: identify-vendored.sh promotes file matches, drops snippets ==" +# Mock scanoss-py (no network/image needed): write the raw SCANOSS fixture to the +# tool's --output path so identify-vendored.sh's jq transform is exercised. +mkdir -p "$WORK/bin" "$WORK/srctree/src" +echo 'int main(void){return 0;}' > "$WORK/srctree/src/main.c" +cat > "$WORK/bin/scanoss-py" <<'MOCK' +#!/bin/bash +out=""; prev="" +for a in "$@"; do [ "$prev" = "--output" ] && out="$a"; prev="$a"; done +[ -n "$out" ] && cp "$SCANOSS_RAW_FIXTURE" "$out" +exit 0 +MOCK +chmod +x "$WORK/bin/scanoss-py" +export SCANOSS_RAW_FIXTURE="$FIX/scanoss-raw.json" +PATH="$WORK/bin:$PATH" bash "$LIB/identify-vendored.sh" "$WORK/srctree" "$WORK/vend.json" "26.4.0" >/dev/null 2>&1 +vn=$(jq '[.components[]?] | length' "$WORK/vend.json" 2>/dev/null || echo 0) +[ "$vn" = "2" ] && pass "two full-file matches promoted (openssl, liblfds)" || fail "vendored components=$vn, expected 2" +if jq -e '[.components[] | select(.name=="somelib")] | length == 0' "$WORK/vend.json" >/dev/null 2>&1; then + pass "snippet-only match (somelib) not promoted to a component" +else + fail "snippet match leaked into components" +fi +if jq -e '.components[] | select(.name=="openssl") | .properties[]? | select(.name=="bomlens:identifiedBy" and .value=="scanoss")' "$WORK/vend.json" >/dev/null 2>&1; then + pass "vendored components carry bomlens:identifiedBy=scanoss" +else + fail "missing bomlens:identifiedBy=scanoss provenance" +fi + +echo "== vendored: identify -> merge -> normalize completes the PURL->CVE chain ==" +# Merge the vendored components with a sparse cdxgen C/C++ SBOM, then normalize. 
+bash "$LIB/merge-sbom.sh" "$WORK/merged.json" "trelay" "26.4.0" \ + "$FIX/cdxgen-cpp-sparse.json" "$WORK/vend.json" >/dev/null 2>&1 +if jq -e '.components[] | select(.name=="openssl")' "$WORK/merged.json" >/dev/null 2>&1; then + pass "vendored openssl survived the merge into the project SBOM" +else + fail "openssl missing after merge" +fi +bash "$LIB/normalize-sbom.sh" "$WORK/merged.json" >/dev/null 2>&1 +# openssl: no SCANOSS cpe, but the map yields one -> Trivy can now match CVEs. +ssl_cpe=$(jq -r '.components[] | select(.name=="openssl") | .cpe // "ABSENT"' "$WORK/merged.json") +[ "$ssl_cpe" = "cpe:2.3:a:openssl:openssl:1.1.1w:*:*:*:*:*:*:*" ] \ + && pass "openssl PURL mapped to a Trivy-matchable cpe ($ssl_cpe)" \ + || fail "openssl cpe='$ssl_cpe' (PURL->CVE chain broken)" +# niche liblfds: no NVD record -> identified only, original PURL preserved. +lfds_cpe=$(jq -r '.components[] | select(.name=="liblfds") | .cpe // "ABSENT"' "$WORK/merged.json") +lfds_purl=$(jq -r '.components[] | select(.name=="liblfds") | .purl // "ABSENT"' "$WORK/merged.json") +[ "$lfds_cpe" = "ABSENT" ] && pass "niche liblfds left without a cpe (no NVD record)" || fail "liblfds unexpectedly got cpe='$lfds_cpe'" +[ "$lfds_purl" = "pkg:github/liblfds/liblfds" ] && pass "liblfds keeps its identifying PURL" || fail "liblfds purl='$lfds_purl'" +if jq -e '.components[] | select(.name=="openssl") | .properties[]? | select(.name=="bomlens:layer" and .value=="vendored")' "$WORK/merged.json" >/dev/null 2>&1; then + pass "vendored provenance (bomlens:layer=vendored) survives normalize" +else + fail "vendored layer marker lost" +fi + +echo "== suggest: nudge only for C/C++ source, no manifest, sparse SBOM ==" +mkdir -p "$WORK/csrc" +echo 'int main(void){return 0;}' > "$WORK/csrc/main.c" +cp "$FIX/cdxgen-cpp-sparse.json" "$WORK/sug.json" +IDENTIFY_VENDORED=false bash "$LIB/suggest-vendored.sh" "$WORK/sug.json" "$WORK/csrc" >/dev/null 2>&1 +if jq -e '.metadata.properties[]? 
| select(.name=="bomlens:suggest-identify-vendored" and .value=="true")' "$WORK/sug.json" >/dev/null 2>&1; then + pass "C/C++ + no manifest + sparse SBOM -> suggestion recorded" +else + fail "expected suggestion property was not set" +fi +# Negative: a package manager manifest present -> no nudge (cdxgen already resolves). +mkdir -p "$WORK/nodesrc" +echo 'int main(void){return 0;}' > "$WORK/nodesrc/main.c" +echo '{"name":"x"}' > "$WORK/nodesrc/package.json" +cp "$FIX/cdxgen-cpp-sparse.json" "$WORK/sug2.json" +IDENTIFY_VENDORED=false bash "$LIB/suggest-vendored.sh" "$WORK/sug2.json" "$WORK/nodesrc" >/dev/null 2>&1 +if jq -e '.metadata.properties[]? | select(.name=="bomlens:suggest-identify-vendored")' "$WORK/sug2.json" >/dev/null 2>&1; then + fail "suggested even though a package manifest is present" +else + pass "no nudge when a package manager manifest exists" +fi +# Negative: already enabled -> never nudge. +cp "$FIX/cdxgen-cpp-sparse.json" "$WORK/sug3.json" +IDENTIFY_VENDORED=true bash "$LIB/suggest-vendored.sh" "$WORK/sug3.json" "$WORK/csrc" >/dev/null 2>&1 +if jq -e '.metadata.properties[]? | select(.name=="bomlens:suggest-identify-vendored")' "$WORK/sug3.json" >/dev/null 2>&1; then + fail "nudged even though --identify-vendored is already on" +else + pass "no nudge when --identify-vendored is already enabled" +fi + +echo "== vendored: reconciliation prevents over-detection on a managed project ==" +# A SCANOSS result that file-matches a declared dependency (lodash, already found +# by the package manager) plus a genuine vendored find (liblfds). Reconciliation +# must drop the duplicate and keep the new one, so enabling --identify-vendored on +# a normal managed project does not balloon the SBOM or invent false CVEs. 
+mkdir -p "$WORK/bin2" "$WORK/mtree/src" +echo 'int main(void){return 0;}' > "$WORK/mtree/src/main.c" +cat > "$WORK/bin2/scanoss-py" <<'MOCK' +#!/bin/bash +out=""; prev="" +for a in "$@"; do [ "$prev" = "--output" ] && out="$a"; prev="$a"; done +[ -n "$out" ] && cp "$SCANOSS_RAW_FIXTURE" "$out" +exit 0 +MOCK +chmod +x "$WORK/bin2/scanoss-py" +export SCANOSS_RAW_FIXTURE="$FIX/scanoss-raw-managed.json" +PATH="$WORK/bin2:$PATH" bash "$LIB/identify-vendored.sh" "$WORK/mtree" "$WORK/vend2.json" "1.0.0" >/dev/null 2>&1 +vraw=$(jq '[.components[]?]|length' "$WORK/vend2.json" 2>/dev/null || echo 0) +[ "$vraw" = "2" ] && pass "SCANOSS produced 2 matches (lodash + liblfds)" || fail "expected 2 raw vendored matches, got $vraw" + +# Reconcile against the managed cdxgen SBOM (which already declares lodash). +dropped=$(bash "$LIB/reconcile-vendored.sh" "$FIX/cdxgen-node-managed.json" "$WORK/vend2.json") +[ "$dropped" = "1" ] && pass "reconcile drops 1 match already covered by the package manager" || fail "reconcile dropped '$dropped', expected 1" +if jq -e '[.components[] | select((.name|ascii_downcase)=="lodash")] | length == 0' "$WORK/vend2.json" >/dev/null 2>&1; then + pass "duplicate lodash removed from the vendored set" +else + fail "duplicate lodash survived reconciliation (over-detection)" +fi +if jq -e '[.components[] | select(.name=="liblfds")] | length == 1' "$WORK/vend2.json" >/dev/null 2>&1; then + pass "genuine vendored find (liblfds) preserved" +else + fail "real vendored component liblfds was wrongly dropped" +fi + +# Merge the reconciled set into the managed SBOM: lodash stays single (the npm +# authoritative one), liblfds is added — no double counting. +bash "$LIB/merge-sbom.sh" "$WORK/mmerged.json" "webapp" "1.0.0" \ + "$FIX/cdxgen-node-managed.json" "$WORK/vend2.json" >/dev/null 2>&1 +lodash_n=$(jq '[.components[] | select((.name|ascii_downcase)=="lodash")] | length' "$WORK/mmerged.json") +total_n=$(jq '[.components[]?] 
| length' "$WORK/mmerged.json") +[ "$lodash_n" = "1" ] && pass "merged SBOM has exactly one lodash (no duplicate)" || fail "lodash appears ${lodash_n}x after merge" +[ "$total_n" = "4" ] && pass "merged total = 3 managed + 1 new vendored (no double count)" || fail "merged total=$total_n, expected 4" +# The surviving lodash is the authoritative package-manager identity (pkg:npm). +lodash_purl=$(jq -r '.components[] | select((.name|ascii_downcase)=="lodash") | .purl' "$WORK/mmerged.json") +[ "$lodash_purl" = "pkg:npm/lodash@4.17.21" ] && pass "package-manager identity (pkg:npm) wins over the SCANOSS pkg:github match" || fail "lodash purl='$lodash_purl', expected pkg:npm" + echo "" echo "Results: ${PASS} passed, ${FAIL} failed" [ "$FAIL" -eq 0 ] From 5d3d8f8e5dd90c59b5252197ec389143a088336e Mon Sep 17 00:00:00 2001 From: Haksung Jang Date: Mon, 22 Jun 2026 22:07:26 +0900 Subject: [PATCH 2/2] fix(scanner): normalize SCANOSS git-tag versions so vendored CPEs match MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit An OSSKB spike (real api.osskb.org, openssl 3.0.0 sources) confirmed two things: matches carry NO cpe field, so vendored-purl-map.json is the required path rather than a fallback, and the version arrives as a git tag (e.g. "openssl-3.0.0", "v1.2.13"). Feeding that raw into the CPE produced a malformed cpe:2.3:a:openssl:openssl:openssl-3.0.0:... that Trivy could never match, silently breaking the identify->CVE chain on real data. - identify-vendored.sh: strip a leading component-name prefix (e.g. "openssl-" or "openssl_") and a leading "v" before a digit, so the version is the bare release (3.0.0). - test fixture now uses a git-tag version ("openssl-3.0.0"); a new assertion pins the normalized output, and the CPE-chain test expects ...:3.0.0:... - guides: note that file-match version precision is approximate (a file match reports the release where its content first appeared).
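For reviewers who want to check the normalization rule without running jq, here is a minimal pure-bash sketch of the same idea. The function name `normalize_version` is hypothetical and does not exist in this patch; the real logic lives in the jq filter in identify-vendored.sh. The sketch assumes the component name is known and that only the two prefix forms and a leading "v" need handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): mirrors the jq normalization
# applied to SCANOSS versions before a CPE is synthesized.
normalize_version() {
  local component="$1" version="$2"
  # Drop a "<component>-" or "<component>_" git-tag prefix, if present.
  version="${version#"${component}-"}"
  version="${version#"${component}_"}"
  # Drop a leading v/V only when a digit follows (v1.2.13 -> 1.2.13).
  if [[ "$version" =~ ^[vV]([0-9].*)$ ]]; then
    version="${BASH_REMATCH[1]}"
  fi
  printf '%s\n' "$version"
}

# Example: build the CPE string the chain test expects for openssl.
printf 'cpe:2.3:a:openssl:openssl:%s:*:*:*:*:*:*:*\n' \
  "$(normalize_version openssl openssl-3.0.0)"
```

A version that starts with "v" but is not followed by a digit is left untouched, which keeps tags such as "vendor-1" from being mangled.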
--- docker/lib/identify-vendored.sh | 8 +++++++- docs/guides/identify-vendored.ko.md | 2 ++ docs/guides/identify-vendored.md | 2 ++ tests/fixtures/scanoss-raw.json | 2 +- tests/test-postprocess.sh | 7 ++++++- 5 files changed, 18 insertions(+), 3 deletions(-) diff --git a/docker/lib/identify-vendored.sh b/docker/lib/identify-vendored.sh index dcf1a83..d4a3d0c 100644 --- a/docker/lib/identify-vendored.sh +++ b/docker/lib/identify-vendored.sh @@ -109,6 +109,9 @@ fi # - keep only full-file matches (.id == "file") # - carry SCANOSS' cpe through when present (lets Trivy match CVEs directly; # normalize-sbom.sh fills the gap for libraries SCANOSS gives no cpe for) +# - normalize the version: OSSKB returns git-tag forms (e.g. "openssl-3.0.0", +# "v1.2.13"), which would otherwise produce a malformed CPE and miss CVEs. +# Strip a leading component-name prefix ("openssl-"/"openssl_") and a leading "v" before a digit. # - tag provenance: bomlens:layer=vendored, identifiedBy=scanoss, match %, source file # - dedupe by purl (fallback name@version), matching merge-sbom.sh COMPS=$(jq -c ' @@ -119,7 +122,10 @@ COMPS=$(jq -c ' | { type: "library", name: (.component // ((.purl[0] // "") | sub("^pkg:[^/]+/"; ""))), - version: (.version // ""), + version: ( (.component // "") as $c + | (.version // "") + | ltrimstr($c + "-") | ltrimstr($c + "_") + | sub("^[vV](?=[0-9])"; "") ), purl: (.purl[0] // null), cpe: (.cpe[0]? // null), licenses: ( [ .licenses[]?.name // empty ] diff --git a/docs/guides/identify-vendored.ko.md b/docs/guides/identify-vendored.ko.md index 9ea5701..a08df03 100644 --- a/docs/guides/identify-vendored.ko.md +++ b/docs/guides/identify-vendored.ko.md @@ -59,4 +59,6 @@ SCANOSS_API_KEY=your-key \ scan-sbom.sh --project trelay --version 26.4.0 --target ./src --identify-vendored --all --generate-only ``` +버전은 근사값입니다. 파일 매치는 그 파일 내용이 처음 등장한 릴리스를 버전으로 보고하므로, 같은 라이브러리라도 파일마다 버전이 조금씩 다르게 나오거나 실제보다 한 단계 어긋난 릴리스로 보고될 수 있습니다. 버전(과 그로부터 도출된 CVE)은 최종 판정이 아니라 검토의 출발점으로 삼으세요. + 결과는 사람 검토가 도움이 되는 best-effort 추정입니다. 
OSSKB 약관과 라이선스 설명은 [THIRD_PARTY_LICENSES.md](https://github.com/sktelecom/sbom-tools/blob/main/THIRD_PARTY_LICENSES.md)를 참조하세요. diff --git a/docs/guides/identify-vendored.md b/docs/guides/identify-vendored.md index 66f46da..7fcdcde 100644 --- a/docs/guides/identify-vendored.md +++ b/docs/guides/identify-vendored.md @@ -59,4 +59,6 @@ SCANOSS_API_KEY=your-key \ scan-sbom.sh --project trelay --version 26.4.0 --target ./src --identify-vendored --all --generate-only ``` +Version precision is approximate. A file match reports the release where that file content first appeared, so different files of the same library can resolve to slightly different versions and a copied-in library may be reported a point release off. Treat the version (and any CVEs derived from it) as a starting point for review, not a final verdict. + Results are a best-effort estimate that benefits from human review. See the OSSKB terms and license notes in [THIRD_PARTY_LICENSES.md](https://github.com/sktelecom/sbom-tools/blob/main/THIRD_PARTY_LICENSES.md). diff --git a/tests/fixtures/scanoss-raw.json b/tests/fixtures/scanoss-raw.json index 490dfa1..b563b5d 100644 --- a/tests/fixtures/scanoss-raw.json +++ b/tests/fixtures/scanoss-raw.json @@ -4,7 +4,7 @@ "id": "file", "component": "openssl", "vendor": "openssl", - "version": "1.1.1w", + "version": "openssl-3.0.0", "purl": ["pkg:github/openssl/openssl"], "licenses": [{ "name": "Apache-2.0" }], "matched": "100%" diff --git a/tests/test-postprocess.sh b/tests/test-postprocess.sh index 76fa65d..c0e0a8b 100755 --- a/tests/test-postprocess.sh +++ b/tests/test-postprocess.sh @@ -166,6 +166,11 @@ if jq -e '.components[] | select(.name=="openssl") | .properties[]? | select(.na else fail "missing bomlens:identifiedBy=scanoss provenance" fi +# OSSKB returns git-tag versions (e.g. "openssl-3.0.0"); they must be normalized +# or the synthesized CPE is malformed and Trivy matches nothing (found via the +# real-OSSKB spike). 
The component version must be the bare "3.0.0". +ssl_ver=$(jq -r '.components[] | select(.name=="openssl") | .version' "$WORK/vend.json") +[ "$ssl_ver" = "3.0.0" ] && pass "git-tag version normalized (openssl-3.0.0 -> 3.0.0)" || fail "version='$ssl_ver', expected 3.0.0 (normalization)" echo "== vendored: identify -> merge -> normalize completes the PURL->CVE chain ==" # Merge the vendored components with a sparse cdxgen C/C++ SBOM, then normalize. @@ -179,7 +184,7 @@ fi bash "$LIB/normalize-sbom.sh" "$WORK/merged.json" >/dev/null 2>&1 # openssl: no SCANOSS cpe, but the map yields one -> Trivy can now match CVEs. ssl_cpe=$(jq -r '.components[] | select(.name=="openssl") | .cpe // "ABSENT"' "$WORK/merged.json") -[ "$ssl_cpe" = "cpe:2.3:a:openssl:openssl:1.1.1w:*:*:*:*:*:*:*" ] \ +[ "$ssl_cpe" = "cpe:2.3:a:openssl:openssl:3.0.0:*:*:*:*:*:*:*" ] \ && pass "openssl PURL mapped to a Trivy-matchable cpe ($ssl_cpe)" \ || fail "openssl cpe='$ssl_cpe' (PURL->CVE chain broken)" # niche liblfds: no NVD record -> identified only, original PURL preserved.