
feat(audit-story): phase 1 — SSG datastream rule-ID introspection (#127) #128

Merged: SupremeCommanderHedgehog merged 1 commit into main from audit-story-phase-1-datastream-introspection on Jun 20, 2026

Conversation

@SupremeCommanderHedgehog

Owner

Summary

First phase of the cross-distro audit-story PR (#127). Adds the data and tooling that subsequent phases (per-rule mapping, emit_tailoring implementation, exception_entry sweep, test coverage) consume.

This PR introduces no functional changes — pure introspection + data files.

What landed

Tool: scripts/audit_story/extract_ssg_rule_ids.py, a stdlib-only CLI (~80 LOC). Parses xccdf:Rule@id from one or more ssg-*-ds.xml files and emits per-distro rule-ID lists plus a cross-distro set-ops markdown diff. 8 unit tests cover the extraction and diff logic against synthetic XML.
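For readers who have not opened the script, the extraction step can be sketched roughly as follows. This is a minimal illustration, not the committed code: the name `extract_rule_ids` is taken from the test description, but its signature here (XML text in, set of IDs out) and the `SAMPLE` document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# XCCDF 1.2 namespace; Rule elements in any other namespace are ignored.
XCCDF_12_NS = "http://checklists.nist.gov/xccdf/1.2"


def extract_rule_ids(xml_text: str) -> set[str]:
    """Return the id attribute of every XCCDF 1.2 Rule element (assumed signature)."""
    root = ET.fromstring(xml_text)
    ids: set[str] = set()
    # iter() walks the whole tree, so nesting inside Groups does not matter.
    for rule in root.iter(f"{{{XCCDF_12_NS}}}Rule"):
        rule_id = rule.get("id")
        if rule_id:  # rules without id= are skipped
            ids.add(rule_id)
    return ids


# Synthetic stand-in for a datastream; the rule names are made up.
SAMPLE = (
    '<ds:data-stream-collection'
    ' xmlns:ds="http://scap.nist.gov/schema/scap/source/1.2"'
    ' xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2"'
    ' xmlns:old="http://checklists.nist.gov/xccdf/1.1">'
    '<xccdf:Benchmark id="bench"><xccdf:Group id="grp">'
    '<xccdf:Rule id="xccdf_org.ssgproject.content_rule_example_a"/>'
    '<xccdf:Rule/>'
    '</xccdf:Group></xccdf:Benchmark>'
    '<old:Rule id="legacy_rule"/>'
    '</ds:data-stream-collection>'
)

found = extract_rule_ids(SAMPLE)
```

Loading a 25-27 MB datastream with `ET.fromstring` keeps the whole tree in memory; whether the real tool parses incrementally instead is not stated here.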

Data — extracted from the downstream package versions operators actually get at install time (see docs/audit-story/SSG-VERSIONS.md):

| File | Lines | SSG version | Package |
| --- | --- | --- | --- |
| docs/audit-story/alma8-rule-ids.txt | 1630 | 0.1.74 | scap-security-guide-0.1.74-3.el8_10.alma.1 |
| docs/audit-story/alma9-rule-ids.txt | 1530 | 0.1.80 | scap-security-guide-0.1.80-1.el9_7.alma.2 |
| docs/audit-story/ubuntu2404-rule-ids.txt | 639 | 0.1.79 | ssg-debderived 0.1.79-1 |
| docs/audit-story/cross-distro-rule-id-diff.md | 453 | | set-ops over the three |
| docs/audit-story/SSG-VERSIONS.md | | | pinned versions + re-extraction recipe |

Headline findings (from the diff)

  • 452 rules shared across all three datastreams — the universal STIG floor.
  • AL8 ∩ AL9 = 1435 rules (88% of AL8, 94% of AL9). This strongly validates the alma8 re-export gambit from #121 phase 2: most alma9 SSG IDs that emit_tailoring references are directly valid on alma8, so per-rule cfg.distro branching for the audit-story should be relatively rare.
  • AL8 ∩ Ubuntu = 458, AL9 ∩ Ubuntu = 472 — modest overlap; Ubuntu's STIG content is much smaller (639 rules total vs RHEL-family's ~1600), reflecting its newer SSG status.
  • AL8-only: 189 rules — legacy AL8 subsystems (avahi, dhcp-server, cups, dovecot, ldap-client checks) dropped in AL9.
  • Rule-ID format is identical across distros (xccdf_org.ssgproject.content_rule_*). Future audit-story cfg.distro branching can reference the same string and just gate on set membership per distro.
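The set operations behind these numbers are simple enough to sketch. The helper below is an illustration only: `cross_distro_summary` and the toy rule names are hypothetical, and the committed tool emits markdown rather than a dict. The last lines re-derive the AL8 total from the counts reported above as a sanity check.

```python
from itertools import combinations


def cross_distro_summary(ids_by_distro: dict[str, set[str]]) -> dict[str, int]:
    """Count shared, pairwise-shared, and distro-only rule IDs (hypothetical helper).

    Expects at least one distro in the mapping.
    """
    labels = sorted(ids_by_distro)
    summary = {"shared_by_all": len(set.intersection(*ids_by_distro.values()))}
    for a, b in combinations(labels, 2):
        summary[f"{a} & {b}"] = len(ids_by_distro[a] & ids_by_distro[b])
    for label in labels:
        # "only" means present in this distro and in no other one.
        others = set().union(*(ids_by_distro[o] for o in labels if o != label))
        summary[f"{label} only"] = len(ids_by_distro[label] - others)
    return summary


# Toy data with made-up rule names, not real SSG IDs.
toy = {
    "alma8": {"r1", "r2", "r3"},
    "alma9": {"r1", "r2", "r4"},
    "ubuntu2404": {"r1", "r5"},
}
summary = cross_distro_summary(toy)

# Sanity check on the reported figures: AL8 splits into AL8-only,
# AL8+AL9 only, AL8+Ubuntu only, and rules shared by all three.
al8_only, al8_al9, al8_ubuntu, all_three = 189, 1435, 458, 452
al8_total = al8_only + (al8_al9 - all_three) + (al8_ubuntu - all_three) + all_three
```

With the counts above, `al8_total` comes to 1630, matching the line count of alma8-rule-ids.txt.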

Why pin downstream SSG versions (not upstream v0.1.81)

oscap xccdf eval at install time loads /usr/share/xml/scap/ssg/content/ssg-<distro>-ds.xml from the installed scap-security-guide RPM / ssg-debderived deb on the target host — whichever version that distro shipped. ks-gen's emit_tailoring references rule IDs that need to exist in that datastream, not in the latest upstream release. So we pin against what's actually deployable today.

When a downstream bumps SSG, re-run scripts/audit_story/extract_ssg_rule_ids.py per the SSG-VERSIONS.md recipe — git diff shows what changed.
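As an illustration of how the pinned lists can guard emit_tailoring, the sketch below checks a set of referenced rule IDs against one distro's pinned file. It assumes the *-rule-ids.txt files hold one rule ID per line; `load_pinned_ids`, `missing_for_distro`, and the example IDs are hypothetical and not part of this PR.

```python
from pathlib import Path


def load_pinned_ids(path: Path) -> set[str]:
    """Read a pinned rule-ID list, assuming one ID per line (blank lines ignored)."""
    text = path.read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_for_distro(referenced: set[str], pinned: set[str]) -> list[str]:
    """Return referenced rule IDs that the pinned datastream does not contain."""
    return sorted(referenced - pinned)


# Made-up IDs in the shared xccdf_org.ssgproject.content_rule_* format.
example_pinned = {
    "xccdf_org.ssgproject.content_rule_example_a",
    "xccdf_org.ssgproject.content_rule_example_b",
}
example_referenced = {
    "xccdf_org.ssgproject.content_rule_example_a",
    "xccdf_org.ssgproject.content_rule_example_gone",
}
missing = missing_for_distro(example_referenced, example_pinned)
```

A non-empty result after a downstream SSG bump would point at tailoring entries that need a per-distro gate or a rename.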

Test plan

  • 8 new unit tests in tests/audit_story/test_extract_ssg_rule_ids.py cover: extract IDs from synthetic XCCDF 1.2 XML; rules without id= skipped; non-XCCDF-1.2 namespaces ignored; write_rule_id_list sorts + newline-terminates; cross-distro diff emits intersections + only-sets; CLI writes per-distro files + diff; CLI errors on missing path; CLI rejects malformed label=path.
  • No tests against the real ~1500-rule datastreams — those XML files are too large to commit (25-27 MB each). The pinned, committed *-rule-ids.txt lists ARE the regression check: a diff means SSG bumped.
  • Full CI chain run locally: ruff check && ruff format --check && mypy && pytest -q — all four green (941 passed = 933 from end of v0.26.0 + 8 new).
  • Each commit on this branch is GPG-signed with BE707B220C995478.
  • Manual smoke: ran the tool against real AL8 + AL9 + Ubuntu 24.04 datastreams extracted via rpm2cpio | cpio and dpkg-deb -x. Output matches the committed *-rule-ids.txt byte-for-byte.
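The byte-for-byte comparison only works because the list output is deterministic. A minimal sketch of that property follows, assuming a signature for `write_rule_id_list` (the name comes from the test list above; the arguments, the de-duplication, and the `matches_pinned` helper are assumptions).

```python
from pathlib import Path
from typing import Iterable


def render_rule_id_list(rule_ids: Iterable[str]) -> bytes:
    """Sorted, one ID per line, every line newline-terminated."""
    return "".join(f"{rid}\n" for rid in sorted(set(rule_ids))).encode("utf-8")


def write_rule_id_list(rule_ids: Iterable[str], dest: Path) -> None:
    """Write the list as bytes so line endings are identical on every platform."""
    dest.write_bytes(render_rule_id_list(rule_ids))


def matches_pinned(rule_ids: Iterable[str], pinned: Path) -> bool:
    """True when a fresh extraction would reproduce the pinned file exactly."""
    return pinned.read_bytes() == render_rule_id_list(rule_ids)
```

Because input order never affects the output, a `git diff` on a regenerated list shows only rules that were added or removed.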

What's next on #127

Phase 2 — per-rule mapping. For each of our 14 ubuntu2404 + 15 alma8 rules (plus an alma9 sweep), map the rule's operator-facing effect to the set of SSG rule IDs it should <disable> / <set-value> in emit_tailoring. Cross-reference against the three rule-ID files this PR produces.

Spec: docs/superpowers/specs/2026-06-20-audit-story-phase-1-datastream-introspection-design.md

Related: #127 (parent), #81 (ubuntu2404 ports — all emit_tailoring deferred), #121 (alma8 re-exports — inherit alma9's emit_tailoring).

First phase of the cross-distro audit-story PR (#127). Adds the data and
tooling that subsequent phases (per-rule mapping, emit_tailoring
implementation, exception_entry sweep, test coverage) consume.

Tool:
  scripts/audit_story/extract_ssg_rule_ids.py — stdlib-only CLI that
  parses xccdf:Rule@id from one or more ssg-*-ds.xml files and emits
  per-distro rule-ID lists + cross-distro set-ops markdown diff. 80 LOC.

Data (extracted from the downstream package versions operators actually
get at install time, per docs/audit-story/SSG-VERSIONS.md):
  docs/audit-story/alma8-rule-ids.txt        — 1630 rules (SSG 0.1.74)
  docs/audit-story/alma9-rule-ids.txt        — 1530 rules (SSG 0.1.80)
  docs/audit-story/ubuntu2404-rule-ids.txt   —  639 rules (SSG 0.1.79)
  docs/audit-story/cross-distro-rule-id-diff.md
  docs/audit-story/SSG-VERSIONS.md           — pinned versions + recipe

Headline findings (from the diff):
  - 452 rules shared across all three datastreams (universal STIG floor)
  - AL8 ∩ AL9 = 1435 rules (88% of AL8, 94% of AL9) — strongly validates
    the alma8 re-export gambit from #121 phase 2: most alma9 SSG IDs
    that emit_tailoring references are directly valid on alma8
  - AL8 only: 189 rules (legacy AL8 subsystems — avahi, dhcp-server,
    cups, dovecot, ldap-client checks dropped in AL9)
  - The rule-ID format is identical across distros
    (xccdf_org.ssgproject.content_rule_*), so a future audit-story
    cfg.distro branch can reference the same string and just gate on
    set membership per distro

Why pin downstream versions (not upstream v0.1.81):
  oscap xccdf eval at install time loads the datastream from the
  installed scap-security-guide RPM / ssg-debderived deb on the
  target host. That's the downstream — which lags upstream by 1-7
  patch releases. Pinning what's actually deployable means our
  per-rule mapping references IDs that exist where it matters.

8 new unit tests under tests/audit_story/ pin: extract_rule_ids from
synthetic XCCDF 1.2 XML; rules without id= skipped; non-XCCDF-1.2
namespaces ignored; write_rule_id_list sorts + newline-terminates;
diff emits intersections + only-sets; CLI writes per-distro files
+ diff; CLI errors on missing path; CLI rejects malformed label=path.

No tests against the real 1500-1600-rule datastreams: those XML files
are too large to commit and the pinned committed rule-id lists ARE
the regression check — a diff means SSG bumped.

Spec: docs/superpowers/specs/2026-06-20-audit-story-phase-1-datastream-introspection-design.md

Refs #127.
SupremeCommanderHedgehog merged commit 92c7c48 into main on Jun 20, 2026
6 checks passed
SupremeCommanderHedgehog deleted the audit-story-phase-1-datastream-introspection branch on June 20, 2026 at 20:54