feat(audit-story): phase 1 — SSG datastream rule-ID introspection (#127)#128
Merged
SupremeCommanderHedgehog merged 1 commit on Jun 20, 2026
First phase of the cross-distro audit-story PR (#127). Adds the data and tooling that subsequent phases (per-rule mapping, emit_tailoring implementation, exception_entry sweep, test coverage) consume.

Tool: scripts/audit_story/extract_ssg_rule_ids.py is a stdlib-only CLI that parses xccdf:Rule@id from one or more ssg-*-ds.xml files and emits per-distro rule-ID lists plus a cross-distro set-ops markdown diff. 80 LOC.

Data (extracted from the downstream package versions operators actually get at install time, per docs/audit-story/SSG-VERSIONS.md):
- docs/audit-story/alma8-rule-ids.txt: 1630 rules (SSG 0.1.74)
- docs/audit-story/alma9-rule-ids.txt: 1530 rules (SSG 0.1.80)
- docs/audit-story/ubuntu2404-rule-ids.txt: 639 rules (SSG 0.1.79)
- docs/audit-story/cross-distro-rule-id-diff.md
- docs/audit-story/SSG-VERSIONS.md: pinned versions + recipe

Headline findings (from the diff):
- 452 rules shared across all three datastreams (universal STIG floor)
- AL8 ∩ AL9 = 1435 rules (88% of AL8, 94% of AL9). This strongly validates the alma8 re-export gambit from #121 phase 2: most alma9 SSG IDs that emit_tailoring references are directly valid on alma8
- AL8 only: 189 rules (legacy AL8 subsystems: avahi, dhcp-server, cups, dovecot, ldap-client checks dropped in AL9)
- The rule-ID format is identical across distros (xccdf_org.ssgproject.content_rule_*), so a future audit-story cfg.distro branch can reference the same string and just gate on set membership per distro

Why pin downstream versions (not upstream v0.1.81): oscap xccdf eval at install time loads the datastream from the installed scap-security-guide RPM / ssg-debderived deb on the target host. That is the downstream, which lags upstream by 1-7 patch releases. Pinning what's actually deployable means our per-rule mapping references IDs that exist where it matters.
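For readers who have not opened the script, the extraction step described above could look roughly like the following. This is a minimal sketch, not the code in scripts/audit_story/extract_ssg_rule_ids.py: the function signature, the SAMPLE document, and the example rule IDs are all assumptions made for illustration. Only the XCCDF 1.2 namespace URI and the xccdf_org.ssgproject.content_rule_ prefix are taken as given.

```python
import xml.etree.ElementTree as ET

# XCCDF 1.2 namespace; Rule elements in any other namespace are ignored.
XCCDF_12_NS = "http://checklists.nist.gov/xccdf/1.2"


def extract_rule_ids(xml_text: str) -> set[str]:
    """Return the id attribute of every XCCDF 1.2 Rule element in xml_text.

    Hypothetical signature: the real tool may take file paths instead.
    """
    root = ET.fromstring(xml_text)
    rule_tag = f"{{{XCCDF_12_NS}}}Rule"
    # iter() walks the whole tree, so Rules nested inside a Benchmark are found.
    # Rules without an id attribute are skipped.
    return {el.attrib["id"] for el in root.iter(rule_tag) if "id" in el.attrib}


# Synthetic stand-in for an ssg-*-ds.xml datastream (illustrative IDs only).
SAMPLE = """<?xml version="1.0"?>
<ds:data-stream-collection xmlns:ds="http://scap.nist.gov/schema/scap/source/1.2"
                           xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2">
  <xccdf:Benchmark id="xccdf_org.ssgproject.content_benchmark_EXAMPLE">
    <xccdf:Rule id="xccdf_org.ssgproject.content_rule_example_a"/>
    <xccdf:Rule id="xccdf_org.ssgproject.content_rule_example_b"/>
    <xccdf:Rule/>
  </xccdf:Benchmark>
</ds:data-stream-collection>
"""

print(sorted(extract_rule_ids(SAMPLE)))
```

Matching on the fully qualified tag name, rather than on a prefix such as xccdf:, keeps the lookup independent of whichever prefix a given datastream happens to declare.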
8 new unit tests under tests/audit_story/ pin:
- extract_rule_ids from synthetic XCCDF 1.2 XML
- rules without id= skipped
- non-XCCDF-1.2 namespaces ignored
- write_rule_id_list sorts + newline-terminates
- diff emits intersections + only-sets
- CLI writes per-distro files + diff
- CLI errors on missing path
- CLI rejects malformed label=path

No tests against the real 1500-1600-rule datastreams: those XML files are too large to commit and the pinned committed rule-id lists ARE the regression check (a diff means SSG bumped).

Spec: docs/superpowers/specs/2026-06-20-audit-story-phase-1-datastream-introspection-design.md

Refs #127.
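The behaviours the tests pin for the list writer and the diff (sorted, newline-terminated output; intersections and only-sets) can be illustrated with plain set operations. The sketch below is an assumption about shape, not the committed implementation: write_rule_id_list is named in the test list, but its signature here is guessed, and cross_distro_sets plus the rule_a..rule_d IDs are hypothetical.

```python
import tempfile
from pathlib import Path


def write_rule_id_list(rule_ids, path):
    """Write rule IDs sorted, one per line, each line newline-terminated."""
    text = "".join(f"{rule_id}\n" for rule_id in sorted(rule_ids))
    Path(path).write_text(text, encoding="utf-8")


def cross_distro_sets(ids_by_distro):
    """Return (shared_by_all, only_in) for a non-empty {label: set_of_ids} mapping.

    Hypothetical helper: the real tool renders these sets as markdown.
    """
    shared = set.intersection(*ids_by_distro.values())
    only_in = {}
    for label, ids in ids_by_distro.items():
        # Union of every other distro's IDs; empty when there is only one distro.
        others = set().union(*(o for l, o in ids_by_distro.items() if l != label))
        only_in[label] = ids - others
    return shared, only_in


# Illustrative data; the labels mirror the PR, the IDs are made up.
IDS = {
    "alma8": {"rule_a", "rule_b", "rule_c"},
    "alma9": {"rule_a", "rule_b"},
    "ubuntu2404": {"rule_a", "rule_d"},
}

shared, only_in = cross_distro_sets(IDS)
print(sorted(shared))
print({label: sorted(ids) for label, ids in only_in.items()})

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "alma8-rule-ids.txt"
    write_rule_id_list(IDS["alma8"], out)
    written = out.read_text(encoding="utf-8")
print(repr(written))
```

Sorting before writing is what makes the committed lists usable as a regression check: two extractions of the same datastream produce identical files, so any git diff reflects a real change in the rule set.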
This was referenced Jun 20, 2026
Summary
First phase of the cross-distro audit-story PR (#127). Adds the data and tooling that subsequent phases (per-rule mapping, emit_tailoring implementation, exception_entry sweep, test coverage) consume. The emit_tailoring choices (cfg.distro branching vs re-export-replacement vs separate implementation) depend on whether the SSG IDs a given rule needs are shared across distros or distro-only.

This PR introduces no functional changes: pure introspection + data files.
What landed
Tool: scripts/audit_story/extract_ssg_rule_ids.py, a stdlib-only CLI (~80 LOC). Parses xccdf:Rule@id from one or more ssg-*-ds.xml files and emits per-distro rule-ID lists plus a cross-distro set-ops markdown diff. 8 unit tests cover the extraction + diff logic against synthetic XML.

Data: extracted from the downstream package versions operators actually get at install time (see docs/audit-story/SSG-VERSIONS.md):
- docs/audit-story/alma8-rule-ids.txt: scap-security-guide-0.1.74-3.el8_10.alma.1
- docs/audit-story/alma9-rule-ids.txt: scap-security-guide-0.1.80-1.el9_7.alma.2
- docs/audit-story/ubuntu2404-rule-ids.txt: ssg-debderived 0.1.79-1
- docs/audit-story/cross-distro-rule-id-diff.md
- docs/audit-story/SSG-VERSIONS.md

Headline findings (from the diff)
- AL8 ∩ AL9 = 1435 rules (88% of AL8, 94% of AL9): most alma9 SSG IDs that emit_tailoring references are directly valid on alma8; per-rule cfg.distro branching for the audit-story will be relatively rare.
- The rule-ID format is identical across distros (xccdf_org.ssgproject.content_rule_*). Future audit-story cfg.distro branching can reference the same string and just gate on set membership per distro.

Why pin downstream SSG versions (not upstream v0.1.81)
oscap xccdf eval at install time loads /usr/share/xml/scap/ssg/content/ssg-<distro>-ds.xml from the installed scap-security-guide RPM / ssg-debderived deb on the target host, whichever version that distro shipped. ks-gen's emit_tailoring references rule IDs that need to exist in that datastream, not in the latest upstream release. So we pin against what's actually deployable today.

When a downstream bumps SSG, re-run scripts/audit_story/extract_ssg_rule_ids.py per the SSG-VERSIONS.md recipe; git diff shows what changed.

Test plan
- 8 unit tests in tests/audit_story/test_extract_ssg_rule_ids.py cover: extract IDs from synthetic XCCDF 1.2 XML; rules without id= skipped; non-XCCDF-1.2 namespaces ignored; write_rule_id_list sorts + newline-terminates; cross-distro diff emits intersections + only-sets; CLI writes per-distro files + diff; CLI errors on missing path; CLI rejects malformed label=path.
- The committed *-rule-ids.txt lists ARE the regression check: a diff means SSG bumped.
- ruff check && ruff format --check && mypy && pytest -q: all four green (941 passed = 933 from end of v0.26.0 + 8 new).
- BE707B220C995478. rpm2cpio | cpio and dpkg-deb -x. Output matches the committed *-rule-ids.txt byte-for-byte.

What's next on #127
Phase 2: per-rule mapping. For each of our 14 ubuntu2404 + 15 alma8 rules (plus an alma9 sweep), map the rule's operator-facing effect to the set of SSG rule IDs it should <disable>/<set-value> in emit_tailoring. Cross-reference against the three rule-ID files this PR produces.

Spec: docs/superpowers/specs/2026-06-20-audit-story-phase-1-datastream-introspection-design.md

Related: #127 (parent), #81 (ubuntu2404 ports, all emit_tailoring deferred), #121 (alma8 re-exports, inherit alma9's emit_tailoring).
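The Phase 2 cross-reference against the three rule-ID files amounts to a set-membership check per distro. A minimal sketch of that check follows, assuming the committed lists are one rule ID per line. The helper names (parse_rule_id_list, distros_with_rule, unavailable_rules) and the example_shared / example_legacy IDs are hypothetical; inline strings stand in for the docs/audit-story/*-rule-ids.txt files so the snippet runs on its own.

```python
PREFIX = "xccdf_org.ssgproject.content_rule_"


def parse_rule_id_list(text):
    """Parse a one-ID-per-line list into a frozenset, ignoring blank lines."""
    return frozenset(line.strip() for line in text.splitlines() if line.strip())


def distros_with_rule(rule_id, ids_by_distro):
    """Return the sorted distro labels whose datastream contains rule_id."""
    return sorted(label for label, ids in ids_by_distro.items() if rule_id in ids)


def unavailable_rules(wanted, ids_by_distro):
    """For each distro, list the wanted rule IDs its datastream lacks."""
    return {label: sorted(set(wanted) - ids) for label, ids in ids_by_distro.items()}


# Stand-ins for the three committed rule-ID files (made-up contents).
IDS_BY_DISTRO = {
    "alma8": parse_rule_id_list(f"{PREFIX}example_shared\n{PREFIX}example_legacy\n"),
    "alma9": parse_rule_id_list(f"{PREFIX}example_shared\n"),
    "ubuntu2404": parse_rule_id_list(f"{PREFIX}example_shared\n"),
}

wanted = [f"{PREFIX}example_shared", f"{PREFIX}example_legacy"]
print(distros_with_rule(f"{PREFIX}example_legacy", IDS_BY_DISTRO))
print(unavailable_rules(wanted, IDS_BY_DISTRO))
```

Because the ID strings are identical across distros, a mapping only needs one string per SSG rule plus this lookup; a distro-specific branch is needed only where unavailable_rules reports a gap.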