feat(audit): pack-conformance auditor + SARIF 2.1.0 emitter#51
Conversation
Review Summary by QodoAdd pack-conformance auditor with SARIF 2.1.0 output
WalkthroughsDescription• Add nboot audit command for pack-conformance drift detection - Renders pack in-memory and diffs against existing project - Reports missing/changed files as findings • Implement SARIF 2.1.0 emitter for GitHub Security tab integration - Hand-rolled with no new runtime dependencies - Declares two rules with stable fingerprints for deduplication • Single-source __version__ via importlib.metadata - Eliminates drift between __init__.py and pyproject.toml • Bump navi-sanitize 0.1.0 → 0.2.1 for enhanced sanitization Diagramflowchart LR
A["Pack + Spec"] -->|render in-memory| B["Rendered Files"]
C["Target Project"] -->|read from disk| D["Existing Files"]
B -->|compute_diffs| E["DiffResults"]
D -->|compute_diffs| E
E -->|convert| F["AuditFindings"]
F -->|format text| G["Human Report"]
F -->|format sarif| H["GitHub Security Tab"]
File Changes1. src/navi_bootstrap/__init__.py
|
Code Review by Qodo
1.
|
| manifest = sanitize_manifest(manifest) | ||
|
|
||
| # Stage 0 — resolve action SHAs (optional for audit; offline by default). | ||
| action_shas_config = manifest.get("action_shas", []) |
There was a problem hiding this comment.
🟡 MEDIUM: Non-robust dict key access for 'action_shas'
Confidence: 96%
The code uses manifest.get('action_shas', []) assuming the returned value will be a list, but downstream consumers (resolve_action_shas) may expect a specific type. If manifest ever changes structure or returns None, this may cause subtle TypeErrors.
Suggestion: Consider validating the type of 'action_shas_config' or providing a more robust default handling to avoid type issues.
— This works as long as manifests are well-formed. If not, enjoy the surprise TypeError later. 60
| return SarifResult( | ||
| rule_id=self.rule_id, | ||
| message=self.message, | ||
| artifact_uri=self.dest, |
There was a problem hiding this comment.
🔵 LOW: Lack of Logging or Audit Trail for Findings Generation
Confidence: 88%
The audit pipeline generates findings but doesn't include any logging, making debugging or post-mortem difficult if drift detection or SARIF output fails unexpectedly.
Suggestion: Include minimal logging or debug hooks during findings generation, especially for pipeline-stage failure or major decision points.
— One log line could save someone a late-night debug session.
| Each finding includes a stable `partialFingerprints.primaryLocationLineHash` | ||
| so GitHub's Security tab deduplicates across runs. | ||
|
|
||
| Upload it in CI: |
There was a problem hiding this comment.
🔵 LOW: Documentation omits SARIF rules extensibility caveat
Confidence: 90%
The documentation does not mention that only a fixed set of SARIF rules (missing/changed) is supported and that custom/errors for other drift types are not currently extensible. This could mislead consumers expecting more granular rule coverage.
Suggestion: Clarify in documentation that SARIF emission is intentionally limited to missing and changed files, and other kinds of drift will require schema extension.
— Not wrong, but future-you (or your users) might expect more than two SARIF rules.
| from importlib.metadata import version as _pkg_version | ||
|
|
||
| from navi_bootstrap.packs import get_ordered_packs | ||
| from navi_bootstrap.spec import build_spec_for_new |
There was a problem hiding this comment.
🔵 LOW: Governance: version exported in all and test coverage caveat
Confidence: 95%
Exporting version via all is sound, but the comment on the 'pragma: no cover' for the importlib.metadata fallback warrants a code comment indicating that this is intentional and not an accidental test skip.
Suggestion: Add a brief comment explaining that test coverage of the fallback is intentionally skipped because it only triggers in non-packaged dev environments.
— The pragma is fine, but a comment for future maintainers would prevent awkward codecov questions.
✅ Grippy Review — PASSScore: 86/100 | Findings: 4 Delta: 2 new Commit: 83d9376 |
There was a problem hiding this comment.
Pull request overview
Introduces nboot audit, a pack-conformance auditor that renders a pack in-memory, diffs it against a target repo, and emits results as human-readable text or SARIF 2.1.0 for GitHub Security tab ingestion.
Changes:
- Add an audit pipeline (
run_audit) with text + SARIF renderers and anAuditFindingmodel. - Add a hand-rolled SARIF 2.1.0 emitter with stable partial fingerprints for deduplication.
- Add
nboot auditCLI command and single-sourced__version__; bumpnavi-sanitizeto>=0.2.1.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Locks dependency update to navi-sanitize 0.2.1. |
| pyproject.toml | Bumps runtime dependency constraint for navi-sanitize. |
| src/navi_bootstrap/init.py | Switches __version__ to read from package metadata. |
| src/navi_bootstrap/cli.py | Adds audit command, SARIF/text output selection, and output-file support. |
| src/navi_bootstrap/audit.py | Implements in-memory render + diff audit pipeline and finding formatters. |
| src/navi_bootstrap/sarif.py | Implements minimal SARIF 2.1.0 report/result types and rule registry. |
| tests/test_audit.py | Adds unit + integration coverage for audit pipeline and CLI behavior. |
| tests/test_sarif.py | Adds SARIF emitter shape/fingerprint/rule validation tests. |
| docs/reference/audit.md | Documents nboot audit, flags, SARIF upload example, and exit codes. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 829e153e68
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
0ef25cb to
0bc5d4f
Compare
| help="Exit 0 even when drift is found (report-only mode for CI surveys)", | ||
| ) | ||
| def audit_cmd( | ||
| spec: Path, |
There was a problem hiding this comment.
🔵 LOW: Potential incomplete cleanup when writing output file
Confidence: 85%
When writing to the specified output file in audit_cmd, if an exception occurs between writing and echoing the summary message, it may leave a partially written file without warning the user.
Suggestion: Use a try/except block or atomic write pattern when writing output files to prevent incomplete files in case of errors.
— Not fatal, but partial output files frustrate CI folks.
Addresses five real bot findings on PR #51: 1. [Qodo SECURITY] compute_diffs() read boundary now confines paths to the target directory — mirrors engine.write_rendered's symlink / traversal / absolute-path defense. A crafted pack with dest '../escape.txt' or a dest that resolves through an out-of-target symlink now raises ValueError; run_audit() wraps it as AuditError('Path confinement error: ...'). The audit read boundary is now at least as strict as the write boundary. 2. [Copilot] audit_cmd now exits 2 on AuditError (pipeline failure) vs 1 on drift — lets CI distinguish 'audit ran, drift found' from 'audit failed to run'. SystemExit(2) is emitted to stderr and is never suppressed by --exit-zero. Docs table updated to match. 3. [Copilot] _GH_NOTICE reworded to be flag-agnostic — previous text said 'Re-run without --skip-resolve' which misleads audit_cmd users (audit uses --resolve, opposite polarity). Now reads 'Continuing with placeholder action SHAs. Install gh to enable full SHA resolution.' 4. [Copilot] test_conforming_target_has_no_findings dropped the vestigial 'uv run nboot apply ...' subprocess.run call whose result was ignored anyway. The in-process render+write+audit setup is now the only path — faster test, no PATH dependency, no dead code. 5. [Qodo / Codex] Pack-name fix was already in HEAD from 0bc5d4f — both bots re-scanned the pre-fix commit. No action needed. Skipped (low-value style nits): - cli.py __all__ add — file has no __all__, so not reachable - action_shas robustness — schema already tightens shape - logging/audit-trail — deferrable - atomic output-file writes — edge case - 'race condition if target mutates mid-audit' — documentation only Regression tests (4 new in test_audit.py): - test_symlink_escape_is_rejected - test_traversal_dest_is_rejected - test_confinement_violation_surfaces_as_audit_error - test_pipeline_error_exits_with_code_2 412 tests pass, mypy strict clean (14 files), ruff + format clean.
| @@ -371,6 +374,94 @@ def diff_cmd(spec: Path, pack: str, target: Path, skip_resolve: bool) -> None: | |||
| raise SystemExit(1) | |||
There was a problem hiding this comment.
🟡 MEDIUM: CLI: Output File Write Does Not Handle Exceptions or Permission Errors
Confidence: 87%
When a user provides --output and the CLI tries to write the report, it uses output.write_text() without error handling. If the output location is unwritable, the command will crash ungracefully, potentially giving a confusing error and returning an incorrect exit code.
Suggestion: Wrap the write operation in a try/except block. On failure, output a clear error message to stderr and exit with a nonzero code distinct from diff exit codes. Consider using exit code 2 (pipeline error) for consistency.
— Log file writes fail sometimes. Add error handling unless you enjoy triaging bug reports from CI failures.
| try: | ||
| diffs = compute_diffs(rendered, target, pack_name=canonical_pack_name) | ||
| except ValueError as e: | ||
| raise AuditError(f"Path confinement error: {e}") from e |
There was a problem hiding this comment.
🟡 MEDIUM: Network-Skipping Mode May Cause Audits to Miss SHA Drift
Confidence: 90%
The audit CLI skips SHA resolution by default for offline compliance. If a user expects to always detect action SHA drift, this may be misleading, as changes in remote actions will not be caught. The flag is documented, but risk of false negatives if a user expects up-to-date SHA checking by default.
Suggestion: Highlight this in documentation and ensure that in CI settings where drift matters, the --resolve flag is enforced. Consider a warning on stdout when running with skip_resolve.
— Offline by default is safe for reproducibility-but people will forget. Emit a warning in text output.
| return "OK — target conforms to the pack." | ||
|
|
||
| lines: list[str] = [ | ||
| f"Audit found {len(findings)} drift finding(s):", |
There was a problem hiding this comment.
🔵 LOW: SARIF Report Does Not Capture Extra Context or Build Metadata
Confidence: 90%
The SARIF emitter includes rule and result info but doesn't allow extra metadata (e.g., audit run time, host, triggered CLI flags). For large fleet runs, this makes trend analysis or debugging harder.
Suggestion: Add a metadata field (as allowed by SARIF spec) with CLI flags, host, audit time, or CI build link.
— SARIF supports rich metadata for a reason. Add a field for audit provenance when you can.
| "helpUri": "https://github.com/Project-Navi/navi-bootstrap/blob/main/docs/reference/audit.md", | ||
| "defaultConfiguration": {"level": "warning"}, | ||
| }, | ||
| { |
There was a problem hiding this comment.
🔵 LOW: SARIF Rule Level is Always 'Warning'
Confidence: 89%
Both SARIF rules 'pack-drift-missing' and 'pack-drift-changed' are set to 'warning' level and never surface as 'note' or 'error'. This limits triage in GitHub's Security tab, especially as some packs or policies may wish to escalate missing baseline files to error or demote non-critical drift to note.
Suggestion: Parameterize rule severity per-finding or provide a config for default rule level.
— Sarif levels are there for a reason. Expose as a future config knob.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/navi_bootstrap/diff.py:96
- The post-existence “symlink defense” check is ineffective because
file_pathis alreadytarget / rf.dest, sofile_path.resolve()and(target / rf.dest).resolve()are always identical. This makes the condition always false and leaves misleading dead code. Either remove this block (since the earlierresolve().relative_to(target_resolved)confinement check already rejects symlink escapes) or replace it with an explicit check that actually detects/handles symlinks as intended.
if is_new:
# New file: diff against empty
# For append mode, wrap in marker blocks like the engine would
if rf.mode == "append":
marker_start = _MARKER_START.format(pack_name=pack_name)
| ) | ||
| @click.option( | ||
| "--output", | ||
| type=click.Path(path_type=Path), |
There was a problem hiding this comment.
--output uses click.Path(path_type=Path) without restricting dir_okay/file_okay or validating writability, so passing a directory (or an unwritable path) will fail later with a less clear exception from Path.write_text(). Consider tightening the Click option to require a file path (dir_okay=False) and/or emitting a ClickException with a clearer message when the output path can’t be written.
| type=click.Path(path_type=Path), | |
| type=click.Path(path_type=Path, dir_okay=False, writable=True), |
| | 2 | Pipeline error (bad spec, missing pack, path-confinement violation, template render failure). Always emitted to stderr, never suppressed by `--exit-zero` | | ||
|
|
||
| Exit 1 vs 2 lets CI distinguish "the audit ran and reported drift" from "the | ||
| audit failed to run". Wire your pipeline so only exit 1 gates the merge. |
There was a problem hiding this comment.
The docs recommend wiring CI so “only exit 1 gates the merge”, but exit code 2 indicates the audit failed to run (bad spec/pack, confinement violation, render failure). Treating 2 as non-gating could allow merges when the audit is broken. Suggest rewording to say CI should fail on any non-zero exit, while using the 1 vs 2 distinction only for classifying drift vs pipeline failure in reporting/alerts.
| audit failed to run". Wire your pipeline so only exit 1 gates the merge. | |
| audit failed to run". In CI, fail the job on any non-zero exit code; use the | |
| 1 vs 2 distinction only for reporting, alerting, or classifying drift versus | |
| pipeline failure. |
Addresses two non-blocking Grippy advisories on PR #51: 1. [LOW __init__.py:9] Document why __version__ may fall back to '0.0.0+unknown' — happens only when the package isn't installed (editable-install dev sessions where the package was deleted, or a source checkout being imported via PYTHONPATH). Adds a comment explaining the rationale + flagging that downstream consumers (notably SARIF tool.driver.version) must tolerate the form. 2. [HIGH diff.py:72 / docs/reference/audit.md:94] Add an 'Operational notes' section to the audit reference doc covering: - TOCTOU window: path-confinement check is a snapshot - Mid-run symlink creation by other processes - Privilege guidance — run as low as can read the target Includes the realistic 'most CI usage doesn't hit this' framing. No code changes — pure doc polish. 413 tests still pass, mypy strict clean, ruff + format clean.
|
/review |
|
Codex (@codex) review |
Brings in material sanitizer upgrades directly relevant to the new nboot audit feature planned next: - Invisible-char coverage expanded from 411 to 492 (U+FE00-FE0F variation selectors, math invisible operators, deprecated format controls, braille blank, ogham space, Hangul fillers, Mongolian FVS, Arabic letter mark). - C0/C1 control-char stripping — terminal-injection / ANSI-escape defense. - Homoglyph map expanded 54 -> 66 pairs (Greek lowercase, Cyrillic extended, Latin dotless i). - NFD decompose before homoglyph scan — closes the NFKC composition-bypass vector. - Iterative walker (no recursion, no stack overflow); max_depth parameter prevents crash on pathological nesting. API surface unchanged: clean / walk / jinja2_escaper / path_escaper all import and behave the same. 381 tests pass (including full adversarial suite that depends on sanitizer behaviour).
Adds `nboot audit` — the pack-as-conformance-spec mode Codex review
carved out from the original two-mode design. Reuses the existing
plan + render_to_files + compute_diffs pipeline in memory; outputs
human text or GitHub-compatible SARIF 2.1.0.
New modules:
- src/navi_bootstrap/sarif.py — hand-rolled SARIF emitter (no new
runtime dep). Declares two rules (pack-drift-missing,
pack-drift-changed) with stable partialFingerprints for GitHub
cross-run deduplication.
- src/navi_bootstrap/audit.py — AuditFinding dataclass + run_audit
+ findings_to_sarif / findings_to_text helpers. AuditError wraps
all pipeline-stage failures (spec/manifest/resolve/plan/render).
CLI:
- `nboot audit --spec … --pack … --target … [--format text|sarif]
[--output path] [--resolve] [--exit-zero]`
- Offline by default — audits don't depend on GitHub API
reachability.
- Exits 1 on drift so CI fails; `--exit-zero` for report-only
surveys.
Single-sourced `__version__` via importlib.metadata so the SARIF
report tracks installed package version automatically (removes a
0.1.1-vs-0.1.2 drift between __init__.py and pyproject.toml).
Invisible-prompt-injection scan (Mode B from the original design) is
deferred — per Codex's grumpy review, clean() is string-in/string-out
with no offsets or vector classification. Filed upstream in
navi-sanitize#38 requesting a structured scan() API. Once that
lands, `nboot audit --scan-injections` becomes a thin addition.
Tests: 26 new (11 SARIF emitter + 15 audit pipeline + CLI). Fixtures
use the real scaffold pack end-to-end — renders, mutates one file,
confirms the drift is reported.
407 tests pass, mypy strict clean (14 files), ruff + ruff-format
clean, all 8 existing packs still validate against the schema.
Docs: docs/reference/audit.md covers usage, SARIF upload pattern,
exit codes, and relationship to `diff` / `apply`.
Codex stop-time review caught a real bug: `resolve_pack()` accepts
both a bundled pack name ('base') and a filesystem path
('/abs/path/to/base'), but the append-mode marker block is keyed by
the manifest's canonical pack name — which is what `apply()` writes.
`run_audit` was passing the raw CLI arg to `compute_diffs(pack_name=...)`.
When the user invoked `nboot audit --pack /abs/path/to/some-pack`,
the marker-replace regex in `_compute_append_content` looked for
`# --- nboot: /abs/path/to/some-pack ---` which never matches what
`apply` actually wrote, and every append-mode file then reported
false drift.
Fix: after `plan()`, pull `render_plan.pack_name` (manifest's 'name'
field) and pass THAT to both `compute_diffs` and the AuditFinding.
Matches what `apply_cmd` already does (cli.py:360).
Regression test `test_pack_filesystem_path_does_not_cause_false_append_drift`
renders `base` (which has append-mode entries) via `write_rendered`
using the canonical name, then audits the target passing the
filesystem path — asserts zero drift and asserts name/path forms
produce identical findings. The test skips if `base` ever loses its
append entries so the suite stays green without masking regression
coverage.
Addresses five real bot findings on PR #51: 1. [Qodo SECURITY] compute_diffs() read boundary now confines paths to the target directory — mirrors engine.write_rendered's symlink / traversal / absolute-path defense. A crafted pack with dest '../escape.txt' or a dest that resolves through an out-of-target symlink now raises ValueError; run_audit() wraps it as AuditError('Path confinement error: ...'). The audit read boundary is now at least as strict as the write boundary. 2. [Copilot] audit_cmd now exits 2 on AuditError (pipeline failure) vs 1 on drift — lets CI distinguish 'audit ran, drift found' from 'audit failed to run'. SystemExit(2) is emitted to stderr and is never suppressed by --exit-zero. Docs table updated to match. 3. [Copilot] _GH_NOTICE reworded to be flag-agnostic — previous text said 'Re-run without --skip-resolve' which misleads audit_cmd users (audit uses --resolve, opposite polarity). Now reads 'Continuing with placeholder action SHAs. Install gh to enable full SHA resolution.' 4. [Copilot] test_conforming_target_has_no_findings dropped the vestigial 'uv run nboot apply ...' subprocess.run call whose result was ignored anyway. The in-process render+write+audit setup is now the only path — faster test, no PATH dependency, no dead code. 5. [Qodo / Codex] Pack-name fix was already in HEAD from 0bc5d4f — both bots re-scanned the pre-fix commit. No action needed. Skipped (low-value style nits): - cli.py __all__ add — file has no __all__, so not reachable - action_shas robustness — schema already tightens shape - logging/audit-trail — deferrable - atomic output-file writes — edge case - 'race condition if target mutates mid-audit' — documentation only Regression tests (4 new in test_audit.py): - test_symlink_escape_is_rejected - test_traversal_dest_is_rejected - test_confinement_violation_surfaces_as_audit_error - test_pipeline_error_exits_with_code_2 412 tests pass, mypy strict clean (14 files), ruff + format clean.
Codex stop-time review caught two bugs I introduced in 5fe1668 when adding path-confinement to compute_diffs: 1. `nboot diff` now had a new uncaught ValueError path. compute_diffs raises ValueError on traversal / symlink / absolute-path escape, but diff_cmd in cli.py called it without a try/except. A crafted pack or a symlinked target would surface a raw Python traceback instead of a clean ClickException. run_audit already wrapped this correctly; diff_cmd did not. Fix: wrap the compute_diffs call in diff_cmd with try/except ValueError → ClickException('Path confinement error: ...'). 2. The 'post-existence symlink defense' block I added was a no-op. The comparison `file_path.resolve() != (target / rf.dest).resolve()` reduces to comparing .resolve() against itself — file_path IS target/rf.dest, so the two expressions always yield the same value. The real confinement work is done by the relative_to() check above it. Removed the dead block (and noted: the analogous pattern in engine.write_rendered is also a no-op, but that's pre-existing cleanup for another PR). Regression test in test_audit.py::TestAuditCli:: test_diff_cmd_friendly_error_on_confinement: - Builds a symlink escape inside target - Monkey-patches the engine's render to emit a dest hitting it - Runs `nboot diff` via CliRunner - Asserts exit 1, 'Path confinement error' in output, no 'Traceback' 413 tests pass (+1 regression), mypy strict clean, ruff + format clean. All 3 existing path-confinement tests still pass after removing the dead block — confirming that relative_to() alone does the work.
Addresses two non-blocking Grippy advisories on PR #51: 1. [LOW __init__.py:9] Document why __version__ may fall back to '0.0.0+unknown' — happens only when the package isn't installed (editable-install dev sessions where the package was deleted, or a source checkout being imported via PYTHONPATH). Adds a comment explaining the rationale + flagging that downstream consumers (notably SARIF tool.driver.version) must tolerate the form. 2. [HIGH diff.py:72 / docs/reference/audit.md:94] Add an 'Operational notes' section to the audit reference doc covering: - TOCTOU window: path-confinement check is a snapshot - Mid-run symlink creation by other processes - Privilege guidance — run as low as can read the target Includes the realistic 'most CI usage doesn't hit this' framing. No code changes — pure doc polish. 413 tests still pass, mypy strict clean, ruff + format clean.
Codex P1 review finding on PR #51: `run_audit` documented an exit-2 contract for all pipeline-stage failures and audit_cmd relied on it, but the except clause around `engine.plan()` only caught TemplateError and TypeError. plan() can raise ValueError (e.g., when a loop expands past _MAX_LOOP_ITEMS), and that escaped as a raw traceback instead of a clean AuditError. Fix: extend both plan() and render_to_files() except clauses to include ValueError. The compute_diffs() ValueError path already had its own wrapping (path confinement) and continues to surface as 'Path confinement error: ...' specifically. Regression test `test_plan_value_error_surfaces_as_audit_error` monkey-patches plan() to raise ValueError and asserts the AuditError wrap (and its 'Template planning error' message prefix) appears. 414 tests pass (+1 regression), mypy strict clean, ruff + format clean.
5 distinct findings across 3 reviewers, batched here: 1. [Copilot sarif.py:42, 58] SARIF rule fullDescription text was telling users to run `nboot apply --pack <X> --target <Y>` and `nboot diff --pack <X> --target <Y>` — both omit the required --spec flag, so the suggested commands would error out. Both texts now include `--spec nboot-spec.json` so the suggested remediation actually works. 2. [Copilot sarif.py:116] add_result() rebuilt `known_ids` set from self.rules on every call. Audits with thousands of findings would spend cumulative time recomputing the same set. Cached as `_known_rule_ids: frozenset[str] | None` populated lazily on first add_result(). Hidden from to_dict / repr / compare via field flags. 3. [Grippy HIGH diff.py:66, 72] TOCTOU + chained-symlink concern. Code had no inline comment about the threat-model caveat (only the docs added in 0122244 mentioned it). Added an explicit comment block in compute_diffs explaining that resolve() walks the full symlink chain (catches arbitrary-depth + relative links) but is a single-resolve snapshot vulnerable to mid-run mutation. 4. [Grippy HIGH 'retest with chained and relative links'] Two new regression tests prove the existing resolve()-based check catches: - test_chained_symlink_escape_is_rejected (target/a -> target/b -> outside/secret) - test_relative_symlink_escape_is_rejected (target/escape -> ../outside/secret, relative target) 5. [github-code-quality test_audit.py:272/337/432] Mixed import style: inside test bodies the module was re-imported as 'audit_mod' / 'cli_mod' even though the same names had top-level 'from … import …'. Hoisted module-level `import navi_bootstrap.audit as audit_mod` and `import navi_bootstrap.cli as cli_mod` so the file uses each module in exactly one import style. Skipped from this batch: - Grippy MEDIUM 'no line/region context in SARIF' — fundamental design choice (drift is per-file not per-line); revisit when Mode B (invisible-injection scan) lands and needs span data. - Grippy LOW '__version__ fallback' — already addressed in 0122244. 416 tests pass (+2 regression), mypy strict clean, ruff + format clean.
Latest sweep across Grippy + Copilot + github-code-quality findings on PR #51. Six addressed: 1. [Grippy MEDIUM audit.py:51] AuditFinding.kind was an unrestricted str — a typo'd value would raise KeyError when rule_id was accessed. Tightened to Literal['missing', 'changed'] + __post_init__ that rejects any other value with a clear ValueError. mypy enforces at type-check time, runtime check catches dynamic constructions. 2. [Copilot] AuditFinding.message remediation hints now include --spec nboot-spec.json (paralleling the SARIF fix in 9242ee3). 'nboot apply' and 'nboot diff' both require --spec; the previous suggestion was un-runnable. 3. [Grippy HIGH diff.py — third repeat] Reinforced the TOCTOU caveat: added a Sphinx '.. warning::' block to compute_diffs's docstring (so it surfaces in IDE help() and any future generated API docs) and a paragraph in audit_cmd's CLI --help text. The threat-model doc section was already in audit.md. 4. [Grippy LOW cli.py newline] Centralised the trailing-newline contract: SarifReport.to_json now appends '\n', findings_to_text already did. audit_cmd just forwards the rendered string verbatim (echo with nl=False; write_text without conditional fixup). 5. [github-code-quality test_audit.py imports] Dropped the 'import x as alias' lines entirely. Switched the three monkey-patch tests to pytest's monkeypatch fixture — auto-restoring, idiomatic, and clears the mixed-import lint. 6. [new regression coverage] - test_invalid_kind_rejected_at_construction (kind validation) - test_message_includes_spec_flag_for_apply_remediation (paths the runnable suggestion through both kinds) Skipped from the same review batch: - Grippy LOW 'enhance error messages with hints' — vague, deferrable - Grippy LOW 'no explicit no-drift message when --output' — edge case - Grippy MEDIUM 'exit codes praise' — advisory only 418 tests pass (+3), mypy strict clean, ruff + format clean.
4028987 to
787fe1d
Compare
| help="Path to the project spec JSON file", | ||
| ) | ||
| @click.option("--pack", required=True, type=str, help="Name of the conformance pack") | ||
| @click.option( |
There was a problem hiding this comment.
🔵 LOW: No audit log for SARIF/text output events
Confidence: 92%
The audit_cmd() function writes SARIF or text results to stdout or file, but does not emit a corresponding log entry. This could complicate postmortems or automated scripting, especially if CI misroutes output or discards it.
Suggestion: Emit a log or structured print when a report is written, especially in file-output mode. This ensures visibility in CI logs and tracking in fleet audits.
— A minor log here would save someone a headache in a messy CI setup. Up to you.
| @@ -24,7 +24,7 @@ dependencies = [ | |||
| "click>=8.1.0", | |||
There was a problem hiding this comment.
🔵 LOW: Dependency pinning does not specify upper bounds
Confidence: 88%
All dependencies use only minimum bound (>=) with no upper limit. This can introduce forward compatibility breakage if a future dependency release introduces incompatible changes.
Suggestion: Consider adding upper version bounds to all runtime dependencies in pyproject.toml to prevent accidental breaking changes from newly released major versions.
— No upper bounds means trusting every dependency forever. Sometimes that's fine; sometimes that's a fire drill.
…atch Two findings on rebase commit 787fe1d: 1. [Copilot audit.py:199] findings_to_text() returned the empty-case string 'OK — target conforms to the pack.' WITHOUT a trailing newline, breaking the trailing-newline contract that audit_cmd relies on (it forwards with click.echo(..., nl=False) and write_text() with no fixup). Empty path now returns 'OK — target conforms to the pack.\n'. Docstring updated to spell out the contract for both branches. 2. [Grippy MEDIUM audit.py:168] audit_cmd only caught AuditError; an unexpected exception class escaping run_audit (e.g. from a future engine refactor introducing a new Error type) would surface as a raw Python traceback. Added a defensive 'except Exception' branch that prints a clean error + 'file an issue' hint and still exits 2, so CI behaviour is consistent and users get actionable feedback. Skipped from this round of bot findings: - Grippy HIGH diff.py:76 TOCTOU — fourth repeat. Already exhaustively documented (code comment, .. warning:: docstring, audit.md threat model section, CLI --help text). The 'fail-fast on root' suggestion would break legitimate containerised CI; not applying. - Grippy LOW pyproject.toml upper bounds — Python ecosystem standard for libraries is no upper bound; pinning would create churn. - Grippy LOW 'log when report written' — already done for --output case ('Wrote N finding(s) to ...'); stdout case IS the log. - Grippy LOW __version__ fallback governance — addressed in 0122244. - Stale older-commit threads (cli.py:398 newline, audit.py:51 kind validation, test_audit.py imports) — already fixed; will be resolved via GraphQL. 418 tests still pass, mypy strict clean, ruff + format clean.
| # error instead of a Python traceback on crafted / symlinked targets. | ||
| try: | ||
| diffs = compute_diffs(rendered_files, target, pack_name=render_plan.pack_name) | ||
| except ValueError as e: |
There was a problem hiding this comment.
🔵 LOW: No explicit audit start log message in CLI
Confidence: 90%
The 'audit' CLI command runs with detailed exit codes and output, but does not print a 'beginning audit' or 'auditing target' message. This is minor but impacts visibility in verbose CI or multi-project batch runs, where correlation between command invocation and audit initiation/completion aids debugging.
Suggestion: Emit a clear CLI log message at the beginning of audit_cmd such as 'Auditing against pack using spec '.
— A log line here would save someone a bad day. Not blocking.
| is_flag=True, | ||
| default=False, | ||
| help="Resolve action SHAs via gh before planning (default: offline)", | ||
| ) |
There was a problem hiding this comment.
🔵 LOW: No audit summary printed when output written to file
Confidence: 85%
When the audit report is written to file (with --output), the CLI prints only the count of findings and output path. For workflows leveraging this in CI, a succinct summary of audit findings (e.g., warnings, missing files, etc.) could improve operator visibility without having to parse output files.
Suggestion: Print a one-line summary of finding counts and types when writing to file (e.g., after 'Wrote X finding(s) to ...', echo a brief summary).
— Consider adding a metric for this operation. It's minor but helpful.
Summary
Ships the first cut of
nboot audit— a fleet-level conformance auditor that reuses navi-bootstrap's render pipeline as a living specification for what a repo should look like, and reports drift as SARIF 2.1.0 for the GitHub Security tab.This PR delivers Mode A (pack-conformance audit) of the original two-mode design. Mode B (invisible-prompt-injection scan) was deferred per a grumpy Codex review:
navi_sanitize.clean()is string-in/string-out with no offsets or vector classification, so the promisedline:col + kindreporting is impossible without an upstream API change. Filed as navi-sanitize#38.What it does
--format sarifproduces a GitHub-compatible report that uploads cleanly viagithub/codeql-action/upload-sarifand lands in the Security tab alongside CodeQL / Semgrep alerts.Why this matters
Backstage Scaffolder, Cookiecutter, and Yeoman are create-only. Copier can update but needs the target to have been Copier-generated originally, and its merge UX is the documented top failure mode in 2026 platform-engineering surveys.
nboot auditinstead treats a pack as a living conformance spec: render it in memory, diff against any existing repo, report drift. No writes, no merges, no state. The same artifact generates new projects AND validates old ones — a materially different primitive.What's in the PR
New modules
src/navi_bootstrap/sarif.py— hand-rolled SARIF 2.1.0 emitter (no new runtime deps). Declares two rules with stablepartialFingerprintsfor GitHub dedup.src/navi_bootstrap/audit.py—AuditFindingdataclass,run_audit(),findings_to_{sarif,text}().CLI
nboot audit --spec … --pack … --target … [--format text|sarif] [--output path] [--resolve] [--exit-zero]--exit-zerofor report-only surveys.Single-sourced
__version__src/navi_bootstrap/__init__.pynow reads viaimportlib.metadata.version("navi-bootstrap"). Removes the 0.1.1-vs-0.1.2 drift between__init__.pyandpyproject.toml.Companion commit (already on branch)
496a706—navi-sanitize 0.1.0 → 0.2.1. Brings in 492-invisible-char coverage, C0/C1 control-char stripping, 66 homoglyph pairs, NFKC-bypass defense. Prepares for the deferred Mode B.Test plan
uv run pytest tests/— 407 passed (26 new: 11 SARIF, 15 audit pipeline + CLI)uv run mypy src/navi_bootstrap/— clean, 14 files, strict modeuv run ruff check+ruff format --check— cleanscaffoldinto tmp, mutate README, confirm audit reports the driftnboot audit --helpdisplays all flags correctlyFollow-up work
navi-sanitize 0.3.xwithscan() -> list[Finding]API (issue #38) unlocks Mode Bnboot audit --scan-injectionsas a thin addition oncescan()shipsnboot audit --org <name>) that multi-repo-audits viagh repo listCodex review status
Dispatched a grumpy review pre-build. It flagged two fatal gaps (sanitize API surface, SARIF schema complexity) and three HIGH warnings. The design was reshaped in response: Mode B cut, SARIF scope narrowed to missing/changed rules only (no hunk-region parsing), binary/j2 collision deferred to follow-up.