Context
PR #356 fixed a multi-line rewrite bug (#355) where `cmd1\ncmd2` collapsed to glued tokens like `head -1echo`, producing stray files in the agent's cwd. The bug was reproducible only with a specific shape (multi-segment + filter match), not directly via `tokf rewrite`. A property-based test harness over a small bash grammar would have caught it pre-merge and would harden the rewrite engine against the broader class of "output looks plausible but has different argv structure" regressions.
Scope (Tier 1 — static invariants only)
Add a property-test suite using `proptest` (already a common dev-dep candidate). Generate inputs from a small hand-rolled bash grammar and assert structural invariants on the rewrite output. No execution, no sandboxing — that's a separate (Tier 2) follow-up.
Grammar
Hand-roll a generator that emits valid bash for a deliberately narrow surface:
Words: `[a-z][a-z0-9_-]*` and a few `-Nflag` shapes (`-1`, `-n 5`, `--lines=10`).
Simple commands: a command word followed by args (1–4 args).
Quoted args: single-quoted `'…'` and double-quoted `"…"` interleaved with bare args.
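Pipes: a simple command piped into a filter, `| (head|tail|grep)`, followed by that filter's args.
Compound: simple commands or pipelines joined by `(&&|\|\||;|\\n)`, chained 1–4 deep.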
Bias toward command names that match a stdlib filter (`git status`, `cargo test`, `ls`, `docker ps`) so the rewrite path actually fires — otherwise the engine returns the input unchanged and most properties degenerate to identity.
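A minimal sketch of the generator's shape, written against the standard library only so it runs without extra dependencies. In the real suite these functions would presumably become `proptest` strategies; the `Rng` type, the function names, and the `PLAIN` and `ARGS` word lists are illustrative assumptions, not part of tokf.

```rust
/// Tiny xorshift PRNG so the sketch needs no external crates.
/// (Hypothetical stand-in for proptest's own value generation.)
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift state must never be zero.
        Rng(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Value in `lo..=hi` (slightly biased, which is fine for a sketch).
    fn range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() as usize) % (hi - lo + 1)
    }

    fn pick<'a>(&mut self, items: &[&'a str]) -> &'a str {
        items[self.range(0, items.len() - 1)]
    }
}

// Command prefixes that match a stdlib filter, taken from the issue text.
const FILTERED: &[&str] = &["git status", "cargo test", "ls", "docker ps"];
// Assumed non-matching commands and sample args, for illustration only.
const PLAIN: &[&str] = &["echo", "cat", "wc"];
const ARGS: &[&str] = &["-1", "-n 5", "--lines=10", "foo", "a_b-c", "'x y'", "\"p q\""];
const PIPE_TARGETS: &[&str] = &["head", "tail", "grep"];
const SEPARATORS: &[&str] = &[" && ", " || ", "; ", "\n"];

/// One simple command: a command prefix plus 1-4 args.
fn simple_command(rng: &mut Rng) -> String {
    // Roughly 3 in 4 commands come from the filter-matching list,
    // so the rewrite path fires most of the time.
    let head = if rng.range(0, 3) < 3 { rng.pick(FILTERED) } else { rng.pick(PLAIN) };
    let mut out = String::from(head);
    for _ in 0..rng.range(1, 4) {
        out.push(' ');
        out.push_str(rng.pick(ARGS));
    }
    out
}

/// A simple command, optionally piped into head/tail/grep.
fn pipeline(rng: &mut Rng) -> String {
    let mut out = simple_command(rng);
    if rng.range(0, 1) == 1 {
        out.push_str(" | ");
        out.push_str(rng.pick(PIPE_TARGETS));
        out.push(' ');
        out.push_str(rng.pick(ARGS));
    }
    out
}

/// 1-4 pipelines chained with `&&`, `||`, `;`, or a newline.
pub fn compound(rng: &mut Rng) -> String {
    let mut out = pipeline(rng);
    for _ in 1..rng.range(1, 4) {
        out.push_str(rng.pick(SEPARATORS));
        out.push_str(&pipeline(rng));
    }
    out
}
```

Calling `compound(&mut Rng::new(42))` repeatedly yields a reproducible stream of inputs, which lines up with the suggestion to use a deterministic seed. Unlike `proptest`, this sketch does no shrinking.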
Invariants to assert
For every generated input `x`:
`compound_segments` round-trips byte-for-byte. `parse(x).compound_segments()` reassembled via `seg + sep` must equal `x`. Generalizes the small explicit list in `bash_ast_tests.rs`.
Argv preservation. For every "real" command in the rewrite output (i.e. every `Command` AST node whose first word's basename is not `tokf`), its full argv must appear in the original input verbatim. This is the property that #355 ("Hook creates stray 1<cmd> files in cwd from misformed head -1<word> rewrite") violated: input had argv `["head", "-1"]`, rewrite had `["head", "-1echo"]`. Different argv, different file written.
Shell-parseability. `rable::parse(rewrite(x)).is_some()` must hold. A rewrite producing unparseable output is always wrong.
Idempotence. `rewrite(rewrite(x)) == rewrite(x)`. The built-in `^tokf ` skip should make this trivially true; a regression means double-wrapping leaked through.
Quote integrity. Every single/double-quoted substring in `x` appears as a byte-for-byte substring in `rewrite(x)`. Catches splices into opaque ssh-style payloads (cf. #338, "tokf output filter gets injected into ssh's remote command when output is large").
Suggested layout
`crates/tokf-cli/tests/proptest_rewrite.rs`: new integration test module.
`proptest = "1"` as a dev-dep on `tokf-cli`.
200–500 cases per property by default; bump in CI if cheap enough.
Use a deterministic seed for reproducibility; print failing cases via proptest's built-in shrinking.
Out of scope (deliberately)
Heredocs. rable already round-trips them in our existing test, but generating them needs a more careful grammar (matching delimiters across lines), so defer.
Side-effect testing. Running the rewrite output in a sandbox to verify no files are created is Tier 2: it needs a PATH/HOME-isolated harness or container and belongs in a separate issue.
Filter execution. We're testing the rewrite engine, not `tokf run` end-to-end.
Acceptance criteria
`cargo test -p tokf proptest_rewrite` runs 5 properties × 200 cases each in under 30s.
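Two of the invariants listed under "Invariants to assert", argv preservation and quote integrity, can be sketched without a parser. This is only an illustration under loud assumptions: the tokenizer splits on whitespace and ignores quoting, the function names are hypothetical, and the `tokf run …` output strings in the usage note are an assumed rewrite shape. The real checks would walk the `rable` AST instead.

```rust
/// Splits a command line into one argv per command.
/// Assumption: no quoting or escaping; separators are `&&`, `||`, `|`, `;`, newline.
fn argvs(line: &str) -> Vec<Vec<String>> {
    let normalized = line
        .replace("&&", "\n")
        .replace("||", "\n")
        .replace('|', "\n")
        .replace(';', "\n");
    let mut cmds = Vec::new();
    for segment in normalized.split('\n') {
        let argv: Vec<String> = segment.split_whitespace().map(String::from).collect();
        if !argv.is_empty() {
            cmds.push(argv);
        }
    }
    cmds
}

/// True if the command's first word has basename `tokf`.
fn is_tokf(argv: &[String]) -> bool {
    argv[0].rsplit('/').next() == Some("tokf")
}

/// Argv preservation: every non-tokf command in the output must have
/// an identical argv somewhere in the input.
fn argv_preserved(input: &str, output: &str) -> bool {
    let original = argvs(input);
    argvs(output)
        .iter()
        .filter(|argv| !is_tokf(argv))
        .all(|argv| original.contains(argv))
}

/// Extracts single- and double-quoted substrings, quotes included.
/// Assumption: no escaped quotes inside a quoted span.
fn quoted_spans(s: &str) -> Vec<&str> {
    let mut spans = Vec::new();
    let mut open: Option<(usize, char)> = None;
    for (i, c) in s.char_indices() {
        match open {
            // Outside any quote: a quote character opens a span.
            None if c == '\'' || c == '"' => open = Some((i, c)),
            // Inside a quote: only the matching quote character closes it.
            Some((start, q)) if c == q => {
                spans.push(&s[start..=i]);
                open = None;
            }
            _ => {}
        }
    }
    spans
}

/// Quote integrity: every quoted span of the input survives byte-for-byte.
fn quotes_preserved(input: &str, output: &str) -> bool {
    quoted_spans(input).iter().all(|q| output.contains(*q))
}
```

With the input `"git status | head -1\necho done"`, a hypothetical well-formed rewrite such as `"tokf run git status | head -1\necho done"` passes `argv_preserved`, while the glued form `"tokf run git status | head -1echo done"` fails it because `["head", "-1echo", "done"]` never occurs in the input. That failure has the same shape as #355.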