From 5b10e644ebaf8ca2d2f25550e193deb991193bc7 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 10:33:01 -0700
Subject: [PATCH 01/18] linq_fold: Phase 2A loop planner (where|select array +
 counter lanes)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replaces _fold's comprehension emitter with a planner that walks the chain
and emits a plain for-loop inside invoke($block, $src). Two terminator
lanes:

- array lane: [_where*][_select?] → loop + push_clone (identity) or
  emplace-of-bound-projection (workhorse choice made at macro time from
  the projection's _type.isWorkhorseType, no runtime static_if).
- counter lane: same intermediates + _count → counter loop with `n++`.

Chained _where|_where fuse into a single && predicate; chained
_select|_select fall through (needs ExprRef2Value-aware substitution,
deferred to Phase 2B). Anything outside the two lanes (_select|_where,
_sum, _min, _max, _first, _any, _all, _long_count, _order, _distinct,
_take, _skip, _zip, _reverse, ...) returns the raw chain unfolded —
no dispatch to _old_fold or fold_linq_default.

_old_fold and fold_linq_default are untouched; the comprehension contract
now lives solely on _old_fold (10 AST tests retargeted; 8 new AST tests +
6 behavioral tests cover the new loop emission).

Benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count):       5 → 5    parity
  chained_where (where|where|count):  17 → 8    2.1× faster
  select_count (select|count):        15 → 2    7.5× faster
  to_array_filter (where|select):     11 → 13   ~18% slower vs comprehension

Out-of-scope shapes regress to m3 (plain linq) — accepted as the
forcing function for Phase 2B (sum/min/max/first/any/all + chained
selects + take/skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
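
Reviewer note (not part of the commit message). The lane selection described
above can be modelled outside the macro system. The following is a minimal
Python sketch of the same decision logic, not the daslang implementation: the
`(name, fn)` tuple encoding of a chain and the returned `run` closure are
assumptions made for illustration, and `all(...)` over the predicate list
stands in for the fused `&&` condition.

```python
def plan_loop_or_count(chain):
    """Model of the Phase 2A planner.

    `chain` is a list of (name, fn) pairs, e.g. ("where", pred),
    ("select", proj), ("count", None). Returns a callable that runs the
    single fused loop, or None when the shape is out of scope (the caller
    would then leave the chain unfolded).
    """
    if not chain:
        return None
    last = chain[-1][0]
    if last not in ("count", "where", "select"):
        return None
    counter_lane = last == "count"
    intermediates = chain[:-1] if counter_lane else chain

    preds = []         # chained wheres, fused as a conjunction
    projection = None  # at most one select
    for name, fn in intermediates:
        if name == "where":
            if projection is not None:
                return None  # where-after-select is out of scope
            preds.append(fn)
        elif name == "select":
            if projection is not None:
                return None  # chained selects are deferred to Phase 2B
            projection = fn
        else:
            return None

    if not counter_lane and not preds and projection is None:
        return None  # identity chain: nothing to fuse

    def run(src):
        if counter_lane:
            acc = 0
            for it in src:
                # The projection is never evaluated in the counter lane.
                if all(p(it) for p in preds):
                    acc += 1
            return acc
        acc = []
        for it in src:
            if all(p(it) for p in preds):
                acc.append(projection(it) if projection is not None else it)
        return acc

    return run
```

In this model a `select → count` chain never calls the projection, which is
the property the select_count benchmark is meant to exercise.
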
 benchmarks/sql/LINQ.md            |  30 ++-
 benchmarks/sql/select_count.das   |  73 ++++++
 daslib/linq_fold.das              | 143 +++++++++-
 tests/linq/test_linq_fold_ast.das | 419 ++++++++++++++++++++++--------
 4 files changed, 554 insertions(+), 111 deletions(-)
 create mode 100644 benchmarks/sql/select_count.das

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 5a956110fe..afca309bce 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -22,8 +22,9 @@ See `~/.claude/plans/keen-hopping-balloon.md` for the long-form plan.
 |---|---|---|
 | 0 | Rename `_fold` → `_old_fold` in linq_boost; extract `_fold` and `_old_fold` into new `daslib/linq_fold.das` module; `linq_boost` `require linq_fold public` for re-export | ✅ done |
 | 1 | Benchmark suite: 24 files under `benchmarks/sql/`, each 4-way (m1 `_sql` / m3 plain linq / m3f_old `_old_fold` / m3f `_fold`) at 100K rows; baseline numbers captured | ✅ done |
-| 2 | Splice planner + initial operators (`count`, `sum`, `to_array`, `where` with literal-lambda inlining); pattern tests for "spliced" vs "fell back" | ⏳ next |
-| 3+ | Per-operator splice PRs: `select`, terminal aggregates with early-exit (`first`, `any`, `all`, `min`, `max`, `average`), `take`/`skip`/chained `where`, then buffer-required ops (`distinct`, `sort`, `groupby`, `zip`, `join`) | ⏳ |
+| 2A | Loop planner — `_fold` emits explicit for-loops for `[where_*][select?]` (array lane) and `[where_*][select?] \|> count` (counter lane); anything else falls through unfolded. No comprehensions, no dispatch back to `_old_fold`. | ✅ done |
+| 2B | Aggregate accumulators: `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`. Also `take`/`skip` in counter/array lane and chained-`_select\|_select` fusion (needs `ExprRef2Value`-aware projection substitution) | ⏳ next |
+| 3+ | Buffer-required operators: `distinct`, `sort`, `reverse`, `groupby`, `zip`, `join`. Once we go array, we stay array | ⏳ |
 | 4 | Final coverage pass + docs; full 4-way comparison table refresh; parity-test sweep | ⏳ |
 
 ## Baselines (100K rows, INTERP mode)
@@ -69,7 +70,30 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 
 - **m1 vs m3** shows the SQLite-vs-in-memory-LINQ cost gap. SQL wins on `indexed_lookup` (b-tree) and on sorted-take patterns (engine partial-sort + LIMIT). Arrays win on raw aggregates where the SQL overhead exceeds the in-memory work.
 - **m3 vs m3f_old** shows what the *current* `_fold` macro already achieves. Big wins on the patterns it explicitly recognizes (`where+count` 6×, `where+select+to_array` ~4×, `chained_where+count` 2.6×). Negligible difference where it falls through to the default emitter.
-- **m3f vs m3f_old** is the target of Phase 2+. Currently identical by construction. Each PR in the splice series adds a splice path for one operator family and updates this table with the new ratio.
+- **m3f vs m3f_old** is the target of Phase 2+. Each PR in the splice series adds a path for one operator family and updates this table with the new ratio.
+
+## Phase 2A — Loop planner (2026-05-16)
+
+`_fold` now emits explicit for-loops for two narrow shape families instead of comprehensions. Anything outside scope falls through unfolded to raw linq (no dispatch to `_old_fold` or `fold_linq_default`).
+
+**In scope:** `[where_*][select?]` (array lane) and `[where_*][select?] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`; single `_select` composes; chained `_select|_select` falls through (needs ExprRef2Value-aware substitution, deferred to Phase 2B).
+
+**Out of scope (falls through):** `_select|_where`, `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`, `_order`, `_distinct`, `_take`, `_skip`, `_zip`, `_reverse`, etc.
+
+### Phase 2A deltas (100K rows, INTERP)
+
+| Benchmark | Shape | m3f_old | m3f (Phase 2A) | Delta |
+|---|---|---:|---:|---|
+| count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) |
+| chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) |
+| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane ignores projection; no array materialization) |
+| to_array_filter | `where → select → to_array` | 11 | 13 | ~18% slower (explicit loop vs comprehension lowering) |
+
+Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
+
+### Why `to_array_filter` regressed
+
+Comprehensions `[for (it in src) where p; expr]` lower through the compiler's dedicated `ExprArrayComprehension` path, which appears to compose more aggressively with array growth than the macro-emitted explicit loop. That loop binds the projection to an intermediate `val` (copy-init for workhorse types, `<-` move-init otherwise, chosen at macro time rather than via a runtime `static_if`) and then calls `arr |> emplace(val)`. The ~18% gap (11 → 13) is small relative to the 2.1× and 7.5× wins elsewhere; Phase 2B can profile and tune (likely pre-reserving the result array or switching to `push` for workhorse).
 
 ## Operator-coverage checklist (parity tests)
 
diff --git a/benchmarks/sql/select_count.das b/benchmarks/sql/select_count.das
new file mode 100644
index 0000000000..63c0e5cc9a
--- /dev/null
+++ b/benchmarks/sql/select_count.das
@@ -0,0 +1,73 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+// _select |> count — projection followed by counter. The projection has no effect on count
+// semantics, but on the array path m3 materializes the projected array before counting.
+// Phase-2A `_fold` recognizes the counter lane and emits a bare-loop counter that ignores
+// the projection entirely (no allocation). `_old_fold` lacks a [select, count] pattern in
+// g_foldSeq so it falls to the default nested-pass form (pass_0 = select(...); count(pass_0))
+// — materializing the same way m3 does.
+
+def run_m1(b : B?; n : int) {
+    with_sqlite(":memory:") $(db) {
+        fixture_db(db, n)
+        b |> run("m1_sql/{n}", n) {
+            let c = _sql(db |> select_from(type<Car>) |> count())
+            if (c == 0) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+def run_m3(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3_array/{n}", n) {
+        let c = arr |> _select(_.price * 2) |> count()
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+def run_m3f_old(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_old_array_fold/{n}", n) {
+        let c = _old_fold(each(arr)._select(_.price * 2).count())
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+def run_m3f(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_array_fold/{n}", n) {
+        let c = _fold(each(arr)._select(_.price * 2).count())
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def select_count_m1(b : B?) {
+    run_m1(b, 100000)
+}
+
+[benchmark]
+def select_count_m3(b : B?) {
+    run_m3(b, 100000)
+}
+
+[benchmark]
+def select_count_m3f_old(b : B?) {
+    run_m3f_old(b, 100000)
+}
+
+[benchmark]
+def select_count_m3f(b : B?) {
+    run_m3f(b, 100000)
+}
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 8975aad43d..38264fcbb9 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -522,6 +522,140 @@ def private fold_linq_default(var expr : Expression?; recursiveMacroName : strin
     return res
 }
 
+[macro_function]
+def private plan_loop_or_count(var expr : Expression?) : Expression? {
+    // Phase-2A loop planner. Recognizes chains of shape `[where_*][select?]` (array lane)
+    // and `[where_*][select?] |> count` (counter lane). Fuses chained wheres into `&&` and
+    // composes at most one select; emits one inline `invoke($block, $src)` with a plain
+    // for-loop. Returns null for anything else (chained selects too); caller falls through.
+    var (top, calls) = flatten_linq(expr)
+    if (empty(calls)) return null
+    let lastName = calls.back()._1.name
+    if (lastName != "count" && lastName != "where_" && lastName != "select") return null
+    let counterLane = lastName == "count"
+    let intermediateCount = counterLane ? length(calls) - 1 : length(calls)
+    let at = calls[0]._0.at
+    let srcName = "`source`{at.line}`{at.column}"
+    let itName  = "`it`{at.line}`{at.column}"
+    let accName = "`acc`{at.line}`{at.column}"
+    let valName = "`val`{at.line}`{at.column}"
+    var whereCond : Expression?
+    var projection : Expression?
+    var seenSelect = false
+    var elementType = clone_type(top._type.firstType)
+    for (i in 0 .. intermediateCount) {
+        var cll & = unsafe(calls[i])
+        let opName = cll._1.name
+        if (opName == "where_") {
+            if (seenSelect) return null    // where-after-select not in Phase 2A
+            var predicate = fold_linq_cond(cll._0.arguments[1], itName)
+            if (whereCond == null) {
+                whereCond = predicate
+            } else {
+                whereCond = qmacro($e(whereCond) && $e(predicate))
+            }
+        } elif (opName == "select") {
+            if (projection != null) return null   // chained _select|_select needs ExprRef2Value-aware
+                                                  // substitution; deferred to Phase 2B.
+            projection = fold_linq_cond(cll._0.arguments[1], itName)
+            elementType = clone_type(cll._0._type.firstType)
+            seenSelect = true
+        } else {
+            return null
+        }
+    }
+    // Build the per-element loop body.
+    var loopBody : Expression?
+    if (counterLane) {
+        if (whereCond != null) {
+            loopBody = qmacro_expr() {
+                if ($e(whereCond)) {
+                    $i(accName) ++
+                }
+            }
+        } else {
+            loopBody = qmacro_expr() {
+                $i(accName) ++
+            }
+        }
+    } else {
+        // array lane
+        if (projection != null) {
+            // Pick copy- vs move-init at macro time using the projection's resolved type.
+            // Workhorse values copy cheaply; non-workhorse must move out of the temporary
+            // returned by the projection. Mirrors the `_old_fold` `fold_select_where`
+            // shape for parity (intermediate `val` binding then emplace).
+            let workhorseProj = projection._type != null && projection._type.isWorkhorseType
+            var perElem : Expression?
+            if (workhorseProj) {
+                perElem = qmacro_block() {
+                    var $i(valName) = $e(projection)
+                    $i(accName) |> emplace($i(valName))
+                }
+            } else {
+                perElem = qmacro_block() {
+                    var $i(valName) <- $e(projection)
+                    $i(accName) |> emplace($i(valName))
+                }
+            }
+            if (whereCond != null) {
+                loopBody = qmacro_expr() {
+                    if ($e(whereCond)) {
+                        $e(perElem)
+                    }
+                }
+            } else {
+                loopBody = perElem
+            }
+        } elif (whereCond != null) {
+            loopBody = qmacro_expr() {
+                if ($e(whereCond)) {
+                    $i(accName) |> push_clone($i(itName))
+                }
+            }
+        } else {
+            // identity chain — nothing to fuse; let the caller fall through.
+            return null
+        }
+    }
+    var topExpr = clone_expression(top)
+    topExpr.genFlags.alwaysSafe = true
+    var res : Expression?
+    if (counterLane) {
+        res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+            var $i(accName) = 0
+            for ($i(itName) in $i(srcName)) {
+                $e(loopBody)
+            }
+            return $i(accName)
+        }, $e(topExpr)))
+    } else {
+        let isIter = expr._type.isIterator
+        if (isIter) {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+                var $i(accName) : array<$t(elementType)>
+                for ($i(itName) in $i(srcName)) {
+                    $e(loopBody)
+                }
+                return <- $i(accName).to_sequence_move()
+            }, $e(topExpr)))
+        } else {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+                var $i(accName) : array<$t(elementType)>
+                for ($i(itName) in $i(srcName)) {
+                    $e(loopBody)
+                }
+                return <- $i(accName)
+            }, $e(topExpr)))
+        }
+    }
+    res.force_at(at)
+    res.force_generated(true)
+    let blk = (res as ExprInvoke).arguments[0] as ExprMakeBlock
+    (blk._block as ExprBlock).arguments[0].flags.can_shadow = true
+    return res
+}
+
 [call_macro(name="_fold")]
 class private LinqFold : AstCallMacro {
     //! implements _fold(expression) that folds LINQ expressions into optimized sequnences
@@ -534,12 +668,9 @@ class private LinqFold : AstCallMacro {
         //! Visits the _fold macro call and folds LINQ expressions into optimized sequences.
         macro_verify(call.arguments |> length == 1, prog, call.at, "expecting _fold(expression)")
         macro_verify(call.arguments[0]._type != null, prog, call.at, "expecting linq expression")
-        var res : Expression? = fold_linq_default(call.arguments[0], "_fold")
-        if (res == null) {
-            prog |> macro_error(call.at, "cannot fold LINQ expression\n{describe(call.arguments[0])}")
-            return res
-        }
-        return res
+        var res : Expression? = plan_loop_or_count(call.arguments[0])
+        if (res != null) return res
+        return clone_expression(call.arguments[0])
     }
 }
 
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index 638eb7e226..d1a7cd6d17 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -13,6 +13,7 @@ require dastest/testing_boost public
 
 // ── Target functions (fold happens at macro time) ──────────────────────
 
+// `_fold` targets — used by behavioral *_fold_result tests + new loop-AST tests.
 [export, marker(no_coverage)]
 def target_where_fold() : array<int> {
     return <- [1, 2, 3, 4, 5]._where(_ > 3)._fold()
@@ -53,16 +54,59 @@ def target_zip3_predicate_fold() : array<int> {
     return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1), [100, 200, 300]._select(_ / 10), $(a, b, c : int) => a + b + c)._fold()
 }
 
-// ── Tests: fold_where — comprehension with where ───────────────────────
+// `_old_fold` targets — used by retargeted AST tests that document the frozen comprehension contract.
+[export, marker(no_coverage)]
+def target_where_old_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5]._where(_ > 3)._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_select_old_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5]._select(_ * 2)._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_where_select_old_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5]._where(_ > 3)._select(_ * 2)._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_select_where_old_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5]._select(_ * 2)._where(_ > 6)._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_reverse_where_old_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5].to_sequence().reverse()._where(_ > 3).to_array()._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_zip_old_fold() : array<tuple<int; int>> {
+    return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1))._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_zip3_old_fold() : array<tuple<int; int; int>> {
+    return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1), [100, 200, 300]._select(_ / 10))._old_fold()
+}
+
+[export, marker(no_coverage)]
+def target_zip3_predicate_old_fold() : array<int> {
+    return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1), [100, 200, 300]._select(_ / 10), $(a, b, c : int) => a + b + c)._old_fold()
+}
+
+// ── Tests: _old_fold contract — comprehension emission (frozen baseline) ──
+// These tests retain the pre-rewrite AST shape that `_fold` used to emit.
+// `_fold` itself has diverged (Phase 2A loop planner); see test_*_fold_emits_loop
+// below for the current `_fold` shape contract. The pair documents the
+// comprehension-vs-loop split between the two macros.
 
 [test]
-def test_where_fold_produces_comprehension(t : T?) {
+def test_where_old_fold_produces_comprehension(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_where_fold)
-        t |> success(func != null, "should find target_where_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_where_old_fold)
+        t |> success(func != null, "should find target_where_old_fold")
+        if (func == null) return
         // fold_where output: invoke($(var source) .. var pass_0 <- COMP; return <- pass_0 .., src)
         var comp_expr : ExpressionPtr
         var source_expr : ExpressionPtr
@@ -73,27 +117,21 @@ def test_where_fold_produces_comprehension(t : T?) {
             }, $e(source_expr))
         }
         t |> success(r.matched, "should match fold invoke structure, error={int(r.error)}")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         // Verify the captured expression is a comprehension with where
         var resolved <- qm_resolve_comprehension(comp_expr)
         t |> success(resolved != null, "inner expression should be a comprehension")
-        if (resolved == null) {
-            return
-        }
+        if (resolved == null) return
         let ac = resolved as ExprArrayComprehension
         t |> success(ac.exprWhere != null, "comprehension should have where clause")
     }
 }
 
 [test]
-def test_where_fold_comprehension_pattern(t : T?) {
+def test_where_old_fold_comprehension_pattern(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_where_fold)
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_where_old_fold)
+        if (func == null) return
         // Match the full structure including comprehension pattern
         var where_cond : ExpressionPtr
         var source_expr : ExpressionPtr
@@ -107,16 +145,12 @@ def test_where_fold_comprehension_pattern(t : T?) {
     }
 }
 
-// ── Tests: fold_select — comprehension without where ───────────────────
-
 [test]
-def test_select_fold_produces_comprehension(t : T?) {
+def test_select_old_fold_produces_comprehension(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_select_fold)
-        t |> success(func != null, "should find target_select_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_old_fold)
+        t |> success(func != null, "should find target_select_old_fold")
+        if (func == null) return
         var comp_expr : ExpressionPtr
         var source_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
@@ -126,26 +160,20 @@ def test_select_fold_produces_comprehension(t : T?) {
             }, $e(source_expr))
         }
         t |> success(r.matched, "should match fold structure, error={int(r.error)}")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         var resolved <- qm_resolve_comprehension(comp_expr)
         t |> success(resolved != null, "inner should be a comprehension")
-        if (resolved == null) {
-            return
-        }
+        if (resolved == null) return
         let ac = resolved as ExprArrayComprehension
         t |> success(ac.exprWhere == null, "select-only comprehension should have no where clause")
     }
 }
 
 [test]
-def test_select_fold_comprehension_pattern(t : T?) {
+def test_select_old_fold_comprehension_pattern(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_select_fold)
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_old_fold)
+        if (func == null) return
         var select_expr : ExpressionPtr
         var source_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
@@ -155,25 +183,19 @@ def test_select_fold_comprehension_pattern(t : T?) {
             }, $e(source_expr))
         }
         t |> success(r.matched, "should match comprehension without where, error={int(r.error)}")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         // Verify the select expression is a multiplication: it * 2
         let r2 = qmatch(select_expr, it * 2)
         t |> success(r2.matched, "select expression should be it * 2")
     }
 }
 
-// ── Tests: fold_where_select — comprehension with both ─────────────────
-
 [test]
-def test_where_select_fold_comprehension(t : T?) {
+def test_where_select_old_fold_comprehension(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_where_select_fold)
-        t |> success(func != null, "should find target_where_select_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_where_select_old_fold)
+        t |> success(func != null, "should find target_where_select_old_fold")
+        if (func == null) return
         var select_expr : ExpressionPtr
         var where_cond : ExpressionPtr
         var source_expr : ExpressionPtr
@@ -184,9 +206,7 @@ def test_where_select_fold_comprehension(t : T?) {
             }, $e(source_expr))
         }
         t |> success(r.matched, "should match comprehension with where+select, error={int(r.error)}")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         // Verify select is multiplication and where is comparison
         let r_sel = qmatch(select_expr, it * 2)
         t |> success(r_sel.matched, "select should be it * 2")
@@ -195,16 +215,12 @@ def test_where_select_fold_comprehension(t : T?) {
     }
 }
 
-// ── Tests: fold_select_where — not a simple comprehension ──────────────
-
 [test]
-def test_select_where_fold_structure(t : T?) {
+def test_select_where_old_fold_structure(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_select_where_fold)
-        t |> success(func != null, "should find target_select_where_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_where_old_fold)
+        t |> success(func != null, "should find target_select_where_old_fold")
+        if (func == null) return
         // select_where fold produces an invoke with a lambda that has a for loop + if
         // It is NOT a simple comprehension - verify the fold still happened
         var inner_expr : ExpressionPtr
@@ -216,57 +232,43 @@ def test_select_where_fold_structure(t : T?) {
             }, $e(source_expr))
         }
         t |> success(r.matched, "should match fold invoke structure, error={int(r.error)}")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         // The inner expression should NOT be a comprehension (select_where uses a different strategy)
         var resolved <- qm_resolve_comprehension(inner_expr)
         t |> success(resolved == null, "select_where should not produce a simple comprehension")
     }
 }
 
-// ── Tests: multi-step fold (reverse + where) ───────────────────────────
-
 [test]
-def test_reverse_where_fold_structure(t : T?) {
+def test_reverse_where_old_fold_structure(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_reverse_where_fold)
-        t |> success(func != null, "should find target_reverse_where_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_reverse_where_old_fold)
+        t |> success(func != null, "should find target_reverse_where_old_fold")
+        if (func == null) return
         // Multi-step fold: reverse_to_array + where comprehension
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
             return <- $e(body_expr)
         }
         t |> success(r.matched, "should have a return expression")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         t |> success(body_expr is ExprInvoke, "fold should produce invoke wrapper")
     }
 }
 
-// ── Tests: zip fold with recursive subexpression folding ───────────────
-
 [test]
-def test_zip_fold_structure(t : T?) {
+def test_zip_old_fold_structure(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_zip_fold)
-        t |> success(func != null, "should find target_zip_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_zip_old_fold)
+        t |> success(func != null, "should find target_zip_old_fold")
+        if (func == null) return
         // zip fold recursively folds the second argument
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
             return <- $e(body_expr)
         }
         t |> success(r.matched, "should match return expression")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         t |> success(body_expr is ExprInvoke, "fold should produce invoke wrapper")
     }
 }
@@ -332,42 +334,34 @@ def test_zip_fold_result(t : T?) {
 // ── Tests: zip3 fold — all 3 subexpressions fold ──────────────────────
 
 [test]
-def test_zip3_fold_structure(t : T?) {
+def test_zip3_old_fold_structure(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_fold)
-        t |> success(func != null, "should find target_zip3_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_old_fold)
+        t |> success(func != null, "should find target_zip3_old_fold")
+        if (func == null) return
         // zip3 fold: all three sources should be folded into invoke wrappers
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
             return <- $e(body_expr)
         }
         t |> success(r.matched, "should match return expression")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         t |> success(body_expr is ExprInvoke, "zip3 fold should produce invoke wrapper")
     }
 }
 
 [test]
-def test_zip3_predicate_fold_structure(t : T?) {
+def test_zip3_predicate_old_fold_structure(t : T?) {
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_predicate_fold)
-        t |> success(func != null, "should find target_zip3_predicate_fold")
-        if (func == null) {
-            return
-        }
+        var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_predicate_old_fold)
+        t |> success(func != null, "should find target_zip3_predicate_old_fold")
+        if (func == null) return
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
             return <- $e(body_expr)
         }
         t |> success(r.matched, "should match return expression")
-        if (!r.matched) {
-            return
-        }
+        if (!r.matched) return
         t |> success(body_expr is ExprInvoke, "zip3 predicate fold should produce invoke wrapper")
     }
 }
@@ -403,3 +397,224 @@ def test_zip3_predicate_fold_result(t : T?) {
         t |> equal(result[2], 67)
     }
 }
+
+// ── Targets for `_fold` Phase-2A loop planner ──────────────────────────
+
+[export, marker(no_coverage)]
+def target_chained_where_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5]._where(_ > 1)._where(_ < 5)._fold()
+}
+
+[export, marker(no_coverage)]
+def target_chained_select_fold() : array<int> {
+    return <- [1, 2, 3, 4, 5]._select(_ * 2)._select(_ + 1)._fold()
+}
+
+[export, marker(no_coverage)]
+def target_where_count_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ > 2).count())
+}
+
+[export, marker(no_coverage)]
+def target_chained_where_count_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ > 1)._where(_ < 5).count())
+}
+
+[export, marker(no_coverage)]
+def target_count_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5]).count())
+}
+
+[export, marker(no_coverage)]
+def target_select_count_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._select(_ * 2).count())
+}
+
+// ── Tests: `_fold` Phase-2A loop emission ──────────────────────────────
+// Phase-2A `_fold` emits explicit for-loops inside an `invoke($block, $src)` wrapper
+// (no `ExprArrayComprehension` nodes). Each test asserts the invoke wrapper exists
+// and the inner body is NOT a comprehension. Out-of-scope shapes fall through
+// unfolded — body is the raw chain, not an invoke.
+
+[test]
+def test_where_fold_emits_loop(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_where_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper")
+        if (!(body_expr is ExprInvoke)) return
+        let inv = body_expr as ExprInvoke
+        var arg0 = clone_expression(inv.arguments[0])
+        var maybe_comp <- qm_resolve_comprehension(arg0)
+        t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
+    }
+}
+
+[test]
+def test_select_fold_emits_loop(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper")
+        if (!(body_expr is ExprInvoke)) return
+        let inv = body_expr as ExprInvoke
+        var arg0 = clone_expression(inv.arguments[0])
+        var maybe_comp <- qm_resolve_comprehension(arg0)
+        t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
+    }
+}
+
+[test]
+def test_chained_where_fold_emits_loop(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_chained_where_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        // Phase 2A fuses chained _where|_where into a single loop with && predicate
+        t |> success(body_expr is ExprInvoke, "chained where should fuse into single invoke loop")
+    }
+}
+
+[test]
+def test_chained_select_fold_falls_through(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_chained_select_fold)
+        if (func == null) return
+        // Chained _select|_select needs ExprRef2Value-aware projection substitution; the
+        // Phase-2A planner bails out and `_fold` returns the raw chain unfolded. Phase 2B
+        // will lift this restriction by adding a substitution-aware composition pass.
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(!(body_expr is ExprInvoke), "chained _select|_select should fall through (no invoke wrapper)")
+    }
+}
+
+[test]
+def test_where_count_fold_emits_counter(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_where_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "_where|_count should fuse into counter invoke")
+    }
+}
+
+[test]
+def test_chained_where_count_fold_emits_counter(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_chained_where_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "chained where + count should fuse into single counter invoke")
+    }
+}
+
+[test]
+def test_count_fold_emits_counter(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "bare count should fuse into unconditional counter invoke")
+    }
+}
+
+[test]
+def test_select_where_fold_falls_through(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_where_fold)
+        if (func == null) return
+        // _select |> _where is out of Phase 2A scope (where-after-select) — chain falls
+        // through unfolded. The function body is the raw `where_(select(...), ...)` call,
+        // NOT a generated invoke wrapper.
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(!(body_expr is ExprInvoke), "select_where should fall through unfolded (no invoke wrapper)")
+    }
+}
+
+// ── Behavioral parity: results of new shapes ───────────────────────────
+
+[test]
+def test_chained_where_fold_result(t : T?) {
+    t |> run("chained where _fold produces correct values") @(t : T?) {
+        let result <- target_chained_where_fold()
+        t |> equal(length(result), 3)
+        t |> equal(result[0], 2)
+        t |> equal(result[1], 3)
+        t |> equal(result[2], 4)
+    }
+}
+
+[test]
+def test_chained_select_fold_result(t : T?) {
+    t |> run("chained select _fold produces correct values") @(t : T?) {
+        let result <- target_chained_select_fold()
+        t |> equal(length(result), 5)
+        // [1,2,3,4,5] * 2 = [2,4,6,8,10] + 1 = [3,5,7,9,11]
+        let expected = [3, 5, 7, 9, 11]
+        for (i, v in 0..5, result) {
+            t |> equal(expected[i], v)
+        }
+    }
+}
+
+[test]
+def test_where_count_fold_result(t : T?) {
+    t |> run("where _count _fold produces correct count") @(t : T?) {
+        t |> equal(target_where_count_fold(), 3)
+    }
+}
+
+[test]
+def test_chained_where_count_fold_result(t : T?) {
+    t |> run("chained where _count _fold produces correct count") @(t : T?) {
+        t |> equal(target_chained_where_count_fold(), 3)
+    }
+}
+
+[test]
+def test_count_fold_result(t : T?) {
+    t |> run("bare _count _fold produces source length") @(t : T?) {
+        t |> equal(target_count_fold(), 5)
+    }
+}
+
+[test]
+def test_select_count_fold_result(t : T?) {
+    t |> run("select _count _fold produces correct count (projection ignored by counter)") @(t : T?) {
+        t |> equal(target_select_count_fold(), 5)
+    }
+}

From 41d8ce129b0290b319e392e59bf6a6ba9f27df63 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 10:39:19 -0700
Subject: [PATCH 02/18] =?UTF-8?q?linq=5Ffold:=20peel=20each(<array>)=20+?=
 =?UTF-8?q?=20reserve=20+=20workhorse=20push=20=E2=80=94=20to=5Farray=5Ffi?=
 =?UTF-8?q?lter=20parity?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The first Phase-2A cut was ~18% slower than the _old_fold comprehension on
where|select|to_array. Four small fixes brought it to 11 ns/op parity:

1. Workhorse decision at macro time, not runtime. The projection's _type is
   resolved when the planner runs, so the macro reads
   projection._type.isWorkhorseType directly and emits exactly one branch
   instead of a runtime static_if.

2. Pre-reserve when the source has a known length. The planner emits
   acc |> reserve(length(src)) when top._type isn't an iterator — matches
   what ExprArrayComprehension lowering does internally.

3. Peel each(<array>) at macro time. each(arr) reports as iterator<T> so
   (2) wouldn't fire on benchmark sources like each(arr)._where(...). The
   planner now detects each(<expr>) where the inner has length and unwraps
   it — the emitted loop iterates the array directly.

4. Drop the intermediate var binding for workhorse projections. Workhorse
   values copy cheaply, so the planner emits acc |> push(projection)
   directly. Non-workhorse keeps the bind-then-emplace dance because <- is
   a statement, not an expression.
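
For orientation, a hand-written analogue of the loop shape the four fixes
aim for is sketched below. This is not the macro's literal output: the
function name, the predicate `it > 2`, the projection `it * 2`, and the
local names are all illustrative assumptions (the planner generates unique
names from the call site).

```das
options gen2

// Hypothetical hand-written equivalent of a where|select array-lane loop.
def where_select_by_hand(src : array<int>) : array<int> {
    var acc : array<int>
    acc |> reserve(length(src))     // fix 2: the source has a known length
    for (it in src) {               // fix 3: iterate the array, not each(src)
        if (it > 2) {
            acc |> push(it * 2)     // fixes 1 and 4: int is workhorse, push directly
        }
    }
    return <- acc
}

[export]
def main() {
    let src <- [1, 2, 3, 4, 5]
    let res <- where_select_by_hand(src)
    print("{length(res)}\n")
}
```

The reserve may over-allocate when the predicate rejects most elements;
the commit accepts that in exchange for avoiding reallocation on growth.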

Phase 2A benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count):       5 → 5    parity
  chained_where (where|where|count):  17 → 8    2.1× faster
  select_count (select|count):        15 → 2    7.5× faster
  to_array_filter (where|select):     11 → 11   parity (was 13 pre-fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/sql/LINQ.md | 12 +++++++++---
 daslib/linq_fold.das   | 38 +++++++++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index afca309bce..599b3641db 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -87,13 +87,19 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 | count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) |
 | chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) |
 | select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane ignores projection; no array materialization) |
-| to_array_filter | `where → select → to_array` | 11 | 13 | ~18% slower (explicit loop vs comprehension lowering) |
+| to_array_filter | `where → select → to_array` | 11 | 11 | parity (after `each(<array>)` peel + reserve + workhorse `push`) |
 
 Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
 
-### Why `to_array_filter` regressed
+### Three small things that closed the to_array_filter gap
 
-Comprehensions `[for (it in src) where p; expr]` lower through the compiler's dedicated `ExprArrayComprehension` path, which appears to compose more aggressively with array growth than an emitted-by-macro explicit loop with `static_if (is_workhorse) var val = expr; arr.emplace(val)`. The 18% gap is small relative to the 2-7× wins elsewhere; Phase 2B can profile and tune (likely pre-reserving the result array or switching to `push` for workhorse).
+The first cut was 18% slower than the comprehension. Three independent fixes brought it to parity:
+
+1. **Workhorse decision at macro time, not runtime.** The first emission used `static_if (typeinfo is_workhorse(projection))` inside the qmacro so the compiler picked copy- vs move-init. The projection's `_type` is already resolved when the planner runs, so the macro now reads `projection._type.isWorkhorseType` directly and emits exactly one branch — less AST, no static_if to fold away.
+2. **Pre-reserve when the source has a known length.** ExprArrayComprehension lowering reserves the result array to the source's length to avoid growth reallocs; the explicit loop has to do the same explicitly. The planner emits `acc |> reserve(length(src))` when the source isn't an iterator.
+3. **Peel `each(<array>)` at macro time.** The benchmark source `each(arr)` reports as `iterator<T>`, so the reserve from (2) wouldn't fire. The planner now detects `each(<expr>)` where the inner expression has length and unwraps it — the emitted loop iterates the array directly. `for (it in arr)` and `for (it in each(arr))` yield the same element refs; the wrapper iterator is incidental in fold context.
+
+A fourth simplification dropped the intermediate `var val = projection; emplace(val)` for workhorse types — comprehension lowering pushes the projection expression directly, so the planner now emits `acc |> push(projection)` in that case (no temp binding). Non-workhorse projections still need the bind-then-emplace dance because `<-` is a statement, not an expression.
 
 ## Operator-coverage checklist (parity tests)
 
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 38264fcbb9..2ac058e140 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -530,6 +530,18 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // with a plain for-loop. Returns null for anything else — caller falls through unfolded.
     var (top, calls) = flatten_linq(expr)
     if (empty(calls)) return null
+    // Peel `each(<array>)` so the emitted loop iterates the array directly and the
+    // array-lane reserve below has a length to use. Iteration semantics are unchanged —
+    // `for (it in each(arr))` and `for (it in arr)` yield the same element refs.
+    if (top is ExprCall) {
+        var topCall = top as ExprCall
+        if (topCall.func != null && topCall.func.name == "each"
+                && topCall.arguments |> length == 1
+                && topCall.arguments[0]._type != null
+                && !topCall.arguments[0]._type.isIterator) {
+            top = topCall.arguments[0]
+        }
+    }
     let lastName = calls.back()._1.name
     if (lastName != "count" && lastName != "where_" && lastName != "select") return null
     let counterLane = lastName == "count"
@@ -581,16 +593,15 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     } else {
         // array lane
         if (projection != null) {
-            // Pick copy- vs move-init at macro time using the projection's resolved type.
-            // Workhorse values copy cheaply; non-workhorse must move out of the temporary
-            // returned by the projection. Mirrors the `_old_fold` `fold_select_where`
-            // shape for parity (intermediate `val` binding then emplace).
+            // Workhorse projections copy cheaply — push the expression directly with no
+            // intermediate binding (matches ExprArrayComprehension lowering). Non-workhorse
+            // values must move out of the temporary returned by the projection, which `<-`
+            // can only do via an intermediate `var v` and then `emplace(v)`.
             let workhorseProj = projection._type != null && projection._type.isWorkhorseType
             var perElem : Expression?
             if (workhorseProj) {
-                perElem = qmacro_block() {
-                    var $i(valName) = $e(projection)
-                    $i(accName) |> emplace($i(valName))
+                perElem = qmacro_expr() {
+                    $i(accName) |> push($e(projection))
                 }
             } else {
                 perElem = qmacro_block() {
@@ -631,6 +642,10 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         }, $e(topExpr)))
     } else {
         let isIter = expr._type.isIterator
+        // Pre-reserve the accumulator to the source's length when the source has a known
+        // length (array, table, range — anything that isn't an iterator). Avoids realloc
+        // walks during growth; matches what ExprArrayComprehension lowering does.
+        let sourceHasLength = top._type != null && !top._type.isIterator
         if (isIter) {
             res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
                 var $i(accName) : array<$t(elementType)>
@@ -639,6 +654,15 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 }
                 return <- $i(accName).to_sequence_move()
             }, $e(topExpr)))
+        } elif (sourceHasLength) {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+                var $i(accName) : array<$t(elementType)>
+                $i(accName) |> reserve(length($i(srcName)))
+                for ($i(itName) in $i(srcName)) {
+                    $e(loopBody)
+                }
+                return <- $i(accName)
+            }, $e(topExpr)))
         } else {
             res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
                 var $i(accName) : array<$t(elementType)>

From d4586a103298978c7aa5ac9474eb62c39298aedd Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 10:49:35 -0700
Subject: [PATCH 03/18] linq_fold: fuse chained workhorse selects + drop
 emplace from emission
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two follow-up improvements on top of the Phase-2A loop planner:

1. Chained _select|_select|... now fuses (for workhorse projections).
   The planner emits intermediate `var v_N = projection_N` let-bindings
   inside the loop body; each next lambda's `_` is renamed straight to
   the prior binding's name via fold_linq_cond. Nothing is substituted
   into a typed expression, so the ExprRef2Value-wrapper trap never
   arises. Non-workhorse chained selects still fall through (they need
   `:=` clone semantics, deferred to Phase 2B).

2. Drop emplace from emission. emplace moves out of its argument and
   can corrupt the source when the projection returns a ref into it
   (e.g. `_._field`). The planner now emits `push` for workhorse and
   `push_clone` for non-workhorse — no intermediate `var v <- proj;
   emplace(v)` dance, which both simplifies the AST and is safer.
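
As a rough illustration of item 1, the fused body for a two-step chain
such as `_select(_ * 2)._select(_ + 1)` could look like the hand-written
loop below. The names `v0`, `acc`, and `chained_select_by_hand` are
assumptions made for readability; the planner emits `var` bindings with
generated names, and `let` is used here only because the value is never
reassigned.

```das
options gen2

// Hypothetical hand-written equivalent of a fused select|select loop.
def chained_select_by_hand(src : array<int>) : array<int> {
    var acc : array<int>
    acc |> reserve(length(src))
    for (it in src) {
        let v0 = it * 2         // first projection bound to a local
        acc |> push(v0 + 1)     // second projection reads v0 where the lambda had `_`
    }
    return <- acc
}

[export]
def main() {
    let src <- [1, 2, 3, 4, 5]
    let res <- chained_select_by_hand(src)
    print("{length(res)}\n")
}
```

Each further select in the chain would add one more binding ahead of the
final push, so the loop stays a single pass over the source.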

The chained-select AST test (previously asserting fall-through) now
asserts invoke emission. All 118 fold + ast tests pass; benchmark
deltas held vs the previous commit:
  count_aggregate:    5  parity
  chained_where:      8  2.1× faster
  select_count:       2  7.5× faster
  to_array_filter:   11  parity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/sql/LINQ.md            |  4 +-
 daslib/linq_fold.das              | 72 ++++++++++++++++++++++++-------
 tests/linq/test_linq_fold_ast.das | 10 ++---
 3 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 599b3641db..4dcee49b34 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -76,7 +76,7 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 
 `_fold` now emits explicit for-loops for two narrow shape families instead of comprehensions. Anything outside scope falls through unfolded to raw linq (no dispatch to `_old_fold` or `fold_linq_default`).
 
-**In scope:** `[where_*][select?]` (array lane) and `[where_*][select?] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`; single `_select` composes; chained `_select|_select` falls through (needs ExprRef2Value-aware substitution, deferred to Phase 2B).
+**In scope:** `[where_*][select*]` (array lane) and `[where_*][select*] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`. Chained `_select|_select|...` fuses via intermediate `var v_N = projection_N` let-bindings — each next lambda's `_` is renamed straight to the prior binding's name, no expression substitution needed (which would have hit the ExprRef2Value-wrapper problem documented in `skills/das_macros.md`). Chained selects currently require all projections to be workhorse; non-workhorse intermediates would need `:=` (clone) since `<-` (move) can corrupt source for lvalue projections — deferred to Phase 2B.
 
 **Out of scope (falls through):** `_select|_where`, `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`, `_order`, `_distinct`, `_take`, `_skip`, `_zip`, `_reverse`, etc.
 
@@ -99,7 +99,7 @@ The first cut was 18% slower than the comprehension. Three independent fixes bro
 2. **Pre-reserve when the source has a known length.** ExprArrayComprehension lowering reserves the result array to the source's length to avoid growth reallocs; the explicit loop has to do the same explicitly. The planner emits `acc |> reserve(length(src))` when the source isn't an iterator.
 3. **Peel `each(<array>)` at macro time.** The benchmark source `each(arr)` reports as `iterator<T>`, so the reserve from (2) wouldn't fire. The planner now detects `each(<expr>)` where the inner expression has length and unwraps it — the emitted loop iterates the array directly. `for (it in arr)` and `for (it in each(arr))` yield the same element refs; the wrapper iterator is incidental in fold context.
 
-A fourth simplification dropped the intermediate `var val = projection; emplace(val)` for workhorse types — comprehension lowering pushes the projection expression directly, so the planner now emits `acc |> push(projection)` in that case (no temp binding). Non-workhorse projections still need the bind-then-emplace dance because `<-` is a statement, not an expression.
+A fourth simplification dropped `emplace` from the emission entirely. emplace **moves** out of its argument and can corrupt the source when the projection returns a ref into it (e.g. `_._field`). The safe pattern is `push` for workhorse (cheap copy) and `push_clone` for non-workhorse (deep clone). No intermediate `var v = projection; emplace(v)` is needed in either case — the planner pushes the projection expression directly.
 
 ## Operator-coverage checklist (parity tests)
 
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 2ac058e140..246a76a824 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -550,11 +550,12 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     let srcName = "`source`{at.line}`{at.column}"
     let itName  = "`it`{at.line}`{at.column}"
     let accName = "`acc`{at.line}`{at.column}"
-    let valName = "`val`{at.line}`{at.column}"
     var whereCond : Expression?
     var projection : Expression?
+    var intermediateBinds : array<Expression?>
     var seenSelect = false
     var elementType = clone_type(top._type.firstType)
+    var lastBindName = itName
     for (i in 0 .. intermediateCount) {
         var cll & = unsafe(calls[i])
         let opName = cll._1.name
@@ -567,9 +568,23 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 whereCond = qmacro($e(whereCond) && $e(predicate))
             }
         } elif (opName == "select") {
-            if (projection != null) return null   // chained _select|_select needs ExprRef2Value-aware
-                                                  // substitution; deferred to Phase 2B.
-            projection = fold_linq_cond(cll._0.arguments[1], itName)
+            // Chained selects: bind the previous projection to a fresh local now so the next
+            // lambda's `_` can be renamed straight to that name — avoids the
+            // ExprRef2Value-substitution trap that plain `Template.replaceVariable` hits when
+            // splicing a typed expression into another typed expression. Phase 2A only
+            // chains workhorse projections; a non-workhorse intermediate binding would need
+            // a clone (`:=`) since `<-` (move) can corrupt source for lvalue projections
+            // like `_._field`. Deferred to Phase 2B.
+            if (projection != null) {
+                let prevWorkhorse = projection._type != null && projection._type.isWorkhorseType
+                if (!prevWorkhorse) return null   // chained non-workhorse selects — Phase 2B
+                let bindName = "`v`{at.line}`{at.column}`{length(intermediateBinds)}"
+                intermediateBinds |> push <| qmacro_expr() {
+                    var $i(bindName) = $e(projection)
+                }
+                lastBindName = bindName
+            }
+            projection = fold_linq_cond(cll._0.arguments[1], lastBindName)
             elementType = clone_type(cll._0._type.firstType)
             seenSelect = true
         } else {
@@ -593,20 +608,35 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     } else {
         // array lane
         if (projection != null) {
-            // Workhorse projections copy cheaply — push the expression directly with no
-            // intermediate binding (matches ExprArrayComprehension lowering). Non-workhorse
-            // values must move out of the temporary returned by the projection, which `<-`
-            // can only do via an intermediate `var v` and then `emplace(v)`.
+            // push for workhorse (cheap copy), push_clone for non-workhorse (deep clone,
+            // never mutates source). emplace would move out of the projection's value,
+            // which is unsafe when the projection returns a ref into the source.
+            // For chained selects, `intermediateBinds` carries N-1 prior bindings; splice
+            // them in before the push so each lambda body can resolve its renamed parameter
+            // to the correct binding name.
             let workhorseProj = projection._type != null && projection._type.isWorkhorseType
-            var perElem : Expression?
+            var pushStmt : Expression?
             if (workhorseProj) {
-                perElem = qmacro_expr() {
+                pushStmt = qmacro_expr() {
                     $i(accName) |> push($e(projection))
                 }
             } else {
+                pushStmt = qmacro_expr() {
+                    $i(accName) |> push_clone($e(projection))
+                }
+            }
+            var perElem : Expression?
+            if (empty(intermediateBinds)) {
+                perElem = pushStmt
+            } else {
+                var perElemStmts : array<Expression?>
+                perElemStmts |> reserve(length(intermediateBinds) + 1)
+                for (b in intermediateBinds) {
+                    perElemStmts |> push(b)
+                }
+                perElemStmts |> push(pushStmt)
                 perElem = qmacro_block() {
-                    var $i(valName) <- $e(projection)
-                    $i(accName) |> emplace($i(valName))
+                    $b(perElemStmts)
                 }
             }
             if (whereCond != null) {
@@ -619,9 +649,21 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 loopBody = perElem
             }
         } elif (whereCond != null) {
-            loopBody = qmacro_expr() {
-                if ($e(whereCond)) {
-                    $i(accName) |> push_clone($i(itName))
+            // Identity case (no projection): `it` aliases the source element. Workhorse
+            // types can `push` (cheap copy); non-workhorse needs `push_clone` to avoid
+            // mutating the source via a move.
+            let elemWorkhorse = elementType != null && elementType.isWorkhorseType
+            if (elemWorkhorse) {
+                loopBody = qmacro_expr() {
+                    if ($e(whereCond)) {
+                        $i(accName) |> push($i(itName))
+                    }
+                }
+            } else {
+                loopBody = qmacro_expr() {
+                    if ($e(whereCond)) {
+                        $i(accName) |> push_clone($i(itName))
+                    }
                 }
             }
         } else {
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index d1a7cd6d17..cb067a3175 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -490,19 +490,19 @@ def test_chained_where_fold_emits_loop(t : T?) {
 }
 
 [test]
-def test_chained_select_fold_falls_through(t : T?) {
+def test_chained_select_fold_emits_loop(t : T?) {
     ast_gc_guard() {
         var func = find_module_function_via_rtti(compiling_module(), @@target_chained_select_fold)
         if (func == null) return
-        // Chained _select|_select needs ExprRef2Value-aware projection substitution; the
-        // Phase-2A planner bails out and `_fold` returns the raw chain unfolded. Phase 2B
-        // will lift this restriction by adding a substitution-aware composition pass.
+        // Chained _select|_select fuses via intermediate `var v_N = projection_N` bindings
+        // — the next lambda's `_` is renamed straight to the prior binding's name so no
+        // expression-substitution (and no ExprRef2Value-wrapping headaches) is needed.
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
             return <- $e(body_expr)
         }
         t |> success(r.matched, "should have return expression")
-        t |> success(!(body_expr is ExprInvoke), "chained _select|_select should fall through (no invoke wrapper)")
+        t |> success(body_expr is ExprInvoke, "chained _select|_select should fuse into single invoke loop")
     }
 }
 

From 6226a1e47a822bfd56a4bf50eb5254fb43529e59 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 11:34:46 -0700
Subject: [PATCH 04/18] linq_fold: counter lane evaluates projection per
 iteration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #2689 review fixes (Copilot):

1. Counter lane drop-projection bug. `_fold(src._select(f).count())` was
   skipping the projection entirely, which diverges from raw LINQ
   `count(select(src, f))` when `f` has side effects. Counter lane now
   binds the final projection to a discardable local per matched element
   so user-visible side effects fire. The optimizer dead-code-eliminates
   the binding for pure projections (the common case — `_.x * 2`,
   `_.price` etc.), so the 7.5× select_count speedup is preserved.

2. Vacuous comprehension assertion in two AST tests. Pass `body_expr`
   (the full ExprInvoke wrapper) to `qm_resolve_comprehension` instead
   of `inv.arguments[0]` (the inner ExprMakeBlock, which can never match
   either branch of the resolver). The fixed form actually verifies the
   loop output is not the `fromComprehension=true` shape.
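
The invariant behind item 1 can be sketched by hand as follows. The
global `g_calls` and the helper `noisy_double` are hypothetical stand-ins
for a projection with a visible side effect; the planner itself binds
the projection to a discardable local rather than calling it as a bare
statement.

```das
options gen2

var g_calls = 0     // hypothetical counter recording projection calls

def noisy_double(x : int) : int {
    g_calls ++
    return x * 2
}

[export]
def main() {
    let src <- [1, 2, 3, 4, 5]
    var n = 0
    for (it in src) {
        if (it > 2) {
            noisy_double(it)    // projection still runs per matched element
            n ++
        }
    }
    print("n={n} calls={g_calls}\n")
}
```

With this input both counters end at 3, which is the behavior raw
`count(select(where_(src, p), f))` would show; skipping the projection
would leave `g_calls` at 0.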

Adds 2 behavioral tests for the side-effects invariant (single
`select|count` and `where|select|count`). All Phase 2A benchmarks held
(ns/op, old → new): count_aggregate 5 → 5, chained_where 17 → 8 (2.1×),
select_count 15 → 2 (7.5×), to_array_filter 11 → 11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/sql/LINQ.md            |  2 +-
 daslib/linq_fold.das              | 32 +++++++++++++++++++++++++++----
 tests/linq/test_linq_fold.das     | 28 +++++++++++++++++++++++++++
 tests/linq/test_linq_fold_ast.das | 10 ++++------
 4 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 4dcee49b34..8868addf04 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -86,7 +86,7 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 |---|---|---:|---:|---|
 | count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) |
 | chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) |
-| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane ignores projection; no array materialization) |
+| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane evaluates projection per iteration to preserve side effects; optimizer DCEs pure projections, no array materialization) |
 | to_array_filter | `where → select → to_array` | 11 | 11 | parity (after `each(<array>)` peel + reserve + workhorse `push`) |
 
 Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 246a76a824..66a0bbf2ad 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -594,16 +594,40 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // Build the per-element loop body.
     var loopBody : Expression?
     if (counterLane) {
+        // Counter lane must evaluate the projection (and any chained intermediates) per
+        // matched element so user-visible side effects fire — `count(select(src, f))` in
+        // plain LINQ invokes f per element, and our fold must match. Bind the final
+        // projection to a discardable local; daslang macro output bypasses LINT002.
+        var sideEffectStmts : array<Expression?>
+        sideEffectStmts |> reserve(length(intermediateBinds) + 2)
+        for (b in intermediateBinds) {
+            sideEffectStmts |> push(b)
+        }
+        if (projection != null) {
+            let finalBindName = "`vfinal`{at.line}`{at.column}"
+            sideEffectStmts |> push <| qmacro_expr() {
+                var $i(finalBindName) = $e(projection)
+            }
+        }
+        sideEffectStmts |> push <| qmacro_expr() {
+            $i(accName) ++
+        }
+        var incBlock : Expression?
+        if (length(sideEffectStmts) == 1) {
+            incBlock = sideEffectStmts[0]
+        } else {
+            incBlock = qmacro_block() {
+                $b(sideEffectStmts)
+            }
+        }
         if (whereCond != null) {
             loopBody = qmacro_expr() {
                 if ($e(whereCond)) {
-                    $i(accName) ++
+                    $e(incBlock)
                 }
             }
         } else {
-            loopBody = qmacro_expr() {
-                $i(accName) ++
-            }
+            loopBody = incBlock
         }
     } else {
         // array lane
diff --git a/tests/linq/test_linq_fold.das b/tests/linq/test_linq_fold.das
index 180e96ceda..929037be93 100644
--- a/tests/linq/test_linq_fold.das
+++ b/tests/linq/test_linq_fold.das
@@ -754,3 +754,31 @@ def test_where_count_fold(t : T?) {
     }
 }
 
+var g_proj_hits = 0
+
+def projection_with_side_effect(x : int) : int {
+    g_proj_hits ++
+    return x * 2
+}
+
+[test]
+def test_counter_lane_projection_side_effects(t : T?) {
+    // Counter lane must evaluate the projection per matched element so user-visible
+    // side effects fire — matches raw `count(select(src, f))` semantics. Tests guard
+    // the projection-is-evaluated invariant after the Phase-2A planner fix.
+    t |> run("select|count fires projection once per element") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        g_proj_hits = 0
+        let c = _fold(each(arr)._select(projection_with_side_effect(_)).count())
+        t |> equal(5, c)
+        t |> equal(5, g_proj_hits)
+    }
+    t |> run("where|select|count fires projection only on matches") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        g_proj_hits = 0
+        let c = _fold(each(arr)._where(_ > 5)._select(projection_with_side_effect(_)).count())
+        t |> equal(5, c)
+        t |> equal(5, g_proj_hits)
+    }
+}
+
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index cb067a3175..8c54608ac4 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -448,9 +448,8 @@ def test_where_fold_emits_loop(t : T?) {
         t |> success(r.matched, "should have return expression")
         t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper")
         if (!(body_expr is ExprInvoke)) return
-        let inv = body_expr as ExprInvoke
-        var arg0 = clone_expression(inv.arguments[0])
-        var maybe_comp <- qm_resolve_comprehension(arg0)
+        var body_clone = clone_expression(body_expr)
+        var maybe_comp <- qm_resolve_comprehension(body_clone)
         t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
     }
 }
@@ -467,9 +466,8 @@ def test_select_fold_emits_loop(t : T?) {
         t |> success(r.matched, "should have return expression")
         t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper")
         if (!(body_expr is ExprInvoke)) return
-        let inv = body_expr as ExprInvoke
-        var arg0 = clone_expression(inv.arguments[0])
-        var maybe_comp <- qm_resolve_comprehension(arg0)
+        var body_clone = clone_expression(body_expr)
+        var maybe_comp <- qm_resolve_comprehension(body_clone)
         t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
     }
 }

From 6cda3c763dfcb02071fac3e45990fd17b32b4f4f Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 11:35:18 -0700
Subject: [PATCH 05/18] linq_fold: update select_count benchmark header comment

Reflect counter-lane semantics fix: projection is now evaluated per
matched element (side effects fire); optimizer DCEs pure projections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/sql/select_count.das | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/benchmarks/sql/select_count.das b/benchmarks/sql/select_count.das
index 63c0e5cc9a..84e2253422 100644
--- a/benchmarks/sql/select_count.das
+++ b/benchmarks/sql/select_count.das
@@ -3,12 +3,14 @@ options persistent_heap
 
 require _common public
 
-// _select |> count — projection followed by counter. The projection has no effect on count
-// semantics, but on the array path m3 materializes the projected array before counting.
-// Phase-2A `_fold` recognizes the counter lane and emits a bare-loop counter that ignores
-// the projection entirely (no allocation). `_old_fold` lacks a [select, count] pattern in
-// g_foldSeq so it falls to the default nested-pass form (pass_0 = select(...); count(pass_0))
-// — materializing the same way m3 does.
+// _select |> count — projection followed by counter. The final count value doesn't depend
+// on the projection, but plain LINQ `count(select(src, f))` still evaluates `f` per element
+// so user-visible side effects fire. Phase-2A `_fold` matches that: the counter lane binds
+// the final projection to a discardable local per matched element (side effects preserved)
+// and skips array materialization. The optimizer DCEs the binding for pure projections
+// like `_.price * 2`, leaving a bare-loop counter for the common case. `_old_fold` lacks a
+// [select, count] pattern in g_foldSeq so it falls to the default nested-pass form
+// (pass_0 = select(...); count(pass_0)) — materializing the same way m3 does.
 
 def run_m1(b : B?; n : int) {
     with_sqlite(":memory:") $(db) {

From 52a2d4089ed0be0756e9cebcc5b1439578770227 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 11:55:26 -0700
Subject: [PATCH 06/18] linq_fold: extract peel helper + tighten length check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #2689 review fixes (Copilot, round 2):

1. Peel-each + reserve guard. The inline `each(<x>)` peel + `sourceHasLength`
   gate previously accepted any non-iterator inner type, including
   `each(lambda)` (a lambda iterable per builtin.das:1351). That would peel
   to a lambda, then emit `reserve(length(lambda))` which has no overload
   and would fail to compile inside the macro output. Phase 2A never hit
   this in practice because the test suite only uses array sources, but
   it's a latent trap.

   Extracted `peel_each_length_source` and `type_has_length` helpers.
   Peel now triggers only when the inner type satisfies `isGoodArrayType
   || isGoodTableType || isString || isArray (T[N]) || isRange`. Same
   predicate gates the array-lane reserve emission, so the two stay in
   sync. Lambdas / custom user iterables fall through unfolded.

2. Reworded `test_select_count_fold_result` assertion message: the old
   "(projection ignored by counter)" wording was outdated after the
   counter-lane fix in 6226a1e47 — the planner now evaluates the
   projection per iteration (for side effects); only the value is
   discarded. Reads "(projection does not affect count value)" now.

select_count benchmark held at 2 ns/op (vs 15 for old fold), to_array_filter
held at 11/11 parity. AST + behavioral tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 daslib/linq_fold.das              | 41 +++++++++++++++++++++----------
 tests/linq/test_linq_fold_ast.das |  2 +-
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 66a0bbf2ad..d4eaa83e6b 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -522,6 +522,32 @@ def private fold_linq_default(var expr : Expression?; recursiveMacroName : strin
     return res
 }
 
+[macro_function]
+def private type_has_length(t : TypeDecl?) : bool {
+    // True for types where `length(<expr>)` is statically resolvable: arrays, tables,
+    // strings, fixed-arrays (T[N]), and the range family. Lambdas (`def each(lam :
+    // lambda<...>)`) and custom user iterables are excluded — they have no length()
+    // overload and would make a macro-emitted `reserve(length(src))` fail to compile.
+    if (t == null) return false
+    return (t.isGoodArrayType || t.isGoodTableType || t.isString
+        || t.isArray || t.isRange)
+}
+
+[macro_function]
+def private peel_each_length_source(var top : Expression?) : Expression? {
+    // If `top` is `each(<x>)` and `<x>` has a length-supporting type, return `<x>` so
+    // the emitted loop iterates the underlying container directly — lets the array-lane
+    // reserve fire and avoids the iterator wrapper. Iteration semantics are preserved
+    // (`for (it in each(arr))` and `for (it in arr)` yield the same element refs).
+    // Restricted to length-supporting types to keep `reserve(length(src))` valid.
+    if (!(top is ExprCall)) return top
+    var topCall = top as ExprCall
+    if (topCall.func == null || topCall.func.name != "each"
+            || topCall.arguments |> length != 1
+            || !type_has_length(topCall.arguments[0]._type)) return top
+    return clone_expression(topCall.arguments[0])
+}
+
 [macro_function]
 def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // Phase-2A loop planner. Recognizes chains of shape `[where_*][select?]` (array lane)
@@ -530,18 +556,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // with a plain for-loop. Returns null for anything else — caller falls through unfolded.
     var (top, calls) = flatten_linq(expr)
     if (empty(calls)) return null
-    // Peel `each(<array>)` so the emitted loop iterates the array directly and the
-    // array-lane reserve below has a length to use. Iteration semantics are unchanged —
-    // `for (it in each(arr))` and `for (it in arr)` yield the same element refs.
-    if (top is ExprCall) {
-        var topCall = top as ExprCall
-        if (topCall.func != null && topCall.func.name == "each"
-                && topCall.arguments |> length == 1
-                && topCall.arguments[0]._type != null
-                && !topCall.arguments[0]._type.isIterator) {
-            top = topCall.arguments[0]
-        }
-    }
+    top = peel_each_length_source(top)
     let lastName = calls.back()._1.name
     if (lastName != "count" && lastName != "where_" && lastName != "select") return null
     let counterLane = lastName == "count"
@@ -711,7 +726,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         // Pre-reserve the accumulator to the source's length when the source has a known
         // length (array, table, range — anything that isn't an iterator). Avoids realloc
         // walks during growth; matches what ExprArrayComprehension lowering does.
-        let sourceHasLength = top._type != null && !top._type.isIterator
+        let sourceHasLength = type_has_length(top._type)
         if (isIter) {
             res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
                 var $i(accName) : array<$t(elementType)>
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index 8c54608ac4..e47cf2e785 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -612,7 +612,7 @@ def test_count_fold_result(t : T?) {
 
 [test]
 def test_select_count_fold_result(t : T?) {
-    t |> run("select _count _fold produces correct count (projection ignored by counter)") @(t : T?) {
+    t |> run("select _count _fold produces correct count (projection does not affect count value)") @(t : T?) {
         t |> equal(target_select_count_fold(), 5)
     }
 }

From 3f0f8907e9b954a665ecd43da5696f0ec079e63f Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 12:30:04 -0700
Subject: [PATCH 07/18] tests/fio: regression coverage for ref_time_ticks() ns
 normalization
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #2685 normalized ref_time_ticks() to nanoseconds across every
platform (Windows used to return raw QPC ticks at the underlying
counter's frequency — typically 10 MHz). The fix shipped without a
unit test that would have caught a units regression.

Add four tests under tests/fio/perf_time.das (sleep() lives in fio,
so this is the right neighborhood):

  - monotonic — 1000 successive reads never go backwards. Catches
    any signed/unsigned mixup or wrap-around bug in the ns conversion
    arithmetic.

  - sleep_roundtrip — sleep(100 ms) -> delta_ns must land in
    [80 ms, 500 ms]. The 80 ms lower bound is the load-bearing
    assertion: if Windows reverted to raw QPC ticks (10 MHz counter
    on the typical box -> a 100 ms wall-clock sleep would surface as
    1000000 "ticks" interpreted as ns, i.e. 1 ms), the test would
    trip. Wide upper bound covers CI runner scheduler jitter.

  - get_time_usec_agrees — the get_time_usec(t0) helper agrees with
    (ref_time_ticks() - t0) / 1000 within 5 ms. Two helpers reading
    the same underlying clock should not drift; if one ever ends up
    on a different code path, this notices.

  - units_are_nanoseconds — three back-to-back sleep(100 ms) deltas
    stay within 200 ms spread. If the unit accidentally changed
    mid-run (think: thread-local frequency cache going stale), the
    deltas would diverge wildly.
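
A minimal sketch of the arithmetic behind the 80 ms lower bound in
sleep_roundtrip. The 10 MHz frequency is the assumed typical value
mentioned above, and `ticks_to_ns` is a hypothetical helper for
illustration, not the function the runtime uses:

```daslang
options gen2

// Hypothetical helper (not part of this patch): scale raw counter
// ticks to nanoseconds for a counter running at `freq_hz`.
def ticks_to_ns(ticks : int64; freq_hz : int64) : int64 {
    return ticks * 1_000_000_000l / freq_hz
}

[export]
def main() {
    let freq_hz = 10_000_000l        // assumed 10 MHz QPC frequency
    let raw_ticks = freq_hz / 10l    // ticks elapsed during a 100 ms sleep
    let normalized_ms = ticks_to_ns(raw_ticks, freq_hz) / 1_000_000l
    let misread_ms = raw_ticks / 1_000_000l  // raw ticks misread as ns
    print("normalized: {normalized_ms} ms, misread: {misread_ms} ms\n")
}
```

Under those assumptions the normalized delta is 100 ms, while raw ticks
misread as nanoseconds give 1 ms, far below the 80 ms floor.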

The test runs cleanly in both interpreter and AOT mode on Windows
(Win11 local): sleep(100 ms) -> 102-109 ms delta, get_time_usec
agrees to within microseconds. tests/aot/CMakeLists.txt:224 already
covers tests/fio/*.das via FILE(GLOB CONFIGURE_DEPENDS); cmake
reconfigure picks the new file up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 tests/fio/perf_time.das | 89 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 tests/fio/perf_time.das

diff --git a/tests/fio/perf_time.das b/tests/fio/perf_time.das
new file mode 100644
index 0000000000..938d06e627
--- /dev/null
+++ b/tests/fio/perf_time.das
@@ -0,0 +1,89 @@
+options gen2
+require dastest/testing_boost public
+
+require daslib/fio
+require math
+
+//! Regression coverage for `ref_time_ticks()` and `get_time_usec(int64)`.
+//!
+//! The cross-platform normalization landed via PR #2685 (Windows now returns
+//! nanoseconds via QPC, matching POSIX clock_gettime — previously Windows
+//! returned raw QPC ticks). These tests pin the post-fix contract:
+//!   - `ref_time_ticks()` returns a monotonic non-decreasing nanosecond
+//!     timestamp on every platform.
+//!   - `get_time_usec(t0)` returns elapsed time in microseconds since `t0`
+//!     and stays consistent with the `ref_time_ticks` delta.
+//!   - sleep(N ms) round-trip lands close to N ms (with slack for OS
+//!     scheduler granularity — Windows is the worst at ~16 ms ticks).
+
+[test]
+def test_ref_time_ticks_monotonic(t : T?) {
+    //! Successive calls never go backwards. Catches QPC wrap-around
+    //! arithmetic bugs and any future signed/unsigned mixups in the
+    //! nanosecond conversion path.
+    var prev = ref_time_ticks()
+    for (i in range(1000)) {
+        let now = ref_time_ticks()
+        if (now < prev) {
+            t |> success(false, "ref_time_ticks went backwards at iter {i}: prev={prev} now={now}")
+            return
+        }
+        prev = now
+    }
+    t |> success(true, "1000 successive ref_time_ticks reads stayed monotonic")
+}
+
+[test]
+def test_ref_time_ticks_sleep_roundtrip(t : T?) {
+    //! sleep(100ms) → delta_ns should be in [80ms, 500ms]. Wide upper
+    //! bound covers CI runner jitter (GitHub-hosted Windows scheduling
+    //! can balloon to 200 ms even for a 100 ms sleep). Lower bound
+    //! catches a units bug: if Windows still reported raw QPC ticks
+    //! (10 MHz → 100× short for the same delta), delta_ns would land
+    //! around 1 ms and we'd trip.
+    let before = ref_time_ticks()
+    sleep(100u)
+    let after = ref_time_ticks()
+    let delta_ns = after - before
+    let delta_ms = delta_ns / 1_000_000l
+    t |> success(delta_ms >= 80l,
+        "sleep(100ms) elapsed only {delta_ms}ms — ref_time_ticks may not be in ns")
+    t |> success(delta_ms <= 500l,
+        "sleep(100ms) elapsed {delta_ms}ms — way over budget")
+}
+
+[test]
+def test_get_time_usec_agrees_with_ref_delta(t : T?) {
+    //! `get_time_usec(t0)` and `(ref_time_ticks() - t0) / 1000` should
+    //! agree to within 5 ms (the two calls happen sequentially, so a
+    //! small drift is normal).
+    let t0 = ref_time_ticks()
+    sleep(50u)
+    let usec_via_helper = int64(get_time_usec(t0))
+    let usec_via_delta = (ref_time_ticks() - t0) / 1_000l
+    let drift_us = int(abs(usec_via_delta - usec_via_helper))
+    t |> success(drift_us <= 5_000,
+        "get_time_usec={usec_via_helper} vs delta/1000={usec_via_delta} drift={drift_us}us")
+}
+
+[test]
+def test_ref_time_ticks_units_are_nanoseconds(t : T?) {
+    //! Sanity check that three sleep(100ms) calls in a row produce roughly
+    //! the same delta. If one platform reports µs and another reports ns,
+    //! repeated calls would diverge wildly. Same-platform tick uniformity
+    //! is also expected.
+    var deltas : array<int64>
+    deltas |> reserve(3)
+    for (_i in range(3)) {
+        let a = ref_time_ticks()
+        sleep(100u)
+        let b = ref_time_ticks()
+        deltas |> push(b - a)
+    }
+    let lo = min(deltas[0], min(deltas[1], deltas[2]))
+    let hi = max(deltas[0], max(deltas[1], deltas[2]))
+    let spread_ms = int((hi - lo) / 1_000_000l)
+    t |> success(spread_ms <= 200,
+        "sleep(100ms) deltas span {spread_ms}ms — non-uniform tick rate")
+    delete deltas
+}

From c6a9d799c038b9134fc0f7c43c173d416a7c2c2c Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 15:23:24 -0700
Subject: [PATCH 08/18] macro_boost: add has_sideeffects + counter-lane elision
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a reusable conservative `has_sideeffects(expr) : bool` predicate to
daslib/macro_boost. Returns true if an expression has — or might have —
side effects; false ONLY when provably pure. Intended for macro-time
elision of discardable evaluations.

Classification:
- Safe leaves: ExprVar, all ExprConst*, ExprAddr, ExprTypeInfo/Decl/Tag.
- Safe via recursion: ExprField/SafeField/Swizzle, ExprRef2Value/Ptr,
  ExprPtr2Ref, ExprIs/IsVariant/AsVariant/SafeAsVariant, ExprCast,
  ExprNullCoalescing, ExprStringBuilder (string-heap allocation is a
  no-op per compiler), ExprKeyExists (pure container read).
- ExprAt: safe when subexpr type is NOT isGoodTableType (tables auto-
  insert on missing key); ExprSafeAt always safe.
- ExprOp1/Op2/Op3: op-name allowlist for pure ops on workhorse types
  (bypasses func==null artifacts from partial folding); falls back to
  the function-flag check. `/` and `%` blacklisted (div-by-zero panic).
- ExprCall: allowlist `func.flags.builtIn && !knownSideEffects &&
  !unsafeOperation`, recurse args.
- Everything else: conservative true.

Counter-lane integration in daslib/linq_fold.das:

1. Discardable `var vfinal = projection` bind is now emitted only when
   `has_sideeffects(projection)` returns true. Pure projections like
   `_._field * 2` produce a bare-loop counter at macro time, no
   optimizer DCE required.

2. count→length shortcut: when the counter lane has no where-filter
   AND every projection in the chain is pure AND the source has a known
   length (array/table/string/range/fixed-array), the planner emits
   `length(src)` directly — the loop is elided entirely. select_count
   benchmark drops from 2 ns/op to 0 ns/op.

3. peel_each fix: `each` is a daslang generic, so the resolved
   `func.name` on a typed call is the mangled instance. The original
   peel only matched `func.name == "each"` and never fired for typed
   chains. Now also checks `func.fromGeneric.name == "each"`. Gated to
   array-shaped arguments (isGoodArrayType || isArray) so iterator-
   yielding sources like `each(range(10))` keep their wrapper.

4. Block-parameter typedecl branched on source shape: iterator sources
   keep `-const` (rvalue, must be consumable); array sources keep the
   source's `const&` modifier (peeled `let arr <-` is const-ref).
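
For orientation, items 1 and 2 correspond roughly to the hand-written
equivalents below. This is a sketch, not the literal macro output: the
function names and `noisy` are hypothetical, and the real expansion is
an `invoke` of a block with generated variable names.

```daslang
options gen2

var g_hits = 0

// Hypothetical impure projection: bumps a global on every call.
def noisy(x : int) : int {
    g_hits ++
    return x * 2
}

// Pure projection, no filter, source with length: the loop is elided.
def count_pure(src : array<int>) : int {
    return length(src)
}

// Impure projection: the loop stays, and the projection is bound to a
// throwaway local so its side effects fire once per element.
def count_impure(src : array<int>) : int {
    var acc = 0
    for (it in src) {
        let vfinal = noisy(it)
        acc ++
    }
    return acc
}

[export]
def main() {
    let arr <- [1, 2, 3, 4, 5]
    let a = count_pure(arr)
    let b = count_impure(arr)
    print("pure={a} impure={b} hits={g_hits}\n")
}
```

Both functions return the same count; only the impure variant calls the
projection, once per element.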

Tests:
- tests/macro_boost/test_has_sideeffects.das — 24 cases (17 safe + 5
  unsafe + 2 conservative-unsafe) wired via a `_test_has_sideeffects`
  probe call_macro that emits ExprConstBool at macro time.
- tests/linq/test_linq_fold_ast.das — 7 new tests:
  * test_pure_projection_uses_length_shortcut — invoke body returns
    `length(src)` directly, no for loop.
  * test_bare_count_uses_length_shortcut — same for `each(arr).count()`.
  * test_impure_projection_keeps_bind — for-body has bind + ++acc.
  * test_peel_each_on_array_source / _on_bare_count — assert peel fires.
  * test_peel_each_skips_non_array_source — `each(range(...))` keeps
    its wrapper (gate prevents iterator-source peeling).
  * test_target_each_range_count_runs — behavioral check for
    iterator-source chains.

Benchmarks (100K rows, INTERP, vs Phase 2A baseline):
- select_count: 2 → 0 ns/op (length shortcut elides loop entirely)
- chained_where: 8 → 6 ns/op (peel + const-ref param)
- count_aggregate: 5 → 4 ns/op (1ns from peel)
- to_array_filter: 11 → 10 ns/op (1ns from peel)

569/569 linq tests + 51/51 fold-AST + 24/24 has_sideeffects pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/sql/LINQ.md                       |   8 +-
 daslib/linq_fold.das                         |  96 ++++++++--
 daslib/macro_boost.das                       | 135 +++++++++++++
 tests/linq/test_linq_fold_ast.das            | 188 +++++++++++++++++++
 tests/macro_boost/_has_sideeffects_probe.das |  32 ++++
 tests/macro_boost/test_has_sideeffects.das   | 181 ++++++++++++++++++
 6 files changed, 616 insertions(+), 24 deletions(-)
 create mode 100644 tests/macro_boost/_has_sideeffects_probe.das
 create mode 100644 tests/macro_boost/test_has_sideeffects.das

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 8868addf04..54506d59c2 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -84,10 +84,10 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 
 | Benchmark | Shape | m3f_old | m3f (Phase 2A) | Delta |
 |---|---|---:|---:|---|
-| count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) |
-| chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) |
-| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane evaluates projection per iteration to preserve side effects; optimizer DCEs pure projections, no array materialization) |
-| to_array_filter | `where → select → to_array` | 11 | 11 | parity (after `each(<array>)` peel + reserve + workhorse `push`) |
+| count_aggregate | `where → count` | 5 | 4 | parity-ish (1ns improvement from `each(<array>)` peel) |
+| chained_where | `where → where → count` | 17 | 6 | **2.8× faster** (fuses chained wheres into single `&&` predicate; small gain from peel + const-ref param) |
+| select_count | `select → count` | 15 | 0 | **∞ faster** — when the projection is pure (`has_sideeffects == false`) and the source has length, the counter lane shortcuts to `length(src)` and elides the loop entirely. See [macro_boost::has_sideeffects](../../daslib/macro_boost.das) and `linq_fold.das:plan_loop_or_count` |
+| to_array_filter | `where → select → to_array` | 11 | 10 | parity (after `each(<array>)` peel + reserve + workhorse `push`) |
 
 Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
 
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index d4eaa83e6b..39ebbde3f3 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -534,18 +534,30 @@ def private type_has_length(t : TypeDecl?) : bool {
 }
 
 [macro_function]
-def private peel_each_length_source(var top : Expression?) : Expression? {
-    // If `top` is `each(<x>)` and `<x>` has a length-supporting type, return `<x>` so
-    // the emitted loop iterates the underlying container directly — lets the array-lane
-    // reserve fire and avoids the iterator wrapper. Iteration semantics are preserved
-    // (`for (it in each(arr))` and `for (it in arr)` yield the same element refs).
-    // Restricted to length-supporting types to keep `reserve(length(src))` valid.
+def private is_each_call(call : ExprCall?) : bool {
+    //! `each` in daslib/builtin.das is generic, so the resolved `func.name` on a typed
+    //! call is the mangled instance name (e.g. `builtin\`each\`30908...`). The generic's
+    //! original name lives in `func.fromGeneric.name`. Match either.
+    if (call == null || call.func == null) return false
+    return (call.func.name == "each"
+        || (call.func.fromGeneric != null && call.func.fromGeneric.name == "each"))
+}
+
+[macro_function]
+def private peel_each(var top : Expression?) : Expression? {
+    // Unwrap `each(<arr>)` to `<arr>` when `<arr>` is a true array (or fixed-size array).
+    // Iteration semantics are preserved: `for it in <arr>` implicitly re-wraps via the
+    // same `each` overload. We gate on array-ness because peeling an iterator-typed
+    // argument (e.g. `each(range(10))`, `each(generator())`) would put the iterator in
+    // place — the downstream length shortcut and reserve-by-length hints assume an
+    // indexable source. Only peel when we can prove that's true.
     if (!(top is ExprCall)) return top
     var topCall = top as ExprCall
-    if (topCall.func == null || topCall.func.name != "each"
-            || topCall.arguments |> length != 1
-            || !type_has_length(topCall.arguments[0]._type)) return top
-    return clone_expression(topCall.arguments[0])
+    if (!is_each_call(topCall) || topCall.arguments |> length != 1) return top
+    let argExpr = topCall.arguments[0]
+    if ((argExpr == null || argExpr._type == null)
+            || (!argExpr._type.isGoodArrayType && !argExpr._type.isArray)) return top
+    return clone_expression(argExpr)
 }
 
 [macro_function]
@@ -556,7 +568,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // with a plain for-loop. Returns null for anything else — caller falls through unfolded.
     var (top, calls) = flatten_linq(expr)
     if (empty(calls)) return null
-    top = peel_each_length_source(top)
+    top = peel_each(top)
     let lastName = calls.back()._1.name
     if (lastName != "count" && lastName != "where_" && lastName != "select") return null
     let counterLane = lastName == "count"
@@ -569,6 +581,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     var projection : Expression?
     var intermediateBinds : array<Expression?>
     var seenSelect = false
+    var allProjectionsPure = true
     var elementType = clone_type(top._type.firstType)
     var lastBindName = itName
     for (i in 0 .. intermediateCount) {
@@ -593,6 +606,9 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
             if (projection != null) {
                 let prevWorkhorse = projection._type != null && projection._type.isWorkhorseType
                 if (!prevWorkhorse) return null   // chained non-workhorse selects — Phase 2B
+                if (has_sideeffects(projection)) {
+                    allProjectionsPure = false
+                }
                 let bindName = "`v`{at.line}`{at.column}`{length(intermediateBinds)}"
                 intermediateBinds |> push <| qmacro_expr() {
                     var $i(bindName) = $e(projection)
@@ -606,6 +622,26 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
             return null
         }
     }
+    if (projection != null && has_sideeffects(projection)) {
+        allProjectionsPure = false
+    }
+    // Counter-lane shortcut: when there's no filter and every projection in the chain is
+    // pure, the count is simply `length(source)`. Skip the loop entirely — no per-element
+    // increments, no per-element side-effect evaluation. Gated on `type_has_length` so we
+    // only emit `length(src)` when it's statically resolvable.
+    if (counterLane && whereCond == null && allProjectionsPure
+            && type_has_length(top._type)) {
+        var topExpr = clone_expression(top)
+        topExpr.genFlags.alwaysSafe = true
+        var res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) {
+            return length($i(srcName))
+        }, $e(topExpr)))
+        res.force_at(at)
+        res.force_generated(true)
+        let blk = (res as ExprInvoke).arguments[0] as ExprMakeBlock
+        (blk._block as ExprBlock).arguments[0].flags.can_shadow = true
+        return res
+    }
     // Build the per-element loop body.
     var loopBody : Expression?
     if (counterLane) {
@@ -618,7 +654,10 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         for (b in intermediateBinds) {
             sideEffectStmts |> push(b)
         }
-        if (projection != null) {
+        // Bind the final projection only when it might have side effects. Pure projections
+        // (the common case — `_._field * 2`) can be elided entirely; no need to rely on
+        // the optimizer to DCE a dead store afterwards.
+        if (projection != null && has_sideeffects(projection)) {
             let finalBindName = "`vfinal`{at.line}`{at.column}"
             sideEffectStmts |> push <| qmacro_expr() {
                 var $i(finalBindName) = $e(projection)
@@ -713,14 +752,31 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     var topExpr = clone_expression(top)
     topExpr.genFlags.alwaysSafe = true
     var res : Expression?
+    // Pick the block-parameter typedecl modifier by source shape:
+    //   - iterator (rvalue, e.g. `each(range(10))`) — apply `- const` so the body can
+    //     consume the iterator. Without it, daslang's typer reports
+    //     "can't iterate over const iterator".
+    //   - container with length (array/table/string/range/fixed-array) — keep modifiers
+    //     so a `const&` source (e.g. `let arr <-`) matches the param exactly.
+    let topIsIter = top._type != null && top._type.isIterator
     if (counterLane) {
-        res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
-            var $i(accName) = 0
-            for ($i(itName) in $i(srcName)) {
-                $e(loopBody)
-            }
-            return $i(accName)
-        }, $e(topExpr)))
+        if (topIsIter) {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+                var $i(accName) = 0
+                for ($i(itName) in $i(srcName)) {
+                    $e(loopBody)
+                }
+                return $i(accName)
+            }, $e(topExpr)))
+        } else {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) {
+                var $i(accName) = 0
+                for ($i(itName) in $i(srcName)) {
+                    $e(loopBody)
+                }
+                return $i(accName)
+            }, $e(topExpr)))
+        }
     } else {
         let isIter = expr._type.isIterator
         // Pre-reserve the accumulator to the source's length when the source has a known
@@ -736,7 +792,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 return <- $i(accName).to_sequence_move()
             }, $e(topExpr)))
         } elif (sourceHasLength) {
-            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) {
                 var $i(accName) : array<$t(elementType)>
                 $i(accName) |> reserve(length($i(srcName)))
                 for ($i(itName) in $i(srcName)) {
diff --git a/daslib/macro_boost.das b/daslib/macro_boost.das
index 02cb2923bf..1fa5452b6c 100644
--- a/daslib/macro_boost.das
+++ b/daslib/macro_boost.das
@@ -149,3 +149,138 @@ def public collect_labels(expr : ExpressionPtr) {
     return <- res
 }
 
+[macro_function]
+def public has_sideeffects(expr : Expression?) : bool {
+    //! Conservative side-effect detection. Returns true when the expression has, or
+    //! might have, side effects. Returns false ONLY when provably pure (no calls other
+    //! than side-effect-free builtins, no heap allocation, no container mutation).
+    //!
+    //! Intended for macro-time elision of discardable evaluations.
+    //! Callers treat false as a promise; true is the safe default — when in doubt, true.
+    // null / compiler-tagged-pure / variable reads / constant literals — leaf, safe.
+    if (expr == null || expr.flags.noSideEffects
+            || expr is ExprVar
+            || expr is ExprConstInt || expr is ExprConstInt8 || expr is ExprConstInt16
+            || expr is ExprConstInt64 || expr is ExprConstUInt || expr is ExprConstUInt8
+            || expr is ExprConstUInt16 || expr is ExprConstUInt64 || expr is ExprConstFloat
+            || expr is ExprConstDouble || expr is ExprConstBool || expr is ExprConstString
+            || expr is ExprConstPtr || expr is ExprConstRange || expr is ExprConstURange
+            || expr is ExprConstRange64 || expr is ExprConstURange64
+            || expr is ExprConstEnumeration || expr is ExprConstBitfield) return false
+    // Member access — recurse into operand.
+    if (expr is ExprField) return has_sideeffects((expr as ExprField).value)
+    if (expr is ExprSafeField) return has_sideeffects((expr as ExprSafeField).value)
+    if (expr is ExprSwizzle) return has_sideeffects((expr as ExprSwizzle).value)
+    // Pointer / reference artifacts.
+    if (expr is ExprRef2Value) return has_sideeffects((expr as ExprRef2Value).subexpr)
+    if (expr is ExprRef2Ptr) return has_sideeffects((expr as ExprRef2Ptr).subexpr)
+    if (expr is ExprPtr2Ref) return has_sideeffects((expr as ExprPtr2Ref).subexpr)
+    if (expr is ExprAddr) return false
+    // Type / variant checks.
+    if (expr is ExprIs) return has_sideeffects((expr as ExprIs).subexpr)
+    if (expr is ExprIsVariant) return has_sideeffects((expr as ExprIsVariant).value)
+    if (expr is ExprAsVariant) return has_sideeffects((expr as ExprAsVariant).value)
+    if (expr is ExprSafeAsVariant) return has_sideeffects((expr as ExprSafeAsVariant).value)
+    // Cast — recurse.
+    if (expr is ExprCast) return has_sideeffects((expr as ExprCast).subexpr)
+    // Compile-time meta.
+    if (expr is ExprTypeInfo || expr is ExprTypeDecl || expr is ExprTag) return false
+    // Subscripts.
+    if (expr is ExprAt) {
+        let at_e = expr as ExprAt
+        // tables auto-insert on missing key — unsafe; arrays/strings safe (read-only).
+        if (at_e.subexpr == null || at_e.subexpr._type == null
+                || at_e.subexpr._type.isGoodTableType) return true
+        return has_sideeffects(at_e.subexpr) || has_sideeffects(at_e.index)
+    }
+    if (expr is ExprSafeAt) {
+        let sat = expr as ExprSafeAt
+        return has_sideeffects(sat.subexpr) || has_sideeffects(sat.index)
+    }
+    // Null coalescing.
+    if (expr is ExprNullCoalescing) {
+        let nc = expr as ExprNullCoalescing
+        return has_sideeffects(nc.subexpr) || has_sideeffects(nc.defaultValue)
+    }
+    // String builder: its string heap allocation is treated as a no-op by the compiler; recurse into operands.
+    if (expr is ExprStringBuilder) {
+        let sb = expr as ExprStringBuilder
+        for (e in sb.elements) {
+            if (has_sideeffects(e)) return true
+        }
+        return false
+    }
+    // key_exists is a pure container read.
+    if (expr is ExprKeyExists) {
+        let ke = expr as ExprKeyExists
+        for (a in ke.arguments) {
+            if (has_sideeffects(a)) return true
+        }
+        return false
+    }
+    // Function-call-shaped expressions: ExprCall (regular call) and ExprOp1/ExprOp2/ExprOp3
+    // (operators, which also resolve to a function). All carry a resolved `func` field
+    // when typing completed. But the typer sometimes leaves `func` null on operator
+    // expressions (e.g. after partial constant folding), so we also keep an op-name
+    // allowlist for the common pure operators on workhorse types — that bypasses
+    // resolution-timing artifacts. `/` and `%` stay UNSAFE (div-by-zero panic; design
+    // decision). Compound-assignment ops are not in the allowlist (mutation).
+    //
+    // `is`/`as` on handled types is EXACT-rtti (see CLAUDE.md), so each shape needs its
+    // own branch — can't cast ExprOp2 to ExprCallFunc even though the C++ class inherits.
+    if (expr is ExprOp1) {
+        let e1 = expr as ExprOp1
+        if (!is_safe_op1(e1.op) && func_has_sideeffects(e1.func)) return true
+        return has_sideeffects(e1.subexpr)
+    }
+    if (expr is ExprOp2) {
+        let e2 = expr as ExprOp2
+        // Unsafe: division/modulo (div-by-zero panic, design decision); or op not in the
+        // safe allowlist AND the resolved func indicates side effects. The allowlist also
+        // bypasses func==null artifacts from partial folding.
+        if (e2.op == "/" || e2.op == "%"
+                || (!is_safe_op2(e2.op) && func_has_sideeffects(e2.func))) return true
+        return has_sideeffects(e2.left) || has_sideeffects(e2.right)
+    }
+    if (expr is ExprOp3) {
+        let e3 = expr as ExprOp3
+        // ExprOp3 is the only ternary `?:` in daslang — pure if operands pure.
+        return has_sideeffects(e3.subexpr) || has_sideeffects(e3.left) || has_sideeffects(e3.right)
+    }
+    if (expr is ExprCall) {
+        let ec = expr as ExprCall
+        if (func_has_sideeffects(ec.func)) return true
+        for (a in ec.arguments) {
+            if (has_sideeffects(a)) return true
+        }
+        return false
+    }
+    // Default: unknown → unsafe.
+    return true
+}
+
+[macro_function]
+def private func_has_sideeffects(f : Function?) : bool {
+    //! True when calling `f` may have side effects. Allowlists builtins
+    //! (`flags.builtIn`) without `knownSideEffects` or `unsafeOperation`.
+    return (f == null || !f.flags.builtIn
+        || f.flags.knownSideEffects || f.flags.unsafeOperation)
+}
+
+[macro_function]
+def private is_safe_op1(op : das_string) : bool {
+    //! Unary operators that are pure on workhorse types — no overflow trap, no mutation.
+    //! Excludes `++` / `--` (mutation).
+    return op == "-" || op == "!" || op == "~" || op == "+"
+}
+
+[macro_function]
+def private is_safe_op2(op : das_string) : bool {
+    //! Binary operators that are pure on workhorse types. Excludes `/`, `%` (div-by-zero
+    //! panic — design decision) and all compound-assignment ops (mutation).
+    return (op == "+" || op == "-" || op == "*"
+        || op == "==" || op == "!=" || op == "<" || op == "<=" || op == ">" || op == ">="
+        || op == "&" || op == "|" || op == "^" || op == "<<" || op == ">>"
+        || op == "&&" || op == "||")
+}
+
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index e47cf2e785..78a70051c3 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -430,6 +430,26 @@ def target_select_count_fold() : int {
     return _fold(each([1, 2, 3, 4, 5])._select(_ * 2).count())
 }
 
+var g_select_count_proj_hits = 0
+
+def side_effect_select_proj(x : int) : int {
+    g_select_count_proj_hits ++
+    return x * 2
+}
+
+[export, marker(no_coverage)]
+def target_select_count_fold_impure() : int {
+    return _fold(each([1, 2, 3, 4, 5])._select(side_effect_select_proj(_)).count())
+}
+
+// `each(range(...))` — argument is a `range`, not an array. peel_each must NOT fire
+// here; we'd otherwise replace the iterator-yielding each call with the raw range
+// and downstream length-shortcut / reserve-by-length would silently misbehave.
+[export, marker(no_coverage)]
+def target_each_range_count() : int {
+    return _fold(each(range(10))._where(_ > 5).count())
+}
+
 // ── Tests: `_fold` Phase-2A loop emission ──────────────────────────────
 // Phase-2A `_fold` emits explicit for-loops inside an `invoke($block, $src)` wrapper
 // (no `ExprArrayComprehension` nodes). Each test asserts the invoke wrapper exists
@@ -616,3 +636,171 @@ def test_select_count_fold_result(t : T?) {
         t |> equal(target_select_count_fold(), 5)
     }
 }
+
+// ── Counter-lane projection elision (has_sideeffects integration) ──────
+// For pure counter chains (`_select(_ * 2).count()`, bare `.count()`, etc.) on
+// length-supporting sources, the planner emits a `length(source)` shortcut and
+// the for-loop is dropped entirely. For impure projections (function call w/
+// side effects), the per-element loop is preserved with the discardable bind.
+
+// Returns the number of top-level ExprFor statements in the counter-lane invoke's
+// inner block. Pure shortcut: `[var src, return length(src)]` → 0 for-loops.
+// Impure loop: `[var src, var acc=0, for {...}, return acc]` → 1 for-loop.
+def count_inner_for_loops(body_expr : Expression?) : int {
+    if (!(body_expr is ExprInvoke)) return -1
+    let inv = body_expr as ExprInvoke
+    if (empty(inv.arguments) || !(inv.arguments[0] is ExprMakeBlock)) return -1
+    let mb = inv.arguments[0] as ExprMakeBlock
+    let outer = mb._block as ExprBlock
+    if (outer == null) return -1
+    var n = 0
+    for (stmt in outer.list) {
+        if (stmt is ExprFor) {
+            n ++
+        }
+    }
+    return n
+}
+
+// Returns the number of stmts in the for-body, or -1 if no for loop exists.
+def count_for_body_stmts(body_expr : Expression?) : int {
+    if (!(body_expr is ExprInvoke)) return -1
+    let inv = body_expr as ExprInvoke
+    if (empty(inv.arguments) || !(inv.arguments[0] is ExprMakeBlock)) return -1
+    let mb = inv.arguments[0] as ExprMakeBlock
+    let outer = mb._block as ExprBlock
+    if (outer == null) return -1
+    for (stmt in outer.list) {
+        if (stmt is ExprFor) {
+            let fe = stmt as ExprFor
+            let fbody = fe.body as ExprBlock
+            if (fbody == null) return -1
+            return length(fbody.list)
+        }
+    }
+    return -1
+}
+
+[test]
+def test_pure_projection_uses_length_shortcut(t : T?) {
+    // `_select(_ * 2).count()` on a length-supporting source should collapse to
+    // `length(source)` — no for-loop emitted at all.
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        let n = count_inner_for_loops(body_expr)
+        t |> equal(n, 0, "pure select-count must emit length() shortcut (no for loop)")
+    }
+}
+
+[test]
+def test_bare_count_uses_length_shortcut(t : T?) {
+    // Bare `.count()` on an array source should also use the length shortcut.
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        let n = count_inner_for_loops(body_expr)
+        t |> equal(n, 0, "bare count on length-supporting source must use length() shortcut")
+    }
+}
+
+[test]
+def test_impure_projection_keeps_bind(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_count_fold_impure)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected counter-lane invoke wrapper")
+        let n = count_for_body_stmts(body_expr)
+        t |> equal(n, 2, "impure projection should preserve vfinal bind (for-body has bind + ++acc)")
+    }
+}
+
+// ── peel_each invariant: each(<array>) must always be peeled ──────────
+// The planner's `peel_each` helper unwraps `each(x)` only when x is an array, so
+// the emitted block sees the underlying container directly. Without this, the
+// length() shortcut would never fire (each returns an iterator, which has no
+// length) and array-lane reserve would emit against the iterator wrapper.
+
+// Returns true when the invoke's second arg (the source expression passed in)
+// is still an ExprCall to `each`, i.e. peel didn't run.
+def invoke_source_is_each_wrapped(body_expr : Expression?) : bool {
+    if (!(body_expr is ExprInvoke)) return false
+    let inv = body_expr as ExprInvoke
+    if (length(inv.arguments) < 2 || !(inv.arguments[1] is ExprCall)) return false
+    let src_call = inv.arguments[1] as ExprCall
+    if (src_call.func == null) return false
+    return (src_call.func.name == "each"
+        || (src_call.func.fromGeneric != null && src_call.func.fromGeneric.name == "each"))
+}
+
+[test]
+def test_peel_each_on_array_source(t : T?) {
+    // Sanity: target_select_count_fold uses `each([1,2,3,4,5])`. After peel, the
+    // invoke wrapper must NOT receive an each-wrapped source.
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> success(!invoke_source_is_each_wrapped(body_expr),
+            "peel_each must unwrap each(array) at macro time")
+    }
+}
+
+[test]
+def test_peel_each_on_bare_count(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> success(!invoke_source_is_each_wrapped(body_expr),
+            "peel_each must unwrap each(array) at macro time")
+    }
+}
+
+// Negative case: `each(range(...))` argument is an iterator-yielding range, not an
+// array. peel_each must NOT fire — peeling would drop the each call and put the raw
+// range in source position; the downstream length-shortcut and reserve hints would
+// then misbehave on a non-indexable source.
+[test]
+def test_peel_each_skips_non_array_source(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_each_range_count)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> success(invoke_source_is_each_wrapped(body_expr),
+            "peel_each must keep each(range) wrapper — only arrays may be peeled")
+    }
+}
+
+[test]
+def test_target_each_range_count_runs(t : T?) {
+    // Behavioral: ensure the iterator-source chain still compiles and produces the
+    // expected count. range(10) → [0,1,2,3,4,5,6,7,8,9]; filter > 5 → 4 elements.
+    t |> equal(target_each_range_count(), 4)
+}
diff --git a/tests/macro_boost/_has_sideeffects_probe.das b/tests/macro_boost/_has_sideeffects_probe.das
new file mode 100644
index 0000000000..4b7dad96b6
--- /dev/null
+++ b/tests/macro_boost/_has_sideeffects_probe.das
@@ -0,0 +1,32 @@
+// Helper module for tests/macro_boost/test_has_sideeffects.das.
+//
+// Provides ``_test_has_sideeffects(expr)`` — a [call_macro] that invokes
+// ``macro_boost::has_sideeffects`` on its argument at macro time and replaces
+// the call with an ``ExprConstBool`` of the result. Lets test functions
+// assert side-effect classification by writing ``t |> equal(_test_has_sideeffects(...), false)``.
+//
+// Lives in a separate ``.das`` with a leading underscore so dastest's file
+// discovery skips it as a test.
+options gen2
+options indenting = 4
+
+module _has_sideeffects_probe public
+
+require daslib/ast public
+require daslib/ast_boost
+require daslib/macro_boost public
+
+[call_macro(name = "_test_has_sideeffects")]
+class private TestHasSideeffects : AstCallMacro {
+    def override visit(prog : ProgramPtr; mod : Module?; var call : ExprCallMacro?) : Expression? {
+        if (call.arguments |> length != 1) {
+            macro_error(prog, call.at, "expecting _test_has_sideeffects(expression)")
+            return null
+        }
+        let b = has_sideeffects(call.arguments[0])
+        var res : Expression? = new ExprConstBool(at = call.at, value = b)
+        res.force_at(call.at)
+        res.force_generated(true)
+        return res
+    }
+}
diff --git a/tests/macro_boost/test_has_sideeffects.das b/tests/macro_boost/test_has_sideeffects.das
new file mode 100644
index 0000000000..7651eeb91f
--- /dev/null
+++ b/tests/macro_boost/test_has_sideeffects.das
@@ -0,0 +1,181 @@
+options gen2
+require dastest/testing_boost public
+require _has_sideeffects_probe public
+
+// ── Side-effect-bearing helpers (used as test sources) ────────────────────
+
+var g_proj_hits = 0
+
+def side_effect_fn(_x : int) : int {
+    g_proj_hits ++
+    return _x * 2
+}
+
+struct Foo {
+    a : int
+    b : int
+}
+
+// ── SAFE cases — has_sideeffects must return false ───────────────────────
+//
+// Note: `let _x = 5` lets the compiler fold expressions using `_x` into constants
+// before the call_macro runs (so the macro sees ExprConstInt, not the original
+// ExprOp2). To exercise the operator paths explicitly, tests below use `var`.
+
+[test]
+def test_const_int(t : T?) {
+    t |> equal(_test_has_sideeffects(42), false)
+}
+
+[test]
+def test_const_string(t : T?) {
+    t |> equal(_test_has_sideeffects("hello"), false)
+}
+
+[test]
+def test_const_bool(t : T?) {
+    t |> equal(_test_has_sideeffects(true), false)
+}
+
+[test]
+def test_var_read(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(_x), false)
+}
+
+[test]
+def test_arith_pure(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(_x + 1), false)
+}
+
+[test]
+def test_arith_nested(t : T?) {
+    var _x = 5
+    var _y = 3
+    t |> equal(_test_has_sideeffects(_x * 2 + _y - 3), false)
+}
+
+[test]
+def test_field_access(t : T?) {
+    var _s = Foo(a = 1, b = 2)
+    t |> equal(_test_has_sideeffects(_s.a), false)
+}
+
+[test]
+def test_array_index(t : T?) {
+    var _arr = [1, 2, 3, 4, 5]
+    t |> equal(_test_has_sideeffects(_arr[0]), false)
+}
+
+[test]
+def test_safe_table_lookup(t : T?) {
+    var tab : table<string; int>
+    tab |> insert("k", 1)
+    t |> equal(_test_has_sideeffects(tab?["k"]), false)
+}
+
+[test]
+def test_comparison(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(_x == 0), false)
+}
+
+[test]
+def test_ternary(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(_x > 0 ? 1 : 2), false)
+}
+
+[test]
+def test_null_coalescing(t : T?) {
+    var _p : int? = null
+    t |> equal(_test_has_sideeffects(_p ?? 0), false)
+}
+
+[test]
+def test_logical_and(t : T?) {
+    var _x = 5
+    var _y = 10
+    t |> equal(_test_has_sideeffects(_x > 0 && _y < 100), false)
+}
+
+[test]
+def test_unary_neg(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(-_x), false)
+}
+
+[test]
+def test_string_builder_safe(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects("hello {_x}"), false)
+}
+
+// ── UNSAFE cases — has_sideeffects must return true ──────────────────────
+
+[test]
+def test_user_call_unsafe(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(side_effect_fn(_x)), true)
+}
+
+[test]
+def test_table_insert_subscript(t : T?) {
+    var _tab : table<string; int>
+    // _tab[k] auto-inserts a default value if k is missing — side effect.
+    t |> equal(_test_has_sideeffects(_tab["k"]), true)
+}
+
+[test]
+def test_division_unsafe(t : T?) {
+    var _x = 10
+    var _y = 2
+    // `/` can panic on div-by-zero — kept on the unsafe side by explicit blacklist.
+    t |> equal(_test_has_sideeffects(_x / _y), true)
+}
+
+[test]
+def test_modulo_unsafe(t : T?) {
+    var _x = 10
+    var _y = 3
+    t |> equal(_test_has_sideeffects(_x % _y), true)
+}
+
+[test]
+def test_array_literal_alloc(t : T?) {
+    t |> equal(_test_has_sideeffects([1, 2, 3]), true)
+}
+
+[test]
+def test_struct_construct_alloc(t : T?) {
+    t |> equal(_test_has_sideeffects(Foo(a = 1, b = 2)), true)
+}
+
+[test]
+def test_string_builder_unsafe_part(t : T?) {
+    var _x = 5
+    // The string interpolation itself is safe, but a side-effecting operand propagates.
+    t |> equal(_test_has_sideeffects("hello {side_effect_fn(_x)}"), true)
+}
+
+// ── Conservative-unsafe cases — daslang-generic helpers fall through ─────
+//
+// `length`, `key_exists`, etc. are defined as daslang generics in builtin.das
+// (`def length(a : auto | #) ...`). The compile-time func resolution doesn't
+// always reach a `flags.builtIn=true` C++ overload before the call_macro runs,
+// so the conservative classifier rejects them. A future Function-level
+// `[no_side_effects]` annotation could let user-defined helpers opt in.
+
+[test]
+def test_generic_length_unresolved(t : T?) {
+    var _arr = [1, 2, 3]
+    t |> equal(_test_has_sideeffects(length(_arr)), true)
+}
+
+[test]
+def test_key_exists_unresolved(t : T?) {
+    var tab : table<string; int>
+    tab |> insert("k", 1)
+    t |> equal(_test_has_sideeffects(key_exists(tab, "k")), true)
+}

From f77a072c570b65e910423cc8f65d37fc446c705e Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 16:17:07 -0700
Subject: [PATCH 09/18] mouse-data/docs: 16 new + 1 updated card from linq_fold
 + Phase 2A session
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cards added in the course of the linq_fold splice rewrite + PR #2691
(has_sideeffects + counter-lane elision). Topics:

linq_fold / macro-emission patterns:
- daslang-generic-instance-detect-via-fromgeneric — func.fromGeneric is
  the canonical "which generic was this instantiated from?" link;
  func.name on typed instances is mangled.
- daslib-macro-boost-has-sideeffects-predicate — new public predicate,
  full classification table, known limitations, test plumbing.
- qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array — typedecl
  block-param const/ref handling differs between iterator and array
  sources; the two diagnostic error messages tell you which branch you
  picked wrong.
- qmacro-gensym-per-callsite-via-lineinfo — backtick-prefixed names +
  line+column suffix, force_at / force_generated / can_shadow.
- my-fold-macro-emits-a-loop-with-for-it-in-source-... (UPDATED) —
  peel_each pattern corrected for generic-instance detection + positive
  array gate + block-param typedecl handling.

LINQ semantics:
- are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-...
- which-typedecl-predicates-identify-types-where-length-expr-is-...
- why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-...
- what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-...
- my-macro-substitutes-it-for-a-projection-expression-via-template-...
- when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-...
- where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-...

Tooling / ops:
- how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-...
- cpp-profiler-macos-samply-instruments.md
- what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-...
- how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-...

Updated:
- why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-...
  — original card pointed at a require-order red herring; real cause
  was ref_time_ticks() returning ns on POSIX while wait_until_ready's
  deadline math assumed μs. Fix landed in PR #2685.

No code changes — docs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 ...output-to-the-underlying-linq-operators.md | 37 ++++++++
 .../cpp-profiler-macos-samply-instruments.md  | 68 ++++++++++++++
 ...generic-instance-detect-via-fromgeneric.md | 33 +++++++
 ...b-macro-boost-has-sideeffects-predicate.md | 43 +++++++++
 ...pointer-e-g-addfontfromfilettf-on-getio.md | 66 ++++++++++++++
 ...mode-and-what-s-the-command-line-syntax.md | 45 +++++++++
 ...e-doesn-t-fire-when-the-chain-starts-wi.md | 56 ++++++++++++
 ...-apply-template-but-the-result-fails-to.md | 24 +++++
 ...qmacro-gensym-per-callsite-via-lineinfo.md | 43 +++++++++
 ...ce-bind-typedecl-modifier-iter-vs-array.md | 46 ++++++++++
 ...daslib-das-module-so-docs-build-cleanly.md | 46 ++++++++++
 ...tors-aren-t-supported-as-sql-chain-term.md | 39 ++++++++
 ...f-typeinfo-is-workhorse-e-proj-or-decid.md | 33 +++++++
 ...res-at-the-user-s-call-site-rather-than.md | 36 ++++++++
 ...statically-resolvable-in-daslang-macros.md | 63 +++++++++++++
 ...-what-s-the-alternative-in-a-linq-chain.md | 36 ++++++++
 ...status-works-fine-is-it-a-require-order.md | 91 +++++++++++--------
 17 files changed, 769 insertions(+), 36 deletions(-)
 create mode 100644 mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md
 create mode 100644 mouse-data/docs/cpp-profiler-macos-samply-instruments.md
 create mode 100644 mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md
 create mode 100644 mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md
 create mode 100644 mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md
 create mode 100644 mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md
 create mode 100644 mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md
 create mode 100644 mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md
 create mode 100644 mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md
 create mode 100644 mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md
 create mode 100644 mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md
 create mode 100644 mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md
 create mode 100644 mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md
 create mode 100644 mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md
 create mode 100644 mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md
 create mode 100644 mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md

diff --git a/mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md b/mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md
new file mode 100644
index 0000000000..551cbb2b0e
--- /dev/null
+++ b/mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md
@@ -0,0 +1,37 @@
+---
+slug: are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators
+title: Are there parity tests in tests/linq/ that compare `_fold` output to the underlying linq operators?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+There's no file named "parity" or similar. The parity-test surface IS the broader [tests/linq/](tests/linq/) directory:
+
+- `test_linq.das` — comprehension basics
+- `test_linq_aggregation.das` — count/sum/min/max/avg
+- `test_linq_querying.das` — any/all/contains
+- `test_linq_transform.das` — select / select_many / zip
+- `test_linq_sorting.das` — order / reverse
+- `test_linq_group_by.das` — group_by / having
+- `test_linq_join.das` — joins
+- `test_linq_partition.das` — take / skip / chunk / take_while / skip_while
+- `test_linq_set.das` — distinct / union / except / intersect / unique
+- `test_linq_element.das` — first / last / single / element_at
+- `test_linq_concat.das` — concat / prepend / append
+- `test_linq_generation.das` — range / repeat
+- `test_linq_bugs.das` — regressions
+
+Each file uses `[test]` functions with `t |> run("name") @(t) { ... }` blocks asserting `t |> equal(actual, expected)`. These exercise the regular linq operators (`where_`, `select`, `count`, ...) directly — they're not split into "fold-on" vs "fold-off" variants.
+
+Dedicated `_fold` tests live in `test_linq_fold.das` (functional output) and `test_linq_fold_ast.das` (AST-shape verification — pattern-matches the macro expansion). These DO compare `_fold(chain)` output against the plain `chain` output for the shapes the macro recognizes.
+
+When the user says "parity tests" in linq context, treat the full `test_linq_*.das` suite as the operator-coverage map. Phase-2+ benchmark/splice PRs should add a `benchmarks/sql/` entry for each shape exercised here that isn't already covered (tracked as a checklist in `benchmarks/sql/LINQ.md`).
+
+## Questions
+- Are there parity tests in tests/linq/ that compare `_fold` output to the underlying linq operators?
+- What's the "parity test" coverage surface for linq?
+- Where are tests for linq operators?
+
+## Questions
+- Are there parity tests in tests/linq/ that compare `_fold` output to the underlying linq operators?
diff --git a/mouse-data/docs/cpp-profiler-macos-samply-instruments.md b/mouse-data/docs/cpp-profiler-macos-samply-instruments.md
new file mode 100644
index 0000000000..db57cabc9b
--- /dev/null
+++ b/mouse-data/docs/cpp-profiler-macos-samply-instruments.md
@@ -0,0 +1,68 @@
+---
+slug: cpp-profiler-macos-samply-instruments
+title: What C++ sampling profiler should I use on macOS for daslang (and how do I run it)?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+# C++ sampling profiler on macOS (Apple Silicon)
+
+VS Code has **no first-class C++ profiler integration on macOS** — the "Performance Profiler" / similar extensions wrap Linux `perf` and don't help here. Skip them. Run a sampler from the integrated terminal and view results in browser/Instruments.
+
+## samply (default choice)
+
+Rust-built, Firefox-Profiler frontend, zero config.
+
+```bash
+cargo install samply
+samply record ./build/daslang script.das
+```
+
+- Opens flamegraph in browser automatically.
+- Symbolicates Mach-O cleanly if you build `-DCMAKE_BUILD_TYPE=RelWithDebInfo` (do NOT use plain `Release` — symbols are stripped).
+- Works without sudo on Apple Silicon.
+- Good for "where does the CPU go" questions.
+
+## Xcode Instruments — Time Profiler (second opinion)
+
+Native macOS sampler, kernel-assisted, best symbolication on Apple Silicon. Use when samply's view is ambiguous or you want call-tree + timeline together.
+
+```bash
+xcrun xctrace record --template 'Time Profiler' --launch -- ./build/daslang script.das
+```
+
+Then open the resulting `.trace` bundle (Instruments launches). UI is outside VS Code.
+
+## daslang-specific recipe
+
+Pair the sampler with the per-module compile-time logging (`-log-compile-time` CLI flag, added on branch `bbatkin/log-compile-time-cli`):
+
+```bash
+cmake --build build --config RelWithDebInfo -j 64
+samply record ./build/daslang -log-compile-time path/to/script.das
+```
+
+- `-log-compile-time` tells you which module is slow.
+- Sampling tells you which function inside that module is hot.
+- Together they narrow "compile is slow" to a specific phase + symbol.
+
+## What NOT to use
+
+- `perf` — Linux only, doesn't exist on Darwin.
+- Intel VTune — x86-mostly, ignore on Apple Silicon.
+- `gprof` — instrumenting, not sampling; ancient.
+- VS Code C++ profiler extensions — see above, all are Linux/perf wrappers or toys.
+- `hyperfine` / `poop` — benchmarking (whole-program timing), not profiling (per-function hotspots). Different question.
+
+## Build flag reminder
+
+Both samply and Instruments need symbols. The two build types in this repo that carry symbols:
+
+- `RelWithDebInfo` — fast code + symbols. Use this for profiling.
+- `Debug` — slow code; profile reflects debug overhead, not real hotspots. Avoid.
+
+Plain `Release` strips symbols and you'll get `???` everywhere in the flamegraph.
+
+## Questions
+- What C++ sampling profiler should I use on macOS for daslang (and how do I run it)?
diff --git a/mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md b/mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md
new file mode 100644
index 0000000000..922061d8e7
--- /dev/null
+++ b/mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md
@@ -0,0 +1,33 @@
+---
+slug: daslang-generic-instance-detect-via-fromgeneric
+title: How do I detect that an ExprCall is to a daslang generic (e.g. each, length, find) when func.name is the mangled instance name and not the original generic's name?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+When a daslang generic function (`def each(a : array<auto(TT)>) : iterator<TT&>`, `def length(a : auto | #) : int`, etc.) is resolved against a concrete type at infer time, the resolved `Function?` instance gets a **mangled name** like `` `builtin`each`30908`12345 ``. Macro code that compares `call.func.name == "each"` will never match a typed instance.
+
+The original generic's identity lives in `call.func.fromGeneric`:
+
+```das
+[macro_function]
+def private is_each_call(call : ExprCall?) : bool {
+    if (call == null || call.func == null) return false
+    return (call.func.name == "each"
+        || (call.func.fromGeneric != null && call.func.fromGeneric.name == "each"))
+}
+```
+
+The `name == "each"` branch covers the unusual case where you see the call before the typer has specialized it (e.g. inside a custom call_macro that runs early). The `fromGeneric.name` branch is the normal case for any post-infer chain.
+
+**When this bites:** writing a `[macro_function]` that pattern-matches on a stdlib helper by name — `each`, `length`, `key_exists`, `find`, `set_insert`, all the generic `to_array`/`to_table` variants. Without the `fromGeneric` check, every typed chain silently falls through your match and your macro behaves as if the helper wasn't there.
+
+**Generalizes beyond function calls:** the same applies to method overload resolution. `call.func.fromGeneric` is the canonical "which generic was this instantiated from?" link. There's no `originalName` field — the chain is `func → func.fromGeneric → fromGeneric.name`.
+
+**Doesn't apply to:** C++ builtins from `addExtern<>` (no fromGeneric, the `func.name` is the bound name directly). Builtins also have `func.flags.builtIn = true` if you need to distinguish.
+
+See [[my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi]] for the concrete case where this broke `peel_each` in `daslib/linq_fold.das`.
+
+## Questions
+- How do I detect that an ExprCall is to a daslang generic (e.g. each, length, find) when func.name is the mangled instance name and not the original generic's name?
diff --git a/mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md b/mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md
new file mode 100644
index 0000000000..5d551d415c
--- /dev/null
+++ b/mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md
@@ -0,0 +1,43 @@
+---
+slug: daslib-macro-boost-has-sideeffects-predicate
+title: Is there a conservative side-effect detector for Expression nodes in daslib macro_boost — something I can call from a call_macro to know if it's safe to elide an evaluation at macro time?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Yes — `has_sideeffects(expr : Expression?) : bool` in `daslib/macro_boost` (added in PR #2691, follow-up to Phase 2A loop planner). Returns `true` if the expression has or **might have** side effects; `false` ONLY when provably pure.
+
+```das
+require daslib/macro_boost public
+
+if (has_sideeffects(projection)) {
+    // Emit the bind — projection must run for its side effects.
+    sideEffectStmts |> push <| qmacro_expr() {
+        var $i(finalBindName) = $e(projection)
+    }
+} else {
+    // Skip the bind — pure projection, no observable effect.
+}
+```
+
+**Conservative — false is a promise:**
+
+- SAFE leaves: `ExprVar`, all `ExprConst*`, `ExprAddr`, `ExprTypeInfo/Decl/Tag`.
+- SAFE via recursion: `ExprField`, `ExprSafeField`, `ExprSwizzle`, `ExprRef2Value/Ptr`, `ExprPtr2Ref`, `ExprIs`, `ExprAsVariant`, `ExprIsVariant`, `ExprSafeAsVariant`, `ExprCast`, `ExprNullCoalescing`, `ExprStringBuilder` (string heap is no-op per compiler), `ExprKeyExists` (pure container read).
+- `ExprAt`: safe when `subexpr._type` is NOT `isGoodTableType` (tables auto-insert default on missing key — a write). `ExprSafeAt` (`?[...]`) always safe.
+- `ExprOp1/Op2/Op3`: op-name allowlist for pure ops on workhorse types — `+ - * == != < <= > >= & | ^ << >> && || ?:` (Op2), `- ! ~ +` (Op1). Falls back to `func.flags.builtIn && !knownSideEffects && !unsafeOperation`. `/` and `%` BLACKLISTED (div-by-zero panic).
+- `ExprCall`/`ExprCallFunc`: allowed when `func.flags.builtIn && !knownSideEffects && !unsafeOperation`, then recurse args.
+- Everything else (including `ExprNew`, all `ExprMake*`, user-defined calls, `ExprInvoke`, `ExprYield`, statement-context exprs): UNSAFE.
+
+**Known limitations / when it returns conservative-unsafe:**
+
+- daslang-generic helpers like `length(arr)` and `key_exists(tab, k)` — the resolved `func.name` is the mangled instance, and the typer hasn't always reached the `flags.builtIn=true` C++ overload before the call_macro fires. They show up as user-call shapes and get rejected. Workaround: don't rely on this for length/key_exists in projections (they appear in `has_sideeffects` tests as `target_generic_length_unresolved` / `target_key_exists_unresolved` returning `true`).
+- User-defined pure helpers — there's no `[no_side_effects]` annotation yet. The compiler's `expr.flags.noSideEffects` fast path catches some cases (set during infer), but anything the typer didn't tag falls through to UNSAFE.
+
+**Tests:** `tests/macro_boost/test_has_sideeffects.das` has 24 cases (17 safe + 5 unsafe + 2 conservative-unsafe) wired via a `_test_has_sideeffects` probe `call_macro` ([`tests/macro_boost/_has_sideeffects_probe.das`](../../tests/macro_boost/_has_sideeffects_probe.das)) that runs the predicate at macro time and emits `ExprConstBool` of the result. Use the same probe pattern when testing any new predicate that needs to run at macro time but be exercised via runtime tests.
+
+**Real use:** `daslib/linq_fold.das` `plan_loop_or_count` uses it for three optimizations: discardable `var vfinal =` bind elision, count→length shortcut gate (whole loop elided when no filter + all projections pure + source has length), and tracking `allProjectionsPure` across chained selects. select_count benchmark went from 2 → 0 ns/op.
+
+## Questions
+- Is there a conservative side-effect detector for Expression nodes in daslib macro_boost — something I can call from a call_macro to know if it's safe to elide an evaluation at macro time?
diff --git a/mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md b/mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md
new file mode 100644
index 0000000000..450226dba8
--- /dev/null
+++ b/mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md
@@ -0,0 +1,66 @@
+---
+slug: how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio
+title: How do I call a dasImgui (or any managed C++) method on a struct field that's bound as a raw pointer — e.g. AddFontFromFileTTF on GetIO().Fonts?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+## TL;DR
+
+When a managed struct's field is bound as a pointer (`T?`) and the method on that pointed-to struct expects the value by-ref (`T implicit`), you must explicitly **dereference**. Plain `field |> method(...)` errors with mismatched types.
+
+## The error you'll hit
+
+```
+error[30341]: no matching functions or generics: AddFontFromFileTTF(imgui::ImFontAtlas?&, string const&, ...)
+  candidate function: ImFontAtlas implicit ...
+  invalid argument 'self' (0). expecting 'imgui::ImFontAtlas implicit', passing 'imgui::ImFontAtlas?&'
+```
+
+The `?` is the giveaway — `GetIO().Fonts` is `ImFontAtlas?` (raw pointer; field bound via `addField<DAS_BIND_MANAGED_FIELD(Fonts)>` against C++ `ImFontAtlas* Fonts`), but the method binding `das_call_member< ImFont * (ImFontAtlas::*)(...) >` takes the receiver by-value/ref.
+
+## The fix
+
+Bind a local ref through `unsafe(*ptr)`, then call as usual:
+
+```daslang
+var atlas & = unsafe(*GetIO().Fonts)
+let f = atlas |> AddFontFromFileTTF(ttf, 14.0f, null, null)
+```
+
+Equivalent inline form:
+
+```daslang
+unsafe(*GetIO().Fonts) |> AddFontFromFileTTF(ttf, 14.0f, null, null)
+```
+
+## Why each part
+
+- **`*ptr`** is daslang's pointer-deref syntax (see `daslib/if_not_null.rst`: *"a dereferenced call: ``if (ptr != null) { call(*ptr, args) }``"*). The alternative `deref(ptr)` exists too but is rarer in modules; `*` is the idiom.
+- **`unsafe(...)`** is required because dereferencing a raw `T?` is unsafe (no null check, no lifetime guarantee).
+- **`var atlas &`** binds a local *reference* — without `&` you'd be copying the whole `ImFontAtlas` struct into a stack temporary, which (a) wastes memory and (b) means any mutation the method does (font atlas builds, glyph rasterization) hits the copy and is lost.
+- **The pipe `|>` works fine on the local ref** — `atlas |> method(x, y)` desugars to `method(atlas, x, y)` and the `implicit` first-param accepts the ref directly.
+
+## Why NOT the other shapes
+
+- `GetIO().Fonts.AddFontFromFileTTF(...)` — `.method()` is sugar for `method(self, ...)` only when `self` is a struct value. CLAUDE.md explicitly: *"Does NOT work on: primitives, tuples/arrays, and lambda typedefs"* — and (this case) raw pointers. Field *access* on a pointer auto-derefs (`GetIO().Fonts.TexID` works); method dispatch does not.
+- `GetIO().Fonts->AddFontFromFileTTF(...)` — `->` is for class instances (smart_ptr / class types), not raw C-struct pointers from `ManagedStructureAnnotation`.
+- `deref(GetIO().Fonts) |> AddFontFromFileTTF(...)` — works, but the pipe receives a temporary value rather than a ref, so mutations on the receiver are lost. Use `var x & = unsafe(*p)` instead.
+
+## When this comes up
+
+Anywhere a C++ binding exposes a struct field as `T*` (typical for "owns-an-atlas" or "owns-a-context" patterns):
+- `ImGuiIO::Fonts` → `ImFontAtlas?`
+- `ImDrawData::CmdLists` → indirection on lists
+- anything bound via raw `addField<DAS_BIND_MANAGED_FIELD(X)>` where the C++ type is `Foo*`
+
+If the C++ field were a value (`ImFontAtlas Fonts;` instead of `ImFontAtlas* Fonts;`), it'd bind as the struct directly and the pipe would just work.
+
+## Related
+
+- [[dasimgui-new-state-struct-widget-auto-emit-just-works]] — different topic (state-struct registration) but same module family.
+- [[how-do-i-pack-an-im-col32-color-from-dasimgui-v2-code-without-depending-on-the-v1-daslib-imgui-boost-path]] — sibling dasImgui idiom.
+
+## Questions
+- How do I call a dasImgui (or any managed C++) method on a struct field that's bound as a raw pointer — e.g. AddFontFromFileTTF on GetIO().Fonts?
diff --git a/mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md b/mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md
new file mode 100644
index 0000000000..014873daea
--- /dev/null
+++ b/mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md
@@ -0,0 +1,45 @@
+---
+slug: how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax
+title: How do I run dastest in benchmark-only mode and what's the command-line syntax?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Benchmarks are functions annotated with `[benchmark]` from `dastest/testing_boost.das`. Run them via the dastest harness with `--bench`:
+
+```bash
+# All benchmarks in a directory (skip the regular tests)
+./bin/daslang dastest/dastest.das -- --bench --test benchmarks/sql --test-names none
+
+# Just one file
+./bin/daslang dastest/dastest.das -- --bench --test benchmarks/sql/count_aggregate.das --test-names none
+
+# Filter by [benchmark] function-name prefix (substring match on the function name)
+./bin/daslang dastest/dastest.das -- --bench --bench-names sum_ --test benchmarks/sql --test-names none
+
+# Collect N samples for variance / averaging
+./bin/daslang dastest/dastest.das -- --bench --test benchmarks/sql/count_aggregate.das --test-names none --count 5
+```
+
+Key flags:
+- `--bench` — enable benchmark execution
+- `--test <path>` — folder or single file (NOT positional)
+- `--test-names none` — skip regular `[test]` discovery (benchmarks only)
+- `--bench-names <prefix>` — filter benchmarks by function-name prefix
+- `--bench-format <native|go|json>` — output format
+- `--count <n>` — repeat all benchmarks N times
+
+Benchmarks only run after all module **tests** have passed; that's why `--test-names none` is the canonical "skip tests, run benchmarks" combo.
+
+Output is `<sub_name> N ns/op <bytes>/op <allocs>/op <SB>/op <strings>/op`. If the benchmark `b |> run(name, chunk_size, op)` form passes a chunk_size (typically the dataset size), the displayed ns/op is **divided by that chunk_size** — i.e. per-element time, not per-op-call time. Sub-nanosecond results (`0 ns/op`) usually mean early-exit hit the answer in O(1) regardless of dataset size.
+
+Reference: `dastest/README.md` and `dastest/dastest_clargs.das`.
+
+## Questions
+- How do I run dastest in benchmark-only mode and what's the command-line syntax?
+- What's the dastest --bench command line?
+- How do I filter dastest benchmarks by name?
+
+## Questions
+- How do I run dastest in benchmark-only mode and what's the command-line syntax?
diff --git a/mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md b/mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md
new file mode 100644
index 0000000000..456c78e348
--- /dev/null
+++ b/mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md
@@ -0,0 +1,56 @@
+---
+slug: my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi
+title: My fold macro emits a loop with `for (it in source); acc |> reserve(length(source))` but the reserve doesn't fire when the chain starts with `each(arr)`. How do I make it work?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: [[daslang-generic-instance-detect-via-fromgeneric]]
+---
+
+Peel `each(<expr>)` at macro time. `each(arr)` reports as `iterator<T>`, so any "is the source an iterator?" check (e.g. `top._type.isIterator`) sees `true` and the array-only reserve path is skipped. But the iteration semantics of `for (it in each(arr))` and `for (it in arr)` are identical — the wrapper iterator is incidental in fold context.
+
+Pattern (corrected version, from `daslib/linq_fold.das` after Phase 2A bind-elision PR):
+
+```das
+[macro_function]
+def private is_each_call(call : ExprCall?) : bool {
+    // `each` in daslib/builtin.das is generic — the resolved `func.name`
+    // on a typed instance is mangled (e.g. `builtin`each`30908...`).
+    // The original generic's name lives in `func.fromGeneric.name`.
+    if (call == null || call.func == null) return false
+    return (call.func.name == "each"
+        || (call.func.fromGeneric != null && call.func.fromGeneric.name == "each"))
+}
+
+[macro_function]
+def private peel_each(var top : Expression?) : Expression? {
+    if (!(top is ExprCall)) return top
+    var topCall = top as ExprCall
+    if (!is_each_call(topCall) || topCall.arguments |> length != 1) return top
+    let argExpr = topCall.arguments[0]
+    // Only peel when the argument is a true array (or fixed-size array).
+    // Don't peel iterator-typed args like `each(range(10))` — replacing the
+    // each call with the raw range would break length-shortcut + reserve
+    // hints that assume an indexable source.
+    if ((argExpr == null || argExpr._type == null)
+            || (!argExpr._type.isGoodArrayType && !argExpr._type.isArray)) return top
+    return clone_expression(argExpr)
+}
+```
+
+**Two gotchas the original version missed:**
+
+1. `func.name == "each"` never matched typed instances — generic-instance detection requires `fromGeneric.name`. See [[daslang-generic-instance-detect-via-fromgeneric]].
+2. Peel gate must be **positive** (`is good array`) not negative (`isn't iterator`). `each(range(N))` returns an iterator but its argument `range(N)` is also iterator-shaped (`isRange`) and would otherwise pass `!isIterator`. The positive `isGoodArrayType || isArray` gate cleanly excludes range/string/lambda sources.
+
+**Block-parameter typedecl needs branching on source shape after peel.** When peel fires, the source goes from iterator (rvalue, no modifiers) to array (`array<T> const&` for `let arr <-` chains). The block parameter type:
+- iterator source: `typedecl($e(topExpr)) - const` — strip rvalue const so body can iterate
+- array source: `typedecl($e(topExpr))` (no modifier) — keep `const&` so const-ref source matches
+
+Get either branch wrong and you hit one of two errors: an `array<int> const& vs array<int>` mismatch (array source with `- const`) or `can't iterate over const iterator` (iterator source without it).
+
+**What this is worth:** brought `linq_fold`'s `each(arr)._where(...)._select(_.price).to_array()` benchmark from 13 → 10 ns/op (parity with comprehension baseline). The count→length shortcut built on top brings pure `each(arr)._select(_.x).count()` from 2 → 0 ns/op (loop entirely elided).
+
+**Generalizes:** in any fused-loop emitter that needs the source's length (reserve, two-pass, length-aware operators like `take_last`), peel inner-array-yielding wrappers, but use `fromGeneric` for generic helpers and a positive array gate, not a negative iterator gate.
+
+## Questions
+- My fold macro emits a loop with `for (it in source); acc |> reserve(length(source))` but the reserve doesn't fire when the chain starts with `each(arr)`. How do I make it work?
diff --git a/mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md b/mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md
new file mode 100644
index 0000000000..9652ba3436
--- /dev/null
+++ b/mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md
@@ -0,0 +1,24 @@
+---
+slug: my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to
+title: My macro substitutes `it` for a projection expression via `Template.replaceVariable("it", proj) + apply_template`, but the result fails to compile with "can only dereference a reference". What's going wrong?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Post-typer, reads of a `var` local appear wrapped as `ExprRef2Value(ExprVar(name))` — the invisible adapter the typer inserts to dereference a reference for its value. `templates_boost.TemplateVisitor.visitExprVar` (the engine behind `Template.replaceVariable + apply_template`) only matches the inner `ExprVar` and replaces IT with a clone of the substitute. The outer `ExprRef2Value` wrapper stays, but now it wraps a non-reference value — compile error `30921: can only dereference a reference`.
+
+This is the same `ExprRef2Value`-transparency problem `daslib/ast_match.das` documents for `qmatch` — they solve it on both pattern and source sides via `qm_peel_ref2value`. `apply_template` does NOT auto-peel.
+
+Two fixes for substitution:
+
+1. **Pre-peel the destination** before `apply_template`: walk `dst` and replace every `ExprRef2Value(ExprVar(name))` with the inner `ExprVar(name)` first. After substitution, the result is clean. Drawback: removes wrappers globally (around other identifiers too) — if other refs still need the wrapper, the typer will re-insert them, but you've added a pass.
+
+2. **Use a custom visitor instead of `Template.replaceVariable`**: override `visitExprRef2Value` to detect `ExprRef2Value(ExprVar(name))` and return `clone_expression(replacement)` directly (stripping the wrapper as part of the substitution). Override `visitExprVar` as a fallback for bare ExprVars. The pattern mirrors `qm_peel_ref2value`'s "peel both sides" approach.
+
+Concrete repro: daslang `linq_fold`'s Phase 2A planner tried to fuse chained `_select|_select` via `substitute_it_for(proj2, "it", proj1)`. proj1 was `it * 2` (where `it` is the typed-and-wrapped loop var), proj2 was `it + 1`. Substituting via Template replaced the inner ExprVar in proj2 but left `ExprRef2Value(it * 2) + 1` — type error. The fix was deferred (chained-select falls through unfolded in Phase 2A) but Phase 2B needs option 2.
+
+See `skills/das_macros.md` "Peel ExprRef2Value before qmatch" for the matcher-side analog. The substitution side has no in-tree helper yet.
+
+## Questions
+- My macro substitutes `it` for a projection expression via `Template.replaceVariable("it", proj) + apply_template`, but the result fails to compile with "can only dereference a reference". What's going wrong?
diff --git a/mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md b/mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md
new file mode 100644
index 0000000000..515d4d51dd
--- /dev/null
+++ b/mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md
@@ -0,0 +1,43 @@
+---
+slug: qmacro-gensym-per-callsite-via-lineinfo
+title: How do I generate a uniquely-named gensym inside an AstCallMacro for a per-call-site variable, using LineInfo?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Use the call site's `LineInfo.line` + `.column` interpolated into a backtick-prefixed identifier string. Backtick-prefixed names live in a separate namespace so they don't collide with user-typed identifiers and they survive lint/style passes.
+
+```das
+def override visit(prog : ProgramPtr; mod : Module?; var call : ExprCallMacro?) : Expression? {
+    let at = call.at  // LineInfo of the call site
+    let accName = "`acc`{at.line}`{at.column}"
+    let itName  = "`it`{at.line}`{at.column}"
+    let srcName = "`src`{at.line}`{at.column}"
+
+    var res = qmacro(invoke($($i(srcName) : typedecl($e(src))) {
+        var $i(accName) = 0
+        for ($i(itName) in $i(srcName)) {
+            // ...
+        }
+        return $i(accName)
+    }, $e(src)))
+    res.force_at(at)
+    res.force_generated(true)
+    return res
+}
+```
+
+Two follow-up steps you almost always want:
+
+1. `res.force_at(at)` + `res.force_generated(true)` — sets `at = call.at` on every emitted node and marks them macro-generated. The latter bypasses lint rules that would otherwise fire on synthesized code (e.g. STYLE001, LINT002 "unused variable").
+2. `(blk._block as ExprBlock).arguments[0].flags.can_shadow = true` on the bound let-variable — quiets shadow warnings if the user already has an `acc`/`it`/`src` in scope. Reach for `.flags.can_shadow` on any qmacro-bound name that might collide with caller context.
+
+**Why include both line AND column:** macros emitted from nested helpers can have several emission sites on the same line (e.g. piped chains where each `|>` step emits a separate gensym). Line alone is not unique.
+
+**Why backtick prefix:** the backtick is a daslang lexer hint that this is an internal/synthesized name. Without it, very long generated names sometimes clash with user identifiers or trip naming rules (the formatter, the auto-rename tools).
+
+**Worked example:** `daslib/linq_fold.das` `plan_loop_or_count` — multiple gensyms per emission site (accumulator, iterator, source, bound projection). Variants per fold-helper too (`fold_where_count` uses `nName` over `accName`).
+
+## Questions
+- How do I generate a uniquely-named gensym inside an AstCallMacro for a per-call-site variable, using LineInfo?
diff --git a/mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md b/mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md
new file mode 100644
index 0000000000..a1193075a1
--- /dev/null
+++ b/mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md
@@ -0,0 +1,46 @@
+---
+slug: qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array
+title: In a call_macro that emits an `invoke($($i(src) : typedecl($e(topExpr)) <modifier>) { ... }, $e(topExpr))` wrapper, what `<modifier>` do I use so the param matches both array and iterator sources without const/ref mismatches?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+There is no single modifier that works for both — branch on `top._type.isIterator`:
+
+```das
+if (top._type != null && top._type.isIterator) {
+    // Iterator source — rvalue from a function call like each(range(10)).
+    // typedecl() picks up the function-return type which carries const;
+    // -const strips it so the body can `for (it in src)` (otherwise
+    // daslang complains "can't iterate over const iterator").
+    res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+        // ... body uses $i(srcName) ...
+    }, $e(topExpr)))
+} else {
+    // Container source with length — array/table/string/range/fixed-array.
+    // `let arr <- [...]` is `array<T> const&`. Stripping -const would
+    // produce a non-const-ref param; passing the const-ref source then
+    // fails with `array<T> const& vs array<T>` ("can't ref types
+    // can only add constness"). Keep modifiers — typedecl() preserves
+    // them and the const-ref source matches exactly.
+    res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) {
+        // ... body uses $i(srcName) ...
+    }, $e(topExpr)))
+}
+```
+
+The two error messages are diagnostic — they tell you which branch you're on:
+- `can't iterate over const iterator` → you forgot `-const` on an iterator path
+- `array<T> const& vs array<T> ... can't ref types can only add constness` → you have `-const` on an array path
+
+**Why this is needed in the first place:** the block param is your way to bind the source expression to a stable name so the loop body can reference it once without re-evaluating side effects. The "right" param type is "whatever the source actually is" — but qmacro `typedecl(expr)` produces the raw type-of including const-ref from the call return, which only sometimes matches what the consumer needs.
+
+**Use `top._type != null` guard** — `_type` is null for freshly cloned expressions that haven't gone through the typer yet. Treating null as "not iterator" (default to array branch) is wrong if you're past the typer; pick conservatively and call out the assumption.
+
+**See `daslib/linq_fold.das` `plan_loop_or_count`** for a working example with five emission sites — counter lane, array-lane iter/sourceHasLength/else, and the length-shortcut path that's only reachable when the source has length (so it always uses the no-modifier form).
+
+**Fast path for length-shortcut:** if you can emit `length($e(topExpr))` directly without the invoke wrapper, do that — no source-bind problem. Works when the entire body is one expression and the source's evaluation cost is "you'd evaluate it once anyway."
+
+## Questions
+- In a call_macro that emits an `invoke($($i(src) : typedecl($e(topExpr)) <modifier>) { ... }, $e(topExpr))` wrapper, what `<modifier>` do I use so the param matches both array and iterator sources without const/ref mismatches?
diff --git a/mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md b/mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md
new file mode 100644
index 0000000000..bba99ca5e3
--- /dev/null
+++ b/mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md
@@ -0,0 +1,46 @@
+---
+slug: what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly
+title: What's the end-to-end checklist for adding a new daslib/*.das module so docs build cleanly?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Four things to update, in order:
+
+**1. `doc/reflections/das2rst.das`** — add a `require daslib/<modname>` near the other daslib requires, write a `document_module_<modname>(root : string)` function modeled on a sibling (e.g. `document_module_linq_boost`), and call it from the dispatcher block near the end. Minimal form for a module with mostly-private internals:
+
+```daslang
+def document_module_my_new_module(root : string) {
+    var mod = find_module("my_new_module")
+    var groups : array<DocGroup>
+    document("Short description", mod, "my_new_module.rst", groups)
+}
+```
+
+For modules with many public functions, copy the `linq_boost` pattern and add `group_by_regex(...)` entries for each named group — anything left over lands in "Uncategorized" and **fails CI**.
+
+**2. `doc/source/stdlib/handmade/module-<modname>.rst`** — `das2rst` auto-creates this as `// stub\nModule <modname>`. Replace the **whole file** with a plain-text description (1-2 paragraphs, with a `.. code-block:: das` require + minimal example). See `module-linq.rst` / `module-linq_boost.rst` for the convention.
+
+**3. `doc/source/stdlib/sec_*.rst`** — find the section your module belongs in (e.g. `sec_algorithms.rst` for linq family, `sec_strings.rst` for strings, etc.) and add `generated/<modname>.rst` to its `.. toctree::`. Without this the page builds but isn't linked.
+
+**4. Regenerate + verify:**
+
+```bash
+./bin/daslang doc/reflections/das2rst.das        # picks up new module + handmade stub
+grep -rl "// stub" doc/source/stdlib/handmade/   # must be empty after step 2
+grep -c Uncategorized doc/source/stdlib/generated/*.rst | grep -v ':0$'  # must be empty
+rm -rf doc/sphinx-build site/doc                 # clean cache (cached builds hide warnings)
+sphinx-build -b html -d doc/sphinx-build doc/source site/doc 2>&1 | tee /tmp/sphinx_out.txt
+grep -iE "warning:|error:" /tmp/sphinx_out.txt   # must be empty
+```
+
+`doc/source/stdlib/generated/*.rst` and `generated/detail/*.rst` are **gitignored** — only commit (1) das2rst.das, (2) the handmade module-<modname>.rst, and (3) the sec_*.rst toctree update.
+
+## Questions
+- What's the end-to-end checklist for adding a new daslib/*.das module so docs build cleanly?
+- Where do I register a new daslib module in das2rst.das?
+- Why does my new module appear as `// stub` in the generated RST?
+
+## Questions
+- What's the end-to-end checklist for adding a new daslib/*.das module so docs build cleanly?
diff --git a/mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md b/mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md
new file mode 100644
index 0000000000..c37c00c27a
--- /dev/null
+++ b/mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md
@@ -0,0 +1,39 @@
+---
+slug: what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term
+title: What's the right sqlite_linq chain form for aggregates (sum/min/max/average), and what operators aren't supported as `_sql` chain terminals?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Column aggregates in `_sql` chains use the **regular linq function name** after a `_select`, NOT an `_aggregate(_.Col)` macro:
+
+```daslang
+// CORRECT — _sql analyzer recognizes `_select(_.Col) |> sum()` and emits SELECT SUM(price)
+let s = _sql(db |> select_from(type<Car>) |> _select(_.price) |> sum())
+let m = _sql(db |> select_from(type<Car>) |> _select(_.price) |> min())
+let a = _sql(db |> select_from(type<Car>) |> _select(_.price) |> average())  // promotes to double
+```
+
+There is no `_sum` / `_min` / `_max` / `_average` chain macro. The error if you try one is `error[30838]: can't locate variable '_'` because `_sum` doesn't dispatch as a call macro.
+
+The full set of `_sql` chain terminals is **`_to_array()`, `_first()`, `_first_opt()`, `count()`, and `sum()`/`min()`/`max()`/`average()` after a 1-column `_select`**. These are NOT supported as chain terminals:
+
+| Chain | Why not | Workaround |
+|---|---|---|
+| `_any()` (no args, terminal) | not implemented | `_first_opt() \|> is_some` |
+| `_all(pred)` | no SQL idiom recognized | invert: `_where(NOT pred) \|> count() == 0` |
+| `take(N) \|> count()` | LIMIT-after-aggregate has no effect (aggregate collapses to 1 row) | drop count, materialize: `take(N)` returns array, take `length()` |
+| `skip(M) \|> take(N) \|> count()` | same | same — terminate in to_array |
+| `distinct() \|> count()` | `COUNT(DISTINCT col)` not yet implemented | `distinct()` alone, then `length()` of result array |
+| `_sql(... \|> _join(select_from(type<T>), ...))` | inner `select_from` needs db handle wired inside the analyzer | omit m1 / use raw SQL string for join benchmarks |
+
+The error messages from `sqlite_linq.das` are explicit — read them, they spell out the alternative form. Pattern matching for these lives in `modules/dasSQLITE/daslib/sqlite_linq.das` `peel_column_aggregate` and `analyze_chain`.
+
+## Questions
+- What's the right sqlite_linq chain form for aggregates (sum/min/max/average), and what operators aren't supported as `_sql` chain terminals?
+- Why does `_sum(_.price)` fail in `_sql` with "can't locate variable '_'"?
+- How do I express `any`/`all`/distinct-count/take-count in `_sql`?
+
+## Questions
+- What's the right sqlite_linq chain form for aggregates (sum/min/max/average), and what operators aren't supported as `_sql` chain terminals?
diff --git a/mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md b/mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md
new file mode 100644
index 0000000000..50512f0506
--- /dev/null
+++ b/mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md
@@ -0,0 +1,33 @@
+---
+slug: when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid
+title: When a call_macro needs to pick copy-vs-move-init for a projection, should I emit `static_if (typeinfo is_workhorse($e(proj)))` or decide at macro time?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+Decide at macro time. By the time a `[call_macro]` `visit()` fires, inner macros have expanded and the typer has run, so every sub-expression carries a resolved `_type`. Read `projection._type.isWorkhorseType` directly and emit exactly one branch: no `static_if` or `typeinfo is_workhorse` in the emitted code, and less AST for the typer to fold away later.
+
+Pattern:
+
+```das
+let workhorseProj = projection._type != null && projection._type.isWorkhorseType
+var perElem : Expression?
+if (workhorseProj) {
+    perElem = qmacro_expr() { $i(accName) |> push($e(projection)) }
+} else {
+    perElem = qmacro_block() {
+        var $i(valName) <- $e(projection)
+        $i(accName) |> emplace($i(valName))
+    }
+}
+```
+
+For workhorse types (`int`, `float`, `bool`, `string`, …, anything `isWorkhorseType` returns true for) you can push the expression directly with no intermediate `var v = expr`. For non-workhorse, `<-` is a statement not an expression — you need `var v <- proj; acc |> emplace(v)`. The two-step is only required there.
+
+This trick brought daslang `linq_fold`'s `where|select|to_array` emission from 13 → 11 ns/op (parity with the `_old_fold` comprehension baseline) at 100K rows. See [daslib/linq_fold.das](daslib/linq_fold.das) `plan_loop_or_count` (the array lane). The previous version emitted a `static_if` inside the qmacro, which was correct but generated 2× the AST and lost the temp-binding optimization opportunity.
+
+Other `TypeDecl` predicates available at macro time: `isIterator`, `isGoodArrayType`, `isConst`, `isPod`, plus `firstType` / `secondType` / `argTypes` for compound types. Use them; the typer has already done the work.
+
+## Questions
+- When a call_macro needs to pick copy-vs-move-init for a projection, should I emit `static_if (typeinfo is_workhorse($e(proj)))` or decide at macro time?
diff --git a/mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md b/mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md
new file mode 100644
index 0000000000..e73aff48a7
--- /dev/null
+++ b/mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md
@@ -0,0 +1,36 @@
+---
+slug: where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than
+title: Where does `// nolint:RULE` go when a lint warning is emitted from inside a `qmacro_expr` and fires at the user's call site rather than at the macro source?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+The nolint comment must be **inline at the end of the offending line**, inside the `qmacro_expr {...}` block — NOT on a separate comment line above and NOT at the user call site.
+
+When a macro emits code via `qmacro_expr { var $i(name) = $e(expr) }`, lint analyzes the expansion at the user's call site but **reports the source position** as the line inside the qmacro_expr body. To suppress, the comment must travel with the emitted line:
+
+```daslang
+} else {
+    blk.list |> emplace_new <| qmacro_expr() {
+        var $i(newArgName) = $e(newCall)  // nolint:PERF009
+    }
+    ...
+}
+```
+
+What DOESN'T work:
+- `// nolint:PERF009` on a comment line above the qmacro_expr block — suppresses nothing.
+- `// nolint:PERF009` on the user-side `let x = _fold(...)` line — the lint engine reports against the macro source position, not the user site.
+
+The placement rule generalizes: `nolint:RULE` must be **on the literal line** that the lint output points at. For macro-quoted code, that's inside the `qmacro_expr { ... }` body.
+
+Concrete example: PERF009 ("redundant move into variable immediately returned") fired at `daslib/linq_fold.das:490:24` (a line inside `fold_linq_default`'s qmacro_expr) when called via `benchmarks/sql/take_count.das`'s single-pass chain. Inline `// nolint:PERF009` on the emitted `var = expr` line suppresses it cleanly.
+
+## Questions
+- Where does `// nolint:RULE` go when a lint warning is emitted from inside a `qmacro_expr` and fires at the user's call site rather than at the macro source?
+- nolint for macro-generated lint warnings
+- How to suppress a lint rule that fires only at certain user call sites?
+
+## Questions
+- Where does `// nolint:RULE` go when a lint warning is emitted from inside a `qmacro_expr` and fires at the user's call site rather than at the macro source?
diff --git a/mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md b/mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md
new file mode 100644
index 0000000000..b847fb36f8
--- /dev/null
+++ b/mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md
@@ -0,0 +1,63 @@
+---
+slug: which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros
+title: Which TypeDecl predicates identify types where length(expr) is statically resolvable in daslang macros?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+# Length-supporting types in daslang macros
+
+When a `[macro_function]` / `[call_macro]` needs to emit `length($e(src))` and have it compile, the source's `TypeDecl` must be one of:
+
+- `isGoodArrayType` — `array<T>` (the dynamic array, including `array<T>#`)
+- `isGoodTableType` — `table<K; V>`
+- `isString` — `string` / `string#`
+- `isArray` — fixed array `T[N]` (NOT `array<T>` — that's `isGoodArrayType`; the naming is confusing)
+- `isRange` — `range` / `urange` / `range64` / `urange64`
+
+**Excluded** (no `length()` overload — emitting `length(src)` will fail to compile inside macro output):
+
+- `isIterator` — iterators don't carry length, even when wrapping a length-having source. Use the underlying container.
+- `isGoodLambdaType` — `def each(lam : lambda<...>)` makes lambdas iterable, but they have no `length()`. This is a common trap when peeling `each(<x>)` based solely on "not an iterator."
+- Custom user `def each(MyType)` types — depends on whether the user also defined `length(MyType)`; assume no.
+
+## Canonical predicate
+
+```das
+[macro_function]
+def private type_has_length(t : TypeDecl?) : bool {
+    if (t == null) return false
+    return (t.isGoodArrayType || t.isGoodTableType || t.isString
+        || t.isArray || t.isRange)
+}
+```
+
+Note the parenthesization: a bare `||`-chain split across lines hits a `gen2` parse error at the leading `||`. Wrap the chain in `(...)`.
+
+## Why this matters for `each(<x>)` peeling
+
+A common optimization: when a chain starts `each(<x>)._where(...)...`, peel the `each` and iterate `<x>` directly so reserve/length work. The peel must be gated on `type_has_length(<x>._type)` — checking only `!isIterator` would silently accept `each(lambda)` and emit broken `reserve(length(lambda))`.
+
+Example from `daslib/linq_fold.das` (PR #2689, Phase 2A):
+
+```das
+[macro_function]
+def private peel_each_length_source(var top : Expression?) : Expression? {
+    if (!(top is ExprCall)) return top
+    var topCall = top as ExprCall
+    if (topCall.func == null || topCall.func.name != "each"
+            || topCall.arguments |> length != 1
+            || !type_has_length(topCall.arguments[0]._type)) return top
+    return clone_expression(topCall.arguments[0])
+}
+```
+
+The `clone_expression` is needed because `topCall.arguments[0]` is `Expression? const` (the args vector entry is const-typed even when the outer call is `var`); the planner stores `top` as `var Expression?` so the clone drops the const.
+
+## Discovery
+
+The set of `length()`-supporting types is not advertised as a single predicate anywhere; the list above was assembled from `mcp__daslang__describe_type TypeDecl` (the `isXxx` method list) cross-referenced with the `def length(...)` overloads in `daslib/builtin.das` and the `def each(...)` overloads. Lambda iterables surfaced as a Copilot review finding on PR #2689.
+
+## Questions
+- Which TypeDecl predicates identify types where length(expr) is statically resolvable in daslang macros?
diff --git a/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md b/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md
new file mode 100644
index 0000000000..cc8f58dc2a
--- /dev/null
+++ b/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md
@@ -0,0 +1,36 @@
+---
+slug: why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain
+title: Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+`each(arr)` returns an iterator that walks the array. Daslang's safety rules say that iterator is unsafe **unless it's directly consumed by a for-loop in the same scope** — passing it through `|>` chains, capturing it in a `let`, or handing it to a function argument all trip:
+
+```
+error[31013]: '__::builtin`each`...' is unsafe, when not source of the for-loop;
+              must be inside the 'unsafe' block
+```
+
+**Inside `_fold(...)`** the error doesn't fire because `_fold` is a macro that expands to a for-loop body where `each(arr)` IS the source. So `_fold(each(arr)._where(...).count())` compiles cleanly.
+
+**Outside a fold macro**, in a plain pipe chain, use the array directly — most `_<op>` call-macros (`AstCallMacro_LinqPred2`) accept both `iterator<T>` and `array<T>` for arg 0:
+
+| Doesn't work | Use instead |
+|---|---|
+| `let prices <- (each(arr) \|> _where(...) \|> _select_to_array(_.price))` (iterator outside `_fold`) | `let prices <- (arr \|> _where(...) \|> _select(_.price))` (an array source piped through the macros stays an array, so the result is `array<T>` and no `_to_array` suffix is needed) |
+| `let c = each(cars)._join(each(dealers), ...)` inside `_fold` (two `each()`s, one not the chain source) | `let c = _fold(cars \|> _join(dealers, ..., ...) \|> count())` — pass arrays directly |
+| `let r = each(arr) \|> ...` outside any fold | wrap in `unsafe(each(arr))`, OR start the chain with `arr` directly and let the macro handle iterator promotion |
+
+**Heuristic:** if the chain ends in a `_fold(...)` / `_old_fold(...)` wrapper or a for-loop, `each(arr)` works as the source. If the chain produces a value (or array) that escapes the expression — a `let`, a function return, the second arg to a macro — drop the `each()` and pass the array directly.
+
+The compile error points at the **specific** `each(arr)` call that escapes, so for multi-each chains (`_join`, `_zip`), check which side is the issue.
+
+## Questions
+- Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain?
+- error[31013] '__::builtin`each`' is unsafe — how to fix?
+- When can I use `each(arr)` in a linq pipe chain?
+
+## Questions
+- Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain?
diff --git a/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md b/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md
index efa990fc8a..9a9e930854 100644
--- a/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md
+++ b/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md
@@ -15,61 +15,80 @@ links: []
 [imgui_playwright] readiness gate FAILED
 ```
 
-(30s `wait_until_ready` timeout, then 120s popen drain timeout. External `curl http://localhost:9090/status` from a sibling shell returns 200 with proper status JSON throughout — only the popen parent's request loop can't see it.)
+External `curl http://localhost:9090/status` from a sibling shell returns 200 with proper status JSON throughout — only the popen parent's poll loop "can't see it". Reproduces on macOS and Linux; appears to NOT reproduce on Windows (which is the trap — see below).
 
 # Root cause
 
-`live/live_api` was required BEFORE `imgui_app + glfw/glfw_boost + opengl/* + glfw_live + opengl_live` somewhere in the requirer chain (usually a wrapper module like `imgui/imgui_harness`). The `[_macro] installing` in `live_api.das` calls `fork_debug_agent_context(@@debug_agent)` at compile time. If that fork happens before GLFW is initialized in the live runtime, the resulting LiveApiServer becomes unreachable from a popen parent on Windows.
+**`ref_time_ticks()` returns nanoseconds on POSIX, but the wait-loop math assumes microseconds.**
 
-Filed: [#2677](https://github.com/GaijinEntertainment/daScript/issues/2677). Distinct from #2675 (`ANY("*")` route shadowing).
+`src/hal/performance_time.cpp` defines `ref_time_ticks()` per platform:
 
-# Fix (mechanical)
+| Platform | Returns |
+|---|---|
+| Linux  | `tv_sec * 1e9 + tv_nsec` — **nanoseconds** |
+| macOS  | `clock_gettime_nsec_np(CLOCK_MONOTONIC_RAW)` — **nanoseconds** |
+| Windows | `QueryPerformanceCounter().QuadPart` — counter ticks, freq depends on hardware (often ~10 MHz, i.e. within an order of magnitude of the 1 MHz / microsecond scaling the math assumes) |
 
-In the requirer module (yours or a wrapper you control), reorder requires so the **windowed backend stack comes first**:
+`imgui_playwright`'s `wait_until_ready` (and other deadline loops) used:
 
 ```das
-// Windowed backend FIRST (correctness, not aesthetics).
-require imgui
-require imgui_app
-require glfw/glfw_boost
-require opengl/opengl_boost
-require live/glfw_live
-require live/opengl_live
-
-// Live-host + boost-runtime stack AFTER.
-require live/live_api
-require live/live_commands
-require live/live_vars
-require live_host
-require imgui/imgui_live
-require imgui/imgui_boost_runtime
-require imgui/imgui_boost_v2
-require imgui/imgui_widgets_builtin
-require imgui/imgui_containers_builtin
-require imgui/imgui_visual_aids
+let deadline = ref_time_ticks() + int64(timeout_sec * 1000000.0f)
+while (ref_time_ticks() < deadline) {
+    GET("{base_url}/status") $(resp) { ... }
+    sleep(READY_POLL_INTERVAL_MS)
+}
 ```
 
-This mirrors the canonical order every pre-`imgui_harness` example/test used verbatim. Reordering is a no-op for visibility / re-export semantics — purely a workaround for the install-time ordering bug.
+That `* 1000000.0f` assumes ref-time is in microseconds. So:
+- **Linux/macOS**: a "30s" deadline is `30 * 1e6 = 30 million nanoseconds = 30 milliseconds`. Loop fires 0-1 polls and exits. The `connect 127.0.0.1:9090 failed!` line is the one in-flight libhv connect attempt timing out — server health is fine; the loop just budgeted itself out of existence.
+- **Windows**: QPC freq is hardware-dependent but on common runners works out near enough to 1 MHz that `* 1e6` lands in the "seconds" ballpark by accident, masking the bug.
+
+# The Windows-only "require order" workaround is misleading
+
+[#2677](https://github.com/GaijinEntertainment/daScript/issues/2677) and a prior version of this card blamed require-order — windowed-backend stack vs. live-host stack — claiming `[_macro] installing` in `live_api.das` calling `fork_debug_agent_context(@@debug_agent)` before GLFW init was the issue. That diagnosis was wrong. The reorder happened to nudge timings just enough on Windows for the (already-too-short) loop to occasionally win the race, which read as "fix". On POSIX, the same reorder changes nothing — the loop still exits in 30 ms regardless of require order.
+
+If you see code in `imgui_harness.das` carrying a `// NOTE on require ordering` comment about live_api needing to come after the windowed stack: that comment is load-bearing only by accident on Windows. The real fix is in the timing math.
+
+# Fix
+
+Replace any `ref_time_ticks() + int64(seconds * 1000000.0f)` deadline pattern with platform-correct math. Two options:
+
+```das
+// Option A — use the elapsed-microsec helper (always microseconds, all platforms)
+let t_start = ref_time_ticks()
+let timeout_us = int(timeout_sec * 1000000.0f)
+while (get_time_usec(t_start) < timeout_us) {
+    ...
+}
+
+// Option B — compute deadline in nanoseconds, on POSIX
+let deadline = ref_time_ticks() + int64(timeout_sec * 1000000000.0f)
+// (DON'T do this without a per-platform branch — breaks Windows)
+```
+
+**Option A is the right one.** `get_time_usec(reft)` is defined per-platform in `performance_time.cpp` and always returns microseconds. Audit any other `ref_time_ticks() + ... * 1000000.0f` patterns in your codebase the same way.
 
 # How to recognize this gotcha
 
 - Test hangs at `readiness gate FAILED` (not at `body did not converge` or similar).
-- External `curl` to `localhost:9090/status` works while the test hangs (proves the server is up — the popen parent specifically can't reach it).
-- Always reproduces — not a flaky timing issue.
-- ONLY triggers when run via `popen` (via `with_imgui_app` in `imgui_playwright`, or any `dastest` integration test). Direct `bin/Release/daslang-live.exe <script>.das` runs fine because there's no popen parent.
+- External `curl` to `localhost:9090/status` works while the test hangs.
+- Reproduces on macOS / Linux; "works" on Windows (deceptive — see above).
+- Suspect any deadline loop using `ref_time_ticks() + ... * 1e6` — that's the smoking gun.
 
-# Verification commands
+# Why this took a while to spot
 
-After reordering, the full dasImgui suite passes:
+The symptom looked exactly like a network or popen-inheritance bug:
+- libhv server is up (curl proves it)
+- The dastest process's libhv client errors with `connect 127.0.0.1:9090 failed!`
+- It's specific to the popen parent
 
-```bash
-bin/Release/daslang.exe dastest/dastest.das -- --test modules/dasImgui/tests/integration
-# 110 tests, 110 passed, 0 failed in ~500s
-```
+The actual chain was: the poll loop's deadline elapsed before the first GET even had a chance to retry. The single connect error in the log was libhv's first attempt timing out as the loop quit. Everything past that ("require order", "fork_debug_agent_context", "libhv client init quirk on macOS") was downstream of misreading the symptom.
 
-# Why not "just fix it in daslang"
+# Verification
 
-The bug is in dasLiveHost or libhv's interaction with [_macro]-driven debug-agent install ordering. Filed as #2677 for triage, but the fix isn't trivial — for now reorder requires in the consumer/wrapper, document the ordering as load-bearing, and move on.
+After the fix, a clean local run of the dasImgui integration suite passes on macOS in seconds-per-test rather than 120s-test-timeout-per-test.
 
 ## Questions
 - why does my dastest integration test hang at "readiness gate FAILED" when external curl to /status works fine — is it a require-order issue in daslang-live?
+- on macOS / Linux, ref_time_ticks() returns nanoseconds — does any of my code assume microseconds?
+- what units does ref_time_ticks() return per platform?

From 371c6d72788b55ab32ad88d9180cad3ab0cc368e Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 16:21:36 -0700
Subject: [PATCH 10/18] mouse-data/docs: 5 cards from dasImgui PR #38 (CI
 matrix resurrection)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Five cards captured while landing dasImgui PR #38 — the dastest
integration suite running across the 3-OS GitHub Actions matrix.
Docs-only — no code changes.

  - imgui-playwright-windows-ci-16-post-libhv-stall — dasHV/libhv on
    GitHub-hosted windows-latest stalls after exactly 16 POST
    /command requests per subprocess. Empirically counted from
    DASLIVE_HV_LOG=stderr logs. Local Win11 unaffected. Workaround:
    1-POST polling helpers + Windows-only --exclude of 7 high-POST
    tests with a 4-call safety margin.

  - imgui-harness-headless-timeout-sec-cascade-guard — new
    --headless-timeout-sec=N CLI flag on the imgui_harness 5-helper
    surface. Playwright passes (test_timeout - 5) so a panicked test
    can't leave a zombie daslang-live holding port 9090
    (daslang's `finally` is skipped on panic). Cascade-prevention
    pattern generalizes to any spawned-subprocess-owning-a-port
    layout.

  - imgui-macos-configmacosxbehaviors-shortcut-is-super-not-ctrl —
    ImGui sets io.ConfigMacOSXBehaviors = true on macOS; the
    "shortcut key" becomes Super (Cmd), not Ctrl. Synth-IO tests
    must branch on snap?["io"]?["config_macos_behaviors"] and use
    ["Super"] mods on macOS. Surfaced after the cascade-fix made
    the symptom observable; Copilot diagnosed it via
    config_macos_behaviors snapshot extension.

  - why-does-daslang-live-s-post-shutdown-return-200-ok-but-the-
    subprocess-never-actually-exits-on-linux-macos — libhv v1.3.4
    `pathHandlers` is std::unordered_map; ANY("*") catch-all
    enumerates BEFORE specific paths on Linux libstdc++,
    intermittently on Windows MSVC. Every request hits the help
    handler, /shutdown's request_exit() never runs. Workaround
    landed via daslang PR #2688 (drops ANY, serves help from
    GET("/")). Upstream libhv bug unreported as of 2026-05-16.

  - why-does-my-lint-macro-fire-on-the-wrapper-module-that-
    legitimately-uses-the-forbidden-symbols-even-though-i-scope-
    visit-module — [lint_macro] runs PER MODULE during the require
    chain; getThisModule rebinds on every per-module pass, so
    visit_module(prog.getThisModule) ALSO walks the wrapper that
    legitimately uses the forbidden symbols. Wrapper modules must
    carry the defensive opt-out (options _allow_xxx_calls = true),
    same convention as imgui_lint.das's _allow_imgui_legacy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 ...ness-headless-timeout-sec-cascade-guard.md | 57 ++++++++++++++
 ...osxbehaviors-shortcut-is-super-not-ctrl.md | 39 ++++++++++
 ...aywright-windows-ci-16-post-libhv-stall.md | 38 ++++++++++
 ...ess-never-actually-exits-on-linux-macos.md | 41 ++++++++++
 ...ymbols-even-though-i-scope-visit-module.md | 74 +++++++++++++++++++
 5 files changed, 249 insertions(+)
 create mode 100644 mouse-data/docs/imgui-harness-headless-timeout-sec-cascade-guard.md
 create mode 100644 mouse-data/docs/imgui-macos-configmacosxbehaviors-shortcut-is-super-not-ctrl.md
 create mode 100644 mouse-data/docs/imgui-playwright-windows-ci-16-post-libhv-stall.md
 create mode 100644 mouse-data/docs/why-does-daslang-live-s-post-shutdown-return-200-ok-but-the-subprocess-never-actually-exits-on-linux-macos.md
 create mode 100644 mouse-data/docs/why-does-my-lint-macro-fire-on-the-wrapper-module-that-legitimately-uses-the-forbidden-symbols-even-though-i-scope-visit-module.md

diff --git a/mouse-data/docs/imgui-harness-headless-timeout-sec-cascade-guard.md b/mouse-data/docs/imgui-harness-headless-timeout-sec-cascade-guard.md
new file mode 100644
index 0000000000..cbe4a813a3
--- /dev/null
+++ b/mouse-data/docs/imgui-harness-headless-timeout-sec-cascade-guard.md
@@ -0,0 +1,57 @@
+---
+slug: imgui-harness-headless-timeout-sec-cascade-guard
+title: How do I add a wall-clock self-exit timer to a daslang-live harness so a panicked test doesn't leave a zombie subprocess on the live-API port?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+**Problem**: daslang's `finally` block is skipped on panic. A test that does `with_imgui_app(F) $(d) { ... expect_value(...) ... }` and panics inside `expect_value`'s timeout never reaches the `/shutdown` POST that `with_imgui_app` would have sent in cleanup. The spawned `daslang-live` subprocess keeps running, holding port 9090. The next test that spawns errors out with "another instance of daslang-live is already running" (macOS) or hangs at drain for the full popen watchdog (Ubuntu) because its `/status` polls hit the zombie instead.
+
+**Fix shape**: give the subprocess a wall-clock self-exit timer. Even if the parent never sends `/shutdown`, the script self-exits before the popen parent gives up, the port releases, and the next test starts cleanly.
+
+**Implementation in dasImgui PR #38** (`widgets/imgui_harness.das` + `widgets/imgui_playwright.das`):
+
+1. New CLI flag `--headless-timeout-sec=N` parsed via clargs alongside `--headless-frames=N`:
+   ```daslang
+   let raw_timeout = find_flag_raw_value(args, "--headless-timeout-sec")
+   g_headless_max_uptime_sec = to_float(raw_timeout |> unwrap_or("0"))
+   ```
+   Default 0 = disabled (preserves standalone `daslang.exe foo.das -- --headless` usage).
+
+2. Wall-clock check inside `harness_begin_frame()`, right next to the existing `--headless-frames` cap:
+   ```daslang
+   let now = get_uptime()
+   if (g_headless_first_uptime < 0.0f) {
+       g_headless_first_uptime = now
+   }
+   if (g_headless_max_uptime_sec > 0.0f && (now - g_headless_first_uptime) >= g_headless_max_uptime_sec) {
+       print("[harness] headless timeout {g_headless_max_uptime_sec}s reached at uptime {now - g_headless_first_uptime}s — request_exit()\n")
+       request_exit()
+       return false
+   }
+   ```
+   The print is the **only** log line kept in the cleaned-up harness — it fires at most once per subprocess and only when the safety net actually trips. Healthy runs are silent.
+
+3. Playwright's `with_imgui_app_opt` appends `--headless-timeout-sec=(test_timeout_sec - 5)` to the spawned argv whenever `--headless` is forwarded and the resulting budget is greater than 5 s; shorter test timeouts get no harness timeout. The 5 s margin gives the script time to finish the current frame, run `shutdown()`, and close the live-API port before the parent's popen watchdog fires:
+   ```daslang
+   if (playwright_wants_headless()) {
+       argv |> push("--")
+       argv |> push("--headless")
+       let harness_budget = test_timeout_sec - 5.0f
+       if (harness_budget > 5.0f) {
+           argv |> push("--headless-timeout-sec={harness_budget}")
+       }
+   }
+   ```
+
+**Why it works on the cascade**: even when `expect_value` panics inside the body block, the daslang-live subprocess continues running its update loop. Once the budget has elapsed, the next `harness_begin_frame()` call (made every frame from `update()`) sees `now - g_headless_first_uptime >= g_headless_max_uptime_sec`, calls `request_exit()`, and the main loop terminates cleanly via `while (!exit_requested())`. `shutdown()` runs, port 9090 is released, the popen parent reads EOF, and the exit code is 0, so the next test sees no cascade.
+
+**Sizing rule**: set timeout less than the popen watchdog by enough margin to cover one frame + shutdown. 5 seconds is comfortable. If the popen watchdog is 120 s, harness timeout = 115 s.
+
+**Limitation**: the check only runs inside `harness_begin_frame`. If the script's `update()` is stuck in something that never returns to the main loop, the harness timeout never gets a chance to fire. This is the right tradeoff: a stuck `update()` is a different bug class (a real deadlock), and popen still kills the process at its watchdog.
+
+**Cascade-guard pattern generalizes**: any long-running subprocess that owns a port (HTTP server, RPC endpoint, anything) and is spawned for a bounded test/check should have a wall-clock self-exit set slightly below the parent's kill-timeout. Cleanup-via-script always beats cleanup-via-SIGKILL.
+
+## Questions
+- How do I add a wall-clock self-exit timer to a daslang-live harness so a panicked test doesn't leave a zombie subprocess on the live-API port?
diff --git a/mouse-data/docs/imgui-macos-configmacosxbehaviors-shortcut-is-super-not-ctrl.md b/mouse-data/docs/imgui-macos-configmacosxbehaviors-shortcut-is-super-not-ctrl.md
new file mode 100644
index 0000000000..44a09202b3
--- /dev/null
+++ b/mouse-data/docs/imgui-macos-configmacosxbehaviors-shortcut-is-super-not-ctrl.md
@@ -0,0 +1,39 @@
+---
+slug: imgui-macos-configmacosxbehaviors-shortcut-is-super-not-ctrl
+title: Why does synth Ctrl+A do nothing on macOS but works on Linux/Windows in an ImGui InputText test?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+**ImGui sets `io.ConfigMacOSXBehaviors = true` automatically on macOS** (`#ifdef __APPLE__` in `ImGuiIO` ctor). When that flag is true, the **"shortcut key" is Super (Cmd), not Ctrl**.
+
+So on macOS:
+- `Ctrl+A` → "move to start of line" (text-editing convention)
+- `Cmd+A` → "select all"
+- `Ctrl+Backspace` → does not delete word
+- `Cmd+Backspace` → "delete to start of line"
+
+A synth-IO test that drives `Ctrl+A` to clear an InputText buffer **fails silently on macOS** — the assertion message will look like "NAME_INPUT.value == "" after Ctrl+A then Backspace" with the buffer still holding `"abc"` (because Ctrl+A moved cursor to start, then Backspace deleted nothing).
+
+**Detection from a playwright test**: snapshot `io.config_macos_behaviors` and branch the chord. dasImgui's `widgets/imgui_boost_runtime.das` exposes the flag via the `io_jv()` snapshot key `"config_macos_behaviors"`:
+
+```daslang
+let macos = snap0?["io"]?["config_macos_behaviors"] ?? false
+if (macos) {
+    post_command(d, "imgui_key_chord", JV((mods = ["Super"], key = "A")))
+} else {
+    post_command(d, "imgui_key_chord", JV((mods = ["Ctrl"], key = "A")))
+}
+```
+
+**Why we don't unconditionally use Super**: on Linux/Windows there's no Super-key binding for select-all in InputText; the wrong choice silently fails on those platforms instead. Branch on the actual `config_macos_behaviors` flag — works everywhere.
+
+Discovered in dasImgui PR #38 when `test_click_then_ctrl_a_clears_input` failed only on macOS CI after the playwright cascade fix made other failures observable. Copilot diagnosed and fixed it in commit `42b7292` + the IO-snapshot extension. Caveat: Copilot drafted the fix with `if macos { ... }` (gen1 syntax), but gen2 requires `if (macos) { ... }`; watch for that pattern when ferrying AI suggestions through.
+
+**Related ImGui flags worth knowing**:
+- `io.KeyCtrl` reflects either Ctrl OR (Cmd on macOS when `ConfigMacOSXBehaviors`) — the "shortcut" lookups go through `io.KeySuper` instead on macOS.
+- `ImGui::Shortcut(ImGuiMod_Ctrl | ImGuiKey_A)` automatically maps to Cmd on macOS — but `Shortcut(ImGuiMod_Super | ImGuiKey_A)` does NOT remap, only `ImGuiMod_Ctrl` is the platform-aware "primary modifier".
+
+## Questions
+- Why does synth Ctrl+A do nothing on macOS but works on Linux/Windows in an ImGui InputText test?
diff --git a/mouse-data/docs/imgui-playwright-windows-ci-16-post-libhv-stall.md b/mouse-data/docs/imgui-playwright-windows-ci-16-post-libhv-stall.md
new file mode 100644
index 0000000000..0b5afc9660
--- /dev/null
+++ b/mouse-data/docs/imgui-playwright-windows-ci-16-post-libhv-stall.md
@@ -0,0 +1,38 @@
+---
+slug: imgui-playwright-windows-ci-16-post-libhv-stall
+title: Why do imgui playwright tests hang at 120 seconds on Windows CI when they pass locally and on POSIX?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+**dasHV's libhv build on Windows CI stalls after exactly 16 POST /command connections per subprocess.** Local Windows works fine; only the GitHub-hosted `windows-latest` runner trips it. Discovered while resurrecting dasImgui integration tests in PR #38 (May 2026).
+
+**Symptom**: a test does `with_imgui_app` → spawns daslang-live → fast burst of 1 GET /status + 16 POST /command (all 200 OK in <300 ms), then the libhv event loop **stops accepting new connections**. The 17th HTTP request hangs ~60 s, the test body never makes progress, popen kills the subprocess at the 120 s watchdog (`DAS_POPEN_TIMEOUT = 0x7FFFFF01 = 2147483393`).
+
+**Confirm**: run with `DASLIVE_HV_LOG=stderr DASLIVE_HV_LOG_LEVEL=DEBUG` env vars. Count `[POST /command]=>[200 OK]` lines per subprocess pid in the captured stderr — if exactly 16, you've hit it. Healthy paths show many more.
+
+**Locally verified counter-evidence**: the same dasImgui suite under `D:\Work\daScript\bin\Release\daslang.exe ... dastest.das -- --test ... --headless` on a Win11 box runs 96/96 in ~6 min, including the tests that hang on CI. So it's CI-runner-specific (Windows Server, different scheduler, different IOCP / TCP loopback tuning, possibly Defender). NOT a libhv bug per se: the upstream libhv code on master is byte-identical to v1.3.4 and works fine on `libDaScriptDyn` linkage everywhere except GitHub `windows-latest`.
+
+**Workarounds in dasImgui PR #38**:
+
+1. **Halve HTTP traffic in polling helpers**. Old idiom `wait_until(d, 240) $(var snap) { let s = post_command(d, "imgui_key_status", null); return !(s?["playing"] ?? true) }` does **2 HTTP requests per iteration** (snapshot inside `wait_until` + the inner `post_command`). New helpers `wait_for_key_idle(d, 4.0f)` and `wait_for_mouse_idle(d, 4.0f)` in `widgets/imgui_playwright.das` do **1 POST per iteration** (status only). Use them whenever you only need "playing == false", not a full snapshot.
+
+2. **Exclude high-POST tests on Windows only** in `.github/workflows/tests.yml`. Conservative cutoff: any test estimated at ≥12 POSTs over its lifetime, which leaves a 4-call safety margin under the 16-connection limit. Pattern:
+   ```yaml
+   EXTRA_EXCLUDES=""
+   if [ "${{ matrix.os }}" = "windows-latest" ]; then
+     EXTRA_EXCLUDES="--exclude inputs_drag --exclude inputs_numeric --exclude inputs_slider \
+                     --exclude inputs_color --exclude inputs_choice --exclude inputs_text \
+                     --exclude indexed_dynamic"
+   fi
+   ```
+
+**Heuristic for "POST count"**: each `set_value(...)` is 1 POST. Each `wait_for_payload_value(...)` / `wait_for_int_value(...)` / `wait_until { post_command }` is 1-3 POSTs depending on how fast the answer converges. Tests with ≥10 `set_value + wait` pairs typically exceed 16 POSTs.
+
+**Pre-existing "finally skipped on panic" cascade**: a panicking test in the body block was already known to skip `with_imgui_app`'s `/shutdown` cleanup → zombie subprocess on port 9090 → next test cascades. PR #38 also added `--headless-timeout-sec=N` self-exit to `imgui_harness` (see related card `imgui-harness-headless-timeout-sec-cascade-guard`) so a panicked subprocess can't outlive the popen watchdog.
+
+**Proper fix is upstream** in daslang's libhv build for Windows IOCP. Track + re-include all 7 excluded tests when it lands.
+
+## Questions
+- Why do imgui playwright tests hang at 120 seconds on Windows CI when they pass locally and on POSIX?
diff --git a/mouse-data/docs/why-does-daslang-live-s-post-shutdown-return-200-ok-but-the-subprocess-never-actually-exits-on-linux-macos.md b/mouse-data/docs/why-does-daslang-live-s-post-shutdown-return-200-ok-but-the-subprocess-never-actually-exits-on-linux-macos.md
new file mode 100644
index 0000000000..e05565324f
--- /dev/null
+++ b/mouse-data/docs/why-does-daslang-live-s-post-shutdown-return-200-ok-but-the-subprocess-never-actually-exits-on-linux-macos.md
@@ -0,0 +1,41 @@
+---
+slug: why-does-daslang-live-s-post-shutdown-return-200-ok-but-the-subprocess-never-actually-exits-on-linux-macos
+title: Why does daslang-live's POST /shutdown return 200 OK but the subprocess never actually exits on Linux/macOS?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+**Root cause is in vendored libhv (v1.3.4), not daslang.** `live_api.das` registers an `ANY("*")` catch-all alongside specific routes like `GET /status` and `POST /shutdown`. libhv's `Any(path)` (`include/hv/HttpService.h:268-277`) expands internally to one `Handle("METHOD", path, h)` per HTTP verb on the same path key.
+
+libhv stores all `(path -> method_handlers)` pairs in `pathHandlers`, declared as **`std::unordered_map<std::string, ...>`** at `include/hv/HttpService.h:105`. `HttpService::GetRoute` (`http/server/HttpService.cpp:72-127`) iterates that map in **container order** and takes the first wildcard or path match it finds.
+
+Because `std::unordered_map`'s iteration order is implementation- and bucket-defined, `"*"` happens to enumerate BEFORE specific paths like `/status` on Linux libstdc++ (deterministic), and intermittently on Windows MSVC depending on rehash timing. When that happens, every request — including `POST /shutdown` — hits the wildcard handler, which returns the help JSON (200 OK). The real `/shutdown` handler that calls `request_exit()` is never invoked. The main loop spins forever, and the parent times out the subprocess with `popen_exit_code = DAS_POPEN_TIMEOUT = 0x7FFFFF01`.
+
+**How to confirm you're hitting this:** while the daslang-live subprocess is running, `curl -s http://127.0.0.1:9090/status`. If the body contains `"endpoints"` (the help JSON), the wildcard is winning. If it returns real status JSON with `"has_error"` / `"paused"` / `"fps"` fields, routing works.
+
+**Workaround (landed as daslang PR #2688, `bbatkin/live-api-drop-any-catchall`):** drop `ANY("*")` from `live_api.das` and serve the help JSON from a specific `GET("/")` handler instead. `/` is an exact-match path, so it never collides with `/status` etc. — the ordering hazard is gone regardless of how libhv stores routes. Unknown paths return libhv's default 404; programmatic clients (playwright, MCP `live_*` tools) never relied on the catch-all.
+
+**Follow-up work (deferred until libhv-side fix lands):**
+1. File the libhv upstream issue — unreported per 2026-05-16 web research; we'd be first.
+2. Wire libhv's built-in `errorHandler` hook through dasHV (`WebServer_Adapter` setter), rewrite `live_api.das` to drop the `GET("/")` workaround and use `errorHandler` for the help fallback. That's the idiomatic shape on libhv's terms.
+3. Address Copilot review comments on PR #2688 as part of that rewrite: (a) `live_api.das:25` module-level `//!` endpoint list should mention `GET /`; (b) `live_api.das:217` "curl :9090/" example — actually valid curl shorthand for localhost, but rephrasing to `curl http://localhost:9090/` is friendlier.
+4. Consider a lint rule against `ANY("*")` registered alongside specific paths.
+
+**Upstream fix status (researched 2026-05-16):**
+- **Unreported** — exhaustive search of `ithewei/libhv` issues for `route|wildcard|Any|GetRoute|HttpService|pathHandlers` returned zero hits on this bug. We'd be the first to file.
+- **Not fixed in master** — libhv's `http/server/HttpService.cpp` last touched 2023-07-29 (before v1.3.4); same `std::unordered_map<std::string, ...>` for `pathHandlers` at `http/server/HttpService.h:107`; same first-match-wins loop in `GetRoute` at line ~72. v1.3.4 IS the latest release (2025-10-25).
+- **Known wart upstream**: `docs/PLAN.md` lists "Path router: optimized matching via trie?" as an open question — maintainers know the router is suboptimal, just haven't prioritized it.
+- **Industry comparison**: Crow uses a trie; cpp-httplib preserves registration order so static paths always precede regex catch-alls. This is a libhv-specific defect, not industry norm.
+- **Documented behavior**: libhv docs (`README.md`, `docs/API.md`, `docs/cn/HttpServer.md`) say NOTHING about route precedence or wildcard semantics. Behavior is implementation-defined.
+
+**Better script-side option (not yet exercised): `errorHandler` hook.** libhv has a built-in fallback handler that fires after the processor chain has run, when the status is ≥ 400 and the body is empty (`http/server/HttpService.h:133` + `http/server/HttpHandler.cpp:476-486`). The official example wires it up at `examples/httpd/router.cpp:15` + `examples/httpd/handler.cpp:46`. Drop `ANY("*")` and assign `errorHandler` instead: `/status` / `/shutdown` win deterministically, and everything else falls through to the help dump. **dasHV does NOT currently expose `errorHandler`**, so this needs C++ glue (a setter on `WebServer_Adapter`) before live_api can use it. PR #2688's `GET("/")` workaround is the interim fix.
+
+**Also worth knowing**: official libhv examples never register bare `Any("*")` alongside specific paths — they use prefix wildcards like `GET("/wildcard*", ...)` (see `examples/httpd/router.cpp:70`). The maintainers don't exercise our usage pattern, so the bug has presumably never surfaced for upstream users.
+
+**Why Windows seems to work most of the time:** MSVC's `std::unordered_map` bucket layout for the specific paths daslang-live registers happens to enumerate specific paths before `"*"`. It's not a guarantee — different route counts (e.g. adding more endpoints) trigger rehashes and can flip the order. Not "Windows-correct, Linux-broken" — both are subject to the same hazard.
+
+**Symptom to watch for in CI logs:** `popen_exit_code=2147483393` (0x7FFFFF01 = `DAS_POPEN_TIMEOUT`) on POSIX, with the playwright drain step taking the full `DEFAULT_TEST_TIMEOUT_SEC` (120s default). Subprocess gets SIGKILLed at the watchdog.
+
+## Questions
+- Why does daslang-live's POST /shutdown return 200 OK but the subprocess never actually exits on Linux/macOS?
diff --git a/mouse-data/docs/why-does-my-lint-macro-fire-on-the-wrapper-module-that-legitimately-uses-the-forbidden-symbols-even-though-i-scope-visit-module.md b/mouse-data/docs/why-does-my-lint-macro-fire-on-the-wrapper-module-that-legitimately-uses-the-forbidden-symbols-even-though-i-scope-visit-module.md
new file mode 100644
index 0000000000..50e1854e6b
--- /dev/null
+++ b/mouse-data/docs/why-does-my-lint-macro-fire-on-the-wrapper-module-that-legitimately-uses-the-forbidden-symbols-even-though-i-scope-visit-module.md
@@ -0,0 +1,74 @@
+---
+slug: why-does-my-lint-macro-fire-on-the-wrapper-module-that-legitimately-uses-the-forbidden-symbols-even-though-i-scope-visit-module
+title: Why does my [lint_macro] fire on the wrapper module that legitimately uses the forbidden symbols, even though I scope visit_module to prog.getThisModule?
+created: 2026-05-16
+last_verified: 2026-05-16
+links: []
+---
+
+# `[lint_macro]` runs per-module — wrapper modules need a defensive opt-out
+
+`[lint_macro]` doesn't fire once per *program* compile. It fires once per *module* compile in the require chain. When `foo.das` requires `imgui_harness` which requires `imgui_harness_lint`, the lint runs:
+
+1. Once with `prog.getThisModule()` = `imgui_harness` (during the harness's own compile pass).
+2. Once with `prog.getThisModule()` = the consumer (during `foo.das`'s compile pass).
+
+The `visit_module(prog, adapter, prog.getThisModule)` line scopes the walk correctly each time — but the visited module for pass 1 is the *harness*, whose body legitimately calls into the very surface the lint is supposed to forbid (because that's what the wrapper does).
+
+## Symptom
+
+A clean consumer file fails to compile with diagnostics pointing INTO the wrapper module:
+
+```
+error[50503]: HARNESS001: glfw_live::live_create_window is forbidden ...
+modules/dasImgui/widgets/imgui_harness.das:102:4
+    live_create_window(title, width, height)
+    ^^^^^^^^^^^^^^^^^^
+```
+
+You stare at the consumer file and there's no `live_create_window` in it. The diagnostic is fired from the *wrapper's* compile pass, then propagates up as the consumer's compile failure.
+
+## Fix — defensive opt-out at the top of the wrapper
+
+Every wrapper / sibling module that the lint touches transitively needs to carry the per-file escape:
+
+```das
+options gen2
+options _allow_glfw_calls = true        // <-- defensive opt-out
+
+module imgui_harness shared public
+
+require imgui_app           // legitimately uses GLFW/GL — that's the point
+require glfw/glfw_boost
+require live/glfw_live
+// ...
+```
+
+This is the same pattern `widgets/imgui_lint.das` documents in its header:
+
+> Wrapper modules under `widgets/` also carry `options _allow_imgui_legacy = true` defensively so they compile cleanly when targeted directly (MCP compile_check, lint, format_file).
+
+It's not just for `compile_check` / `lint` direct-targeting. It also kicks in during the consumer's require chain — every time the wrapper's module is compiled as a dependency.
+
+## Why this is non-obvious
+
+`visit_module(prog, adapter, prog.getThisModule)` reads like "walk only the consumer's functions" — and it does, for the consumer's pass. But the same macro registration also fires for the wrapper's pass, where `getThisModule` *is* the wrapper. The scoping is correct; the surprise is that the lint executes N times across the require chain, not once at the top.
+
+## Don't try to fix in the visitor
+
+Adding a "skip if `current_function._module != this_module`" filter (where `this_module = prog.getThisModule` captured in `apply()`) does *nothing* — the per-pass `getThisModule` already equals `current_function._module` during the wrapper's pass. The wrapper IS thisModule from its own pass's perspective.
+
+The opt-out at the wrapper's source IS the fix. Match the existing convention.
+
+## Reference
+
+- `modules/dasImgui/widgets/imgui_lint.das` header — has the canonical paragraph documenting the convention for `_allow_imgui_legacy`.
+- `modules/dasImgui/widgets/imgui_harness.das` line ~5 — carries `_allow_glfw_calls = true` defensively for the same reason.
+
+## Questions
+- Why does a [lint_macro] error appear at a line in the wrapper module instead of my consumer file?
+- Do I need to filter calls inlined from a different module in my [lint_macro] AST visitor?
+- Why does my [lint_macro] run multiple times for one consumer compile?
+
+## Questions
+- Why does my [lint_macro] fire on the wrapper module that legitimately uses the forbidden symbols, even though I scope visit_module to prog.getThisModule?

From 6aa21105c30ffc57b1ddcddd8fc50dbaf16bcb0d Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 16:27:56 -0700
Subject: [PATCH 11/18] has_sideeffects: blacklist mutation ops, trust func
 flags over op allowlist
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two correctness fixes from Copilot review on PR #2691:

1. **Mutation operators bypass.** `++`, `--`, and compound-assignment ops
   (`+=`, `-=`, `*=`, `/=`, `%=`, `&=`, `|=`, `^=`, `<<=`, `>>=`,
   `&&=`, `||=`, `^^=`) fall through the `is_safe_op{1,2}` allowlist
   check, but the fallback through `func_has_sideeffects` only catches
   them if the resolved C++ builtin happens to carry the right flag. If
   the builtin missed the flag (or there is no resolved builtin), `x++`
   classifies as pure. Add `is_mutation_op1` / `is_mutation_op2`
   blacklists invoked up front, before any flag check. Note the AST
   op-string convention: postfix `++` / `--` are `"+++"` / `"---"`.

2. **User operator overloads bypass.** When `e2.op` is in the safe
   allowlist (`+`, `-`, `*`, `==`, etc.), the old code skipped the
   `func_has_sideeffects(e2.func)` check entirely. A user-defined
   `def operator +(...)` on a custom type would then classify as pure
   regardless of side effects. Restructure: `func != null` → trust the
   func flags (non-builtin defaults to unsafe via `func_has_sideeffects`);
   `func == null` → fall back to op-name allowlist for partial-folding
   artifacts.

Tests:
- `test_postfix_increment_unsafe`, `test_postfix_decrement_unsafe`
- `test_user_op_overload_unsafe` (defines `operator +` on a private
  struct with a global-counter side effect)

CI fix: register `tests/macro_boost/` in `tests/aot/CMakeLists.txt`
(missed when the test directory was created in the parent commit).
Mirrors the `tests/linq/` pattern: a test-files glob + a module-files
list for the `_has_sideeffects_probe.das` helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 daslib/macro_boost.das                     | 56 ++++++++++++++++------
 tests/aot/CMakeLists.txt                   | 24 ++++++++++
 tests/macro_boost/test_has_sideeffects.das | 43 +++++++++++++++++
 3 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/daslib/macro_boost.das b/daslib/macro_boost.das
index 1fa5452b6c..bac8e2c2ff 100644
--- a/daslib/macro_boost.das
+++ b/daslib/macro_boost.das
@@ -219,27 +219,36 @@ def public has_sideeffects(expr : Expression?) : bool {
         return false
     }
     // Function-call-shaped expressions: ExprCall (regular call) and ExprOp1/ExprOp2/ExprOp3
-    // (operators, which also resolve to a function). All carry a resolved `func` field
-    // when typing completed. But the typer sometimes leaves `func` null on operator
-    // expressions (e.g. after partial constant folding), so we also keep an op-name
-    // allowlist for the common pure operators on workhorse types — that bypasses
-    // resolution-timing artifacts. `/` and `%` stay UNSAFE (div-by-zero panic; design
-    // decision). Compound-assignment ops are not in the allowlist (mutation).
+    // (operators, which also resolve to a function). Two-layer check:
+    //
+    //   1. Mutation ops (`++`, `--`, `+=`, `-=`, …) are unconditionally unsafe —
+    //      blacklisted up front, regardless of how the resolved builtin happens to be
+    //      flagged. Catches builtins that the C++ side forgot to mark with
+    //      `knownSideEffects`/`unsafeOperation`.
+    //   2. Trust `func.flags` when `func != null` — covers user-defined operator
+    //      overloads (e.g. `struct Foo { def operator +(...) }`), which fall through
+    //      `func_has_sideeffects` as non-builtin → unsafe. Fall back to the op-name
+    //      allowlist only when `func == null` (typer left it unresolved, e.g. after
+    //      partial constant folding). `/` and `%` stay UNSAFE (div-by-zero panic;
+    //      design decision).
     //
     // `is`/`as` on handled types is EXACT-rtti (see CLAUDE.md), so each shape needs its
     // own branch — can't cast ExprOp2 to ExprCallFunc even though the C++ class inherits.
     if (expr is ExprOp1) {
         let e1 = expr as ExprOp1
-        if (!is_safe_op1(e1.op) && func_has_sideeffects(e1.func)) return true
+        // func != null → trust func flags (catches user overloads); func == null → fall
+        // back to op-name allowlist (handles partial-folding artifacts). Mutation ops
+        // are unconditionally unsafe (in case a C++ builtin missed the side-effect flag).
+        if (is_mutation_op1(e1.op)
+                || (e1.func != null && func_has_sideeffects(e1.func))
+                || (e1.func == null && !is_safe_op1(e1.op))) return true
         return has_sideeffects(e1.subexpr)
     }
     if (expr is ExprOp2) {
         let e2 = expr as ExprOp2
-        // Unsafe: division/modulo (div-by-zero panic, design decision); or op not in the
-        // safe allowlist AND the resolved func indicates side effects. The allowlist also
-        // bypasses func==null artifacts from partial folding.
-        if (e2.op == "/" || e2.op == "%"
-                || (!is_safe_op2(e2.op) && func_has_sideeffects(e2.func))) return true
+        if (is_mutation_op2(e2.op) || e2.op == "/" || e2.op == "%"
+                || (e2.func != null && func_has_sideeffects(e2.func))
+                || (e2.func == null && !is_safe_op2(e2.op))) return true
         return has_sideeffects(e2.left) || has_sideeffects(e2.right)
     }
     if (expr is ExprOp3) {
@@ -270,17 +279,36 @@ def private func_has_sideeffects(f : Function?) : bool {
 [macro_function]
 def private is_safe_op1(op : das_string) : bool {
     //! Unary operators that are pure on workhorse types — no overflow trap, no mutation.
-    //! Excludes `++` / `--` (mutation).
+    //! Excludes `++` / `--` (handled by is_mutation_op1).
     return op == "-" || op == "!" || op == "~" || op == "+"
 }
 
 [macro_function]
 def private is_safe_op2(op : das_string) : bool {
     //! Binary operators that are pure on workhorse types. Excludes `/`, `%` (div-by-zero
-    //! panic — design decision) and all compound-assignment ops (mutation).
+    //! panic — design decision) and all compound-assignment ops (handled by is_mutation_op2).
     return (op == "+" || op == "-" || op == "*"
         || op == "==" || op == "!=" || op == "<" || op == "<=" || op == ">" || op == ">="
         || op == "&" || op == "|" || op == "^" || op == "<<" || op == ">>"
         || op == "&&" || op == "||")
 }
 
+[macro_function]
+def private is_mutation_op1(op : das_string) : bool {
+    //! Unary operators that mutate their operand. Unconditionally unsafe — bypasses any
+    //! flag check on the resolved builtin (in case the C++ side forgot to mark it).
+    //! `++` / `--` are prefix; `+++` / `---` are the daslang AST op-strings for postfix
+    //! increment/decrement (the trailing-plus / trailing-minus naming).
+    return op == "++" || op == "--" || op == "+++" || op == "---"
+}
+
+[macro_function]
+def private is_mutation_op2(op : das_string) : bool {
+    //! Compound-assignment operators (mutate the left operand). Same unconditional-unsafe
+    //! treatment as is_mutation_op1.
+    return (op == "+=" || op == "-=" || op == "*=" || op == "/=" || op == "%="
+        || op == "&=" || op == "|=" || op == "^="
+        || op == "<<=" || op == ">>="
+        || op == "&&=" || op == "||=" || op == "^^=")
+}
+
diff --git a/tests/aot/CMakeLists.txt b/tests/aot/CMakeLists.txt
index 3d08dcdbcb..b19f27d842 100644
--- a/tests/aot/CMakeLists.txt
+++ b/tests/aot/CMakeLists.txt
@@ -270,6 +270,17 @@ FILE(GLOB AOT_MACRO_CALL_FILES RELATIVE ${PROJECT_SOURCE_DIR} CONFIGURE_DEPENDS
 # by the actual test files and don't need standalone AOT entries.
 list(FILTER AOT_MACRO_CALL_FILES EXCLUDE REGEX "/_")
 
+# AOT for macro_boost test files
+FILE(GLOB AOT_MACRO_BOOST_FILES RELATIVE ${PROJECT_SOURCE_DIR} CONFIGURE_DEPENDS "tests/macro_boost/*.das")
+# Exclude the call_macro probe helper (prefixed with `_`); it's required transitively
+# by the actual test file.
+list(FILTER AOT_MACRO_BOOST_FILES EXCLUDE REGEX "/_")
+
+# Macro_boost test module files (probe call_macro required transitively by tests)
+SET(AOT_MACRO_BOOST_MODULE_FILES
+    tests/macro_boost/_has_sideeffects_probe.das
+)
+
 # AOT for match test files
 FILE(GLOB AOT_MATCH_FILES RELATIVE ${PROJECT_SOURCE_DIR} CONFIGURE_DEPENDS "tests/match/*.das")
 
@@ -547,6 +558,14 @@ add_custom_target(test_aot_macro_call)
 SET(MACRO_CALL_AOT_GENERATED_SRC)
 DAS_AOT("${AOT_MACRO_CALL_FILES}" MACRO_CALL_AOT_GENERATED_SRC test_aot_macro_call daslang)
 
+add_custom_target(test_aot_macro_boost)
+SET(MACRO_BOOST_AOT_GENERATED_SRC)
+DAS_AOT("${AOT_MACRO_BOOST_FILES}" MACRO_BOOST_AOT_GENERATED_SRC test_aot_macro_boost daslang)
+
+add_custom_target(test_aot_macro_boost_modules)
+SET(MACRO_BOOST_MODULES_AOT_GENERATED_SRC)
+DAS_AOT_LIB("${AOT_MACRO_BOOST_MODULE_FILES}" MACRO_BOOST_MODULES_AOT_GENERATED_SRC test_aot_macro_boost_modules daslang)
+
 add_custom_target(test_aot_match)
 SET(MATCH_AOT_GENERATED_SRC)
 DAS_AOT("${AOT_MATCH_FILES}" MATCH_AOT_GENERATED_SRC test_aot_match daslang)
@@ -680,6 +699,8 @@ SOURCE_GROUP_FILES("aot generated" JSON_AOT_GENERATED_SRC)
 SOURCE_GROUP_FILES("aot generated" LINQ_AOT_GENERATED_SRC)
 SOURCE_GROUP_FILES("aot generated" LINQ_MODULES_AOT_GENERATED_SRC)
 SOURCE_GROUP_FILES("aot generated" MACRO_CALL_AOT_GENERATED_SRC)
+SOURCE_GROUP_FILES("aot generated" MACRO_BOOST_AOT_GENERATED_SRC)
+SOURCE_GROUP_FILES("aot generated" MACRO_BOOST_MODULES_AOT_GENERATED_SRC)
 SOURCE_GROUP_FILES("aot generated" MATCH_AOT_GENERATED_SRC)
 SOURCE_GROUP_FILES("aot generated" MATH_AOT_GENERATED_SRC)
 SOURCE_GROUP_FILES("aot generated" MATH_MODULES_AOT_GENERATED_SRC)
@@ -748,6 +769,8 @@ add_executable(test_aot ${DAS_DASCRIPT_MAIN_SRC}
     ${LINQ_AOT_GENERATED_SRC}
     ${LINQ_MODULES_AOT_GENERATED_SRC}
     ${MACRO_CALL_AOT_GENERATED_SRC}
+    ${MACRO_BOOST_AOT_GENERATED_SRC}
+    ${MACRO_BOOST_MODULES_AOT_GENERATED_SRC}
     ${MATCH_AOT_GENERATED_SRC}
     ${MATH_AOT_GENERATED_SRC}
     ${MATH_MODULES_AOT_GENERATED_SRC}
@@ -799,6 +822,7 @@ ADD_DEPENDENCIES(test_aot libDaScriptAot
     test_aot_jobque test_aot_json test_aot_jsonrpc
     test_aot_linq test_aot_linq_modules
     test_aot_macro_call
+    test_aot_macro_boost test_aot_macro_boost_modules
     test_aot_match
     test_aot_math test_aot_math_modules test_aot_module_tests
     test_aot_option
diff --git a/tests/macro_boost/test_has_sideeffects.das b/tests/macro_boost/test_has_sideeffects.das
index 7651eeb91f..68dfefed74 100644
--- a/tests/macro_boost/test_has_sideeffects.das
+++ b/tests/macro_boost/test_has_sideeffects.das
@@ -159,6 +159,49 @@ def test_string_builder_unsafe_part(t : T?) {
     t |> equal(_test_has_sideeffects("hello {side_effect_fn(_x)}"), true)
 }
 
+// ── Mutation operators — must be unsafe regardless of resolved-builtin flags ──
+//
+// `++`/`--` (ExprOp1) and compound-assignment ops (`+=`/`-=`/… ExprOp2) mutate
+// their operand. has_sideeffects blacklists these up front so a builtin
+// mistakenly missing `knownSideEffects`/`unsafeOperation` can't classify them
+// as pure. Covers Copilot review concern from PR #2691.
+
+[test]
+def test_postfix_increment_unsafe(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(_x ++), true)
+}
+
+[test]
+def test_postfix_decrement_unsafe(t : T?) {
+    var _x = 5
+    t |> equal(_test_has_sideeffects(_x --), true)
+}
+
+// ── User-defined operator overload — must be unsafe (non-builtin func) ───
+//
+// Overloads of `+`, `*`, etc. on user types can carry arbitrary side effects.
+// The op-name allowlist must NOT bypass the func-flag check when func != null
+// — otherwise a custom `def operator +` slips through as pure. Covers Copilot
+// review concern from PR #2691.
+
+var g_op_overload_hits = 0
+
+struct private SideEffectingNumber {
+    v : int
+}
+
+def operator +(a : SideEffectingNumber; b : int) : SideEffectingNumber {
+    g_op_overload_hits ++
+    return SideEffectingNumber(v = a.v + b)
+}
+
+[test]
+def test_user_op_overload_unsafe(t : T?) {
+    var _c = SideEffectingNumber(v = 5)
+    t |> equal(_test_has_sideeffects(_c + 1), true)
+}
+
 // ── Conservative-unsafe cases — daslang-generic helpers fall through ─────
 //
 // `length`, `key_exists`, etc. are defined as daslang generics in builtin.das

From 0d842c9400b6a24f0673a1b4a75c5468609a3735 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 16:35:25 -0700
Subject: [PATCH 12/18] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 ...-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md b/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md
index cc8f58dc2a..b0371ef2b7 100644
--- a/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md
+++ b/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md
@@ -25,7 +25,7 @@ error[31013]: '__::builtin`each`...' is unsafe, when not source of the for-loop;
 
 **Heuristic:** if the chain ends in a `_fold(...)` / `_old_fold(...)` wrapper or a for-loop, `each(arr)` works as the source. If the chain produces a value (or array) that escapes the expression — a `let`, a function return, the second arg to a macro — drop the `each()` and pass the array directly.
 
-The lint at runtime points at the **specific** `each(arr)` call that escapes, so for multi-each chains (`_join`, `_zip`), check which side is the issue.
+The compiler error points at the **specific** `each(arr)` call that escapes, so for multi-each chains (`_join`, `_zip`), check which side is the issue.
 
 ## Questions
 - Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain?

From c24e4b5dc26d5fe52bfb8231d082d9b98684fc6b Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 16 May 2026 23:37:12 +0000
Subject: [PATCH 13/18] docs(mouse-data): update ref_time_ticks Windows row and
 narrative for post-PR #2685 ns normalization
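
For reference, the unit mix-up the doc describes comes down to simple
arithmetic. The Python sketch below is illustrative only: the helper
names are hypothetical and are not the code in
src/hal/performance_time.cpp. It assumes the formulas quoted in the doc
(ticks * 1e9 / freq on Windows, and a `* 1000000.0f` deadline scale).

```python
NS_PER_SEC = 1_000_000_000
US_PER_SEC = 1_000_000


def qpc_ticks_to_ns(ticks: int, freq_hz: int) -> int:
    """Convert raw counter ticks to nanoseconds: ticks * 1e9 / freq."""
    return ticks * NS_PER_SEC // freq_hz


def deadline_offset(timeout_sec: float, scale: int) -> int:
    """Offset added to the current clock reading to form a deadline."""
    return int(timeout_sec * scale)


# A 10 MHz counter advances once every 100 ns.
ns_per_tick = qpc_ticks_to_ns(1, 10_000_000)

# Buggy: a microsecond scale applied to a nanosecond clock.
buggy_ns = deadline_offset(30.0, US_PER_SEC)
buggy_ms = buggy_ns / 1_000_000  # a "30 s" timeout shrinks to 30 ms

# Fixed: scale seconds to nanoseconds so the units match the clock.
fixed_ns = deadline_offset(30.0, NS_PER_SEC)
```

The factor of 1000 between the two offsets is why the poll loop gives
up after roughly one connect attempt.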

Agent-Logs-Url: https://github.com/GaijinEntertainment/daScript/sessions/97224dec-45d1-4968-a3dd-8e5f37274983

Co-authored-by: borisbat <272689+borisbat@users.noreply.github.com>
---
 ...to-status-works-fine-is-it-a-require-order.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md b/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md
index 9a9e930854..ff75ae379b 100644
--- a/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md
+++ b/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md
@@ -15,11 +15,11 @@ links: []
 [imgui_playwright] readiness gate FAILED
 ```
 
-External `curl http://localhost:9090/status` from a sibling shell returns 200 with proper status JSON throughout — only the popen parent's poll loop "can't see it". Reproduces on macOS and Linux; appears to NOT reproduce on Windows (which is the trap — see below).
+External `curl http://localhost:9090/status` from a sibling shell returns 200 with proper status JSON throughout — only the popen parent's poll loop "can't see it". Reproduces on macOS and Linux; pre-PR #2685 appeared to NOT reproduce on Windows (which was the trap — raw QPC tick math accidentally masked the bug; post-PR #2685 Windows also returns nanoseconds and shows the same failure).
 
 # Root cause
 
-**`ref_time_ticks()` returns nanoseconds on POSIX, but the wait-loop math assumes microseconds.**
+**`ref_time_ticks()` returns nanoseconds on all platforms (post-PR #2685), but the wait-loop math assumed microseconds.**
 
 `src/hal/performance_time.cpp` defines `ref_time_ticks()` per platform:
 
@@ -27,7 +27,7 @@ External `curl http://localhost:9090/status` from a sibling shell returns 200 wi
 |---|---|
 | Linux  | `tv_sec * 1e9 + tv_nsec` — **nanoseconds** |
 | macOS  | `clock_gettime_nsec_np(CLOCK_MONOTONIC_RAW)` — **nanoseconds** |
-| Windows | `QueryPerformanceCounter().QuadPart` — counter ticks, freq depends on hardware (often ~10 MHz, accidentally close to 1 MHz / microsecond scaling) |
+| Windows | QPC ticks converted to **nanoseconds** via `ticks × (1e9 / freq)`; fast path at 10 MHz QPF = 100 ns/tick multiply (PR #2685). Pre-PR #2685: raw `QuadPart` ticks, ~10 MHz, accidentally close to μs scaling. |
 
 `imgui_playwright`'s `wait_until_ready` (and other deadline loops) used:
 
@@ -41,7 +41,8 @@ while (ref_time_ticks() < deadline) {
 
 That `* 1000000.0f` assumes ref-time is in microseconds. So:
 - **Linux/macOS**: a "30s" deadline is `30 * 1e6 = 30 million nanoseconds = 30 milliseconds`. Loop fires 0-1 polls and exits. The `connect 127.0.0.1:9090 failed!` line is the one in-flight libhv connect attempt timing out — server health is fine; the loop just budgeted itself out of existence.
-- **Windows**: QPC freq is hardware-dependent but on common runners works out near enough to 1 MHz that `* 1e6` lands in the "seconds" ballpark by accident, masking the bug.
+- **Windows (post-PR #2685)**: `ref_time_ticks()` now also returns nanoseconds on Windows, so the same 30 ms budget applies — the bug is equally visible.
+- **Windows (pre-PR #2685)**: raw QPC `QuadPart` ticks at ~10 MHz worked out near enough to 1 MHz that `* 1e6` landed in the "seconds" ballpark by accident, masking the bug on Windows CI.
 
 # The Windows-only "require order" workaround is misleading
 
@@ -61,9 +62,10 @@ while (get_time_usec(t_start) < timeout_us) {
     ...
 }
 
-// Option B — compute deadline in nanoseconds, on POSIX
+// Option B — compute deadline in nanoseconds (safe on all platforms after PR #2685)
 let deadline = ref_time_ticks() + int64(timeout_sec * 1000000000.0f)
-// (DON'T do this without a per-platform branch — breaks Windows)
+// (On pre-PR #2685 Windows builds, ref_time_ticks() returned raw QPC ticks, so this
+//  would be wrong there. Prefer Option A if you need to support older builds.)
 ```
 
 **Option A is the right one.** `get_time_usec(reft)` is defined per-platform in `performance_time.cpp` and always returns microseconds. Audit any other `ref_time_ticks() + ... * 1000000.0f` patterns in your codebase the same way.
@@ -72,7 +74,7 @@ let deadline = ref_time_ticks() + int64(timeout_sec * 1000000000.0f)
 
 - Test hangs at `readiness gate FAILED` (not at `body did not converge` or similar).
 - External `curl` to `localhost:9090/status` works while the test hangs.
-- Reproduces on macOS / Linux; "works" on Windows (deceptive — see above).
+- Reproduces on macOS / Linux; on current code also reproduces on Windows (post-PR #2685 normalization). Pre-PR #2685, Windows masked the bug via raw QPC tick math — see the platform table above.
 - Suspect any deadline loop using `ref_time_ticks() + ... * 1e6` — that's the smoking gun.
 
 # Why this took a while to spot

From 6711570d2d1f5bebeada40b46484b28a63d68803 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 17:51:03 -0700
Subject: [PATCH 14/18] daslang-live: add -project_root flag (mirror
 daslang.exe)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

daslang.exe accepts -project_root <path> to override the script-dir
fallback used to scan <root>/modules/*/.das_module. daslang-live was
silently lacking the same flag — it only had -project (project file).

Workflow that surfaced this: running a tutorial via daslang-live from
inside a module clone at D:\DASPKG\dasImgui where the script path was
"../../../../examples/tutorial/X.das". With no -project_root, the
fallback at lines 722-724 sets project_root = examples/tutorial — which
has no modules/ underneath, so `require imgui` fails. Same scenario
works for daslang.exe with -project_root .

3-line patch: arg-parse arm + help line. The project_root static is
already declared at line 37 and consumed at line 727; the script-dir
fallback still kicks in when the flag isn't passed.
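
The resolution order described above can be sketched as follows. This
is a Python illustration, not the C++ in utils/daslang-live/main.cpp;
the function name and the way the script argument is detected are
assumptions made for the example, and `--` script arguments are not
handled.

```python
import os


def resolve_project_root(argv):
    """Return (project_root, script) for a daslang-live style argv."""
    project_root = None
    script = None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-project_root" and i + 1 < len(argv):
            # Explicit override: consume the following argument.
            i += 1
            project_root = argv[i]
        elif not arg.startswith("-") and script is None:
            # First non-flag argument is treated as the script path.
            script = arg
        i += 1
    if project_root is None and script is not None:
        # Fallback: the script's own directory becomes the root.
        project_root = os.path.dirname(script)
    return project_root, script


with_flag = resolve_project_root(
    ["-project_root", ".", "examples/tutorial/X.das"])
without_flag = resolve_project_root(["examples/tutorial/X.das"])
```

Without the flag the root lands in examples/tutorial, which has no
modules/ directory; with `-project_root .` the root stays at the
module clone.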

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 utils/daslang-live/main.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/utils/daslang-live/main.cpp b/utils/daslang-live/main.cpp
index 126f82f043..e5cb20c10e 100644
--- a/utils/daslang-live/main.cpp
+++ b/utils/daslang-live/main.cpp
@@ -568,6 +568,7 @@ static void print_help() {
     tout << "daslang-live - live-reloading application host for daScript\n";
     tout << "Usage: daslang-live [options] <script.das> [-- script arguments]\n";
     tout << "  -project <file>    - project file (.das_project)\n";
+    tout << "  -project_root <path> - project root (parent of modules/, default: script's dir)\n";
     tout << "  -dasroot <path>    - override DAS_ROOT\n";
     tout << "  -cwd               - change working directory to script's folder\n";
     tout << "  -v1syntax          - use v1 syntax (default: v2)\n";
@@ -654,6 +655,8 @@ int main(int argc, char * argv[]) {
         string arg = argv[i];
         if (arg == "-project" && i + 1 < argc) {
             projectFile = argv[++i];
+        } else if (arg == "-project_root" && i + 1 < argc) {
+            project_root = argv[++i];
         } else if (arg == "-dasroot" && i + 1 < argc) {
             setDasRoot(argv[++i]);
         } else if (arg == "-cwd") {

From 78405f0f892a1dcdad557d166cd175bdbfc03dda Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 18:01:14 -0700
Subject: [PATCH 15/18] site/blog: roadmap update + are-we-there-yet post

---
 ...6-05-16-roadmap-update-are-we-there-yet.md |  6 ++
 site/blog/_posts/are-we-there-yet.md          | 57 +++++++++++++++++++
 site/blog/build_blog.py                       |  2 +-
 3 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 site/_news/2026-05-16-roadmap-update-are-we-there-yet.md
 create mode 100644 site/blog/_posts/are-we-there-yet.md

diff --git a/site/_news/2026-05-16-roadmap-update-are-we-there-yet.md b/site/_news/2026-05-16-roadmap-update-are-we-there-yet.md
new file mode 100644
index 0000000000..353584f0ce
--- /dev/null
+++ b/site/_news/2026-05-16-roadmap-update-are-we-there-yet.md
@@ -0,0 +1,6 @@
+---
+date: 2026-05-16
+tag: blog
+title: Roadmap updated. Are we there yet? See blog for details.
+link: /blog/are-we-there-yet.html
+---
diff --git a/site/blog/_posts/are-we-there-yet.md b/site/blog/_posts/are-we-there-yet.md
new file mode 100644
index 0000000000..a90999a272
--- /dev/null
+++ b/site/blog/_posts/are-we-there-yet.md
@@ -0,0 +1,57 @@
+---
+title: Are we there yet?
+date: 2026-05-16 17:01:24
+tags:
+    - daScript
+    - Claude
+---
+
+Yes and No.
+
+<!-- more -->
+
+Like a good little shaman I've been praying the technical debt away. Prompts are new incantations. Prompts are new prayers. Mine been about quality gates, tests, tools, cleanup, refactor, tools, documentation, tutorials, integration, better syntax, and tools.
+
+I managed to squeeze some features along the way, but far too few to my liking. Strudel is awesome. LINQ is amazing. It will get better. But not yet.
+
+We prayed the smart_ptr away from the ast - and now everyone can write macros care-free. There are more demons left to slay. The big ones are
+
+* 32-bit arrays. yes, in 2026. tables too. but not for long. i'll try not to break your code too much. u'll get length32, count32, 64-bit index; u get to fix some warnings here and there.
+* proper generic resolution for fixed array dimensions. u probably are not going to notice (hi Profelis). one day it goes from it just works to it just works with less generics, and thats it.
+* lambda should be copyable. delete becomes unsafe, but thats ok - we have GC. we might even squeeze clone for it - we already do it for the job-que.
+
+There will be 0.6.3 and likely 0.6.4 before there will be 0.7.
+
+But did u know
+
+* daslang -exe outputs standalone binary. and daspkg can package it
+* did u even know there is daspkg, its like npm but for das
+* there is an MCP server, which helps with das. and cpp. like if u work with cpp - it can lookup symbols and do the works. fast.
+* there is also 'blind mouse' MCP server, which is very experimental - but Claude says it helps already
+* there is lint. lint + Claude is synergetic. lint + Claude + Boris is something I'm figuring out
+* there is detect-dup and find-dup. detect-dup is very local. when Claude writes a lot of Code, it says 'and these 3 functions are too similar' (btw, Boris wrote two of them - one in 2022). find-dup is how u do it on a large scale.
+* dastest and benchctl. one is tests. and benchmarks. the other is to remind u that performance regression is a regression
+* llvm support is getting better. way better. loop annotations are now first class thing.
+
+But did u try
+
+* daslang-live and the dasLiveHost. Claude loves it. MCP talks to it. My dasIMGUI tutorial recorder uses it. Its getting stable. Don't restart.
+* LINQ. because its awesome
+* dasSQLITE. because its excellent. there will be more on how it plays with LINQ and DECS.
+* Strudel and the rest of dasAudio. It has SF2, which happens to be excellent. It has MIDI. Strudel can do SF2 and HRTF like it belongs (because it does).
+* dasIMGUI. well almost, u could not have tried that - and maybe for another week u should only peek. but its shaping. Claude can talk to it out of the box - don't write an MCP to your editor, just be.
+
+So what about that 0.7
+
+Easy. Technical debt free Boris sits there, fixes bugs, and works on performance - while addressing everyone's troubles.
+
+* Compiler will get faster. Somewhat. Its getting hard
+* Interpreter will get faster - I got more tricks (I know, hard to imagine)
+* JIT will get faster
+* GC will get faster
+* LINQ will get even faster
+* DECS will get ... u know the drill
+
+I gonna get few more things hooked up along the way, but Vulkan is for sure. We have DAS->GLSL macro, and we have LLVM. I just don't know when. There are basic bindings already.
+
+So much to pray for. Like a good shaman me.
diff --git a/site/blog/build_blog.py b/site/blog/build_blog.py
index e399532634..8baf264626 100644
--- a/site/blog/build_blog.py
+++ b/site/blog/build_blog.py
@@ -553,7 +553,7 @@ def main():
     (out / 'files' / 'blog.json').write_text(
         json.dumps(blog_data, indent=2), encoding='utf-8')
 
-    print(f"built {len(posts)} posts, {len(news)} news entries → {out}/")
+    print(f"built {len(posts)} posts, {len(news)} news entries -> {out}/")
 
 
 if __name__ == '__main__':

From 9b5b9170c09fb26e625bf6e475dd213fc7c36304 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 18:07:31 -0700
Subject: [PATCH 16/18] daslang-live: accept -project-root (dashed) alias for
 symmetry
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mirrors daslang.exe's silent alias at utils/daScript/main.cpp:190 where
both -project_root and -project-root are accepted by the same arm.
Help text still only shows the underscore form (same convention as
daslang.exe — alias is silent).

Addresses PR #2693 review feedback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 utils/daslang-live/main.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utils/daslang-live/main.cpp b/utils/daslang-live/main.cpp
index e5cb20c10e..0b9ca0896a 100644
--- a/utils/daslang-live/main.cpp
+++ b/utils/daslang-live/main.cpp
@@ -655,7 +655,7 @@ int main(int argc, char * argv[]) {
         string arg = argv[i];
         if (arg == "-project" && i + 1 < argc) {
             projectFile = argv[++i];
-        } else if (arg == "-project_root" && i + 1 < argc) {
+        } else if ((arg == "-project_root" || arg == "-project-root") && i + 1 < argc) {
             project_root = argv[++i];
         } else if (arg == "-dasroot" && i + 1 < argc) {
             setDasRoot(argv[++i]);

From 380455c34cd3a5e60534de07e16775edc0b24b2c Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 18:13:28 -0700
Subject: [PATCH 17/18] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 site/blog/_posts/are-we-there-yet.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/blog/_posts/are-we-there-yet.md b/site/blog/_posts/are-we-there-yet.md
index a90999a272..b1fef6ac41 100644
--- a/site/blog/_posts/are-we-there-yet.md
+++ b/site/blog/_posts/are-we-there-yet.md
@@ -29,7 +29,7 @@ But did u know
 * there is an MCP server, which helps with das. and cpp. like if u work with cpp - it can lookup symbols and do the works. fast.
 * there is also 'blind mouse' MCP server, which is very experimental - but Claude says it helps already
 * there is lint. lint + Claude is synergetic. lint + Claude + Boris is something I'm figuring out
-* there is detect-dup and find-dup. detect-dup is very local. when Claude writes a lot of Code, it says 'and these 3 functions are too similar' (btw, Boris wrote two of them - one in 2022). find-dup is how u do it on a large scale.
+* there is detect-dupe and find-dupe. detect-dupe is very local. when Claude writes a lot of Code, it says 'and these 3 functions are too similar' (btw, Boris wrote two of them - one in 2022). find-dupe is how u do it on a large scale.
 * dastest and benchctl. one is tests. and benchmarks. the other is to remind u that performance regression is a regression
 * llvm support is getting better. way better. loop annotations are now first class thing.
 

From c8ad4504e0f2072e9acdaf9aef4f1940e2ead389 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Sat, 16 May 2026 18:41:57 -0700
Subject: [PATCH 18/18] examples/graphics: modernize Fourier viz to dasImgui
 boost-v2 + harness
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The example pre-dated PR #33's default-on imgui_lint, PR #38's headless
harness, and PR #39's daslang theme. It hand-rolled its own imgui_app()
shim, called raw Begin/End/Checkbox/InputFloat with addr-of dance, and
set FontGlobalScale=1.0 with a "BBATKIN: note - my monitor is HUGE"
comment.

Rewrite onto the current surface:

* require imgui/imgui_harness — single import for the boost-v2 widget
  stack, daslang theme + JetBrains Mono via live_imgui_init, harness
  lifecycle (harness_init / harness_begin_frame / harness_new_frame /
  harness_shutdown).
* window(SETUP_WIN, ...) { ... } container instead of raw Begin/End.
* edit_checkbox / edit_input_float / edit_input_float2 against
  safe_addr(global) instead of unsafe(addr(field)). Collapsed C0.x/C0.y
  → edit_input_float2(safe_addr(c0), ...) — same applies to C1/C-1/C2/C-2.
* text("...") narrative widget instead of raw Text().
* separator(SEP_C0/C1/C2) instead of raw Separator().
* Drop FontGlobalScale shim — theme picks a sensible 14px default.

The per-frame loop splits harness_end_frame manually so that the custom
OpenGL drawing runs between glClear and
ImGui_ImplOpenGL3_RenderDrawData. options _allow_glfw_calls = true opts
out of imgui_harness_lint for the GLFW/GL calls the example legitimately
owns (live_get_framebuffer_size, glViewport, glClear, draw_fourier()).

To run: `daspkg install` from examples/graphics/ to fetch dasImgui into
the local modules/, then `daslang.exe -project_root .
furier_opengl_imgui_example.das`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../graphics/furier_opengl_imgui_example.das  | 243 ++++++++----------
 1 file changed, 112 insertions(+), 131 deletions(-)

diff --git a/examples/graphics/furier_opengl_imgui_example.das b/examples/graphics/furier_opengl_imgui_example.das
index 4063ccea03..e25f05f7c7 100644
--- a/examples/graphics/furier_opengl_imgui_example.das
+++ b/examples/graphics/furier_opengl_imgui_example.das
@@ -1,15 +1,22 @@
 options gen2
+options indenting = 4
 options persistent_heap = true
+options _allow_glfw_calls = true
 
-require imgui_app
-require glfw/glfw_boost
-require imgui/imgui_boost
+require imgui/imgui_harness
 require opengl/opengl_boost
-require daslib/math_boost
+require glfw/glfw_boost
+require live/glfw_live
+require imgui_app
 require daslib/safe_addr
+require daslib/math_boost
 require daslib/strings
 
-var window : GLFWwindow?
+// Custom 3D + ImGui hybrid: the harness gives us the boost-v2 widget surface
+// (window/edit_*/text) and the daslang theme via live_imgui_init, but we
+// split harness_end_frame manually because our Fourier viz draws between
+// the GL clear and ImGui's RenderDrawData. _allow_glfw_calls = true opts
+// out of imgui_harness_lint for the OpenGL/GLFW calls we own.
 
 let NGRAPH = 1000
 
@@ -17,149 +24,124 @@ var {
     rotating : bool = true
     rps : float = 0.1f
     tt : float = 0.0f
-// c0
     c0 : float2 = float2(-0.2f, 0.05f)
-// cp1, cn1
     enable_1 : bool = true
     cp1 : float2 = float2(0.27f, 0.0f)
     cn1 : float2 = float2(0.0f, 0.10f)
-// cp2, cn2
     enable_2 : bool = false
     cp2 : float2 = float2(-0.07f, 0.08f)
     cn2 : float2 = float2(0.03f, -0.02f)
 }
 
-def imgui_app(title : string; blk : block) {
-    if (glfwInit() == 0) {
-        panic("can't init glfw")
-    }
-    glfwInitOpenGL(3, 3, false, false)
-    window = glfwCreateWindow(1024, 1024, title, null, null)
-    if (window == null) {
-        panic("can't create window")
-    }
-    glfwMakeContextCurrent(window)
-    glfwSwapInterval(1)
-    CreateContext(null)
-    var io & = unsafe(GetIO())
-    io.FontGlobalScale = 1.0   // BBATKIN: note - my monitor is HUGE
-    StyleColorsDark(null)
-    ImGui_ImplGlfw_InitForOpenGL(window, true)
-    ImGui_ImplOpenGL3_Init("#version 330")
-    var clear_color = float4(0.85f, 0.85f, 0.90f, 1.00f)
-    create_gl_objects()
-    while (glfwWindowShouldClose(window) == 0) {
-        glfwPollEvents()
-        ImGui_ImplOpenGL3_NewFrame()
-        ImGui_ImplGlfw_NewFrame()
-        invoke(blk)
-        var display_w, display_h : int
-        glfwGetFramebufferSize(window, display_w, display_h)
-        glViewport(0, 0, display_w, display_h)
-        glClearColor(clear_color.x, clear_color.y, clear_color.z, clear_color.w)
-        glClear(GL_COLOR_BUFFER_BIT)
-        let time = rotating ? glfwGetTime() * double(rps) % 1.0lf : double(tt)
-        let t = float(time)
-        // compute vectors
-        let p0 = c0
-        let pp1 = mul_complex(cp1, rot_complex(1.0 * t * 2.0 * PI))
-        let pn1 = mul_complex(cn1, rot_complex(-1.0 * t * 2.0 * PI))
-        let pp2 = mul_complex(cp2, rot_complex(2.0 * t * 2.0 * PI))
-        let pn2 = mul_complex(cn2, rot_complex(-2.0 * t * 2.0 * PI))
-        // c0
-        draw_arrow(float2(0.0f, 0.0f), p0, float3(1.0f, 0.0f, 0.0f))
-        if (enable_1) {
-            // cp1
-            var p = p0
-            draw_circle(p, length(cp1), float3(0.0f, 0.0f, 0.0f))
-            draw_arrow(p, pp1, float3(1.0f, 0.0f, 0.0f))
-            p += pp1
-            // cn1
-            draw_circle(p, length(cn1), float3(0.0f, 0.0f, 0.0f))
-            draw_arrow(p, pn1, float3(1.0f, 0.0f, 0.0f))
-            p += pn1
-            if (enable_2) {
-                // cp2
-                draw_circle(p, length(cp2), float3(0.0f, 0.0f, 0.0f))
-                draw_arrow(p, pp2, float3(1.0f, 0.0f, 0.0f))
-                p += pp2
-                // cn2
-                draw_circle(p, length(cn2), float3(0.0f, 0.0f, 0.0f))
-                draw_arrow(p, pn2, float3(1.0f, 0.0f, 0.0f))
-                p += pn2
-            }
-        }
-        // graph
-        draw_graph(float3(1.0f, 1.0f, 0.0f))
-        // and done
-        ImGui_ImplOpenGL3_RenderDrawData(GetDrawData())
-        glfwMakeContextCurrent(window)
-        glfwSwapBuffers(window)
-    }
-    ImGui_ImplOpenGL3_Shutdown()
-    ImGui_ImplGlfw_Shutdown()
-    DestroyContext(null)
-    glfwDestroyWindow(window)
-    glfwTerminate()
-}
-
 def angle(c : float2) {
     let C = normalize(c)
     return atan2(C.y, C.x)
 }
 
-def format_angle(c : float2) {
+def format_angle(c : float2) : string {
     return fmt(":.2f", angle(c) / PI) + "*PI"
 }
 
-def format_length(c : float2) {
+def format_length(c : float2) : string {
     return fmt(":.2f", length(c))
 }
 
-def editCoefficientsWindow(p_open : bool? implicit) {
-    if (!Begin("Setup vectors and circles", p_open, ImGuiWindowFlags.None)) {
-        End()
-        return
+def edit_coefficients_window() {
+    window(SETUP_WIN, (text = "Setup vectors and circles", closable = false,
+                       flags = ImGuiWindowFlags.None)) {
+        text("Speed of PHI, rotations per second")
+        edit_checkbox(safe_addr(rotating), (id = "ROTATING", text = "Rotating"))
+        edit_input_float(safe_addr(tt), (id = "TIME", text = "time", step = 0.1f))
+        edit_input_float(safe_addr(rps), (id = "RPS", text = "RPS", step = 0.1f))
+        separator(SEP_C0)
+        text("C0 A={format_length(c0)} PHI={format_angle(c0)}")
+        edit_input_float2(safe_addr(c0), (id = "C0", text = "C0 (x,y)"))
+        separator(SEP_C1)
+        edit_checkbox(safe_addr(enable_1), (id = "ENABLE_1", text = "Enable C1,C-1"))
+        if (enable_1) {
+            text("C1  A={format_length(cp1)} PHI={format_angle(cp1)}")
+            edit_input_float2(safe_addr(cp1), (id = "CP1", text = "C1 (x,y)"))
+            text("C-1 A={format_length(cn1)} PHI={format_angle(cn1)}")
+            edit_input_float2(safe_addr(cn1), (id = "CN1", text = "C-1 (x,y)"))
+            separator(SEP_C2)
+            edit_checkbox(safe_addr(enable_2), (id = "ENABLE_2", text = "Enable C2,C-2"))
+            if (enable_2) {
+                text("C2  A={format_length(cp2)} PHI={format_angle(cp2)}")
+                edit_input_float2(safe_addr(cp2), (id = "CP2", text = "C2 (x,y)"))
+                text("C-2 A={format_length(cn2)} PHI={format_angle(cn2)}")
+                edit_input_float2(safe_addr(cn2), (id = "CN2", text = "C-2 (x,y)"))
+            }
+        }
     }
-    Text("Speed of PHI, rotations per second")
-    Checkbox("Rotating", unsafe(addr(rotating)))
-    InputFloat("time", unsafe(addr(tt)), 0.1f)
-    InputFloat("RPS", unsafe(addr(rps)), 0.1f)
-    Separator()
-    Text("C0 A={format_length(c0)} PHI={format_angle(c0)}")
-    InputFloat("C0.x", unsafe(addr(c0.x)), 0.1f)
-    InputFloat("C0.y", unsafe(addr(c0.y)), 0.1f)
-    Separator()
-    Checkbox("Enable C1,C-1", unsafe(addr(enable_1)))
+}
+
+def draw_fourier() {
+    let time = rotating ? glfwGetTime() * double(rps) % 1.0lf : double(tt)
+    let t = float(time)
+    let p0 = c0
+    let pp1 = mul_complex(cp1, rot_complex(1.0 * t * 2.0 * PI))
+    let pn1 = mul_complex(cn1, rot_complex(-1.0 * t * 2.0 * PI))
+    let pp2 = mul_complex(cp2, rot_complex(2.0 * t * 2.0 * PI))
+    let pn2 = mul_complex(cn2, rot_complex(-2.0 * t * 2.0 * PI))
+    draw_arrow(float2(0.0f, 0.0f), p0, float3(1.0f, 0.0f, 0.0f))
     if (enable_1) {
-        Text("C1  A={format_length(cp1)} PHI={format_angle(cp1)}")
-        InputFloat("C1.x", unsafe(addr(cp1.x)), 0.1f)
-        InputFloat("C1.y", unsafe(addr(cp1.y)), 0.1f)
-        Text("C-1 A={format_length(cn1)} PHI={format_angle(cn1)}")
-        InputFloat("C-1.x", unsafe(addr(cn1.x)), 0.1f)
-        InputFloat("C-1.y", unsafe(addr(cn1.y)), 0.1f)
-        Separator()
-        Checkbox("Enable C2,C-2", unsafe(addr(enable_2)))
+        var p = p0
+        draw_circle(p, length(cp1), float3(0.0f, 0.0f, 0.0f))
+        draw_arrow(p, pp1, float3(1.0f, 0.0f, 0.0f))
+        p += pp1
+        draw_circle(p, length(cn1), float3(0.0f, 0.0f, 0.0f))
+        draw_arrow(p, pn1, float3(1.0f, 0.0f, 0.0f))
+        p += pn1
         if (enable_2) {
-            Text("C2  A={format_length(cp2)} PHI={format_angle(cp2)}")
-            InputFloat("C2.x", unsafe(addr(cp2.x)), 0.1f)
-            InputFloat("C2.y", unsafe(addr(cp2.y)), 0.1f)
-            Text("C-2 A={format_length(cn2)} PHI={format_angle(cn2)}")
-            InputFloat("C-2.x", unsafe(addr(cn2.x)), 0.1f)
-            InputFloat("C-2.y", unsafe(addr(cn2.y)), 0.1f)
+            draw_circle(p, length(cp2), float3(0.0f, 0.0f, 0.0f))
+            draw_arrow(p, pp2, float3(1.0f, 0.0f, 0.0f))
+            p += pp2
+            draw_circle(p, length(cn2), float3(0.0f, 0.0f, 0.0f))
+            draw_arrow(p, pn2, float3(1.0f, 0.0f, 0.0f))
+            p += pn2
         }
     }
-    End()
+    draw_graph(float3(1.0f, 1.0f, 0.0f))
+}
+
+[export]
+def init() {
+    harness_init("Vectors & Circles", 1024, 1024)
+    create_gl_objects()
+}
+
+[export]
+def update() {
+    if (!harness_begin_frame()) return
+    harness_new_frame()
+
+    edit_coefficients_window()
+
+    var display_w, display_h : int
+    live_get_framebuffer_size(display_w, display_h)
+    glViewport(0, 0, display_w, display_h)
+    glClearColor(0.85f, 0.85f, 0.90f, 1.0f)
+    glClear(GL_COLOR_BUFFER_BIT)
+    draw_fourier()
+
+    end_of_frame()
+    Render()
+    ImGui_ImplOpenGL3_RenderDrawData(GetDrawData())
+    live_end_frame()
 }
 
 [export]
-def main {
-    var f = 0.0
-    imgui_app("Vectors & Circles") {
-        NewFrame()
-        editCoefficientsWindow(null)
-        Render()
+def shutdown() {
+    harness_shutdown()
+}
+
+[export]
+def main() {
+    init()
+    while (!exit_requested()) {
+        update()
     }
+    shutdown()
 }
 
 var @in @location = 0 v_position : float2
@@ -214,7 +196,7 @@ def draw_arrow(origin : float2; c : float2; color : float3) {
     f_color = color
     v_offset = origin
     v_rot = c
-    v_scale = float2(1., 1.)
+    v_scale = float2(1.0f, 1.0f)
     vs_main_bind_uniform(program)
     fs_main_bind_uniform(program)
     glBindVertexArray(vao_arrow)
@@ -260,16 +242,16 @@ def create_gl_objects {
         glBindVertexArray(vao_arrow)
         glGenBuffers(1, safe_addr(vbo_arrow))
         glBindBuffer(GL_ARRAY_BUFFER, vbo_arrow)
-        var vertices <- [Vertex(
-            xy = float2(0.0f, 0.0f)), Vertex(
-            xy = float2(1.0f, 0.0f)), Vertex(
-            xy = float2(0.95f, 0.025f)), Vertex(
-            xy = float2(0.95f, -0.025f)), Vertex(
-            xy = float2(1.0f, 0.0f)
-        )]
+        var vertices <- [
+            Vertex(xy = float2(0.0f, 0.0f)),
+            Vertex(xy = float2(1.0f, 0.0f)),
+            Vertex(xy = float2(0.95f, 0.025f)),
+            Vertex(xy = float2(0.95f, -0.025f)),
+            Vertex(xy = float2(1.0f, 0.0f))
+        ]
         glBufferData(GL_ARRAY_BUFFER, vertices, GL_STATIC_DRAW)
         bind_vertex_buffer(null, type<Vertex>)
-    // graph
+        // graph
         glGenVertexArrays(1, safe_addr(vao_graph))
         glBindVertexArray(vao_graph)
         glGenBuffers(1, safe_addr(vbo_graph))
@@ -280,7 +262,6 @@ def create_gl_objects {
 }
 
 def mul_complex(a, b : float2) {
-    // (a.x+i*a.y)*(b.x+i*b.y) = a.x*b.x - a.y*b.y + i*(a.x*b.y + a.y*b.x)
     return float2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x)
 }
 
@@ -288,7 +269,7 @@ def rot_complex(phi : float) {
     return float2(cos(phi), sin(phi))
 }
 
-def compute_fn(t : float) {// 0..1
+def compute_fn(t : float) {
     let p0 = c0
     let pp1 = mul_complex(cp1, rot_complex(1.0 * t * 2.0 * PI))
     let pn1 = mul_complex(cn1, rot_complex(-1.0 * t * 2.0 * PI))
@@ -312,4 +293,4 @@ def compute_graph {
         vertices |> push(Vertex(xy = p))
     }
     glBufferData(GL_ARRAY_BUFFER, vertices, GL_DYNAMIC_DRAW)
-}
\ No newline at end of file
+}