From 5b10e644ebaf8ca2d2f25550e193deb991193bc7 Mon Sep 17 00:00:00 2001
From: Boris Batkin
Date: Sat, 16 May 2026 10:33:01 -0700
Subject: [PATCH 01/18] linq_fold: Phase 2A loop planner (where|select array + counter lanes)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replaces _fold's comprehension emitter with a planner that walks the chain and emits a plain for-loop inside invoke($block, $src). Two terminator lanes:

- array lane: [_where*][_select?] → loop + push_clone (identity) or emplace-of-bound-projection (workhorse choice made at macro time from the projection's _type.isWorkhorseType, no runtime static_if).
- counter lane: same intermediates + _count → counter loop with `n++`.

Chained _where|_where fuse into a single && predicate; chained _select|_select fall through (needs ExprRef2Value-aware substitution, deferred to Phase 2B). Anything outside the two lanes (_select|_where, _sum, _min, _max, _first, _any, _all, _long_count, _order, _distinct, _take, _skip, _zip, _reverse, ...) returns the raw chain unfolded — no dispatch to _old_fold or fold_linq_default.

_old_fold and fold_linq_default are untouched; the comprehension contract now lives solely on _old_fold (10 AST tests retargeted; 8 new AST tests + 6 behavioral tests cover the new loop emission).

Benchmark deltas (100K, INTERP, ns/op per element):

  count_aggregate (where|count):        5 → 5   parity
  chained_where (where|where|count):   17 → 8   2.1× faster
  select_count (select|count):         15 → 2   7.5× faster
  to_array_filter (where|select):      11 → 13  ~18% slower vs comprehension

Out-of-scope shapes regress to m3 (plain linq) — accepted as the forcing function for Phase 2B (sum/min/max/first/any/all + chained selects + take/skip).

Co-Authored-By: Claude Opus 4.7 (1M context) --- benchmarks/sql/LINQ.md | 30 ++- benchmarks/sql/select_count.das | 73 ++++++ daslib/linq_fold.das | 143 +++++++++- tests/linq/test_linq_fold_ast.das | 419 ++++++++++++++++++++++-------- 4 files changed, 554 insertions(+), 111 deletions(-) create mode 100644 benchmarks/sql/select_count.das diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md index 5a956110fe..afca309bce 100644 --- a/benchmarks/sql/LINQ.md +++ b/benchmarks/sql/LINQ.md @@ -22,8 +22,9 @@ See `~/.claude/plans/keen-hopping-balloon.md` for the long-form plan. |---|---|---| | 0 | Rename `_fold` → `_old_fold` in linq_boost; extract `_fold` and `_old_fold` into new `daslib/linq_fold.das` module; `linq_boost` `require linq_fold public` for re-export | ✅ done | | 1 | Benchmark suite: 24 files under `benchmarks/sql/`, each 4-way (m1 `_sql` / m3 plain linq / m3f_old `_old_fold` / m3f `_fold`) at 100K rows; baseline numbers captured | ✅ done | -| 2 | Splice planner + initial operators (`count`, `sum`, `to_array`, `where` with literal-lambda inlining); pattern tests for "spliced" vs "fell back" | ⏳ next | -| 3+ | Per-operator splice PRs: `select`, terminal aggregates with early-exit (`first`, `any`, `all`, `min`, `max`, `average`), `take`/`skip`/chained `where`, then buffer-required ops (`distinct`, `sort`, `groupby`, `zip`, `join`) | ⏳ | +| 2A | Loop planner — `_fold` emits explicit for-loops for `[where_*][select?]` (array lane) and `[where_*][select?] |> count` (counter lane); anything else falls through unfolded. No comprehensions, no dispatch back to `_old_fold`. | ✅ done | +| 2B | Aggregate accumulators: `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`. Also `take`/`skip` in counter/array lane and chained-`_select|_select` fusion (needs `ExprRef2Value`-aware projection substitution) | ⏳ next | +| 3+ | Buffer-required operators: `distinct`, `sort`, `reverse`, `groupby`, `zip`, `join`. 
Once we go array, we stay array | ⏳ | | 4 | Final coverage pass + docs; full 4-way comparison table refresh; parity-test sweep | ⏳ | ## Baselines (100K rows, INTERP mode) @@ -69,7 +70,30 @@ Notation: `—` means the variant is not applicable for this benchmark (operator - **m1 vs m3** shows the SQLite-vs-in-memory-LINQ cost gap. SQL wins on `indexed_lookup` (b-tree) and on sorted-take patterns (engine partial-sort + LIMIT). Arrays win on raw aggregates where the SQL overhead exceeds the in-memory work. - **m3 vs m3f_old** shows what the *current* `_fold` macro already achieves. Big wins on the patterns it explicitly recognizes (`where+count` 6×, `where+select+to_array` ~4×, `chained_where+count` 2.6×). Negligible difference where it falls through to the default emitter. -- **m3f vs m3f_old** is the target of Phase 2+. Currently identical by construction. Each PR in the splice series adds a splice path for one operator family and updates this table with the new ratio. +- **m3f vs m3f_old** is the target of Phase 2+. Each PR in the splice series adds a path for one operator family and updates this table with the new ratio. + +## Phase 2A — Loop planner (2026-05-16) + +`_fold` now emits explicit for-loops for two narrow shape families instead of comprehensions. Anything outside scope falls through unfolded to raw linq (no dispatch to `_old_fold` or `fold_linq_default`). + +**In scope:** `[where_*][select?]` (array lane) and `[where_*][select?] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`; single `_select` composes; chained `_select|_select` falls through (needs ExprRef2Value-aware substitution, deferred to Phase 2B). + +**Out of scope (falls through):** `_select|_where`, `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`, `_order`, `_distinct`, `_take`, `_skip`, `_zip`, `_reverse`, etc. 
+ +### Phase 2A deltas (100K rows, INTERP) + +| Benchmark | Shape | m3f_old | m3f (Phase 2A) | Delta | +|---|---|---:|---:|---| +| count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) | +| chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) | +| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane ignores projection; no array materialization) | +| to_array_filter | `where → select → to_array` | 11 | 13 | ~18% slower (explicit loop vs comprehension lowering) | + +Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch. + +### Why `to_array_filter` regressed + +Comprehensions `[for (it in src) where p; expr]` lower through the compiler's dedicated `ExprArrayComprehension` path, which appears to compose more aggressively with array growth than an emitted-by-macro explicit loop with `static_if (is_workhorse) var val = expr; arr.emplace(val)`. The 18% gap is small relative to the 2-7× wins elsewhere; Phase 2B can profile and tune (likely pre-reserving the result array or switching to `push` for workhorse). ## Operator-coverage checklist (parity tests) diff --git a/benchmarks/sql/select_count.das b/benchmarks/sql/select_count.das new file mode 100644 index 0000000000..63c0e5cc9a --- /dev/null +++ b/benchmarks/sql/select_count.das @@ -0,0 +1,73 @@ +options gen2 +options persistent_heap + +require _common public + +// _select |> count — projection followed by counter. 
The projection has no effect on count +// semantics, but on the array path m3 materializes the projected array before counting. +// Phase-2A `_fold` recognizes the counter lane and emits a bare-loop counter that ignores +// the projection entirely (no allocation). `_old_fold` lacks a [select, count] pattern in +// g_foldSeq so it falls to the default nested-pass form (pass_0 = select(...); count(pass_0)) +// — materializing the same way m3 does. + +def run_m1(b : B?; n : int) { + with_sqlite(":memory:") $(db) { + fixture_db(db, n) + b |> run("m1_sql/{n}", n) { + let c = _sql(db |> select_from(type) |> count()) + if (c == 0) { + b->failNow() + } + } + } +} + +def run_m3(b : B?; n : int) { + let arr <- fixture_array(n) + b |> run("m3_array/{n}", n) { + let c = arr |> _select(_.price * 2) |> count() + if (c == 0) { + b->failNow() + } + } +} + +def run_m3f_old(b : B?; n : int) { + let arr <- fixture_array(n) + b |> run("m3f_old_array_fold/{n}", n) { + let c = _old_fold(each(arr)._select(_.price * 2).count()) + if (c == 0) { + b->failNow() + } + } +} + +def run_m3f(b : B?; n : int) { + let arr <- fixture_array(n) + b |> run("m3f_array_fold/{n}", n) { + let c = _fold(each(arr)._select(_.price * 2).count()) + if (c == 0) { + b->failNow() + } + } +} + +[benchmark] +def select_count_m1(b : B?) { + run_m1(b, 100000) +} + +[benchmark] +def select_count_m3(b : B?) { + run_m3(b, 100000) +} + +[benchmark] +def select_count_m3f_old(b : B?) { + run_m3f_old(b, 100000) +} + +[benchmark] +def select_count_m3f(b : B?) { + run_m3f(b, 100000) +} diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das index 8975aad43d..38264fcbb9 100644 --- a/daslib/linq_fold.das +++ b/daslib/linq_fold.das @@ -522,6 +522,140 @@ def private fold_linq_default(var expr : Expression?; recursiveMacroName : strin return res } +[macro_function] +def private plan_loop_or_count(var expr : Expression?) : Expression? { + // Phase-2A loop planner. 
Recognizes chains of shape `[where_*][select?]` (array lane) + // and `[where_*][select?] |> count` (counter lane). Fuses chained wheres into `&&`; a + // single select composes, chained selects bail out. Emits one inline `invoke($block, $src)` + // with a plain for-loop. Returns null for anything else — caller falls through unfolded. + var (top, calls) = flatten_linq(expr) + if (empty(calls)) return null + let lastName = calls.back()._1.name + if (lastName != "count" && lastName != "where_" && lastName != "select") return null + let counterLane = lastName == "count" + let intermediateCount = counterLane ? length(calls) - 1 : length(calls) + let at = calls[0]._0.at + let srcName = "`source`{at.line}`{at.column}" + let itName = "`it`{at.line}`{at.column}" + let accName = "`acc`{at.line}`{at.column}" + let valName = "`val`{at.line}`{at.column}" + var whereCond : Expression? + var projection : Expression? + var seenSelect = false + var elementType = clone_type(top._type.firstType) + for (i in 0 .. intermediateCount) { + var cll & = unsafe(calls[i]) + let opName = cll._1.name + if (opName == "where_") { + if (seenSelect) return null // where-after-select not in Phase 2A + var predicate = fold_linq_cond(cll._0.arguments[1], itName) + if (whereCond == null) { + whereCond = predicate + } else { + whereCond = qmacro($e(whereCond) && $e(predicate)) + } + } elif (opName == "select") { + if (projection != null) return null // chained _select|_select needs ExprRef2Value-aware + // substitution; deferred to Phase 2B. + projection = fold_linq_cond(cll._0.arguments[1], itName) + elementType = clone_type(cll._0._type.firstType) + seenSelect = true + } else { + return null + } + } + // Build the per-element loop body. + var loopBody : Expression? 
+ if (counterLane) { + if (whereCond != null) { + loopBody = qmacro_expr() { + if ($e(whereCond)) { + $i(accName) ++ + } + } + } else { + loopBody = qmacro_expr() { + $i(accName) ++ + } + } + } else { + // array lane + if (projection != null) { + // Pick copy- vs move-init at macro time using the projection's resolved type. + // Workhorse values copy cheaply; non-workhorse must move out of the temporary + // returned by the projection. Mirrors the `_old_fold` `fold_select_where` + // shape for parity (intermediate `val` binding then emplace). + let workhorseProj = projection._type != null && projection._type.isWorkhorseType + var perElem : Expression? + if (workhorseProj) { + perElem = qmacro_block() { + var $i(valName) = $e(projection) + $i(accName) |> emplace($i(valName)) + } + } else { + perElem = qmacro_block() { + var $i(valName) <- $e(projection) + $i(accName) |> emplace($i(valName)) + } + } + if (whereCond != null) { + loopBody = qmacro_expr() { + if ($e(whereCond)) { + $e(perElem) + } + } + } else { + loopBody = perElem + } + } elif (whereCond != null) { + loopBody = qmacro_expr() { + if ($e(whereCond)) { + $i(accName) |> push_clone($i(itName)) + } + } + } else { + // identity chain — nothing to fuse; let the caller fall through. + return null + } + } + var topExpr = clone_expression(top) + topExpr.genFlags.alwaysSafe = true + var res : Expression? 
+ if (counterLane) { + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { + var $i(accName) = 0 + for ($i(itName) in $i(srcName)) { + $e(loopBody) + } + return $i(accName) + }, $e(topExpr))) + } else { + let isIter = expr._type.isIterator + if (isIter) { + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { + var $i(accName) : array<$t(elementType)> + for ($i(itName) in $i(srcName)) { + $e(loopBody) + } + return <- $i(accName).to_sequence_move() + }, $e(topExpr))) + } else { + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { + var $i(accName) : array<$t(elementType)> + for ($i(itName) in $i(srcName)) { + $e(loopBody) + } + return <- $i(accName) + }, $e(topExpr))) + } + } + res.force_at(at) + res.force_generated(true) + let blk = (res as ExprInvoke).arguments[0] as ExprMakeBlock + (blk._block as ExprBlock).arguments[0].flags.can_shadow = true + return res +} + [call_macro(name="_fold")] class private LinqFold : AstCallMacro { //! implements _fold(expression) that folds LINQ expressions into optimized sequnences @@ -534,12 +668,9 @@ class private LinqFold : AstCallMacro { //! Visits the _fold macro call and folds LINQ expressions into optimized sequences. macro_verify(call.arguments |> length == 1, prog, call.at, "expecting _fold(expression)") macro_verify(call.arguments[0]._type != null, prog, call.at, "expecting linq expression") - var res : Expression? = fold_linq_default(call.arguments[0], "_fold") - if (res == null) { - prog |> macro_error(call.at, "cannot fold LINQ expression\n{describe(call.arguments[0])}") - return res - } - return res + var res : Expression? 
= plan_loop_or_count(call.arguments[0]) + if (res != null) return res + return clone_expression(call.arguments[0]) } } diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das index 638eb7e226..d1a7cd6d17 100644 --- a/tests/linq/test_linq_fold_ast.das +++ b/tests/linq/test_linq_fold_ast.das @@ -13,6 +13,7 @@ require dastest/testing_boost public // ── Target functions (fold happens at macro time) ────────────────────── +// `_fold` targets — used by behavioral *_fold_result tests + new loop-AST tests. [export, marker(no_coverage)] def target_where_fold() : array { return <- [1, 2, 3, 4, 5]._where(_ > 3)._fold() @@ -53,16 +54,59 @@ def target_zip3_predicate_fold() : array { return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1), [100, 200, 300]._select(_ / 10), $(a, b, c : int) => a + b + c)._fold() } -// ── Tests: fold_where — comprehension with where ─────────────────────── +// `_old_fold` targets — used by retargeted AST tests that document the frozen comprehension contract. 
+[export, marker(no_coverage)] +def target_where_old_fold() : array { + return <- [1, 2, 3, 4, 5]._where(_ > 3)._old_fold() +} + +[export, marker(no_coverage)] +def target_select_old_fold() : array { + return <- [1, 2, 3, 4, 5]._select(_ * 2)._old_fold() +} + +[export, marker(no_coverage)] +def target_where_select_old_fold() : array { + return <- [1, 2, 3, 4, 5]._where(_ > 3)._select(_ * 2)._old_fold() +} + +[export, marker(no_coverage)] +def target_select_where_old_fold() : array { + return <- [1, 2, 3, 4, 5]._select(_ * 2)._where(_ > 6)._old_fold() +} + +[export, marker(no_coverage)] +def target_reverse_where_old_fold() : array { + return <- [1, 2, 3, 4, 5].to_sequence().reverse()._where(_ > 3).to_array()._old_fold() +} + +[export, marker(no_coverage)] +def target_zip_old_fold() : array> { + return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1))._old_fold() +} + +[export, marker(no_coverage)] +def target_zip3_old_fold() : array> { + return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1), [100, 200, 300]._select(_ / 10))._old_fold() +} + +[export, marker(no_coverage)] +def target_zip3_predicate_old_fold() : array { + return <- [1, 2, 3]._select(_ * 2).zip([10, 20, 30]._select(_ + 1), [100, 200, 300]._select(_ / 10), $(a, b, c : int) => a + b + c)._old_fold() +} + +// ── Tests: _old_fold contract — comprehension emission (frozen baseline) ── +// These tests retain the pre-rewrite AST shape that `_fold` used to emit. +// `_fold` itself has diverged (Phase 2A loop planner); see test_*_fold_emits_loop +// below for the current `_fold` shape contract. The pair documents the +// comprehension-vs-loop split between the two macros. [test] -def test_where_fold_produces_comprehension(t : T?) { +def test_where_old_fold_produces_comprehension(t : T?) 
{ ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_where_fold) - t |> success(func != null, "should find target_where_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_where_old_fold) + t |> success(func != null, "should find target_where_old_fold") + if (func == null) return // fold_where output: invoke($(var source) .. var pass_0 <- COMP; return <- pass_0 .., src) var comp_expr : ExpressionPtr var source_expr : ExpressionPtr @@ -73,27 +117,21 @@ def test_where_fold_produces_comprehension(t : T?) { }, $e(source_expr)) } t |> success(r.matched, "should match fold invoke structure, error={int(r.error)}") - if (!r.matched) { - return - } + if (!r.matched) return // Verify the captured expression is a comprehension with where var resolved <- qm_resolve_comprehension(comp_expr) t |> success(resolved != null, "inner expression should be a comprehension") - if (resolved == null) { - return - } + if (resolved == null) return let ac = resolved as ExprArrayComprehension t |> success(ac.exprWhere != null, "comprehension should have where clause") } } [test] -def test_where_fold_comprehension_pattern(t : T?) { +def test_where_old_fold_comprehension_pattern(t : T?) { ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_where_fold) - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_where_old_fold) + if (func == null) return // Match the full structure including comprehension pattern var where_cond : ExpressionPtr var source_expr : ExpressionPtr @@ -107,16 +145,12 @@ def test_where_fold_comprehension_pattern(t : T?) { } } -// ── Tests: fold_select — comprehension without where ─────────────────── - [test] -def test_select_fold_produces_comprehension(t : T?) { +def test_select_old_fold_produces_comprehension(t : T?) 
{ ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_select_fold) - t |> success(func != null, "should find target_select_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_select_old_fold) + t |> success(func != null, "should find target_select_old_fold") + if (func == null) return var comp_expr : ExpressionPtr var source_expr : ExpressionPtr let r = qmatch_function(func) $() { @@ -126,26 +160,20 @@ def test_select_fold_produces_comprehension(t : T?) { }, $e(source_expr)) } t |> success(r.matched, "should match fold structure, error={int(r.error)}") - if (!r.matched) { - return - } + if (!r.matched) return var resolved <- qm_resolve_comprehension(comp_expr) t |> success(resolved != null, "inner should be a comprehension") - if (resolved == null) { - return - } + if (resolved == null) return let ac = resolved as ExprArrayComprehension t |> success(ac.exprWhere == null, "select-only comprehension should have no where clause") } } [test] -def test_select_fold_comprehension_pattern(t : T?) { +def test_select_old_fold_comprehension_pattern(t : T?) { ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_select_fold) - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_select_old_fold) + if (func == null) return var select_expr : ExpressionPtr var source_expr : ExpressionPtr let r = qmatch_function(func) $() { @@ -155,25 +183,19 @@ def test_select_fold_comprehension_pattern(t : T?) 
{ }, $e(source_expr)) } t |> success(r.matched, "should match comprehension without where, error={int(r.error)}") - if (!r.matched) { - return - } + if (!r.matched) return // Verify the select expression is a multiplication: it * 2 let r2 = qmatch(select_expr, it * 2) t |> success(r2.matched, "select expression should be it * 2") } } -// ── Tests: fold_where_select — comprehension with both ───────────────── - [test] -def test_where_select_fold_comprehension(t : T?) { +def test_where_select_old_fold_comprehension(t : T?) { ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_where_select_fold) - t |> success(func != null, "should find target_where_select_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_where_select_old_fold) + t |> success(func != null, "should find target_where_select_old_fold") + if (func == null) return var select_expr : ExpressionPtr var where_cond : ExpressionPtr var source_expr : ExpressionPtr @@ -184,9 +206,7 @@ def test_where_select_fold_comprehension(t : T?) { }, $e(source_expr)) } t |> success(r.matched, "should match comprehension with where+select, error={int(r.error)}") - if (!r.matched) { - return - } + if (!r.matched) return // Verify select is multiplication and where is comparison let r_sel = qmatch(select_expr, it * 2) t |> success(r_sel.matched, "select should be it * 2") @@ -195,16 +215,12 @@ def test_where_select_fold_comprehension(t : T?) { } } -// ── Tests: fold_select_where — not a simple comprehension ────────────── - [test] -def test_select_where_fold_structure(t : T?) { +def test_select_where_old_fold_structure(t : T?) 
{ ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_select_where_fold) - t |> success(func != null, "should find target_select_where_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_select_where_old_fold) + t |> success(func != null, "should find target_select_where_old_fold") + if (func == null) return // select_where fold produces an invoke with a lambda that has a for loop + if // It is NOT a simple comprehension - verify the fold still happened var inner_expr : ExpressionPtr @@ -216,57 +232,43 @@ def test_select_where_fold_structure(t : T?) { }, $e(source_expr)) } t |> success(r.matched, "should match fold invoke structure, error={int(r.error)}") - if (!r.matched) { - return - } + if (!r.matched) return // The inner expression should NOT be a comprehension (select_where uses a different strategy) var resolved <- qm_resolve_comprehension(inner_expr) t |> success(resolved == null, "select_where should not produce a simple comprehension") } } -// ── Tests: multi-step fold (reverse + where) ─────────────────────────── - [test] -def test_reverse_where_fold_structure(t : T?) { +def test_reverse_where_old_fold_structure(t : T?) 
{ ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_reverse_where_fold) - t |> success(func != null, "should find target_reverse_where_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_reverse_where_old_fold) + t |> success(func != null, "should find target_reverse_where_old_fold") + if (func == null) return // Multi-step fold: reverse_to_array + where comprehension var body_expr : ExpressionPtr let r = qmatch_function(func) $() { return <- $e(body_expr) } t |> success(r.matched, "should have a return expression") - if (!r.matched) { - return - } + if (!r.matched) return t |> success(body_expr is ExprInvoke, "fold should produce invoke wrapper") } } -// ── Tests: zip fold with recursive subexpression folding ─────────────── - [test] -def test_zip_fold_structure(t : T?) { +def test_zip_old_fold_structure(t : T?) { ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_zip_fold) - t |> success(func != null, "should find target_zip_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_zip_old_fold) + t |> success(func != null, "should find target_zip_old_fold") + if (func == null) return // zip fold recursively folds the second argument var body_expr : ExpressionPtr let r = qmatch_function(func) $() { return <- $e(body_expr) } t |> success(r.matched, "should match return expression") - if (!r.matched) { - return - } + if (!r.matched) return t |> success(body_expr is ExprInvoke, "fold should produce invoke wrapper") } } @@ -332,42 +334,34 @@ def test_zip_fold_result(t : T?) { // ── Tests: zip3 fold — all 3 subexpressions fold ────────────────────── [test] -def test_zip3_fold_structure(t : T?) { +def test_zip3_old_fold_structure(t : T?) 
{ ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_fold) - t |> success(func != null, "should find target_zip3_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_old_fold) + t |> success(func != null, "should find target_zip3_old_fold") + if (func == null) return // zip3 fold: all three sources should be folded into invoke wrappers var body_expr : ExpressionPtr let r = qmatch_function(func) $() { return <- $e(body_expr) } t |> success(r.matched, "should match return expression") - if (!r.matched) { - return - } + if (!r.matched) return t |> success(body_expr is ExprInvoke, "zip3 fold should produce invoke wrapper") } } [test] -def test_zip3_predicate_fold_structure(t : T?) { +def test_zip3_predicate_old_fold_structure(t : T?) { ast_gc_guard() { - var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_predicate_fold) - t |> success(func != null, "should find target_zip3_predicate_fold") - if (func == null) { - return - } + var func = find_module_function_via_rtti(compiling_module(), @@target_zip3_predicate_old_fold) + t |> success(func != null, "should find target_zip3_predicate_old_fold") + if (func == null) return var body_expr : ExpressionPtr let r = qmatch_function(func) $() { return <- $e(body_expr) } t |> success(r.matched, "should match return expression") - if (!r.matched) { - return - } + if (!r.matched) return t |> success(body_expr is ExprInvoke, "zip3 predicate fold should produce invoke wrapper") } } @@ -403,3 +397,224 @@ def test_zip3_predicate_fold_result(t : T?) 
{ t |> equal(result[2], 67) } } + +// ── Targets for `_fold` Phase-2A loop planner ────────────────────────── + +[export, marker(no_coverage)] +def target_chained_where_fold() : array { + return <- [1, 2, 3, 4, 5]._where(_ > 1)._where(_ < 5)._fold() +} + +[export, marker(no_coverage)] +def target_chained_select_fold() : array { + return <- [1, 2, 3, 4, 5]._select(_ * 2)._select(_ + 1)._fold() +} + +[export, marker(no_coverage)] +def target_where_count_fold() : int { + return _fold(each([1, 2, 3, 4, 5])._where(_ > 2).count()) +} + +[export, marker(no_coverage)] +def target_chained_where_count_fold() : int { + return _fold(each([1, 2, 3, 4, 5])._where(_ > 1)._where(_ < 5).count()) +} + +[export, marker(no_coverage)] +def target_count_fold() : int { + return _fold(each([1, 2, 3, 4, 5]).count()) +} + +[export, marker(no_coverage)] +def target_select_count_fold() : int { + return _fold(each([1, 2, 3, 4, 5])._select(_ * 2).count()) +} + +// ── Tests: `_fold` Phase-2A loop emission ────────────────────────────── +// Phase-2A `_fold` emits explicit for-loops inside an `invoke($block, $src)` wrapper +// (no `ExprArrayComprehension` nodes). Each test asserts the invoke wrapper exists +// and the inner body is NOT a comprehension. Out-of-scope shapes fall through +// unfolded — body is the raw chain, not an invoke. + +[test] +def test_where_fold_emits_loop(t : T?) 
{ + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_where_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return <- $e(body_expr) + } + t |> success(r.matched, "should have return expression") + t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper") + if (!(body_expr is ExprInvoke)) return + let inv = body_expr as ExprInvoke + var arg0 = clone_expression(inv.arguments[0]) + var maybe_comp <- qm_resolve_comprehension(arg0) + t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension") + } +} + +[test] +def test_select_fold_emits_loop(t : T?) { + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_select_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return <- $e(body_expr) + } + t |> success(r.matched, "should have return expression") + t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper") + if (!(body_expr is ExprInvoke)) return + let inv = body_expr as ExprInvoke + var arg0 = clone_expression(inv.arguments[0]) + var maybe_comp <- qm_resolve_comprehension(arg0) + t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension") + } +} + +[test] +def test_chained_where_fold_emits_loop(t : T?) { + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_chained_where_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return <- $e(body_expr) + } + t |> success(r.matched, "should have return expression") + // Phase 2A fuses chained _where|_where into a single loop with && predicate + t |> success(body_expr is ExprInvoke, "chained where should fuse into single invoke loop") + } +} + +[test] +def test_chained_select_fold_falls_through(t : T?) 
{
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_chained_select_fold)
+        if (func == null) return
+        // Chained _select|_select needs ExprRef2Value-aware projection substitution; the
+        // Phase-2A planner bails out and `_fold` returns the raw chain unfolded. Phase 2B
+        // will lift this restriction by adding a substitution-aware composition pass.
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(!(body_expr is ExprInvoke), "chained _select|_select should fall through (no invoke wrapper)")
+    }
+}
+
+[test]
+def test_where_count_fold_emits_counter(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_where_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "_where|_count should fuse into counter invoke")
+    }
+}
+
+[test]
+def test_chained_where_count_fold_emits_counter(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_chained_where_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "chained where + count should fuse into single counter invoke")
+    }
+}
+
+[test]
+def test_count_fold_emits_counter(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_count_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(body_expr is ExprInvoke, "bare count should fuse into unconditional counter invoke")
+    }
+}
+
+[test]
+def test_select_where_fold_falls_through(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_select_where_fold)
+        if (func == null) return
+        // _select |> _where is out of Phase 2A scope (where-after-select) — chain falls
+        // through unfolded. The function body is the raw `where_(select(...), ...)` call,
+        // NOT a generated invoke wrapper.
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "should have return expression")
+        t |> success(!(body_expr is ExprInvoke), "select_where should fall through unfolded (no invoke wrapper)")
+    }
+}
+
+// ── Behavioral parity: results of new shapes ───────────────────────────
+
+[test]
+def test_chained_where_fold_result(t : T?) {
+    t |> run("chained where _fold produces correct values") @(t : T?) {
+        let result <- target_chained_where_fold()
+        t |> equal(length(result), 3)
+        t |> equal(result[0], 2)
+        t |> equal(result[1], 3)
+        t |> equal(result[2], 4)
+    }
+}
+
+[test]
+def test_chained_select_fold_result(t : T?) {
+    t |> run("chained select _fold produces correct values") @(t : T?) {
+        let result <- target_chained_select_fold()
+        t |> equal(length(result), 5)
+        // [1,2,3,4,5] * 2 = [2,4,6,8,10] + 1 = [3,5,7,9,11]
+        let expected = [3, 5, 7, 9, 11]
+        for (i, v in 0..5, result) {
+            t |> equal(expected[i], v)
+        }
+    }
+}
+
+[test]
+def test_where_count_fold_result(t : T?) {
+    t |> run("where _count _fold produces correct count") @(t : T?) {
+        t |> equal(target_where_count_fold(), 3)
+    }
+}
+
+[test]
+def test_chained_where_count_fold_result(t : T?) {
+    t |> run("chained where _count _fold produces correct count") @(t : T?) {
+        t |> equal(target_chained_where_count_fold(), 3)
+    }
+}
+
+[test]
+def test_count_fold_result(t : T?) {
+    t |> run("bare _count _fold produces source length") @(t : T?) {
+        t |> equal(target_count_fold(), 5)
+    }
+}
+
+[test]
+def test_select_count_fold_result(t : T?) {
+    t |> run("select _count _fold produces correct count (projection ignored by counter)") @(t : T?) {
+        t |> equal(target_select_count_fold(), 5)
+    }
+}

From 41d8ce129b0290b319e392e59bf6a6ba9f27df63 Mon Sep 17 00:00:00 2001
From: Boris Batkin
Date: Sat, 16 May 2026 10:39:19 -0700
Subject: [PATCH 02/18] linq_fold: peel each() + reserve + workhorse push — to_array_filter parity
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The first Phase-2A cut was ~18% slower than the _old_fold comprehension on
where|select|to_array. Four small fixes brought it to 11 ns/op parity:

1. Workhorse decision at macro time, not runtime. The projection's _type
   is resolved when the planner runs, so the macro reads
   projection._type.isWorkhorseType directly and emits exactly one branch
   instead of a runtime static_if.

2. Pre-reserve when the source has a known length. The planner emits
   acc |> reserve(length(src)) when top._type isn't an iterator — matches
   what ExprArrayComprehension lowering does internally.

3. Peel each() at macro time. each(arr) reports as iterator so (2)
   wouldn't fire on benchmark sources like each(arr)._where(...). The
   planner now detects each() where the inner has length and unwraps it —
   the emitted loop iterates the array directly.

4. Drop the intermediate var binding for workhorse projections. Workhorse
   values copy cheaply, so the planner emits acc |> push(projection) directly.
   Non-workhorse keeps the bind-then-emplace dance because <- is a
   statement, not an expression.

Phase 2A benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count):        5 → 5    parity
  chained_where (where|where|count):   17 → 8    2.1× faster
  select_count (select|count):         15 → 2    7.5× faster
  to_array_filter (where|select):      11 → 11   parity (was 13 pre-fix)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 benchmarks/sql/LINQ.md | 12 +++++++++---
 daslib/linq_fold.das   | 38 +++++++++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index afca309bce..599b3641db 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -87,13 +87,19 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 | count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) |
 | chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) |
 | select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane ignores projection; no array materialization) |
-| to_array_filter | `where → select → to_array` | 11 | 13 | ~18% slower (explicit loop vs comprehension lowering) |
+| to_array_filter | `where → select → to_array` | 11 | 11 | parity (after `each()` peel + reserve + workhorse `push`) |

 Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
-### Why `to_array_filter` regressed
+### Three small things that closed the to_array_filter gap

-Comprehensions `[for (it in src) where p; expr]` lower through the compiler's dedicated `ExprArrayComprehension` path, which appears to compose more aggressively with array growth than an emitted-by-macro explicit loop with `static_if (is_workhorse) var val = expr; arr.emplace(val)`. The 18% gap is small relative to the 2-7× wins elsewhere; Phase 2B can profile and tune (likely pre-reserving the result array or switching to `push` for workhorse).
+The first cut was 18% slower than the comprehension. Three independent fixes brought it to parity:
+
+1. **Workhorse decision at macro time, not runtime.** The first emission used `static_if (typeinfo is_workhorse(projection))` inside the qmacro so the compiler picked copy- vs move-init. The projection's `_type` is already resolved when the planner runs, so the macro now reads `projection._type.isWorkhorseType` directly and emits exactly one branch — less AST, no static_if to fold away.
+2. **Pre-reserve when the source has a known length.** ExprArrayComprehension lowering reserves the result array to the source's length to avoid growth reallocs; the explicit loop has to do the same explicitly. The planner emits `acc |> reserve(length(src))` when the source isn't an iterator.
+3. **Peel `each()` at macro time.** The benchmark source `each(arr)` reports as `iterator`, so the reserve from (2) wouldn't fire. The planner now detects `each()` where the inner expression has length and unwraps it — the emitted loop iterates the array directly. `for (it in arr)` and `for (it in each(arr))` yield the same element refs; the wrapper iterator is incidental in fold context.
+
+A fourth simplification dropped the intermediate `var val = projection; emplace(val)` for workhorse types — comprehension lowering pushes the projection expression directly, so the planner now emits `acc |> push(projection)` in that case (no temp binding). Non-workhorse projections still need the bind-then-emplace dance because `<-` is a statement, not an expression.

 ## Operator-coverage checklist (parity tests)

diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 38264fcbb9..2ac058e140 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -530,6 +530,18 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // with a plain for-loop. Returns null for anything else — caller falls through unfolded.
     var (top, calls) = flatten_linq(expr)
     if (empty(calls)) return null
+    // Peel `each()` so the emitted loop iterates the array directly and the
+    // array-lane reserve below has a length to use. Iteration semantics are unchanged —
+    // `for (it in each(arr))` and `for (it in arr)` yield the same element refs.
+    if (top is ExprCall) {
+        var topCall = top as ExprCall
+        if (topCall.func != null && topCall.func.name == "each"
+            && topCall.arguments |> length == 1
+            && topCall.arguments[0]._type != null
+            && !topCall.arguments[0]._type.isIterator) {
+            top = topCall.arguments[0]
+        }
+    }
     let lastName = calls.back()._1.name
     if (lastName != "count" && lastName != "where_" && lastName != "select") return null
     let counterLane = lastName == "count"
@@ -581,16 +593,15 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     } else {
         // array lane
         if (projection != null) {
-            // Pick copy- vs move-init at macro time using the projection's resolved type.
-            // Workhorse values copy cheaply; non-workhorse must move out of the temporary
-            // returned by the projection. Mirrors the `_old_fold` `fold_select_where`
-            // shape for parity (intermediate `val` binding then emplace).
+            // Workhorse projections copy cheaply — push the expression directly with no
+            // intermediate binding (matches ExprArrayComprehension lowering). Non-workhorse
+            // values must move out of the temporary returned by the projection, which `<-`
+            // can only do via an intermediate `var v` and then `emplace(v)`.
             let workhorseProj = projection._type != null && projection._type.isWorkhorseType
             var perElem : Expression?
             if (workhorseProj) {
-                perElem = qmacro_block() {
-                    var $i(valName) = $e(projection)
-                    $i(accName) |> emplace($i(valName))
+                perElem = qmacro_expr() {
+                    $i(accName) |> push($e(projection))
                 }
             } else {
                 perElem = qmacro_block() {
@@ -631,6 +642,10 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         }, $e(topExpr)))
     } else {
         let isIter = expr._type.isIterator
+        // Pre-reserve the accumulator to the source's length when the source has a known
+        // length (array, table, range — anything that isn't an iterator). Avoids realloc
+        // walks during growth; matches what ExprArrayComprehension lowering does.
+        let sourceHasLength = top._type != null && !top._type.isIterator
         if (isIter) {
             res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
                 var $i(accName) : array<$t(elementType)>
@@ -639,6 +654,15 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 }
                 return <- $i(accName).to_sequence_move()
             }, $e(topExpr)))
+        } elif (sourceHasLength) {
+            res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
+                var $i(accName) : array<$t(elementType)>
+                $i(accName) |> reserve(length($i(srcName)))
+                for ($i(itName) in $i(srcName)) {
+                    $e(loopBody)
+                }
+                return <- $i(accName)
+            }, $e(topExpr)))
         } else {
             res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) {
                 var $i(accName) : array<$t(elementType)>

From d4586a103298978c7aa5ac9474eb62c39298aedd Mon Sep 17 00:00:00 2001
From: Boris Batkin
Date: Sat, 16 May 2026 10:49:35 -0700
Subject: [PATCH 03/18] linq_fold: fuse chained workhorse selects + drop emplace from emission
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two follow-up improvements on top of the Phase-2A loop planner:

1. Chained _select|_select|... now fuses (for workhorse projections).
   The planner emits intermediate `var v_N = projection_N` let-bindings
   inside the loop body; each next lambda's `_` is renamed straight to the
   prior binding's name via fold_linq_cond. No expression substitution =
   no ExprRef2Value-wrapper trap. Non-workhorse chained selects still fall
   through (needs `:=` clone semantics — Phase 2B).

2. Drop emplace from emission. emplace moves out of its argument and can
   corrupt the source when the projection returns a ref into it (e.g.
   `_._field`). The planner now emits `push` for workhorse and `push_clone`
   for non-workhorse — no intermediate `var v <- proj; emplace(v)` dance,
   which both simplifies the AST and is safer.

The chained-select AST test (previously asserting fall-through) now asserts
invoke emission. All 118 fold + ast tests pass; benchmark deltas held vs the
previous commit:
  count_aggregate:   5   parity
  chained_where:     8   2.1× faster
  select_count:      2   7.5× faster
  to_array_filter:  11   parity

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 benchmarks/sql/LINQ.md            |  4 +-
 daslib/linq_fold.das              | 72 ++++++++++++++++++++++++-------
 tests/linq/test_linq_fold_ast.das | 10 ++---
 3 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 599b3641db..4dcee49b34 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -76,7 +76,7 @@ Notation: `—` means the variant is not applicable for this benchmark (operator

 `_fold` now emits explicit for-loops for two narrow shape families instead of comprehensions. Anything outside scope falls through unfolded to raw linq (no dispatch to `_old_fold` or `fold_linq_default`).

-**In scope:** `[where_*][select?]` (array lane) and `[where_*][select?] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`; single `_select` composes; chained `_select|_select` falls through (needs ExprRef2Value-aware substitution, deferred to Phase 2B).
+**In scope:** `[where_*][select*]` (array lane) and `[where_*][select*] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`. Chained `_select|_select|...` fuses via intermediate `var v_N = projection_N` let-bindings — each next lambda's `_` is renamed straight to the prior binding's name, no expression substitution needed (which would have hit the ExprRef2Value-wrapper problem documented in `skills/das_macros.md`). Chained selects currently require all projections to be workhorse; non-workhorse intermediates would need `:=` (clone) since `<-` (move) can corrupt source for lvalue projections — deferred to Phase 2B.

 **Out of scope (falls through):** `_select|_where`, `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`, `_order`, `_distinct`, `_take`, `_skip`, `_zip`, `_reverse`, etc.

@@ -99,7 +99,7 @@ The first cut was 18% slower than the comprehension. Three independent fixes bro
 2. **Pre-reserve when the source has a known length.** ExprArrayComprehension lowering reserves the result array to the source's length to avoid growth reallocs; the explicit loop has to do the same explicitly. The planner emits `acc |> reserve(length(src))` when the source isn't an iterator.
 3. **Peel `each()` at macro time.** The benchmark source `each(arr)` reports as `iterator`, so the reserve from (2) wouldn't fire. The planner now detects `each()` where the inner expression has length and unwraps it — the emitted loop iterates the array directly. `for (it in arr)` and `for (it in each(arr))` yield the same element refs; the wrapper iterator is incidental in fold context.

-A fourth simplification dropped the intermediate `var val = projection; emplace(val)` for workhorse types — comprehension lowering pushes the projection expression directly, so the planner now emits `acc |> push(projection)` in that case (no temp binding). Non-workhorse projections still need the bind-then-emplace dance because `<-` is a statement, not an expression.
+A fourth simplification dropped `emplace` from the emission entirely. emplace **moves** out of its argument and can corrupt the source when the projection returns a ref into it (e.g. `_._field`). The safe pattern is `push` for workhorse (cheap copy) and `push_clone` for non-workhorse (deep clone). No intermediate `var v = projection; emplace(v)` is needed in either case — the planner pushes the projection expression directly.

 ## Operator-coverage checklist (parity tests)

diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 2ac058e140..246a76a824 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -550,11 +550,12 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     let srcName = "`source`{at.line}`{at.column}"
     let itName = "`it`{at.line}`{at.column}"
     let accName = "`acc`{at.line}`{at.column}"
-    let valName = "`val`{at.line}`{at.column}"
     var whereCond : Expression?
     var projection : Expression?
+    var intermediateBinds : array
     var seenSelect = false
     var elementType = clone_type(top._type.firstType)
+    var lastBindName = itName
     for (i in 0 .. intermediateCount) {
         var cll & = unsafe(calls[i])
         let opName = cll._1.name
@@ -567,9 +568,23 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 whereCond = qmacro($e(whereCond) && $e(predicate))
             }
         } elif (opName == "select") {
-            if (projection != null) return null // chained _select|_select needs ExprRef2Value-aware
-                                                // substitution; deferred to Phase 2B.
-            projection = fold_linq_cond(cll._0.arguments[1], itName)
+            // Chained selects: bind the previous projection to a fresh local now so the next
+            // lambda's `_` can be renamed straight to that name — avoids the
+            // ExprRef2Value-substitution trap that plain `Template.replaceVariable` hits when
+            // splicing a typed expression into another typed expression. Phase 2A only
+            // chains workhorse projections; a non-workhorse intermediate binding would need
+            // a clone (`:=`) since `<-` (move) can corrupt source for lvalue projections
+            // like `_._field`. Deferred to Phase 2B.
+            if (projection != null) {
+                let prevWorkhorse = projection._type != null && projection._type.isWorkhorseType
+                if (!prevWorkhorse) return null // chained non-workhorse selects — Phase 2B
+                let bindName = "`v`{at.line}`{at.column}`{length(intermediateBinds)}"
+                intermediateBinds |> push <| qmacro_expr() {
+                    var $i(bindName) = $e(projection)
+                }
+                lastBindName = bindName
+            }
+            projection = fold_linq_cond(cll._0.arguments[1], lastBindName)
             elementType = clone_type(cll._0._type.firstType)
             seenSelect = true
         } else {
@@ -593,20 +608,35 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     } else {
         // array lane
         if (projection != null) {
-            // Workhorse projections copy cheaply — push the expression directly with no
-            // intermediate binding (matches ExprArrayComprehension lowering). Non-workhorse
-            // values must move out of the temporary returned by the projection, which `<-`
-            // can only do via an intermediate `var v` and then `emplace(v)`.
+            // push for workhorse (cheap copy), push_clone for non-workhorse (deep clone,
+            // never mutates source). emplace would move out of the projection's value,
+            // which is unsafe when the projection returns a ref into the source.
+            // For chained selects, `intermediateBinds` carries N-1 prior bindings; splice
+            // them in before the push so each lambda body can resolve its renamed parameter
+            // to the correct binding name.
             let workhorseProj = projection._type != null && projection._type.isWorkhorseType
-            var perElem : Expression?
+            var pushStmt : Expression?
             if (workhorseProj) {
-                perElem = qmacro_expr() {
+                pushStmt = qmacro_expr() {
                     $i(accName) |> push($e(projection))
                 }
             } else {
+                pushStmt = qmacro_expr() {
+                    $i(accName) |> push_clone($e(projection))
+                }
+            }
+            var perElem : Expression?
+            if (empty(intermediateBinds)) {
+                perElem = pushStmt
+            } else {
+                var perElemStmts : array
+                perElemStmts |> reserve(length(intermediateBinds) + 1)
+                for (b in intermediateBinds) {
+                    perElemStmts |> push(b)
+                }
+                perElemStmts |> push(pushStmt)
                 perElem = qmacro_block() {
-                    var $i(valName) <- $e(projection)
-                    $i(accName) |> emplace($i(valName))
+                    $b(perElemStmts)
                 }
             }
             if (whereCond != null) {
@@ -619,9 +649,21 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 loopBody = perElem
             }
         } elif (whereCond != null) {
-            loopBody = qmacro_expr() {
-                if ($e(whereCond)) {
-                    $i(accName) |> push_clone($i(itName))
+            // Identity case (no projection): `it` aliases the source element. Workhorse
+            // types can `push` (cheap copy); non-workhorse needs `push_clone` to avoid
+            // mutating the source via a move.
+            let elemWorkhorse = elementType != null && elementType.isWorkhorseType
+            if (elemWorkhorse) {
+                loopBody = qmacro_expr() {
+                    if ($e(whereCond)) {
+                        $i(accName) |> push($i(itName))
+                    }
+                }
+            } else {
+                loopBody = qmacro_expr() {
+                    if ($e(whereCond)) {
+                        $i(accName) |> push_clone($i(itName))
+                    }
                 }
             }
         } else {
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index d1a7cd6d17..cb067a3175 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -490,19 +490,19 @@ def test_chained_where_fold_emits_loop(t : T?) {
 }

 [test]
-def test_chained_select_fold_falls_through(t : T?) {
+def test_chained_select_fold_emits_loop(t : T?) {
     ast_gc_guard() {
         var func = find_module_function_via_rtti(compiling_module(), @@target_chained_select_fold)
         if (func == null) return
-        // Chained _select|_select needs ExprRef2Value-aware projection substitution; the
-        // Phase-2A planner bails out and `_fold` returns the raw chain unfolded. Phase 2B
-        // will lift this restriction by adding a substitution-aware composition pass.
+        // Chained _select|_select fuses via intermediate `var v_N = projection_N` bindings
+        // — the next lambda's `_` is renamed straight to the prior binding's name so no
+        // expression-substitution (and no ExprRef2Value-wrapping headaches) is needed.
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
             return <- $e(body_expr)
         }
         t |> success(r.matched, "should have return expression")
-        t |> success(!(body_expr is ExprInvoke), "chained _select|_select should fall through (no invoke wrapper)")
+        t |> success(body_expr is ExprInvoke, "chained _select|_select should fuse into single invoke loop")
     }
 }

From 6226a1e47a822bfd56a4bf50eb5254fb43529e59 Mon Sep 17 00:00:00 2001
From: Boris Batkin
Date: Sat, 16 May 2026 11:34:46 -0700
Subject: [PATCH 04/18] linq_fold: counter lane evaluates projection per iteration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #2689 review fixes (Copilot):

1. Counter lane drop-projection bug. `_fold(src._select(f).count())` was
   skipping the projection entirely, which diverges from raw LINQ
   `count(select(src, f))` when `f` has side effects. Counter lane now binds
   the final projection to a discardable local per matched element so
   user-visible side effects fire. The optimizer dead-code-eliminates the
   binding for pure projections (the common case — `_.x * 2`, `_.price`
   etc.), so the 7.5× select_count speedup is preserved.

2. Vacuous comprehension assertion in two AST tests. Pass `body_expr` (the
   full ExprInvoke wrapper) to `qm_resolve_comprehension` instead of
   `inv.arguments[0]` (the inner ExprMakeBlock, which can never match
   either branch of the resolver). The fixed form actually verifies the
   loop output is not the `fromComprehension=true` shape.

Adds 2 behavioral tests for the side-effects invariant (single
`select|count` and `where|select|count`).

All Phase 2A benchmarks held: count_aggregate 5/5, chained_where 8/17
(2.1×), select_count 2/15 (7.5×), to_array_filter 11/11.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 benchmarks/sql/LINQ.md            |  2 +-
 daslib/linq_fold.das              | 32 +++++++++++++++++++++++++++----
 tests/linq/test_linq_fold.das     | 28 +++++++++++++++++++++++++++
 tests/linq/test_linq_fold_ast.das | 10 ++++------
 4 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 4dcee49b34..8868addf04 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -86,7 +86,7 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 |---|---|---:|---:|---|
 | count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) |
 | chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) |
-| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane ignores projection; no array materialization) |
+| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane evaluates projection per iteration to preserve side effects; optimizer DCEs pure projections, no array materialization) |
 | to_array_filter | `where → select → to_array` | 11 | 11 | parity (after `each()` peel + reserve + workhorse `push`) |

 Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 246a76a824..66a0bbf2ad 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -594,16 +594,40 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     // Build the per-element loop body.
     var loopBody : Expression?
     if (counterLane) {
+        // Counter lane must evaluate the projection (and any chained intermediates) per
+        // matched element so user-visible side effects fire — `count(select(src, f))` in
+        // plain LINQ invokes f per element, and our fold must match. Bind the final
+        // projection to a discardable local; daslang macro output bypasses LINT002.
+        var sideEffectStmts : array
+        sideEffectStmts |> reserve(length(intermediateBinds) + 2)
+        for (b in intermediateBinds) {
+            sideEffectStmts |> push(b)
+        }
+        if (projection != null) {
+            let finalBindName = "`vfinal`{at.line}`{at.column}"
+            sideEffectStmts |> push <| qmacro_expr() {
+                var $i(finalBindName) = $e(projection)
+            }
+        }
+        sideEffectStmts |> push <| qmacro_expr() {
+            $i(accName) ++
+        }
+        var incBlock : Expression?
+        if (length(sideEffectStmts) == 1) {
+            incBlock = sideEffectStmts[0]
+        } else {
+            incBlock = qmacro_block() {
+                $b(sideEffectStmts)
+            }
+        }
         if (whereCond != null) {
             loopBody = qmacro_expr() {
                 if ($e(whereCond)) {
-                    $i(accName) ++
+                    $e(incBlock)
                 }
             }
         } else {
-            loopBody = qmacro_expr() {
-                $i(accName) ++
-            }
+            loopBody = incBlock
         }
     } else {
         // array lane
diff --git a/tests/linq/test_linq_fold.das b/tests/linq/test_linq_fold.das
index 180e96ceda..929037be93 100644
--- a/tests/linq/test_linq_fold.das
+++ b/tests/linq/test_linq_fold.das
@@ -754,3 +754,31 @@ def test_where_count_fold(t : T?) {
     }
 }

+var g_proj_hits = 0
+
+def projection_with_side_effect(x : int) : int {
+    g_proj_hits ++
+    return x * 2
+}
+
+[test]
+def test_counter_lane_projection_side_effects(t : T?) {
+    // Counter lane must evaluate the projection per matched element so user-visible
+    // side effects fire — matches raw `count(select(src, f))` semantics. Tests guard
+    // the projection-is-evaluated invariant after the Phase-2A planner fix.
+    t |> run("select|count fires projection once per element") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        g_proj_hits = 0
+        let c = _fold(each(arr)._select(projection_with_side_effect(_)).count())
+        t |> equal(5, c)
+        t |> equal(5, g_proj_hits)
+    }
+    t |> run("where|select|count fires projection only on matches") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        g_proj_hits = 0
+        let c = _fold(each(arr)._where(_ > 5)._select(projection_with_side_effect(_)).count())
+        t |> equal(5, c)
+        t |> equal(5, g_proj_hits)
+    }
+}
+
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index cb067a3175..8c54608ac4 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -448,9 +448,8 @@ def test_where_fold_emits_loop(t : T?) {
         t |> success(r.matched, "should have return expression")
         t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper")
         if (!(body_expr is ExprInvoke)) return
-        let inv = body_expr as ExprInvoke
-        var arg0 = clone_expression(inv.arguments[0])
-        var maybe_comp <- qm_resolve_comprehension(arg0)
+        var body_clone = clone_expression(body_expr)
+        var maybe_comp <- qm_resolve_comprehension(body_clone)
         t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
     }
 }
@@ -467,9 +466,8 @@ def test_select_fold_emits_loop(t : T?) {
         t |> success(r.matched, "should have return expression")
         t |> success(body_expr is ExprInvoke, "_fold should produce invoke wrapper")
         if (!(body_expr is ExprInvoke)) return
-        let inv = body_expr as ExprInvoke
-        var arg0 = clone_expression(inv.arguments[0])
-        var maybe_comp <- qm_resolve_comprehension(arg0)
+        var body_clone = clone_expression(body_expr)
+        var maybe_comp <- qm_resolve_comprehension(body_clone)
         t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
     }
 }

From 6cda3c763dfcb02071fac3e45990fd17b32b4f4f Mon Sep 17 00:00:00 2001
From: Boris Batkin
Date: Sat, 16 May 2026 11:35:18 -0700
Subject: [PATCH 05/18] linq_fold: update select_count benchmark header comment

Reflect counter-lane semantics fix: projection is now evaluated per matched
element (side effects fire); optimizer DCEs pure projections.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 benchmarks/sql/select_count.das | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/benchmarks/sql/select_count.das b/benchmarks/sql/select_count.das
index 63c0e5cc9a..84e2253422 100644
--- a/benchmarks/sql/select_count.das
+++ b/benchmarks/sql/select_count.das
@@ -3,12 +3,14 @@ options persistent_heap

 require _common public

-// _select |> count — projection followed by counter. The projection has no effect on count
-// semantics, but on the array path m3 materializes the projected array before counting.
-// Phase-2A `_fold` recognizes the counter lane and emits a bare-loop counter that ignores
-// the projection entirely (no allocation). `_old_fold` lacks a [select, count] pattern in
-// g_foldSeq so it falls to the default nested-pass form (pass_0 = select(...); count(pass_0))
-// — materializing the same way m3 does.
+// _select |> count — projection followed by counter. The final count value doesn't depend
+// on the projection, but plain LINQ `count(select(src, f))` still evaluates `f` per element
+// so user-visible side effects fire. Phase-2A `_fold` matches that: the counter lane binds
+// the final projection to a discardable local per matched element (side effects preserved)
+// and skips array materialization. The optimizer DCEs the binding for pure projections
+// like `_.price * 2`, leaving a bare-loop counter for the common case. `_old_fold` lacks a
+// [select, count] pattern in g_foldSeq so it falls to the default nested-pass form
+// (pass_0 = select(...); count(pass_0)) — materializing the same way m3 does.

 def run_m1(b : B?; n : int) {
     with_sqlite(":memory:") $(db) {

From 52a2d4089ed0be0756e9cebcc5b1439578770227 Mon Sep 17 00:00:00 2001
From: Boris Batkin
Date: Sat, 16 May 2026 11:55:26 -0700
Subject: [PATCH 06/18] linq_fold: extract peel helper + tighten length check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #2689 review fixes (Copilot, round 2):

1. Peel-each + reserve guard. The inline `each()` peel + `sourceHasLength`
   gate previously accepted any non-iterator inner type, including
   `each(lambda)` (a lambda iterable per builtin.das:1351). That would peel
   to a lambda, then emit `reserve(length(lambda))` which has no overload
   and would fail to compile inside the macro output. Phase 2A never hit
   this in practice because the test suite only uses array sources, but
   it's a latent trap.

   Extracted `peel_each_length_source` and `type_has_length` helpers. Peel
   now triggers only when the inner type satisfies `isGoodArrayType ||
   isGoodTableType || isString || isArray (T[N]) || isRange`. Same
   predicate gates the array-lane reserve emission, so the two stay in
   sync. Lambdas / custom user iterables fall through unfolded.

2. Reworded `test_select_count_fold_result` assertion message: the old
   "(projection ignored by counter)" wording was outdated after the
   counter-lane fix in 6226a1e47 — the planner now evaluates the projection
   per iteration (for side effects); only the value is discarded. Reads
   "(projection does not affect count value)" now.
select_count benchmark held at 2 ns/op (vs 15 for old fold), to_array_filter held at 11/11 parity. AST + behavioral tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- daslib/linq_fold.das | 41 +++++++++++++++++++++---------- tests/linq/test_linq_fold_ast.das | 2 +- 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das index 66a0bbf2ad..d4eaa83e6b 100644 --- a/daslib/linq_fold.das +++ b/daslib/linq_fold.das @@ -522,6 +522,32 @@ def private fold_linq_default(var expr : Expression?; recursiveMacroName : strin return res } +[macro_function] +def private type_has_length(t : TypeDecl?) : bool { + // True for types where `length()` is statically resolvable: arrays, tables, + // strings, fixed-arrays (T[N]), and the range family. Lambdas (`def each(lam : + // lambda<...>)`) and custom user iterables are excluded — they have no length() + // overload and would make a macro-emitted `reserve(length(src))` fail to compile. + if (t == null) return false + return (t.isGoodArrayType || t.isGoodTableType || t.isString + || t.isArray || t.isRange) +} + +[macro_function] +def private peel_each_length_source(var top : Expression?) : Expression? { + // If `top` is `each()` and `` has a length-supporting type, return `` so + // the emitted loop iterates the underlying container directly — lets the array-lane + // reserve fire and avoids the iterator wrapper. Iteration semantics are preserved + // (`for (it in each(arr))` and `for (it in arr)` yield the same element refs). + // Restricted to length-supporting types to keep `reserve(length(src))` valid. + if (!(top is ExprCall)) return top + var topCall = top as ExprCall + if (topCall.func == null || topCall.func.name != "each" + || topCall.arguments |> length != 1 + || !type_has_length(topCall.arguments[0]._type)) return top + return clone_expression(topCall.arguments[0]) +} + [macro_function] def private plan_loop_or_count(var expr : Expression?) : Expression? 
{ // Phase-2A loop planner. Recognizes chains of shape `[where_*][select?]` (array lane) @@ -530,18 +556,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { // with a plain for-loop. Returns null for anything else — caller falls through unfolded. var (top, calls) = flatten_linq(expr) if (empty(calls)) return null - // Peel `each()` so the emitted loop iterates the array directly and the - // array-lane reserve below has a length to use. Iteration semantics are unchanged — - // `for (it in each(arr))` and `for (it in arr)` yield the same element refs. - if (top is ExprCall) { - var topCall = top as ExprCall - if (topCall.func != null && topCall.func.name == "each" - && topCall.arguments |> length == 1 - && topCall.arguments[0]._type != null - && !topCall.arguments[0]._type.isIterator) { - top = topCall.arguments[0] - } - } + top = peel_each_length_source(top) let lastName = calls.back()._1.name if (lastName != "count" && lastName != "where_" && lastName != "select") return null let counterLane = lastName == "count" @@ -711,7 +726,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { // Pre-reserve the accumulator to the source's length when the source has a known // length (array, table, range — anything that isn't an iterator). Avoids realloc // walks during growth; matches what ExprArrayComprehension lowering does. - let sourceHasLength = top._type != null && !top._type.isIterator + let sourceHasLength = type_has_length(top._type) if (isIter) { res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { var $i(accName) : array<$t(elementType)> diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das index 8c54608ac4..e47cf2e785 100644 --- a/tests/linq/test_linq_fold_ast.das +++ b/tests/linq/test_linq_fold_ast.das @@ -612,7 +612,7 @@ def test_count_fold_result(t : T?) { [test] def test_select_count_fold_result(t : T?) 
{ - t |> run("select _count _fold produces correct count (projection ignored by counter)") @(t : T?) { + t |> run("select _count _fold produces correct count (projection does not affect count value)") @(t : T?) { t |> equal(target_select_count_fold(), 5) } } From 3f0f8907e9b954a665ecd43da5696f0ec079e63f Mon Sep 17 00:00:00 2001 From: Boris Batkin Date: Sat, 16 May 2026 12:30:04 -0700 Subject: [PATCH 07/18] tests/fio: regression coverage for ref_time_ticks() ns normalization MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #2685 normalized ref_time_ticks() to nanoseconds across every platform (Windows used to return raw QPC ticks at the underlying counter's frequency — typically 10 MHz). The fix shipped without a unit test that would have caught a units regression. Add four tests under tests/fio/perf_time.das (sleep() lives in fio, so this is the right neighborhood): - monotonic — 1000 successive reads never go backwards. Catches any signed/unsigned mixup or wrap-around bug in the ns conversion arithmetic. - sleep_roundtrip — sleep(100 ms) -> delta_ns must land in [80 ms, 500 ms]. The 80 ms lower bound is the load-bearing assertion: if Windows reverted to raw QPC ticks (10 MHz counter on the typical box -> a 100 ms wall-clock sleep would surface as 1000000 "ticks" interpreted as ns, i.e. 1 ms), the test would trip. Wide upper bound covers CI runner scheduler jitter. - get_time_usec_agrees — the get_time_usec(t0) helper agrees with (ref_time_ticks() - t0) / 1000 within 5 ms. Two helpers reading the same underlying clock should not drift; if one ever ends up on a different code path, this notices. - units_are_nanoseconds — three back-to-back sleep(100 ms) deltas stay within 200 ms spread. If the unit accidentally changed mid-run (think: thread-local frequency cache going stale), the deltas would diverge wildly. 
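For illustration only (not part of this patch): a minimal sketch of the arithmetic behind the 80 ms lower bound, assuming a 10 MHz QPC counter as described above. `qpc_ticks_to_ns` is a hypothetical helper written for this note, not the runtime's actual conversion routine. A 100 ms sleep is 1,000,000 raw ticks; misread as nanoseconds that is only 1 ms, while the normalized value is 100,000,000 ns (100 ms).

```das
options gen2

// Hypothetical helper (assumption, not from the patch): scale raw counter
// ticks to nanoseconds for a counter running at freq_hz.
def qpc_ticks_to_ns(ticks : int64; freq_hz : int64) : int64 {
    return ticks * 1_000_000_000l / freq_hz
}

[export]
def main() {
    let freq = 10_000_000l      // assumed 10 MHz QPC frequency
    let ticks = freq / 10l      // raw ticks accumulated over a 100 ms sleep
    // Units bug: raw ticks treated as if they were already nanoseconds.
    let wrong_ms = ticks / 1_000_000l
    // Post-fix contract: ticks normalized to nanoseconds first.
    let right_ms = qpc_ticks_to_ns(ticks, freq) / 1_000_000l
    print("raw ticks read as ns: {wrong_ms} ms\n")
    print("normalized to ns: {right_ms} ms\n")
}
```

Under these assumptions the buggy reading lands well below the 80 ms floor, which is why the lower bound is the assertion that catches a units regression.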
The test runs cleanly in both interpreter and AOT mode on Windows (Win11 local): sleep(100 ms) -> 102-109 ms delta, get_time_usec agrees to within microseconds. tests/aot/CMakeLists.txt:224 already covers tests/fio/*.das via FILE(GLOB CONFIGURE_DEPENDS); cmake reconfigure picks the new file up automatically. Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/fio/perf_time.das | 89 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) create mode 100644 tests/fio/perf_time.das diff --git a/tests/fio/perf_time.das b/tests/fio/perf_time.das new file mode 100644 index 0000000000..938d06e627 --- /dev/null +++ b/tests/fio/perf_time.das @@ -0,0 +1,89 @@ +options gen2 +require dastest/testing_boost public + +require daslib/fio +require math + +//! Regression coverage for `ref_time_ticks()` and `get_time_usec(int64)`. +//! +//! The cross-platform normalization landed via PR #2685 (Windows now returns +//! nanoseconds via QPC, matching POSIX clock_gettime — previously Windows +//! returned raw QPC ticks). These tests pin the post-fix contract: +//! - `ref_time_ticks()` returns a monotonic non-decreasing nanosecond +//! timestamp on every platform. +//! - `get_time_usec(t0)` returns elapsed time in microseconds since `t0` +//! and stays consistent with the `ref_time_ticks` delta. +//! - sleep(N ms) round-trip lands close to N ms (with slack for OS +//! scheduler granularity — Windows is the worst at ~16 ms ticks). + +[test] +def test_ref_time_ticks_monotonic(t : T?) { + //! Successive calls never go backwards. Catches QPC wrap-around + //! arithmetic bugs and any future signed/unsigned mixups in the + //! nanosecond conversion path. 
+ var prev = ref_time_ticks() + for (i in range(1000)) { + let now = ref_time_ticks() + if (now < prev) { + t |> success(false, "ref_time_ticks went backwards at iter {i}: prev={prev} now={now}") + return + } + prev = now + } + t |> success(true, "1000 successive ref_time_ticks reads stayed monotonic") +} + +[test] +def test_ref_time_ticks_sleep_roundtrip(t : T?) { + //! sleep(100ms) → delta_ns should be in [80ms, 500ms]. Wide upper + //! bound covers CI runner jitter (GitHub-hosted Windows scheduling + //! can balloon to 200 ms even for a 100 ms sleep). Lower bound + //! catches a units bug: if Windows still reported raw QPC ticks + //! (10 MHz → 100× short for the same delta), delta_ns would land + //! around 1 ms and we'd trip. + let before = ref_time_ticks() + sleep(100u) + let after = ref_time_ticks() + let delta_ns = after - before + let delta_ms = delta_ns / 1_000_000l + t |> success(delta_ms >= 80l, + "sleep(100ms) elapsed only {delta_ms}ms — ref_time_ticks may not be in ns") + t |> success(delta_ms <= 500l, + "sleep(100ms) elapsed {delta_ms}ms — way over budget") +} + +[test] +def test_get_time_usec_agrees_with_ref_delta(t : T?) { + //! `get_time_usec(t0)` and `(ref_time_ticks() - t0) / 1000` should + //! agree to within a few µs (the two calls happen sequentially, + //! so a small drift is normal). + let t0 = ref_time_ticks() + sleep(50u) + let usec_via_helper = int64(get_time_usec(t0)) + let usec_via_delta = (ref_time_ticks() - t0) / 1_000l + let drift_us = int(abs(usec_via_delta - usec_via_helper)) + t |> success(drift_us <= 5_000, + "get_time_usec={usec_via_helper} vs delta/1000={usec_via_delta} drift={drift_us}us") +} + +[test] +def test_ref_time_ticks_units_are_nanoseconds(t : T?) { + //! Sanity check that three sleep(100ms) calls in a row produce roughly + //! the same delta. If one platform reports µs and another reports ns, + //! repeated calls would diverge wildly. Same-platform tick uniformity + //! is also expected.
+ var deltas : array<int64> + deltas |> reserve(3) + for (_i in range(3)) { + let a = ref_time_ticks() + sleep(100u) + let b = ref_time_ticks() + deltas |> push(b - a) + } + let lo = min(deltas[0], min(deltas[1], deltas[2])) + let hi = max(deltas[0], max(deltas[1], deltas[2])) + let spread_ms = int((hi - lo) / 1_000_000l) + t |> success(spread_ms <= 200, + "sleep(100ms) deltas span {spread_ms}ms — non-uniform tick rate") + delete deltas +} From c6a9d799c038b9134fc0f7c43c173d416a7c2c2c Mon Sep 17 00:00:00 2001 From: Boris Batkin Date: Sat, 16 May 2026 15:23:24 -0700 Subject: [PATCH 08/18] macro_boost: add has_sideeffects + counter-lane elision MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a reusable conservative `has_sideeffects(expr) : bool` predicate to daslib/macro_boost. Returns true if an expression has — or might have — side effects; false ONLY when provably pure. Intended for macro-time elision of discardable evaluations. Classification: - Safe leaves: ExprVar, all ExprConst*, ExprAddr, ExprTypeInfo/Decl/Tag. - Safe via recursion: ExprField/SafeField/Swizzle, ExprRef2Value/Ptr, ExprPtr2Ref, ExprIs/IsVariant/AsVariant/SafeAsVariant, ExprCast, ExprNullCoalescing, ExprStringBuilder (string heap is no-op per compiler), ExprKeyExists (pure container read). - ExprAt: safe when subexpr type is NOT isGoodTableType (tables auto-insert on missing key); ExprSafeAt safe when its operands are. - ExprOp1/Op2/Op3: op-name allowlist for pure ops on workhorse types (bypasses func==null artifacts from partial folding); falls back to the function-flag check. `/` and `%` blacklisted (div-by-zero panic). - ExprCall: allowlist `func.flags.builtIn && !knownSideEffects && !unsafeOperation`, recurse args. - Everything else: conservative true. Counter-lane integration in daslib/linq_fold.das: 1. Discardable `var vfinal = projection` bind is now emitted only when `has_sideeffects(projection)` returns true.
Pure projections like `_._field * 2` produce a bare-loop counter at macro time, no optimizer DCE required. 2. count→length shortcut: when the counter lane has no where-filter AND every projection in the chain is pure AND the source has a known length (array/table/string/range/fixed-array), the planner emits `length(src)` directly — the loop is elided entirely. select_count benchmark drops from 2 ns/op to 0 ns/op. 3. peel_each fix: `each` is a daslang generic, so the resolved `func.name` on a typed call is the mangled instance. The original peel only matched `func.name == "each"` and never fired for typed chains. Now also checks `func.fromGeneric.name == "each"`. Gated to array-shaped arguments (isGoodArrayType || isArray) so iterator-yielding sources like `each(range(10))` keep their wrapper. 4. Block-parameter typedecl branched on source shape: iterator sources keep `-const` (rvalue, must be consumable); array sources keep the source's `const&` modifier (peeled `let arr <-` is const-ref). Tests: - tests/macro_boost/test_has_sideeffects.das — 24 cases (17 safe + 5 unsafe + 2 conservative-unsafe) wired via a `_test_has_sideeffects` probe call_macro that emits ExprConstBool at macro time. - tests/linq/test_linq_fold_ast.das — 7 new tests: * test_pure_projection_uses_length_shortcut — invoke body returns `length(src)` directly, no for loop. * test_bare_count_uses_length_shortcut — same for `each(arr).count()`. * test_impure_projection_keeps_bind — for-body has bind + ++acc. * test_peel_each_on_array_source / _on_bare_count — assert peel fires. * test_peel_each_skips_non_array_source — `each(range(...))` keeps its wrapper (gate prevents iterator-source peeling). * test_target_each_range_count_runs — behavioral check for iterator-source chains.
Benchmarks (100K rows, INTERP, vs Phase 2A baseline): - select_count: 2 → 0 ns/op (length shortcut elides loop entirely) - chained_where: 8 → 6 ns/op (peel + const-ref param) - count_aggregate: 5 → 4 ns/op (1ns from peel) - to_array_filter: 11 → 10 ns/op (1ns from peel) 569/569 linq tests + 51/51 fold-AST + 24/24 has_sideeffects pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- benchmarks/sql/LINQ.md | 8 +- daslib/linq_fold.das | 96 ++++++++-- daslib/macro_boost.das | 135 +++++++++++++ tests/linq/test_linq_fold_ast.das | 188 +++++++++++++++++++ tests/macro_boost/_has_sideeffects_probe.das | 32 ++++ tests/macro_boost/test_has_sideeffects.das | 181 ++++++++++++++++++ 6 files changed, 616 insertions(+), 24 deletions(-) create mode 100644 tests/macro_boost/_has_sideeffects_probe.das create mode 100644 tests/macro_boost/test_has_sideeffects.das diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md index 8868addf04..54506d59c2 100644 --- a/benchmarks/sql/LINQ.md +++ b/benchmarks/sql/LINQ.md @@ -84,10 +84,10 @@ Notation: `—` means the variant is not applicable for this benchmark (operator | Benchmark | Shape | m3f_old | m3f (Phase 2A) | Delta | |---|---|---:|---:|---| -| count_aggregate | `where → count` | 5 | 5 | parity (same counter loop) | -| chained_where | `where → where → count` | 17 | 8 | **2.1× faster** (fuses chained wheres into single `&&` predicate) | -| select_count | `select → count` | 15 | 2 | **7.5× faster** (counter lane evaluates projection per iteration to preserve side effects; optimizer DCEs pure projections, no array materialization) | -| to_array_filter | `where → select → to_array` | 11 | 11 | parity (after `each()` peel + reserve + workhorse `push`) | +| count_aggregate | `where → count` | 5 | 4 | parity-ish (1ns improvement from `each()` peel) | +| chained_where | `where → where → count` | 17 | 6 | **2.8× faster** (fuses chained wheres into single `&&` predicate; small gain from peel + const-ref param) | +| select_count | `select → 
count` | 15 | 0 | **∞ faster** — when the projection is pure (`has_sideeffects == false`) and the source has length, the counter lane shortcuts to `length(src)` and elides the loop entirely. See [macro_boost::has_sideeffects](../../daslib/macro_boost.das) and `linq_fold.das:plan_loop_or_count` | +| to_array_filter | `where → select → to_array` | 11 | 10 | parity (after `each()` peel + reserve + workhorse `push`) | Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch. diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das index d4eaa83e6b..39ebbde3f3 100644 --- a/daslib/linq_fold.das +++ b/daslib/linq_fold.das @@ -534,18 +534,30 @@ def private type_has_length(t : TypeDecl?) : bool { } [macro_function] -def private peel_each_length_source(var top : Expression?) : Expression? { - // If `top` is `each()` and `` has a length-supporting type, return `` so - // the emitted loop iterates the underlying container directly — lets the array-lane - // reserve fire and avoids the iterator wrapper. Iteration semantics are preserved - // (`for (it in each(arr))` and `for (it in arr)` yield the same element refs). - // Restricted to length-supporting types to keep `reserve(length(src))` valid. +def private is_each_call(call : ExprCall?) : bool { + //! `each` in daslib/builtin.das is generic, so the resolved `func.name` on a typed + //! call is the mangled instance name (e.g. `builtin\`each\`30908...`). The generic's + //! original name lives in `func.fromGeneric.name`. Match either. 
+ if (call == null || call.func == null) return false + return (call.func.name == "each" + || (call.func.fromGeneric != null && call.func.fromGeneric.name == "each")) +} + +[macro_function] +def private peel_each(var top : Expression?) : Expression? { + // Unwrap `each(x)` to `x` when `x` is a true array (or fixed-size array). + // Iteration semantics are preserved: `for it in x` implicitly re-wraps via the + // same `each` overload. We gate on array-ness because peeling an iterator-typed + // argument (e.g. `each(range(10))`, `each(generator())`) would put the iterator in + // place — the downstream length shortcut and reserve-by-length hints assume an + // indexable source. Only peel when we can prove that's true. if (!(top is ExprCall)) return top var topCall = top as ExprCall - if (topCall.func == null || topCall.func.name != "each" - || topCall.arguments |> length != 1 - || !type_has_length(topCall.arguments[0]._type)) return top - return clone_expression(topCall.arguments[0]) + if (!is_each_call(topCall) || topCall.arguments |> length != 1) return top + let argExpr = topCall.arguments[0] + if ((argExpr == null || argExpr._type == null) + || (!argExpr._type.isGoodArrayType && !argExpr._type.isArray)) return top + return clone_expression(argExpr) } [macro_function] @@ -556,7 +568,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { // with a plain for-loop. Returns null for anything else — caller falls through unfolded. var (top, calls) = flatten_linq(expr) if (empty(calls)) return null - top = peel_each_length_source(top) + top = peel_each(top) let lastName = calls.back()._1.name if (lastName != "count" && lastName != "where_" && lastName != "select") return null let counterLane = lastName == "count" @@ -569,6 +581,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { var projection : Expression? 
var intermediateBinds : array var seenSelect = false + var allProjectionsPure = true var elementType = clone_type(top._type.firstType) var lastBindName = itName for (i in 0 .. intermediateCount) { @@ -593,6 +606,9 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { if (projection != null) { let prevWorkhorse = projection._type != null && projection._type.isWorkhorseType if (!prevWorkhorse) return null // chained non-workhorse selects — Phase 2B + if (has_sideeffects(projection)) { + allProjectionsPure = false + } let bindName = "`v`{at.line}`{at.column}`{length(intermediateBinds)}" intermediateBinds |> push <| qmacro_expr() { var $i(bindName) = $e(projection) @@ -606,6 +622,26 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { return null } } + if (projection != null && has_sideeffects(projection)) { + allProjectionsPure = false + } + // Counter-lane shortcut: when there's no filter and every projection in the chain is + // pure, the count is simply `length(source)`. Skip the loop entirely — no per-element + // increments, no per-element side-effect evaluation. Gated on `type_has_length` so we + // only emit `length(src)` when it's statically resolvable. + if (counterLane && whereCond == null && allProjectionsPure + && type_has_length(top._type)) { + var topExpr = clone_expression(top) + topExpr.genFlags.alwaysSafe = true + var res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) { + return length($i(srcName)) + }, $e(topExpr))) + res.force_at(at) + res.force_generated(true) + let blk = (res as ExprInvoke).arguments[0] as ExprMakeBlock + (blk._block as ExprBlock).arguments[0].flags.can_shadow = true + return res + } // Build the per-element loop body. var loopBody : Expression? if (counterLane) { @@ -618,7 +654,10 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? 
{ for (b in intermediateBinds) { sideEffectStmts |> push(b) } - if (projection != null) { + // Bind the final projection only when it might have side effects. Pure projections + // (the common case — `_._field * 2`) can be elided entirely; no need to rely on + // the optimizer to DCE a dead store afterwards. + if (projection != null && has_sideeffects(projection)) { let finalBindName = "`vfinal`{at.line}`{at.column}" sideEffectStmts |> push <| qmacro_expr() { var $i(finalBindName) = $e(projection) @@ -713,14 +752,31 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? { var topExpr = clone_expression(top) topExpr.genFlags.alwaysSafe = true var res : Expression? + // Pick the block-parameter typedecl modifier by source shape: + // - iterator (rvalue, e.g. `each(range(10))`) — strip `-const` so the body can + // consume the iterator. Without the strip, daslang's typer reports + // "can't iterate over const iterator". + // - container with length (array/table/string/range/fixed-array) — keep modifiers + // so a `const&` source (e.g. `let arr <-`) matches the param exactly. + let topIsIter = top._type != null && top._type.isIterator if (counterLane) { - res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { - var $i(accName) = 0 - for ($i(itName) in $i(srcName)) { - $e(loopBody) - } - return $i(accName) - }, $e(topExpr))) + if (topIsIter) { + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { + var $i(accName) = 0 + for ($i(itName) in $i(srcName)) { + $e(loopBody) + } + return $i(accName) + }, $e(topExpr))) + } else { + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) { + var $i(accName) = 0 + for ($i(itName) in $i(srcName)) { + $e(loopBody) + } + return $i(accName) + }, $e(topExpr))) + } } else { let isIter = expr._type.isIterator // Pre-reserve the accumulator to the source's length when the source has a known @@ -736,7 +792,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? 
{ return <- $i(accName).to_sequence_move() }, $e(topExpr))) } elif (sourceHasLength) { - res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) { var $i(accName) : array<$t(elementType)> $i(accName) |> reserve(length($i(srcName))) for ($i(itName) in $i(srcName)) { diff --git a/daslib/macro_boost.das b/daslib/macro_boost.das index 02cb2923bf..1fa5452b6c 100644 --- a/daslib/macro_boost.das +++ b/daslib/macro_boost.das @@ -149,3 +149,138 @@ def public collect_labels(expr : ExpressionPtr) { return <- res } +[macro_function] +def public has_sideeffects(expr : Expression?) : bool { + //! Conservative side-effect detection. Returns true when the expression has — or + //! might have — side effects. Returns false ONLY when provably pure (no function + //! calls, no heap allocation, no container mutation). + //! + //! Intended for macro-time elision of discardable evaluations. + //! Callers treat false as a promise; true is the safe default — when in doubt, true. + // null / compiler-tagged-pure / variable reads / constant literals — leaf, safe. + if (expr == null || expr.flags.noSideEffects + || expr is ExprVar + || expr is ExprConstInt || expr is ExprConstInt8 || expr is ExprConstInt16 + || expr is ExprConstInt64 || expr is ExprConstUInt || expr is ExprConstUInt8 + || expr is ExprConstUInt16 || expr is ExprConstUInt64 || expr is ExprConstFloat + || expr is ExprConstDouble || expr is ExprConstBool || expr is ExprConstString + || expr is ExprConstPtr || expr is ExprConstRange || expr is ExprConstURange + || expr is ExprConstRange64 || expr is ExprConstURange64 + || expr is ExprConstEnumeration || expr is ExprConstBitfield) return false + // Member access — recurse into operand. 
+ if (expr is ExprField) return has_sideeffects((expr as ExprField).value) + if (expr is ExprSafeField) return has_sideeffects((expr as ExprSafeField).value) + if (expr is ExprSwizzle) return has_sideeffects((expr as ExprSwizzle).value) + // Pointer / reference artifacts. + if (expr is ExprRef2Value) return has_sideeffects((expr as ExprRef2Value).subexpr) + if (expr is ExprRef2Ptr) return has_sideeffects((expr as ExprRef2Ptr).subexpr) + if (expr is ExprPtr2Ref) return has_sideeffects((expr as ExprPtr2Ref).subexpr) + if (expr is ExprAddr) return false + // Type / variant checks. + if (expr is ExprIs) return has_sideeffects((expr as ExprIs).subexpr) + if (expr is ExprIsVariant) return has_sideeffects((expr as ExprIsVariant).value) + if (expr is ExprAsVariant) return has_sideeffects((expr as ExprAsVariant).value) + if (expr is ExprSafeAsVariant) return has_sideeffects((expr as ExprSafeAsVariant).value) + // Cast — recurse. + if (expr is ExprCast) return has_sideeffects((expr as ExprCast).subexpr) + // Compile-time meta. + if (expr is ExprTypeInfo || expr is ExprTypeDecl || expr is ExprTag) return false + // Subscripts. + if (expr is ExprAt) { + let at_e = expr as ExprAt + // tables auto-insert on missing key — unsafe; arrays/strings safe (read-only). + if (at_e.subexpr == null || at_e.subexpr._type == null + || at_e.subexpr._type.isGoodTableType) return true + return has_sideeffects(at_e.subexpr) || has_sideeffects(at_e.index) + } + if (expr is ExprSafeAt) { + let sat = expr as ExprSafeAt + return has_sideeffects(sat.subexpr) || has_sideeffects(sat.index) + } + // Null coalescing. + if (expr is ExprNullCoalescing) { + let nc = expr as ExprNullCoalescing + return has_sideeffects(nc.subexpr) || has_sideeffects(nc.defaultValue) + } + // String builder — string heap allocation is no-op by compiler; recurse into operands. 
+ if (expr is ExprStringBuilder) { + let sb = expr as ExprStringBuilder + for (e in sb.elements) { + if (has_sideeffects(e)) return true + } + return false + } + // key_exists is a pure container read. + if (expr is ExprKeyExists) { + let ke = expr as ExprKeyExists + for (a in ke.arguments) { + if (has_sideeffects(a)) return true + } + return false + } + // Function-call-shaped expressions: ExprCall (regular call) and ExprOp1/ExprOp2/ExprOp3 + // (operators, which also resolve to a function). All carry a resolved `func` field + // when typing completed. But the typer sometimes leaves `func` null on operator + // expressions (e.g. after partial constant folding), so we also keep an op-name + // allowlist for the common pure operators on workhorse types — that bypasses + // resolution-timing artifacts. `/` and `%` stay UNSAFE (div-by-zero panic; design + // decision). Compound-assignment ops are not in the allowlist (mutation). + // + // `is`/`as` on handled types is EXACT-rtti (see CLAUDE.md), so each shape needs its + // own branch — can't cast ExprOp2 to ExprCallFunc even though the C++ class inherits. + if (expr is ExprOp1) { + let e1 = expr as ExprOp1 + if (!is_safe_op1(e1.op) && func_has_sideeffects(e1.func)) return true + return has_sideeffects(e1.subexpr) + } + if (expr is ExprOp2) { + let e2 = expr as ExprOp2 + // Unsafe: division/modulo (div-by-zero panic, design decision); or op not in the + // safe allowlist AND the resolved func indicates side effects. The allowlist also + // bypasses func==null artifacts from partial folding. + if (e2.op == "/" || e2.op == "%" + || (!is_safe_op2(e2.op) && func_has_sideeffects(e2.func))) return true + return has_sideeffects(e2.left) || has_sideeffects(e2.right) + } + if (expr is ExprOp3) { + let e3 = expr as ExprOp3 + // ExprOp3 is the only ternary `?:` in daslang — pure if operands pure. 
+ return has_sideeffects(e3.subexpr) || has_sideeffects(e3.left) || has_sideeffects(e3.right) + } + if (expr is ExprCall) { + let ec = expr as ExprCall + if (func_has_sideeffects(ec.func)) return true + for (a in ec.arguments) { + if (has_sideeffects(a)) return true + } + return false + } + // Default: unknown → unsafe. + return true +} + +[macro_function] +def private func_has_sideeffects(f : Function?) : bool { + //! True when calling `f` may have side effects. Allowlists builtins + //! (`flags.builtIn`) without `knownSideEffects` or `unsafeOperation`. + return (f == null || !f.flags.builtIn + || f.flags.knownSideEffects || f.flags.unsafeOperation) +} + +[macro_function] +def private is_safe_op1(op : das_string) : bool { + //! Unary operators that are pure on workhorse types — no overflow trap, no mutation. + //! Excludes `++` / `--` (mutation). + return op == "-" || op == "!" || op == "~" || op == "+" +} + +[macro_function] +def private is_safe_op2(op : das_string) : bool { + //! Binary operators that are pure on workhorse types. Excludes `/`, `%` (div-by-zero + //! panic — design decision) and all compound-assignment ops (mutation). 
+ return (op == "+" || op == "-" || op == "*" + || op == "==" || op == "!=" || op == "<" || op == "<=" || op == ">" || op == ">=" + || op == "&" || op == "|" || op == "^" || op == "<<" || op == ">>" + || op == "&&" || op == "||") +} + diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das index e47cf2e785..78a70051c3 100644 --- a/tests/linq/test_linq_fold_ast.das +++ b/tests/linq/test_linq_fold_ast.das @@ -430,6 +430,26 @@ def target_select_count_fold() : int { return _fold(each([1, 2, 3, 4, 5])._select(_ * 2).count()) } +var g_select_count_proj_hits = 0 + +def side_effect_select_proj(x : int) : int { + g_select_count_proj_hits ++ + return x * 2 +} + +[export, marker(no_coverage)] +def target_select_count_fold_impure() : int { + return _fold(each([1, 2, 3, 4, 5])._select(side_effect_select_proj(_)).count()) +} + +// `each(range(...))` — argument is a `range`, not an array. peel_each must NOT fire +// here; we'd otherwise replace the iterator-yielding each call with the raw range +// and downstream length-shortcut / reserve-by-length would silently misbehave. +[export, marker(no_coverage)] +def target_each_range_count() : int { + return _fold(each(range(10))._where(_ > 5).count()) +} + // ── Tests: `_fold` Phase-2A loop emission ────────────────────────────── // Phase-2A `_fold` emits explicit for-loops inside an `invoke($block, $src)` wrapper // (no `ExprArrayComprehension` nodes). Each test asserts the invoke wrapper exists @@ -616,3 +636,171 @@ def test_select_count_fold_result(t : T?) { t |> equal(target_select_count_fold(), 5) } } + +// ── Counter-lane projection elision (has_sideeffects integration) ────── +// For pure counter chains (`_select(_ * 2).count()`, bare `.count()`, etc.) on +// length-supporting sources, the planner emits a `length(source)` shortcut and +// the for-loop is dropped entirely. For impure projections (function call w/ +// side effects), the per-element loop is preserved with the discardable bind. 
+ +// Returns the number of ExprFor statements in the counter-lane invoke's +// inner block. Pure shortcut: `[var src, return length(src)]` → 0 for-loops. +// Impure loop: `[var src, var acc=0, for {...}, return acc]` → 1 for-loop. +def count_inner_for_loops(body_expr : Expression?) : int { + if (!(body_expr is ExprInvoke)) return -1 + let inv = body_expr as ExprInvoke + if (empty(inv.arguments) || !(inv.arguments[0] is ExprMakeBlock)) return -1 + let mb = inv.arguments[0] as ExprMakeBlock + let outer = mb._block as ExprBlock + if (outer == null) return -1 + var n = 0 + for (stmt in outer.list) { + if (stmt is ExprFor) { + n ++ + } + } + return n +} + +// Returns the number of stmts in the for-body, or -1 if no for loop exists. +def count_for_body_stmts(body_expr : Expression?) : int { + if (!(body_expr is ExprInvoke)) return -1 + let inv = body_expr as ExprInvoke + if (empty(inv.arguments) || !(inv.arguments[0] is ExprMakeBlock)) return -1 + let mb = inv.arguments[0] as ExprMakeBlock + let outer = mb._block as ExprBlock + if (outer == null) return -1 + for (stmt in outer.list) { + if (stmt is ExprFor) { + let fe = stmt as ExprFor + let fbody = fe.body as ExprBlock + if (fbody == null) return -1 + return length(fbody.list) + } + } + return -1 +} + +[test] +def test_pure_projection_uses_length_shortcut(t : T?) { + // `_select(_ * 2).count()` on a length-supporting source should collapse to + // `length(source)` — no for-loop emitted at all. + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_select_count_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return $e(body_expr) + } + t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper") + let n = count_inner_for_loops(body_expr) + t |> equal(n, 0, "pure select-count must emit length() shortcut (no for loop)") + } +} + +[test] +def test_bare_count_uses_length_shortcut(t : T?) 
{ + // Bare `.count()` on an array source should also use the length shortcut. + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_count_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return $e(body_expr) + } + t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper") + let n = count_inner_for_loops(body_expr) + t |> equal(n, 0, "bare count on length-supporting source must use length() shortcut") + } +} + +[test] +def test_impure_projection_keeps_bind(t : T?) { + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_select_count_fold_impure) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return $e(body_expr) + } + t |> success(r.matched && body_expr is ExprInvoke, "expected counter-lane invoke wrapper") + let n = count_for_body_stmts(body_expr) + t |> equal(n, 2, "impure projection should preserve vfinal bind (for-body has bind + ++acc)") + } +} + +// ── peel_each invariant: each(array) must always be peeled ────────── +// The planner's `peel_each` helper unwraps `each(x)` when x is an array (or +// fixed-size array) so the emitted block sees the underlying container directly. +// Without this, the length() shortcut would never fire (each returns an iterator, +// which has no length) and array-lane reserve would emit against the iterator wrapper. + +// Returns the second arg of the invoke (the source expression passed in). If +// it's still an ExprCall to `each`, peel didn't run. +def invoke_source_is_each_wrapped(body_expr : Expression?) 
: bool { + if (!(body_expr is ExprInvoke)) return false + let inv = body_expr as ExprInvoke + if (length(inv.arguments) < 2 || !(inv.arguments[1] is ExprCall)) return false + let src_call = inv.arguments[1] as ExprCall + if (src_call.func == null) return false + return (src_call.func.name == "each" + || (src_call.func.fromGeneric != null && src_call.func.fromGeneric.name == "each")) +} + +[test] +def test_peel_each_on_array_source(t : T?) { + // Sanity: target_select_count_fold uses `each([1,2,3,4,5])`. After peel, the + // invoke wrapper must NOT receive an each-wrapped source. + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_select_count_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return $e(body_expr) + } + t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper") + t |> success(!invoke_source_is_each_wrapped(body_expr), + "peel_each must unwrap each(array) at macro time") + } +} + +[test] +def test_peel_each_on_bare_count(t : T?) { + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_count_fold) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return $e(body_expr) + } + t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper") + t |> success(!invoke_source_is_each_wrapped(body_expr), + "peel_each must unwrap each(array) at macro time") + } +} + +// Negative case: `each(range(...))` argument is an iterator-yielding range, not an +// array. peel_each must NOT fire — peeling would drop the each call and put the raw +// range in source position; the downstream length-shortcut and reserve hints would +// then misbehave on a non-indexable source. +[test] +def test_peel_each_skips_non_array_source(t : T?) 
{ + ast_gc_guard() { + var func = find_module_function_via_rtti(compiling_module(), @@target_each_range_count) + if (func == null) return + var body_expr : ExpressionPtr + let r = qmatch_function(func) $() { + return $e(body_expr) + } + t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper") + t |> success(invoke_source_is_each_wrapped(body_expr), + "peel_each must keep each(range) wrapper — only arrays may be peeled") + } +} + +[test] +def test_target_each_range_count_runs(t : T?) { + // Behavioral: ensure the iterator-source chain still compiles and produces the + // expected count. range(10) → [0,1,2,3,4,5,6,7,8,9]; filter > 5 → 4 elements. + t |> equal(target_each_range_count(), 4) +} diff --git a/tests/macro_boost/_has_sideeffects_probe.das b/tests/macro_boost/_has_sideeffects_probe.das new file mode 100644 index 0000000000..4b7dad96b6 --- /dev/null +++ b/tests/macro_boost/_has_sideeffects_probe.das @@ -0,0 +1,32 @@ +// Helper module for tests/macro_boost/test_has_sideeffects.das. +// +// Provides ``_test_has_sideeffects(expr)`` — a [call_macro] that invokes +// ``macro_boost::has_sideeffects`` on its argument at macro time and replaces +// the call with an ``ExprConstBool`` of the result. Lets test functions +// assert side-effect classification by writing ``t |> equal(_test_has_sideeffects(...), false)``. +// +// Lives in a separate ``.das`` with a leading underscore so dastest's file +// discovery skips it as a test. +options gen2 +options indenting = 4 + +module _has_sideeffects_probe public + +require daslib/ast public +require daslib/ast_boost +require daslib/macro_boost public + +[call_macro(name = "_test_has_sideeffects")] +class private TestHasSideeffects : AstCallMacro { + def override visit(prog : ProgramPtr; mod : Module?; var call : ExprCallMacro?) : Expression? 
{ + if (call.arguments |> length != 1) { + macro_error(prog, call.at, "expecting _test_has_sideeffects(expression)") + return null + } + let b = has_sideeffects(call.arguments[0]) + var res : Expression? = new ExprConstBool(at = call.at, value = b) + res.force_at(call.at) + res.force_generated(true) + return res + } +} diff --git a/tests/macro_boost/test_has_sideeffects.das b/tests/macro_boost/test_has_sideeffects.das new file mode 100644 index 0000000000..7651eeb91f --- /dev/null +++ b/tests/macro_boost/test_has_sideeffects.das @@ -0,0 +1,181 @@ +options gen2 +require dastest/testing_boost public +require _has_sideeffects_probe public + +// ── Side-effect-bearing helpers (used as test sources) ──────────────────── + +var g_proj_hits = 0 + +def side_effect_fn(_x : int) : int { + g_proj_hits ++ + return _x * 2 +} + +struct Foo { + a : int + b : int +} + +// ── SAFE cases — has_sideeffects must return false ─────────────────────── +// +// Note: `let _x = 5` lets the compiler fold expressions using `_x` into constants +// before the call_macro runs (so the macro sees ExprConstInt, not the original +// ExprOp2). To exercise the operator paths explicitly, tests below use `var`. + +[test] +def test_const_int(t : T?) { + t |> equal(_test_has_sideeffects(42), false) +} + +[test] +def test_const_string(t : T?) { + t |> equal(_test_has_sideeffects("hello"), false) +} + +[test] +def test_const_bool(t : T?) { + t |> equal(_test_has_sideeffects(true), false) +} + +[test] +def test_var_read(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects(_x), false) +} + +[test] +def test_arith_pure(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects(_x + 1), false) +} + +[test] +def test_arith_nested(t : T?) { + var _x = 5 + var _y = 3 + t |> equal(_test_has_sideeffects(_x * 2 + _y - 3), false) +} + +[test] +def test_field_access(t : T?) { + var _s = Foo(a = 1, b = 2) + t |> equal(_test_has_sideeffects(_s.a), false) +} + +[test] +def test_array_index(t : T?) 
{ + var _arr = [1, 2, 3, 4, 5] + t |> equal(_test_has_sideeffects(_arr[0]), false) +} + +[test] +def test_safe_table_lookup(t : T?) { + var tab : table + tab |> insert("k", 1) + t |> equal(_test_has_sideeffects(tab?["k"]), false) +} + +[test] +def test_comparison(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects(_x == 0), false) +} + +[test] +def test_ternary(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects(_x > 0 ? 1 : 2), false) +} + +[test] +def test_null_coalescing(t : T?) { + var _p : int? = null + t |> equal(_test_has_sideeffects(_p ?? 0), false) +} + +[test] +def test_logical_and(t : T?) { + var _x = 5 + var _y = 10 + t |> equal(_test_has_sideeffects(_x > 0 && _y < 100), false) +} + +[test] +def test_unary_neg(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects(-_x), false) +} + +[test] +def test_string_builder_safe(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects("hello {_x}"), false) +} + +// ── UNSAFE cases — has_sideeffects must return true ────────────────────── + +[test] +def test_user_call_unsafe(t : T?) { + var _x = 5 + t |> equal(_test_has_sideeffects(side_effect_fn(_x)), true) +} + +[test] +def test_table_insert_subscript(t : T?) { + var _tab : table + // _tab[k] auto-inserts a default value if k is missing — side effect. + t |> equal(_test_has_sideeffects(_tab["k"]), true) +} + +[test] +def test_division_unsafe(t : T?) { + var _x = 10 + var _y = 2 + // `/` can panic on div-by-zero — kept on the unsafe side by explicit blacklist. + t |> equal(_test_has_sideeffects(_x / _y), true) +} + +[test] +def test_modulo_unsafe(t : T?) { + var _x = 10 + var _y = 3 + t |> equal(_test_has_sideeffects(_x % _y), true) +} + +[test] +def test_array_literal_alloc(t : T?) { + t |> equal(_test_has_sideeffects([1, 2, 3]), true) +} + +[test] +def test_struct_construct_alloc(t : T?) { + t |> equal(_test_has_sideeffects(Foo(a = 1, b = 2)), true) +} + +[test] +def test_string_builder_unsafe_part(t : T?) 
{ + var _x = 5 + // The string interpolation itself is safe, but a side-effecting operand propagates. + t |> equal(_test_has_sideeffects("hello {side_effect_fn(_x)}"), true) +} + +// ── Conservative-unsafe cases — daslang-generic helpers fall through ───── +// +// `length`, `key_exists`, etc. are defined as daslang generics in builtin.das +// (`def length(a : auto | #) ...`). The compile-time func resolution doesn't +// always reach a `flags.builtIn=true` C++ overload before the call_macro runs, +// so the conservative classifier rejects them. A future Function-level +// `[no_side_effects]` annotation could let user-defined helpers opt in. + +[test] +def test_generic_length_unresolved(t : T?) { + var _arr = [1, 2, 3] + t |> equal(_test_has_sideeffects(length(_arr)), true) +} + +[test] +def test_key_exists_unresolved(t : T?) { + var tab : table + tab |> insert("k", 1) + t |> equal(_test_has_sideeffects(key_exists(tab, "k")), true) +} From f77a072c570b65e910423cc8f65d37fc446c705e Mon Sep 17 00:00:00 2001 From: Boris Batkin Date: Sat, 16 May 2026 16:17:07 -0700 Subject: [PATCH 09/18] mouse-data/docs: 16 new + 1 updated card from linq_fold + Phase 2A session MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cards added in the course of the linq_fold splice rewrite + PR #2691 (has_sideeffects + counter-lane elision). Topics: linq_fold / macro-emission patterns: - daslang-generic-instance-detect-via-fromgeneric — func.fromGeneric is the canonical "which generic was this instantiated from?" link; func.name on typed instances is mangled. - daslib-macro-boost-has-sideeffects-predicate — new public predicate, full classification table, known limitations, test plumbing. - qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array — typedecl block-param const/ref handling differs between iterator and array sources; the two diagnostic error messages tell you which branch you picked wrong. 
- qmacro-gensym-per-callsite-via-lineinfo — backtick-prefixed names + line+column suffix, force_at / force_generated / can_shadow. - my-fold-macro-emits-a-loop-with-for-it-in-source-... (UPDATED) — peel_each pattern corrected for generic-instance detection + positive array gate + block-param typedecl handling. LINQ semantics: - are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-... - which-typedecl-predicates-identify-types-where-length-expr-is-... - why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-... - what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-... - my-macro-substitutes-it-for-a-projection-expression-via-template-... - when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-... - where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-... Tooling / ops: - how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-... - cpp-profiler-macos-samply-instruments.md - what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-... - how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-... Updated: - why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-... — original card pointed at a require-order red herring; real cause was ref_time_ticks() returning ns on POSIX while wait_until_ready's deadline math assumed μs. Fix landed in PR #2685. No code changes — docs only. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- ...output-to-the-underlying-linq-operators.md | 37 ++++++++ .../cpp-profiler-macos-samply-instruments.md | 68 ++++++++++++++ ...generic-instance-detect-via-fromgeneric.md | 33 +++++++ ...b-macro-boost-has-sideeffects-predicate.md | 43 +++++++++ ...pointer-e-g-addfontfromfilettf-on-getio.md | 66 ++++++++++++++ ...mode-and-what-s-the-command-line-syntax.md | 45 +++++++++ ...e-doesn-t-fire-when-the-chain-starts-wi.md | 56 ++++++++++++ ...-apply-template-but-the-result-fails-to.md | 24 +++++ ...qmacro-gensym-per-callsite-via-lineinfo.md | 43 +++++++++ ...ce-bind-typedecl-modifier-iter-vs-array.md | 46 ++++++++++ ...daslib-das-module-so-docs-build-cleanly.md | 46 ++++++++++ ...tors-aren-t-supported-as-sql-chain-term.md | 39 ++++++++ ...f-typeinfo-is-workhorse-e-proj-or-decid.md | 33 +++++++ ...res-at-the-user-s-call-site-rather-than.md | 36 ++++++++ ...statically-resolvable-in-daslang-macros.md | 63 +++++++++++++ ...-what-s-the-alternative-in-a-linq-chain.md | 36 ++++++++ ...status-works-fine-is-it-a-require-order.md | 91 +++++++++++-------- 17 files changed, 769 insertions(+), 36 deletions(-) create mode 100644 mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md create mode 100644 mouse-data/docs/cpp-profiler-macos-samply-instruments.md create mode 100644 mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md create mode 100644 mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md create mode 100644 mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md create mode 100644 mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md create mode 100644 mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md create mode 
100644 mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md create mode 100644 mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md create mode 100644 mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md create mode 100644 mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md create mode 100644 mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md create mode 100644 mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md create mode 100644 mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md create mode 100644 mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md create mode 100644 mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md diff --git a/mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md b/mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md new file mode 100644 index 0000000000..551cbb2b0e --- /dev/null +++ b/mouse-data/docs/are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators.md @@ -0,0 +1,37 @@ +--- +slug: are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-the-underlying-linq-operators +title: Are there parity tests in tests/linq/ that compare `_fold` output to the underlying linq operators? 
+created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +There's no file named "parity" or similar. The parity-test surface IS the broader [tests/linq/](tests/linq/) directory: + +- `test_linq.das` — comprehension basics +- `test_linq_aggregation.das` — count/sum/min/max/avg +- `test_linq_querying.das` — any/all/contains +- `test_linq_transform.das` — select / select_many / zip +- `test_linq_sorting.das` — order / reverse +- `test_linq_group_by.das` — group_by / having +- `test_linq_join.das` — joins +- `test_linq_partition.das` — take / skip / chunk / take_while / skip_while +- `test_linq_set.das` — distinct / union / except / intersect / unique +- `test_linq_element.das` — first / last / single / element_at +- `test_linq_concat.das` — concat / prepend / append +- `test_linq_generation.das` — range / repeat +- `test_linq_bugs.das` — regressions + +Each file uses `[test]` functions with `t |> run("name") @(t) { ... }` blocks asserting `t |> equal(actual, expected)`. These exercise the regular linq operators (`where_`, `select`, `count`, ...) directly — they're not split into "fold-on" vs "fold-off" variants. + +Dedicated `_fold` tests live in `test_linq_fold.das` (functional output) and `test_linq_fold_ast.das` (AST-shape verification — pattern-matches the macro expansion). These DO compare `_fold(chain)` output against the plain `chain` output for the shapes the macro recognizes. + +When the user says "parity tests" in linq context, treat the full `test_linq_*.das` suite as the operator-coverage map. Phase-2+ benchmark/splice PRs should add a `benchmarks/sql/` entry for each shape exercised here that isn't already covered (tracked as a checklist in `benchmarks/sql/LINQ.md`). + +## Questions +- Are there parity tests in tests/linq/ that compare `_fold` output to the underlying linq operators? +- What's the "parity test" coverage surface for linq? +- Where are tests for linq operators? 
+ +## Questions +- Are there parity tests in tests/linq/ that compare `_fold` output to the underlying linq operators? diff --git a/mouse-data/docs/cpp-profiler-macos-samply-instruments.md b/mouse-data/docs/cpp-profiler-macos-samply-instruments.md new file mode 100644 index 0000000000..db57cabc9b --- /dev/null +++ b/mouse-data/docs/cpp-profiler-macos-samply-instruments.md @@ -0,0 +1,68 @@ +--- +slug: cpp-profiler-macos-samply-instruments +title: What C++ sampling profiler should I use on macOS for daslang (and how do I run it)? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +# C++ sampling profiler on macOS (Apple Silicon) + +VS Code has **no first-class C++ profiler integration on macOS** — the "Performance Profiler" / similar extensions wrap Linux `perf` and don't help here. Skip them. Run a sampler from the integrated terminal and view results in browser/Instruments. + +## samply (default choice) + +Rust-built, Firefox-Profiler frontend, zero config. + +```bash +cargo install samply +samply record ./build/daslang script.das +``` + +- Opens flamegraph in browser automatically. +- Symbolicates Mach-O cleanly if you build `-DCMAKE_BUILD_TYPE=RelWithDebInfo` (do NOT use plain `Release` — symbols are stripped). +- Works without sudo on Apple Silicon. +- Good for "where does the CPU go" questions. + +## Xcode Instruments — Time Profiler (second opinion) + +Native macOS sampler, kernel-assisted, best symbolication on Apple Silicon. Use when samply's view is ambiguous or you want call-tree + timeline together. + +```bash +xcrun xctrace record --template 'Time Profiler' --launch -- ./build/daslang script.das +``` + +Then open the resulting `.trace` bundle (Instruments launches). UI is outside VS Code. 
+ +## daslang-specific recipe + +Pair the sampler with the per-module compile-time logging (`-log-compile-time` CLI flag, added on branch `bbatkin/log-compile-time-cli`): + +```bash +cmake --build build --config RelWithDebInfo -j 64 +samply record ./build/daslang -log-compile-time path/to/script.das +``` + +- `-log-compile-time` tells you which module is slow. +- Sampling tells you which function inside that module is hot. +- Together they narrow "compile is slow" to a specific phase + symbol. + +## What NOT to use + +- `perf` — Linux only, doesn't exist on Darwin. +- Intel VTune — x86-mostly, ignore on Apple Silicon. +- `gprof` — instrumenting, not sampling; ancient. +- VS Code C++ profiler extensions — see above, all are Linux/perf wrappers or toys. +- `hyperfine` / `poop` — benchmarking (whole-program timing), not profiling (per-function hotspots). Different question. + +## Build flag reminder + +Both samply and Instruments need symbols. The two viable build types on this repo: + +- `RelWithDebInfo` — fast code + symbols. Use this for profiling. +- `Debug` — slow code; profile reflects debug overhead, not real hotspots. Avoid. + +Plain `Release` strips symbols and you'll get `???` everywhere in the flamegraph. + +## Questions +- What C++ sampling profiler should I use on macOS for daslang (and how do I run it)? diff --git a/mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md b/mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md new file mode 100644 index 0000000000..922061d8e7 --- /dev/null +++ b/mouse-data/docs/daslang-generic-instance-detect-via-fromgeneric.md @@ -0,0 +1,33 @@ +--- +slug: daslang-generic-instance-detect-via-fromgeneric +title: How do I detect that an ExprCall is to a daslang generic (e.g. each, length, find) when func.name is the mangled instance name and not the original generic's name? 
+created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +When a daslang generic function (`def each(a : array) : iterator`, `def length(a : auto | #) : int`, etc.) is resolved against a concrete type at infer time, the resolved `Function?` instance gets a **mangled name** like `` `builtin`each`30908`12345 ``. Macro code that compares `call.func.name == "each"` will never match a typed instance. + +The original generic's identity lives in `call.func.fromGeneric`: + +```das +[macro_function] +def private is_each_call(call : ExprCall?) : bool { + if (call == null || call.func == null) return false + return (call.func.name == "each" + || (call.func.fromGeneric != null && call.func.fromGeneric.name == "each")) +} +``` + +The `name == "each"` branch covers the unusual case where you see the call before the typer has specialized it (e.g. inside a custom call_macro that runs early). The `fromGeneric.name` branch is the normal case for any post-infer chain. + +**When this bites:** writing a `[macro_function]` that pattern-matches on a stdlib helper by name — `each`, `length`, `key_exists`, `find`, `set_insert`, all the generic `to_array`/`to_table` variants. Without the `fromGeneric` check, every typed chain silently falls through your match and your macro behaves as if the helper wasn't there. + +**Generalizes beyond function calls:** same applies to method overload resolution. `call.func.fromGeneric` is the canonical "which generic was this instantiated from?" link. There's no `originalName` field — the chain is `func → func.fromGeneric → fromGeneric.name`. + +**Doesn't apply to:** C++ builtins from `addExtern<>` (no fromGeneric, the `func.name` is the bound name directly). Builtins also have `func.flags.builtIn = true` if you need to distinguish. + +See [[my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi]] for the concrete case where this broke `peel_each` in `daslib/linq_fold.das`. 
+ +## Questions +- How do I detect that an ExprCall is to a daslang generic (e.g. each, length, find) when func.name is the mangled instance name and not the original generic's name? diff --git a/mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md b/mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md new file mode 100644 index 0000000000..5d551d415c --- /dev/null +++ b/mouse-data/docs/daslib-macro-boost-has-sideeffects-predicate.md @@ -0,0 +1,43 @@ +--- +slug: daslib-macro-boost-has-sideeffects-predicate +title: Is there a conservative side-effect detector for Expression nodes in daslib macro_boost — something I can call from a call_macro to know if it's safe to elide an evaluation at macro time? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Yes — `has_sideeffects(expr : Expression?) : bool` in `daslib/macro_boost` (added in PR #2691, follow-up to Phase 2A loop planner). Returns `true` if the expression has or **might have** side effects; `false` ONLY when provably pure. + +```das +require daslib/macro_boost public + +if (has_sideeffects(projection)) { + // Emit the bind — projection must run for its side effects. + sideEffectStmts |> push <| qmacro_expr() { + var $i(finalBindName) = $e(projection) + } +} else { + // Skip the bind — pure projection, no observable effect. +} +``` + +**Conservative — false is a promise:** + +- SAFE leaves: `ExprVar`, all `ExprConst*`, `ExprAddr`, `ExprTypeInfo/Decl/Tag`. +- SAFE via recursion: `ExprField`, `ExprSafeField`, `ExprSwizzle`, `ExprRef2Value/Ptr`, `ExprPtr2Ref`, `ExprIs`, `ExprAsVariant`, `ExprIsVariant`, `ExprSafeAsVariant`, `ExprCast`, `ExprNullCoalescing`, `ExprStringBuilder` (string heap is no-op per compiler), `ExprKeyExists` (pure container read). +- `ExprAt`: safe when `subexpr._type` is NOT `isGoodTableType` (tables auto-insert default on missing key — a write). `ExprSafeAt` (`?[...]`) always safe. 
+- `ExprOp1/Op2/Op3`: op-name allowlist for pure ops on workhorse types: `+ - * == != < <= > >= & | ^ << >> && || ?:` (Op2), `- ! ~ +` (Op1). Falls back to `func.flags.builtIn && !knownSideEffects && !unsafeOperation`. `/` and `%` BLACKLISTED (div-by-zero panic). +- `ExprCall`/`ExprCallFunc`: allowed when `func.flags.builtIn && !knownSideEffects && !unsafeOperation`, then recurse args. +- Everything else (including `ExprNew`, all `ExprMake*`, user-defined calls, `ExprInvoke`, `ExprYield`, statement-context exprs): UNSAFE. + +**Known limitations / when it returns conservative-unsafe:** + +- daslang-generic helpers like `length(arr)` and `key_exists(tab, k)`: the resolved `func.name` is the mangled instance, and the typer hasn't always reached the `flags.builtIn=true` C++ overload before the call_macro fires. They show up as user-call shapes and get rejected. Workaround: don't rely on this for length/key_exists in projections (they appear in the `has_sideeffects` tests as `test_generic_length_unresolved` / `test_key_exists_unresolved`, both expecting `true`). +- User-defined pure helpers: there's no `[no_side_effects]` annotation yet. The compiler's `expr.flags.noSideEffects` fast path catches some cases (set during infer), but anything the typer didn't tag falls through to UNSAFE. + +**Tests:** `tests/macro_boost/test_has_sideeffects.das` has 24 cases (15 safe + 7 unsafe + 2 conservative-unsafe) wired via a `_test_has_sideeffects` probe `call_macro` ([`tests/macro_boost/_has_sideeffects_probe.das`](../../tests/macro_boost/_has_sideeffects_probe.das)) that runs the predicate at macro time and emits `ExprConstBool` of the result. Use the same probe pattern when testing any new predicate that must run at macro time but be exercised via runtime tests. 
+ +**Real use:** `daslib/linq_fold.das` `plan_loop_or_count` uses it for three optimizations: discardable `var vfinal =` bind elision, count→length shortcut gate (whole loop elided when no filter + all projections pure + source has length), and tracking `allProjectionsPure` across chained selects. select_count benchmark went from 2 → 0 ns/op. + +## Questions +- Is there a conservative side-effect detector for Expression nodes in daslib macro_boost — something I can call from a call_macro to know if it's safe to elide an evaluation at macro time? diff --git a/mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md b/mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md new file mode 100644 index 0000000000..450226dba8 --- /dev/null +++ b/mouse-data/docs/how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio.md @@ -0,0 +1,66 @@ +--- +slug: how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-that-s-bound-as-a-raw-pointer-e-g-addfontfromfilettf-on-getio +title: How do I call a dasImgui (or any managed C++) method on a struct field that's bound as a raw pointer — e.g. AddFontFromFileTTF on GetIO().Fonts? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +## TL;DR + +When a managed struct's field is bound as a pointer (`T?`) and the method on that pointed-to struct expects the value by-ref (`T implicit`), you must explicitly **dereference**. Plain `field |> method(...)` errors with mismatched types. + +## The error you'll hit + +``` +error[30341]: no matching functions or generics: AddFontFromFileTTF(imgui::ImFontAtlas?&, string const&, ...) + candidate function: ImFontAtlas implicit ... + invalid argument 'self' (0). 
expecting 'imgui::ImFontAtlas implicit', passing 'imgui::ImFontAtlas?&' +``` + +The `?` is the giveaway — `GetIO().Fonts` is `ImFontAtlas?` (raw pointer; field bound via `addField` against C++ `ImFontAtlas* Fonts`), but the method binding `das_call_member< ImFont * (ImFontAtlas::*)(...) >` takes the receiver by-value/ref. + +## The fix + +Bind a local ref through `unsafe(*ptr)`, then call as usual: + +```daslang +var atlas & = unsafe(*GetIO().Fonts) +let f = atlas |> AddFontFromFileTTF(ttf, 14.0f, null, null) +``` + +Equivalent inline form: + +```daslang +unsafe(*GetIO().Fonts) |> AddFontFromFileTTF(ttf, 14.0f, null, null) +``` + +## Why each part + +- **`*ptr`** is daslang's pointer-deref syntax (see `daslib/if_not_null.rst`: *"a dereferenced call: ``if (ptr != null) { call(*ptr, args) }``"*). The alternative `deref(ptr)` exists too but is rarer in modules; `*` is the idiom. +- **`unsafe(...)`** is required because dereferencing a raw `T?` is unsafe (no null check, no lifetime guarantee). +- **`var atlas &`** binds a local *reference* — without `&` you'd be copying the whole `ImFontAtlas` struct into a stack temporary, which (a) wastes memory and (b) means any mutation the method does (font atlas builds, glyph rasterization) hits the copy and is lost. +- **The pipe `|>` works fine on the local ref** — `atlas |> method(x, y)` desugars to `method(atlas, x, y)` and the `implicit` first-param accepts the ref directly. + +## Why NOT the other shapes + +- `GetIO().Fonts.AddFontFromFileTTF(...)` — `.method()` sugar is sugar for `method(self, ...)` only when `self` is a struct value. CLAUDE.md explicitly: *"Does NOT work on: primitives, tuples/arrays, and lambda typedefs"* — and (this case) raw pointers. Field *access* on a pointer auto-derefs (`GetIO().Fonts.TexID` works); method dispatch does not. +- `GetIO().Fonts->AddFontFromFileTTF(...)` — `->` is for class instances (smart_ptr / class types), not raw C-struct pointers from `ManagedStructureAnnotation`. 
+- `deref(GetIO().Fonts) |> AddFontFromFileTTF(...)` — works but the pipe gets a temporary value not a ref; mutations on the receiver disappear. Use `var x & = unsafe(*p)` instead. + +## When this comes up + +Anywhere a C++ binding exposes a struct field as `T*` (typical for "owns-an-atlas" or "owns-a-context" patterns): +- `ImGuiIO::Fonts` → `ImFontAtlas?` +- `ImDrawData::CmdLists` → indirection on lists +- anything bound via raw `addField` where the C++ type is `Foo*` + +If the C++ field were a value (`ImFontAtlas Fonts;` instead of `ImFontAtlas* Fonts;`), it'd bind as the struct directly and the pipe would just work. + +## Related + +- [[dasimgui-new-state-struct-widget-auto-emit-just-works]] — different topic (state-struct registration) but same module family. +- [[how-do-i-pack-an-im-col32-color-from-dasimgui-v2-code-without-depending-on-the-v1-daslib-imgui-boost-path]] — sibling dasImgui idiom. + +## Questions +- How do I call a dasImgui (or any managed C++) method on a struct field that's bound as a raw pointer — e.g. AddFontFromFileTTF on GetIO().Fonts? diff --git a/mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md b/mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md new file mode 100644 index 0000000000..014873daea --- /dev/null +++ b/mouse-data/docs/how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax.md @@ -0,0 +1,45 @@ +--- +slug: how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-line-syntax +title: How do I run dastest in benchmark-only mode and what's the command-line syntax? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Benchmarks are functions annotated with `[benchmark]` from `dastest/testing_boost.das`. 
Run them via the dastest harness with `--bench`: + +```bash +# All benchmarks in a directory (skip the regular tests) +./bin/daslang dastest/dastest.das -- --bench --test benchmarks/sql --test-names none + +# Just one file +./bin/daslang dastest/dastest.das -- --bench --test benchmarks/sql/count_aggregate.das --test-names none + +# Filter by [benchmark] function-name prefix (substring match on the function name) +./bin/daslang dastest/dastest.das -- --bench --bench-names sum_ --test benchmarks/sql --test-names none + +# Collect N samples for variance / averaging +./bin/daslang dastest/dastest.das -- --bench --test benchmarks/sql/count_aggregate.das --test-names none --count 5 +``` + +Key flags: +- `--bench` — enable benchmark execution +- `--test ` — folder or single file (NOT positional) +- `--test-names none` — skip regular `[test]` discovery (benchmarks only) +- `--bench-names ` — filter benchmarks by function-name prefix +- `--bench-format ` — output format +- `--count ` — repeat all benchmarks N times + +Benchmarks only run after all module **tests** have passed; that's why `--test-names none` is the canonical "skip tests, run benchmarks" combo. + +Output is ` N ns/op /op /op /op /op`. If the benchmark `b |> run(name, chunk_size, op)` form passes a chunk_size (typically the dataset size), the displayed ns/op is **divided by that chunk_size** — i.e. per-element time, not per-op-call time. Sub-nanosecond results (`0 ns/op`) usually mean early-exit hit the answer in O(1) regardless of dataset size. + +Reference: `dastest/README.md` and `dastest/dastest_clargs.das`. + +## Questions +- How do I run dastest in benchmark-only mode and what's the command-line syntax? +- What's the dastest --bench command line? +- How do I filter dastest benchmarks by name? + +## Questions +- How do I run dastest in benchmark-only mode and what's the command-line syntax? 
diff --git a/mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md b/mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md new file mode 100644 index 0000000000..456c78e348 --- /dev/null +++ b/mouse-data/docs/my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi.md @@ -0,0 +1,56 @@ +--- +slug: my-fold-macro-emits-a-loop-with-for-it-in-source-acc-reserve-length-source-but-the-reserve-doesn-t-fire-when-the-chain-starts-wi +title: My fold macro emits a loop with `for (it in source); acc |> reserve(length(source))` but the reserve doesn't fire when the chain starts with `each(arr)`. How do I make it work? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [[daslang-generic-instance-detect-via-fromgeneric]] +--- + +Peel `each()` at macro time. `each(arr)` reports as `iterator`, so any "is the source an iterator?" check (e.g. `top._type.isIterator`) sees `true` and the array-only reserve path is skipped. But the iteration semantics of `for (it in each(arr))` and `for (it in arr)` are identical — the wrapper iterator is incidental in fold context. + +Pattern (corrected version, from `daslib/linq_fold.das` after Phase 2A bind-elision PR): + +```das +[macro_function] +def private is_each_call(call : ExprCall?) : bool { + // `each` in daslib/builtin.das is generic — the resolved `func.name` + // on a typed instance is mangled (e.g. `builtin`each`30908...`). + // The original generic's name lives in `func.fromGeneric.name`. + if (call == null || call.func == null) return false + return (call.func.name == "each" + || (call.func.fromGeneric != null && call.func.fromGeneric.name == "each")) +} + +[macro_function] +def private peel_each(var top : Expression?) : Expression? 
{ + if (!(top is ExprCall)) return top + var topCall = top as ExprCall + if (!is_each_call(topCall) || topCall.arguments |> length != 1) return top + let argExpr = topCall.arguments[0] + // Only peel when the argument is a true array (or fixed-size array). + // Don't peel iterator-typed args like `each(range(10))` — replacing the + // each call with the raw range would break length-shortcut + reserve + // hints that assume an indexable source. + if ((argExpr == null || argExpr._type == null) + || (!argExpr._type.isGoodArrayType && !argExpr._type.isArray)) return top + return clone_expression(argExpr) +} +``` + +**Two gotchas the original version missed:** + +1. `func.name == "each"` never matched typed instances — generic-instance detection requires `fromGeneric.name`. See [[daslang-generic-instance-detect-via-fromgeneric]]. +2. Peel gate must be **positive** (`is good array`) not negative (`isn't iterator`). `each(range(N))` returns an iterator but its argument `range(N)` is also iterator-shaped (`isRange`) and would otherwise pass `!isIterator`. The positive `isGoodArrayType || isArray` gate cleanly excludes range/string/lambda sources. + +**Block-parameter typedecl needs branching on source shape after peel.** When peel fires, the source goes from iterator (rvalue, no modifiers) to array (`array const&` for `let arr <-` chains). The block parameter type: +- iterator source: `typedecl($e(topExpr)) - const` — strip rvalue const so body can iterate +- array source: `typedecl($e(topExpr))` (no modifier) — keep `const&` so const-ref source matches + +Both wrong → either `array const& vs array` mismatch or `can't iterate over const iterator`. + +**What this is worth:** brought `linq_fold`'s `each(arr)._where(...)._select(_.price).to_array()` benchmark from 13 → 10 ns/op (parity with comprehension baseline). The count→length shortcut built on top brings pure `each(arr)._select(_.x).count()` from 2 → 0 ns/op (loop entirely elided). 
+ +**Generalizes:** any fused-loop emitter that needs the source's length (reserve, two-pass, length-aware operators like `take_last`), peel inner-array-yielding wrappers — but use `fromGeneric` for generic helpers and a positive array gate, not a negative iterator gate. + +## Questions +- My fold macro emits a loop with `for (it in source); acc |> reserve(length(source))` but the reserve doesn't fire when the chain starts with `each(arr)`. How do I make it work? diff --git a/mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md b/mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md new file mode 100644 index 0000000000..9652ba3436 --- /dev/null +++ b/mouse-data/docs/my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to.md @@ -0,0 +1,24 @@ +--- +slug: my-macro-substitutes-it-for-a-projection-expression-via-template-replacevariable-it-proj-apply-template-but-the-result-fails-to +title: My macro substitutes `it` for a projection expression via `Template.replaceVariable("it", proj) + apply_template`, but the result fails to compile with "can only dereference a reference". What's going wrong? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Post-typer, reads of a `var` local appear wrapped as `ExprRef2Value(ExprVar(name))` — the invisible adapter the typer inserts to dereference a reference for its value. `templates_boost.TemplateVisitor.visitExprVar` (the engine behind `Template.replaceVariable + apply_template`) only matches the inner `ExprVar` and replaces IT with a clone of the substitute. The outer `ExprRef2Value` wrapper stays, but now it wraps a non-reference value — compile error `30921: can only dereference a reference`. 
+ +This is the same `ExprRef2Value`-transparency problem `daslib/ast_match.das` documents for `qmatch` — they solve it on both pattern and source sides via `qm_peel_ref2value`. `apply_template` does NOT auto-peel. + +Two fixes for substitution: + +1. **Pre-peel the destination** before `apply_template`: walk `dst` and replace every `ExprRef2Value(ExprVar(name))` with the inner `ExprVar(name)` first. After substitution, the result is clean. Drawback: removes wrappers globally (around other identifiers too) — if other refs still need the wrapper, the typer will re-insert them, but you've added a pass. + +2. **Use a custom visitor instead of `Template.replaceVariable`**: override `visitExprRef2Value` to detect `ExprRef2Value(ExprVar(name))` and return `clone_expression(replacement)` directly (stripping the wrapper as part of the substitution). Override `visitExprVar` as a fallback for bare ExprVars. The pattern mirrors `qm_peel_ref2value`'s "peel both sides" approach. + +Concrete repro: daslang `linq_fold`'s Phase 2A planner tried to fuse chained `_select|_select` via `substitute_it_for(proj2, "it", proj1)`. proj1 was `it * 2` (where `it` is the typed-and-wrapped loop var), proj2 was `it + 1`. Substituting via Template replaced the inner ExprVar in proj2 but left `ExprRef2Value(it * 2) + 1` — type error. The fix was deferred (chained-select falls through unfolded in Phase 2A) but Phase 2B needs option 2. + +See `skills/das_macros.md` "Peel ExprRef2Value before qmatch" for the matcher-side analog. The substitution side has no in-tree helper yet. + +## Questions +- My macro substitutes `it` for a projection expression via `Template.replaceVariable("it", proj) + apply_template`, but the result fails to compile with "can only dereference a reference". What's going wrong? 
diff --git a/mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md b/mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md new file mode 100644 index 0000000000..515d4d51dd --- /dev/null +++ b/mouse-data/docs/qmacro-gensym-per-callsite-via-lineinfo.md @@ -0,0 +1,43 @@ +--- +slug: qmacro-gensym-per-callsite-via-lineinfo +title: How do I generate a uniquely-named gensym inside an AstCallMacro for a per-call-site variable, using LineInfo? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Use the call site's `LineInfo.line` + `.column` interpolated into a backtick-prefixed identifier string. Backtick-prefixed names live in a separate namespace so they don't collide with user-typed identifiers and they survive lint/style passes. + +```das +def override visit(prog : ProgramPtr; mod : Module?; var call : ExprCallMacro?) : Expression? { + let at = call.at // LineInfo of the call site + let accName = "`acc`{at.line}`{at.column}" + let itName = "`it`{at.line}`{at.column}" + let srcName = "`src`{at.line}`{at.column}" + + var res = qmacro(invoke($($i(srcName) : typedecl($e(src))) { + var $i(accName) = 0 + for ($i(itName) in $i(srcName)) { + // ... + } + return $i(accName) + }, $e(src))) + res.force_at(at) + res.force_generated(true) + return res +} +``` + +Two follow-up steps you almost always want: + +1. `res.force_at(at)` + `res.force_generated(true)` — sets `at = call.at` on every emitted node and marks them macro-generated. The latter bypasses lint rules that would otherwise fire on synthesized code (e.g. STYLE001, LINT002 "unused variable"). +2. `(blk._block as ExprBlock).arguments[0].flags.can_shadow = true` on the bound let-variable — quiets shadow warnings if the user already has an `acc`/`it`/`src` in scope. Reach for `.flags.can_shadow` on any qmacro-bound name that might collide with caller context. + +**Why include both line AND column:** macros emitted from nested helpers can have several emission sites on the same line (e.g. 
piped chains where each `|>` step emits a separate gensym). Line alone is not unique. + +**Why backtick prefix:** the backtick is a daslang lexer hint that this is an internal/synthesized name. Without it, very-long generated names sometimes clash with user identifiers or trip naming rules (the formatter, the auto-rename tools). + +**Worked example:** `daslib/linq_fold.das` `plan_loop_or_count` — multiple gensyms per emission site (accumulator, iterator, source, bound projection). Variants per fold-helper too (`fold_where_count` uses `nName` over `accName`). + +## Questions +- How do I generate a uniquely-named gensym inside an AstCallMacro for a per-call-site variable, using LineInfo? diff --git a/mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md b/mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md new file mode 100644 index 0000000000..a1193075a1 --- /dev/null +++ b/mouse-data/docs/qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array.md @@ -0,0 +1,46 @@ +--- +slug: qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array +title: In a call_macro that emits an `invoke($($i(src) : typedecl($e(topExpr)) ) { ... }, $e(topExpr))` wrapper, what `` do I use so the param matches both array and iterator sources without const/ref mismatches? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +There is no single modifier that works for both — branch on `top._type.isIterator`: + +```das +if (top._type != null && top._type.isIterator) { + // Iterator source — rvalue from a function call like each(range(10)). + // typedecl() picks up the function-return type which carries const; + // -const strips it so the body can `for (it in src)` (otherwise + // daslang complains "can't iterate over const iterator"). + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr)) - const) { + // ... body uses $i(srcName) ... 
+ }, $e(topExpr))) +} else { + // Container source with length — array/table/string/range/fixed-array. + // `let arr <- [...]` is `array const&`. Stripping -const would + // produce a non-const-ref param; passing the const-ref source then + // fails with `array const& vs array` ("can't ref types + // can only add constness"). Keep modifiers — typedecl() preserves + // them and the const-ref source matches exactly. + res = qmacro(invoke($($i(srcName) : typedecl($e(topExpr))) { + // ... body uses $i(srcName) ... + }, $e(topExpr))) +} +``` + +The two error messages are diagnostic — they tell you which branch you're on: +- `can't iterate over const iterator` → you forgot `-const` on an iterator path +- `array const& vs array ... can't ref types can only add constness` → you have `-const` on an array path + +**Why this is needed in the first place:** the block param is your way to bind the source expression to a stable name so the loop body can reference it once without re-evaluating side effects. The "right" param type is "whatever the source actually is" — but qmacro `typedecl(expr)` produces the raw type-of including const-ref from the call return, which only sometimes matches what the consumer needs. + +**Use `top._type != null` guard** — `_type` is null for freshly cloned expressions that haven't gone through the typer yet. Treating null as "not iterator" (default to array branch) is wrong if you're past the typer; pick conservatively and call out the assumption. + +**See `daslib/linq_fold.das` `plan_loop_or_count`** for a working example with five emission sites — counter lane, array-lane iter/sourceHasLength/else, and the length-shortcut path that's only reachable when the source has length (so it always uses the no-modifier form). + +**Fast path for length-shortcut:** if you can emit `length($e(topExpr))` directly without the invoke wrapper, do that — no source-bind problem. 
Works when the entire body is one expression and the source's evaluation cost is "you'd evaluate it once anyway." + +## Questions +- In a call_macro that emits an `invoke($($i(src) : typedecl($e(topExpr)) ) { ... }, $e(topExpr))` wrapper, what `` do I use so the param matches both array and iterator sources without const/ref mismatches? diff --git a/mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md b/mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md new file mode 100644 index 0000000000..bba99ca5e3 --- /dev/null +++ b/mouse-data/docs/what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly.md @@ -0,0 +1,46 @@ +--- +slug: what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-so-docs-build-cleanly +title: What's the end-to-end checklist for adding a new daslib/*.das module so docs build cleanly? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Four things to update, in order: + +**1. `doc/reflections/das2rst.das`** — add a `require daslib/` near the other daslib requires, write a `document_module_(root : string)` function modeled on a sibling (e.g. `document_module_linq_boost`), and call it from the dispatcher block near the end. Minimal form for a module with mostly-private internals: + +```daslang +def document_module_my_new_module(root : string) { + var mod = find_module("my_new_module") + var groups : array + document("Short description", mod, "my_new_module.rst", groups) +} +``` + +For modules with many public functions, copy the `linq_boost` pattern and add `group_by_regex(...)` entries for each named group — anything left over lands in "Uncategorized" and **fails CI**. + +**2. `doc/source/stdlib/handmade/module-.rst`** — `das2rst` auto-creates this as `// stub\nModule `. Replace the **whole file** with a plain-text description (1-2 paragraphs, with a `.. 
code-block:: das` require + minimal example). See `module-linq.rst` / `module-linq_boost.rst` for the convention. + +**3. `doc/source/stdlib/sec_*.rst`** — find the section your module belongs in (e.g. `sec_algorithms.rst` for linq family, `sec_strings.rst` for strings, etc.) and add `generated/.rst` to its `.. toctree::`. Without this the page builds but isn't linked. + +**4. Regenerate + verify:** + +```bash +./bin/daslang doc/reflections/das2rst.das # picks up new module + handmade stub +grep -rl "// stub" doc/source/stdlib/handmade/ # must be empty after step 2 +grep -c Uncategorized doc/source/stdlib/generated/*.rst | grep -v ':0$' # must be empty +rm -rf doc/sphinx-build site/doc # clean cache (cached builds hide warnings) +sphinx-build -b html -d doc/sphinx-build doc/source site/doc 2>&1 | tee /tmp/sphinx_out.txt +grep -iE "warning:|error:" /tmp/sphinx_out.txt # must be empty +``` + +`doc/source/stdlib/generated/*.rst` and `generated/detail/*.rst` are **gitignored** — only commit (1) das2rst.das, (2) the handmade module-.rst, and (3) the sec_*.rst toctree update. + +## Questions +- What's the end-to-end checklist for adding a new daslib/*.das module so docs build cleanly? +- Where do I register a new daslib module in das2rst.das? +- Why does my new module appear as `// stub` in the generated RST? + +## Questions +- What's the end-to-end checklist for adding a new daslib/*.das module so docs build cleanly? 
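The "must be empty" checks in step 4 can be turned into exit codes so a script or CI job fails fast. The following is a sketch under stated assumptions: `check_no_stubs` and `check_no_uncategorized` are hypothetical helper names, not something in the repository, and the second one uses `grep -l` (list matching files) in place of the count-and-filter form shown in step 4.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the repo): convert the "must be empty"
# greps from step 4 into pass/fail exit codes.

# Fails if any file under the given directory still contains "// stub".
check_no_stubs() {
    local dir="$1" hits
    hits="$(grep -rl "// stub" "$dir" 2>/dev/null || true)"
    if [ -n "$hits" ]; then
        printf 'stub files remain:\n%s\n' "$hits" >&2
        return 1
    fi
}

# Fails if any .rst file directly inside the directory mentions
# "Uncategorized". Assumption: listing matching files with `grep -l`
# is an acceptable stand-in for the count-and-filter form in step 4.
check_no_uncategorized() {
    local dir="$1" hits
    hits="$(grep -l Uncategorized "$dir"/*.rst 2>/dev/null || true)"
    if [ -n "$hits" ]; then
        printf 'uncategorized entries in:\n%s\n' "$hits" >&2
        return 1
    fi
}

# Demo against a throwaway directory instead of the real doc tree.
demo="$(mktemp -d)"
printf '// stub\nModule demo\n' > "$demo/module-demo.rst"
check_no_stubs "$demo" || echo "stub check failed, as expected"
printf 'A real description.\n' > "$demo/module-demo.rst"
check_no_stubs "$demo" && echo "stub check passed"
rm -rf "$demo"
```

In a real checkout the calls would be `check_no_stubs doc/source/stdlib/handmade` and `check_no_uncategorized doc/source/stdlib/generated`, run after `das2rst.das` has regenerated the output.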
diff --git a/mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md b/mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md new file mode 100644 index 0000000000..c37c00c27a --- /dev/null +++ b/mouse-data/docs/what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term.md @@ -0,0 +1,39 @@ +--- +slug: what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-average-and-what-operators-aren-t-supported-as-sql-chain-term +title: What's the right sqlite_linq chain form for aggregates (sum/min/max/average), and what operators aren't supported as `_sql` chain terminals? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Column aggregates in `_sql` chains use the **regular linq function name** after a `_select`, NOT an `_aggregate(_.Col)` macro: + +```daslang +// CORRECT — _sql analyzer recognizes `_select(_.Col) |> sum()` and emits SELECT SUM(price) +let s = _sql(db |> select_from(type) |> _select(_.price) |> sum()) +let m = _sql(db |> select_from(type) |> _select(_.price) |> min()) +let a = _sql(db |> select_from(type) |> _select(_.price) |> average()) // promotes to double +``` + +There is no `_sum` / `_min` / `_max` / `_average` chain macro. The error if you try one is `error[30838]: can't locate variable '_'` because `_sum` doesn't dispatch as a call macro. + +The full set of `_sql` chain terminals is **`_to_array()`, `_first()`, `_first_opt()`, `count()`, and `sum()`/`min()`/`max()`/`average()` after a 1-column `_select`**. 
These are NOT supported as chain terminals: + +| Chain | Why not | Workaround | +|---|---|---| +| `_any()` (no args, terminal) | not implemented | `_first_opt() \|> is_some` | +| `_all(pred)` | no SQL idiom recognized | invert: `_where(NOT pred) \|> count() == 0` | +| `take(N) \|> count()` | LIMIT-after-aggregate has no effect (aggregate collapses to 1 row) | drop count, materialize: `take(N)` returns array, take `length()` | +| `skip(M) \|> take(N) \|> count()` | same | same — terminate in to_array | +| `distinct() \|> count()` | `COUNT(DISTINCT col)` not yet implemented | `distinct()` alone, then `length()` of result array | +| `_sql(... \|> _join(select_from(type), ...))` | inner `select_from` needs db handle wired inside the analyzer | omit m1 / use raw SQL string for join benchmarks | + +The error messages from `sqlite_linq.das` are explicit — read them, they spell out the alternative form. Pattern matching for these lives in `modules/dasSQLITE/daslib/sqlite_linq.das` `peel_column_aggregate` and `analyze_chain`. + +## Questions +- What's the right sqlite_linq chain form for aggregates (sum/min/max/average), and what operators aren't supported as `_sql` chain terminals? +- Why does `_sum(_.price)` fail in `_sql` with "can't locate variable '_'"? +- How do I express `any`/`all`/distinct-count/take-count in `_sql`? + +## Questions +- What's the right sqlite_linq chain form for aggregates (sum/min/max/average), and what operators aren't supported as `_sql` chain terminals? 
diff --git a/mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md b/mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md new file mode 100644 index 0000000000..50512f0506 --- /dev/null +++ b/mouse-data/docs/when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid.md @@ -0,0 +1,33 @@ +--- +slug: when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-should-i-emit-static-if-typeinfo-is-workhorse-e-proj-or-decid +title: When a call_macro needs to pick copy-vs-move-init for a projection, should I emit `static_if (typeinfo is_workhorse($e(proj)))` or decide at macro time? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +Decide at macro time. By the time a `[call_macro]` `visit()` fires, inner macros have expanded and the typer has run, so every sub-expression carries a resolved `_type`. Read `projection._type.isWorkhorseType` directly and emit exactly one branch — no `static_if`, no `typeinfo is_workhorse` at runtime, less AST for the typer to fold away later. + +Pattern: + +```das +let workhorseProj = projection._type != null && projection._type.isWorkhorseType +var perElem : Expression? +if (workhorseProj) { + perElem = qmacro_expr() { $i(accName) |> push($e(projection)) } +} else { + perElem = qmacro_block() { + var $i(valName) <- $e(projection) + $i(accName) |> emplace($i(valName)) + } +} +``` + +For workhorse types (`int`, `float`, `bool`, `string`, …, anything `isWorkhorseType` returns true for) you can push the expression directly with no intermediate `var v = expr`. For non-workhorse, `<-` is a statement not an expression — you need `var v <- proj; acc |> emplace(v)`. The two-step is only required there. 
+ +This trick brought daslang `linq_fold`'s `where|select|to_array` emission from 13 → 11 ns/op (parity with the `_old_fold` comprehension baseline) at 100K rows. See [daslib/linq_fold.das](daslib/linq_fold.das) `plan_loop_or_count` (the array lane). The previous version had a runtime `static_if` inside the qmacro — correct but generated 2× the AST and lost the temp-binding optimization opportunity. + +Other `TypeDecl` predicates available at macro time: `isIterator`, `isGoodArrayType`, `isConst`, `isPod`, plus `firstType` / `secondType` / `argTypes` for compound types. Use them; the typer has already done the work. + +## Questions +- When a call_macro needs to pick copy-vs-move-init for a projection, should I emit `static_if (typeinfo is_workhorse($e(proj)))` or decide at macro time? diff --git a/mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md b/mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md new file mode 100644 index 0000000000..e73aff48a7 --- /dev/null +++ b/mouse-data/docs/where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than.md @@ -0,0 +1,36 @@ +--- +slug: where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-a-qmacro-expr-and-fires-at-the-user-s-call-site-rather-than +title: Where does `// nolint:RULE` go when a lint warning is emitted from inside a `qmacro_expr` and fires at the user's call site rather than at the macro source? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +The nolint comment must be **inline at the end of the offending line**, inside the `qmacro_expr {...}` block — NOT on a separate comment line above and NOT at the user call site. 
+ +When a macro emits code via `qmacro_expr { var $i(name) = $e(expr) }`, lint analyzes the expansion at the user's call site but **reports the source position** as the line inside the qmacro_expr body. To suppress, the comment must travel with the emitted line: + +```daslang +} else { + blk.list |> emplace_new <| qmacro_expr() { + var $i(newArgName) = $e(newCall) // nolint:PERF009 + } + ... +} +``` + +What DOESN'T work: +- `// nolint:PERF009` on a comment line above the qmacro_expr block — suppresses nothing. +- `// nolint:PERF009` on the user-side `let x = _fold(...)` line — the lint engine reports against the macro source position, not the user site. + +The placement rule generalizes: `nolint:RULE` must be **on the literal line** that the lint output points at. For macro-quoted code, that's inside the `qmacro_expr { ... }` body. + +Concrete example: PERF009 ("redundant move into variable immediately returned") fired at `daslib/linq_fold.das:490:24` (a line inside `fold_linq_default`'s qmacro_expr) when called via `benchmarks/sql/take_count.das`'s single-pass chain. Inline `// nolint:PERF009` on the emitted `var = expr` line suppresses it cleanly. + +## Questions +- Where does `// nolint:RULE` go when a lint warning is emitted from inside a `qmacro_expr` and fires at the user's call site rather than at the macro source? +- nolint for macro-generated lint warnings +- How to suppress a lint rule that fires only at certain user call sites? + +## Questions +- Where does `// nolint:RULE` go when a lint warning is emitted from inside a `qmacro_expr` and fires at the user's call site rather than at the macro source? 
diff --git a/mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md b/mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md new file mode 100644 index 0000000000..b847fb36f8 --- /dev/null +++ b/mouse-data/docs/which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros.md @@ -0,0 +1,63 @@ +--- +slug: which-typedecl-predicates-identify-types-where-length-expr-is-statically-resolvable-in-daslang-macros +title: Which TypeDecl predicates identify types where length(expr) is statically resolvable in daslang macros? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +# Length-supporting types in daslang macros + +When a `[macro_function]` / `[call_macro]` needs to emit `length($e(src))` and have it compile, the source's `TypeDecl` must be one of: + +- `isGoodArrayType` — `array` (the dynamic array, including `array#`) +- `isGoodTableType` — `table` +- `isString` — `string` / `string#` +- `isArray` — fixed array `T[N]` (NOT `array` — that's `isGoodArrayType`; the naming is confusing) +- `isRange` — `range` / `urange` / `range64` / `urange64` + +**Excluded** (no `length()` overload — emitting `length(src)` will fail to compile inside macro output): + +- `isIterator` — iterators don't carry length, even when wrapping a length-having source. Use the underlying container. +- `isGoodLambdaType` — `def each(lam : lambda<...>)` makes lambdas iterable, but they have no `length()`. This is a common trap when peeling `each()` based solely on "not an iterator." +- Custom user `def each(MyType)` types — depends on whether the user also defined `length(MyType)`; assume no. + +## Canonical predicate + +```das +[macro_function] +def private type_has_length(t : TypeDecl?) 
: bool { + if (t == null) return false + return (t.isGoodArrayType || t.isGoodTableType || t.isString + || t.isArray || t.isRange) +} +``` + +Note the parenthesization: a bare `||`-chain split across lines hits a `gen2` parse error at the leading `||`. Wrap the chain in `(...)`. + +## Why this matters for `each()` peeling + +A common optimization: when a chain starts `each()._where(...)...`, peel the `each` and iterate `` directly so reserve/length work. The peel must be gated on `type_has_length(._type)` — checking only `!isIterator` would silently accept `each(lambda)` and emit broken `reserve(length(lambda))`. + +Example from `daslib/linq_fold.das` (PR #2689, Phase 2A): + +```das +[macro_function] +def private peel_each_length_source(var top : Expression?) : Expression? { + if (!(top is ExprCall)) return top + var topCall = top as ExprCall + if (topCall.func == null || topCall.func.name != "each" + || topCall.arguments |> length != 1 + || !type_has_length(topCall.arguments[0]._type)) return top + return clone_expression(topCall.arguments[0]) +} +``` + +The `clone_expression` is needed because `topCall.arguments[0]` is `Expression? const` (the args vector entry is const-typed even when the outer call is `var`); the planner stores `top` as `var Expression?` so the clone drops the const. + +## Discovery + +The set of `length()`-supporting types is not advertised as a single predicate anywhere — assembled from `mcp__daslang__describe_type TypeDecl` (the `isXxx` method list) cross-referenced with the `def length(...)` overloads in `daslib/builtin.das` and the `def each(...)` overloads. Lambda iterables surfaced as a Copilot review finding on PR #2689. + +## Questions +- Which TypeDecl predicates identify types where length(expr) is statically resolvable in daslang macros? 
diff --git a/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md b/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md new file mode 100644 index 0000000000..cc8f58dc2a --- /dev/null +++ b/mouse-data/docs/why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain.md @@ -0,0 +1,36 @@ +--- +slug: why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-outside-a-for-and-what-s-the-alternative-in-a-linq-chain +title: Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain? +created: 2026-05-16 +last_verified: 2026-05-16 +links: [] +--- + +`each(arr)` returns an iterator that walks the array. Daslang's safety rules say that iterator is unsafe **unless it's directly consumed by a for-loop in the same scope** — passing it through `|>` chains, capturing it in a `let`, or handing it to a function argument all trip: + +``` +error[31013]: '__::builtin`each`...' is unsafe, when not source of the for-loop; + must be inside the 'unsafe' block +``` + +**Inside `_fold(...)`** the error doesn't fire because `_fold` is a macro that expands to a for-loop body where `each(arr)` IS the source. So `_fold(each(arr)._where(...).count())` compiles cleanly. + +**Outside a fold macro**, in a plain pipe chain, use the array directly — most `_` call-macros (`AstCallMacro_LinqPred2`) accept both `iterator` and `array` for arg 0: + +| Doesn't work | Use instead | +|---|---| +| `let prices <- (each(arr) \|> _where(...) \|> _select_to_array(_.price))` (iterator outside `_fold`) | `let prices <- (arr \|> _where(...) 
\|> _select(_.price))` — array+macros chains as array; result is `array`, no `_to_array` suffix needed | +| `let c = each(cars)._join(each(dealers), ...)` inside `_fold` (two `each()`s, one not the chain source) | `let c = _fold(cars \|> _join(dealers, ..., ...) \|> count())` — pass arrays directly | +| `let r = each(arr) \|> ...` outside any fold | wrap in `unsafe(each(arr))`, OR start the chain with `arr` directly and let the macro handle iterator promotion | + +**Heuristic:** if the chain ends in a `_fold(...)` / `_old_fold(...)` wrapper or a for-loop, `each(arr)` works as the source. If the chain produces a value (or array) that escapes the expression — a `let`, a function return, the second arg to a macro — drop the `each()` and pass the array directly. + +The error points at the **specific** `each(arr)` call that escapes, so for multi-each chains (`_join`, `_zip`), check which side is the issue. + +## Questions +- Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain? +- error[31013] '__::builtin`each`' is unsafe — how to fix? +- When can I use `each(arr)` in a linq pipe chain? + +## Questions +- Why does `each(arr)` fail with "unsafe when not source of for-loop" outside a for, and what's the alternative in a linq chain? 
diff --git a/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md b/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md index efa990fc8a..9a9e930854 100644 --- a/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md +++ b/mouse-data/docs/why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-when-external-curl-to-status-works-fine-is-it-a-require-order.md @@ -15,61 +15,80 @@ links: [] [imgui_playwright] readiness gate FAILED ``` -(30s `wait_until_ready` timeout, then 120s popen drain timeout. External `curl http://localhost:9090/status` from a sibling shell returns 200 with proper status JSON throughout — only the popen parent's request loop can't see it.) +External `curl http://localhost:9090/status` from a sibling shell returns 200 with proper status JSON throughout — only the popen parent's poll loop "can't see it". Reproduces on macOS and Linux; appears to NOT reproduce on Windows (which is the trap — see below). # Root cause -`live/live_api` was required BEFORE `imgui_app + glfw/glfw_boost + opengl/* + glfw_live + opengl_live` somewhere in the requirer chain (usually a wrapper module like `imgui/imgui_harness`). The `[_macro] installing` in `live_api.das` calls `fork_debug_agent_context(@@debug_agent)` at compile time. If that fork happens before GLFW is initialized in the live runtime, the resulting LiveApiServer becomes unreachable from a popen parent on Windows. +**`ref_time_ticks()` returns nanoseconds on POSIX, but the wait-loop math assumes microseconds.** -Filed: [#2677](https://github.com/GaijinEntertainment/daScript/issues/2677). Distinct from #2675 (`ANY("*")` route shadowing). 
+`src/hal/performance_time.cpp` defines `ref_time_ticks()` per platform: -# Fix (mechanical) +| Platform | Returns | +|---|---| +| Linux | `tv_sec * 1e9 + tv_nsec` — **nanoseconds** | +| macOS | `clock_gettime_nsec_np(CLOCK_MONOTONIC_RAW)` — **nanoseconds** | +| Windows | `QueryPerformanceCounter().QuadPart` — counter ticks, freq depends on hardware (often ~10 MHz, accidentally close to 1 MHz / microsecond scaling) | -In the requirer module (yours or a wrapper you control), reorder requires so the **windowed backend stack comes first**: +`imgui_playwright`'s `wait_until_ready` (and other deadline loops) used: ```das -// Windowed backend FIRST (correctness, not aesthetics). -require imgui -require imgui_app -require glfw/glfw_boost -require opengl/opengl_boost -require live/glfw_live -require live/opengl_live - -// Live-host + boost-runtime stack AFTER. -require live/live_api -require live/live_commands -require live/live_vars -require live_host -require imgui/imgui_live -require imgui/imgui_boost_runtime -require imgui/imgui_boost_v2 -require imgui/imgui_widgets_builtin -require imgui/imgui_containers_builtin -require imgui/imgui_visual_aids +let deadline = ref_time_ticks() + int64(timeout_sec * 1000000.0f) +while (ref_time_ticks() < deadline) { + GET("{base_url}/status") $(resp) { ... } + sleep(READY_POLL_INTERVAL_MS) +} ``` -This mirrors the canonical order every pre-`imgui_harness` example/test used verbatim. Reordering is a no-op for visibility / re-export semantics — purely a workaround for the install-time ordering bug. +That `* 1000000.0f` assumes ref-time is in microseconds. So: +- **Linux/macOS**: a "30s" deadline is `30 * 1e6 = 30 million nanoseconds = 30 milliseconds`. Loop fires 0-1 polls and exits. The `connect 127.0.0.1:9090 failed!` line is the one in-flight libhv connect attempt timing out — server health is fine; the loop just budgeted itself out of existence. 
+- **Windows**: QPC freq is hardware-dependent but on common runners works out near enough to 1 MHz that `* 1e6` lands in the "seconds" ballpark by accident, masking the bug. + +# The Windows-only "require order" workaround is misleading + +[#2677](https://github.com/GaijinEntertainment/daScript/issues/2677) and a prior version of this card blamed require-order — windowed-backend stack vs. live-host stack — claiming `[_macro] installing` in `live_api.das` calling `fork_debug_agent_context(@@debug_agent)` before GLFW init was the issue. That diagnosis was wrong. The reorder happened to nudge timings just enough on Windows for the (already-too-short) loop to occasionally win the race, which read as "fix". On POSIX, the same reorder changes nothing — the loop still exits in 30 ms regardless of require order. + +If you see code in `imgui_harness.das` carrying a `// NOTE on require ordering` comment about live_api needing to come after the windowed stack: that comment is load-bearing only by accident on Windows. The real fix is in the timing math. + +# Fix + +Replace any `ref_time_ticks() + int64(seconds * 1000000.0f)` deadline pattern with platform-correct math. Two options: + +```das +// Option A — use the elapsed-microsec helper (always microseconds, all platforms) +let t_start = ref_time_ticks() +let timeout_us = int(timeout_sec * 1000000.0f) +while (get_time_usec(t_start) < timeout_us) { + ... +} + +// Option B — compute deadline in nanoseconds, on POSIX +let deadline = ref_time_ticks() + int64(timeout_sec * 1000000000.0f) +// (DON'T do this without a per-platform branch — breaks Windows) +``` + +**Option A is the right one.** `get_time_usec(reft)` is defined per-platform in `performance_time.cpp` and always returns microseconds. Audit any other `ref_time_ticks() + ... * 1000000.0f` patterns in your codebase the same way. # How to recognize this gotcha - Test hangs at `readiness gate FAILED` (not at `body did not converge` or similar). 
-- External `curl` to `localhost:9090/status` works while the test hangs (proves the server is up — the popen parent specifically can't reach it). -- Always reproduces — not a flaky timing issue. -- ONLY triggers when run via `popen` (via `with_imgui_app` in `imgui_playwright`, or any `dastest` integration test). Direct `bin/Release/daslang-live.exe