linq_fold: Phase 2A loop planner (where|select array + counter lanes) by borisbat · Pull Request #2689 · GaijinEntertainment/daScript

borisbat · 2026-05-16T17:59:01Z

Summary

Phase 2A of the linq_fold splice-mode rewrite (plan: ~/.claude/plans/keen-hopping-balloon.md; foundation landed in #2687). _fold now emits an explicit for-loop inside invoke($block, $src) for two narrow shape families. Anything outside scope returns the raw chain unfolded — no dispatch to _old_fold or to fold_linq_default's nested-pass emitter.

In scope:

- Array lane: [where_*][select*] (implicit to_array). Chained _where|_where|... fuses via &&; chained workhorse _select|_select|... fuses via intermediate var v_N = projection_N let-bindings (each next lambda's _ is renamed straight to the prior binding's name, no expression substitution).
- Counter lane: same intermediates terminated by _count. Emits var n = 0; for... if... n++; return n with no array materialization.
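
The two lanes can be modeled outside daslang to make the fused shape concrete. The following is a minimal Python sketch, not the macro's output: `array_lane` and `counter_lane` are hypothetical names, and predicates and projections are passed as plain callables rather than AST nodes.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def array_lane(src: Iterable[T],
               preds: Sequence[Callable[[T], bool]],
               proj: Callable[[T], U]) -> List[U]:
    """Model of the array lane: one loop, fused predicates, one push per match."""
    acc: List[U] = []
    for it in src:
        # Chained _where|_where is modeled as a single short-circuit test,
        # which is what fusing the predicates with && amounts to.
        if all(p(it) for p in preds):
            acc.append(proj(it))
    return acc


def counter_lane(src: Iterable[T],
                 preds: Sequence[Callable[[T], bool]]) -> int:
    """Model of the counter lane: same loop, but no array is materialized."""
    n = 0
    for it in src:
        if all(p(it) for p in preds):
            n += 1
    return n


is_even = lambda x: x % 2 == 0
above_two = lambda x: x > 2

filtered = array_lane(range(10), [is_even, above_two], lambda x: x * 10)
counted = counter_lane(range(10), [is_even, above_two])
```

Both functions walk the source exactly once, which is the property the planner relies on for the speedups in the table below.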

Out of scope (falls through to raw chain): _select|_where, sum, min, max, average, first, any, all, long_count, _order, _distinct, _take, _skip, _zip, _reverse, and non-workhorse chained selects. These compile to plain linq (m3f ≈ m3) — accepted regressions vs the historical _old_fold baseline. Phase 2B picks them up; the regressions are the forcing function so we feel the gap immediately.

_old_fold is untouched and continues to own the comprehension-emission contract; 10 existing AST tests were retargeted to _old_fold so the frozen baseline still has explicit coverage, and 8 new AST tests + 6 behavioral parity tests cover the new loop emission.

Phase 2A benchmark deltas (100K rows, ns/op per element, INTERP)

| Benchmark | Shape | m3f_old (ns/op) | m3f (ns/op) | Delta |
|---|---|---|---|---|
| count_aggregate | `where → count` | 5 | 5 | parity |
| chained_where | `where → where → count` | 17 | 8 | 2.1× faster |
| select_count | `select → count` | 15 | 2 | 7.5× faster |
| to_array_filter | `where → select → to_array` | 11 | 11 | parity |
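
The Delta column is the ratio of the old to the new per-element cost. A short Python check of that arithmetic, using only the numbers from the table (the `speedup` helper is illustrative):

```python
# (m3f_old, m3f) in ns/op per element, copied from the table above.
results = {
    "count_aggregate": (5, 5),
    "chained_where": (17, 8),
    "select_count": (15, 2),
    "to_array_filter": (11, 11),
}


def speedup(old_ns: float, new_ns: float) -> float:
    """Ratio of old cost to new cost; 1.0 means parity."""
    return old_ns / new_ns


ratios = {name: speedup(old, new) for name, (old, new) in results.items()}
```

Here 17 / 8 = 2.125, reported as 2.1×, and 15 / 2 = 7.5.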

Implementation notes

Four small tricks closed the to_array_filter gap from a first-cut 13 → 11 ns/op (parity with the comprehension baseline):

1. Workhorse decision at macro time. The projection's _type is resolved by the time LinqFold.visit() fires, so the planner reads projection._type.isWorkhorseType directly and emits exactly one branch, with no static_if (typeinfo is_workhorse(...)) at runtime.
2. Peel each(<array>). each(arr) reports as iterator<T>, so the array-only reserve path got skipped on benchmark sources like each(arr)._where(...)._select(...).to_array(). The planner detects each(<source-with-length>) and unwraps it. Iteration semantics are unchanged: for (it in arr) and for (it in each(arr)) yield the same element refs.
3. Pre-reserve. Once the source has a known length (post-peel), emit acc |> reserve(length(src)) before the loop, which matches what ExprArrayComprehension lowering does internally.
4. No emplace in emission. emplace moves out of its argument and can corrupt the source when the projection returns a ref into it (e.g. _._field). The planner emits push for workhorse and push_clone for non-workhorse, with no intermediate var v <- proj; emplace(v) dance.
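
The hazard behind the last point can be illustrated with a small model. Python has no move semantics, so this sketch simulates a move by handing the referenced value to the accumulator and resetting the source slot; that reset is an assumption made for illustration, and the function names are hypothetical.

```python
import copy
from typing import Any, Dict, List


def emit_with_move(src: List[Dict[str, Any]], field: str) -> List[Any]:
    """Simulated emplace: the accumulator takes the value, the source loses it."""
    acc: List[Any] = []
    for it in src:
        acc.append(it[field])
        it[field] = []  # modeled effect of moving out of a ref into the source
    return acc


def emit_with_push_clone(src: List[Dict[str, Any]], field: str) -> List[Any]:
    """Simulated push_clone: the accumulator gets a copy, the source is intact."""
    acc: List[Any] = []
    for it in src:
        acc.append(copy.deepcopy(it[field]))
    return acc


rows = [{"tags": ["a", "b"]}, {"tags": ["c"]}]

cloned = emit_with_push_clone(rows, "tags")
source_after_clone = copy.deepcopy(rows)

moved = emit_with_move(rows, "tags")
source_after_move = copy.deepcopy(rows)
```

Both variants produce the same result array, but only the cloning variant leaves `rows` usable afterwards, which is why the planner prefers push and push_clone.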

Chained _select|_select was the original Phase 2A gap. Plain Template.replaceVariable("it", proj_prev) + apply_template substitution fails because the typer wraps it reads in ExprRef2Value; the Template visitor only replaces the inner ExprVar, leaving ExprRef2Value(<non-ref value>) and a "can only dereference a reference" error. The fix here is to not substitute at all — bind the prior projection to a fresh local in the loop body and rename the next lambda's _ to that name via the existing fold_linq_cond. Non-workhorse chained selects still fall through (would need := clone for the intermediate binding since <- can corrupt source for lvalue projections; deferred to Phase 2B).
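
The bind-and-rename idea can be sketched as a tiny code generator. This is a Python model under stated assumptions, not the daslang macro: each projection is represented as a hypothetical callable that receives the name of its input variable and returns expression source, so "renaming `_`" becomes passing a different name.

```python
from typing import Callable, List


def emit_chained_select(projections: List[Callable[[str], str]]) -> str:
    """Emit Python source for one loop that binds each projection to v_N."""
    lines = ["def run(src):", "    acc = []", "    for it in src:"]
    prev = "it"
    for n, proj in enumerate(projections):
        name = f"v_{n}"
        # The next projection reads the previous binding by name;
        # no expression is substituted into another expression.
        lines.append(f"        {name} = {proj(prev)}")
        prev = name
    lines.append(f"        acc.append({prev})")
    lines.append("    return acc")
    return "\n".join(lines)


source = emit_chained_select([
    lambda x: f"{x} * 2",   # stands in for _select(_ * 2)
    lambda x: f"{x} + 1",   # stands in for _select(_ + 1)
])

namespace: dict = {}
exec(source, namespace)   # compile the generated loop
run = namespace["run"]
result = run([1, 2, 3])
```

Because every stage only ever sees a variable name, a wrapper around the original variable read (the ExprRef2Value case described above) never ends up around a non-reference expression.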

Files changed

- daslib/linq_fold.das: new plan_loop_or_count planner; LinqFold.visit() rewired to plan-then-fall-through. _old_fold, fold_linq_default, g_foldSeq, linqCalls, flatten_linq, fold_linq_cond, and every existing fold_* helper are untouched.
- tests/linq/test_linq_fold_ast.das: 10 AST tests retargeted to _old_fold (parallel target_*_old_fold functions added); 8 new AST tests covering the new loop-emission shapes; 6 new behavioral parity tests for chained-where, chained-select, where-count, select-count, count, bare-count.
- benchmarks/sql/select_count.das: new 4-way bench. chained_where.das and to_array_filter.das already existed; their m3f columns now show the speedup / parity.
- benchmarks/sql/LINQ.md: Phase 2A row in the phase-status table; new "Phase 2A — Loop planner" section with the delta table and implementation notes.

Test plan

- mcp__daslang__lint on all changed .das files: 0 issues
- mcp__daslang__format_file: all already formatted
- mcp__daslang__compile_check: all 4 files clean
- Full tests/linq/ suite (15 files, 592 tests): pass
- tests/dasSQLITE/test_05_sql_macro.das (19 tests): pass (no transitive regression)
- AOT: test_aot -use-aot dastest/dastest.das -- --use-aot --test tests/linq: 614 tests, all pass
- Phase 2A 4 benchmarks at 100K: m3f at parity-or-better on every shape

🤖 Generated with Claude Code

Replaces _fold's comprehension emitter with a planner that walks the chain and emits a plain for-loop inside invoke($block, $src).

Two terminator lanes:
- array lane: [_where*][_select?] → loop + push_clone (identity) or emplace-of-bound-projection (workhorse choice made at macro time from the projection's _type.isWorkhorseType, no runtime static_if).
- counter lane: same intermediates + _count → counter loop with `n++`.

Chained _where|_where fuse into a single && predicate; chained _select|_select fall through (needs ExprRef2Value-aware substitution, deferred to Phase 2B).

Anything outside the two lanes (_select|_where, _sum, _min, _max, _first, _any, _all, _long_count, _order, _distinct, _take, _skip, _zip, _reverse, ...) returns the raw chain unfolded, with no dispatch to _old_fold or fold_linq_default.

_old_fold and fold_linq_default are untouched; the comprehension contract now lives solely on _old_fold (10 AST tests retargeted; 8 new AST tests + 6 behavioral tests cover the new loop emission).

Benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count): 5 → 5 parity
  chained_where (where|where|count): 17 → 8 2.1× faster
  select_count (select|count): 15 → 2 7.5× faster
  to_array_filter (where|select): 11 → 13 ~18% slower vs comprehension

Out-of-scope shapes regress to m3 (plain linq), accepted as the forcing function for Phase 2B (sum/min/max/first/any/all + chained selects + take/skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ilter parity

The first Phase-2A cut was ~18% slower than the _old_fold comprehension on where|select|to_array. Four small fixes brought it to 11 ns/op parity:

1. Workhorse decision at macro time, not runtime. The projection's _type is resolved when the planner runs, so the macro reads projection._type.isWorkhorseType directly and emits exactly one branch instead of a runtime static_if.
2. Pre-reserve when the source has a known length. The planner emits acc |> reserve(length(src)) when top._type isn't an iterator, which matches what ExprArrayComprehension lowering does internally.
3. Peel each(<array>) at macro time. each(arr) reports as iterator<T> so (2) wouldn't fire on benchmark sources like each(arr)._where(...). The planner now detects each(<expr>) where the inner has length and unwraps it, so the emitted loop iterates the array directly.
4. Drop the intermediate var binding for workhorse projections. Workhorse values copy cheaply, so the planner emits acc |> push(projection) directly. Non-workhorse keeps the bind-then-emplace dance because <- is a statement, not an expression.

Phase 2A benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count): 5 → 5 parity
  chained_where (where|where|count): 17 → 8 2.1× faster
  select_count (select|count): 15 → 2 7.5× faster
  to_array_filter (where|select): 11 → 11 parity (was 13 pre-fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two follow-up improvements on top of the Phase-2A loop planner:

1. Chained _select|_select|... now fuses (for workhorse projections). The planner emits intermediate `var v_N = projection_N` let-bindings inside the loop body; each next lambda's `_` is renamed straight to the prior binding's name via fold_linq_cond. No expression substitution = no ExprRef2Value-wrapper trap. Non-workhorse chained selects still fall through (needs `:=` clone semantics, deferred to Phase 2B).
2. Drop emplace from emission. emplace moves out of its argument and can corrupt the source when the projection returns a ref into it (e.g. `_._field`). The planner now emits `push` for workhorse and `push_clone` for non-workhorse, with no intermediate `var v <- proj; emplace(v)` dance, which both simplifies the AST and is safer.

The chained-select AST test (previously asserting fall-through) now asserts invoke emission. All 118 fold + ast tests pass; benchmark deltas held vs the previous commit:
  count_aggregate: 5 parity
  chained_where: 8 2.1× faster
  select_count: 2 7.5× faster
  to_array_filter: 11 parity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR advances the linq_fold splice-mode rewrite by introducing a Phase 2A planner that emits explicit invoke(...) + for loops for a narrow set of LINQ shapes, while making _fold fall through to the raw (unfolded) LINQ chain for everything out of scope. It also preserves the historical comprehension-emission contract via _old_fold and updates tests/benchmarks/docs accordingly.

Changes:

- Add plan_loop_or_count in daslib/linq_fold.das and rewire _fold to “plan-then-fall-through” (no dispatch to _old_fold / fold_linq_default when unmatched).
- Retarget existing AST-shape tests to _old_fold and add new AST + behavioral tests for the new loop-emission shapes.
- Add a new select → count benchmark and document Phase 2A status/deltas in benchmarks/sql/LINQ.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| `daslib/linq_fold.das` | Adds the Phase 2A loop planner and changes `_fold` to emit loops for supported shapes or fall through unchanged. |
| `tests/linq/test_linq_fold_ast.das` | Splits AST-contract coverage: `_old_fold` retains comprehension baseline tests; `_fold` gains loop-emission and parity tests. |
| `benchmarks/sql/select_count.das` | Introduces a benchmark for `select → count` to measure Phase 2A counter-lane behavior. |
| `benchmarks/sql/LINQ.md` | Updates the phase status table and documents Phase 2A scope and benchmark deltas. |

Comments suppressed due to low confidence (1)

tests/linq/test_linq_fold_ast.das:474

Same issue as the where-case: qm_resolve_comprehension is run on the invoke’s block argument (ExprMakeBlock), which will always return null and doesn’t validate the absence of comprehensions inside the generated loop body. The assertion should inspect the block’s body/return expression instead of the block node itself.

        let inv = body_expr as ExprInvoke
        var arg0 = clone_expression(inv.arguments[0])
        var maybe_comp <- qm_resolve_comprehension(arg0)
        t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
    }


PR #2689 review fixes (Copilot):

1. Counter lane drop-projection bug. `_fold(src._select(f).count())` was skipping the projection entirely, which diverges from raw LINQ `count(select(src, f))` when `f` has side effects. Counter lane now binds the final projection to a discardable local per matched element so user-visible side effects fire. The optimizer dead-code-eliminates the binding for pure projections (the common case: `_.x * 2`, `_.price` etc.), so the 7.5× select_count speedup is preserved.
2. Vacuous comprehension assertion in two AST tests. Pass `body_expr` (the full ExprInvoke wrapper) to `qm_resolve_comprehension` instead of `inv.arguments[0]` (the inner ExprMakeBlock, which can never match either branch of the resolver). The fixed form actually verifies the loop output is not the `fromComprehension=true` shape.

Adds 2 behavioral tests for the side-effects invariant (single `select|count` and `where|select|count`).

All Phase 2A benchmarks held: count_aggregate 5/5, chained_where 8/17 (2.1×), select_count 2/15 (7.5×), to_array_filter 11/11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
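
The side-effects invariant from the first fix can be modeled in a few lines. This is a Python sketch with hypothetical names, showing only the observable behavior: the projection runs once per matched element and its value is thrown away.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def counter_lane(src: Iterable[T],
                 pred: Callable[[T], bool],
                 proj: Callable[[T], object]) -> int:
    """Count matches while still evaluating the projection for each match."""
    n = 0
    for it in src:
        if pred(it):
            _discarded = proj(it)  # evaluated only so side effects fire
            n += 1
    return n


seen: List[int] = []


def noisy_double(x: int) -> int:
    """A projection with a user-visible side effect."""
    seen.append(x)
    return x * 2


count = counter_lane([1, 2, 3, 4], lambda x: x % 2 == 0, noisy_double)
```

The count depends only on the predicate, while `seen` records that the projection ran for exactly the matched elements, matching what the unfolded chain would do.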

Reflect counter-lane semantics fix: projection is now evaluated per matched element (side effects fire); optimizer DCEs pure projections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

PR #2689 review fixes (Copilot, round 2):

1. Peel-each + reserve guard. The inline `each(<x>)` peel + `sourceHasLength` gate previously accepted any non-iterator inner type, including `each(lambda)` (a lambda iterable per builtin.das:1351). That would peel to a lambda, then emit `reserve(length(lambda))` which has no overload and would fail to compile inside the macro output. Phase 2A never hit this in practice because the test suite only uses array sources, but it's a latent trap. Extracted `peel_each_length_source` and `type_has_length` helpers. Peel now triggers only when the inner type satisfies `isGoodArrayType || isGoodTableType || isString || isArray (T[N]) || isRange`. Same predicate gates the array-lane reserve emission, so the two stay in sync. Lambdas / custom user iterables fall through unfolded.
2. Reworded `test_select_count_fold_result` assertion message: the old "(projection ignored by counter)" wording was outdated after the counter-lane fix in 6226a1e. The planner now evaluates the projection per iteration (for side effects); only the value is discarded. Reads "(projection does not affect count value)" now.

select_count benchmark held at 2 ns/op (vs 15 for old fold), to_array_filter held at 11/11 parity. AST + behavioral tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
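
The shared-predicate guard can be sketched as follows. This is a Python model, not the daslang helpers: the `Each` wrapper is a hypothetical stand-in for `each(<x>)`, and the mapping of the listed type checks onto Python's list, dict, str, tuple and range is a rough assumption. `reserve_hint` is an illustrative name for the decision to emit a reserve.

```python
from typing import Any, Optional


class Each:
    """Hypothetical stand-in for each(<x>): an iterator wrapper with no length."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner


def type_has_length(x: Any) -> bool:
    """Single predicate used by both the peel and the reserve decision."""
    return isinstance(x, (list, dict, str, tuple, range))


def peel_each_length_source(src: Any) -> Any:
    """Unwrap Each(x) only when x has a length; otherwise leave src alone."""
    if isinstance(src, Each) and type_has_length(src.inner):
        return src.inner
    return src


def reserve_hint(src: Any) -> Optional[int]:
    """Length to pre-reserve, or None when no reserve should be emitted."""
    top = peel_each_length_source(src)
    return len(top) if type_has_length(top) else None


wrapped_array = Each([1, 2, 3])
wrapped_lambda = Each(lambda: None)
```

Because the peel and the reserve consult the same predicate, a source such as `Each(lambda: None)` is neither unwrapped nor given a length-based reserve, which is the latent failure the fix closes.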

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

borisbat and others added 3 commits May 16, 2026 10:51

Copilot AI review requested due to automatic review settings May 16, 2026 17:59

Copilot started reviewing on behalf of borisbat May 16, 2026 17:59

Copilot AI reviewed May 16, 2026

Comment thread tests/linq/test_linq_fold_ast.das Outdated

Comment thread daslib/linq_fold.das Outdated

Comment thread benchmarks/sql/select_count.das Outdated

Comment thread benchmarks/sql/LINQ.md Outdated

borisbat and others added 2 commits May 16, 2026 11:34

linq_fold: update select_count benchmark header comment

6cda3c7

Reflect counter-lane semantics fix: projection is now evaluated per matched element (side effects fire); optimizer DCEs pure projections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

borisbat requested a review from Copilot May 16, 2026 18:41

Copilot started reviewing on behalf of borisbat May 16, 2026 18:42

Copilot AI reviewed May 16, 2026

Comment thread daslib/linq_fold.das Outdated

Comment thread tests/linq/test_linq_fold_ast.das Outdated

borisbat requested a review from Copilot May 16, 2026 18:58

Copilot started reviewing on behalf of borisbat May 16, 2026 18:58

Copilot AI reviewed May 16, 2026

Comment thread daslib/linq_fold.das

Comment thread benchmarks/sql/LINQ.md

borisbat merged commit d805df7 into master May 16, 2026
32 checks passed

This was referenced May 16, 2026

macro_boost: add has_sideeffects + counter-lane elision #2691

Merged

mouse-data/docs: 21 new + 1 updated card from linq_fold + dasImgui PR #38 #2692

Merged

borisbat merged 6 commits into master from bbatkin/linq-fold-phase-2a-loop-planner
