linq_fold: Phase 2A loop planner (where|select array + counter lanes)#2689
Merged
Conversation
Replaces _fold's comprehension emitter with a planner that walks the chain and emits a plain for-loop inside invoke($block, $src). Two terminator lanes: - array lane: [_where*][_select?] → loop + push_clone (identity) or emplace-of-bound-projection (workhorse choice made at macro time from the projection's _type.isWorkhorseType, no runtime static_if). - counter lane: same intermediates + _count → counter loop with `n++`. Chained _where|_where fuse into a single && predicate; chained _select|_select fall through (needs ExprRef2Value-aware substitution, deferred to Phase 2B). Anything outside the two lanes (_select|_where, _sum, _min, _max, _first, _any, _all, _long_count, _order, _distinct, _take, _skip, _zip, _reverse, ...) returns the raw chain unfolded — no dispatch to _old_fold or fold_linq_default. _old_fold and fold_linq_default are untouched; the comprehension contract now lives solely on _old_fold (10 AST tests retargeted; 8 new AST tests + 6 behavioral tests cover the new loop emission). Benchmark deltas (100K, INTERP, ns/op per element): count_aggregate (where|count): 5 → 5 parity chained_where (where|where|count): 17 → 8 2.1× faster select_count (select|count): 15 → 2 7.5× faster to_array_filter (where|select): 11 → 13 ~18% slower vs comprehension Out-of-scope shapes regress to m3 (plain linq) — accepted as the forcing function for Phase 2B (sum/min/max/first/any/all + chained selects + take/skip). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilter parity The first Phase-2A cut was ~18% slower than the _old_fold comprehension on where|select|to_array. Four small fixes brought it to 11 ns/op parity: 1. Workhorse decision at macro time, not runtime. The projection's _type is resolved when the planner runs, so the macro reads projection._type.isWorkhorseType directly and emits exactly one branch instead of a runtime static_if. 2. Pre-reserve when the source has a known length. The planner emits acc |> reserve(length(src)) when top._type isn't an iterator — matches what ExprArrayComprehension lowering does internally. 3. Peel each(<array>) at macro time. each(arr) reports as iterator<T> so (2) wouldn't fire on benchmark sources like each(arr)._where(...). The planner now detects each(<expr>) where the inner has length and unwraps it — the emitted loop iterates the array directly. 4. Drop the intermediate var binding for workhorse projections. Workhorse values copy cheaply, so the planner emits acc |> push(projection) directly. Non-workhorse keeps the bind-then-emplace dance because <- is a statement, not an expression. Phase 2A benchmark deltas (100K, INTERP, ns/op per element): count_aggregate (where|count): 5 → 5 parity chained_where (where|where|count): 17 → 8 2.1× faster select_count (select|count): 15 → 2 7.5× faster to_array_filter (where|select): 11 → 11 parity (was 13 pre-fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-up improvements on top of the Phase-2A loop planner: 1. Chained _select|_select|... now fuses (for workhorse projections). The planner emits intermediate `var v_N = projection_N` let-bindings inside the loop body; each next lambda's `_` is renamed straight to the prior binding's name via fold_linq_cond. No expression substitution = no ExprRef2Value-wrapper trap. Non-workhorse chained selects still fall through (needs `:=` clone semantics — Phase 2B). 2. Drop emplace from emission. emplace moves out of its argument and can corrupt the source when the projection returns a ref into it (e.g. `_._field`). The planner now emits `push` for workhorse and `push_clone` for non-workhorse — no intermediate `var v <- proj; emplace(v)` dance, which both simplifies the AST and is safer. The chained-select AST test (previously asserting fall-through) now asserts invoke emission. All 118 fold + ast tests pass; benchmark deltas held vs the previous commit: count_aggregate: 5 parity chained_where: 8 2.1× faster select_count: 2 7.5× faster to_array_filter: 11 parity Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR advances the linq_fold splice-mode rewrite by introducing a Phase 2A planner that emits explicit invoke(...) + for loops for a narrow set of LINQ shapes, while making _fold fall through to the raw (unfolded) LINQ chain for everything out of scope. It also preserves the historical comprehension-emission contract via _old_fold and updates tests/benchmarks/docs accordingly.
Changes:
- Add
plan_loop_or_countindaslib/linq_fold.dasand rewire_foldto “plan-then-fall-through” (no dispatch to_old_fold/fold_linq_defaultwhen unmatched). - Retarget existing AST-shape tests to
_old_foldand add new AST + behavioral tests for the new loop-emission shapes. - Add a new
select → countbenchmark and document Phase 2A status/deltas inbenchmarks/sql/LINQ.md.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
daslib/linq_fold.das |
Adds the Phase 2A loop planner and changes _fold to emit loops for supported shapes or fall through unchanged. |
tests/linq/test_linq_fold_ast.das |
Splits AST-contract coverage: _old_fold retains comprehension baseline tests; _fold gains loop-emission and parity tests. |
benchmarks/sql/select_count.das |
Introduces a benchmark for select → count to measure Phase 2A counter-lane behavior. |
benchmarks/sql/LINQ.md |
Updates the phase status table and documents Phase 2A scope and benchmark deltas. |
Comments suppressed due to low confidence (1)
tests/linq/test_linq_fold_ast.das:474
- Same issue as the where-case:
qm_resolve_comprehensionis run on the invoke’s block argument (ExprMakeBlock), which will always return null and doesn’t validate the absence of comprehensions inside the generated loop body. The assertion should inspect the block’s body/return expression instead of the block node itself.
let inv = body_expr as ExprInvoke
var arg0 = clone_expression(inv.arguments[0])
var maybe_comp <- qm_resolve_comprehension(arg0)
t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PR #2689 review fixes (Copilot): 1. Counter lane drop-projection bug. `_fold(src._select(f).count())` was skipping the projection entirely, which diverges from raw LINQ `count(select(src, f))` when `f` has side effects. Counter lane now binds the final projection to a discardable local per matched element so user-visible side effects fire. The optimizer dead-code-eliminates the binding for pure projections (the common case — `_.x * 2`, `_.price` etc.), so the 7.5× select_count speedup is preserved. 2. Vacuous comprehension assertion in two AST tests. Pass `body_expr` (the full ExprInvoke wrapper) to `qm_resolve_comprehension` instead of `inv.arguments[0]` (the inner ExprMakeBlock, which can never match either branch of the resolver). The fixed form actually verifies the loop output is not the `fromComprehension=true` shape. Adds 2 behavioral tests for the side-effects invariant (single `select|count` and `where|select|count`). All Phase 2A benchmarks held: count_aggregate 5/5, chained_where 8/17 (2.1×), select_count 2/15 (7.5×), to_array_filter 11/11. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reflect counter-lane semantics fix: projection is now evaluated per matched element (side effects fire); optimizer DCEs pure projections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2689 review fixes (Copilot, round 2): 1. Peel-each + reserve guard. The inline `each(<x>)` peel + `sourceHasLength` gate previously accepted any non-iterator inner type, including `each(lambda)` (a lambda iterable per builtin.das:1351). That would peel to a lambda, then emit `reserve(length(lambda))` which has no overload and would fail to compile inside the macro output. Phase 2A never hit this in practice because the test suite only uses array sources, but it's a latent trap. Extracted `peel_each_length_source` and `type_has_length` helpers. Peel now triggers only when the inner type satisfies `isGoodArrayType || isGoodTableType || isString || isArray (T[N]) || isRange`. Same predicate gates the array-lane reserve emission, so the two stay in sync. Lambdas / custom user iterables fall through unfolded. 2. Reworded `test_select_count_fold_result` assertion message: the old "(projection ignored by counter)" wording was outdated after the counter-lane fix in 6226a1e — the planner now evaluates the projection per iteration (for side effects); only the value is discarded. Reads "(projection does not affect count value)" now. select_count benchmark held at 2 ns/op (vs 15 for old fold), to_array_filter held at 11/11 parity. AST + behavioral tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2A of the linq_fold splice-mode rewrite (plan:
~/.claude/plans/keen-hopping-balloon.md; foundation landed in #2687)._foldnow emits an explicit for-loop insideinvoke($block, $src)for two narrow shape families. Anything outside scope returns the raw chain unfolded — no dispatch to_old_foldor tofold_linq_default's nested-pass emitter.In scope:
[where_*][select*](implicitto_array). Chained_where|_where|...fuses via&&; chained workhorse_select|_select|...fuses via intermediatevar v_N = projection_Nlet-bindings (each next lambda's_is renamed straight to the prior binding's name, no expression substitution)._count. Emitsvar n = 0; for... if... n++; return nwith no array materialization.Out of scope (falls through to raw chain):
_select|_where,sum,min,max,average,first,any,all,long_count,_order,_distinct,_take,_skip,_zip,_reverse, and non-workhorse chained selects. These compile to plain linq (m3f ≈ m3) — accepted regressions vs the historical_old_foldbaseline. Phase 2B picks them up; the regressions are the forcing function so we feel the gap immediately._old_foldis untouched and continues to own the comprehension-emission contract; 10 existing AST tests were retargeted to_old_foldso the frozen baseline still has explicit coverage, and 8 new AST tests + 6 behavioral parity tests cover the new loop emission.Phase 2A benchmark deltas (100K rows, ns/op per element, INTERP)
where → countwhere → where → countselect → countwhere → select → to_arrayImplementation notes
Four small tricks closed the
to_array_filtergap from a first-cut 13 → 11 ns/op (parity with the comprehension baseline):_typeis resolved by the timeLinqFold.visit()fires, so the planner readsprojection._type.isWorkhorseTypedirectly and emits exactly one branch — nostatic_if (typeinfo is_workhorse(...))at runtime.each(<array>).each(arr)reports asiterator<T>, so the array-only reserve path got skipped on benchmark sources likeeach(arr)._where(...)._select(...).to_array(). The planner detectseach(<source-with-length>)and unwraps it. Iteration semantics are unchanged —for (it in arr)andfor (it in each(arr))yield the same element refs.acc |> reserve(length(src))before the loop — matches whatExprArrayComprehensionlowering does internally.emplacein emission.emplacemoves out of its argument and can corrupt the source when the projection returns a ref into it (e.g._._field). The planner emitspushfor workhorse andpush_clonefor non-workhorse — no intermediatevar v <- proj; emplace(v)dance.Chained
_select|_selectwas the original Phase 2A gap. PlainTemplate.replaceVariable("it", proj_prev) + apply_templatesubstitution fails because the typer wrapsitreads inExprRef2Value; the Template visitor only replaces the innerExprVar, leavingExprRef2Value(<non-ref value>)and a "can only dereference a reference" error. The fix here is to not substitute at all — bind the prior projection to a fresh local in the loop body and rename the next lambda's_to that name via the existingfold_linq_cond. Non-workhorse chained selects still fall through (would need:=clone for the intermediate binding since<-can corrupt source for lvalue projections; deferred to Phase 2B).Files changed
daslib/linq_fold.das— newplan_loop_or_countplanner;LinqFold.visit()rewired to plan-then-fall-through._old_fold,fold_linq_default,g_foldSeq,linqCalls,flatten_linq,fold_linq_cond, and every existingfold_*helper are untouched.tests/linq/test_linq_fold_ast.das— 10 AST tests retargeted to_old_fold(paralleltarget_*_old_foldfunctions added); 8 new AST tests covering the new loop-emission shapes; 6 new behavioral parity tests for chained-where, chained-select, where-count, select-count, count, bare-count.benchmarks/sql/select_count.das— new 4-way bench.chained_where.dasandto_array_filter.dasalready existed; their m3f columns now show the speedup / parity.benchmarks/sql/LINQ.md— Phase 2A row in the phase-status table; new "Phase 2A — Loop planner" section with the delta table and implementation notes.Test plan
mcp__daslang__linton all changed.dasfiles — 0 issuesmcp__daslang__format_file— all already formattedmcp__daslang__compile_check— all 4 files cleantests/linq/suite (15 files, 592 tests) — passtests/dasSQLITE/test_05_sql_macro.das(19 tests) — pass (no transitive regression)test_aot -use-aot dastest/dastest.das -- --use-aot --test tests/linq— 614 tests, all pass🤖 Generated with Claude Code