linq_fold: Phase 2A loop planner (where|select array + counter lanes) #2689

Merged
borisbat merged 6 commits into master from bbatkin/linq-fold-phase-2a-loop-planner
May 16, 2026

Conversation

@borisbat
Collaborator

Summary

Phase 2A of the linq_fold splice-mode rewrite (plan: ~/.claude/plans/keen-hopping-balloon.md; foundation landed in #2687). _fold now emits an explicit for-loop inside invoke($block, $src) for two narrow shape families. Anything outside that scope returns the raw chain unfolded: there is no dispatch to _old_fold or to fold_linq_default's nested-pass emitter.

In scope:

  • Array lane: [_where*][_select*] (implicit to_array). Chained _where|_where|... fuses via &&; chained workhorse _select|_select|... fuses via intermediate var v_N = projection_N let-bindings (each next lambda's _ is renamed directly to the prior binding's name; there is no expression substitution).
  • Counter lane: the same intermediate stages, terminated by _count. Emits var n = 0; for ... if ... n++; return n, with no array materialization.
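To make the array-lane fusion concrete, here is a small Python model that compares a stage-by-stage chain with a single fused loop. It illustrates only the semantics described above for pure lambdas; it is not the daslang macro output, and the function names are hypothetical.

```python
from typing import Any, Callable, Iterable, List

Pred = Callable[[Any], bool]
Proj = Callable[[Any], Any]


def raw_chain(src: Iterable[Any], preds: List[Pred], projs: List[Proj]) -> List[Any]:
    """Unfused model: every _where / _select stage materializes a new list."""
    items = list(src)
    for p in preds:
        items = [x for x in items if p(x)]
    for f in projs:
        items = [f(x) for x in items]
    return items


def fused_array_lane(src: Iterable[Any], preds: List[Pred], projs: List[Proj]) -> List[Any]:
    """Fused model: one loop; predicates joined like && (all() short-circuits),
    projections chained through intermediate bindings v_1, v_2, ..."""
    acc: List[Any] = []
    for it in src:
        if all(p(it) for p in preds):
            v = it
            for f in projs:
                v = f(v)  # v_N = projection_N(v_{N-1})
            acc.append(v)
    return acc


data = list(range(20))
preds = [lambda x: x % 2 == 0, lambda x: x > 4]
projs = [lambda x: x * 3, lambda x: x + 1]
print(fused_array_lane(data, preds, projs))  # prints [19, 25, 31, 37, 43, 49, 55]
```

The fused version never builds the intermediate lists, which is where the savings come from; for side-effect-free lambdas both functions return the same result.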

Out of scope (falls through to raw chain): _select|_where, sum, min, max, average, first, any, all, long_count, _order, _distinct, _take, _skip, _zip, _reverse, and non-workhorse chained selects. These compile to plain linq (m3f ≈ m3) — accepted regressions vs the historical _old_fold baseline. Phase 2B picks them up; the regressions are the forcing function so we feel the gap immediately.

_old_fold is untouched and continues to own the comprehension-emission contract; 10 existing AST tests were retargeted to _old_fold so the frozen baseline still has explicit coverage, and 8 new AST tests + 6 behavioral parity tests cover the new loop emission.

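Phase 2A benchmark deltas (100K rows, INTERP; ns/op per element, lower is better)

| Benchmark       | Shape                     | m3f_old | m3f | Delta       |
|-----------------|---------------------------|---------|-----|-------------|
| count_aggregate | where → count             | 5       | 5   | parity      |
| chained_where   | where → where → count     | 17      | 8   | 2.1× faster |
| select_count    | select → count            | 15      | 2   | 7.5× faster |
| to_array_filter | where → select → to_array | 11      | 11  | parity      |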

Implementation notes

Four small tricks closed the to_array_filter gap, bringing the first cut from 13 to 11 ns/op (parity with the comprehension baseline):

  1. Workhorse decision at macro time. The projection's _type is resolved by the time LinqFold.visit() fires, so the planner reads projection._type.isWorkhorseType directly and emits exactly one branch — no static_if (typeinfo is_workhorse(...)) at runtime.
  2. Peel each(<array>). each(arr) reports as iterator<T>, so the array-only reserve path was skipped on benchmark sources like each(arr)._where(...)._select(...).to_array(). The planner detects each(<source-with-length>) and unwraps it. Iteration semantics are unchanged: for (it in arr) and for (it in each(arr)) yield the same element refs.
  3. Pre-reserve. Once the source has a known length (post-peel), emit acc |> reserve(length(src)) before the loop — matches what ExprArrayComprehension lowering does internally.
  4. No emplace in emission. emplace moves out of its argument and can corrupt the source when the projection returns a ref into it (e.g. _._field). The planner emits push for workhorse and push_clone for non-workhorse — no intermediate var v <- proj; emplace(v) dance.
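Tricks 1 and 4 can be modeled in Python as follows. The workhorse flag is inspected once, when the loop is planned, and the chosen function is then used for every element. push_clone is modeled with copy.deepcopy, and emplace is modeled as a move that resets the source slot to a default value; that move model is an assumption made only to show the aliasing hazard. None of these functions are the daslang API.

```python
import copy
from typing import Any, Callable, List


def push(acc: List[Any], value: Any) -> None:
    # Workhorse values are cheap to copy: append as-is.
    acc.append(value)


def push_clone(acc: List[Any], value: Any) -> None:
    # Non-workhorse values get an independent deep copy.
    acc.append(copy.deepcopy(value))


def choose_push(is_workhorse: bool) -> Callable[[List[Any], Any], None]:
    """Decided once at 'macro time', so the loop body contains a single branch."""
    return push if is_workhorse else push_clone


def emplace_field(acc: List[Any], owner: dict, field: str) -> None:
    # Hypothetical model of moving out of a ref into the source:
    # the value is taken and the source slot is left default-initialized.
    acc.append(owner[field])
    owner[field] = type(owner[field])()


rows = [{"field": ["a", "b"]}, {"field": ["c"]}]

# Safe path: the projection _._field returns a ref into each row; push_clone copies it.
safe: List[Any] = []
emit = choose_push(is_workhorse=False)
for r in rows:
    emit(safe, r["field"])

# Unsafe path: "emplacing" the same ref empties the source rows.
moved_src = copy.deepcopy(rows)
unsafe: List[Any] = []
for r in moved_src:
    emplace_field(unsafe, r, "field")
```

After the loops, rows is unchanged while every row of moved_src has an empty field, which is the corruption the planner avoids by never emitting emplace.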

Chained _select|_select was the original Phase 2A gap. Plain Template.replaceVariable("it", proj_prev) + apply_template substitution fails because the typer wraps reads of it in ExprRef2Value; the Template visitor replaces only the inner ExprVar, leaving ExprRef2Value(<non-ref value>) and a "can only dereference a reference" error. The fix is to not substitute at all: bind the prior projection to a fresh local in the loop body and rename the next lambda's _ to that name via the existing fold_linq_cond. Non-workhorse chained selects still fall through; they would need := clone for the intermediate binding, since <- can corrupt the source for lvalue projections (deferred to Phase 2B).
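The rename-instead-of-substitute approach can be illustrated with a toy text emitter in Python. The real planner rewrites AST nodes through fold_linq_cond, not strings; emit_loop and rename_underscore are hypothetical names, and the emitted text only approximates the loop shape described above (for example, it always uses push, whereas the real array lane emits push_clone for an identity projection).

```python
import re
from typing import List

# Matches a standalone `_` identifier, but not the `_` inside names like `_field` or `v_1`.
UNDERSCORE = re.compile(r"\b_\b")


def rename_underscore(expr: str, name: str) -> str:
    """Rename the lambda parameter `_` to an existing variable name (no expression substitution)."""
    return UNDERSCORE.sub(name, expr)


def emit_loop(src: str, wheres: List[str], selects: List[str]) -> str:
    """Toy emitter for the array lane: fused && predicate plus v_N let-bindings."""
    out = [f"for (it in {src}) {{"]
    cond = " && ".join(f"({rename_underscore(w, 'it')})" for w in wheres) or "true"
    out.append(f"    if ({cond}) {{")
    prev = "it"
    for i, sel in enumerate(selects, start=1):
        name = f"v_{i}"
        # Each projection reads the previous binding by name, never an inlined copy of it.
        out.append(f"        var {name} = {rename_underscore(sel, prev)}")
        prev = name
    out.append(f"        acc |> push({prev})")
    out.append("    }")
    out.append("}")
    return "\n".join(out)


print(emit_loop("arr", ["_.x > 0", "_.x < 10"], ["_.x * 2", "_ + 1"]))
```

Because v_2's initializer refers to v_1 by name, the prior projection's expression is never copied into a new context, so there is no wrapper node to mismatch.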

Files changed

  • daslib/linq_fold.das — new plan_loop_or_count planner; LinqFold.visit() rewired to plan-then-fall-through. _old_fold, fold_linq_default, g_foldSeq, linqCalls, flatten_linq, fold_linq_cond, and every existing fold_* helper are untouched.
  • tests/linq/test_linq_fold_ast.das — 10 AST tests retargeted to _old_fold (parallel target_*_old_fold functions added); 8 new AST tests covering the new loop-emission shapes; 6 new behavioral parity tests for chained-where, chained-select, where-count, select-count, count, bare-count.
  • benchmarks/sql/select_count.das — new 4-way bench. chained_where.das and to_array_filter.das already existed; their m3f columns now show the speedup / parity.
  • benchmarks/sql/LINQ.md — Phase 2A row in the phase-status table; new "Phase 2A — Loop planner" section with the delta table and implementation notes.

Test plan

  • mcp__daslang__lint on all changed .das files — 0 issues
  • mcp__daslang__format_file — all already formatted
  • mcp__daslang__compile_check — all 4 files clean
  • Full tests/linq/ suite (15 files, 592 tests) — pass
  • tests/dasSQLITE/test_05_sql_macro.das (19 tests) — pass (no transitive regression)
  • AOT: test_aot -use-aot dastest/dastest.das -- --use-aot --test tests/linq — 614 tests, all pass
  • Phase 2A 4 benchmarks at 100K — m3f at parity-or-better on every shape

🤖 Generated with Claude Code

borisbat and others added 3 commits May 16, 2026 10:51
Replaces _fold's comprehension emitter with a planner that walks the chain
and emits a plain for-loop inside invoke($block, $src). Two terminator
lanes:

- array lane: [_where*][_select?] → loop + push_clone (identity) or
  emplace-of-bound-projection (workhorse choice made at macro time from
  the projection's _type.isWorkhorseType, no runtime static_if).
- counter lane: same intermediates + _count → counter loop with `n++`.

Chained _where|_where fuse into a single && predicate; chained
_select|_select fall through (needs ExprRef2Value-aware substitution,
deferred to Phase 2B). Anything outside the two lanes (_select|_where,
_sum, _min, _max, _first, _any, _all, _long_count, _order, _distinct,
_take, _skip, _zip, _reverse, ...) returns the raw chain unfolded —
no dispatch to _old_fold or fold_linq_default.

_old_fold and fold_linq_default are untouched; the comprehension contract
now lives solely on _old_fold (10 AST tests retargeted; 8 new AST tests +
6 behavioral tests cover the new loop emission).

Benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count):       5 → 5    parity
  chained_where (where|where|count):  17 → 8    2.1× faster
  select_count (select|count):        15 → 2    7.5× faster
  to_array_filter (where|select):     11 → 13   ~18% slower vs comprehension

Out-of-scope shapes regress to m3 (plain linq) — accepted as the
forcing function for Phase 2B (sum/min/max/first/any/all + chained
selects + take/skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilter parity

The first Phase-2A cut was ~18% slower than the _old_fold comprehension on
where|select|to_array. Four small fixes brought it to 11 ns/op parity:

1. Workhorse decision at macro time, not runtime. The projection's _type is
   resolved when the planner runs, so the macro reads
   projection._type.isWorkhorseType directly and emits exactly one branch
   instead of a runtime static_if.

2. Pre-reserve when the source has a known length. The planner emits
   acc |> reserve(length(src)) when top._type isn't an iterator — matches
   what ExprArrayComprehension lowering does internally.

3. Peel each(<array>) at macro time. each(arr) reports as iterator<T> so
   (2) wouldn't fire on benchmark sources like each(arr)._where(...). The
   planner now detects each(<expr>) where the inner has length and unwraps
   it — the emitted loop iterates the array directly.

4. Drop the intermediate var binding for workhorse projections. Workhorse
   values copy cheaply, so the planner emits acc |> push(projection)
   directly. Non-workhorse keeps the bind-then-emplace dance because <- is
   a statement, not an expression.

Phase 2A benchmark deltas (100K, INTERP, ns/op per element):
  count_aggregate (where|count):       5 → 5    parity
  chained_where (where|where|count):  17 → 8    2.1× faster
  select_count (select|count):        15 → 2    7.5× faster
  to_array_filter (where|select):     11 → 11   parity (was 13 pre-fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-up improvements on top of the Phase-2A loop planner:

1. Chained _select|_select|... now fuses (for workhorse projections).
   The planner emits intermediate `var v_N = projection_N` let-bindings
   inside the loop body; each next lambda's `_` is renamed straight to
   the prior binding's name via fold_linq_cond. No expression substitution
   = no ExprRef2Value-wrapper trap. Non-workhorse chained selects still
   fall through (needs `:=` clone semantics — Phase 2B).

2. Drop emplace from emission. emplace moves out of its argument and
   can corrupt the source when the projection returns a ref into it
   (e.g. `_._field`). The planner now emits `push` for workhorse and
   `push_clone` for non-workhorse — no intermediate `var v <- proj;
   emplace(v)` dance, which both simplifies the AST and is safer.

The chained-select AST test (previously asserting fall-through) now
asserts invoke emission. All 118 fold + ast tests pass; benchmark
deltas held vs the previous commit:
  count_aggregate:    5  parity
  chained_where:      8  2.1× faster
  select_count:       2  7.5× faster
  to_array_filter:   11  parity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 16, 2026 17:59
Contributor

Copilot AI left a comment


Pull request overview

This PR advances the linq_fold splice-mode rewrite by introducing a Phase 2A planner that emits explicit invoke(...) + for loops for a narrow set of LINQ shapes, while making _fold fall through to the raw (unfolded) LINQ chain for everything out of scope. It also preserves the historical comprehension-emission contract via _old_fold and updates tests/benchmarks/docs accordingly.

Changes:

  • Add plan_loop_or_count in daslib/linq_fold.das and rewire _fold to “plan-then-fall-through” (no dispatch to _old_fold / fold_linq_default when unmatched).
  • Retarget existing AST-shape tests to _old_fold and add new AST + behavioral tests for the new loop-emission shapes.
  • Add a new select → count benchmark and document Phase 2A status/deltas in benchmarks/sql/LINQ.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
daslib/linq_fold.das Adds the Phase 2A loop planner and changes _fold to emit loops for supported shapes or fall through unchanged.
tests/linq/test_linq_fold_ast.das Splits AST-contract coverage: _old_fold retains comprehension baseline tests; _fold gains loop-emission and parity tests.
benchmarks/sql/select_count.das Introduces a benchmark for select → count to measure Phase 2A counter-lane behavior.
benchmarks/sql/LINQ.md Updates the phase status table and documents Phase 2A scope and benchmark deltas.
Comments suppressed due to low confidence (1)

tests/linq/test_linq_fold_ast.das:474

  • Same issue as the where-case: qm_resolve_comprehension is run on the invoke’s block argument (ExprMakeBlock), which will always return null and doesn’t validate the absence of comprehensions inside the generated loop body. The assertion should inspect the block’s body/return expression instead of the block node itself.
        let inv = body_expr as ExprInvoke
        var arg0 = clone_expression(inv.arguments[0])
        var maybe_comp <- qm_resolve_comprehension(arg0)
        t |> success(maybe_comp == null, "loop planner must NOT emit a comprehension")
    }


Comment thread tests/linq/test_linq_fold_ast.das Outdated
Comment thread daslib/linq_fold.das Outdated
Comment thread benchmarks/sql/select_count.das Outdated
Comment thread benchmarks/sql/LINQ.md Outdated
borisbat and others added 2 commits May 16, 2026 11:34
PR #2689 review fixes (Copilot):

1. Counter lane drop-projection bug. `_fold(src._select(f).count())` was
   skipping the projection entirely, which diverges from raw LINQ
   `count(select(src, f))` when `f` has side effects. Counter lane now
   binds the final projection to a discardable local per matched element
   so user-visible side effects fire. The optimizer dead-code-eliminates
   the binding for pure projections (the common case — `_.x * 2`,
   `_.price` etc.), so the 7.5× select_count speedup is preserved.

2. Vacuous comprehension assertion in two AST tests. Pass `body_expr`
   (the full ExprInvoke wrapper) to `qm_resolve_comprehension` instead
   of `inv.arguments[0]` (the inner ExprMakeBlock, which can never match
   either branch of the resolver). The fixed form actually verifies the
   loop output is not the `fromComprehension=true` shape.

Adds 2 behavioral tests for the side-effects invariant (single
`select|count` and `where|select|count`). All Phase 2A benchmarks held:
count_aggregate 5/5, chained_where 8/17 (2.1×), select_count 2/15
(7.5×), to_array_filter 11/11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
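The counter-lane fix in item 1 of the commit above can be modeled in Python: the projection's value is thrown away, but the call still happens for every matched element, so a traced projection sees the same calls as the unfused count(select(src, f)). This is a hypothetical sketch of the semantics only; dead-code elimination of pure projections is a compiler optimization and is not modeled.

```python
from typing import Any, Callable, Iterable, List


def raw_select_count(src: Iterable[Any], f: Callable[[Any], Any]) -> int:
    # Unfused count(select(src, f)): f runs once per element.
    return len([f(x) for x in src])


def counter_lane(src: Iterable[Any], preds: List[Callable[[Any], bool]],
                 projs: List[Callable[[Any], Any]]) -> int:
    n = 0
    for it in src:
        if all(p(it) for p in preds):
            v = it
            for f in projs:
                v = f(v)  # bound to a discardable local so side effects still fire
            n += 1
    return n


calls: List[int] = []


def traced(x: int) -> int:
    calls.append(x)  # user-visible side effect
    return x * 2
```

Before the fix, the modeled loop would have skipped the inner projection loop entirely, and calls would stay empty.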
Reflect counter-lane semantics fix: projection is now evaluated per
matched element (side effects fire); optimizer DCEs pure projections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread daslib/linq_fold.das Outdated
Comment thread tests/linq/test_linq_fold_ast.das Outdated
PR #2689 review fixes (Copilot, round 2):

1. Peel-each + reserve guard. The inline `each(<x>)` peel + `sourceHasLength`
   gate previously accepted any non-iterator inner type, including
   `each(lambda)` (a lambda iterable per builtin.das:1351). That would peel
   to a lambda, then emit `reserve(length(lambda))` which has no overload
   and would fail to compile inside the macro output. Phase 2A never hit
   this in practice because the test suite only uses array sources, but
   it's a latent trap.

   Extracted `peel_each_length_source` and `type_has_length` helpers.
   Peel now triggers only when the inner type satisfies `isGoodArrayType
   || isGoodTableType || isString || isArray (T[N]) || isRange`. Same
   predicate gates the array-lane reserve emission, so the two stay in
   sync. Lambdas / custom user iterables fall through unfolded.

2. Reworded `test_select_count_fold_result` assertion message: the old
   "(projection ignored by counter)" wording was outdated after the
   counter-lane fix in 6226a1e — the planner now evaluates the
   projection per iteration (for side effects); only the value is
   discarded. Reads "(projection does not affect count value)" now.

select_count benchmark held at 2 ns/op (vs 15 for old fold), to_array_filter
held at 11/11 parity. AST + behavioral tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
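The round-2 guard keeps the peel and the reserve decision on one shared predicate. The Python model below sketches that shape with a toy type descriptor; the kind strings stand in for isGoodArrayType, isGoodTableType, isString, isArray (T[N]) and isRange, and these Python versions of peel_each_length_source and type_has_length are hypothetical, not the daslang code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeInfo:
    kind: str  # e.g. "array", "table", "string", "fixed_array", "range", "iterator", "lambda"


@dataclass
class Expr:
    ty: TypeInfo
    call: Optional[str] = None      # name of the called function, if this node is a call
    arg: Optional["Expr"] = None    # the call's single argument, if any


# Kinds for which length(src) is assumed to exist.
LENGTH_KINDS = {"array", "table", "string", "fixed_array", "range"}


def type_has_length(t: TypeInfo) -> bool:
    return t.kind in LENGTH_KINDS


def peel_each_length_source(src: Expr) -> Expr:
    """Unwrap each(x) only when x has a length; otherwise leave the source alone."""
    if src.call == "each" and src.arg is not None and type_has_length(src.arg.ty):
        return src.arg
    return src


def should_reserve(src: Expr) -> bool:
    # Same predicate as the peel, so the two decisions cannot drift apart.
    return type_has_length(peel_each_length_source(src).ty)
```

With this shape, each(lambda) is never peeled and never reaches reserve(length(...)), which matches the latent trap described in item 1.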
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread daslib/linq_fold.das
Comment thread benchmarks/sql/LINQ.md
@borisbat borisbat merged commit d805df7 into master May 16, 2026
32 checks passed

2 participants