diff --git a/benchmarks/sql/linq_fold_chain_audit.md b/benchmarks/sql/linq_fold_chain_audit.md
new file mode 100644
index 000000000..bc5924ec1
--- /dev/null
+++ b/benchmarks/sql/linq_fold_chain_audit.md
@@ -0,0 +1,1413 @@
+# `_fold` chain coverage audit — silent fall-off catalog
+
+Generated 2026-05-23 from `0a2da407f`. Probe files live under
+`/tmp/audit_probes/` (re-runnable; see "How to re-run" at the bottom).
+
+## Status — what this audit has closed
+
+**Theme 1 (terminal `_select` extension) — landed 2026-05-24** (`59c4f3f98`):
+
+- **1a, 1e + motivating** (`plan_order_family` / `plan_decs_order_family`): terminal `_select` accepted after `take(N)`. Bounded-heap holds the raw element; projection runs ≤K times at return.
+- **plan_reverse / plan_decs_reverse**: terminal `_select` accepted after `reverse [+ take(N)]`. Closes the natural "filter, reverse for newest-first, take K, project" idiom. NOT closed: the `reverse + _select + take` ordering (2c / 2e exact shape) — user must reorder to `reverse + take + _select`.
+- **8b** (`plan_decs_join`): single trailing `_select` between `_join` and the implicit `to_array`. Substitutes via a let-bound join-result + projection.
+- **7a** (`plan_zip`): 3-arg `zip(a, b, sel)` pre-lowered to 2-arg `zip(a, b) |> _select(sel-as-tuple)` — the natural dot-product idiom now splices.
+
+Coverage extension: 1395 → 1415 linq tests (10 new tests in `tests/linq/test_linq_fold_terminal_select.das`).
+
+**Theme 2 (trailing `_where` / HAVING) — landed 2026-05-24**:
+
+- **8a, C6** (`plan_decs_join`): single trailing `_where` between `_join` and the terminator. Predicate references join-result fields; emission binds the result once per pair and gates `count++` / `push_clone`. Composes with the terminal `_select` from Theme 1.
+- **4a** (`plan_group_by`) + **4e** (`plan_decs_group_by` via shared `plan_group_by_core`): trailing `_where` AFTER `_select(reducer)`, i.e. SQL HAVING on the post-aggregate tuple. Binds the constructed output once per bucket and gates buf-emit / count-emit. Distinct from the existing `having_` slot (which is pre-select and can lift hidden reducer slots) — both can fire on the same chain.
+- **5c** (`plan_loop_or_count` across all 4 lanes — counter / accumulator / early-exit / array): `take(N)._where(p).terminator` accepted. Take cap ticks unconditionally per element; trailing `_where` gates only the per-element contribution, preserving the "first N elements, then keep matching" semantic that auto-rewriting can't reproduce.
+
+Coverage extension: 1415 → 1437 linq tests (12 new tests in `tests/linq/test_linq_fold_theme2_trailing_where.das`).
+
+Still open (queued for the next session per the cross-cutting findings below):
+
+- Theme 3 — cross-arm composition (5 of 6 composition probes).
+- Themes 4–8 — see "Cross-cutting findings" section.
+
+
+The audit catalogs **silent fall-off** in `daslib/linq_fold.das`: chains where a
+natural user phrasing makes the splice arm return null and the planner falls
+back to the slow default cascade (`fold_linq_default`) — without any warning.
+Every row below shows the post-macro `ast_dump` of one probe, classified as
+SPLICE-FIRES (single-pass specialized loop) or FALLS-OFF (cascade of
+`__::linq\`helper\`` calls plus intermediate `array<...>` allocations).
+
+Each "FALLS-OFF" row names the bail line in `linq_fold.das` and proposes
+either a cheap user-side rewrite or an arm extension. The audit itself does NOT
+change any code: every finding was filed as a follow-up TODO, and the Status
+section above tracks which of them have since landed.
+
+---
+
+## Motivating example — closest 10 sounds
+
+The audit was prompted by this user scenario: "I have an array of sounds with
+`(id, position)` and a head position. Give me the ids of the 10 closest." The
+natural translation looks like:
+
+```das
+let closest_ids <- _fold(each(sounds)
+    ._order_by(distance(_.position, head))
+    .take(10)
+    ._select(_.id)
+    .to_array())
+```
+
+At the audited commit, that chain SILENTLY FALLS OFF the splice. The arm that
+should fire is `plan_order_family` (linq_fold.das:1234), which emits a
+bounded-heap walk holding at most N elements, but its accept list is
+`[where_*] order_* [take(N)|first|first_or_default]?`, and the `._select(_.id)`
+between `take` and `to_array` is not in that list. The chain falls through
+line 1284's `else { return null }` and into the default 3-pass cascade.
+
+**With `_select` (the natural form)** — `/tmp/audit_probes/motivating_with_select.das`:
+
+```das
+return <- invoke($(var source : iterator<Sound&>) : array<int> {
+    var pass_0 <- __::linq`order_by_to_array(source, $(_) { return distance(_.position, head); });
+    __::linq`take_inplace(pass_0, 10);
+    var pass_2 <- __::linq`select(pass_0, $(_) { return _.id; });
+    __::builtin`finalize(pass_0);
+    return <- pass_2;
+}, __::builtin`each(sounds))
+```
+
+Default cascade: full sort over N elements + allocation, then truncate, then
+another allocation for the projection.
+
+**Without `_select` (the splice-eligible form)** — `/tmp/audit_probes/motivating_without_select.das`:
+
+```das
+return <- invoke($(source : array<Sound>) : array<Sound> {
+    var order_buf : array<Sound>;
+    for (it in source)
+        if (length(order_buf) < 10)
+            push_clone(order_buf, it)
+            spliced_push_heap(order_buf, $(v1, v2) => less(distance(v1.position, head), distance(v2.position, head)))
+        elif (less(distance(it.position, head), distance(order_buf[0].position, head)))
+            spliced_pop_heap(order_buf, ...)
+            order_buf[length(order_buf) - 1] = it
+            spliced_push_heap(order_buf, ...)
+    order_inplace(order_buf, ...)
+    return <- order_buf
+}, sounds)
+```
+
+Single-pass bounded heap holding ≤10 elements. No N-sized allocation.
+
+The rest of this document is the systematic version of that comparison
+across every `plan_*` arm in the splice machinery.
+
+---
+
+## Chain 1 — `plan_order_family` / `plan_decs_order_family`
+
+**Accepts**: `[where_*] order_* [take(N)|first|first_or_default]?` (linq_fold.das:1234 array, :4547 decs)
+**Common bails**: select-anywhere (line 1284 / 4594 fall-through), where-after-order (line 1252 / 4566), reverse-in-chain (line 1284 / 4594 fall-through), explicit comparator on bare `order` / `order_descending` (line 1264 / 4575)
+
+### 1a — Closest 10 sounds, return ids (array)
+
+**Probe** (`/tmp/audit_probes/chain1_1a.das`):
+```das
+def probe_1a(sounds : array<Sound>; head : float3) : array<int> {
+    unsafe {
+        return <- _fold(each(sounds)._order_by(distance(_.position, head)).take(10)._select(_.id).to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<Sound&>) : array<int> {
+    var pass_0 <- __::linq`order_by_to_array(source, $(_) { return distance(_.position, head); });
+    __::linq`take_inplace(pass_0, 10);
+    var pass_2 <- __::linq`select(pass_0, $(_) { return _.id; });
+    __::builtin`finalize(pass_0);
+    return <- pass_2;
+}, __::builtin`each(sounds))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `_select(_.id)` between `take(10)` and `to_array()` is not in `plan_order_family`'s allowed call list (linq_fold.das:1284 fall-through). The cascade fully sorts all N source elements into `pass_0`, then truncates, then allocates a fresh `array<int>` for the projection. User rewrite: split into two steps, `let top <- _fold(...take(10).to_array()); let ids <- [for (s in top); s.id]`. Extending the arm to accept a terminal `_select` after `take(N)` / `first` / `first_or_default` is the highest-impact fix: the bounded heap already holds at most K elements (K being the `take` count), so a final projection at emission time is essentially free.
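+
+A minimal sketch of that two-step rewrite, written in the probe's own style. It assumes the same `Sound` struct and `require` lines as `chain1_1a.das` (neither is shown in this audit), and `probe_1a_rewritten` is a hypothetical name. Step 1 is the exact chain of baseline 1d, so it should take the bounded-heap arm; the projection then runs outside the fold over at most 10 elements. Per the Status section, this workaround only matters before `59c4f3f98`, which made the terminal `_select` splice.
+
+```das
+def probe_1a_rewritten(sounds : array<Sound>; head : float3) : array<int> {
+    unsafe {
+        // Step 1: same chain as probe 1d (no _select), so the bounded-heap splice can fire.
+        let top <- _fold(each(sounds)._order_by(distance(_.position, head)).take(10).to_array())
+        // Step 2: project the survivors (at most 10) outside the fold.
+        var ids <- [for (s in top); s.id]
+        return <- ids
+    }
+}
+```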
+
+### 1b — Top 5 scores descending via order + reverse (array)
+
+**Probe** (`/tmp/audit_probes/chain1_1b.das`):
+```das
+def probe_1b(scores : array<Score>) : array<Score> {
+    unsafe {
+        return <- _fold(each(scores)._order_by(_.score).reverse().take(5).to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<Score&>) : array<Score> {
+    var pass_0 <- __::linq`order_by_to_array(source, $(_) { return _.score; });
+    __::linq`reverse_inplace(pass_0);
+    __::linq`take_inplace(pass_0, 5);
+    return <- pass_0;
+}, __::builtin`each(scores))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `reverse()` between `_order_by` and `take` is not in the accepted vocabulary (line 1284 fall-through). A natural way to ask for "top 5 descending" via the ascending key is `_order_by(...).reverse().take(5)`; the splice path requires the user to know to write `_order_by_descending(...).take(5)` instead. The arm could recognize `_order_by(k).reverse()` and rewrite it to the `_order_by_descending(k)` form before the comparator-emission step.
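+
+A sketch of the splice-eligible phrasing for 1b, assuming the same `Score` struct and `require` lines as `chain1_1b.das`; `probe_1b_rewritten` is a hypothetical name. The chain matches the `order_* [take(N)]` accept list stated for Chain 1. One caveat this audit does not measure: if the underlying sort is stable, elements with equal `score` may come out in a different relative order than with `_order_by(...).reverse()`.
+
+```das
+def probe_1b_rewritten(scores : array<Score>) : array<Score> {
+    unsafe {
+        // Ask for descending key order directly; no reverse() in the chain.
+        return <- _fold(each(scores)._order_by_descending(_.score).take(5).to_array())
+    }
+}
+```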
+
+### 1c — Where-after-order (array)
+
+**Probe** (`/tmp/audit_probes/chain1_1c.das`):
+```das
+def probe_1c(employees : array<Employee>) : array<Employee> {
+    unsafe {
+        return <- _fold(each(employees)._order_by(_.salary)._where(_.dept == "eng").take(10).to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<Employee&>) : array<Employee> {
+    var pass_0 <- __::linq`order_by_to_array(source, $(_) { return _.salary; });
+    var pass_1 <- __::linq`where_(pass_0, $(_) { return _.dept == "eng"; });
+    __::builtin`finalize(pass_0);
+    __::linq`take_inplace(pass_1, 10);
+    return <- pass_1;
+}, __::builtin`each(employees))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `_where` after `_order_by` is explicitly rejected by `if (hasOrder) return null` at line 1252. Sorting first and then filtering is genuinely wasteful (it sorts all N elements only to drop most of them), and the rewrite is mechanical: move the `_where` before the `_order_by`. This is worth a lint suggestion rather than a splice extension. Honoring a post-sort `_where` inside the splice would require either re-running the filter inside the bounded heap or re-allocating after the sort, and neither buys anything over the trivial user rewrite.
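+
+The mechanical rewrite can be sketched as follows, assuming the same `Employee` struct and `require` lines as `chain1_1c.das`; `probe_1c_rewritten` is a hypothetical name. Filtering commutes with sorting except possibly for the relative order of equal-salary ties, so the result should match the original chain while fitting the `[where_*] order_* [take(N)]` accept list.
+
+```das
+def probe_1c_rewritten(employees : array<Employee>) : array<Employee> {
+    unsafe {
+        // Filter first, then sort: only matching rows ever reach the order arm.
+        return <- _fold(each(employees)._where(_.dept == "eng")._order_by(_.salary).take(10).to_array())
+    }
+}
+```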
+
+### 1d — Closest 10 sounds, baseline (array)
+
+**Probe** (`/tmp/audit_probes/chain1_1d.das`):
+```das
+def probe_1d(sounds : array<Sound>; head : float3) : array<Sound> {
+    unsafe {
+        return <- _fold(each(sounds)._order_by(distance(_.position, head)).take(10).to_array())
+    }
+}
+```
+
+**Generated** (trimmed):
+```das
+return <- invoke($(source : array<Sound>) : array<Sound> {
+    var order_buf : array<Sound>;
+    for (it in source)
+        if (length(order_buf) < 10)
+            push_clone(order_buf, it)
+            spliced_push_heap(order_buf, ...)
+        elif (less(distance(it.position, head), distance(order_buf[0].position, head)))
+            spliced_pop_heap(order_buf, ...)
+            order_buf[length(order_buf) - 1] = it
+            spliced_push_heap(order_buf, ...)
+    order_inplace(order_buf, ...)
+    return <- order_buf;
+}, sounds)
+```
+
+**Classification**: SPLICE-FIRES — bounded-heap arm (line 1300 `useBoundedHeap`).
+
+**Conclusion**: Baseline confirms the bounded-heap splice path — O(N log K) heap maintenance over the walk, no full-N sort, no full-N allocation.
+
+### 1e — Closest 10 sounds, return ids (decs)
+
+**Probe** (`/tmp/audit_probes/chain1_1e.das`): same shape as 1a over `from_decs_template(type<DecsSound>)`.
+
+**Generated** (trimmed):
+```das
+return <- invoke($(var source : iterator<tuple<id:int; position:float3>>) : array<int> {
+    var pass_0 <- __::linq`order_by_to_array(source, ...);
+    __::linq`take_inplace(pass_0, 10);
+    var pass_2 <- __::linq`select(pass_0, ...);
+    __::builtin`finalize(pass_0);
+    return <- pass_2;
+}, invoke($() : iterator<tuple<...>> {
+    var res : array<tuple<...>>;
+    for_each_archetype(..., $(arch) {
+        for (ds_id, ds_position in get_ro(arch, "ds_id", type<int>), get_ro(arch, "ds_position", type<float3>))
+            push(res, tuple(ds_id, ds_position));
+    });
+    return <- __::linq`to_sequence(res);
+}))
+```
+
+**Classification**: FALLS-OFF — default cascade, with the **double penalty** that the decs source bridge eagerly materializes ALL rows into `res` before the array cascade even starts.
+
+**Conclusion**: Same trailing `_select` mismatch as 1a (line 4594 fall-through in `plan_decs_order_family`). Worse than 1a because when `plan_decs_order_family` returns null, the `from_decs_template` bridge has no other splice to bind it to — it degenerates to full materialization of every archetype row into a temp `res`, wrapped in `to_sequence` for the array cascade. Two allocations of the full row set + one projection. Extension fix: same as 1a — accept a terminal `_select` after `take(N)` / `first` / `first_or_default` in `plan_decs_order_family`.
+
+### 1f — Closest 10 sounds, baseline (decs)
+
+**Probe** (`/tmp/audit_probes/chain1_1f.das`): same shape as 1d over decs.
+
+**Generated** (trimmed):
+```das
+return <- invoke($() : array<tuple<id:int; position:float3>> {
+    var decs_buf : array<tuple<...>>;
+    for_each_archetype(..., $(arch) {
+        for (ds_id, ds_position in get_ro(...), get_ro(...))
+            if (length(decs_buf) < 10)
+                push_clone(decs_buf, tuple(ds_id, ds_position))
+                spliced_push_heap(decs_buf, ...)
+            elif (less(distance(ds_position, head), distance(decs_buf[0].position, head)))
+                spliced_pop_heap(decs_buf, ...)
+                decs_buf[length(decs_buf) - 1] = tuple(...)
+                spliced_push_heap(decs_buf, ...)
+    });
+    order_inplace(decs_buf, ...);
+    return <- decs_buf;
+})
+```
+
+**Classification**: SPLICE-FIRES — bounded-heap arm fused into a single `for_each_archetype` (line 4609 `useBoundedHeap`).
+
+**Conclusion**: Decs baseline confirms the bounded-heap path through the archetype walk: ≤10 push_clones to `decs_buf`, no eager materialization.
+
+### Chain 1 — follow-up TODOs
+
+- **Highest impact**: extend `plan_order_family` (line 1234) and `plan_decs_order_family` (line 4547) to accept a terminal `_select` after `take(N)` / `first` / `first_or_default`. The bounded-heap arm already holds at most N elements; emitting the projection during the final `order_inplace` walk is essentially free. The "closest N, return projected field" idiom is extremely natural.
+- Recognize `_order_by(k).reverse()` and rewrite to `_order_by_descending(k)` (and dual) before the comparator-emission step. Currently `reverse()` mid-chain is a hard bail (line 1284 / 4594).
+- Lint suggestion (style rule, not a splice extension): `_order_by(...)._where(...)` should reorder to `_where(...)._order_by(...)`. Sorting before filtering is wasteful in any execution mode.
+- Decs FALLS-OFF cases are doubly penalized — the bridge's eager-materialize default lands behind every `plan_decs_order_family` bail. Worth a dedicated diagnostic when this path is hit.
+
+---
+
+## Chain 2 — `plan_reverse` / `plan_decs_reverse`
+
+**Accepts**: `[where_*][select?] reverse [take(N)]? [count|first|first_or_default]?` (linq_fold.das:1764 array, :4802 decs)
+**Common bails**: `where_` / `select` AFTER reverse or AFTER select (line 1797-1800 / 4833-4838 — `seenSelect || hasReverse` guards), order-anywhere (line 1813 / 4851 fall-through), double-reverse (line 1804 / 4843), `take(N)` paired with a separate terminator (line 1816 / 4854 bail).
+
+Note: `_where → _select → reverse → take → to_array` (select BEFORE reverse, no further select/where after) is ACCEPTED — the guards prevent `where AFTER select` and `select AFTER select`, not `select BEFORE reverse`.
+
+### 2a — Reverse + distinct_by (array)
+
+**Probe** (`/tmp/audit_probes/chain2_2a.das`):
+```das
+def probe_2a(events : array<Event>) : array<Event> {
+    unsafe {
+        return <- _fold(each(events).reverse()._distinct_by(_.kind).to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<Event&>) : array<Event> {
+    var pass_0 <- __::linq`reverse_to_array(source);
+    __::linq`distinct_by_inplace(pass_0, $(_) { return _.kind; });
+    return <- pass_0;
+}, __::builtin`each(events))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `distinct_by` after `reverse` is not in `plan_reverse`'s vocabulary (line 1813 fall-through), and the call appears before any recognized terminator, so `plan_reverse` cannot peel one. `plan_distinct` in turn does not model `reverse` in its prelude (line 1999 fall-through). Result: a full `reverse_to_array` allocation followed by `distinct_by_inplace`. There is no behavior-preserving cheap rewrite: `_distinct_by` is order-stable, so the tempting `_distinct_by(_.kind).reverse()` is a behavior change (a different element survives per kind). A real splice extension would need `plan_reverse` to recognize `reverse + distinct_by` and emit a single-pass walk that retains the LAST element per key, then reverses.
+
+### 2b — Order then reverse (array)
+
+**Probe** (`/tmp/audit_probes/chain2_2b.das`):
+```das
+def probe_2b(events : array<Event>) : array<Event> {
+    unsafe {
+        return <- _fold(each(events)._order_by(_.ts).reverse().to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<Event&>) : array<Event> {
+    var pass_0 <- __::linq`order_by_to_array(source, $(_) { return _.ts; });
+    __::linq`reverse_inplace(pass_0);
+    return <- pass_0;
+}, __::builtin`each(events))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: Symmetric to 1b. `plan_reverse` bails at `_order_by` (line 1813 fall-through); `plan_order_family` bails at `reverse()` (line 1284 fall-through). Neither splice fires. User rewrite: `_order_by_descending(_.ts).to_array()`. Same TODO as chain 1: recognize `_order_by(k).reverse()` and normalize to `_order_by_descending(k)` before either planner gets it.
+
+### 2c — Select-after-reverse (array)
+
+**Probe** (`/tmp/audit_probes/chain2_2c.das`):
+```das
+def probe_2c(users : array<User>) : array<string> {
+    unsafe {
+        return <- _fold(each(users)._where(_.active).reverse()._select(_.name).take(5).to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<User&>) : array<string> {
+    var pass_0 <- __::linq`where_to_array(source, $(_) { return _.active; });
+    __::linq`reverse_inplace(pass_0);
+    var pass_2 <- __::linq`select(pass_0, $(_) { return _.name; });
+    __::builtin`finalize(pass_0);
+    __::linq`take_inplace(pass_2, 5);
+    return <- pass_2;
+}, __::builtin`each(users))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `_select` AFTER `reverse` trips the `hasReverse` half of the guard at line 1800 (`if (hasReverse || seenSelect) return null` inside the select arm). User rewrite to fire the splice is non-obvious — move the select before reverse: `_where(_.active)._select(_.name).reverse().take(5).to_array()` (which DOES splice, since select BEFORE reverse is accepted). Extension fix: in `plan_reverse`, when a terminal `_select(...)` follows `reverse` (or `reverse + take(N)`), treat it as a projection applied on the buffer return — same shape as the chain 1 terminal-projection extension.
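+
+Both reorderings mentioned in this document can be sketched side by side. The snippet assumes the same `User` struct and `require` lines as `chain2_2c.das`, and both function names are hypothetical. Because `_select` is applied element by element, moving it across `reverse` and `take` should not change the result; what changes is how often the projection runs (once per filtered element in the first form, at most 5 times in the second).
+
+```das
+// Rewrite named in the conclusion above: project before reversing.
+def probe_2c_select_first(users : array<User>) : array<string> {
+    unsafe {
+        return <- _fold(each(users)._where(_.active)._select(_.name).reverse().take(5).to_array())
+    }
+}
+
+// Rewrite named in the Status section (accepted after 59c4f3f98): project last.
+def probe_2c_select_last(users : array<User>) : array<string> {
+    unsafe {
+        return <- _fold(each(users)._where(_.active).reverse().take(5)._select(_.name).to_array())
+    }
+}
+```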
+
+### 2d — Where + reverse + take (array baseline)
+
+**Probe** (`/tmp/audit_probes/chain2_2d.das`):
+```das
+def probe_2d(events : array<Event>) : array<Event> {
+    unsafe {
+        return <- _fold(each(events)._where(_.active).reverse().take(5).to_array())
+    }
+}
+```
+
+**Generated** (trimmed):
+```das
+return <- invoke($(source : array<Event>) : array<Event> {
+    var buf : array<Event>;
+    for (it in source)
+        if (it.active)
+            push_clone(buf, it);
+    reverse_inplace(buf);
+    resize(buf, length(buf) > 5 ? 5 : length(buf));
+    return <- buf;
+}, events)
+```
+
+**Classification**: SPLICE-FIRES — R5 buffer + reverse_inplace + resize arm.
+
+**Conclusion**: Baseline confirms the standard plan_reverse buffer arm — one filtered push pass, one in-place reverse, one resize to take(N).
+
+### 2e — Select-after-reverse (decs)
+
+**Probe** (`/tmp/audit_probes/chain2_2e.das`): same shape as 2c over decs.
+
+**Generated** (trimmed): identical structure to 2c, plus eager decs bridge materializing `res` first.
+
+**Classification**: FALLS-OFF — default cascade, doubled by eager decs materialization.
+
+**Conclusion**: Same root cause as 2c (line 4838 in `plan_decs_reverse`). Three allocations: `res` (all N rows), `pass_0` (where-filtered), `pass_2` (selected). Same extension fix as 2c.
+
+### 2f — Where + reverse + take (decs baseline)
+
+**Probe** (`/tmp/audit_probes/chain2_2f.das`): same shape as 2d over decs.
+
+**Generated** (trimmed):
+```das
+return <- invoke($() : array<tuple<...>> {
+    var decs_buf : array<tuple<...>>;
+    for_each_archetype(..., $(arch) {
+        for (de_id, de_kind, de_ts, de_active in get_ro(...), get_ro(...), get_ro(...), get_ro(...))
+            if (de_active)
+                push_clone(decs_buf, tuple(de_id, de_kind, de_ts, de_active));
+    });
+    reverse_inplace(decs_buf);
+    resize(decs_buf, 5 < length(decs_buf) ? 5 : length(decs_buf));
+    return <- decs_buf;
+})
+```
+
+**Classification**: SPLICE-FIRES — plan_decs_reverse buffer + reverse_inplace + resize.
+
+**Conclusion**: Decs-side mirror of 2d.
+
+### Chain 2 — follow-up TODOs
+
+- **Highest impact**: accept a terminal `_select(...)` after `reverse` (and after `reverse + take(N)`) in BOTH `plan_reverse` (line 1764) and `plan_decs_reverse` (line 4802). The current bail catches the natural "filter-then-reverse-for-newest-first-then-project" idiom.
+- Recognize `reverse + distinct_by` as a fused walk retaining the LAST element per key, single archetype walk for decs and single source walk + buffer for arrays.
+- Recognize `_order_by(k).reverse()` and rewrite to `_order_by_descending(k)` (chain 1 also wants this).
+- Decs FALLS-OFF cases hit the same eager-materialize double penalty as chain 1.
+
+---
+
+## Chain 3 — `plan_distinct` / `plan_decs_distinct`
+
+**Accepts**: `[where_*][select?] (distinct|distinct_by) [take(N)]? [count|long_count|sum|to_array]?` (linq_fold.das:1945 array, :5049 decs)
+**Common bails**: `where_` AFTER `select` or `distinct` (line 1979 / 5085), second `select` (line 1982 / 5090, seenSelect), order-anywhere (line 1998 / 5108 fall-through), reverse-anywhere (same), 2-arg `count` / `long_count` / `sum` with predicate (line 1953-1957 / 5057-5063 — terminator-peel only fires for 1-arg form), distinct-after-distinct (line 1986 / 5095), `take(N)` paired with non-implicit terminator (line 2002 / 5112).
+
+Note: `_select(_.field) → _where(...) → distinct → count` puts `_where` AFTER `_select`, which DOES hit the seenSelect bail (line 1979). `_where → _select → distinct → count` is ACCEPTED.
+
+### 3a — Select then where then distinct then count (array)
+
+**Probe** (`/tmp/audit_probes/chain3_3a.das`):
+```das
+def probe_3a(events : array<EventA>) : int {
+    unsafe {
+        return _fold(each(events)._select(_.email)._where(length(_) > 0).distinct().count())
+    }
+}
+```
+
+**Generated**:
+```das
+return invoke($(var source : iterator<EventA&>) : int {
+    var pass_0 <- __::linq`select_to_array(source, $(_) { return _.email; });
+    var pass_1 <- __::linq`where_(pass_0, $(_) { return length(_) > 0; });
+    __::builtin`finalize(pass_0);
+    __::linq`distinct_inplace(pass_1);
+    var pass_3 = __::linq`count(pass_1);
+    __::builtin`finalize(pass_1);
+    return <- pass_3;
+}, __::builtin`each(events))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `_where` AFTER `_select` trips the seenSelect bail at line 1979. The result is a 4-pass cascade for what could be a single-walk streaming dedup (with no buffer at all, because the terminator is `count`). `_where → _select → distinct → count` is splice-eligible; the user only has to swap the where and the select, rewriting the predicate against the source element (`length(_.email) > 0` instead of `length(_) > 0`).
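+
+A sketch of the swapped chain, assuming the same `EventA` struct and `require` lines as `chain3_3a.das`; `probe_3a_rewritten` is a hypothetical name. The only change besides the order is that the predicate, which used to see the projected string, now has to read the field off the source element.
+
+```das
+def probe_3a_rewritten(events : array<EventA>) : int {
+    unsafe {
+        // _where now precedes _select, so the predicate reads _.email from the source element.
+        return _fold(each(events)._where(length(_.email) > 0)._select(_.email).distinct().count())
+    }
+}
+```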
+
+### 3b — Distinct then order (array)
+
+**Probe** (`/tmp/audit_probes/chain3_3b.das`):
+```das
+def probe_3b(rows : array<Row>) : array<Row> {
+    unsafe {
+        return <- _fold(each(rows)._distinct_by(_.user_id)._order_by(_.ts).to_array())
+    }
+}
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<Row&>) : array<Row> {
+    var pass_0 <- __::linq`distinct_by_to_array(source, $(_) { return _.user_id; });
+    __::linq`order_by_inplace(pass_0, $(_) { return _.ts; });
+    return <- pass_0;
+}, __::builtin`each(rows))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `_order_by` after `_distinct_by` is unrecognized in `plan_distinct` (line 1998 fall-through), and `plan_order_family` doesn't model `_distinct_by` as an upstream call (line 1284). The cascade materializes the distinct-by result, then sorts it in place. The two ops don't commute (distinct-then-sort ≠ sort-then-distinct), so there is no obvious user rewrite. Extension fix: `plan_order_family` could recognize an upstream `_distinct_by(keyFn)` and emit a fused walk that hash-tracks seen keys while feeding survivors into the sort buffer (or into the bounded heap when a `take(N)` follows).
+
+### 3c — Distinct then predicated count (array)
+
+**Probe** (`/tmp/audit_probes/chain3_3c.das`):
+```das
+def probe_3c(events : array<EventR>) : int {
+    unsafe {
+        return _fold(each(events)._distinct_by(_.region)._count(_.active))
+    }
+}
+```
+
+**Generated**:
+```das
+return invoke($(var source : iterator<EventR&>) : int {
+    var pass_0 <- __::linq`distinct_by_to_array(source, $(_) { return _.region; });
+    var pass_1 = __::linq`count(pass_0, $(_) { return _.active; });
+    __::builtin`finalize(pass_0);
+    return <- pass_1;
+}, __::builtin`each(events))
+```
+
+**Classification**: FALLS-OFF — default cascade (tier-2 helpers).
+
+**Conclusion**: `_count(predicate)` is the 2-arg form, and the terminator-peel at line 1953 requires `length(calls.back()._0.arguments) == 1` — bails BY DESIGN since the 1-arg splice template would silently drop the predicate (emit `length(seen)` instead of counting predicate-true survivors). Extension fix: extend plan_distinct's terminator branch to recognize 2-arg `_count(p)` and `_long_count(p)` and emit `if (p(it)) cnt++` at the fresh-key site.
+
+### 3d — Where + distinct_by + count (array baseline)
+
+**Probe** (`/tmp/audit_probes/chain3_3d.das`):
+```das
+def probe_3d(events : array<EventD>) : int {
+    return _fold(each(events)._where(_.recent)._distinct_by(_.user_id).count())
+}
+```
+
+**Generated** (trimmed):
+```das
+return invoke($(source : array<EventD>) : int {
+    var inscope seen : table<int; void>;
+    for (it in source)
+        if (it.recent) {
+            let k = unique_key(it.user_id);
+            if (!key_exists(seen, k))
+                insert(seen, k);
+        }
+    return length(seen);
+}, events)
+```
+
+**Classification**: SPLICE-FIRES — buffer-free count arm (terminator is `length(seen)`).
+
+**Conclusion**: Streaming-dedup arm — single source walk, hashed dedup, return length. No element buffer (line 2014-2016, `needBuffer = false` when terminator is count).
+
+### 3e — Select then where then distinct then count (decs)
+
+**Probe** (`/tmp/audit_probes/chain3_3e.das`): same shape as 3a over decs.
+
+**Classification**: FALLS-OFF — default cascade, doubled by eager decs materialization.
+
+**Conclusion**: plan_decs_distinct bails at line 5085. Three full-N allocations (`res` + `pass_0` + `pass_1`) to compute a scalar count. Same user rewrite (swap where and select) and same extension fix as 3a.
+
+### 3f — Where + distinct_by + count (decs baseline)
+
+**Probe** (`/tmp/audit_probes/chain3_3f.das`): same shape as 3d over decs.
+
+**Generated** (trimmed):
+```das
+return invoke($() : int {
+    var inscope decs_seen : table<int; void>;
+    for_each_archetype(..., $(arch) {
+        for (ded_user_id, ded_recent in get_ro(...), get_ro(...))
+            if (ded_recent) {
+                let decs_k = unique_key(ded_user_id);
+                if (!key_exists(decs_seen, decs_k))
+                    insert(decs_seen, decs_k);
+            }
+    });
+    return length(decs_seen);
+})
+```
+
+**Classification**: SPLICE-FIRES — plan_decs_distinct streaming-dedup arm with hoisted `decs_seen` table.
+
+**Conclusion**: Hoisted seen-table spans archetypes, single walk with where + key insert, return length.
+
+### Chain 3 — follow-up TODOs
+
+- **Highest impact**: extend the 1-arg terminator peel in plan_distinct (line 1953) and plan_decs_distinct (line 5057) to accept 2-arg `_count(p)` / `_long_count(p)`. Emit predicate as a gate at the fresh-key site.
+- Document (possibly as a STYLE lint) that `_select → _where → distinct → terminator` should be rewritten to `_where → _select → distinct → terminator` — the pre-select form is splice-eligible.
+- Niche: recognize `_distinct_by(keyFn) + _order_by(otherKey) + to_array` and emit a fused hash-track + bounded sort walk.
+- Decs FALLS-OFF inherits the eager-materialize double penalty from the bridge.
+
+---
+
+## Chain 4 — `plan_group_by` / `plan_decs_group_by`
+
+**Accepts**: `[where_*][select*] group_by_lazy(key) [having_]? select(reducer) [count]?` (linq_fold.das:3030 array, :4500 decs, shared core :2729)
+**Common bails**: missing terminal select (line 3046 / 4516), missing group_by_lazy (line 3056 / 4526), unrecognized reducer specs (line 2808), bare reducer + hidden HAVING slots (line 2821).
+
+### 4a — Inventory: sum price per category, keep categories totaling >1000
+
+**Probe** (`/tmp/audit_probes/chain4_4a.das`):
+```das
+return <- _fold(items
+    ._group_by(_.category)
+    ._select((C = _._0, T = _._1 |> select(@(i : Item) => i.price) |> sum))
+    ._where(_.T > 1000))
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`group_by_lazy(source, $(_) { return _.category; });
+var pass_1 <- __::linq`select(pass_0, $(_) {
+    return tuple(_._0, __::linq`sum(__::linq`select(_._1, <lambda i.price>)));
+});
+finalize(pass_0);
+var pass_2 <- __::linq`where_(pass_1, $(_) { return _.T > 1000; });
+finalize(pass_1);
+return <- pass_2;
+```
+
+**Classification**: FALLS-OFF (default cascade — three eager array allocations).
+
+**Conclusion**: A `_where` AFTER `_select(reducer)` lives outside the group_by recognizer (linq_fold.das:3046 requires `select` to be the immediate tail, optionally with one `having_` between it and `group_by_lazy`). The user's "post-aggregate HAVING" is exactly what the optional `having_` slot is for: rewrite to `._group_by(_.category)._having(_._1 |> select(@(i:Item)=>i.price) |> sum > 1000)._select(...)` and the splice fires. Arm extension: peel a single trailing `_where` and translate it to a `having_` slot when its predicate references only post-projection field names.
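+
+Spelled out against the 4a probe, the `_having` rewrite could look like the fragment below. It reuses the probe's own `_select` body and assumes the same `items` source and `Item` struct as `chain4_4a.das`. The extra parentheses around the sum are a defensive assumption about how `|>` and `>` bind, not something this audit verifies. Per the Status section, the trailing `_where` form was later accepted under Theme 2, so this rewrite applies to the audited commit.
+
+```das
+// Pre-select HAVING: the predicate sees the raw (key, bucket) pair, not the projected tuple.
+return <- _fold(items
+    ._group_by(_.category)
+    ._having((_._1 |> select(@(i : Item) => i.price) |> sum) > 1000)
+    ._select((C = _._0, T = _._1 |> select(@(i : Item) => i.price) |> sum)))
+```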
+
+### 4b — Brands sorted by count
+
+**Probe** (`/tmp/audit_probes/chain4_4b.das`):
+```das
+return <- _fold(items
+    ._group_by(_.brand)
+    ._select((B = _._0, C = _._1 |> length))
+    ._order_by(_.C))
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`group_by_lazy(source, $(_) { return _.brand; });
+var pass_1 <- __::linq`select(pass_0, $(_) { return tuple(_._0, length(_._1)); });
+finalize(pass_0);
+__::linq`order_by_inplace(pass_1, $(_) { return _.C; });
+return <- pass_1;
+```
+
+**Classification**: FALLS-OFF (default cascade).
+
+**Conclusion**: Any post-`select(reducer)` op forces the recognizer to bail (line 3046). `plan_group_by_core` finishes first; `_order_by` then operates on the bucket-array shape via tier-2. No clean rewrite — this is a genuine two-stage pipeline. Arm extension: after `plan_group_by_core` emits its table, peel a trailing `_order_by` / `_reverse` / `take` cascade on the bucket output as a buffer-required post-pass.
+
+### 4c — Distinct names per brand
+
+**Probe** (`/tmp/audit_probes/chain4_4c.das`):
+```das
+return <- _fold(items
+    ._group_by(_.brand)
+    ._select((B = _._0, Names = _._1 |> select(@(i : Item) => i.name) |> distinct)))
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`group_by_lazy(source, $(_) { return _.brand; });
+var pass_1 <- __::linq`select(pass_0, $(_) {
+    return tuple(_._0, __::linq`distinct(__::linq`select(_._1, <lambda i.name>)));
+});
+finalize(pass_0);
+return <- pass_1;
+```
+
+**Classification**: FALLS-OFF (default cascade).
+
+**Conclusion**: `recognize_reducer_specs` (line 2807) only knows count / length / long_count / sum / min / max / first / average + their `select(...) |> reducer` variants. `distinct` is not a recognized reducer spec — `specs` comes back empty and we bail at line 2808. Extension: accept `distinct[_by]` / `reverse` / `to_array` as reducer ends, accumulating to `array<T>` slot type (table-of-arrays accumulator pattern already exists).
+
+### 4d — Baseline: count per brand
+
+**Probe** (`/tmp/audit_probes/chain4_4d.das`):
+```das
+return <- _fold(items
+    ._group_by(_.brand)
+    ._select((B = _._0, C = _._1 |> length)))
+```
+
+**Generated** (trimmed):
+```das
+var inscope tab : table<string const; tuple<B; C>>
+var dummy : tuple<B; C>
+for (it in source) {
+    let k = it.brand
+    let uk = unique_key(k)
+    var entry & = tab?[uk] ?? dummy
+    if (addr(entry) == addr(dummy)) {
+        entry._1 = 1
+        dummy._0 = k; tab[uk] = dummy; dummy = default
+    } else {
+        ++entry._1
+    }
+}
+var buf; reserve(buf, length(tab))
+for (kv in values(tab)) { buf |> push_clone(kv) }
+return <- buf
+```
+
+**Classification**: SPLICE-FIRES (`plan_group_by_core` table-state arm — line 2853).
+
+**Conclusion**: Reference arm — table-of-accumulators + addr-compare first-key-wins state machine.
+
+### 4e — DECS variant of 4a (post-aggregate HAVING)
+
+Same shape as 4a over `from_decs_template`. Decs bridge unrolls (good) but bucket then materializes through the standard array cascade — worst of both worlds (for_each_archetype expansion + three array allocations + per-bucket reducer invoke).
+
+**Classification**: FALLS-OFF (line 4516 mirrors line 3046).
+
+**Conclusion**: `plan_group_by_core` is shared, so one extension covers both planners.
+
+### 4f — DECS baseline (count per brand)
+
+**Generated** (trimmed):
+```das
+var inscope decs_tab : table<string const; tuple<B; C>>
+var decs_dummy : tuple<B; C>
+for_each_archetype(<hash>, <erq>, $(arch) {
+    for (e_brand in get_ro(arch,"e_brand",type<string>))
+        let decs_k = e_brand
+        let decs_uk = unique_key(decs_k)
+        var decs_entry & = decs_tab?[decs_uk] ?? decs_dummy
+        if (addr(decs_entry) == addr(decs_dummy)) {
+            decs_entry._1 = 1
+            decs_dummy._0 = decs_k; decs_tab[decs_uk] = decs_dummy
+            decs_dummy = default
+        } else { ++decs_entry._1 }
+})
+... reserve(decs_buf); for (kv in values(decs_tab)); push_clone ...
+```
+
+**Classification**: SPLICE-FIRES (`plan_decs_group_by` → shared core with decs adapter).
+
+**Conclusion**: Decs adapter routes the table-accumulator through `for_each_archetype`. User MUST write `.to_array()` explicitly here.
+
+### Chain 4 — follow-up TODOs
+
+- **HAVING-shaped trailing `_where`**: post-aggregate filter after `_select(reducer)` is the natural shape for "GROUP BY ... HAVING SUM(x) > N". Peel one trailing `_where` and translate to synthetic `having_`.
+- **`_order_by` / `_reverse` / `take` on group buckets**: very common SQL shape. Add a post-pass to `plan_group_by_core` that inlines these into the buf-emit loop.
+- **`distinct[_by]` as a per-bucket reducer**: extend `recognize_reducer_specs` with array-shaped reducers; slot type becomes `array<T>`.
+
+---
+
+## Chain 5 — `plan_loop_or_count`
+
+**Accepts**: `[where_*][select*][skip?][skip_while?][take_while?][take?] [terminator]?` over 18 terminator names (count / long_count / sum / min / max / average / first / first_or_default / any / all / contains / element_at / element_at_or_default / last / last_or_default / single / single_or_default / aggregate). Source must be array-typed via `each(...)` (linq_fold.das:1563).
+**Common bails**: where-after-range (line 1603), select-after-range (line 1630), impure select before where (line 1607), duplicate range ops (lines 1647/1654/1662/1669), buffer-required op (line 1674 — order_by/distinct/group_by/reverse all bail here to their planners), unknown op (line 1678), identity ARRAY chain (line 1748).
+
+### 5a — Two `where_`s around a pure `_select`
+
+**Probe** (`/tmp/audit_probes/chain5_5a.das`):
+```das
+return _fold(each(items)._where(_.active)._select(_.score)._where(_ > 100).count())
+```
+
+**Generated**:
+```das
+return invoke($(source) {
+    var acc = 0;
+    for (it in source)
+        if (it.active && (it.score > 100)) ++acc
+    return acc;
+}, items)
+```
+
+**Classification**: SPLICE-FIRES (counter lane with merged predicate).
+
+**Conclusion**: where-after-select is HANDLED by the planner (linq_fold.das:1605-1620) — when the projection is pure, the second `where_` substitutes the projection into the predicate and merges with the first via `&&`. Zero allocation. Pure-select fast path does exactly what users expect.
+
+### 5b — Select after a range op
+
+**Probe** (`/tmp/audit_probes/chain5_5b.das`):
+```das
+return <- _fold(each(items)._select((x = _.score, y = _.active)).skip(5)._select(_.x).to_array())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`select_to_array(source, <projection lambda>);
+__::linq`skip_inplace(pass_0, 5);
+var pass_2 <- __::linq`select(pass_0, $(_) { return _.x; });
+finalize(pass_0);
+return <- pass_2;
+```
+
+**Classification**: FALLS-OFF (default cascade).
+
+**Conclusion**: linq_fold.das:1630 — the second `_select` arrives with `seenSkip == true` and the planner bails. A `_select` after any range op is structurally incompatible with the single-pass shape because the projection identity shifts mid-chain. User rewrite: collapse to a single projection. Arm extension would require multi-segment shape with per-segment binds — almost a new planner.
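+
+Because `_select` is element-wise, the same result can be written with a single projection placed before the range op, which matches the `[select*][skip?]` order listed under **Accepts**. A sketch of the rewritten probe, not re-run as part of this audit:
+
+```das
+// one projection, then the range op; replaces select((x, y)) -> skip(5) -> select(_.x)
+return <- _fold(each(items)._select(_.score).skip(5).to_array())
+```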
+
+### 5c — Where after take
+
+**Probe** (`/tmp/audit_probes/chain5_5c.das`):
+```das
+return _fold(each(items).take(100)._where(_.active).count())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`take_to_array(source, 100);
+var pass_1 <- __::linq`where_(pass_0, $(_) { return _.active; });
+finalize(pass_0);
+var pass_2 = __::linq`count(pass_1);
+finalize(pass_1);
+return <- pass_2;
+```
+
+**Classification**: FALLS-OFF (default cascade).
+
+**Conclusion**: linq_fold.das:1603 — `where_` arrives with `seenTake == true`. Semantically distinct from `where.take`: `take(100)._where(...)` = "first 100 elements, then keep active ones" (count ≤ 100); `_where(...).take(100)` = "first 100 active ones" (count exactly 100 if there are ≥100 active). No automatic rewrite is safe. Extension: counter lane with take-cap that ticks BEFORE the where filter.
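+
+A hand-written sketch of the extension described above, modeled on the 5e baseline emission. It is not planner output; the only change from 5e is that the cap check runs before the predicate, so `tc` counts source elements rather than matches.
+
+```das
+return invoke($(source) {
+    var tc = 0;
+    var acc = 0;
+    for (it in source)
+        if (tc >= 100) break    // cap ticks on every source element
+        ++tc
+        if (it.active) ++acc    // filter applies only inside the first 100
+    return acc;
+}, items)
+```
+
+With this ordering the result is at most 100, which matches the `take(100)._where(...)` reading given above.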
+
+### 5d — Aggregate terminator
+
+**Probe** (`/tmp/audit_probes/chain5_5d.das`):
+```das
+return _fold(each(items).aggregate(0, $(acc : int; x : Item) => acc + x.score))
+```
+
+**Generated**:
+```das
+return invoke($(source) {
+    var agg = 0;
+    for (it in source)
+        agg = agg + it.score
+    return agg;
+}, items)
+```
+
+**Classification**: SPLICE-FIRES (walk lane, Slice 5).
+
+**Conclusion**: `aggregate` is a recognized walk-lane terminator. Seed and reducer block are inlined; per-element body is just `agg = agg + body`. No invoke into `aggregate_impl`.
+
+### 5e — Baseline: where + select + take + sum
+
+**Probe** (`/tmp/audit_probes/chain5_5e.das`):
+```das
+return _fold(each(items)._where(_.active)._select(_.score).take(10).sum())
+```
+
+**Generated**:
+```das
+return invoke($(source) {
+    var tc = 0;
+    var acc = 0;
+    for (it in source)
+        if (it.active)
+            if (tc >= 10) break
+            else { ++tc; acc += it.score }
+    return acc;
+}, items)
+```
+
+**Classification**: SPLICE-FIRES (accumulator lane + take cap).
+
+**Conclusion**: Canonical happy path — where → select fuses, take adds a counter, sum is an accumulator. Reference arm.
+
+### Chain 5 — follow-up TODOs
+
+- **`where` after `take` / `take_while`**: not algebraically equivalent so can't auto-reorder, but the counter-lane shape could handle it manually.
+- **`select` after `skip` / `take` / `take_while` / `skip_while`**: requires per-segment bind handling. Probably better to document canonical order in `skills/linq.md` and lint-warn.
+- **Buffer-required op cascade** (order_by / distinct / group_by / reverse): the bail at line 1674 is the explicit handoff to per-family planners, not a fall-off.
+
+---
+
+## Chain 6 — `plan_decs_unroll`
+
+**Accepts**: same shape as `plan_loop_or_count` (count/long_count/sum/min/max/average/first/first_or_default/any/all/contains/element_at/element_at_or_default/last/last_or_default/single/single_or_default/aggregate/min_by/max_by + implicit-to_array) over `from_decs_template(...)` bridges. Delegates to plan_decs_order_family / plan_decs_reverse / plan_decs_distinct / plan_decs_group_by / plan_decs_join for buffer-required shapes.
+**Common bails**: source not a decs bridge (line 4455), no recognized terminator + no implicit to_array (line 4493), range extraction failed (line 4476), chain info failed (line 4478), sum/min/max/average over tuple element (line 4483), select before predicate-driven range (line 3568).
+
+### 6a — Select before predicate-driven range
+
+**Probe** (`/tmp/audit_probes/chain6_6a.das`):
+```das
+return _fold(from_decs_template(type<DecsItem>)._select(_.score)._skip_while(_ < 0).count())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`select_to_array(source, $(_) { return _.score; });
+var pass_1 <- __::linq`skip_while(pass_0, $(_) { return _ < 0; });
+finalize(pass_0);
+var pass_2 = __::linq`count(pass_1);
+finalize(pass_1);
+return <- pass_2;
+```
+
+**Classification**: FALLS-OFF (default cascade for linq chain; bridge IS unrolled but doesn't connect to the rest).
+
+**Conclusion**: linq_fold.das:3568 — when suffix contains `skip_while` / `take_while`, prefix must be select-free (predicates use source tuple, not projected scalar). User rewrite: drop the `_select`, move comparison into `_skip_while`: `._skip_while(_.score < 0).count()`. Make the rule explicit in `skills/linq.md`. Arm extension would require predicate rewriting through projection.
+
+### 6b — Aggregate over decs
+
+**Probe** (`/tmp/audit_probes/chain6_6b.das`):
+```das
+return _fold(from_decs_template(type<DecsItem>).aggregate(0, $(acc : int; x : auto) => acc + x.score))
+```
+
+**Generated**:
+```das
+return invoke($() {
+    var decs_agg = 0;
+    for_each_archetype(<hash>, <erq>, $(arch) {
+        for (e_score in get_ro(arch, "e_score", type<int>))
+            decs_agg = decs_agg + e_score
+    });
+    return decs_agg;
+})
+```
+
+**Classification**: SPLICE-FIRES (`emit_decs_walk_lane`, Slice 5f).
+
+**Conclusion**: Aggregate is in the `isWalk` set. Bridge fuses into accumulator loop — pruner trimmed to ONLY `e_score` reads (no `e_id`/`e_active`). Best-in-class shape.
+
+### 6c — min_by with where
+
+**Probe** (`/tmp/audit_probes/chain6_6c.das`):
+```das
+let r = _fold(from_decs_template(type<DecsItem>)._where(_.active)._min_by(_.score))
+```
+
+**Generated**:
+```das
+return invoke($() {
+    var decs_first = true;
+    var decs_bkey : int;
+    var decs_belem : tuple<id; score; active>;
+    for_each_archetype(..., $(arch) {
+        for (e_id, e_score, e_active in get_ro(...), get_ro(...), get_ro(...))
+            if (e_active)
+                let decs_key = e_score
+                if (decs_first)
+                    decs_bkey = decs_key
+                    decs_belem = tuple(e_id, e_score, e_active)
+                    decs_first = false
+                elif (decs_key < decs_bkey)
+                    decs_bkey = decs_key
+                    decs_belem = tuple(e_id, e_score, e_active)
+    });
+    return decs_belem;
+})
+```
+
+**Classification**: SPLICE-FIRES (`emit_decs_min_max_by`, streaming single-best state).
+
+**Conclusion**: Canonical streaming-min shape on decs. All three columns read since `min_by` returns the full element.
+
+### 6d — element_at with where
+
+**Probe** (`/tmp/audit_probes/chain6_6d.das`):
+```das
+let r = _fold(from_decs_template(type<DecsItem>)._where(_.active).element_at(3))
+```
+
+**Generated**:
+```das
+return invoke($() {
+    var decs_ec = 0;
+    var decs_found = false;
+    var decs_res : tuple<...>;
+    for_each_archetype_find(..., $(arch) : bool {
+        for (e_id, e_score, e_active in get_ro(...), get_ro(...), get_ro(...))
+            if (e_active)
+                if (decs_ec == 3)
+                    decs_res = tuple(e_id, e_score, e_active)
+                    decs_found = true
+                    return true
+                else { ++decs_ec }
+        return false;
+    });
+    if (!decs_found) panic("element index out of range", ...)
+    return <- decs_res;
+})
+```
+
+**Classification**: SPLICE-FIRES (`emit_decs_element_at`, Slice 5f).
+
+**Conclusion**: `for_each_archetype_find` outer (returns bool to break early across archetypes) + counter inside, then panics if not found. Reference arm.
+
+### 6e — reverse + take + to_array (delegates to plan_decs_reverse)
+
+**Probe** (`/tmp/audit_probes/chain6_6e.das`):
+```das
+return <- _fold(from_decs_template(type<DecsItem>).reverse().take(10).to_array())
+```
+
+**Generated**:
+```das
+return invoke($() {
+    var decs_total : int64 = 0;
+    for_each_archetype(..., $(arch) { decs_total += arch.size; });
+    let decs_actual = (decs_total > 10) ? 10 : decs_total;
+    let decs_skip = decs_total - decs_actual;
+    var decs_buf;
+    if (decs_actual == 0) return <- decs_buf
+    reserve(decs_buf, int(decs_actual));
+    var decs_seen : int64 = 0;
+    for_each_archetype_find(..., $(arch) : bool {
+        if (decs_seen + arch.size <= decs_skip)
+            decs_seen += arch.size; return false
+        var decs_skips = (decs_skip > decs_seen) ? (decs_skip - decs_seen) : 0;
+        for (e_id, e_score, e_active in get_ro(...), get_ro(...), get_ro(...))
+            if (decs_skips > 0) { --decs_skips; continue }
+            else
+                var decs_tup = tuple(e_id, e_score, e_active)
+                push_clone(decs_buf, decs_tup)
+                if (int64(length(decs_buf)) >= decs_actual) break
+        decs_seen += arch.size;
+        return int64(length(decs_buf)) >= decs_actual;
+    });
+    __::linq`reverse_inplace(decs_buf);
+    return <- decs_buf;
+})
+```
+
+**Classification**: SPLICE-FIRES (`plan_decs_reverse` — PR #2834 reverse skip-into-tail pattern).
+
+**Conclusion**: `plan_decs_unroll` does NOT handle `reverse` itself — dispatch happens earlier through `plan_decs_reverse`. 2-pass shape (sum sizes → skip into tail → reverse_inplace) is exactly the PR #2834 win. Dispatch works as designed.
+
+### Chain 6 — follow-up TODOs
+
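+- **`select` before `skip_while` / `take_while`**: related to probe 5b in Chain 5 (a projection interleaved with a range op changes the element shape that later ops see; here the predicate would have to run on the projected scalar instead of the source tuple). Document canonical order.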
+- **sum/min/max/average over tuple element without `_select`**: line 4483 bail is correct but silent — emit a planner diagnostic when `isAccum && selectCount == 0`.
+- **Implicit to_array gate**: line 4493 requires `expr._type.isGoodArrayType`. Failure mode for "no terminator at all" is opaque.
+
+---
+
+## Chain 7 — `plan_zip`
+
+**Accepts**: `zip(srcB) [where_*][select?][skip?][skip_while?][take_while?][take?] [terminator]?` STRICTLY 2-arg zip (linq_fold.das:5395)
+**Common bails**: 3-arg result-selector zip (line 5402), unrecognized intermediate op (line 5528), chained selects (line 5486)
+
+### 7a — `zip(srcB, result_selector)` 3-arg form + `sum()`
+
+**Probe** (`/tmp/audit_probes/chain7_7a.das`):
+```das
+return _fold(each(a) |> zip(each(b), $(x, y : int) => x + y) |> sum())
+```
+
+**Generated**:
+```das
+return invoke($(var source : iterator<int&>) : int {
+    var pass_0 <- __::linq`zip_to_array(source, each(b), $(x,y:int) => x+y);
+    var pass_1 = __::linq`sum(pass_0);
+    finalize(pass_0);
+    return <- pass_1;
+}, each(a))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: The natural "sum of (a[i] op b[i])" shape (the dot-product idiom when `op` is `*`) bails at line 5402 because the result-selector lives inside `zip(...)`. To recover the splice, move the selector into an explicit `_select` over the zipped tuple: `zip(each(b)) |> _select(_._0 + _._1) |> sum()` for this probe (probe 7d shows the product variant). Either lower the 3-arg form to 2-arg `zip + _select` inside the macro before reaching `plan_zip`, or extend `plan_zip` to peel a 3-arg zip's result_selector into the chain `projection` slot.
+
+### 7b — `zip` + `_order_by` terminator
+
+**Probe** (`/tmp/audit_probes/chain7_7b.das`):
+```das
+return <- _fold(each(a) |> zip(each(b)) |> _order_by(_._0) |> to_array())
+```
+
+**Generated**:
+```das
+return <- invoke($(var source : iterator<int&>) : array<tuple<int;int>> {
+    var pass_0 <- __::linq`zip_to_array(source, each(b));
+    __::linq`order_by_inplace(pass_0, $(_) { return _._0; });
+    return <- pass_0;
+}, each(a))
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: `_order_by` after `zip` is an unrecognized intermediate op (line 5528). A targeted "zip-then-order-by-then-take" arm would be the right next splice.
+
+### 7c — `zip` + chained `_select`s
+
+**Probe** (`/tmp/audit_probes/chain7_7c.das`):
+```das
+return <- _fold(each(a) |> zip(each(b)) |> _select(_._0) |> _select(_ * 2) |> to_array())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`zip_to_array(source, each(b));
+var pass_1 <- __::linq`select(pass_0, $(_) { return _._0; });
+finalize(pass_0);
+var pass_2 <- __::linq`select(pass_1, $(_) { return _ * 2; });
+finalize(pass_1);
+return <- pass_2;
+```
+
+**Classification**: FALLS-OFF — default cascade (3 buffer allocations + 2 finalize calls).
+
+**Conclusion**: Two `_select`s in a row bail at line 5486. Collapse N consecutive `_select`s into a single projection via repeated `peel_lambda_rename_var` + body composition. Same shape unblocks plan_loop_or_count.
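+
+Until such a collapse exists, the two projections can be composed by hand so the chain fits the single `[select?]` slot listed under **Accepts**. A sketch of the rewritten probe, not re-run as part of this audit:
+
+```das
+// _select(_._0) |> _select(_ * 2) composed by hand into one projection
+return <- _fold(each(a) |> zip(each(b)) |> _select(_._0 * 2) |> to_array())
+```
+
+This is the same substitution the proposed collapse would perform automatically: the body of the first lambda replaces `_` in the second.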
+
+### 7d — Baseline: `zip` + `_select` + `sum`
+
+**Probe** (`/tmp/audit_probes/chain7_7d.das`):
+```das
+return _fold(each(a) |> zip(each(b)) |> _select(_._0 * _._1) |> sum())
+```
+
+**Generated**:
+```das
+return invoke($(srcA : array<int>; srcB : array<int>) : int {
+    var acc = 0;
+    for (itA, itB in srcA, srcB)
+        let it : tuple<int;int> = tuple(itA, itB)
+        acc += (it._0 * it._1)
+    return acc;
+}, a, b)
+```
+
+**Classification**: SPLICE-FIRES — inline parallel `for` + accumulator, zero intermediate buffers.
+
+**Conclusion**: User-facing gap: 7d's wording (`zip(b) |> _select(_._0 * _._1) |> sum()`) is strictly less readable than 7a's `zip(b, $(x,y) => x*y) |> sum()` form, yet the latter falls off. Splice ergonomics suffer when the "fast path" requires the awkward spelling.
+
+### Chain 7 — follow-up TODOs
+
+- Pre-lower 3-arg `zip(a, b, sel)` to 2-arg `zip(a, b) |> _select(...)` inside `LinqFold.visit` (or hoist the selector into `projection` directly inside `plan_zip`). Closes 7a.
+- Extend `plan_zip` to accept `_order_by` / `reverse` between zip and a terminator that needs full materialization anyway. Closes 7b and unblocks "top-K of zip" patterns.
+- Collapse N consecutive `_select` projections (line 5486 + plan_loop_or_count's analog) — symmetric with how N consecutive `where_` already compose via `&&`. Closes 7c.
+
+---
+
+## Chain 8 — `plan_decs_join`
+
+**Accepts**: `_join(srcA, srcB, on, into) [count]?` strictly binary, primitive keys, no intermediate chain ops (linq_fold.das:5267)
+**Common bails**: post-join chain op of ANY kind (line 5284), non-primitive key type (lines 5296-5303), keya/keyb untyped (line 5293).
+
+### 8a — `_join` + post-join `_where` + `count`
+
+**Probe** (`/tmp/audit_probes/chain8_8a.das`):
+```das
+return _fold(from_decs_template(type<DecsCar>) |> _join(from_decs_template(type<DecsDealer>),
+                                                         $(l, r) => l.dealer_id == r.id,
+                                                         $(l, r) => (CarName = l.name, DealerName = r.name))
+                                                |> _where(_.CarName != "")
+                                                |> count())
+```
+
+**Generated** (trimmed):
+```das
+var pass_0 <- __::linq`join_to_array(source, invoke($() : iterator<...> {
+    var res : array<...>; for_each_archetype(...) { ... push(...) }
+    return <- to_sequence(res);
+}), keya_block, keyb_block, result_block);
+var pass_1 <- __::linq`where_(pass_0, predicate);
+finalize(pass_0);
+var pass_2 = __::linq`count(pass_1);
+finalize(pass_1);
+```
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: Bails at line 5284 because `_where` sits between `_join` and `count`. Worst-case: full materialization of BOTH dealer and car archetypes into per-iterator buffers before `join_impl` runs, plus a second `where_` pass and a third `count`. Fix: iterate the matched bucket inside the probe loop and wrap the count-bump in `if (predicate) { ... }`, since the baseline's bulk `decs_jcnt += length(decs_jarr)` (see 8d) cannot apply a per-pair predicate.
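+
+A hand-written sketch of how the probe-side body of the 8d baseline might change to host the predicate. This is not planner output; `decs_res` is a hypothetical name, and the field names follow the tuple shapes shown in the 8d listing.
+
+```das
+// inside the car-side for_each_archetype loop of the 8d baseline
+get(decs_hash, decs_tup_a.dealer_id, $(var decs_jarr) {
+    for (decs_tup_b in decs_jarr) {
+        // hypothetical: bind the join result once per matched pair
+        let decs_res = (CarName = decs_tup_a.name, DealerName = decs_tup_b.name)
+        if (decs_res.CarName != "") { ++decs_jcnt }
+    }
+})
+```
+
+The bucket walk replaces the bulk `length(decs_jarr)` bump because the predicate has to be evaluated per joined pair.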
+
+### 8b — `_join` + post-join `_select` + `to_array`
+
+**Probe** (`/tmp/audit_probes/chain8_8b.das`):
+```das
+return <- _fold(from_decs_template(type<DecsCar>) |> _join(from_decs_template(type<DecsDealer>), ..., ...)
+                                                  |> _select(_.CarName)
+                                                  |> to_array())
+```
+
+**Classification**: FALLS-OFF — default cascade (3 buffer allocations).
+
+**Conclusion**: Same bail (line 5284). The natural projection-shaping idiom (produce the full join row, then project) is cheaper as an inline projection because the intermediate join-row buffer is never built. Fix is symmetric with 8a: accept a single trailing `_select` and substitute its body into the result lambda position.
+
+### 8c — Composite (tuple) join key
+
+**Probe** (`/tmp/audit_probes/chain8_8c.das`):
+```das
+return _fold(_join(srcA, srcB, $(l, r) => (l.dealer_id, l.id) == (r.region, r.id), ...) |> count())
+```
+
+**Generated** (trimmed): `join_to_array` instantiated with `tuple<int;int>` keys; `unique_key(tuple<int;int>) : string` invoked by `join_impl`.
+
+**Classification**: FALLS-OFF — default cascade.
+
+**Conclusion**: Bails at primitive-key gate (lines 5296-5303): `keyType.baseType == Type.tTuple` not in whitelist. The `_join` macro itself accepts tuple-equi form, so the gate is the ONLY reason this falls off. Two fixes: (a) plumb `unique_key(keyBody)` into probe/insert sites of the splice (matches `join_impl`); or (b) accept tuples-of-primitives directly as `table<tuple<...>; array<...>>` key, since daslang tables hash tuples natively. (b) is cleaner.
+
+### 8d — Baseline: `_join` + `count`
+
+**Generated**:
+```das
+return invoke($() : int {
+    var decs_jcnt = 0;
+    var decs_hash : table<int; array<tuple<id:int;name:string>>>;
+    for_each_archetype(<dealer_hash>, <dealer_erq>, $(arch) {
+        for (dealer_id, dealer_name in get_ro(arch,"dealer_id",type<int>), get_ro(arch,"dealer_name",type<string>))
+            var decs_tup_b = tuple(dealer_id, dealer_name)
+            push_clone(decs_hash[decs_tup_b.id], decs_tup_b)
+    });
+    for_each_archetype(<car_hash>, <car_erq>, $(arch) {
+        for (car_id, car_dealer_id, car_name in get_ro(...) [...3 cols])
+            var decs_tup_a = tuple(car_id, car_dealer_id, car_name)
+            get(decs_hash, decs_tup_a.dealer_id, $(var decs_jarr) {
+                decs_jcnt += length(decs_jarr);
+            })
+    });
+    return decs_jcnt;
+})
+```
+
+**Classification**: SPLICE-FIRES — single hash, two `for_each_archetype` passes, count bumped by bucket-length.
+
+**Conclusion**: Confirms hashed-join splice fires for bench-supported shape. Narrow surface is the issue: any meaningful post-processing reverts to full materialization.
+
+### Chain 8 — follow-up TODOs
+
+- Add a single-trailing-`_where` arm (mirror plan_zip's `whereCond` slot). Closes 8a + C6.
+- Add a single-trailing-`_select` arm: substitute select's lambda body into result-push position. Closes 8b.
+- Accept tuples-of-primitives as keys directly (`table<tuple<int;int>; array<...>>`); cascade non-primitive structs to `unique_key`. Closes 8c.
+- Document the splice's narrow shape in `LinqJoin`'s docstring.
+
+---
+
+## Composition probes
+
+When a user chain combines two splice families, dispatch order (linq_fold.das:5700-5727) claims one and the other op bails the whole arm. Six obvious user-natural compositions, all FALLS-OFF:
+
+### C1 — Distinct + order + take
+
+**Why interesting**: "Top-K most recent distinct users" — both `_distinct_by` and `_order_by` have splice arms, neither tolerates the other op.
+
+**Probe** (`/tmp/audit_probes/comp_C1.das`):
+```das
+return <- _fold(each(items) |> _distinct_by(_.user) |> _order_by(_.ts) |> take(10) |> to_array())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`distinct_by_to_array(source, $(_) { return _.user; });
+__::linq`order_by_inplace(pass_0, $(_) { return _.ts; });
+__::linq`take_inplace(pass_0, 10);
+return <- pass_0;
+```
+
+**Classification**: FALLS-OFF — `plan_distinct` runs first (dispatch line 5712), sees non-distinct trailing op, returns null; `plan_order_family` runs second, sees `distinct_by` upstream, returns null. Tier-2 cascade.
+
+**Conclusion**: Two splice arms exist but the planner picks neither because each insists on owning the whole chain. Fast shape would be bounded-heap of size 10 keyed on `(seen_users_set, _.ts)` — collect into heap during single source pass, gated by set-insert success. Cross-splice composition is the obvious gap.
+
+### C2 — Group-by + select + order-by + to_array
+
+**Why interesting**: "Brands sorted by frequency" — canonical SQL `GROUP BY ... ORDER BY COUNT(*)`.
+
+**Probe** (`/tmp/audit_probes/comp_C2.das`):
+```das
+return <- _fold(each(items)
+    |> _group_by(_.brand)
+    |> _select((B = _._0, C = _._1 |> count()))
+    |> _order_by(_.C)
+    |> to_array())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`group_by_lazy_to_array(source, $(_) { return _.brand; });
+var pass_1 <- __::linq`select(pass_0, $(_) { return tuple(_._0, __::linq`count(_._1)); });
+finalize(pass_0);
+__::linq`order_by_inplace(pass_1, $(_) { return _.C; });
+return <- pass_1;
+```
+
+**Classification**: FALLS-OFF — `plan_group_by` bails because trailing op is `_order_by`.
+
+**Conclusion**: `plan_group_by_core` already builds the bucket map directly. Letting `_select` + `_order_by` consume the buckets inside the same emission would give 1 hashmap walk + 1 inplace sort and skip the intermediate `array<tuple<string;array<Item>>>` materialization. Cross-cuts with the 7b / C1 observations.
+
+### C3 — Decs join + select + group_by + select
+
+**Why interesting**: "Join cars onto dealers, group by region, count" — universal BI shape on decs.
+
+**Probe** (`/tmp/audit_probes/comp_C3.das`):
+```das
+return <- _fold(_join(decsCars, decsDealers, on=..., into=(Region=r.region, CarName=l.name))
+                |> _group_by(_.Region)
+                |> _select((R = _._0, N = _._1 |> count()))
+                |> to_array())
+```
+
+**Classification**: FALLS-OFF — `plan_decs_join` bails at 5284 (trailing chain ops); `plan_decs_group_by` requires a decs source on top, not a `_join` invoke; default cascade builds dealer-array → join-array → group-map → select-array. Three intermediate allocations.
+
+**Conclusion**: This is the "killer demo" composition. The structural fix is to refactor `plan_decs_join` so its emission integrates with `plan_decs_group_by`'s bucket-fill — instead of `push_clone(buf, result_lam(...))` in the probe loop, emit `bucket[keyExpr] |> push_clone(...)` directly. Largest single architectural change suggested by the audit.
+
+### C4 — Zip + reverse + to_array
+
+**Why interesting**: "Pair two parallel sequences, walk backward" — natural for time-reversed analyses.
+
+**Probe** (`/tmp/audit_probes/comp_C4.das`):
+```das
+return <- _fold(each(a) |> zip(each(b)) |> reverse() |> to_array())
+```
+
+**Generated**:
+```das
+var pass_0 <- __::linq`zip_to_array(source, each(b));
+__::linq`reverse_inplace(pass_0);
+return <- pass_0;
+```
+
+**Classification**: FALLS-OFF — `plan_zip` lists `reverse` as unrecognized op (line 5528); `plan_reverse` doesn't recognize a 2-source zip head.
+
+**Conclusion**: Cheapest fall-off in absolute cost (1 buffer + 1 inplace), but trivial to absorb: when `reverse` is the only intermediate op, zip's emission can index both sources from the zipped `length - 1` down to `0` instead of running the forward parallel `for`, which is close to a one-line change. Bundle with the 7b TODO.
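+
+A hand-written sketch of the reverse-indexed emission, assuming both sources are plain arrays (as in the 7d signature) and that zip stops at the shorter source. This is not planner output; `n`, `j`, and `buf` are hypothetical names.
+
+```das
+return <- invoke($(srcA : array<int>; srcB : array<int>) : array<tuple<int;int>> {
+    let n = min(length(srcA), length(srcB))    // assumed: zip stops at the shorter source
+    var buf : array<tuple<int;int>>
+    reserve(buf, n)
+    for (i in range(n)) {
+        let j = n - 1 - i                       // walk both sources back to front
+        buf |> push(tuple(srcA[j], srcB[j]))
+    }
+    return <- buf
+}, a, b)
+```
+
+This fills the output directly in reversed order, so the separate `reverse_inplace` pass over `pass_0` is not needed.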
+
+### C5 — Order-by + distinct + take + to_array
+
+**Why interesting**: Variant on C1 with order-first.
+
+**Probe** (`/tmp/audit_probes/comp_C5.das`):
+```das
+return <- _fold(each(items) |> _order_by(_.score) |> distinct() |> take(10) |> to_array())
+```
+
+**Classification**: FALLS-OFF — `plan_order_family` doesn't recognize `distinct`; `plan_distinct` doesn't recognize `_order_by` upstream.
+
+**Conclusion**: Identical reasoning to C1, operator-order swapped. Confirms the "two splice families never cooperate" pattern is symmetric — not a property of which arm runs first.
+
+### C6 — Decs_join + post-join filter
+
+**Why interesting**: Composition view of 8a — confirms the failure mode is the same whether reached via "splice arm couldn't extend" or "two arms collide".
+
+**Probe** (`/tmp/audit_probes/comp_C6.das`): same as 8a.
+
+**Classification**: FALLS-OFF — same root cause as 8a (linq_fold.das:5284).
+
+**Conclusion**: Listed here to make the symmetry explicit. Closing 8a TODO closes this row too.
+
+### Composition — cross-cutting observation
+
+Five of six composition probes (C1, C2, C3, C5, C6) are blocked by the same architectural pattern: each splice arm currently requires `flatten_linq` to yield a contiguous run of recognized ops, with the planner pipeline trying one arm at a time and falling to tier-2 the moment ANY arm refuses. There is no cross-arm composition mechanism. The highest-leverage next investment isn't another arm — it's a "compose-aware" planner step that walks the call chain once, attributes each op to a candidate arm (or "boundary op" like `_where`/`_select` that any arm can host), and stitches the emissions. C4 is the lone outlier where one arm could absorb the second op trivially; the other five point at the same missing infrastructure.
+
+---
+
+## Cross-cutting findings
+
+Synthesizing the per-chain TODOs into prioritized themes:
+
+### Theme 1 — Terminal `_select` extension (HIGH impact, MEDIUM effort)
+
+Recurs in: **chains 1, 2, 7, 8**. Almost every arm that produces a buffer or holds a bounded-K state could accept a terminal `_select` that projects during the emission/return, but today these arms bail almost universally. The bounded-heap, R5 buffer, and join probe-loop arms all hold `≤K` or per-element values they then need to discard or project; absorbing the projection is a small qmacro splice each.
+
+Specific arms to extend:
+- `plan_order_family` line 1234 + `plan_decs_order_family` line 4547 — accept terminal `_select` after `take(N)` / `first` / `first_or_default`. Closes 1a, 1e.
+- `plan_reverse` line 1764 + `plan_decs_reverse` line 4802 — accept terminal `_select` after `reverse [take(N)]`. Closes 2c, 2e.
+- `plan_decs_join` line 5267 — accept single trailing `_select` substituting into result lambda. Closes 8b.
+- `plan_zip` line 5395 — pre-lower 3-arg `zip(a, b, sel)` to 2-arg `zip(a, b) |> _select(sel)`. Closes 7a.
+
+### Theme 2 — Trailing `_where` / HAVING (HIGH impact, MEDIUM effort)
+
+Recurs in: **chains 4, 5, 8**. The "trailing post-aggregate filter" idiom is universal in SQL-like usage and falls off whenever it appears in a splice arm:
+- `plan_group_by_core` — peel trailing `_where` to synthetic `having_` slot (closes 4a, 4e).
+- `plan_decs_join` — accept single trailing `_where` mirroring plan_zip's `whereCond` (closes 8a, C6).
+- `plan_loop_or_count` — counter-lane with take-cap that ticks BEFORE the where filter (closes 5c).
+
+### Theme 3 — Cross-arm composition (HIGHEST impact, LARGE effort)
+
+Recurs in: **5 of 6 composition probes** (C1, C2, C3, C5, C6). The planner pipeline tries arms in order and falls to tier-2 if any arm refuses; there is no mechanism for two arms to share a chain. The structural fix is a "compose-aware" planner step that:
+1. Walks the call chain once
+2. Attributes each call to a candidate arm or to a "boundary op" (`_where` / `_select` are universal)
+3. Stitches arm emissions at boundary points (e.g. plan_decs_join emits `bucket[keyExpr] |> push_clone(...)` directly into plan_decs_group_by's bucket-fill loop)
+
+This is the largest architectural change the audit suggests, but it unlocks the most common BI-style queries. Closes C1, C2, C3, C5 (C6 is covered by the `plan_decs_join` trailing-`_where` item in Theme 2).
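
The first two steps of that walk can be sketched as a single pass over the op names. Everything in this Python model is an assumption made for illustration: the `ARM_OF` table, the op spellings, and the segment representation are hypothetical, and stitching the emissions (step 3) is not modelled.

```python
# Hypothetical mapping from op name to the arm that would claim it.
ARM_OF = {
    "join": "plan_decs_join",
    "group_by": "plan_group_by",
    "order_by": "plan_order_family",
    "take": "plan_order_family",
    "zip": "plan_zip",
}
# Boundary ops can be hosted by whichever arm is currently open.
BOUNDARY_OPS = {"where", "select"}

def attribute_chain(chain):
    """Walk the chain once; return [(arm, ops)] segments, or None to fall to tier-2."""
    segments = []
    for op in chain:
        if op in BOUNDARY_OPS:
            if not segments:
                segments.append([None, []])   # leading boundary op, arm not known yet
            segments[-1][1].append(op)
            continue
        arm = ARM_OF.get(op)
        if arm is None:
            return None                       # unrecognized op: no splice
        if segments and segments[-1][0] in (None, arm):
            segments[-1][0] = arm             # same arm continues, or adopts leading ops
            segments[-1][1].append(op)
        else:
            segments.append([arm, [op]])      # arm boundary: stitch point
    return [(arm, ops) for arm, ops in segments]

plan = attribute_chain(["join", "where", "group_by", "select"])
```

Each boundary between two segments in the returned plan is a point where one arm's emission would need to feed the next arm's loop.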
+
+### Theme 4 — 2-arg terminator predicates (MEDIUM impact, LOW effort)
+
+Recurs in: **chains 3, 5, 7**. Several splice arms accept only the 1-arg form of `count()` / `long_count()` / `sum()` etc. and silently bail when the user adds a predicate. The extension is trivial: emit `if (p(it)) cnt++` at the existing increment site.
+
+- `plan_distinct` line 1953, `plan_decs_distinct` line 5057 — accept 2-arg `count(p)` / `long_count(p)`. Closes 3c.
+- `plan_zip` lines 5412-5436 — same shape. (Not probed explicitly but observed in agent 1 inventory.)
+- `plan_decs_unroll` line 4458 — same shape.
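
As a language-neutral model of the predicate-gated increment, the Python sketch below counts distinct elements and optionally applies a predicate at the increment site. `distinct_count` is a hypothetical name and the function is not the emission of `plan_distinct`.

```python
def distinct_count(source, pred=None):
    """Model of distinct() + count() or count(pred) in one walk."""
    seen = set()
    cnt = 0
    for it in source:
        if it in seen:
            continue                      # duplicate: already accounted for
        seen.add(it)
        if pred is None or pred(it):      # the 2-arg form only gates the increment
            cnt += 1
    return cnt

data = [1, 2, 2, 3, 4, 4, 6]
total = distinct_count(data)
evens = distinct_count(data, lambda x: x % 2 == 0)
```

The predicate changes only the increment, so the hash-tracking part of the walk is the same in both forms.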
+
+### Theme 5 — `_order_by(k).reverse()` → `_order_by_descending(k)` normalization
+
+Recurs in: **chains 1, 2**. Pure rewrite at the macro level, before any planner sees the chain. Closes 1b, 2b. Trivial to implement; roughly half a day of work.
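
One property worth checking before landing this rewrite is tie order. The Python sketch below uses Python's stable sort as a stand-in. Whether the daslang `_order_by` family is stable is not stated in this audit, so treat this as a check to run rather than a finding.

```python
rows = [("a", 2), ("b", 1), ("c", 2), ("d", 3)]
key = lambda r: r[1]

# Shape 1: ascending sort, then reverse the whole result.
ascending_reversed = list(reversed(sorted(rows, key=key)))
# Shape 2: descending sort directly (Python keeps ties in source order).
descending = sorted(rows, key=key, reverse=True)

keys_match = [key(r) for r in ascending_reversed] == [key(r) for r in descending]
rows_match = ascending_reversed == descending
```

In this model the key sequences agree but the rows with equal keys (`a` and `c`) come out in opposite order, so the normalization is only an exact rewrite if tie order is unspecified or the sort is not stable.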
+
+### Theme 6 — Decs-bridge double penalty (MEDIUM impact, LOW effort)
+
+Whenever a `plan_decs_*` arm bails, the `from_decs_template` bridge degenerates to full `for_each_archetype` materialization into a temp `res` array, which is then wrapped in `to_sequence` for the array-side cascade. This costs an EXTRA allocation on top of whatever cascade follows.
+
+Fix: in `FromDecsMacro` (or at the `_fold` dispatch point), emit a diagnostic (`compile_warning` style) when the bridge survives without any decs-side splice arm claiming it. This doesn't fix the underlying chain, but it tells the user where the perf cliff is.
+
+### Theme 7 — Chained `_select` collapse
+
+Recurs in: **chain 5 (5b), chain 7 (7c)**. N consecutive `_select` projections should collapse into a single projection via repeated `peel_lambda_rename_var` + body composition — symmetric with how N consecutive `_where` already compose via `&&`. Same mechanism unblocks both plan_loop_or_count and plan_zip.
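
The symmetry between the two collapses can be shown with plain function composition. This Python sketch uses hypothetical helpers (`compose_selects`, `conjoin_wheres`) and does not model `peel_lambda_rename_var` itself; it only shows that N projections fold into one, just as N predicates fold into one conjunction.

```python
from functools import reduce

def compose_selects(*projections):
    """Collapse consecutive select projections into one, applied left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), projections, x)

def conjoin_wheres(*predicates):
    """Collapse consecutive where predicates into one conjunction (p1 && p2 && ...)."""
    return lambda x: all(p(x) for p in predicates)

add_one = lambda x: x + 1
double = lambda x: x * 2

fused = compose_selects(add_one, double)
fused_out = [fused(x) for x in [1, 2, 3]]
chained_out = [double(y) for y in [add_one(x) for x in [1, 2, 3]]]

both = conjoin_wheres(lambda x: x > 1, lambda x: x % 2 == 0)
kept = [x for x in [1, 2, 3, 4] if both(x)]
```

The fused projection yields the same values as the two-pass chain without the intermediate list, which is the allocation the collapse would remove.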
+
+### Theme 8 — Specialized fusion arms (low priority)
+
+Recurs in: **chains 2, 3, C4**. Several "two specific arms could fuse" cases:
+- `reverse + distinct_by` (chain 2a) — single walk retaining LAST element per key.
+- `_distinct_by(keyFn) + _order_by(otherKey)` (chain 3b) — hash-track + bounded sort walk.
+- `zip + reverse` (C4) — emit `for i in length - 1 downto 0`.
+
+Each is small and self-contained. Lower priority than themes 1-3 but cheap follow-ups when in the area.
+
+### Out-of-scope observations
+
+- **`linq_fold_patterns.rst` cross-check**: this audit did NOT systematically verify that every "splice arm exists" claim in the RST page is reachable via the canonical chain shape. A future doc-only PR should walk the RST table row-by-row and probe each shape (most are covered above; rows not represented are likely doc-only fictions).
+- **JIT verification**: all probes here are INTERP-only. The JIT lane may behave differently — e.g. the bounded-heap arm's `spliced_push_heap` may or may not optimize well under llvm_jit.
+- **Bench impact quantification**: the cross-cutting findings are ordered by "how natural is the user phrasing" + "how expensive is the cascade", not by measured ns/op. A follow-up bench round (writing N FALLS-OFF chains as new benches, measuring fall-off cost) would sharpen the prioritization.
+
+---
+
+## How to re-run
+
+The audit is reproducible. Per-probe workflow:
+
+```bash
+# Single probe: compile + dump
+mcp__daslang__compile_check /tmp/audit_probes/chain1_1a.das
+mcp__daslang__ast_dump file=/tmp/audit_probes/chain1_1a.das function=probe_1a mode=source
+
+# Whole audit: compile all probes
+for f in /tmp/audit_probes/*.das; do
+    mcp__daslang__compile_check "$f"
+done
+```
+
+To re-create the probe set after deleting `/tmp/audit_probes/`, follow each probe's "Probe" code block — each is self-contained (`options gen2` + `require` lines + struct + one `[export] def probe_NX` + stub `def main(){}`). The audit doesn't depend on any fixture outside the probe files themselves.
+
+Classification rules:
+- `for_each_archetype` + inline state (heap, accumulator, counter, table) → **SPLICE-FIRES**
+- `` __::linq`*_to_array` `` / `` __::linq`*_inplace` `` / cascade of `pass_0 → pass_1 → ...` → **FALLS-OFF**
+- Direct `min_by_impl` / `top_n_by_impl` invocation without inlining → **BAILS-TO tier-2**
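
The three rules can be applied mechanically to a dumped function body. The Python sketch below is a rough triage helper under stated assumptions: the dump is treated as plain text, the regular expressions are guesses at how the markers appear in it, and the "inline state" half of the first rule is not checked, so a `SPLICE-FIRES` verdict still needs a manual look.

```python
import re

# Assumed spellings of the fall-off markers in the dumped source text.
FALL_OFF = re.compile(r"linq`\w+_to_array`|linq`\w+_inplace`|\bpass_\d+\b")
TIER2 = re.compile(r"\b(min_by_impl|top_n_by_impl)\b")

def classify(dump_text):
    """Return the audit label suggested by the markers found in dump_text."""
    # Fall-off markers win: a degenerate decs bridge also contains for_each_archetype.
    if FALL_OFF.search(dump_text):
        return "FALLS-OFF"
    if TIER2.search(dump_text):
        return "BAILS-TO tier-2"
    if "for_each_archetype" in dump_text:
        return "SPLICE-FIRES"
    return "UNCLASSIFIED"

# Hypothetical dump fragments, written by hand for illustration.
samples = {
    "splice": "for_each_archetype(...) { heap_push(buf, it) }",
    "cascade": "let pass_0 = __::linq`where_to_array`(src)",
    "tier2": "return min_by_impl(buf, key)",
}
labels = {name: classify(text) for name, text in samples.items()}
```

Fall-off markers are checked first because, per Theme 6, a bailed decs arm still emits `for_each_archetype` while materializing into a temp array.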
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index 4020ed60f..999c7add9 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -1,6 +1,6 @@
 # Benchmarks — SQL / Array / Decs comparison
 
-Generated 2026-05-23 from `62336a4a7` (PR for `plan_decs_join`).
+Generated 2026-05-24 from `4b13eed9a` (Theme 2 — trailing-`_where` extension).
 Fixture size: n = 100 000 (cars), 100 dealers, 5 brands. Each row is
 one bench family in `benchmarks/sql/`; columns are nanoseconds per
 logical operation. `—` marks an intentionally absent lane — see
@@ -26,114 +26,113 @@ before the timer resolution can measure them — they should be read as
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | Decs vs Array |
 |---|---:|---:|---:|---:|
-| `aggregate_match` | 35.3 | 5.9 | 5.8 | 0.98× |
-| `all_match` | 28.1 | 3.6 | 3.5 | 0.97× |
+| `aggregate_match` | 35.1 | 6.0 | 5.8 | 0.97× |
+| `all_match` | 27.8 | 3.6 | 3.5 | 0.97× |
 | `any_match` | 0.00 | 0.00 | 0.00 | — |
-| `average_aggregate` | 29.9 | 5.9 | 8.8 | 1.49× |
-| `bare_order_where` | 278.2 | 118.4 | 126.6 | 1.07× |
-| `chained_where` | 38.5 | 6.7 | 6.7 | 1.00× |
+| `average_aggregate` | 29.9 | 6.4 | 10.4 | 1.62× |
+| `bare_order_where` | 278.4 | 119.0 | 126.8 | 1.07× |
+| `chained_where` | 36.0 | 6.7 | 6.7 | 1.00× |
 | `contains_match` | 0.00 | 2.2 | 1.4 | 0.64× |
-| `count_aggregate` | 29.1 | 4.1 | 4.2 | 1.02× |
-| `distinct_by_count` | 40.6 | 15.8 | 16.0 | 1.01× |
-| `distinct_count` | 41.1 | 16.1 | 16.0 | 0.99× |
+| `count_aggregate` | 29.2 | 4.1 | 4.1 | 1.00× |
+| `distinct_by_count` | 41.0 | 15.7 | 16.1 | 1.03× |
+| `distinct_count` | 41.0 | 16.1 | 16.0 | 0.99× |
 | `distinct_take` | 0.00 | 0.00 | 0.00 | — |
 | `element_at_match` | 0.00 | 0.00 | 0.00 | — |
 | `first_match` | 0.00 | 0.00 | 0.00 | — |
 | `first_or_default_match` | 0.00 | 0.00 | 0.00 | — |
-| `groupby_average` | 174.3 | 30.3 | 30.2 | 1.00× |
-| `groupby_count` | 144.3 | 19.4 | 19.3 | 0.99× |
-| `groupby_first` | — | 20.0 | 19.3 | 0.97× |
-| `groupby_having_count` | 142.6 | 19.3 | 19.4 | 1.01× |
-| `groupby_having_hidden_sum` | 175.7 | 24.5 | 24.1 | 0.98× |
-| `groupby_max` | 176.7 | 25.0 | 25.4 | 1.02× |
-| `groupby_min` | 175.7 | 25.1 | 25.4 | 1.01× |
-| `groupby_multi_reducer` | 191.4 | 33.7 | 32.7 | 0.97× |
-| `groupby_select_sum` | 207.1 | 36.8 | 36.7 | 1.00× |
-| `groupby_sum` | 172.4 | 18.8 | 18.8 | 1.00× |
-| `groupby_where_count` | 75.7 | 14.7 | 15.0 | 1.02× |
-| `groupby_where_sum` | 86.7 | 14.3 | 14.7 | 1.03× |
-| `indexed_lookup` | 1454.5 | 204673.2 | 472.2 | 0.00× |
-| `join_count` | 38.0 | 121.2 | 64.0 | 0.53× |
+| `groupby_average` | 171.0 | 30.3 | 30.5 | 1.01× |
+| `groupby_count` | 143.6 | 19.8 | 19.5 | 0.98× |
+| `groupby_first` | — | 18.6 | 19.3 | 1.04× |
+| `groupby_having_count` | 142.7 | 19.2 | 19.3 | 1.01× |
+| `groupby_having_hidden_sum` | 177.5 | 24.5 | 24.1 | 0.98× |
+| `groupby_max` | 176.1 | 25.2 | 25.4 | 1.01× |
+| `groupby_min` | 176.4 | 25.1 | 25.4 | 1.01× |
+| `groupby_multi_reducer` | 191.6 | 32.5 | 32.6 | 1.00× |
+| `groupby_select_sum` | 210.3 | 36.8 | 36.5 | 0.99× |
+| `groupby_sum` | 173.0 | 18.8 | 18.8 | 1.00× |
+| `groupby_where_count` | 75.6 | 14.8 | 15.0 | 1.01× |
+| `groupby_where_sum` | 86.9 | 14.3 | 14.8 | 1.03× |
+| `indexed_lookup` | 1499.4 | 204476.6 | 495.4 | 0.00× |
+| `join_count` | 38.2 | 122.4 | 64.1 | 0.52× |
 | `last_match` | 0.00 | 5.9 | 14.0 | 2.37× |
-| `long_count_aggregate` | 29.3 | 4.2 | 4.2 | 1.00× |
-| `max_aggregate` | 30.8 | 6.1 | 6.9 | 1.13× |
-| `min_aggregate` | 30.4 | 6.2 | 6.9 | 1.11× |
-| `order_take_desc` | 38.1 | 15.9 | 20.1 | 1.26× |
+| `long_count_aggregate` | 29.6 | 4.3 | 4.1 | 0.95× |
+| `max_aggregate` | 30.5 | 6.1 | 6.9 | 1.13× |
+| `min_aggregate` | 30.4 | 6.4 | 6.9 | 1.08× |
+| `order_take_desc` | 37.8 | 16.0 | 20.1 | 1.26× |
 | `reverse_take` | 0.10 | 0.00 | 9.3 | — |
-| `select_count` | 0.10 | 0.00 | 2.2 | — |
-| `select_where` | 194.2 | 11.1 | 19.5 | 1.76× |
-| `select_where_count` | 32.5 | 5.2 | 7.4 | 1.42× |
-| `select_where_order_take` | 36.4 | 12.2 | 14.9 | 1.22× |
-| `select_where_sum` | 37.0 | 7.5 | 7.5 | 1.00× |
+| `select_count` | 0.10 | 0.00 | 2.9 | — |
+| `select_where` | 193.4 | 11.2 | 22.2 | 1.98× |
+| `select_where_count` | 32.6 | 5.2 | 7.4 | 1.42× |
+| `select_where_order_take` | 36.3 | 12.2 | 14.8 | 1.21× |
+| `select_where_sum` | 37.1 | 7.8 | 7.5 | 0.96× |
 | `single_match` | 0.00 | 2.9 | 5.5 | 1.90× |
 | `skip_take` | 0.50 | 0.10 | 0.20 | 2.00× |
-| `skip_while_match` | 3.4 | 5.3 | 5.3 | 1.00× |
-| `sort_first` | 37.9 | 11.1 | 13.4 | 1.21× |
-| `sort_take` | 38.1 | 16.4 | 20.3 | 1.24× |
-| `sum_aggregate` | 30.0 | 2.2 | 2.1 | 0.95× |
-| `sum_where` | 32.8 | 4.3 | 4.3 | 1.00× |
-| `take_count` | 3.6 | 0.20 | 0.40 | 2.00× |
+| `skip_while_match` | 3.5 | 5.3 | 5.3 | 1.00× |
+| `sort_first` | 37.9 | 11.5 | 13.5 | 1.17× |
+| `sort_take` | 38.1 | 16.4 | 20.4 | 1.24× |
+| `sum_aggregate` | 30.1 | 2.2 | 2.1 | 0.95× |
+| `sum_where` | 33.0 | 4.3 | 4.3 | 1.00× |
+| `take_count` | 3.7 | 0.20 | 0.40 | 2.00× |
 | `take_count_filtered` | — | 0.20 | 0.20 | 1.00× |
 | `take_sum_aggregate` | — | 0.10 | 0.10 | 1.00× |
-| `take_while_match` | 7.9 | 2.5 | 2.5 | 1.00× |
-| `to_array_filter` | 70.1 | 11.7 | 11.9 | 1.02× |
-| `zip_dot_product` | — | 8.1 | 4.8 | 0.59× |
+| `take_while_match` | 8.0 | 2.5 | 2.5 | 1.00× |
+| `to_array_filter` | 70.3 | 11.8 | 11.8 | 1.00× |
+| `zip_dot_product` | — | 8.0 | 4.8 | 0.60× |
 
 ## JIT
-
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | Decs vs Array |
 |---|---:|---:|---:|---:|
-| `aggregate_match` | 34.4 | 0.40 | 0.70 | 1.75× |
-| `all_match` | 27.4 | 0.30 | 0.20 | 0.67× |
+| `aggregate_match` | 34.3 | 0.40 | 0.70 | 1.75× |
+| `all_match` | 27.4 | 0.40 | 0.20 | 0.50× |
 | `any_match` | 0.00 | 0.00 | 0.00 | — |
-| `average_aggregate` | 29.7 | 1.0 | 3.6 | 3.60× |
-| `bare_order_where` | 185.9 | 33.7 | 35.0 | 1.04× |
-| `chained_where` | 35.9 | 0.60 | 0.80 | 1.33× |
+| `average_aggregate` | 29.8 | 1.0 | 3.6 | 3.60× |
+| `bare_order_where` | 187.4 | 33.8 | 35.0 | 1.04× |
+| `chained_where` | 36.1 | 0.60 | 0.80 | 1.33× |
 | `contains_match` | 0.00 | 0.20 | 0.10 | 0.50× |
-| `count_aggregate` | 29.0 | 0.40 | 0.60 | 1.50× |
+| `count_aggregate` | 29.1 | 0.40 | 0.60 | 1.50× |
 | `distinct_by_count` | 40.8 | 2.1 | 2.1 | 1.00× |
-| `distinct_count` | 41.0 | 2.1 | 2.1 | 1.00× |
+| `distinct_count` | 41.2 | 2.1 | 2.1 | 1.00× |
 | `distinct_take` | 0.00 | 0.00 | 0.00 | — |
 | `element_at_match` | 0.00 | 0.00 | 0.00 | — |
 | `first_match` | 0.00 | 0.00 | 0.00 | — |
 | `first_or_default_match` | 0.00 | 0.00 | 0.00 | — |
-| `groupby_average` | 170.7 | 2.6 | 2.9 | 1.12× |
-| `groupby_count` | 141.1 | 2.4 | 2.5 | 1.04× |
+| `groupby_average` | 171.0 | 2.6 | 2.9 | 1.12× |
+| `groupby_count` | 142.0 | 2.4 | 2.5 | 1.04× |
 | `groupby_first` | — | 2.2 | 3.1 | 1.41× |
-| `groupby_having_count` | 147.0 | 2.4 | 2.5 | 1.04× |
-| `groupby_having_hidden_sum` | 174.1 | 2.5 | 2.8 | 1.12× |
-| `groupby_max` | 172.0 | 2.4 | 2.7 | 1.13× |
-| `groupby_min` | 174.2 | 2.4 | 2.7 | 1.13× |
-| `groupby_multi_reducer` | 191.1 | 2.7 | 3.0 | 1.11× |
-| `groupby_select_sum` | 198.8 | 3.2 | 3.7 | 1.16× |
-| `groupby_sum` | 173.6 | 2.4 | 2.7 | 1.13× |
-| `groupby_where_count` | 75.6 | 1.7 | 1.8 | 1.06× |
-| `groupby_where_sum` | 86.8 | 1.7 | 1.8 | 1.06× |
-| `indexed_lookup` | 1266.6 | 36139.0 | 104.1 | 0.00× |
-| `join_count` | 38.0 | 36.2 | 13.3 | 0.37× |
-| `last_match` | 0.00 | 0.60 | 1.4 | 2.33× |
-| `long_count_aggregate` | 29.3 | 0.40 | 0.60 | 1.50× |
-| `max_aggregate` | 30.6 | 0.60 | 0.50 | 0.83× |
-| `min_aggregate` | 30.7 | 0.60 | 0.50 | 0.83× |
-| `order_take_desc` | 37.8 | 0.70 | 1.4 | 2.00× |
+| `groupby_having_count` | 141.3 | 2.4 | 2.5 | 1.04× |
+| `groupby_having_hidden_sum` | 175.4 | 2.5 | 2.8 | 1.12× |
+| `groupby_max` | 172.6 | 2.4 | 2.7 | 1.13× |
+| `groupby_min` | 173.8 | 2.4 | 2.7 | 1.13× |
+| `groupby_multi_reducer` | 190.9 | 2.7 | 3.0 | 1.11× |
+| `groupby_select_sum` | 207.6 | 3.2 | 3.7 | 1.16× |
+| `groupby_sum` | 170.5 | 2.4 | 2.7 | 1.13× |
+| `groupby_where_count` | 76.1 | 1.7 | 1.8 | 1.06× |
+| `groupby_where_sum` | 87.0 | 1.7 | 1.9 | 1.12× |
+| `indexed_lookup` | 1258.1 | 35549.6 | 103.3 | 0.00× |
+| `join_count` | 37.9 | 36.1 | 13.4 | 0.37× |
+| `last_match` | 0.00 | 0.50 | 1.4 | 2.80× |
+| `long_count_aggregate` | 36.6 | 0.40 | 0.70 | 1.75× |
+| `max_aggregate` | 48.1 | 0.70 | 0.50 | 0.71× |
+| `min_aggregate` | 31.7 | 0.70 | 0.50 | 0.71× |
+| `order_take_desc` | 37.9 | 0.70 | 1.4 | 2.00× |
 | `reverse_take` | 0.00 | 0.00 | 1.1 | — |
 | `select_count` | 0.10 | 0.00 | 0.00 | — |
-| `select_where` | 105.6 | 4.1 | 5.5 | 1.34× |
+| `select_where` | 105.6 | 4.7 | 5.5 | 1.17× |
 | `select_where_count` | 32.4 | 0.40 | 0.60 | 1.50× |
 | `select_where_order_take` | 36.4 | 0.70 | 1.4 | 2.00× |
-| `select_where_sum` | 36.9 | 0.50 | 0.60 | 1.20× |
+| `select_where_sum` | 36.8 | 0.50 | 0.60 | 1.20× |
 | `single_match` | 0.00 | 0.40 | 1.1 | 2.75× |
 | `skip_take` | 0.30 | 0.00 | 0.00 | — |
-| `skip_while_match` | 3.5 | 0.40 | 0.40 | 1.00× |
-| `sort_first` | 37.5 | 0.40 | 1.3 | 3.25× |
-| `sort_take` | 38.0 | 0.70 | 1.4 | 2.00× |
-| `sum_aggregate` | 30.3 | 0.40 | 0.30 | 0.75× |
+| `skip_while_match` | 3.4 | 0.40 | 0.40 | 1.00× |
+| `sort_first` | 37.4 | 0.40 | 1.3 | 3.25× |
+| `sort_take` | 37.9 | 0.70 | 1.4 | 2.00× |
+| `sum_aggregate` | 29.9 | 0.40 | 0.40 | 1.00× |
 | `sum_where` | 33.0 | 0.40 | 0.60 | 1.50× |
 | `take_count` | 1.8 | 0.10 | 0.10 | 1.00× |
 | `take_count_filtered` | — | 0.00 | 0.00 | — |
 | `take_sum_aggregate` | — | 0.00 | 0.00 | — |
 | `take_while_match` | 8.0 | 0.20 | 0.30 | 1.50× |
-| `to_array_filter` | 48.3 | 3.2 | 3.4 | 1.06× |
+| `to_array_filter` | 48.5 | 3.3 | 3.4 | 1.03× |
 | `zip_dot_product` | — | 0.50 | 0.50 | 1.00× |
 
 ## Notes on missing lanes (the `—` cells)
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 89504c0cd..0faf8bf21 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -664,6 +664,7 @@ def private emit_accumulator_lane(
                                   var topExprs : array<Expression?>;
                                   var projection : Expression?;
                                   var whereCond : Expression?;
+                                  var postTakeWhereCond : Expression?;
                                   var intermediateBinds : array<Expression?>;
                                   var preCondStmts : array<Expression?>;
                                   var elementType : TypeDeclPtr;
@@ -745,6 +746,12 @@ def private emit_accumulator_lane(
     }
     perMatchStmts |> push_from <| build_accumulator_perelement_stmts(opName, accName, valBindName, firstName, cntName, valueExpr, workhorse, isDoubleAccType)
     prepend_binds(perMatchStmts, intermediateBinds)
+    // Theme 2 5c: post-take where wraps the per-match work (acc++ / += / cmp+update) BEFORE the take cap so the cap still ticks unconditionally per iteration.
+    if (postTakeWhereCond != null) {
+        var gated = wrap_with_condition(stmts_to_expr(perMatchStmts), postTakeWhereCond)
+        perMatchStmts |> clear
+        perMatchStmts |> push(gated)
+    }
     wrap_with_ranges(perMatchStmts, skipExpr, takeExpr, skipWhileCond, takeWhileCond, names)
     var loopBody = prepend_precond(wrap_with_condition(stmts_to_expr(perMatchStmts), whereCond), preCondStmts)
     // Collect all body statements into one list so they share scope when spliced via $b.
@@ -789,6 +796,7 @@ def private emit_early_exit_lane(
                                  var topExprs : array<Expression?>;
                                  var projection : Expression?;
                                  var whereCond : Expression?;
+                                 var postTakeWhereCond : Expression?;
                                  var intermediateBinds : array<Expression?>;
                                  var preCondStmts : array<Expression?>;
                                  var elementType : TypeDeclPtr;
@@ -1088,6 +1096,12 @@ def private emit_early_exit_lane(
         return null
     }
     prepend_binds(perMatchStmts, intermediateBinds)
+    // Theme 2 5c: post-take where wraps the per-match work BEFORE the take cap so the cap still ticks unconditionally per iteration.
+    if (postTakeWhereCond != null) {
+        var gated = wrap_with_condition(stmts_to_expr(perMatchStmts), postTakeWhereCond)
+        perMatchStmts |> clear
+        perMatchStmts |> push(gated)
+    }
     wrap_with_ranges(perMatchStmts, skipExpr, takeExpr, skipWhileCond, takeWhileCond, names)
     var loopBody = prepend_precond(wrap_with_condition(stmts_to_expr(perMatchStmts), whereCond), preCondStmts)
     // Single-$b body so all stmts (skip/take counters + prelude + for + tail) share scope
@@ -1243,6 +1257,8 @@ def private plan_order_family(var expr : Expression?) : Expression? {
     var firstName : string
     var firstDefaultExpr : Expression?
     var hasOrder = false
+    var selectLam : Expression?
+    var selectElemType : TypeDeclPtr
     let at = calls[0]._0.at
     let itName = qn("it", at)
     for (i in 0 .. length(calls)) {
@@ -1274,13 +1290,20 @@ def private plan_order_family(var expr : Expression?) : Expression? {
             if (arg == null || arg._type == null || arg._type.baseType != Type.tInt) return null
             takeExpr = clone_expression(arg)
         } elif (name == "first" || name == "first_or_default") {
-            // order + first → min/max (O(N) instead of sort + index). Must be terminal.
+            // order + first → min/max (O(N) instead of sort + index). Must be terminal (no select after).
             if (!hasOrder || takeExpr != null || firstName != "" || i != length(calls) - 1) return null
             firstName = name
             if (name == "first_or_default") {
                 if ((cll._0.arguments |> length) < 2) return null
                 firstDefaultExpr = clone_expression(cll._0.arguments[1])
             }
+        } elif (name == "select") {
+            // Terminal _select after take/first: project at return, heap cmp stays on source type.
+            if (i != length(calls) - 1 || !hasOrder
+                    || cll._0._type == null || cll._0._type.firstType == null) return null
+            selectLam = cll._0.arguments[1]
+            if (selectLam == null) return null
+            selectElemType = clone_type(cll._0._type.firstType)
         } else {
             return null
         }
@@ -1299,6 +1322,8 @@ def private plan_order_family(var expr : Expression?) : Expression? {
     // Streaming-min / bounded-heap fast paths (mirror of plan_decs_order_family). When the key is inline-able, skip the materialize-all + min_by/top_n* dispatch in favor of a per-walk state (single best for first[_or_default], heap of size N for take). For first[_or_default]: avoids the per-element `invoke(keyLambda, x)` cost in min_by_impl (~28 ns/op win on 100K-row sort_first). For take(N): avoids materializing the full filtered set before top_n_by (~7-9 ns/op win on sort_take / select_where_order_take).
     let useBoundedHeap = takeExpr != null && inlineCmp != null && firstName == ""
     let useStreamingMin = firstName != "" && inlineCmp != null
+    // Terminal _select only splices on inline-cmp / where_+order paths; direct calls would re-emit the cascade.
+    if (selectLam != null && !useStreamingMin && !useBoundedHeap && whereCond == null) return null
     if (useStreamingMin) {
         let bestName = qn("order_best", at)
         let seenName = qn("order_seen", at)
@@ -1325,27 +1350,43 @@ def private plan_order_family(var expr : Expression?) : Expression? {
             }
         }
         var emission : Expression?
+        let outElemType = (selectLam != null) ? selectElemType : elemType
         if (firstName == "first") {
-            emission = qmacro(invoke($($i(srcName) : $t(srcParamType)) : $t(elemType) {
+            var firstRetExpr : Expression?
+            if (selectLam != null) {
+                firstRetExpr = peel_lambda_replace_var(selectLam, qmacro($i(bestName)))
+            } else {
+                firstRetExpr = qmacro($i(bestName))
+            }
+            emission = qmacro(invoke($($i(srcName) : $t(srcParamType)) : $t(outElemType) {
                 var $i(bestName) = default<$t(elemType)>
                 var $i(seenName) = false
                 for ($i(itName) in $i(srcName)) {
                     $e(perElement)
                 }
                 panic("sequence contains no elements") if (!$i(seenName))
-                return $i(bestName)
+                return $e(firstRetExpr)
             }, $e(topExpr)))
         } else {
             let dBindName = qn("order_d", at)
-            emission = qmacro(invoke($($i(srcName) : $t(srcParamType)) : $t(elemType) {
+            var bestRetExpr : Expression?
+            var dRetExpr : Expression?
+            if (selectLam != null) {
+                bestRetExpr = peel_lambda_replace_var(selectLam, qmacro($i(bestName)))
+                dRetExpr = peel_lambda_replace_var(selectLam, qmacro($i(dBindName)))
+            } else {
+                bestRetExpr = qmacro($i(bestName))
+                dRetExpr = qmacro($i(dBindName))
+            }
+            emission = qmacro(invoke($($i(srcName) : $t(srcParamType)) : $t(outElemType) {
                 let $i(dBindName) = $e(firstDefaultExpr)
                 var $i(bestName) = default<$t(elemType)>
                 var $i(seenName) = false
                 for ($i(itName) in $i(srcName)) {
                     $e(perElement)
                 }
-                return $i(bestName) if ($i(seenName))
-                return $i(dBindName)
+                return $e(bestRetExpr) if ($i(seenName))
+                return $e(dRetExpr)
             }, $e(topExpr)))
         }
         return finalize_invoke(emission, at)
@@ -1378,16 +1419,39 @@ def private plan_order_family(var expr : Expression?) : Expression? {
             }
         }
         // No `reserve(takeN)` on the bounded buf — matches the upstream top_n_by_with_cmp iterator-variant policy (linq.das:482-484). Caller may pass takeN >> actual source size, so pre-reserving N risks a large upfront allocation for no win.
-        var emission : Expression? = qmacro(invoke($($i(srcName) : $t(srcParamType)) : array<$t(bufElemType)> {
-            let $i(takeNName) = $e(takeExpr)
-            var $i(bhBufName) : array<$t(bufElemType)>
-            return <- $i(bhBufName) if ($i(takeNName) <= 0)
-            for ($i(itName) in $i(srcName)) {
-                $e(perElement)
-            }
-            _::order_inplace($i(bhBufName), $e(inlineCmp))
-            return <- $i(bhBufName)
-        }, $e(topExpr)))
+        var emission : Expression?
+        if (selectLam != null) {
+            // Terminal _select projects ≤K heap survivors at return (heap holds raw type for cmp).
+            let outBufName = qn("order_proj_buf", at)
+            let elemName = qn("order_proj_e", at)
+            var projBody = peel_lambda_replace_var(selectLam, qmacro($i(elemName)))
+            emission = qmacro(invoke($($i(srcName) : $t(srcParamType)) : array<$t(selectElemType)> {
+                let $i(takeNName) = $e(takeExpr)
+                var $i(bhBufName) : array<$t(bufElemType)>
+                var $i(outBufName) : array<$t(selectElemType)>
+                return <- $i(outBufName) if ($i(takeNName) <= 0)
+                for ($i(itName) in $i(srcName)) {
+                    $e(perElement)
+                }
+                _::order_inplace($i(bhBufName), $e(inlineCmp))
+                $i(outBufName) |> reserve(length($i(bhBufName)))
+                for ($i(elemName) in $i(bhBufName)) {
+                    $i(outBufName) |> push_clone($e(projBody))
+                }
+                return <- $i(outBufName)
+            }, $e(topExpr)))
+        } else {
+            emission = qmacro(invoke($($i(srcName) : $t(srcParamType)) : array<$t(bufElemType)> {
+                let $i(takeNName) = $e(takeExpr)
+                var $i(bhBufName) : array<$t(bufElemType)>
+                return <- $i(bhBufName) if ($i(takeNName) <= 0)
+                for ($i(itName) in $i(srcName)) {
+                    $e(perElement)
+                }
+                _::order_inplace($i(bhBufName), $e(inlineCmp))
+                return <- $i(bhBufName)
+            }, $e(topExpr)))
+        }
         if (needIterWrap) {
             emission = qmacro($e(emission).to_sequence_move())
         }
@@ -1490,6 +1554,13 @@ def private plan_order_family(var expr : Expression?) : Expression? {
             $e(loopBody)
         }
     }
+    // Terminal _select projects at return; buffer/scalar carries source type so cmp/sort sees raw.
+    let elemName = qn("order_proj_e", at)
+    let outBufName = qn("order_proj_buf", at)
+    var projBody : Expression?
+    if (selectLam != null) {
+        projBody = peel_lambda_replace_var(selectLam, qmacro($i(elemName)))
+    }
     if (firstName == "first") {
         // where + order + first → min/max on prefilter buffer. Empty buf must panic to match eager `first()` semantics; min/max return uninitialized refs on empty.
         stmts |> push <| qmacro_expr() {
@@ -1501,8 +1572,15 @@ def private plan_order_family(var expr : Expression?) : Expression? {
         } else {
             minMaxCall = qmacro($c(minMaxName)($i(bufName)))
         }
-        stmts |> push <| qmacro_expr() {
-            return $e(minMaxCall)
+        if (selectLam != null) {
+            stmts |> push_from <| qmacro_block_to_array() {
+                let $i(elemName) = $e(minMaxCall)
+                return $e(projBody)
+            }
+        } else {
+            stmts |> push <| qmacro_expr() {
+                return $e(minMaxCall)
+            }
         }
     } elif (firstName == "first_or_default") {
         // No min_by_or_default helper exists; route through top_n*(_, 1, _) + first_or_default for the empty-buf case.
@@ -1514,8 +1592,18 @@ def private plan_order_family(var expr : Expression?) : Expression? {
         } else {
             topNCall = qmacro($c(topNName)($i(bufName), 1))
         }
-        stmts |> push <| qmacro_expr() {
-            return _::first_or_default($e(topNCall), $e(firstDefaultExpr))
+        if (selectLam != null) {
+            // first_or_default + select: bind default once (side-effect order), project both branches.
+            let dBindName = qn("order_d", at)
+            stmts |> push_from <| qmacro_block_to_array() {
+                let $i(dBindName) = $e(firstDefaultExpr)
+                let $i(elemName) = _::first_or_default($e(topNCall), $i(dBindName))
+                return $e(projBody)
+            }
+        } else {
+            stmts |> push <| qmacro_expr() {
+                return _::first_or_default($e(topNCall), $e(firstDefaultExpr))
+            }
         }
     } elif (takeExpr == null) {
         // Sort the prefilter buffer in place and return it. order*_inplace is void
@@ -1529,8 +1617,19 @@ def private plan_order_family(var expr : Expression?) : Expression? {
             sortCall = qmacro($c(inplaceName)($i(bufName)))
         }
         stmts |> push(sortCall)
-        stmts |> push <| qmacro_expr() {
-            return <- $i(bufName)
+        if (selectLam != null) {
+            stmts |> push_from <| qmacro_block_to_array() {
+                var $i(outBufName) : array<$t(selectElemType)>
+                $i(outBufName) |> reserve(length($i(bufName)))
+                for ($i(elemName) in $i(bufName)) {
+                    $i(outBufName) |> push_clone($e(projBody))
+                }
+                return <- $i(outBufName)
+            }
+        } else {
+            stmts |> push <| qmacro_expr() {
+                return <- $i(bufName)
+            }
         }
     } else {
         // top_n* on the prefilter buffer.
@@ -1542,8 +1641,21 @@ def private plan_order_family(var expr : Expression?) : Expression? {
         } else {
             topNCall = qmacro($c(topNName)($i(bufName), $e(takeExpr)))
         }
-        stmts |> push <| qmacro_expr() {
-            return <- $e(topNCall)
+        if (selectLam != null) {
+            let topResName = qn("order_top_res", at)
+            stmts |> push_from <| qmacro_block_to_array() {
+                var $i(topResName) <- $e(topNCall)
+                var $i(outBufName) : array<$t(selectElemType)>
+                $i(outBufName) |> reserve(length($i(topResName)))
+                for ($i(elemName) in $i(topResName)) {
+                    $i(outBufName) |> push_clone($e(projBody))
+                }
+                return <- $i(outBufName)
+            }
+        } else {
+            stmts |> push <| qmacro_expr() {
+                return <- $e(topNCall)
+            }
         }
     }
     var bodyBlock = new ExprBlock(at = at)
@@ -1578,6 +1690,8 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
     let accName = qn("acc", at)
     let names <- make_range_names(at)
     var whereCond : Expression?
+    // postTakeWhereCond — Theme 2 5c: gates per-element contribution AFTER the take cap fires. Distinct from whereCond (which wraps the entire take/skip body); this preserves take.where semantics ("first N elements, then filter") that auto-rewriting can't reproduce.
+    var postTakeWhereCond : Expression?
     var projection : Expression?
     var intermediateBinds : array<Expression?>
     // preConditionStmts evaluate UNCONDITIONALLY per element, BEFORE the where filter —
@@ -1599,8 +1713,8 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         var cll & = unsafe(calls[i])
         let opName = cll._1.name
         if (opName == "where_") {
-            // skip/take/skip_while/take_while-after-where is rejected — canonical chain order is
-            if (seenSkip || seenSkipWhile || seenTakeWhile || seenTake) return null
+            // Theme 2 5c — `take(N)._where(p)` allowed (routed to postTakeWhereCond, gates contribution only); other prior range ops still bail; single post-take where in v1.
+            if (seenSkip || seenSkipWhile || seenTakeWhile || (seenTake && postTakeWhereCond != null)) return null
             var predicate : Expression?
             if (seenSelect) {
                 // Phase 3d / single-eval: where-after-select. Bind the current projection
@@ -1621,7 +1735,9 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
             } else {
                 predicate = peel_lambda_rename_var(cll._0.arguments[1], itName)
             }
-            if (whereCond == null) {
+            if (seenTake) {
+                postTakeWhereCond = predicate
+            } elif (whereCond == null) {
                 whereCond = predicate
             } else {
                 whereCond = qmacro($e(whereCond) && $e(predicate))
@@ -1694,7 +1810,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         laneTops |> push(top)
         var laneSrcs : array<string>
         laneSrcs |> push(srcName)
-        return emit_accumulator_lane(lastName, laneTops, projection, whereCond,
+        return emit_accumulator_lane(lastName, laneTops, projection, whereCond, postTakeWhereCond,
             intermediateBinds, preCondStmts, elementType, laneSrcs, accName, itName, names,
             skipExpr, takeExpr, skipWhileCond, takeWhileCond, at)
     }
@@ -1709,7 +1825,7 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
         laneTops |> push(top)
         var laneSrcs : array<string>
         laneSrcs |> push(srcName)
-        return emit_early_exit_lane(lastName, laneTops, projection, whereCond,
+        return emit_early_exit_lane(lastName, laneTops, projection, whereCond, postTakeWhereCond,
             intermediateBinds, preCondStmts, elementType, terminatorCall, laneSrcs, itName, names,
             skipExpr, takeExpr, skipWhileCond, takeWhileCond, at)
     }
@@ -1724,29 +1840,34 @@ def private plan_loop_or_count(var expr : Expression?) : Expression? {
                 var $i(finalBindName) = $e(projection)
             }
         }
-        stmts |> push <| qmacro_expr() {
+        // Theme 2 5c: when postTakeWhereCond is set, gate JUST the acc++ — the take cap still ticks unconditionally above.
+        var incExpr = qmacro_expr() {
             $i(accName) ++
         }
+        stmts |> push(wrap_with_condition(incExpr, postTakeWhereCond))
         prepend_binds(stmts, intermediateBinds)
         wrap_with_ranges(stmts, skipExpr, takeExpr, skipWhileCond, takeWhileCond, names)
         loopBody = prepend_precond(wrap_with_condition(stmts_to_expr(stmts), whereCond), preCondStmts)
     } else {
         // Array lane. `push_clone` is the safe append everywhere: for workhorse types it's a
         var stmts : array<Expression?>
+        var pushExpr : Expression?
         if (projection != null) {
-            stmts |> push <| qmacro_expr() {
+            pushExpr = qmacro_expr() {
                 $i(accName) |> push_clone($e(projection))
             }
-        } elif (whereCond != null || skipExpr != null || takeExpr != null
+        } elif (whereCond != null || postTakeWhereCond != null || skipExpr != null || takeExpr != null
                 || skipWhileCond != null || takeWhileCond != null) {
             // Identity push: `it` aliases the source element. Reached when chain is bare
-            stmts |> push <| qmacro_expr() {
+            pushExpr = qmacro_expr() {
                 $i(accName) |> push_clone($i(itName))
             }
         } else {
             // identity chain — nothing to fuse; let the caller fall through.
             return null
         }
+        // Theme 2 5c: postTakeWhereCond gates JUST the push — same shape as counter lane.
+        stmts |> push(wrap_with_condition(pushExpr, postTakeWhereCond))
         prepend_binds(stmts, intermediateBinds)
         wrap_with_ranges(stmts, skipExpr, takeExpr, skipWhileCond, takeWhileCond, names)
         loopBody = prepend_precond(wrap_with_condition(stmts_to_expr(stmts), whereCond), preCondStmts)
@@ -1781,6 +1902,8 @@ def private plan_reverse(var expr : Expression?) : Expression? {
     var hasReverse = false
     var seenSelect = false
     var takeExpr : Expression?
+    var terminalSelectLam : Expression?
+    var terminalSelectElemType : TypeDeclPtr
     let at = calls[0]._0.at
     let srcName = qn("source", at)
     let itName  = qn("it", at)
@@ -1797,9 +1920,19 @@ def private plan_reverse(var expr : Expression?) : Expression? {
             if (hasReverse || seenSelect) return null
             whereCond = merge_where_cond(whereCond, peel_lambda_rename_var(cll._0.arguments[1], itName))
         } elif (name == "select") {
-            if (hasReverse || seenSelect) return null
-            seenSelect = true
-            projection = peel_lambda_rename_var(cll._0.arguments[1], itName)
+            if (!hasReverse && !seenSelect) {
+                // Pre-reverse select: existing path (buffer holds projected values).
+                seenSelect = true
+                projection = peel_lambda_rename_var(cll._0.arguments[1], itName)
+            } elif (hasReverse && !seenSelect && terminalSelectLam == null && i == length(calls) - 1) {
+                // Terminal post-reverse select: project at return (R1-R4 buf or first scalar).
+                terminalSelectLam = cll._0.arguments[1]
+                if (terminalSelectLam == null
+                        || cll._0._type == null || cll._0._type.firstType == null) return null
+                terminalSelectElemType = clone_type(cll._0._type.firstType)
+            } else {
+                return null
+            }
         } elif (name == "reverse") {
             if (hasReverse) return null
             hasReverse = true
@@ -1813,7 +1946,9 @@ def private plan_reverse(var expr : Expression?) : Expression? {
             return null
         }
     }
-    if (!hasReverse || (takeExpr != null && terminatorName != "")) return null
+    // count after a terminal _select would skip the projection and drop its side effects (the counts match only when the select is pure). Not handled here: bail.
+    if (!hasReverse || (takeExpr != null && terminatorName != "")
+            || (terminalSelectLam != null && terminatorName == "count")) return null
     var body : Expression?
     if (terminatorName == "count") {
         // Reverse is identity for count — counter loop, no buffer. Side-effecting projection still fires per match.
@@ -1853,6 +1988,16 @@ def private plan_reverse(var expr : Expression?) : Expression? {
             $i(foundName) = true
         }
         var perElement = wrap_with_condition(matchBlock, whereCond)
+        // Terminal _select: `last` stays source-typed; the projection is applied to it (and to the default) at return.
+        var lastRetExpr : Expression?
+        var dRetExpr : Expression?
+        if (terminalSelectLam != null) {
+            lastRetExpr = peel_lambda_replace_var(terminalSelectLam, qmacro($i(lastName)))
+            dRetExpr = peel_lambda_replace_var(terminalSelectLam, qmacro($i(dBindName)))
+        } else {
+            lastRetExpr = qmacro($i(lastName))
+            dRetExpr = qmacro($i(dBindName))
+        }
         if (terminatorName == "first") {
             body = qmacro_block() {
                 var $i(foundName) = false
@@ -1863,7 +2008,7 @@ def private plan_reverse(var expr : Expression?) : Expression? {
                 if (!$i(foundName)) {
                     panic("sequence contains no elements")
                 }
-                return $i(lastName)
+                return $e(lastRetExpr)
             }
         } else {
             body = qmacro_block() {
@@ -1873,14 +2018,22 @@ def private plan_reverse(var expr : Expression?) : Expression? {
                 for ($i(itName) in $i(srcName)) {
                     $e(perElement)
                 }
-                return $i(foundName) ? $i(lastName) : $i(dBindName)
+                return $i(foundName) ? $e(lastRetExpr) : $e(dRetExpr)
             }
         }
     } else {
         // R1-R4 path: buffer + reverse_inplace + optional resize + return buffer.
         let needIterWrap = expr._type.isIterator
         var bufElemType = strip_const_ref(clone_type(reverseCall._type.firstType))
+        // Terminal _select projects buffer survivors at return (after resize trims to take(N)).
+        let outBufName = qn("rev_proj_buf", at)
+        let elemName = qn("rev_proj_e", at)
+        var projBody : Expression?
+        if (terminalSelectLam != null) {
+            projBody = peel_lambda_replace_var(terminalSelectLam, qmacro($i(elemName)))
+        }
         let canBackwardIndex = (takeExpr != null && projection == null && whereCond == null
+                && terminalSelectLam == null
                 && (top._type.isGoodArrayType || top._type.isArray))
         if (canBackwardIndex) {
             // R6: visit only the last takeN indices — skips full-source push + O(length) reverse_inplace.
@@ -1923,16 +2076,36 @@ def private plan_reverse(var expr : Expression?) : Expression? {
                     $i(bufName) |> resize($e(takeExpr) <= 0 ? 0 : ($e(takeExpr) < length($i(bufName)) ? $e(takeExpr) : length($i(bufName))))
                 }
             }
-            var returnExpr = buffer_return(bufName, needIterWrap)
-            body = qmacro_block() {
-                var $i(bufName) : array<$t(bufElemType)>
-                $b(reserveStmts)
-                for ($i(itName) in $i(srcName)) {
-                    $e(pushExpr)
+            if (terminalSelectLam != null) {
+                // Post-reverse projection: outBuf returned in place of bufName.
+                var returnExpr = buffer_return(outBufName, needIterWrap)
+                body = qmacro_block() {
+                    var $i(bufName) : array<$t(bufElemType)>
+                    $b(reserveStmts)
+                    for ($i(itName) in $i(srcName)) {
+                        $e(pushExpr)
+                    }
+                    _::reverse_inplace($i(bufName))
+                    $b(resizeStmts)
+                    var $i(outBufName) : array<$t(terminalSelectElemType)>
+                    $i(outBufName) |> reserve(length($i(bufName)))
+                    for ($i(elemName) in $i(bufName)) {
+                        $i(outBufName) |> push_clone($e(projBody))
+                    }
+                    $e(returnExpr)
+                }
+            } else {
+                var returnExpr = buffer_return(bufName, needIterWrap)
+                body = qmacro_block() {
+                    var $i(bufName) : array<$t(bufElemType)>
+                    $b(reserveStmts)
+                    for ($i(itName) in $i(srcName)) {
+                        $e(pushExpr)
+                    }
+                    _::reverse_inplace($i(bufName))
+                    $b(resizeStmts)
+                    $e(returnExpr)
                 }
-                _::reverse_inplace($i(bufName))
-                $b(resizeStmts)
-                $e(returnExpr)
             }
         }
     }
@@ -2730,6 +2903,7 @@ def private plan_group_by_core(var calls : array<tuple<ExprCall?; LinqCall?>>;
                                var keyBlock : Expression?;
                                var groupProjCall : ExprCall?;
                                var havingCall : ExprCall?;
+                               var trailingWhereCall : ExprCall?;
                                terminatorName : string;
                                exprIsIterator : bool;
                                at : LineInfo;
@@ -2745,6 +2919,7 @@ def private plan_group_by_core(var calls : array<tuple<ExprCall?; LinqCall?>>;
     let bindName   = qn("{prefix}gpb", at)
     let cntName    = qn("{prefix}cnt", at)
     let dummyName  = qn("{prefix}dummy", at)
+    let outName    = qn("{prefix}out", at)
     // Walk upstream where_/select* into segments. Each where guards everything AFTER it; a select after a where flushes a new segment so the projection bind lives inside the where's guard.
     var segBinds : array<array<Expression?>>
     var segWheres : array<Expression?>
@@ -2816,6 +2991,12 @@ def private plan_group_by_core(var calls : array<tuple<ExprCall?; LinqCall?>>;
         havingPred = rewrite_having_pred(rawPred, hbName, kvName, specs)
         if (havingPred == null || expr_uses_var(havingPred, hbName)) return null
     }
+    // Peel optional trailing _where (Theme 2). Predicate references the post-aggregate output tuple via outName.
+    var trailingWherePred : Expression?
+    if (trailingWhereCall != null) {
+        trailingWherePred = peel_lambda_rename_var(trailingWhereCall.arguments[1], outName)
+        if (trailingWherePred == null) return null
+    }
     let hasHidden = (specs |> length) > userVisibleSlotCount
     // Bare reducer + hidden slot needs typedecl(invoke(...)) growth inside a qmacro — can't grow that dynamically. Cascade.
     if (hasHidden && !usesNamedTuple) return null
@@ -2929,28 +3110,11 @@ def private plan_group_by_core(var calls : array<tuple<ExprCall?; LinqCall?>>;
     }
     // Adapter-specific source loop emission (for(it in src) for array, for_each_archetype + inner for for decs).
     stmts |> push(adapter_emit_source_loop(adapter, body, at))
-    // Terminator emission + retType derivation.
-    var retType : TypeDeclPtr
-    if (terminatorName == "count") {
-        retType = new TypeDecl(baseType = Type.tInt)
-        if (havingPred != null) {
-            stmts |> push <| qmacro_block() {
-                var $i(cntName) = 0
-                for ($i(kvName) in values($i(tabName))) {
-                    if ($e(havingPred)) {
-                        $i(cntName) ++
-                    }
-                }
-                return $i(cntName)
-            }
-        } else {
-            stmts |> push <| qmacro_expr() {
-                return length($i(tabName))
-            }
-        }
-    } else {
-        // to_array lane (implicit when no count terminator): build result buffer by walking the table.
-        var outputExpr : Expression?
+    // Compute output tuple + bufElemType — needed by to_array always, and by count when trailingWherePred is present (predicate is bound against the constructed output).
+    let needOutput = terminatorName != "count" || trailingWherePred != null
+    var outputExpr : Expression?
+    var bufElemType : TypeDeclPtr
+    if (needOutput) {
         if (!usesNamedTuple) {
             if (hasAvg) {
                 outputExpr = mk_avg_divide_expr(at, kvName, 1)
@@ -2989,12 +3153,57 @@ def private plan_group_by_core(var calls : array<tuple<ExprCall?; LinqCall?>>;
                 $i(kvName)
             }
         }
-        var bufElemType = clone_type(groupProjBody._type)
+        bufElemType = clone_type(groupProjBody._type)
         if (bufElemType != null) {
             bufElemType.flags.constant = false
             bufElemType.flags.ref = false
         }
-        // retType matches exprIsIterator: iterator<bufElemType> via to_sequence_move(buf) tail, or array<bufElemType> via raw buf return. Both paths use buffer_return(bufName, exprIsIterator) for the final stmt.
+    }
+    // Terminator emission + retType derivation.
+    var retType : TypeDeclPtr
+    if (terminatorName == "count") {
+        retType = new TypeDecl(baseType = Type.tInt)
+        if (havingPred != null && trailingWherePred != null) {
+            stmts |> push <| qmacro_block() {
+                var $i(cntName) = 0
+                for ($i(kvName) in values($i(tabName))) {
+                    if ($e(havingPred)) {
+                        let $i(outName) : $t(bufElemType) = $e(outputExpr)
+                        if ($e(trailingWherePred)) {
+                            $i(cntName) ++
+                        }
+                    }
+                }
+                return $i(cntName)
+            }
+        } elif (trailingWherePred != null) {
+            stmts |> push <| qmacro_block() {
+                var $i(cntName) = 0
+                for ($i(kvName) in values($i(tabName))) {
+                    let $i(outName) : $t(bufElemType) = $e(outputExpr)
+                    if ($e(trailingWherePred)) {
+                        $i(cntName) ++
+                    }
+                }
+                return $i(cntName)
+            }
+        } elif (havingPred != null) {
+            stmts |> push <| qmacro_block() {
+                var $i(cntName) = 0
+                for ($i(kvName) in values($i(tabName))) {
+                    if ($e(havingPred)) {
+                        $i(cntName) ++
+                    }
+                }
+                return $i(cntName)
+            }
+        } else {
+            stmts |> push <| qmacro_expr() {
+                return length($i(tabName))
+            }
+        }
+    } else {
+        // to_array lane: walk table → buf; iterator-typed context wraps via buffer_return(..., true).
         if (exprIsIterator) {
             retType = new TypeDecl(baseType = Type.tIterator)
             retType.firstType = clone_type(bufElemType)
@@ -3006,7 +3215,27 @@ def private plan_group_by_core(var calls : array<tuple<ExprCall?; LinqCall?>>;
             var $i(bufName) : array<$t(bufElemType)>
             $i(bufName) |> reserve(length($i(tabName)))
         }
-        if (havingPred != null) {
+        if (havingPred != null && trailingWherePred != null) {
+            stmts |> push <| qmacro_expr() {
+                for ($i(kvName) in values($i(tabName))) {
+                    if ($e(havingPred)) {
+                        let $i(outName) : $t(bufElemType) = $e(outputExpr)
+                        if ($e(trailingWherePred)) {
+                            $i(bufName) |> push_clone($i(outName))
+                        }
+                    }
+                }
+            }
+        } elif (trailingWherePred != null) {
+            stmts |> push <| qmacro_expr() {
+                for ($i(kvName) in values($i(tabName))) {
+                    let $i(outName) : $t(bufElemType) = $e(outputExpr)
+                    if ($e(trailingWherePred)) {
+                        $i(bufName) |> push_clone($i(outName))
+                    }
+                }
+            }
+        } elif (havingPred != null) {
             stmts |> push <| qmacro_expr() {
                 for ($i(kvName) in values($i(tabName))) {
                     if ($e(havingPred)) {
@@ -3042,6 +3271,13 @@ def private plan_group_by(var expr : Expression?) : Expression? {
             calls |> pop
         }
     }
+    // Optional: trailing _where AFTER _select(reducer) — SQL HAVING shape, predicate on post-aggregate tuple. Theme 2 (closes audit 4a). Distinct from `having_` (which lives between group_by_lazy and select and can lift hidden reducer slots): trailing _where binds the constructed output tuple and gates the buf-emit loop.
+    var trailingWhereCall : ExprCall?
+    if (!empty(calls) && calls.back()._1.name == "where_") {
+        trailingWhereCall = calls.back()._0
+        if ((trailingWhereCall.arguments |> length) < 2) return null
+        calls |> pop
+    }
     // Required tail: select(group_proj) — without it the chain yields raw buckets (no fusion).
     if (empty(calls) || calls.back()._1.name != "select") return null
     var groupProjCall = calls.back()._0
@@ -3068,7 +3304,7 @@ def private plan_group_by(var expr : Expression?) : Expression? {
         arraySrcName := qn("source", at),
         decsBridge = null
     )
-    return plan_group_by_core(calls, keyBlock, groupProjCall, havingCall, terminatorName, expr._type.isIterator, at, adapter)
+    return plan_group_by_core(calls, keyBlock, groupProjCall, havingCall, trailingWhereCall, terminatorName, expr._type.isIterator, at, adapter)
 }
 
 // ── decs eager-bridge unroll (Approach Z — for_each_archetype + nested _fold) ───────
@@ -4512,6 +4748,13 @@ def private plan_decs_group_by(var expr : Expression?) : Expression? {
             calls |> pop
         }
     }
+    // Optional: trailing _where AFTER _select(reducer) — SQL HAVING shape on post-aggregate tuple. Theme 2 (closes audit 4e). Decs mirror of the array-side pop.
+    var trailingWhereCall : ExprCall?
+    if (!empty(calls) && calls.back()._1.name == "where_") {
+        trailingWhereCall = calls.back()._0
+        if ((trailingWhereCall.arguments |> length) < 2) return null
+        calls |> pop
+    }
     // Required tail: select(group_proj) — without it the chain yields raw buckets (no fusion).
     if (empty(calls) || calls.back()._1.name != "select") return null
     var groupProjCall = calls.back()._0
@@ -4538,7 +4781,7 @@ def private plan_decs_group_by(var expr : Expression?) : Expression? {
         arraySrcName := "",
         decsBridge = bridge
     )
-    return plan_group_by_core(calls, keyBlock, groupProjCall, havingCall, terminatorName, expr._type.isIterator, at, adapter)
+    return plan_group_by_core(calls, keyBlock, groupProjCall, havingCall, trailingWhereCall, terminatorName, expr._type.isIterator, at, adapter)
 }
 
 // ── decs order family splice (Slice 5d — buffer + order_inplace/top_n/min_by/max_by) ───────
@@ -4559,6 +4802,8 @@ def private plan_decs_order_family(var expr : Expression?) : Expression? {
     var takeExpr : Expression?
     var firstName : string = ""
     var firstDefaultExpr : Expression?
+    var selectLam : Expression?
+    var selectElemType : TypeDeclPtr
     for (i in 0 .. length(calls)) {
         var cll & = unsafe(calls[i])
         let name = cll._1.name
@@ -4591,6 +4836,13 @@ def private plan_decs_order_family(var expr : Expression?) : Expression? {
                 if ((cll._0.arguments |> length) < 2) return null
                 firstDefaultExpr = clone_expression(cll._0.arguments[1])
             }
+        } elif (name == "select") {
+            // Terminal _select after the order op (with or without take(N)): heap/sort sees the raw tuple; project at return.
+            if (i != length(calls) - 1 || !hasOrder
+                    || cll._0._type == null || cll._0._type.firstType == null) return null
+            selectLam = cll._0.arguments[1]
+            if (selectLam == null) return null
+            selectElemType = clone_type(cll._0._type.firstType)
         } else {
             return null
         }
@@ -4681,21 +4933,46 @@ def private plan_decs_order_family(var expr : Expression?) : Expression? {
         }
         var forExprNode = build_decs_inner_for_pruned(bridge, tupName, perElement, at)
         var bhStmts : array<Expression?>
-        bhStmts |> reserve(7)
+        bhStmts |> reserve(10)
         // No `reserve(takeN)` on the bounded buf — matches the policy in linq.das top_n_by_with_cmp iterator variant. Caller may pass takeN >> actual source size, and the decs cardinality is unknown ahead of the walk; pre-reserving N slots would risk a large upfront allocation for no win (fill phase grows geometrically to min(N, M) in O(log) reallocs anyway).
-        bhStmts |> push_from <| qmacro_block_to_array() {
-            let $i(takeNName) = $e(takeExpr)
-            var $i(bufName) : array<$t(elemType)>
-            return <- $i(bufName) if ($i(takeNName) <= 0)
-            for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
-                $e(forExprNode)
-            })
-            _::order_inplace($i(bufName), $e(inlineCmp))
-            return <- $i(bufName)
+        if (selectLam != null) {
+            // Terminal _select projects ≤K heap survivors at return (heap holds raw tuples).
+            let outBufName = qn("decs_proj_buf", at)
+            let elemName = qn("decs_proj_e", at)
+            var projBody = peel_lambda_replace_var(selectLam, qmacro($i(elemName)))
+            bhStmts |> push_from <| qmacro_block_to_array() {
+                let $i(takeNName) = $e(takeExpr)
+                var $i(bufName) : array<$t(elemType)>
+                var $i(outBufName) : array<$t(selectElemType)>
+                return <- $i(outBufName) if ($i(takeNName) <= 0)
+                for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
+                    $e(forExprNode)
+                })
+                _::order_inplace($i(bufName), $e(inlineCmp))
+                $i(outBufName) |> reserve(length($i(bufName)))
+                for ($i(elemName) in $i(bufName)) {
+                    $i(outBufName) |> push_clone($e(projBody))
+                }
+                return <- $i(outBufName)
+            }
+            emission = qmacro(invoke($() : array<$t(selectElemType)> {
+                $b(bhStmts)
+            }))
+        } else {
+            bhStmts |> push_from <| qmacro_block_to_array() {
+                let $i(takeNName) = $e(takeExpr)
+                var $i(bufName) : array<$t(elemType)>
+                return <- $i(bufName) if ($i(takeNName) <= 0)
+                for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
+                    $e(forExprNode)
+                })
+                _::order_inplace($i(bufName), $e(inlineCmp))
+                return <- $i(bufName)
+            }
+            emission = qmacro(invoke($() : array<$t(elemType)> {
+                $b(bhStmts)
+            }))
         }
-        emission = qmacro(invoke($() : array<$t(elemType)> {
-            $b(bhStmts)
-        }))
         return finalize_decs_emission(emission, at, needIterWrap)
     }
     var perElement : Expression? = qmacro_expr() {
@@ -4769,12 +5046,29 @@ def private plan_decs_order_family(var expr : Expression?) : Expression? {
             sortCall = qmacro($c(inplaceName)($i(bufName)))
         }
         bodyStmts |> push(sortCall)
-        bodyStmts |> push <| qmacro_expr() {
-            return <- $i(bufName)
+        if (selectLam != null) {
+            let outBufName = qn("decs_proj_buf", at)
+            let elemName = qn("decs_proj_e", at)
+            var projBody = peel_lambda_replace_var(selectLam, qmacro($i(elemName)))
+            bodyStmts |> push_from <| qmacro_block_to_array() {
+                var $i(outBufName) : array<$t(selectElemType)>
+                $i(outBufName) |> reserve(length($i(bufName)))
+                for ($i(elemName) in $i(bufName)) {
+                    $i(outBufName) |> push_clone($e(projBody))
+                }
+                return <- $i(outBufName)
+            }
+            emission = qmacro(invoke($() : array<$t(selectElemType)> {
+                $b(bodyStmts)
+            }))
+        } else {
+            bodyStmts |> push <| qmacro_expr() {
+                return <- $i(bufName)
+            }
+            emission = qmacro(invoke($() : array<$t(elemType)> {
+                $b(bodyStmts)
+            }))
         }
-        emission = qmacro(invoke($() : array<$t(elemType)> {
-            $b(bodyStmts)
-        }))
     } else {
         // order + take → top_n* dispatch on the buffer.
         var topNCall : Expression?
@@ -4785,12 +5079,31 @@ def private plan_decs_order_family(var expr : Expression?) : Expression? {
         } else {
             topNCall = qmacro($c(topNName)($i(bufName), $e(takeExpr)))
         }
-        bodyStmts |> push <| qmacro_expr() {
-            return <- $e(topNCall)
+        if (selectLam != null) {
+            let topResName = qn("decs_top_res", at)
+            let outBufName = qn("decs_proj_buf", at)
+            let elemName = qn("decs_proj_e", at)
+            var projBody = peel_lambda_replace_var(selectLam, qmacro($i(elemName)))
+            bodyStmts |> push_from <| qmacro_block_to_array() {
+                var $i(topResName) <- $e(topNCall)
+                var $i(outBufName) : array<$t(selectElemType)>
+                $i(outBufName) |> reserve(length($i(topResName)))
+                for ($i(elemName) in $i(topResName)) {
+                    $i(outBufName) |> push_clone($e(projBody))
+                }
+                return <- $i(outBufName)
+            }
+            emission = qmacro(invoke($() : array<$t(selectElemType)> {
+                $b(bodyStmts)
+            }))
+        } else {
+            bodyStmts |> push <| qmacro_expr() {
+                return <- $e(topNCall)
+            }
+            emission = qmacro(invoke($() : array<$t(elemType)> {
+                $b(bodyStmts)
+            }))
         }
-        emission = qmacro(invoke($() : array<$t(elemType)> {
-            $b(bodyStmts)
-        }))
     }
     // Bare order + take both return array; wrap to iterator when the user's outer context demands it. first/first_or_default return scalar — no wrap.
     return finalize_decs_emission(emission, at, needIterWrap && firstName == "")
@@ -4826,6 +5139,8 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
     var hasReverse = false
     var seenSelect = false
     var takeExpr : Expression?
+    var terminalSelectLam : Expression?
+    var terminalSelectElemType : TypeDeclPtr
     for (i in 0 .. length(calls)) {
         var cll & = unsafe(calls[i])
         let name = cll._1.name
@@ -4835,10 +5150,18 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
             if (pred == null) return null
             whereCond = merge_where_cond(whereCond, pred)
         } elif (name == "select") {
-            if (hasReverse || seenSelect) return null
-            seenSelect = true
-            projection = peel_lambda_rename_var(cll._0.arguments[1], tupName)
-            if (projection == null) return null
+            if (!hasReverse && !seenSelect) {
+                seenSelect = true
+                projection = peel_lambda_rename_var(cll._0.arguments[1], tupName)
+                if (projection == null) return null
+            } elif (hasReverse && !seenSelect && terminalSelectLam == null && i == length(calls) - 1) {
+                terminalSelectLam = cll._0.arguments[1]
+                if (terminalSelectLam == null
+                        || cll._0._type == null || cll._0._type.firstType == null) return null
+                terminalSelectElemType = clone_type(cll._0._type.firstType)
+            } else {
+                return null
+            }
         } elif (name == "reverse") {
             if (hasReverse) return null
             hasReverse = true
@@ -4851,7 +5174,8 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
             return null
         }
     }
-    if (!hasReverse || (takeExpr != null && terminatorName != "")) return null
+    if (!hasReverse || (takeExpr != null && terminatorName != "")
+            || (terminalSelectLam != null && terminatorName == "count")) return null
     let archName = bridge.archName
     if (terminatorName == "count") {
         // Reverse is identity for count — counter loop, no buffer. Side-effecting projection still fires per match.
@@ -4914,9 +5238,20 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
             }
         }
         var forExprNode = build_decs_inner_for_pruned(bridge, tupName, perElement, at)
+        // Terminal _select: `last` stays source-typed; project at return.
+        let outElemType = (terminalSelectLam != null) ? terminalSelectElemType : lastType
+        var lastRetExpr : Expression?
+        var dRetExpr : Expression?
+        if (terminalSelectLam != null) {
+            lastRetExpr = peel_lambda_replace_var(terminalSelectLam, qmacro($i(lastName)))
+            dRetExpr = peel_lambda_replace_var(terminalSelectLam, qmacro($i(dBindName)))
+        } else {
+            lastRetExpr = qmacro($i(lastName))
+            dRetExpr = qmacro($i(dBindName))
+        }
         var emission : Expression?
         if (terminatorName == "first") {
-            emission = qmacro(invoke($() : $t(lastType) {
+            emission = qmacro(invoke($() : $t(outElemType) {
                 var $i(foundName) = false
                 var $i(lastName) : $t(lastType) = default<$t(lastType)>
                 for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
@@ -4925,17 +5260,17 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
                 if (!$i(foundName)) {
                     panic("sequence contains no elements")
                 }
-                return $i(lastName)
+                return $e(lastRetExpr)
             }))
         } else {
-            emission = qmacro(invoke($() : $t(lastType) {
+            emission = qmacro(invoke($() : $t(outElemType) {
                 let $i(dBindName) = $e(terminatorCall.arguments[1])
                 var $i(foundName) = false
                 var $i(lastName) : $t(lastType) = default<$t(lastType)>
                 for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
                     $e(forExprNode)
                 })
-                return $i(foundName) ? $i(lastName) : $i(dBindName)
+                return $i(foundName) ? $e(lastRetExpr) : $e(dRetExpr)
             }))
         }
         emission.force_at(at)
@@ -4947,7 +5282,7 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
     let needIterWrap = expr._type.isIterator
     var bufElemType = strip_const_ref(clone_type(projection != null ? projection._type : bridge.elementType))
     // Skip-into-tail fast path: `reverse |> take(N) |> to_array` with no where/select. Walk archetypes once to sum `arch.size` (cheap, no entity load), compute skip = total - takeN, then for_each_archetype_find skips whole archetypes whose size still fits below the skip threshold and short-circuits once the buffer reaches takeN. `where` would invalidate the size-based skip (count after filter is unknown without iterating); `select` would only affect element shape, not count, but is skipped here to keep v1 minimal.
-    if (takeExpr != null && whereCond == null && projection == null) {
+    if (takeExpr != null && whereCond == null && projection == null && terminalSelectLam == null) {
         let takeNName = qn("take_n", at)
         let totalName = qn("decs_total", at)
         let actualName = qn("decs_actual", at)
@@ -5031,15 +5366,36 @@ def private plan_decs_reverse(var expr : Expression?) : Expression? {
             $i(bufName) |> resize($i(takeNName) <= 0 ? 0 : ($i(takeNName) < length($i(bufName)) ? $i(takeNName) : length($i(bufName))))
         }
     }
-    var emission : Expression? = qmacro(invoke($() : array<$t(bufElemType)> {
-        var $i(bufName) : array<$t(bufElemType)>
-        for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
-            $e(forExprNode)
-        })
-        _::reverse_inplace($i(bufName))
-        $b(resizeStmts)
-        return <- $i(bufName)
-    }))
+    var emission : Expression?
+    if (terminalSelectLam != null) {
+        let outBufName = qn("decs_rev_proj_buf", at)
+        let elemName = qn("decs_rev_proj_e", at)
+        var projBody = peel_lambda_replace_var(terminalSelectLam, qmacro($i(elemName)))
+        emission = qmacro(invoke($() : array<$t(terminalSelectElemType)> {
+            var $i(bufName) : array<$t(bufElemType)>
+            for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
+                $e(forExprNode)
+            })
+            _::reverse_inplace($i(bufName))
+            $b(resizeStmts)
+            var $i(outBufName) : array<$t(terminalSelectElemType)>
+            $i(outBufName) |> reserve(length($i(bufName)))
+            for ($i(elemName) in $i(bufName)) {
+                $i(outBufName) |> push_clone($e(projBody))
+            }
+            return <- $i(outBufName)
+        }))
+    } else {
+        emission = qmacro(invoke($() : array<$t(bufElemType)> {
+            var $i(bufName) : array<$t(bufElemType)>
+            for_each_archetype($e(bridge.reqHashExpr), $e(bridge.erqExpr), $($i(archName) : Archetype) {
+                $e(forExprNode)
+            })
+            _::reverse_inplace($i(bufName))
+            $b(resizeStmts)
+            return <- $i(bufName)
+        }))
+    }
     return finalize_decs_emission(emission, at, needIterWrap)
 }
 
@@ -5277,10 +5633,28 @@ def private plan_decs_join(var expr : Expression?) : Expression? {
             calls |> pop
         }
     }
+    // Trailing _select composes with the join's result lambda: bind once, project once. Element type is derived later from expr._type.firstType (set by the user's downstream typer) — no need to record it here.
+    var selectLam : Expression?
+    if (terminatorName != "count" && !empty(calls) && calls.back()._1.name == "select") {
+        var selCall = calls.back()._0
+        selectLam = selCall.arguments[1]
+        if (selectLam == null || selCall._type == null || selCall._type.firstType == null) return null
+        calls |> pop
+    }
+    // Trailing _where (Theme 2 — closes audit 8a + C6). Predicate references join-result fields, so we bind the result once per pair and gate the push/incr. Comes BEFORE select in chain order, so pop AFTER select.
+    var whereLam : Expression?
+    if (!empty(calls) && calls.back()._1.name == "where_") {
+        var wCall = calls.back()._0
+        if (wCall.arguments |> length < 2) return null
+        whereLam = wCall.arguments[1]
+        if (whereLam == null) return null
+        calls |> pop
+    }
     // Must end on a single `join` call now — interleaved where/select unsupported in v1.
     if (empty(calls) || calls.back()._1.name != "join") return null
     var joinCall = calls.back()._0
     calls |> pop
+    // Iterator-typed context bails regardless of selectLam: the emission below returns `array<elemType>` and is not wrapped to iterator. (Currently user-unreachable when selectLam != null — `_select` after `_join` can't infer its result_selector without a downstream terminator — but kept for defensive splice hygiene.)
     if (!empty(calls) || (terminatorName == "" && !expr._type.isGoodArrayType)) return null
     // Both sides must be from_decs_template eager bridges.
     var bridgeA = extract_decs_bridge(top)
@@ -5330,8 +5704,28 @@ def private plan_decs_join(var expr : Expression?) : Expression? {
         preludeStmts |> push <| qmacro_expr() {
             var $i(cntName) : int = 0
         }
-        probeStmts |> push <| qmacro_expr() {
-            $i(cntName) += length($i(arrName))
+        if (whereLam == null) {
+            // Fast path: bucket-length sum.
+            probeStmts |> push <| qmacro_expr() {
+                $i(cntName) += length($i(arrName))
+            }
+        } else {
+            // HAVING-shape: bind result, evaluate predicate, conditional incr.
+            var resultLam = joinCall.arguments[4]
+            if (resultLam == null || resultLam._type == null || resultLam._type.firstType == null) return null
+            var resultBody = peel_lambda_rename_2vars(resultLam, tupAName, bElemName)
+            if (resultBody == null) return null
+            let joinResultType = strip_const_ref(clone_type(resultLam._type.firstType))
+            let resBindName = qn("decs_jres", at)
+            var wherePred = peel_lambda_replace_var(whereLam, qmacro($i(resBindName)))
+            probeStmts |> push <| qmacro_expr() {
+                for ($i(bElemName) in $i(arrName)) {
+                    let $i(resBindName) : $t(joinResultType) = $e(resultBody)
+                    if ($e(wherePred)) {
+                        $i(cntName) += 1
+                    }
+                }
+            }
         }
         returnStmt = qmacro_expr() {
             return $i(cntName)
@@ -5346,9 +5740,41 @@ def private plan_decs_join(var expr : Expression?) : Expression? {
         preludeStmts |> push <| qmacro_expr() {
             var $i(bufName) : array<$t(resultType)>
         }
-        probeStmts |> push <| qmacro_expr() {
-            for ($i(bElemName) in $i(arrName)) {
-                $i(bufName) |> push_clone($e(resultBody))
+        let needBind = selectLam != null || whereLam != null
+        if (needBind) {
+            // Bind join result once per pair (side effects once), then optionally filter / project.
+            let joinResultType = strip_const_ref(clone_type(resultLam._type.firstType))
+            let resBindName = qn("decs_jres", at)
+            var pushExpr : Expression?
+            if (selectLam != null) {
+                var projBody = peel_lambda_replace_var(selectLam, qmacro($i(resBindName)))
+                pushExpr = qmacro($i(bufName) |> push_clone($e(projBody)))
+            } else {
+                pushExpr = qmacro($i(bufName) |> push_clone($i(resBindName)))
+            }
+            if (whereLam != null) {
+                var wherePred = peel_lambda_replace_var(whereLam, qmacro($i(resBindName)))
+                probeStmts |> push <| qmacro_expr() {
+                    for ($i(bElemName) in $i(arrName)) {
+                        let $i(resBindName) : $t(joinResultType) = $e(resultBody)
+                        if ($e(wherePred)) {
+                            $e(pushExpr)
+                        }
+                    }
+                }
+            } else {
+                probeStmts |> push <| qmacro_expr() {
+                    for ($i(bElemName) in $i(arrName)) {
+                        let $i(resBindName) : $t(joinResultType) = $e(resultBody)
+                        $e(pushExpr)
+                    }
+                }
+            }
+        } else {
+            probeStmts |> push <| qmacro_expr() {
+                for ($i(bElemName) in $i(arrName)) {
+                    $i(bufName) |> push_clone($e(resultBody))
+                }
             }
         }
         returnStmt = qmacro_expr() {
@@ -5398,8 +5824,8 @@ def private plan_zip(var expr : Expression?) : Expression? {
     if (empty(calls) || calls[0]._1.name != "zip") return null
     var zipCall = calls[0]._0
     let zipArgCount = zipCall.arguments |> length
-    // Z6 bail: result-selector form (3-arg zip = 2 sources + selector) yields scalar element stream — different splice shape, defer.
-    if (zipArgCount != 2) return null
+    // 3-arg zip(a, b, sel): pre-lower the selector into `projection` (peeled with it._0/_1 binds).
+    if (zipArgCount != 2 && zipArgCount != 3) return null
     // Identify recognized terminator. Counter: count/long_count. Accumulator: sum/min/max/average. Early-exit: first/first_or_default/any/all/contains. Anything else: treat as no-terminator (bare → ARRAY lane); unrecognized chain op bails inside the chain walk.
     var lastName = ""
     var intermediateEnd = length(calls)
@@ -5467,6 +5893,29 @@ def private plan_zip(var expr : Expression?) : Expression? {
     var seenTakeWhile = false
     var seenTake = false
     var allProjectionsPure = true
+    // Pre-lower 3-arg zip(a,b,sel) → seeded projection (2-arg lambda replaced with it._0/_1).
+    if (zipArgCount == 3) {
+        var resultLam = zipCall.arguments[2]
+        if (resultLam == null || !(resultLam is ExprMakeBlock)) return null
+        var mblk = resultLam as ExprMakeBlock
+        var blk = mblk._block as ExprBlock
+        if (blk == null || blk.arguments |> length != 2 || blk.list |> length != 1
+                || !(blk.list[0] is ExprReturn)) return null
+        var ret = blk.list[0] as ExprReturn
+        if (ret.subexpr == null || ret.subexpr._type == null) return null
+        var projBody = clone_expression(ret.subexpr)
+        var projElemType = strip_const_ref(clone_type(ret.subexpr._type))
+        var zipRules : Template
+        zipRules |> replaceVariable(string(blk.arguments[0].name), qmacro($i(itName)._0))
+        zipRules |> replaceVariable(string(blk.arguments[1].name), qmacro($i(itName)._1))
+        projBody = apply_template(zipRules, projBody.at, projBody)
+        projection = projBody
+        elementType = projElemType
+        seenSelect = true
+        if (has_sideeffects(projection)) {
+            allProjectionsPure = false
+        }
+    }
     // Z3 chain walk: fuse where_/select/take/skip/take_while/skip_while between zip and terminator. Predicates/projections receive the tuple element via peel_lambda_replace_var substitution with `(itA, itB)` — typer collapses `t._0/_1` to the raw iter vars.
     for (i in 1 .. intermediateEnd) {
         var cll & = unsafe(calls[i])
@@ -5546,7 +5995,7 @@ def private plan_zip(var expr : Expression?) : Expression? {
         var intermediateBinds : array<Expression?>
         var laneTops <- [srcAExpr, srcBExpr]
         let laneSrcs <- [srcAName, srcBName]
-        return emit_accumulator_lane(lastName, laneTops, projection, whereCond,
+        return emit_accumulator_lane(lastName, laneTops, projection, whereCond, null,
             intermediateBinds, preCondStmts, elementType, laneSrcs, accName, itName, names,
             skipExpr, takeExpr, skipWhileCond, takeWhileCond, at)
     }
@@ -5559,7 +6008,7 @@ def private plan_zip(var expr : Expression?) : Expression? {
         let terminatorCall = calls.back()._0
         var laneTops <- [srcAExpr, srcBExpr]
         let laneSrcs <- [srcAName, srcBName]
-        return emit_early_exit_lane(lastName, laneTops, projection, whereCond,
+        return emit_early_exit_lane(lastName, laneTops, projection, whereCond, null,
             intermediateBinds, preCondStmts, elementType, terminatorCall, laneSrcs, itName, names,
             skipExpr, takeExpr, skipWhileCond, takeWhileCond, at)
     }
diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst
index aa15680cf..4696368a2 100644
--- a/doc/source/reference/linq_fold_patterns.rst
+++ b/doc/source/reference/linq_fold_patterns.rst
@@ -67,9 +67,9 @@ Source-side entry points
    * - ``each(array<T>)``
      - ``peel_each``
      - Strips the ``each`` wrapper; subsequent chain plans see the raw ``array<T>`` source.
-   * - ``zip(a, b)`` / ``zip(a, b, c)``
+   * - ``zip(a, b)`` / ``zip(a, b, sel)``
      - ``plan_zip``
-     - Two- or three-source zip. Splice fuses zip + select + aggregate.
+     - Two-source zip. The three-argument form ``zip(a, b, sel)`` is pre-lowered to ``zip(a, b) |> _select(sel-as-tuple)`` so the standard zip+select fusion fires (closes the dot-product idiom).
    * - ``from_decs_template(type<T>)``
      - ``plan_decs_unroll`` etc.
      - Surfaces a ``[decs_template]`` schema. Decs splices fire.
@@ -108,6 +108,9 @@ Array-source patterns
    * - ``._where(P).take(N).count()`` / ``.sum()``
      - ``plan_loop_or_count`` (counter / accumulator with ``takeExpr``)
      - Bounded counter/accumulator; loop exits at N matches.
+   * - ``.take(N)._where(P).<terminator>`` (counter / accumulator / early-exit / array)
+     - ``plan_loop_or_count`` (``postTakeWhereCond`` gate)
+     - Take cap ticks unconditionally; ``where`` gates only the per-element contribution. Preserves the "first N elements, then keep matching" semantic that ``where.take`` cannot express. Single trailing ``where`` only — skip / skip_while / take_while + where still cascade.
    * - ``._where(P).take_while(P2).<...>`` / ``.skip_while(P2).<...>``
      - ``plan_loop_or_count`` (predicate-driven ranges)
      - ``take_while`` exits on first non-match; ``skip_while`` toggles state.
@@ -117,6 +120,9 @@ Array-source patterns
    * - ``._order_by(K).take(N).to_array()``
      - ``plan_order_family`` (bounded-heap)
      - ``spliced_push_heap`` fill + replace, ``spliced_pop_heap`` on replace, ``order_inplace`` at end. Buffer of size N.
+   * - ``._order_by(K).take(N)._select(F).to_array()`` / ``.first()._select(F)`` / ``.first_or_default()._select(F)``
+     - ``plan_order_family`` (terminal ``_select``)
+     - Bounded-heap / streaming-min holds the raw element; projection ``F`` runs at most N times at return. Closes the natural "take top-N then project" idiom.
    * - ``._order_by(K).to_array()`` / ``.order_by_descending(K).to_array()`` / ``.order(K).to_array()`` / ``.order_descending(K).to_array()``
      - ``plan_order_family`` (full-sort fallback)
      - Materializes + sorts. No bounded-heap shortcut.
@@ -128,10 +134,16 @@ Array-source patterns
      - Per-key bucket reducer; single hash, one entry per group.
    * - ``._group_by(K)._having(P)._select(...).to_array()``
      - ``plan_group_by`` → ``plan_group_by_core``
-     - HAVING filter applied after the per-key reduce.
+     - HAVING filter on the bucket reference (pre-aggregate); can lift hidden reducer slots referenced by ``P`` but absent from the select.
+   * - ``._group_by(K)._select(reduce)._where(P).to_array()`` / ``.count()``
+     - ``plan_group_by`` → ``plan_group_by_core`` (trailing ``where`` as HAVING)
+     - HAVING filter on the constructed post-aggregate tuple (predicate references ``_.AggField`` by name). Distinct from ``_having(P)`` and orthogonal — both can fire on the same chain.
    * - ``.reverse().take(N).to_array()`` (with no ``where`` / ``select``)
      - ``plan_reverse`` (two-pass)
      - Sum archetype sizes, then walk tail-first with skip-counter and early-exit.
+   * - ``.reverse().take(N)._select(F).to_array()`` / ``.reverse()._select(F).first()``
+     - ``plan_reverse`` (terminal ``_select``)
+     - Projection runs at most N times at return on the R1-R4 buffer, or once on the surviving ``last`` value. NOT accepted: ``reverse._select.take`` — user must reorder to ``reverse.take._select``.
 
 Decs-source patterns
 ====================
@@ -168,6 +180,9 @@ identical — only the source iteration changes.
    * - ``from_decs_template(...)._order_by(K).take(N).to_array()``
      - ``plan_decs_order_family`` (bounded-heap)
      - Same heap pattern as the array variant; buffer size N.
+   * - ``from_decs_template(...)._order_by(K).take(N)._select(F).to_array()``
+     - ``plan_decs_order_family`` (terminal ``_select``)
+     - Decs mirror of ``plan_order_family``'s terminal ``_select`` — heap holds raw element, projection runs at most N times at return.
    * - ``from_decs_template(...).min_by(K)`` / ``.max_by(K)``
      - ``plan_decs_unroll`` → ``emit_decs_min_max_by``
      - Streaming-min/max with key.
@@ -177,13 +192,49 @@ identical — only the source iteration changes.
    * - ``from_decs_template(...).reverse().take(N).to_array()``
      - ``plan_decs_reverse``
      - Whole-archetype skip + partial-archetype skip-counter + early-exit.
+   * - ``from_decs_template(...).reverse().take(N)._select(F).to_array()`` / ``.reverse()._select(F).first()``
+     - ``plan_decs_reverse`` (terminal ``_select``)
+     - Decs mirror of ``plan_reverse``'s terminal ``_select``. Skip-into-tail fast path is gated off when ``_select`` is present.
    * - ``from_decs_template(...)._group_by(K)._select(reduce).to_array()``
      - ``plan_decs_group_by`` → ``plan_group_by_core``
      - Shared bucket-reducer with the array path; differs only in the per-element source.
+   * - ``from_decs_template(...)._group_by(K)._select(reduce)._where(P).to_array()`` / ``.count()``
+     - ``plan_decs_group_by`` → ``plan_group_by_core`` (trailing ``where`` as HAVING)
+     - Decs mirror of the array-side post-aggregate HAVING. Same predicate-on-output-tuple semantics.
    * - ``from_decs_template(...)._take_while(P).<...>`` / ``._skip_while(P).<...>``
      - ``plan_decs_unroll`` (predicate-driven ranges)
      - Hoists ``skippingName`` state across archetypes.
 
+Decs-decs equi-join
+-------------------
+
+``plan_decs_join`` is the hashed equi-join splice over two
+``from_decs_template`` sources. It collects the right side into a
+``table<KEY; array<TUPB>>`` in one ``for_each_archetype`` pass, then
+walks the left side and probes via ``table.get``. The key must be a
+primitive (``int*`` / ``uint*`` / ``float`` / ``double`` / ``bool`` /
+``string``); tuple keys cascade to the standard ``join_impl``.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 35 25 40
+
+   * - Chain shape
+     - Splice arm
+     - Notes
+   * - ``from_decs_template(A) |> _join(from_decs_template(B), ka, kb, result) |> count()``
+     - ``plan_decs_join``
+     - Hash-fill + probe; ``count`` bumped by bucket length per hit. No per-pair invoke.
+   * - ``from_decs_template(A) |> _join(...) |> to_array()``
+     - ``plan_decs_join``
+     - Hash-fill + probe; ``result`` lambda inlined at the push site (no per-pair invoke into ``join_impl``).
+   * - ``from_decs_template(A) |> _join(...) |> _select(F) |> to_array()``
+     - ``plan_decs_join`` (terminal ``_select``)
+     - Single bind of the join result per matched pair, then projection.
+   * - ``from_decs_template(A) |> _join(...) |> _where(P) |> count() / to_array()``
+     - ``plan_decs_join`` (trailing ``_where``)
+     - Bind join result, evaluate predicate, gate ``count++`` / ``push_clone``. Composes with the trailing ``_select`` form (filter then project, single bind per pair).
+
 Zip patterns
 ============
 
@@ -219,9 +270,13 @@ Common cases that fall back:
 - **Mixed-source operators** like ``union(a, b)``, ``except(a, b)``,
   ``intersect(a, b)``, ``concat(a, b)`` after the first source has
   been transformed (e.g. ``each(a)._select(F).union(b)``).
-- **Join terminators**: ``_join`` / ``_left_join`` / ``_right_join`` /
-  ``_full_outer_join`` / ``_cross_join``. The join itself does not yet
-  splice; downstream ``.count()`` / ``.sum()`` chains fall back.
+- **Joins other than decs-decs equi-join**: ``_left_join`` /
+  ``_right_join`` / ``_full_outer_join`` / ``_cross_join`` don't splice;
+  array-source ``_join`` also falls back. Only the decs-decs primitive-key
+  ``_join`` shape catalogued above splices (via ``plan_decs_join``);
+  tuple keys, non-primitive keys, mixed array/decs sources, or chain ops
+  beyond a single trailing ``_where`` / ``_select`` all cascade to
+  ``join_impl``.
 - **Aggregations on lazy groupings**: ``_group_by_lazy(K)._select(F)``
   with a non-bucket-reducing ``_select``.
 - **Materialization-only chains** that the standard linq surface
diff --git a/tests/linq/test_linq_fold_terminal_select.das b/tests/linq/test_linq_fold_terminal_select.das
new file mode 100644
index 000000000..0b735732b
--- /dev/null
+++ b/tests/linq/test_linq_fold_terminal_select.das
@@ -0,0 +1,195 @@
+options gen2
+
+require math
+require strings
+require daslib/linq
+require daslib/linq_boost
+require daslib/linq_fold
+require daslib/decs
+require daslib/decs_boost
+require dastest/testing_boost public
+
+struct Sound {
+    id   : int
+    x    : float
+    rank : int
+}
+
+[decs_template(prefix = "ds_")]
+struct DecsSound {
+    id : int
+    x : float
+}
+
+[decs_template(prefix = "dc_")]
+struct DecsCar {
+    id : int
+    dealer_id : int
+    name : string
+}
+
+[decs_template(prefix = "dd_")]
+struct DecsDealer {
+    id : int
+    name : string
+}
+
+def make_sounds() : array<Sound> {
+    return <- [
+        Sound(id = 1, x = 3.0, rank = 1),
+        Sound(id = 2, x = 1.0, rank = 2),
+        Sound(id = 3, x = 4.0, rank = 3),
+        Sound(id = 4, x = 1.0, rank = 4),
+        Sound(id = 5, x = 5.0, rank = 5)
+    ]
+}
+
+[test]
+def test_order_take_select_array(t : T?) {
+    t |> run("plan_order_family: take + terminal _select") @(tt : T?) {
+        let sounds <- make_sounds()
+        // Closest 3 by |x|; return their ids.
+        unsafe {
+            let ids <- _fold(each(sounds)._order_by(abs(_.x)).take(3)._select(_.id).to_array())
+            // Sounds with smallest |x|: id 2 (1.0), id 4 (1.0), id 1 (3.0).
+            tt |> equal(length(ids), 3)
+            tt |> equal(true, ids[0] == 2 || ids[0] == 4)
+            tt |> equal(true, ids[2] == 1)
+        }
+    }
+}
+
+[test]
+def test_where_order_take_select_array(t : T?) {
+    t |> run("plan_order_family: where + take + terminal _select") @(tt : T?) {
+        let sounds <- make_sounds()
+        unsafe {
+            let ids <- _fold(each(sounds)._where(_.rank >= 2)._order_by(_.x).take(2)._select(_.id).to_array())
+            // After filter rank>=2: ids 2,3,4,5 (x = 1,4,1,5). Smallest 2 by x (ascending order): ids 2 (1.0), 4 (1.0).
+            tt |> equal(length(ids), 2)
+            for (id in ids) {
+                tt |> equal(true, id == 2 || id == 4)
+            }
+        }
+    }
+}
+
+[test]
+def test_where_order_bare_select_array(t : T?) {
+    t |> run("plan_order_family: where + bare order + terminal _select") @(tt : T?) {
+        let sounds <- make_sounds()
+        unsafe {
+            let ranks <- _fold(each(sounds)._where(_.id != 3)._order_by(_.x)._select(_.rank).to_array())
+            // After filter id!=3: ids 1,2,4,5 (x = 3,1,1,5). Sorted by x: (2 or 4, 2 or 4, 1, 5).
+            tt |> equal(length(ranks), 4)
+            tt |> equal(ranks[3], 5)
+        }
+    }
+}
+
+[test]
+def test_order_take_select_decs(t : T?) {
+    t |> run("plan_decs_order_family: take + terminal _select") @(tt : T?) {
+        restart()
+        create_entities(5) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsSound(id = i + 1, x = float((i % 2) * 5 + 1)))
+        }
+        unsafe {
+            let ids <- _fold(from_decs_template(type<DecsSound>)._order_by(_.x).take(2)._select(_.id).to_array())
+            tt |> equal(length(ids), 2)
+        }
+        restart()
+    }
+}
+
+[test]
+def test_reverse_take_select_array(t : T?) {
+    t |> run("plan_reverse: where + reverse + take + terminal _select") @(tt : T?) {
+        let sounds <- make_sounds()
+        unsafe {
+            let ids <- _fold(each(sounds)._where(_.rank > 0).reverse().take(2)._select(_.id).to_array())
+            // After filter (all 5), reverse: 5,4,3,2,1. take 2: 5,4.
+            tt |> equal(length(ids), 2)
+            tt |> equal(ids[0], 5)
+            tt |> equal(ids[1], 4)
+        }
+    }
+}
+
+[test]
+def test_reverse_select_first_array(t : T?) {
+    t |> run("plan_reverse: where + reverse + _select + first") @(tt : T?) {
+        let sounds <- make_sounds()
+        unsafe {
+            let id = _fold(each(sounds)._where(_.rank > 0).reverse()._select(_.id).first())
+            // Reverse of ids 1..5 is 5,4,3,2,1; first = 5.
+            tt |> equal(id, 5)
+        }
+    }
+}
+
+[test]
+def test_reverse_take_select_decs(t : T?) {
+    t |> run("plan_decs_reverse: reverse + take + terminal _select") @(tt : T?) {
+        restart()
+        create_entities(4) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsSound(id = i + 1, x = float(i)))
+        }
+        unsafe {
+            let ids <- _fold(from_decs_template(type<DecsSound>).reverse().take(2)._select(_.id).to_array())
+            tt |> equal(length(ids), 2)
+        }
+        restart()
+    }
+}
+
+[test]
+def test_join_select_to_array(t : T?) {
+    t |> run("plan_decs_join: join + terminal _select") @(tt : T?) {
+        restart()
+        create_entities(3) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsCar(id = i + 1, dealer_id = i % 2 + 1, name = "Car{i}"))
+        }
+        create_entities(2) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsDealer(id = i + 1, name = "Dealer{i}"))
+        }
+        unsafe {
+            let names <- _fold(from_decs_template(type<DecsCar>) |> _join(from_decs_template(type<DecsDealer>),
+                                                                          $(l, r) => l.dealer_id == r.id,
+                                                                          $(l, r) => (CarName = l.name, DealerName = r.name))
+                                                                 |> _select(_.CarName)
+                                                                 |> to_array())
+            // 3 cars × 1 matching dealer each = 3 results.
+            tt |> equal(length(names), 3)
+        }
+        restart()
+    }
+}
+
+[test]
+def test_zip_3arg_sum(t : T?) {
+    t |> run("plan_zip: 3-arg zip + sum") @(tt : T?) {
+        let a <- [1, 2, 3, 4]
+        let b <- [10, 20, 30, 40]
+        unsafe {
+            let total = _fold(each(a) |> zip(each(b), $(x, y : int) => x * y) |> sum())
+            // 1*10 + 2*20 + 3*30 + 4*40 = 10 + 40 + 90 + 160 = 300.
+            tt |> equal(total, 300)
+        }
+    }
+}
+
+[test]
+def test_zip_3arg_to_array(t : T?) {
+    t |> run("plan_zip: 3-arg zip + to_array") @(tt : T?) {
+        let a <- [1, 2, 3]
+        let b <- [10, 20, 30]
+        unsafe {
+            let r <- _fold(each(a) |> zip(each(b), $(x, y : int) => x + y) |> to_array())
+            tt |> equal(length(r), 3)
+            tt |> equal(r[0], 11)
+            tt |> equal(r[1], 22)
+            tt |> equal(r[2], 33)
+        }
+    }
+}
diff --git a/tests/linq/test_linq_fold_theme2_trailing_where.das b/tests/linq/test_linq_fold_theme2_trailing_where.das
new file mode 100644
index 000000000..41c3ddfd7
--- /dev/null
+++ b/tests/linq/test_linq_fold_theme2_trailing_where.das
@@ -0,0 +1,257 @@
+options gen2
+
+require math
+require strings
+require daslib/linq
+require daslib/linq_boost
+require daslib/linq_fold
+require daslib/decs
+require daslib/decs_boost
+require dastest/testing_boost public
+
+// Theme 2 (audit `benchmarks/sql/linq_fold_chain_audit.md`): trailing `_where` extensions
+// across plan_decs_join (8a, C6), plan_group_by_core (4a, 4e), plan_loop_or_count (5c).
+
+struct Item {
+    category : string
+    price    : int
+}
+
+struct ActItem {
+    active : bool
+    score  : int
+}
+
+[decs_template(prefix = "dc_")]
+struct DecsCar {
+    id : int
+    dealer_id : int
+    name : string
+}
+
+[decs_template(prefix = "dd_")]
+struct DecsDealer {
+    id : int
+    name : string
+}
+
+[decs_template(prefix = "di_")]
+struct DecsItem {
+    category : string
+    price : int
+}
+
+def make_items() : array<Item> {
+    return <- [
+        Item(category = "A", price = 100),
+        Item(category = "A", price = 300),
+        Item(category = "B", price = 200),
+        Item(category = "B", price = 800),
+        Item(category = "B", price = 200),
+        Item(category = "C", price = 50)
+    ]
+}
+
+def make_act_items() : array<ActItem> {
+    return <- [
+        ActItem(active = true,  score = 10),
+        ActItem(active = false, score = 20),
+        ActItem(active = true,  score = 30),
+        ActItem(active = false, score = 40),
+        ActItem(active = true,  score = 50),
+        ActItem(active = false, score = 60),
+        ActItem(active = true,  score = 70),
+        ActItem(active = true,  score = 80)
+    ]
+}
+
+// ── plan_decs_join trailing _where (closes 8a, C6) ─────────────────────────────
+
+[test]
+def test_join_where_count(t : T?) {
+    t |> run("plan_decs_join: trailing _where + count (probe 8a)") @(tt : T?) {
+        restart()
+        create_entities(4) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsCar(id = i + 1, dealer_id = i % 2 + 1, name = "Car{i}"))
+        }
+        create_entities(2) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsDealer(id = i + 1, name = "Dealer{i}"))
+        }
+        unsafe {
+            let filtered = _fold(from_decs_template(type<DecsCar>) |> _join(from_decs_template(type<DecsDealer>),
+                                                                            $(l, r) => l.dealer_id == r.id,
+                                                                            $(l, r) => (CarName = l.name, DealerName = r.name))
+                                                                   |> _where(_.DealerName == "Dealer0")
+                                                                   |> count())
+            // 4 cars × dealer_id 1,2,1,2; matching Dealer0 (id=1) → cars i=0,2 → 2 results.
+            tt |> equal(filtered, 2)
+        }
+        restart()
+    }
+}
+
+[test]
+def test_join_where_to_array(t : T?) {
+    t |> run("plan_decs_join: trailing _where + to_array") @(tt : T?) {
+        restart()
+        create_entities(4) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsCar(id = i + 1, dealer_id = i % 2 + 1, name = "Car{i}"))
+        }
+        create_entities(2) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsDealer(id = i + 1, name = "Dealer{i}"))
+        }
+        unsafe {
+            let rows <- _fold(from_decs_template(type<DecsCar>) |> _join(from_decs_template(type<DecsDealer>),
+                                                                         $(l, r) => l.dealer_id == r.id,
+                                                                         $(l, r) => (CarName = l.name, DealerName = r.name))
+                                                                |> _where(_.DealerName == "Dealer1")
+                                                                |> to_array())
+            tt |> equal(length(rows), 2)
+        }
+        restart()
+    }
+}
+
+[test]
+def test_join_where_select_to_array(t : T?) {
+    t |> run("plan_decs_join: trailing _where + _select + to_array (combination)") @(tt : T?) {
+        restart()
+        create_entities(4) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsCar(id = i + 1, dealer_id = i % 2 + 1, name = "Car{i}"))
+        }
+        create_entities(2) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            apply_decs_template(cmp, DecsDealer(id = i + 1, name = "Dealer{i}"))
+        }
+        unsafe {
+            let names <- _fold(from_decs_template(type<DecsCar>) |> _join(from_decs_template(type<DecsDealer>),
+                                                                          $(l, r) => l.dealer_id == r.id,
+                                                                          $(l, r) => (CarName = l.name, DealerName = r.name))
+                                                                 |> _where(_.DealerName == "Dealer0")
+                                                                 |> _select(_.CarName)
+                                                                 |> to_array())
+            tt |> equal(length(names), 2)
+        }
+        restart()
+    }
+}
+
+// ── plan_group_by_core trailing _where as HAVING (closes 4a, 4e) ───────────────
+
+[test]
+def test_groupby_having_where_to_array(t : T?) {
+    t |> run("plan_group_by: trailing _where (HAVING) on post-aggregate tuple (probe 4a)") @(tt : T?) {
+        let items <- make_items()
+        unsafe {
+            let rows <- _fold(each(items)
+                ._group_by(_.category)
+                ._select((Cat = _._0, Total = _._1 |> select(@(i : Item) => i.price) |> sum))
+                ._where(_.Total > 500) |> to_array())
+            // Category totals: A=400, B=1200, C=50. Filter Total > 500 → only B.
+            tt |> equal(length(rows), 1)
+        }
+    }
+}
+
+[test]
+def test_groupby_having_where_count(t : T?) {
+    t |> run("plan_group_by: trailing _where + count") @(tt : T?) {
+        let items <- make_items()
+        unsafe {
+            let n = _fold(each(items)
+                ._group_by(_.category)
+                ._select((Cat = _._0, Total = _._1 |> select(@(i : Item) => i.price) |> sum))
+                ._where(_.Total > 100)
+                |> count())
+            // Buckets with Total > 100: A (400), B (1200) → 2.
+            tt |> equal(n, 2)
+        }
+    }
+}
+
+[test]
+def test_groupby_having_where_decs(t : T?) {
+    t |> run("plan_decs_group_by: trailing _where (probe 4e)") @(tt : T?) {
+        restart()
+        create_entities(6) $(eid : EntityId; i : int; var cmp : ComponentMap) {
+            let cats = ["A", "A", "B", "B", "B", "C"]
+            let prices = [100, 300, 200, 800, 200, 50]
+            apply_decs_template(cmp, DecsItem(category = cats[i], price = prices[i]))
+        }
+        unsafe {
+            let rows <- _fold(from_decs_template(type<DecsItem>)
+                ._group_by(_.category)
+                ._select((Cat = _._0, Total = _._1 |> select(@(i : tuple<category : string; price : int>) => i.price) |> sum))
+                ._where(_.Total > 500)
+                |> to_array())
+            // Same data as the array-backed tests: A=400, B=1200, C=50. Filter Total > 500 keeps only B.
+            tt |> equal(length(rows), 1)
+        }
+        restart()
+    }
+}
+
+// ── plan_loop_or_count take.where (closes 5c, all 4 lanes) ─────────────────────
+
+[test]
+def test_take_where_count(t : T?) {
+    t |> run("plan_loop_or_count: take(N)._where(p).count() (counter lane, probe 5c)") @(tt : T?) {
+        let items <- make_act_items()
+        unsafe {
+            // take(5) → first 5, active = [T,F,T,F,T]. Count of active = 3.
+            // Order matters: _where(p).take(5) filters first and would count 5 (5 active in items).
+            let r1 = _fold(each(items).take(5)._where(_.active).count())
+            tt |> equal(r1, 3)
+            let r2 = _fold(each(items)._where(_.active).take(5).count())
+            tt |> equal(r2, 5)
+        }
+    }
+}
+
+[test]
+def test_take_where_sum(t : T?) {
+    t |> run("plan_loop_or_count: take(N)._where(p).sum() (accumulator lane)") @(tt : T?) {
+        let items <- make_act_items()
+        unsafe {
+            // _select projects scores. take(5) → [10,20,30,40,50]. _where(>0) keeps all. sum=150.
+            let s = _fold(each(items)._select(_.score).take(5)._where(_ > 0).sum())
+            tt |> equal(s, 150)
+        }
+    }
+}
+
+[test]
+def test_take_where_first(t : T?) {
+    t |> run("plan_loop_or_count: take(N)._where(p).first_or_default() (early-exit lane)") @(tt : T?) {
+        let items <- make_act_items()
+        unsafe {
+            // take(3) → first 3 (T,F,T). First active item has score 10.
+            let f = _fold(each(items).take(3)._where(_.active).first_or_default(ActItem(active = false, score = -1)))
+            tt |> equal(f.score, 10)
+        }
+    }
+}
+
+[test]
+def test_take_where_to_array(t : T?) {
+    t |> run("plan_loop_or_count: take(N)._where(p).to_array() (array lane)") @(tt : T?) {
+        let items <- make_act_items()
+        unsafe {
+            // take(5) → first 5 (T,F,T,F,T). where(active) → 3 items, scores 10,30,50.
+            let arr <- _fold(each(items).take(5)._where(_.active).to_array())
+            tt |> equal(length(arr), 3)
+            tt |> equal(arr[0].score, 10)
+            tt |> equal(arr[2].score, 50)
+        }
+    }
+}
+
+[test]
+def test_take_zero_where(t : T?) {
+    t |> run("plan_loop_or_count: take(0)._where edge case") @(tt : T?) {
+        let items <- make_act_items()
+        unsafe {
+            let r = _fold(each(items).take(0)._where(_.active).count())
+            tt |> equal(r, 0)
+        }
+    }
+}