Merged
25 commits
5b10e64
linq_fold: Phase 2A loop planner (where|select array + counter lanes)
borisbat May 16, 2026
41d8ce1
linq_fold: peel each(<array>) + reserve + workhorse push — to_array_f…
borisbat May 16, 2026
d4586a1
linq_fold: fuse chained workhorse selects + drop emplace from emission
borisbat May 16, 2026
6226a1e
linq_fold: counter lane evaluates projection per iteration
borisbat May 16, 2026
6cda3c7
linq_fold: update select_count benchmark header comment
borisbat May 16, 2026
52a2d40
linq_fold: extract peel helper + tighten length check
borisbat May 16, 2026
3f0f890
tests/fio: regression coverage for ref_time_ticks() ns normalization
borisbat May 16, 2026
d805df7
Merge pull request #2689 from GaijinEntertainment/bbatkin/linq-fold-p…
borisbat May 16, 2026
a39e155
Merge pull request #2690 from GaijinEntertainment/bbatkin/perf-time-r…
borisbat May 16, 2026
c6a9d79
macro_boost: add has_sideeffects + counter-lane elision
borisbat May 16, 2026
f77a072
mouse-data/docs: 16 new + 1 updated card from linq_fold + Phase 2A se…
borisbat May 16, 2026
371c6d7
mouse-data/docs: 5 cards from dasImgui PR #38 (CI matrix resurrection)
borisbat May 16, 2026
6aa2110
has_sideeffects: blacklist mutation ops, trust func flags over op all…
borisbat May 16, 2026
0d842c9
Potential fix for pull request finding
borisbat May 16, 2026
c24e4b5
docs(mouse-data): update ref_time_ticks Windows row and narrative for…
Copilot May 16, 2026
b99d5bf
Merge pull request #2691 from GaijinEntertainment/bbatkin/has-sideeff…
borisbat May 17, 2026
6711570
daslang-live: add -project_root flag (mirror daslang.exe)
borisbat May 17, 2026
78405f0
site/blog: roadmap update + are-we-there-yet post
borisbat May 17, 2026
9b5b917
daslang-live: accept -project-root (dashed) alias for symmetry
borisbat May 17, 2026
380455c
Potential fix for pull request finding
borisbat May 17, 2026
cb12ed3
Merge pull request #2692 from GaijinEntertainment/bbatkin/mouse-cards…
borisbat May 17, 2026
2756dd7
Merge pull request #2693 from GaijinEntertainment/bbatkin/daslang-liv…
borisbat May 17, 2026
c8ad450
examples/graphics: modernize Fourier viz to dasImgui boost-v2 + harness
borisbat May 17, 2026
dc5e755
Merge pull request #2694 from GaijinEntertainment/bbatkin/blog-roadma…
borisbat May 17, 2026
39b1a45
Merge pull request #2695 from GaijinEntertainment/bbatkin/examples-gr…
borisbat May 17, 2026
36 changes: 33 additions & 3 deletions benchmarks/sql/LINQ.md
@@ -22,8 +22,9 @@ See `~/.claude/plans/keen-hopping-balloon.md` for the long-form plan.
|---|---|---|
| 0 | Rename `_fold` → `_old_fold` in linq_boost; extract `_fold` and `_old_fold` into new `daslib/linq_fold.das` module; `linq_boost` `require linq_fold public` for re-export | ✅ done |
| 1 | Benchmark suite: 24 files under `benchmarks/sql/`, each 4-way (m1 `_sql` / m3 plain linq / m3f_old `_old_fold` / m3f `_fold`) at 100K rows; baseline numbers captured | ✅ done |
| 2 | Splice planner + initial operators (`count`, `sum`, `to_array`, `where` with literal-lambda inlining); pattern tests for "spliced" vs "fell back" | ⏳ next |
| 3+ | Per-operator splice PRs: `select`, terminal aggregates with early-exit (`first`, `any`, `all`, `min`, `max`, `average`), `take`/`skip`/chained `where`, then buffer-required ops (`distinct`, `sort`, `groupby`, `zip`, `join`) | ⏳ |
| 2A | Loop planner — `_fold` emits explicit for-loops for `[where_*][select*]` (array lane) and `[where_*][select*] |> count` (counter lane); anything else falls through unfolded. No comprehensions, no dispatch back to `_old_fold`. | ✅ done |
| 2B | Aggregate accumulators: `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`. Also `take`/`skip` in counter/array lane and chained-`_select|_select` fusion for non-workhorse projections (workhorse chains already fuse in 2A; non-workhorse intermediates need clone-init) | ⏳ next |
| 3+ | Buffer-required operators: `distinct`, `sort`, `reverse`, `groupby`, `zip`, `join`. Once we go array, we stay array | ⏳ |
| 4 | Final coverage pass + docs; full 4-way comparison table refresh; parity-test sweep | ⏳ |

## Baselines (100K rows, INTERP mode)
@@ -69,7 +70,36 @@ Notation: `—` means the variant is not applicable for this benchmark (operator

- **m1 vs m3** shows the SQLite-vs-in-memory-LINQ cost gap. SQL wins on `indexed_lookup` (b-tree) and on sorted-take patterns (engine partial-sort + LIMIT). Arrays win on raw aggregates where the SQL overhead exceeds the in-memory work.
- **m3 vs m3f_old** shows what the *current* `_fold` macro already achieves. Big wins on the patterns it explicitly recognizes (`where+count` 6×, `where+select+to_array` ~4×, `chained_where+count` 2.6×). Negligible difference where it falls through to the default emitter.
- **m3f vs m3f_old** is the target of Phase 2+. Currently identical by construction. Each PR in the splice series adds a splice path for one operator family and updates this table with the new ratio.
- **m3f vs m3f_old** is the target of Phase 2+. Each PR in the splice series adds a path for one operator family and updates this table with the new ratio.

## Phase 2A — Loop planner (2026-05-16)

`_fold` now emits explicit for-loops for two narrow shape families instead of comprehensions. Anything outside scope falls through unfolded to raw linq (no dispatch to `_old_fold` or `fold_linq_default`).

**In scope:** `[where_*][select*]` (array lane) and `[where_*][select*] |> count` (counter lane). Chained `_where|_where|...` fuses via `&&`. Chained `_select|_select|...` fuses via intermediate `var v_N = projection_N` let-bindings: each subsequent lambda's `_` is renamed directly to the prior binding's name, so no expression substitution is needed (substitution would have hit the ExprRef2Value-wrapper problem documented in `skills/das_macros.md`). Chained selects currently require every projection to be workhorse. Non-workhorse intermediates would need `:=` (clone), since `<-` (move) can corrupt the source for lvalue projections; that case is deferred to Phase 2B.

**Out of scope (falls through):** `_select|_where`, `sum`, `min`, `max`, `average`, `first`, `any`, `all`, `long_count`, `_order`, `_distinct`, `_take`, `_skip`, `_zip`, `_reverse`, etc.
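
As a rough illustration of the array-lane shape described above, the following hand-written daslang function approximates what a fused `where → where → select → select` chain could lower to. It is a sketch, not the macro's actual emission: the function name, the predicates, and the `v_0`/`v_1` locals are invented for the example, and it uses `let` where the planner is described as emitting `var`.

```das
options gen2

// Hand-written approximation of a fused array-lane loop (illustrative only).
// Models: src |> _where(_ > 3) |> _where(_ < 8) |> _select(_ * 2) |> _select(_ + 1)
def fused_array_lane(src : array<int>) : array<int> {
    var acc : array<int>
    for (it in src) {
        if (it > 3 && it < 8) {     // chained wheres fused into one && predicate
            let v_0 = it * 2        // first projection bound to a local
            let v_1 = v_0 + 1       // second projection: its `_` is simply v_0
            acc |> push(v_1)        // int is workhorse, so a plain push (copy) suffices
        }
    }
    return <- acc
}

[export]
def main() {
    var src : array<int>
    for (i in range(10)) {
        src |> push(i)
    }
    let res <- fused_array_lane(src)
    print("{length(res)} elements\n")
}
```

Because each projection is bound to its own local, the second lambda never needs the first lambda's expression spliced into it; it only needs the name of the previous binding.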

### Phase 2A deltas (100K rows, INTERP)

| Benchmark | Shape | m3f_old (ns) | m3f (Phase 2A, ns) | Delta |
|---|---|---:|---:|---|
| count_aggregate | `where → count` | 5 | 4 | parity-ish (1 ns improvement from the `each(<array>)` peel) |
| chained_where | `where → where → count` | 17 | 6 | **2.8× faster** (fuses chained wheres into single `&&` predicate; small gain from peel + const-ref param) |
| select_count | `select → count` | 15 | 0 | **∞ faster** — when the projection is pure (`has_sideeffects == false`) and the source has length, the counter lane shortcuts to `length(src)` and elides the loop entirely. See [macro_boost::has_sideeffects](../../daslib/macro_boost.das) and `linq_fold.das:plan_loop_or_count` |
| to_array_filter | `where → select → to_array` | 11 | 10 | parity (after `each(<array>)` peel + reserve + workhorse `push`) |

Shapes outside Phase 2A scope now compile to plain linq (`m3f ≈ m3`). This is an intentional regression vs the historical `_old_fold` numbers — Boris's call ("we let it fall through unfolded, and we see performance issues. im ok being slower until we fix") as the forcing function for Phase 2B+. The previous "m3f = m3f_old (identical by construction)" baseline assumed `_fold` would dispatch to `_old_fold` on the unmatched path; Phase 2A drops that dispatch.
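
The `select_count` row can be pictured with two hand-written counters. This is only a sketch of the idea: `count_pure` and `count_with_effects` are hypothetical names, the side effect is modeled as a running total, and in the real pipeline the choice is made by the planner at macro time rather than by the caller.

```das
options gen2

// Pure projection: nothing observable happens per element, so the count
// equals the source length and no loop is needed.
def count_pure(src : array<int>) : int {
    return length(src)
}

// Projection with a visible side effect (modeled here as updating `total`):
// the loop has to stay so the effect fires once per element.
def count_with_effects(src : array<int>; var total : int&) : int {
    var n = 0
    for (it in src) {
        total += it     // stand-in for whatever the projection does
        n++
    }
    return n
}

[export]
def main() {
    var src : array<int>
    for (i in range(5)) {
        src |> push(i)
    }
    var total = 0
    let a = count_pure(src)
    let b = count_with_effects(src, total)
    print("{a} {b} {total}\n")
}
```

Both functions return the same count; they differ only in whether per-element work is observable, which is the property the table row attributes to `has_sideeffects`.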

### Three small things that closed the to_array_filter gap

The first cut was 18% slower than the comprehension. Three independent fixes brought it to parity:

1. **Workhorse decision at macro time, not runtime.** The first emission used `static_if (typeinfo is_workhorse(projection))` inside the qmacro so the compiler picked copy- vs move-init. The projection's `_type` is already resolved when the planner runs, so the macro now reads `projection._type.isWorkhorseType` directly and emits exactly one branch — less AST, no static_if to fold away.
2. **Pre-reserve when the source has a known length.** `ExprArrayComprehension` lowering reserves the result array to the source's length to avoid growth reallocations; an explicit loop gets no such reserve for free, so the planner has to emit it. It emits `acc |> reserve(length(src))` when the source isn't an iterator.
3. **Peel `each(<array>)` at macro time.** The benchmark source `each(arr)` reports as `iterator<T>`, so the reserve from (2) wouldn't fire. The planner now detects `each(<expr>)` where the inner expression has length and unwraps it — the emitted loop iterates the array directly. `for (it in arr)` and `for (it in each(arr))` yield the same element refs; the wrapper iterator is incidental in fold context.

A fourth simplification dropped `emplace` from the emission entirely. `emplace` **moves** out of its argument and can corrupt the source when the projection returns a ref into it (e.g. `_._field`). The safe pattern is `push` for workhorse types (cheap copy) and `push_clone` for non-workhorse types (deep clone). Neither case needs an intermediate `var v = projection; emplace(v)`: the planner pushes the projection expression directly.
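
The reserve and push/push_clone choices above can be sketched by hand as follows. This is an assumption-laden illustration, not planner output: the `Car` struct here is a made-up two-field stand-in (not the benchmark's `Car`), and `collect_prices` / `collect_tags` are hypothetical names.

```das
options gen2

struct Car {
    price : int
    tags : array<int>
}

// Workhorse projection (`_.price`): reserve once, then push a cheap copy.
def collect_prices(cars : array<Car>) : array<int> {
    var acc : array<int>
    acc |> reserve(length(cars))    // source is an array, so its length is known
    for (it in cars) {              // iterate the array directly, no each() wrapper
        acc |> push(it.price)
    }
    return <- acc
}

// Non-workhorse projection (`_.tags`): clone instead of move, so `cars` stays intact.
def collect_tags(cars : array<Car>) : array<array<int>> {
    var acc : array<array<int>>
    acc |> reserve(length(cars))
    for (it in cars) {
        acc |> push_clone(it.tags)
    }
    return <- acc
}

[export]
def main() {
    var cars : array<Car>
    for (i in range(3)) {
        var c : Car
        c.price = i
        c.tags |> push(i)
        cars |> emplace(c)          // fine here: `c` is a local we own outright
    }
    let prices <- collect_prices(cars)
    let tags <- collect_tags(cars)
    print("{length(prices)} {length(tags)} {length(cars[0].tags)}\n")
}
```

The last printed value checks that the source's `tags` arrays are still populated after collection, which is the property a move-based `emplace` of `it.tags` would have put at risk.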

## Operator-coverage checklist (parity tests)

75 changes: 75 additions & 0 deletions benchmarks/sql/select_count.das
@@ -0,0 +1,75 @@
options gen2
options persistent_heap

require _common public

// _select |> count — projection followed by counter. The final count value doesn't depend
// on the projection, but plain LINQ `count(select(src, f))` still evaluates `f` per element
// so user-visible side effects fire. Phase-2A `_fold` matches that: the counter lane binds
// the final projection to a discardable local per matched element (side effects preserved)
// and skips array materialization. The optimizer DCEs the binding for pure projections
// like `_.price * 2`, leaving a bare-loop counter for the common case. `_old_fold` lacks a
// [select, count] pattern in g_foldSeq so it falls to the default nested-pass form
// (pass_0 = select(...); count(pass_0)) — materializing the same way m3 does.

def run_m1(b : B?; n : int) {
    with_sqlite(":memory:") $(db) {
        fixture_db(db, n)
        b |> run("m1_sql/{n}", n) {
            let c = _sql(db |> select_from(type<Car>) |> count())
            if (c == 0) {
                b->failNow()
            }
        }
    }
}

def run_m3(b : B?; n : int) {
    let arr <- fixture_array(n)
    b |> run("m3_array/{n}", n) {
        let c = arr |> _select(_.price * 2) |> count()
        if (c == 0) {
            b->failNow()
        }
    }
}

def run_m3f_old(b : B?; n : int) {
    let arr <- fixture_array(n)
    b |> run("m3f_old_array_fold/{n}", n) {
        let c = _old_fold(each(arr)._select(_.price * 2).count())
        if (c == 0) {
            b->failNow()
        }
    }
}

def run_m3f(b : B?; n : int) {
    let arr <- fixture_array(n)
    b |> run("m3f_array_fold/{n}", n) {
        let c = _fold(each(arr)._select(_.price * 2).count())
        if (c == 0) {
            b->failNow()
        }
    }
}

[benchmark]
def select_count_m1(b : B?) {
    run_m1(b, 100000)
}

[benchmark]
def select_count_m3(b : B?) {
    run_m3(b, 100000)
}

[benchmark]
def select_count_m3f_old(b : B?) {
    run_m3f_old(b, 100000)
}

[benchmark]
def select_count_m3f(b : B?) {
    run_m3f(b, 100000)
}