Commit 1e079ed

Merge pull request GaijinEntertainment#2812 from GaijinEntertainment/bbatkin/from-decs-lambda-arg-splice
tests + docs: from_decs lambda-arg splice validation + full m4 matrix snapshot (INTERP + JIT)
2 parents 77ec239 + 7d39f19 commit 1e079ed

4 files changed

Lines changed: 399 additions & 6 deletions

benchmarks/sql/M4_DECS_EXPANSION.md

Lines changed: 187 additions & 3 deletions
```diff
@@ -362,13 +362,197 @@ Two of the three Cat-C benchmarks that were skipped during the original m4 expan
 
 | benchmark | shape | m1 sql | m3 | m3f (array splice) | m4 (was) | m4 (Wave 4b, now) | m4 vs best other lane |
 |---|---|---:|---:|---:|---:|---:|---:|
-| indexed_lookup | `_where(_.id == K).count()` → eid lookup | 1461 | 2,076,904 | 197,117 || **227** | 6.4× faster than m1 sql |
+| indexed_lookup | `_where(_.id == K).count()` → eid lookup | 1461 | 2,076,904 | 197,117 || **489** | 3× faster than m1 sql |
 | zip_dot_product | `zip(xs,ys).select(_._0 * _._1).sum()` → intra-archetype || 53 | 7 || **10** | within 1.4× of m3f |
 
-**indexed_lookup**: Uses the existing `query(eid, $(...))` call macro ([decs_boost.das:315](../../daslib/decs_boost.das#L315)), which wraps `for_eid_archetype` ([decs.das:666](../../daslib/decs.das#L666)). entityLookup is a flat hash on EntityId.id — single hash + generation check + archetype dispatch. New fixture helper `fixture_decs_capture_mid(n) : EntityId` ([_common.das](_common.das)) captures the n/2-th entity's eid mid-`create_entities` callback so the bench has a real eid to look up. The 227 ns/op figure includes the macro-time-generated request/erq lookup; the lookup itself is essentially the hash plus one block invocation.
+**indexed_lookup**: Uses the existing `query(eid, $(...))` call macro ([decs_boost.das:315](../../daslib/decs_boost.das#L315)), which wraps `for_eid_archetype` ([decs.das:666](../../daslib/decs.das#L666)). entityLookup is a flat hash on EntityId.id — single hash + generation check + archetype dispatch. New fixture helper `fixture_decs_capture_mid(n) : EntityId` ([_common.das](_common.das)) captures the n/2-th entity's eid mid-`create_entities` callback so the bench has a real eid to look up. The 489 ns/op figure includes the macro-time-generated request/erq lookup, the hash, the archetype dispatch, AND a real `car_id` field read inside the block (drift-protection assertion against the expected id). An earlier draft measured 227 ns/op but the block was a no-op `found = 1`, so `car_id`'s `get_ro` fetch was being eliminated — that was not a fair lookup cost.
 
 **zip_dot_product**: No new surface at all. `from_decs_template(type<DecsCar>)._select(_.price * _.year).sum()` is the natural intra-archetype zip — multi-iter for over the archetype's two int columns. Wave 4 component pruning keeps the price + year `get_ro` slots and drops the other 4 components, so per-element cost is two int reads plus the multiply, matching the m3f two-column zip. The 3 ns gap to m3f covers `for_each_archetype` dispatch overhead.
 
 **Net coverage:** 47 → 49 m4 lanes (`indexed_lookup` + `zip_dot_product`). Only `join_count` remains skipped — it's a cross-archetype join, which requires real design (eid linkage between archetypes) and is appropriately deferred past Wave 4b.
 
-**No daslib changes.** Pure benchmark additions + one helper in `_common.das`. The Wave 4b PR is documentation that the existing decs surface already covers these chain shapes — the team's pre-Wave-4b suspicion that we'd need new helpers turned out to be incorrect.
+**No new decs surface needed for these lanes.** Pure benchmark additions + one helper in `_common.das`. The Wave 4b PR is documentation that the existing decs surface already covers these chain shapes — the team's pre-Wave-4b suspicion that we'd need new helpers turned out to be incorrect. (The same PR also bundled 6 small daslib/linq_fold cleanups in its commit 1 — those were unrelated PR #2806 review follow-ups, not Wave 4b code.)
```
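The `indexed_lookup` note describes the eid lookup as a single hash on the entity id, a generation check, and an archetype dispatch, followed by one block invocation that reads `car_id`. The Python model below is a minimal sketch of that sequence, not the decs implementation: `World`, `Archetype`, `LookupEntry`, `query_eid`, and `capture_mid` are hypothetical names introduced only for illustration, and the fixture values (`car_id = i + 1`) mirror the comment in the `indexed_lookup.das` diff.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityId:
    id: int
    generation: int


@dataclass
class Archetype:
    # Column-oriented storage: component name -> one value per row.
    columns: dict = field(default_factory=dict)
    size: int = 0


@dataclass
class LookupEntry:
    generation: int
    archetype: Archetype
    row: int


class World:
    """Hypothetical model of an entity table keyed by EntityId.id."""

    def __init__(self):
        self.entity_lookup = {}  # id -> LookupEntry (the "flat hash")
        self.next_id = 1

    def create(self, archetype, **components):
        row = archetype.size
        for name, value in components.items():
            archetype.columns.setdefault(name, []).append(value)
        archetype.size += 1
        eid = EntityId(self.next_id, generation=1)
        self.entity_lookup[eid.id] = LookupEntry(eid.generation, archetype, row)
        self.next_id += 1
        return eid

    def query_eid(self, eid, wanted, block):
        entry = self.entity_lookup.get(eid.id)  # single hash lookup
        if entry is None or entry.generation != eid.generation:
            return False  # unknown or stale entity id
        cols = entry.archetype.columns
        if not all(name in cols for name in wanted):
            return False  # archetype does not carry the requested components
        block(*(cols[name][entry.row] for name in wanted))  # one block invocation
        return True


def capture_mid(world, archetype, n):
    """Create n entities and return the eid of the n/2-th one."""
    mid = None
    for i in range(n):
        eid = world.create(archetype, car_id=i + 1, price=100 + i)
        if i == n // 2:
            mid = eid
    return mid


world = World()
cars = Archetype()
n = 1000
target = capture_mid(world, cars, n)

found = []
hit = world.query_eid(target, ["car_id"], found.append)
# Reading the value back (rather than setting a flag) proves which entity was visited.
assert hit and found == [n // 2 + 1]
```

Checking the value that was read, instead of only checking that the block ran, is the same idea as the drift gate in the benchmark: the component read cannot be optimized away, and a fixture change that moves the target entity makes the assertion fail.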
```diff
+
+## Update — from_decs lambda-arg form splice validation (2026-05-22)
+
+While planning a follow-up to extend the splice to the lambda-arg form (`from_decs($(args){})`), an empirical spike showed the splice was already firing on it. Both `from_decs` ([decs_boost.das:632](../../daslib/decs_boost.das#L632)) and `from_decs_template` ([decs_boost.das:706](../../daslib/decs_boost.das#L706)) expand to byte-identical invoke-block shapes:
+
+```
+invoke($() {
+    var res : array<tuple<...>>
+    query(...)
+    return res.to_sequence()
+})
+```
+
+`extract_decs_bridge` ([linq_fold.das:3015](../../daslib/linq_fold.das#L3015)) pattern-matches the post-expansion AST, not the call macro that produced it — so it transparently recognizes both forms with zero additional code. The eager-bridge prose in earlier planning docs about this form needing splice work was incorrect.
+
+Added 11 tests to `tests/linq/test_linq_from_decs.das` covering count / where+count / select+sum / take+count / distinct+count / to_array / group_by+count for parity, plus splice-shape gates confirming the AST matches the type-arg form, plus Wave 4 component pruning gates confirming the walker drops unused multi-component `get_ro` slots on this form. All 169 tests pass.
+
+**Practical effect for users:** the lambda-arg form is the right choice when the user wants custom component prefixes that don't match a `[decs_template]` struct (or when they want to query components across structs not annotated with one). It now gets the full splice + pruning treatment automatically.
+
```
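The update section's key point is that the recognizer matches the expanded invoke-block shape rather than the macro that produced it, so two different front ends are recognized by the same code. The Python sketch below illustrates that idea on a toy tuple-based AST. It is an assumption-laden model: `expand_lambda_form`, `expand_template_form`, and `extract_bridge` are hypothetical stand-ins, and the node layout is invented for illustration; none of it reproduces the daslang AST or `extract_decs_bridge` itself.

```python
def _invoke_block(components):
    # Toy expansion shape: invoke { var res; query(components); return res.to_sequence() }
    return ("invoke", [
        ("var", "res"),
        ("query", tuple(components)),
        ("return", ("to_sequence", "res")),
    ])


def expand_lambda_form(args):
    """Hypothetical stand-in for expanding the lambda-arg form: names are given directly."""
    return _invoke_block(list(args))


def expand_template_form(template):
    """Hypothetical stand-in for the template form: names derive from a struct description."""
    return _invoke_block([template["prefix"] + name for name in template["fields"]])


def extract_bridge(node):
    """Return the queried component names if node has the expected shape, else None."""
    if not (isinstance(node, tuple) and len(node) == 2 and node[0] == "invoke"):
        return None
    body = node[1]
    if not isinstance(body, list) or len(body) != 3:
        return None
    var, query, ret = body
    if var[0] != "var" or query[0] != "query":
        return None
    if ret != ("return", ("to_sequence", var[1])):
        return None
    return query[1]


lambda_ast = expand_lambda_form(["car_id", "car_price"])
template_ast = expand_template_form({"prefix": "car_", "fields": ["id", "price"]})

# Both front ends produce the same tree, so one recognizer covers both.
assert lambda_ast == template_ast
assert extract_bridge(lambda_ast) == ("car_id", "car_price")
assert extract_bridge(("call", "something_else")) is None
```

Because the recognizer never asks which expander built the tree, adding a third front end that emits the same shape would need no recognizer changes in this model.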
```diff
+## Full m4 matrix — post-Wave-4b snapshot (2026-05-22)
+
+Re-ran the entire benchmarks/sql suite (199 benchmarks across 51 families, all on INTERP, 100K rows). Numbers in ns/op. Lower is better. An empty cell means no lane defined for that family.
+
+### Aggregates — m4 within Wave 4 multi-component get_ro floor of m3f
+
+| benchmark | m1 sql | m3 | m3f | m4 | m4/m3f |
+|---|---:|---:|---:|---:|---:|
+| sum_aggregate | 29 | 29 | 2 | 7 | 3.5× |
+| sum_where | 32 | 43 | 4 | 7 | 1.8× |
+| count_aggregate | 29 | 28 | 4 | 6 | 1.5× |
+| long_count_aggregate | 29 | 29 | 4 | 6 | 1.5× |
+| max_aggregate | 30 | 36 | 6 | 10 | 1.7× |
+| min_aggregate | 30 | 38 | 5 | 10 | 2.0× |
+| average_aggregate | 29 | 32 | 5 | 11 | 2.2× |
+| select_where_count | 32 | 57 | 5 | 9 | 1.8× |
+| select_where_sum | 37 | 73 | 8 | 10 | 1.3× |
+| select_where | 189 | 28 | 11 | 22 | 2.0× |
+| chained_where | 36 | 45 | 6 | 10 | 1.7× |
+| all_match | 27 | 20 | 3 | 6 | 2.0× |
+| contains_match | 0 | 29 | 2 | 4 | 2.0× |
+| any_match | 0 | 0 | 0 | 0 | matches |
+| take_while_match | 8 | 23 | 2 | 3 | 1.5× |
+| skip_while_match | 3 | 20 | 5 | 7 | 1.4× |
+| to_array_filter | 69 | 42 | 11 | 15 | 1.4× |
+
+### Group-by — m4 close to m3f, m4 beats m1 sql 10/11
+
+| benchmark | m1 sql | m3 | m3f | m4 | m4/m3f | m1/m4 |
+|---|---:|---:|---:|---:|---:|---:|
+| groupby_sum | 171 | 100 | 36 | 50 | 1.4× | **3.4× m4 win** |
+| groupby_count | 143 | 66 | 36 | 50 | 1.4× | **2.9× m4 win** |
+| groupby_average | 173 | 102 | 51 | 62 | 1.2× | **2.8× m4 win** |
+| groupby_max | 172 | 103 | 43 | 56 | 1.3× | **3.1× m4 win** |
+| groupby_min | 176 | 107 | 43 | 57 | 1.3× | **3.1× m4 win** |
+| groupby_multi_reducer | 191 | 138 | 55 | 63 | 1.1× | **3.0× m4 win** |
+| groupby_having_count | 141 | 73 | 36 | 50 | 1.4× | **2.8× m4 win** |
+| groupby_having_hidden_sum | 177 | 103 | 39 | 57 | 1.5× | **3.1× m4 win** |
+| groupby_where_count | 79 | 65 | 24 | 37 | 1.5× | **2.1× m4 win** |
+| groupby_where_sum | 87 | 80 | 24 | 37 | 1.5× | **2.4× m4 win** |
+| groupby_select_sum || 117 | 61 | 64 | 1.1× ||
+| groupby_first || 70 | 35 | 48 | 1.4× ||
+
+### Order / sort / reverse — m4 within 2× of m3f
+
+| benchmark | m1 sql | m3 | m3f | m4 | m4/m3f |
+|---|---:|---:|---:|---:|---:|
+| sort_take | 38 | 726 | 22 | 51 | 2.3× |
+| order_take_desc | 38 | 721 | 22 | 52 | 2.4× |
+| select_where_order_take | 36 | 355 | 21 | 34 | 1.6× |
+| sort_first | 37 | 721 | 41 | 71 | 1.7× |
+| bare_order_where | 270 | 351 | 120 | 130 | 1.1× |
+| reverse_take | 0 | 22 | 0 | 48 | (m3f DCE-fold; m4 honest) |
+
+### Distinct — m4 close to m3f
+
+| benchmark | m1 sql | m3 | m3f | m4 | m4/m3f |
+|---|---:|---:|---:|---:|---:|
+| distinct_count | 40 | 44 | 15 | 20 | 1.3× |
+| distinct_take | 0 | 30 | 0 | 0 | matches |
+
+### Cat-C — the formerly-skipped lanes
+
+| benchmark | m1 sql | m3 | m3f | m4 | note |
+|---|---:|---:|---:|---:|---|
+| indexed_lookup | 1449 | 2,014,895 | 219,259 | **481** | **3× m4 win over SQL b-tree** (Wave 4b) |
+| zip_dot_product || 52 | 7 | 10 | intra-archetype zip via Wave 4 pruning (Wave 4b) |
+| join_count || 116 | 121 || cross-archetype join — deferred past Wave 4b |
+
+### Lanes where m4 currently lags m3f (follow-up candidates)
+
+These are terminators where the decs splice doesn't fire today; m4 falls through to the eager-bridge `array<tuple>` path. Same chain shapes on the array side splice cleanly (m3f ≈ 0-6 ns). Candidates for a future "Wave 5 — terminal splice for decs" PR.
+
+| benchmark | m3f | m4 | gap |
+|---|---:|---:|---:|
+| single_match | 2 | 80 | 40× |
+| last_match | 5 | 85 | 17× |
+| aggregate_match | 6 | 82 | 13× |
+| element_at_match | 0 | 34 | 34× |
+| first_or_default_match | 0 | 0 | matches (already on splice) |
+| first_match | 0 | 0 | matches (already on splice) |
+
+**Headlines this snapshot:** m4 beats SQL on 10/11 group-by benchmarks. `indexed_lookup` m4 lane closes the Cat-C gap with a 3× win. m4 is within 1.1-2.4× of m3f on 39/44 lanes where both are defined; the 5 outliers (single/last/aggregate/element_at/reverse_take) all share the same root cause — terminator not yet on the splice for decs — and form a coherent follow-up scope.
+
```
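The `m4/m3f` and `m1/m4` columns in the INTERP tables are plain quotients of the ns/op figures, rounded to one decimal. The short Python sketch below recomputes a few of them from values copied out of the tables, and shows why a lane whose baseline rounds to 0 ns/op (for example `reverse_take`, where m3f is 0) has no meaningful ratio and is better described by its absolute gap. The `ratio` helper and the `rows` layout are assumptions made for this illustration only.

```python
def ratio(numerator_ns, denominator_ns):
    """Quotient rounded to one decimal, or None when the baseline is 0 ns/op."""
    if denominator_ns == 0:
        return None
    return round(numerator_ns / denominator_ns, 1)


# (m1 sql, m3f, m4) in ns/op, copied from the INTERP tables.
rows = {
    "sum_aggregate": (29, 2, 7),
    "groupby_sum": (171, 36, 50),
    "sort_take": (38, 22, 51),
    "reverse_take": (0, 0, 48),
}

report = {}
for name, (m1, m3f, m4) in rows.items():
    report[name] = {
        "m4/m3f": ratio(m4, m3f),   # how far m4 sits above the array-splice lane
        "m1/m4": ratio(m1, m4),     # values above 1.0 mean m4 is faster than m1 sql
        "gap_ns": m4 - m3f,         # absolute gap, usable even when m3f is 0
    }

for name, entry in report.items():
    print(name, entry)
```

For `groupby_sum` this yields 1.4 and 3.4, matching the table's `1.4×` and `3.4× m4 win` cells.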
```diff
+### Same suite under JIT (2026-05-22)
+
+Same 199 benchmarks, JIT mode (`-jit` flag, LLVM codegen). Per-bench DLL codegen + cache adds setup time but per-call cost drops to near-zero for tight loops.
+
+#### Aggregates — JIT lifts m4 to m3f for most lanes
+
+| benchmark | m1 sql | m3 | m3f | m4 |
+|---|---:|---:|---:|---:|
+| sum_aggregate | 30 | 7 | 0 | 0 |
+| sum_where | 32 | 13 | 0 | 0 |
+| count_aggregate | 29 | 9 | 0 | 0 |
+| long_count_aggregate | 29 | 9 | 0 | 0 |
+| max_aggregate | 31 | 8 | 0 | 0 |
+| min_aggregate | 31 | 8 | 0 | 0 |
+| average_aggregate | 30 | 7 | 0 | 3 |
+| select_where_count | 32 | 15 | 0 | 0 |
+| select_where_sum | 37 | 15 | 0 | 0 |
+| select_where | 106 | 9 | 4 | 4 |
+| chained_where | 36 | 13 | 0 | 0 |
+| all_match | 27 | 4 | 0 | 0 |
+| contains_match | 0 | 7 | 0 | 0 |
+| any_match | 0 | 0 | 0 | 0 |
+| take_while_match | 7 | 5 | 0 | 0 |
+| skip_while_match | 3 | 5 | 0 | 0 |
+| to_array_filter | 47 | 13 | 3 | 3 |
+
+#### Group-by JIT — m4 beats m1 sql by 12-19×
+
+| benchmark | m1 sql | m3 | m3f | m4 | m1/m4 |
+|---|---:|---:|---:|---:|---:|
+| groupby_sum | 171 | 26 | 8 | 10 | **17.1× m4 win** |
+| groupby_count | 141 | 16 | 8 | 10 | **14.1× m4 win** |
+| groupby_average | 170 | 25 | 9 | 11 | **15.5× m4 win** |
+| groupby_max | 172 | 25 | 8 | 10 | **17.2× m4 win** |
+| groupby_min | 174 | 26 | 8 | 10 | **17.4× m4 win** |
+| groupby_multi_reducer | 188 | 37 | 8 | 10 | **18.8× m4 win** |
+| groupby_having_count | 141 | 22 | 8 | 10 | **14.1× m4 win** |
+| groupby_having_hidden_sum | 174 | 30 | 8 | 10 | **17.4× m4 win** |
+| groupby_where_count | 71 | 19 | 4 | 6 | **11.8× m4 win** |
+| groupby_where_sum | 82 | 23 | 4 | 6 | **13.7× m4 win** |
+| groupby_select_sum || 36 | 13 | 14 ||
+| groupby_first || 20 | 7 | 9 ||
+
+#### Order / sort / reverse JIT
+
+| benchmark | m1 sql | m3 | m3f | m4 |
+|---|---:|---:|---:|---:|
+| sort_take | 38 | 218 | 9 | 19 |
+| order_take_desc | 38 | 214 | 9 | 18 |
+| select_where_order_take | 36 | 109 | 8 | 10 |
+| sort_first | 38 | 219 | 8 | 16 |
+| bare_order_where | 183 | 109 | 33 | 33 |
+| reverse_take | 0 | 7 | 0 | 9 |
+
+#### Distinct JIT
+
+| benchmark | m1 sql | m3 | m3f | m4 |
+|---|---:|---:|---:|---:|
+| distinct_count | 41 | 9 | 2 | 2 |
+| distinct_take | 0 | 9 | 0 | 0 |
+
+#### Cat-C JIT
+
+| benchmark | m1 sql | m3 | m3f | m4 | note |
+|---|---:|---:|---:|---:|---|
+| indexed_lookup | 1250 | 498461 | 34974 | **102** | **12.3× m4 win over SQL b-tree** |
+| zip_dot_product || 10 | 0 | 0 | matches m3f |
+| join_count || 35 | 35 || still deferred |
+
+#### Lanes where m4 lags m3f under JIT (same root cause as INTERP)
+
+JIT closes the gap but doesn't eliminate it — terminator still goes through eager bridge for decs on these shapes.
+
+| benchmark | m3f | m4 | gap |
+|---|---:|---:|---:|
+| single_match | 0 | 16 | 16 ns |
+| last_match | 0 | 17 | 17 ns |
+| aggregate_match | 0 | 17 | 17 ns |
+| element_at_match | 0 | 9 | 9 ns |
+
+**JIT headlines:** m4 beats m1 sql by 12-19× on the entire group-by family. `indexed_lookup` JIT win grows from 3× (INTERP) to 12.3× (JIT) because the m4 hot path becomes ~one hash lookup. Most aggregate m4 lanes collapse to 0 ns/op at JIT measurement floor — same as m3f. Same 4-lane outlier set as INTERP; same future scope.
```

benchmarks/sql/indexed_lookup.das

Lines changed: 6 additions & 3 deletions
```diff
@@ -53,16 +53,19 @@ def run_m3f(b : B?; n : int) {
 // --- m4: decs eid lookup (O(1) hash) ---
 // query(eid, $(...)) wraps for_eid_archetype, which hashes the eid into entityLookup
 // and dispatches the block once if the archetype matches. The block-arg `car_id` is
-// the request-shape witness — any field of DecsCar would do; we use id to mirror m1/m3.
+// the request-shape witness AND a fixture-drift gate: fixture_decs_capture_mid sets
+// id = i+1 on the n/2-th entity, so reading car_id back gives us a non-trivial value
+// that proves the lookup actually visited the expected entity (not a coincidence).
 def run_m4(b : B?; n : int) {
     let target_eid = fixture_decs_capture_mid(n)
+    let expected_id = n / 2 + 1
     b |> run("m4_decs_eid/{n}") {
         var found = 0
         query(target_eid) $(car_id : int) {
-            found = 1
+            found = car_id
         }
         b |> accept(found)
-        if (found == 0) {
+        if (found != expected_id) {
             b->failNow()
         }
     }
```

daslib/decs_boost.das

Lines changed: 9 additions & 0 deletions
```diff
@@ -647,6 +647,15 @@ class FromDecsMacro : AstCallMacro {
     //! res |> push((index, text))
     //! })
     //! return res.to_sequence()
+    //!
+    //! When this iterator is consumed by a daslib/linq_fold chain (`_fold(from_decs(...).where_(...).count())`,
+    //! etc.) the chain splice in plan_decs_unroll recognizes the expanded shape via extract_decs_bridge
+    //! and fuses where/select/take/distinct/group_by/order_by/reverse + the final terminator into a
+    //! single for_each_archetype loop with inline accumulator — no array<tuple> materialization. The
+    //! splice is shape-based: the same recognizer matches both this macro and from_decs_template
+    //! (which differs only in deriving the field list from a [decs_template] struct). Wave 4 component
+    //! pruning also fires: if the chain only references a subset of the lambda's declared args, the
+    //! emitted for-loop drops the unused get_ro slots.
     def override visit(prog : ProgramPtr; mod : Module?; var expr : ExprCallMacro?) : ExpressionPtr {
         macro_verify(length(expr.arguments) == 1, prog, expr.at, "expecting from_decs($(block_with_arguments))")
         macro_verify(expr.arguments[0] is ExprMakeBlock, prog, expr.at, "expecting from_decs($(block_with_arguments))")
```
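The doc comment added above contrasts two execution strategies: materializing an `array<tuple>` of every declared component and then running the chain over it, versus fusing the chain into one loop per archetype that keeps an inline accumulator and reads only the columns the chain references. The Python sketch below models both for a `where(...).count()` chain. It is an illustration under stated assumptions, not the code that `plan_decs_unroll` emits: the archetype-as-dict layout and the function names `eager_bridge_count` and `fused_count` are hypothetical.

```python
def eager_bridge_count(archetypes, field, predicate):
    """Materialize every component of every entity, then filter and count."""
    rows = []
    for columns in archetypes:
        names = list(columns)
        for values in zip(*columns.values()):
            rows.append(dict(zip(names, values)))  # one record per entity, all components
    return sum(1 for row in rows if predicate(row[field]))


def fused_count(archetypes, field, predicate):
    """One loop per archetype, reading only the referenced column, no intermediate rows."""
    count = 0  # inline accumulator
    for columns in archetypes:
        for value in columns[field]:  # unused columns are never touched
            if predicate(value):
                count += 1
    return count


# Two toy archetypes with three components each; only "price" is referenced by the chain.
archetypes = [
    {"id": [1, 2, 3], "price": [10, 25, 40], "year": [2001, 2010, 2020]},
    {"id": [4, 5], "price": [30, 5], "year": [1999, 2024]},
]


def is_pricey(price):
    return price >= 25


eager = eager_bridge_count(archetypes, "price", is_pricey)
fused = fused_count(archetypes, "price", is_pricey)
assert eager == fused == 3
```

Both functions return the same count; the fused version simply never allocates the per-entity records and never reads `id` or `year`, which is the effect the comment attributes to the splice plus component pruning.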
