**indexed_lookup**: Uses the existing `query(eid, $(...))` call macro ([decs_boost.das:315](../../daslib/decs_boost.das#L315)), which wraps `for_eid_archetype` ([decs.das:666](../../daslib/decs.das#L666)). `entityLookup` is a flat hash on `EntityId.id`, so a lookup is a single hash, a generation check, and an archetype dispatch. A new fixture helper, `fixture_decs_capture_mid(n) : EntityId` ([_common.das](_common.das)), captures the n/2-th entity's eid mid-`create_entities` callback so the bench has a real eid to look up. The 489 ns/op figure includes the macro-time-generated request/erq lookup, the hash, the archetype dispatch, and a real `car_id` field read inside the block (a drift-protection assertion against the expected id). An earlier draft measured 227 ns/op, but its block was a no-op `found = 1`, so the `get_ro` fetch for `car_id` was being eliminated; that figure was not a fair lookup cost.
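The lookup path described above can be modelled in a few lines. This is a minimal Python sketch, not the decs implementation: `World`, `Archetype`, `query_eid`, and `create_cars` are hypothetical names, and the `(generation, archetype, row)` entry layout is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityId:
    id: int          # slot index, used as the hash key
    generation: int  # changes when a slot is reused, so stale ids miss


@dataclass
class Archetype:
    columns: dict    # component name -> list of values, one per row


@dataclass
class World:
    # id -> (generation, archetype, row); a stand-in for entityLookup
    entity_lookup: dict = field(default_factory=dict)

    def query_eid(self, eid, block):
        entry = self.entity_lookup.get(eid.id)   # single hash probe
        if entry is None:
            return False
        generation, archetype, row = entry
        if generation != eid.generation:         # generation check
            return False
        block(archetype, row)                    # archetype dispatch + one block call
        return True


def create_cars(world, n):
    """Create n cars and capture the n/2-th eid while creating them."""
    arch = Archetype(columns={"car_id": [], "price": []})
    mid = None
    for i in range(n):
        arch.columns["car_id"].append(i)
        arch.columns["price"].append(1000 + i)
        world.entity_lookup[i] = (1, arch, i)
        if i == n // 2:
            mid = EntityId(id=i, generation=1)
    return mid
```

A block that reads `arch.columns["car_id"][row]` and compares it with the expected id does real work on every call, which is the difference between the 489 ns/op measurement and the earlier no-op block.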

**zip_dot_product**: No new surface at all. `from_decs_template(type<DecsCar>)._select(_.price * _.year).sum()` is the natural intra-archetype zip: a multi-iterator `for` over the archetype's two int columns. Wave 4 component pruning keeps the `price` and `year` `get_ro` slots and drops the other 4 components, so per-element cost is two int reads plus the multiply, matching the m3f two-column zip. The 3 ns gap to m3f covers `for_each_archetype` dispatch overhead.
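As a rough illustration of the zip shape and of what pruning buys, here is a Python sketch under stated assumptions: archetypes are modelled as plain column dictionaries, and `RecordingColumns` is a hypothetical helper that only exists to show which components get read. It is not how decs stores or prunes components.

```python
class RecordingColumns(dict):
    """Archetype columns that remember which components were read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.touched = set()

    def __getitem__(self, name):
        self.touched.add(name)
        return super().__getitem__(name)


def zip_dot_product(archetypes, left="price", right="year"):
    """Sum left * right over every row of every archetype."""
    total = 0
    for columns in archetypes:               # one dispatch per archetype
        # Only the two referenced columns are fetched; the rest are never read.
        for a, b in zip(columns[left], columns[right]):
            total += a * b                   # two int reads plus one multiply
    return total
```

With six components per archetype, `touched` would still contain only `price` and `year` after the call, which mirrors the pruned per-element cost described above.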

**Net coverage:** 47 → 49 m4 lanes (`indexed_lookup` + `zip_dot_product`). Only `join_count` remains skipped: it is a cross-archetype join, which requires real design work (eid linkage between archetypes) and is appropriately deferred past Wave 4b.

**No new decs surface needed for these lanes.** Pure benchmark additions plus one helper in `_common.das`. The Wave 4b PR documents that the existing decs surface already covers these chain shapes; the team's pre-Wave-4b suspicion that we'd need new helpers turned out to be incorrect. (The same PR also bundled 6 small daslib/linq_fold cleanups in its commit 1. Those were unrelated PR #2806 review follow-ups, not Wave 4b code.)

## Update: from_decs lambda-arg form splice validation (2026-05-22)

While planning a follow-up to extend the splice to the lambda-arg form (`from_decs($(args){})`), an empirical spike showed the splice was already firing on it. Both `from_decs` ([decs_boost.das:632](../../daslib/decs_boost.das#L632)) and `from_decs_template` ([decs_boost.das:706](../../daslib/decs_boost.das#L706)) expand to byte-identical invoke-block shapes:

```
invoke($() {
    var res : array<tuple<...>>
    query(...)
    return res.to_sequence()
})
```

`extract_decs_bridge` ([linq_fold.das:3015](../../daslib/linq_fold.das#L3015)) pattern-matches the post-expansion AST, not the call macro that produced it, so it transparently recognizes both forms with zero additional code. The eager-bridge prose in earlier planning docs, which said this form needed splice work, was incorrect.
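Matching on the expanded shape rather than on the macro name can be sketched as follows. This is a toy Python model with nested tuples standing in for AST nodes; `expand_from_decs_template`, `expand_from_decs_lambda`, and `extract_bridge` are hypothetical names and do not reflect the real `linq_fold.das` code.

```python
def _invoke_block(query_arg):
    # Shared post-expansion shape: declare res, run the query, return a sequence.
    return ("invoke", ("block", [
        ("var", "res"),
        ("query", query_arg),
        ("return", ("to_sequence", "res")),
    ]))


def expand_from_decs_template(struct_name):
    """Stand-in for the type-arg macro expansion."""
    return _invoke_block(struct_name)


def expand_from_decs_lambda(args):
    """Stand-in for the lambda-arg macro expansion."""
    return _invoke_block(args)


def extract_bridge(node):
    """Return the query argument if node has the bridge shape, else None."""
    if not (isinstance(node, tuple) and len(node) == 2 and node[0] == "invoke"):
        return None
    block = node[1]
    if not (isinstance(block, tuple) and len(block) == 2 and block[0] == "block"):
        return None
    statements = block[1]
    if len(statements) != 3:
        return None
    var, query, ret = statements
    if var[0] == "var" and query[0] == "query" and ret == ("return", ("to_sequence", var[1])):
        return query[1]
    return None
```

Because the matcher never asks which macro produced the node, any expansion that yields the same shape is recognized without extra code.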

Added 11 tests to `tests/linq/test_linq_from_decs.das`: parity tests covering count / where+count / select+sum / take+count / distinct+count / to_array / group_by+count, splice-shape gates confirming the AST matches the type-arg form, and Wave 4 component pruning gates confirming the walker drops unused multi-component `get_ro` slots on this form. All 169 tests pass.

**Practical effect for users:** the lambda-arg form is the right choice when the user wants custom component prefixes that don't match a `[decs_template]` struct (or wants to query components across structs not annotated with one). It now gets the full splice + pruning treatment automatically.

## Full m4 matrix: post-Wave-4b snapshot (2026-05-22)

Re-ran the entire benchmarks/sql suite (199 benchmarks across 51 families, all on INTERP, 100K rows). Numbers are in ns/op; lower is better. `—` means no lane is defined for that family.

### Aggregates: m4 within Wave 4 multi-component get_ro floor of m3f

| benchmark | m1 sql | m3 | m3f | m4 | m4/m3f |
|---|---:|---:|---:|---:|---:|
| sum_aggregate | 29 | 29 | 2 | 7 | 3.5× |
| sum_where | 32 | 43 | 4 | 7 | 1.8× |
| count_aggregate | 29 | 28 | 4 | 6 | 1.5× |
| long_count_aggregate | 29 | 29 | 4 | 6 | 1.5× |
| max_aggregate | 30 | 36 | 6 | 10 | 1.7× |
| min_aggregate | 30 | 38 | 5 | 10 | 2.0× |
| average_aggregate | 29 | 32 | 5 | 11 | 2.2× |
| select_where_count | 32 | 57 | 5 | 9 | 1.8× |
| select_where_sum | 37 | 73 | 8 | 10 | 1.3× |
| select_where | 189 | 28 | 11 | 22 | 2.0× |
| chained_where | 36 | 45 | 6 | 10 | 1.7× |
| all_match | 27 | 20 | 3 | 6 | 2.0× |
| contains_match | 0 | 29 | 2 | 4 | 2.0× |
| any_match | 0 | 0 | 0 | 0 | matches |
| take_while_match | 8 | 23 | 2 | 3 | 1.5× |
| skip_while_match | 3 | 20 | 5 | 7 | 1.4× |
| to_array_filter | 69 | 42 | 11 | 15 | 1.4× |
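The `m4/m3f` column can be cross-checked from the rounded `m3f` and `m4` values. The sketch below is a consistency check only, assuming half-up rounding to one decimal; the published ratios may have been derived from unrounded timings. `ratio_label` and the `"n/a"` fallback are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def ratio_label(m3f, m4):
    """Format m4/m3f the way the table does, from integer ns/op values."""
    if m3f == 0:
        # Both lanes at the measurement floor: the table prints "matches".
        return "matches" if m4 == 0 else "n/a"
    ratio = (Decimal(m4) / Decimal(m3f)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return f"{ratio}×"


# (m3f, m4, published ratio) for a few rows of the table above
rows = {
    "sum_aggregate": (2, 7, "3.5×"),
    "sum_where": (4, 7, "1.8×"),
    "select_where_sum": (8, 10, "1.3×"),
    "any_match": (0, 0, "matches"),
    "to_array_filter": (11, 15, "1.4×"),
}

mismatches = [name for name, (m3f, m4, shown) in rows.items()
              if ratio_label(m3f, m4) != shown]
```

For the sampled rows, `mismatches` comes out empty, so the rounded columns and the ratio column agree under that rounding assumption.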
### Group-by — m4 close to m3f, m4 beats m1 sql 10/11
### Lanes where m4 currently lags m3f (follow-up candidates)

These are terminators where the decs splice doesn't fire today; m4 falls through to the eager-bridge `array<tuple>` path. The same chain shapes on the array side splice cleanly (m3f ≈ 0-6 ns). They are candidates for a future "Wave 5 — terminal splice for decs" PR.

**Headlines this snapshot:** m4 beats SQL on 10/11 group-by benchmarks. The `indexed_lookup` m4 lane closes the Cat-C gap with a 3× win. m4 is within 1.1-2.4× of m3f on 39/44 lanes where both are defined; the 5 outliers (single/last/aggregate/element_at/reverse_take) all share the same root cause (the terminator is not yet on the splice for decs) and form a coherent follow-up scope.

### Same suite under JIT (2026-05-22)

Same 199 benchmarks in JIT mode (`-jit` flag, LLVM codegen). Per-bench DLL codegen and caching add setup time, but per-call cost drops to near zero for tight loops.

#### Aggregates: JIT lifts m4 to m3f for most lanes
#### Lanes where m4 lags m3f under JIT (same root cause as INTERP)

JIT narrows the gap but doesn't eliminate it: the terminator still goes through the eager bridge for decs on these shapes.

| benchmark | m3f | m4 | gap |
|---|---:|---:|---:|
| single_match | 0 | 16 | 16 ns |
| last_match | 0 | 17 | 17 ns |
| aggregate_match | 0 | 17 | 17 ns |
| element_at_match | 0 | 9 | 9 ns |

**JIT headlines:** m4 beats m1 sql by 12-19× on the entire group-by family. The `indexed_lookup` JIT win grows from 3× (INTERP) to 12.3× (JIT) because the m4 hot path becomes roughly one hash lookup. Most aggregate m4 lanes collapse to 0 ns/op at the JIT measurement floor, the same as m3f. The 4-lane outlier set is the same as under INTERP, with the same future scope.