Merged
26 commits
102d77f
phase 8c (part 1/N): decs Archetype.size + iterator widening
borisbat May 20, 2026
1d4100e
phase 8c (part 2/N): decs entityLookup cascade -- long_length + UINT_…
borisbat May 20, 2026
dfdde53
phase 8c (part 3/N): c.data resize cascade + real >INT_MAX runtime probe
borisbat May 20, 2026
590ad24
phase 8c (part 4/N): rewrite huge probe to actually exercise c.data >…
borisbat May 20, 2026
6a644c0
phase 8c (part 5/N): formatter whitespace -- default<X> .field
borisbat May 20, 2026
46c69a9
phase 8c (part 6/N): runtime equal instead of static_assert for type …
borisbat May 20, 2026
54472cc
phase 8c (part 7/N): clean pre-existing lint in tests/decs/test_arche…
borisbat May 20, 2026
5e69ff4
phase 8c (part 8/N): CI fix + Copilot review fixes
borisbat May 20, 2026
c70b89e
linq_fold: align average semantics with linq.das (empty-source guard,…
borisbat May 20, 2026
1a37247
phase 8c (part 9/N): aot_builtin.h _i64 decl + Copilot C3/C4 off-by-ones
borisbat May 20, 2026
7e4056b
Merge pull request #2769 from GaijinEntertainment/bbatkin/linq-fold-a…
borisbat May 20, 2026
7818bb4
Merge pull request #2768 from GaijinEntertainment/bbatkin/longarr-pha…
borisbat May 20, 2026
c2d0b47
linq_fold: plan_decs_unroll Slice 5a — take/skip ranges on decs
borisbat May 20, 2026
b419b01
M4 doc: fix Slice 5a bench-impact claim (not DCE — ast_dump confirms …
borisbat May 20, 2026
6ea1ca6
linq_fold: Slice 5a Copilot review — explicit state for any/all/conta…
borisbat May 20, 2026
36658ef
phase 8d: builtin.das push/emplace/push_clone/resize_and_init int|int64
borisbat May 20, 2026
9ec72fc
Merge pull request #2770 from GaijinEntertainment/bbatkin/linq-fold-d…
borisbat May 20, 2026
291b7f1
phase 8d (part 2/N): CI fixes -- lock_data unsafe + resize_and_init h…
borisbat May 20, 2026
1266cbc
phase 8d (part 3/N): lint -- exit 0 when all inputs were intentionall…
borisbat May 20, 2026
e4204ca
linq_fold: plan_decs_unroll Slice 5b — take_while/skip_while on decs
borisbat May 20, 2026
3d37189
Merge pull request #2771 from GaijinEntertainment/bbatkin/longarr-bui…
borisbat May 21, 2026
e2228a5
linq_fold: Slice 5b Copilot review — rewrite take_while tests to actu…
borisbat May 21, 2026
2149d06
phase 8e: daslib boost widenings (array_boost / json_boost / strings_…
borisbat May 21, 2026
2bc7ba6
array_helper: static_if dispatch by is_array (fixed-array vs array<T>)
borisbat May 21, 2026
2554891
Merge pull request #2772 from GaijinEntertainment/bbatkin/linq-fold-d…
borisbat May 21, 2026
e46461c
Merge pull request #2773 from GaijinEntertainment/bbatkin/longarr-das…
borisbat May 21, 2026
29 changes: 29 additions & 0 deletions benchmarks/sql/M4_DECS_EXPANSION.md
@@ -264,3 +264,32 @@ PR #2742's accumulator + early-exit terminator work on `plan_zip` was orphaned o
| zip_dot_product | — | 53 | 58 | **7** | 8.3× |

`zip(xs, ys)._select(_._0 * _._1).sum()` now fuses to a single multi-iter for-loop with inline accumulator, zero alloc. Falls in line with the rest of the accumulator-class benchmarks.

## Update — Slice 5a take/skip on decs (2026-05-20, plan_decs_unroll + DecsRangeInfo)

`plan_decs_unroll` now recognizes trailing `take(N)` / `skip(N)` after the where/select chain. New `extract_decs_ranges` peels them into `DecsRangeInfo`; counter inits hoist above `for_each_archetype` (so they span archetypes); per-element guards (take-cap → return true, skip-counter → continue, take++) wrap `perElement` BEFORE chain so ranges apply to the post-`where_` stream. When `takeExpr != null` the outer call switches to `for_each_archetype_find` with a `: bool` lambda so the take-cap stop propagates across archetypes.
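The counter hoisting and early-exit routing described above can be modeled outside daslang. The following is a minimal Python sketch, not the generated code: `for_each_archetype_find` here is a hypothetical stand-in for the decs call of the same name, archetypes are plain lists, and `take_skip_to_list` is an illustrative name. It assumes the guard order listed above (take-cap, then skip-counter, then take increment) and that guards run only on elements that pass the `where` predicate.

```python
from typing import Callable, List, Optional, Sequence


def for_each_archetype_find(archetypes: Sequence[Sequence[int]],
                            visit: Callable[[Sequence[int]], bool]) -> bool:
    """Hypothetical stand-in: visit archetypes until one visit returns True."""
    for arch in archetypes:
        if visit(arch):
            return True
    return False


def take_skip_to_list(archetypes: Sequence[Sequence[int]],
                      skip_n: int = 0,
                      take_n: Optional[int] = None,
                      where: Callable[[int], bool] = lambda e: True) -> List[int]:
    # Counters live outside the archetype loop, so they span archetypes.
    skipped = 0
    taken = 0
    out: List[int] = []

    def visit(arch: Sequence[int]) -> bool:
        nonlocal skipped, taken
        for elem in arch:
            if not where(elem):
                continue                 # ranges apply to the post-where stream
            if take_n is not None and taken >= take_n:
                return True              # take-cap: stop across all archetypes
            if skipped < skip_n:
                skipped += 1             # skip-counter: drop this element
                continue
            taken += 1
            out.append(elem)
        return False                     # keep going into the next archetype

    for_each_archetype_find(archetypes, visit)
    return out
```

With three archetypes `[[1, 2, 3], [4, 5], [6, 7, 8, 9]]`, `skip_n=2` and `take_n=4`, the model returns `[3, 4, 5, 6]`: the skip counter is exhausted in the first archetype and the take-cap fires in the third, which only works because both counters persist between `visit` calls.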

Affected emit paths (all 4 non-bare-count): `emit_decs_accumulator`, `emit_decs_early_exit`, `emit_decs_min_max_by`, `emit_decs_to_array`. Bare `count` via arch.size shortcut still bails on any chain ops including ranges.

**Coverage:** take, skip, skip+take, where+take, select+take+sum, take+first, take+to_array, take(-1) short-circuit, skip-beyond-end, AST-shape gates for take→`_find` routing + skip-only→`for_each_archetype` routing. +11 tests.

| benchmark | shape | m4 (old) | m4 (new) | m3f (array splice) |
|---|---|---:|---:|---:|
| take_count | `.take(N).count()` | 36 | 0 | 0 |
| skip_take | `.skip(N).take(M).count()` | 37 | 0 | 0 |
| take_count_filtered | `_where.take(N).count()` | — | 0 | 0 |
| take_sum_aggregate | `_select.take(N).sum()` | — | 0 | 0 |

The m4 splice rounds to 0 ns/op alongside the m3f array splice: same shape (inline counter + early-exit), same measurement floor. This is not DCE: `ast_dump --mode source` confirms that `for_each_archetype_find` is emitted with `decs_takec >= 1000 → return true` and `++decs_takec; push_clone(decs_buf, decs_tup)`, actively building the full 1000-element result array per bench iteration. The old m4 baseline (36-37 ns/op via eager bridge) drops to 0 ns/op (sub-1 ns/iter, indistinguishable from the m3f array splice). That is roughly a 36× win or better; the exact factor cannot be read off because the new figure sits just below the bench's `body_time / n_iters` resolution floor.

## Update — Slice 5b take_while/skip_while on decs (2026-05-20, plan_decs_unroll + predicate-driven ranges)

`plan_decs_unroll` now recognizes trailing `take_while(pred)` / `skip_while(pred)` after the where/skip prefix. `DecsRangeInfo` gains `skipWhileCond` / `takeWhileCond`; `extract_decs_ranges` walks the suffix in canonical order (`skip → skip_while → take_while → take`) and bails when a `select` appears in the prefix (mirrors array-side `seenSelect` bail at `linq_fold.das:1615/1623`). Predicates peel against the source tuple (`tupName`), so the post-where stream is visible but selects are forbidden — same shape as array side. `skipping` flag hoists at invoke scope (one-way; flips false on first non-matching elem, persists across archetypes). When `takeWhileCond != null` the outer call switches to `for_each_archetype_find` with a `: bool` lambda just like `take(N)`, and `useExplicitState` in `emit_decs_early_exit` extends so `any/all/contains + take_while` route through explicit `foundName` (distinguishes "real match" from "take_while-stop" — both produce inner `return true`).
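To make the canonical suffix order and the one-way `skipping` flag concrete, here is a small Python model. It is a sketch under assumptions, not the emitted daslang: archetypes are plain lists, `ranged_to_list` is an illustrative name, and the guards are applied in pipeline order (`skip → skip_while → take_while → take`), which gives the same result as checking the take-cap first.

```python
from typing import Callable, List, Optional, Sequence

Pred = Callable[[int], bool]


def ranged_to_list(archetypes: Sequence[Sequence[int]],
                   skip_n: int = 0,
                   skip_while: Optional[Pred] = None,
                   take_while: Optional[Pred] = None,
                   take_n: Optional[int] = None) -> List[int]:
    # All state is hoisted above the archetype loop (the "invoke scope").
    skipped = 0
    taken = 0
    skipping = skip_while is not None    # one-way flag: only ever flips to False
    out: List[int] = []

    def visit(arch: Sequence[int]) -> bool:
        nonlocal skipped, taken, skipping
        for elem in arch:
            if skipped < skip_n:         # skip(N)
                skipped += 1
                continue
            if skipping:                 # skip_while(pred)
                if skip_while(elem):
                    continue
                skipping = False         # first non-match ends skipping for good
            if take_while is not None and not take_while(elem):
                return True              # take_while stop
            if take_n is not None and taken >= take_n:
                return True              # take(N) cap
            taken += 1
            out.append(elem)
        return False

    # Model of the `_find` routing: the first True stops the archetype walk.
    for arch in archetypes:
        if visit(arch):
            break
    return out
```

For example, `ranged_to_list([[1, 2, 3], [1, 5]], skip_while=lambda x: x < 3)` returns `[3, 1, 5]`: the `1` in the second archetype is kept because the flag flipped in the first archetype and is never re-armed.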

| benchmark | shape | m1 sql | m3 | m3f (array splice) | m4 (old, eager bridge) | m4 (new, splice) | m3f→m4 gap | win vs baseline |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| take_while_match | `._take_while(_.id < 50K).count()` | 7 | 23 | 2 | 55 | **8** | 6× | 6.9× |

m4 lands close to m3f (8 vs 2 ns/op, within the known Wave 4 multi-component `get_ro` overhead). The splice fires: `ast_dump --mode source` confirms `for_each_archetype_find` with `if !(decs_tup.id < 50000) return true else ++decs_acc`.

**Coverage:** take_while, skip_while, skip_while+take_while, where+take_while, take_while+sum, take_while+first, take_while+to_array, take_while always-true (no break) / always-false (immediate break), skip_while always-true (drops all) / always-false (immediate done), skip+take_while, skip_while+take, take_while+any/all/contains (regression guards for explicit-state routing under take_while), AST shape gates for take_while→`_find` routing + skip_while-only→`for_each_archetype` routing. +21 tests (60 → 81 in file).
165 changes: 84 additions & 81 deletions daslib/array_boost.das
@@ -20,10 +20,19 @@ require daslib/contracts
def private array_helper(var arr : auto implicit ==const; a : auto(TT)) : array<TT -const -#> {
//! helper for temp_array with var argument
var res : array<TT -const -#>
let lenA = _::length(arr)
if (lenA >= 1) {
unsafe {
_builtin_make_temp_array(res, addr(arr[0]), lenA)
static_if (typeinfo is_array(arr)) {
let lenA = _::long_length(arr)
if (lenA >= 1_l) {
unsafe {
_builtin_make_temp_array_i64(res, addr(arr[0]), lenA)
}
}
} else {
let lenA = _::length(arr)
if (lenA >= 1) {
unsafe {
_builtin_make_temp_array(res, addr(arr[0]), lenA)
}
}
}
return <- res
@@ -33,10 +42,19 @@ def private array_helper(var arr : auto implicit ==const; a : auto(TT)) : array<
def private array_helper(arr : auto implicit ==const; a : auto(TT)) : array<TT -const -#> {
//! helper for temp_array with const argument
var res : array<TT -const -#>
let lenA = _::length(arr)
if (lenA >= 1) {
unsafe {
_builtin_make_temp_array(res, addr(arr[0]), lenA)
static_if (typeinfo is_array(arr)) {
let lenA = _::long_length(arr)
if (lenA >= 1_l) {
unsafe {
_builtin_make_temp_array_i64(res, addr(arr[0]), lenA)
}
}
} else {
let lenA = _::length(arr)
if (lenA >= 1) {
unsafe {
_builtin_make_temp_array(res, addr(arr[0]), lenA)
}
}
}
return <- res
@@ -75,109 +93,94 @@ def empty(v : auto(VecT)) {
}

[unsafe_operation, template(a), unused_argument(a)]
def public temp_array(var data : auto? ==const; lenA : int; a : auto(TT)) : array<TT -const -#> {
def public temp_array(var data : auto? ==const; lenA : int | int64; a : auto(TT)) : array<TT -const -#> {
//! creates a temporary array from the given data pointer and length
//! Important requirements are:
//!
//! * data pointer is valid and points to a memory block of at least lenA elements
//! * each element follows the next one directly, with the stride equal to size of the element
//! * data memory does not change within the lifetime of the returned array
var res : array<TT -const -#>
if (lenA >= 1) {
unsafe(_builtin_make_temp_array(res, data, lenA))
static_if (typeinfo is_int(lenA)) {
if (lenA >= 1) {
unsafe(_builtin_make_temp_array(res, data, lenA))
}
} else {
if (lenA >= 1_l) {
unsafe(_builtin_make_temp_array_i64(res, data, lenA))
}
}
return <- res
}

[unsafe_operation, template(a), unused_argument(a)]
def public temp_array(data : auto? ==const; lenA : int; a : auto(TT)) : array<TT -const -#> const {
def public temp_array(data : auto? ==const; lenA : int | int64; a : auto(TT)) : array<TT -const -#> const {
//! creates a temporary array from the given data pointer and length
//! Important requirements are:
//!
//! * data pointer is valid and points to a memory block of at least lenA elements
//! * each element follows the next one directly, with the stride equal to size of the element
//! * data memory does not change within the lifetime of the returned array
var res : array<TT -const -#>
if (lenA >= 1) {
unsafe(_builtin_make_temp_array(res, data, lenA))
}
return <- res
}

[unsafe_operation, template(a), unused_argument(a)]
def public temp_array(var data : auto? ==const; lenA : int64; a : auto(TT)) : array<TT -const -#> {
//! int64 overload — creates a temporary array from the given data pointer and length
var res : array<TT -const -#>
if (lenA >= 1_l) {
unsafe(_builtin_make_temp_array_i64(res, data, lenA))
}
return <- res
}

[unsafe_operation, template(a), unused_argument(a)]
def public temp_array(data : auto? ==const; lenA : int64; a : auto(TT)) : array<TT -const -#> const {
//! int64 overload — creates a temporary array from the given data pointer and length
var res : array<TT -const -#>
if (lenA >= 1_l) {
unsafe(_builtin_make_temp_array_i64(res, data, lenA))
}
return <- res
}

def array_view(bytes : array<auto(TT)> ==const; offset, length : int; blk : block<(view : array<TT>#) : void>) {
//! creates a view of the array, which is a temporary array that is valid only within the block
unsafe {
if (offset < 0 || (offset + length) > length(bytes)) {
panic("array_view: out of range")
static_if (typeinfo is_int(lenA)) {
if (lenA >= 1) {
unsafe(_builtin_make_temp_array(res, data, lenA))
}
__builtin_array_lock_mutable(bytes)
var res : array<TT>#
_builtin_make_temp_array(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock_mutable(bytes)
}
}

def array_view(var bytes : array<auto(TT)> ==const; offset, length : int; blk : block<(var view : array<TT>#) : void>) {
//! creates a view of the array, which is a temporary array that is valid only within the block
unsafe {
if (offset < 0 || (offset + length) > length(bytes)) {
panic("array_view: out of range")
} else {
if (lenA >= 1_l) {
unsafe(_builtin_make_temp_array_i64(res, data, lenA))
}
__builtin_array_lock(bytes)
var res : array<TT>#
_builtin_make_temp_array(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock(bytes)
}
return <- res
}

def array_view(bytes : array<auto(TT)> ==const; offset, length : int64; blk : block<(view : array<TT>#) : void>) {
//! int64 overload — creates a view of the array
def array_view(bytes : array<auto(TT)> ==const; offset, length : int | int64; blk : block<(view : array<TT>#) : void>) {
//! creates a view of the array, which is a temporary array that is valid only within the block
unsafe {
// Validate non-negativity before the sum to avoid signed overflow UB,
// and so a negative length doesn't wrap to huge after the uint64 cast.
if (offset < 0_l || length < 0_l || uint64(offset) + uint64(length) > long_length(bytes)) {
panic("array_view: out of range")
static_if (typeinfo is_int(offset)) {
// Validate non-negativity before the sum to avoid signed overflow UB.
if (offset < 0 || length < 0 || uint(offset) + uint(length) > uint(_::length(bytes))) {
panic("array_view: out of range")
}
__builtin_array_lock_mutable(bytes)
var res : array<TT>#
_builtin_make_temp_array(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock_mutable(bytes)
} else {
if (offset < 0_l || length < 0_l || uint64(offset) + uint64(length) > uint64(_::long_length(bytes))) {
panic("array_view: out of range")
}
__builtin_array_lock_mutable(bytes)
var res : array<TT>#
_builtin_make_temp_array_i64(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock_mutable(bytes)
}
__builtin_array_lock_mutable(bytes)
var res : array<TT>#
_builtin_make_temp_array_i64(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock_mutable(bytes)
}
}

def array_view(var bytes : array<auto(TT)> ==const; offset, length : int64; blk : block<(var view : array<TT>#) : void>) {
//! int64 overload — creates a mutable view of the array
def array_view(var bytes : array<auto(TT)> ==const; offset, length : int | int64; blk : block<(var view : array<TT>#) : void>) {
//! creates a mutable view of the array, which is a temporary array that is valid only within the block
unsafe {
if (offset < 0_l || length < 0_l || uint64(offset) + uint64(length) > long_length(bytes)) {
panic("array_view: out of range")
static_if (typeinfo is_int(offset)) {
if (offset < 0 || length < 0 || uint(offset) + uint(length) > uint(_::length(bytes))) {
panic("array_view: out of range")
}
__builtin_array_lock(bytes)
var res : array<TT>#
_builtin_make_temp_array(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock(bytes)
} else {
if (offset < 0_l || length < 0_l || uint64(offset) + uint64(length) > uint64(_::long_length(bytes))) {
panic("array_view: out of range")
}
__builtin_array_lock(bytes)
var res : array<TT>#
_builtin_make_temp_array_i64(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock(bytes)
}
__builtin_array_lock(bytes)
var res : array<TT>#
_builtin_make_temp_array_i64(res, addr(bytes[offset]), length)
invoke(blk, res)
__builtin_array_unlock(bytes)
}
}
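
The reworked `array_view` range check rejects negative `offset` / `length` before summing them as unsigned values. A short Python model of why that order matters is sketched below. It is illustrative only: `wrap_int32`, `old_check_accepts` and `new_check_accepts` are hypothetical names, and two's-complement wraparound is assumed as one possible outcome of the signed overflow that the source comment calls UB.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(x: int) -> int:
    """Emulate two's-complement wraparound of a 32-bit signed integer."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x


def old_check_accepts(offset: int, length: int, size: int) -> bool:
    # Shape of the removed check: only `offset` is tested for sign,
    # and the signed sum may wrap before it is compared.
    return not (offset < 0 or wrap_int32(offset + length) > size)


def new_check_accepts(offset: int, length: int, size: int) -> bool:
    # Shape of the added check: reject negatives first, then add as unsigned.
    # Two non-negative int32 values always sum to less than 2**32.
    if offset < 0 or length < 0:
        return False
    return ((offset & 0xFFFFFFFF) + (length & 0xFFFFFFFF)) & 0xFFFFFFFF <= size
```

In this model `old_check_accepts(INT32_MAX, 1, 10)` and `old_check_accepts(0, -1, 10)` both return `True`, since the wrapped or negative sum compares below the array size, while `new_check_accepts` rejects both. In-range requests such as `(2, 3, 10)` are accepted by either form.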