
[pull] master from GaijinEntertainment:master #1017

Merged
pull[bot] merged 37 commits into forksnd:master from GaijinEntertainment:master
May 20, 2026

@pull pull Bot commented May 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

aleksisch and others added 30 commits May 20, 2026 16:54
STYLE024: ExprSafeAt (?[]) on table<> / array<> / pointer-to-(table|
array|pointer) requires unsafe per ast_infer_type.cpp (errors
unsafe_table_safe_index / unsafe_array_safe_index /
unsafe_pointer_safe_index), but the visitor only marked ExprAt. Add a
preVisitExprSafeAt mirroring the compiler's locality check so the
wrap is not flagged as redundant.

STYLE025: unsafe(expr) sets alwaysSafe only on its immediate child
(ds2_parser.ypp:2275, no descent). When the only unsafe leaf sits
inside a let-ref binding to a non-local non-temporary RHS (e.g.
var s & = *reinterpret<T?>(raw)), the let-ref binding itself
requires unsafe at statement level (ast_infer_type.cpp:4989), and
no single expression-form wrap can satisfy both the let-ref check
and the buried leaf's own unsafe check. Detect this via a stack
frame (count + has_non_local_let_ref) propagated alongside the
existing leaf count, and stay silent when the flag is set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #2746 (Phase 2+3+4) added unfused SimNode_ArrayAt_I64 / _U64 for
int64-/uint64-indexed array access. The existing fusion engine in
simulate_fusion_at_array.cpp hardcoded evalInt(context) and a
uint32_t(...) narrowing in every IMPLEMENT_OP2_SET_NODE macro, so
it could not fire on the new int64-indexed nodes - every arr[i64]
access fell off the fast path.

This commit adds parallel _I64 and _U64 fusion families:

* SimNode_Op2ArrayAt_I64 / _U64 base structs alongside the existing
  SimNode_Op2ArrayAt.
* Three new sections each (ArrayAtR2V scalar / vector / ArrayAt PTR)
  for I64 and U64. Each section redefines the IMPLEMENT_OP2_SET_NODE
  family to:
    - read the right operand via r.subexpr->evalInt64(context)
      [or evalUInt64] instead of evalInt
    - read the right operand from a register as int64_t [or uint64_t]
      instead of uint32_t
    - bounds-check the int64 path as idx<0 || uint64_t(idx) >= size
    - bounds-check the uint64 path as idx >= size
    - keep uint64_t(idx) * uint64_t(stride) + offset arithmetic
* createFusionEngine_at_array() now registers all three families
  (existing int32, new int64, new uint64).
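
The two bounds checks and the 64-bit address arithmetic listed above can be sketched in C++ roughly as follows. This is only an illustration, not the macro-generated code from simulate_fusion_at_array.cpp; the helper names (`in_bounds_i64`, `in_bounds_u64`, `element_offset`, `narrowed_index`) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers; only the comparisons and the offset formula
// are taken from the commit message.

// Signed 64-bit index: reject negatives first, then compare as unsigned.
inline bool in_bounds_i64(int64_t idx, uint64_t size) {
    return !(idx < 0 || uint64_t(idx) >= size);
}

// Unsigned 64-bit index: a single comparison is enough.
inline bool in_bounds_u64(uint64_t idx, uint64_t size) {
    return !(idx >= size);
}

// Byte offset of element idx, computed entirely in 64 bits.
inline uint64_t element_offset(uint64_t idx, uint32_t stride, uint64_t offset) {
    return uint64_t(idx) * uint64_t(stride) + offset;
}

// What a uint32_t(...) narrowing does to a wide index:
// the high 32 bits are discarded, so 2^32 aliases to 0.
inline uint32_t narrowed_index(int64_t idx) {
    return uint32_t(idx);
}
```

The last helper shows why the int32 macros could not simply be reused: an index of 2^32 would narrow to 0 and pass the bounds check against the wrong element.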

Table fusion needed no code change: SimNode_TableIndex<KeyType> is
template-parameterized and IMPLEMENT_SETOP_NUMERIC(TableIndex) already
registers int64_t and uint64_t key types. Adds test_fusion_table_i64.das
to lock in correctness for int64/uint64-keyed tables.

Fusion was confirmed firing via options log_nodes:

    (ArrayAt_I64LocConst #32 {3,0,0,0} 0x4 0x0)
    (ArrayAtR2V_I64LocConst_TT<int> #32 {5,0,0,0} 0x4 0x0)

Slice C (char* At fusion) was investigated and confirmed-empty: the
typer at src/ast/ast_infer_type.cpp:3088 rejects non-isIndex indices
for the fixed-array path (SimNode_At), so int64 indices never reach
that SimNode. SimNode_PtrAt for pointer-indexing has no fusion engine
at all (neither int32 nor int64). Out of scope.

Tests + benches:
* tests/long_array_table/test_fusion_arr_i64.das - 7 tests covering
  const/local/argument compute modes and float value type
* tests/long_array_table/test_fusion_table_i64.das - 5 tests
  exercising int64/uint64 keys + overwrite + via-argument
* benchmarks/fusion/bench_arr_at_i64.das,
  benchmarks/fusion/bench_table_index_i64.das - side-by-side
  int-vs-int64 indexing throughput, baseline for downstream phases

8952/8952 interpreter tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original bench mixed for-loop (int) with while-loop (int64/uint64),
so the int64 numbers conflated fusion cost with while-loop harness
overhead. Rewrite all three index types to the same shape:

    for (i in range(N))        // int
    for (i in range64(N64))    // int64
    for (i in urange64(N64))   // uint64

Also adds the missing uint64-write subtest in bench_arr_at_i64.das
so the array bench has full read+write cross-product across all three
index types.

New numbers (per-op ns, interpreter):

  array read:   int=3  int64=4   uint64=3
  array write:  int=5  int64=14  uint64=9
  table read:   int=9  int64=10  uint64=10

Reads are within ~33% across all index types (uint64 read matches
int32 at parity). The int64-write gap (5 -> 14) is a real cost
discrepancy, not harness overhead — worth a follow-up look but
out of scope for the fusion-correctness PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous int64-write bench measured 14 ns/op vs int's 5 ns/op.
SimNode dump showed the gap was entirely in the BODY, not the index
fusion: `arr[i] = int(i) * 2` for an int64 i emits
`MulAnyConst(Cast_to_int(GetLocalR2V(i)), 2)` — an explicit narrowing
SimNode plus an unfused MulAnyConst (vs int's fused MulLocConst). The
LHS `ArrayAt_I64LocLoc` was firing fine — the cast/mul on the RHS was
the real cost.

Two changes to isolate just the index/fusion cost:

1. Write a constant (`arr[i] = 1`) instead of `int(i) * 2`. No cast
   in the body, so the bench measures the index path only.
2. Wrap the inner loop in `for (_j in range(OUTER))` (OUTER = 10).
   Each `b |> run` body now does OUTER * N inner ops, amortizing
   per-call harness overhead.

Apples-to-apples numbers (per-op ns, OUTER * N = 100000 ops/run):

  array read:   int=3  int64=3   uint64=3
  array write:  int=5  int64=5   uint64=5
  table read:   int=9  int64=10  uint64=10

All three index types at parity for array access; table reads within
1 ns. Confirms the Phase 5 fusion variants land int64/uint64 indexing
on the same fast path as int32.

Note: separate from Phase 5, `Mul(Cast_to_int(int64Local), Const)` is
not fused — int's `MulLocConst` doesn't match when the LHS is a cast
result. That's an independent fusion opportunity not covered by this
PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se5-fusion-i64

longarr phase 5: fusion variants for arr[i64] / arr[u64] indexing
cond ? T(a) : T(b) where both branches apply the same workhorse cast
emits two ExprCall nodes that do identical work. Hoist to
T(cond ? a : b) — one call instead of two, same evaluation semantics.

Suggested in the PR #2753 review by @aleksisch:
"string(a ? b : c) instead of a ? string(b) : string(c)? Can be added
to linter too btw. It's not first time such code was written."

The rule reuses PERF020's 15-name workhorse-cast set
(int/int8/int16/int64/uint/uint8/uint16/uint64/float/double/string/
bitfield/bitfield8/bitfield16/bitfield64) and fires when:

- Both ternary branches resolve to the same workhorse cast name.
- Both calls share the same target Type.
- The user argument on both branches has the same baseType — so the
  hoisted T(cond ? a : b) typechecks without an intermediate cast.

Different-arg-baseType cases (cond ? string(intV) : string(int64V))
intentionally do NOT fire — the rewrite would need a manual widen and
that is left to the author.

The rule fires anywhere, including inside closure bodies, matching
PERF020's stance: a redundant cast is redundant regardless of scope.

Argument-count gate accepts any >=1 to handle string(int) (bound with
explicit args({"value","hex","context","at"}) -> 4 daslang args),
which the original single-arg gate would have missed.

Drive-by: same-PR daslib sweep -- three perf_lint.das self-hits
(call.func.fromGeneric != null ? string(.fromGeneric.name) : string(.name))
hoisted to the PERF021-suggested form. Zero residual PERF021 hits in
daslib post-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot review flagged on PR #2759: the rule's first-arg-only check
mis-fires on `cond ? string(a, true) : string(b, false)`. The suggested
rewrite `string(cond ? a : b)` silently drops one branch's `hex=true` —
real semantic change. Reproduced locally.

Fix: add `cast_call_tail_args_equal(le, re)` — compares arguments[1..]
structurally between branches via `expr_equal_struct(.., require_pure=
false)`. Skips ExprFakeContext / ExprFakeLineInfo (auto-injected for
Context*/LineInfoArg* params; differ at every call site by design).
Wired into `check_perf021_ternary_cast_hoist` as a final gate after
the existing first-arg baseType check.
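
The mis-fire can be reproduced in a C++ analog. The sketch below assumes a hypothetical `to_text(value, hex)` standing in for daslang's `string(value, hex)`; none of these names come from the linter.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for a cast that takes an optional tail argument.
inline std::string to_text(int value, bool hex = false) {
    char buf[32];
    if (hex) {
        std::snprintf(buf, sizeof buf, "%x", static_cast<unsigned>(value));
    } else {
        std::snprintf(buf, sizeof buf, "%d", value);
    }
    return buf;
}

// Original shape: each branch carries its own tail argument.
inline std::string unhoisted(bool cond, int a, int b) {
    return cond ? to_text(a, true) : to_text(b, false);
}

// Naive hoist that looks at the first argument only: hex=true is lost.
inline std::string naive_hoist(bool cond, int a, int b) {
    return to_text(cond ? a : b);
}
```

With `cond == true`, `a == 255`, the unhoisted form yields `"ff"` while the naive hoist yields `"255"`, which is why the rule has to compare the tail arguments before firing.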

Fixture extended:
- `bad_same_hex_string`  — `string(a, true) : string(b, true)`  → fires
- `good_different_hex_string` — `string(a, true) : string(b, false)` → silent

`expect 31208:11` → `expect 31208:12`.

Also collapsed adjacent return-early guards in
check_perf021_ternary_cast_hoist and the tail-args loop per STYLE016.

Verified: dastest utils/lint/tests (29/29 pass), perf_lint.das
self-lint clean, daslib sweep 0 residual PERF021 hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With more checks we should be able to disable some of them
project-wide. This commit introduces support for disabling
and enabling checks from the command line.
It may be useful to add a pre-push hook with das-fmt and linter checks
mirroring CI behaviour.
…erf021-ternary-cast-hoist

lint: PERF021 — hoist common workhorse cast out of ternary
MemoryModel::allocate/free/reallocate at src/misc/memory_model.cpp:122/151/194-195
mask uint64 size with ~alignMask. alignMask was uint32_t, so ~alignMask zero-extends
to 0x00000000FFFFFFF0 when ANDed with the uint64 size — silently dropping the high
32 bits for any allocation >= 4 GB. A 4 GB+15 request became 16 bytes; the function
then took the shoe path with size=0, computed si = (0>>4)-1 = 0xFFFFFFFF, and
dereferenced chunks[0xFFFFFFFF] — a wild-address read that crashed the process.

Phase 1 widened the heap public API to uint64 but missed this field on both
MemoryModel and LinearChunkAllocator. Fix is a one-word widening of each;
the existing `(size + alignMask) & ~alignMask` lines pick up the wider type
automatically, and the `DAS_VERIFYF(s <= UINT32_MAX)` policy guard in
LinearChunkAllocator::allocate now fires correctly on >4 GB requests instead
of seeing a silently-truncated size.
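
The truncation is easy to reproduce in isolation. The sketch below assumes a 32-bit `unsigned int` and uses hypothetical function names (`align_up_narrow`, `align_up_wide`, `shoe_index`); only the `(size + alignMask) & ~alignMask` expression and the `(size >> 4) - 1` index computation are taken from the description above.

```cpp
#include <cstdint>

// Buggy shape: the mask is 32 bits wide, so ~alignMask is computed in
// 32 bits (0xFFFFFFF0 for alignMask = 15) and then zero-extended to
// 0x00000000FFFFFFF0 when ANDed with the 64-bit size.
inline uint64_t align_up_narrow(uint64_t size, uint32_t alignMask) {
    return (size + alignMask) & ~alignMask;
}

// Fixed shape: same expression, but the mask is 64 bits wide, so
// ~alignMask keeps the high 32 bits set.
inline uint64_t align_up_wide(uint64_t size, uint64_t alignMask) {
    return (size + alignMask) & ~alignMask;
}

// Hypothetical small-size bucket index; for size 0 it wraps to 0xFFFFFFFF.
inline uint32_t shoe_index(uint64_t size) {
    return uint32_t(size >> 4) - 1u;
}
```

With `alignMask = 15`, the narrow version maps 4 GB+15 to 16 and exactly 4 GB to 0, while the wide version maps 4 GB+15 to 4 GB+16 as intended.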

Tests, all gated on DASLANG_HUGE_HEAP_TESTS=1:
 - tests-cpp/small/test_heap_64bit.cpp — new 4 GB-boundary test asserting
   bytesAllocated grows by >= 4 GB through PersistentHeapAllocator; existing
   5 GB test moved to persistent_heap (default LinearHeapAllocator is uint32-
   bounded by design and now panics with a clear message).
 - tests/long_array_table/test_huge_array_resize_index.das (5 GB array<uint8>)
 - tests/long_array_table/test_huge_array_iterate.das (2.2 GB, four iteration shapes)
 - tests/long_array_table/test_huge_array_push_emplace_clone.das (push past INT_MAX)
 - tests/long_array_table/test_huge_array_index_offset.das (~4.4 GB array<int>;
   exercises uint64 stride*idx address math)

All four daslang probes carry `options persistent_heap = true` (required for
>4 GB arrays) and an inline gate (`static_if (typeinfo sizeof(type<int?>) < 8)`
+ has_env_variable check) so they silent-skip in CI without the env var.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a third benchmark target alongside m1_sql / m3_array / m3f_array_fold:
m4_decs_fold runs the same chain shape through `_fold(from_decs_template(type<DecsCar>)...)`.
Gives a tri-platform comparison (SQL vs array vs decs) under one chain spec.

Shared scaffold in _common.das: `[decs_template(prefix="car_")] DecsCar` mirroring
Car's 6 fields + `fixture_decs(n)` parallel to `fixture_array(n)`. Each benchmark
file gains a `run_m4` + `[benchmark]` wrapper.

Lambda quirks: explicit `$(c : Car)` annotations don't match the decs tuple
element type, so m4 lanes use `_select(_.field)` macro form (auto-types via
macro expansion). first_or_default_match's sentinel is a named-tuple literal
matching the iter element shape.

Skipped (Cat C — need new decs surface): indexed_lookup (eid-lookup), join_count
(decs join design), zip_dot_product (decs zip surface). Tracked in
`benchmarks/sql/M4_DECS_EXPANSION.md` with full first-sweep results matrix,
Cat A/B split, suspect-0ns-list, and Wave 2-4 plans (Cat C surface adds /
Slice 5+ splice arms / per-chain component-narrowing perf).

Wave 1 results (100K, INTERP):
- Cat A m4 beats SQL on most aggregate/filter shapes (1.5-9x), ~3-5x slower than m3f
  due to 6-component multi-iter for-loop overhead even when chain reads one field
- Cat B m4 falls to eager bridge (~100-130 ns); becomes the regression guard for
  each plan_decs_unroll splice arm as Slice 5+ lands

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-block

Sweeps 197 b|>run blocks across 51 SQL benchmarks, inserting
`b |> accept(<result_var>)` immediately after the inner let-binding. Uses
the existing `[sideeffects]` helper `accept` from dastest/testing.das:172.

**Why:** Documents intent (result must escape) and protects against future
DCE if anyone modifies the chains. ast_dump verified the calls survive
compilation: e.g. take_count m3f lowers to a full spliced invoke + accept(b,
rows) + empty/failNow guard — the chain is genuinely running.

**Finding:** This sweep was initially aimed at the 11 m3f=0 ns/op cases
suspected of being DCE'd (select_count, take_count, take_count_filtered,
take_sum_aggregate, reverse_take, skip_take, distinct_take, any_match,
element_at_match, first_match, first_or_default_match). After the sweep,
those cells still report 0 ns/op. Investigation: ast_dump shows the spliced
loop body fully expanded and the accept call alive. The zeros are real —
dastest reports total_time / n where n=100000, and a body cost ≤~100us
divides to ≤1 ns/op and rounds to 0. For take(N)+to_array shapes the divisor
is 100000 but only TAKE_N=1000 elements are processed, so the unit
underreports actual per-element cost. Not a DCE artifact; matrix is honest.
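
The arithmetic behind the zeros can be sketched as below. It assumes the reported figure is an integer quotient of total nanoseconds by n; how dastest actually rounds is not stated here, and `ns_per_op` is a hypothetical name.

```cpp
#include <cstdint>

// Assumption: the reported per-op time is total_ns / n with the
// fractional part dropped.
inline uint64_t ns_per_op(uint64_t total_ns, uint64_t n) {
    return total_ns / n;
}
```

Under that assumption a 50 us body reports `ns_per_op(50000, 100000) == 0`, while dividing the same time by the TAKE_N=1000 elements actually processed gives `ns_per_op(50000, 1000) == 50`.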

The accept guards stay regardless — cheap insurance for future bench
changes, and the convention is uniform across all four lanes (m1/m3/m3f/m4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends plan_order_family at daslib/linq_fold.das:1230 to recognize `first`
and `first_or_default` alongside the existing `take(N)` terminator on
order_by / order_by_descending / order / order_descending chains.

**Why:** Prior to this change, `_fold(arr._order_by(key).first())` cascaded
to plan_loop_or_count which emitted the full O(N log N) sort + index lookup.
The bench `benchmarks/sql/sort_first.das` showed m3f=722 ns barely improving
on m3=713 ns. After the splice arm: m3f=42 ns (17× win, matches the m1 SQL
baseline of 37 ns within 14%). m4_decs_fold also improves 802→121 ns.

**Emission:**
- order_by + first → `min_by(top, key)` directly (O(N) single pass).
  Matches `min_by_impl`'s panic-on-empty semantics → identical to `order(...).first()`.
- order_by_descending + first → `max_by(top, key)`.
- bare order + first → `min(top)` (or `max` for descending).
- order_by + first_or_default(d) → `top_n_by(top, 1, key) |> first_or_default(d)`
  since no `min_by_or_default` helper exists. One extra 1-elem array allocation but
  cleanly handles the empty case.
- where_ + order_by + first / first_or_default: mirrors the existing prefilter-
  buffer pattern, calling min_by / max_by / top_n_by(_, 1, _) on the filtered buf.
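
The idea behind the splice, replacing a full sort followed by `first` with a single minimum scan, can be shown with a C++ analog. This is a sketch, not the daslang emission; `Car`, `first_after_sort` and `first_via_min` are hypothetical, and the sort is assumed to be stable so both forms pick the same element among equal keys.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

struct Car { int id; int price; };  // hypothetical element type

// Eager shape: sort everything, then take the first element. O(N log N).
inline Car first_after_sort(std::vector<Car> cars) {
    if (cars.empty()) throw std::runtime_error("first() on empty sequence");
    std::stable_sort(cars.begin(), cars.end(),
                     [](const Car& a, const Car& b) { return a.price < b.price; });
    return cars.front();
}

// Spliced shape: one O(N) pass. std::min_element returns the first of
// the smallest elements, matching the front of a stable sort.
inline Car first_via_min(const std::vector<Car>& cars) {
    if (cars.empty()) throw std::runtime_error("first() on empty sequence");
    return *std::min_element(cars.begin(), cars.end(),
                     [](const Car& a, const Car& b) { return a.price < b.price; });
}
```

Both functions fail loudly on empty input, which is the behaviour the splice has to keep in order to stay equivalent to the eager chain.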

**New helper:** `order_min_call_name(orderName, hasKey)` returns "min" / "max" /
"min_by" / "max_by" based on direction + key presence.

**Recognizer guard:** first/first_or_default must be terminal — `i != length(calls) - 1`
returns null so any trailing op cascades to tier-2.

**Tests:**
- tests/linq/test_linq_fold.das: 7 new parity cases under `test_fold_order_by_first`
  (order_by + first, order_by_descending + first, where + order_by + first,
  order_by + first_or_default with empty/non-empty/filtered-empty sources).
- tests/linq/test_linq_fold_ast.das: 4 new AST-shape gates confirming the splice
  emits min_by / max_by / top_n_by + first_or_default and DOES NOT emit
  order_by / first / first_or_default itself.

Full sweep: 1182 linq tests + 376 fold tests + 188 AST tests, all green.
MCP lint + CI lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…matrix update

plan_order_family's order+first splice was emitting min_by directly. min_by returns an
uninitialized ref on empty source — silently swallows the panic that eager first()
guarantees. Fix per Copilot review on closed PR #2757:

- No-where + array source: wrap in invoke($(src){ panic if empty; return min_by(src,key) }, top).
  Zero allocation, one branch — preserves the 17× sort_first win.
- No-where + iterator source: emit top_n_by(_, 1, _) |> first() — bounded n=1 heap; first()
  panics on empty array.
- where + order + first: insert `panic if empty(buf)` stmt before the buffer min_by.

Two new regression tests assert that first() on empty array and on filtered-empty
source both panic. M4_DECS_EXPANSION.md gains a section logging the splice arm + the
sort_first 722→41 ns (17×) win.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Copilot review on PR #2760: the recognizer captured `orderKey` whenever
the order call had ≥2 args, but `hasKey` was only true for `order_by` /
`order_by_descending`. For `order(arr, cmp)` (and `order_descending(arr, cmp)`),
splice emitted bare `min(arr)` / `max(arr)` / `top_n(arr, N)` — silently dropping
the user-supplied comparator.

Same bug pre-existed for the `take` arm. Both are fixed by a single bail: when
the order call is `order` / `order_descending` AND argCount >= 2, return null
from the recognizer. Chain falls through to `fold_linq_default`, which rewrites
to `order_to_array(arr, cmp) |> first()` — semantics preserved.
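
A C++ analog of the dropped-comparator bug, as a rough sketch with hypothetical names (`first_with_cmp`, `first_bare_min`):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Correct shape: sort with the caller's comparator, then take the first
// element. The caller must pass a non-empty vector.
inline int first_with_cmp(std::vector<int> xs,
                          const std::function<bool(int, int)>& cmp) {
    std::stable_sort(xs.begin(), xs.end(), cmp);
    return xs.front();
}

// Buggy splice shape: the comparator is ignored and the natural
// minimum is returned instead.
inline int first_bare_min(const std::vector<int>& xs) {
    return *std::min_element(xs.begin(), xs.end());
}
```

For `{3, 1, 2}` with a descending comparator such as `std::greater<int>()`, the first function returns 3 and the second returns 1, which is the kind of mismatch the regression tests check for.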

3 functional regression tests (order/order_descending + first/take with cmp) and
1 AST gate (asserting no min/max/top_n splice + sort step survives) — all 3
functional tests failed before the fix (returned min instead of cmp-honoring
result) and pass after.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ssion

PR #2753's new PERF020 rule (`T(x)` where x is already T) caught three real
sloppy-codegen sources that all emitted unnecessary workhorse casts the
interpreter doesn't fold:

1. **daslib/linq.das average()** (iter + array overloads, lines 1529 / 1541) —
   `total += double(x)` fires when caller pre-casts the projection to double
   (e.g. `_select(double(_.price)).average()`). Wrap in static_if guarded by
   `typeinfo stripped_typename(x) == typeinfo stripped_typename(default<double>)`
   so the cast only emits when needed.

2. **linq_fold.das average splice** (plan_loop_or_count, line 734) — emitted
   `double(accName) / double(cntName)` unconditionally. accName carries accType,
   which for double-projected chains is already double. Branch on
   `accType.baseType == Type.tDouble` to skip the cast.

3. **linq_fold.das count-shortcut emissions** (emit_length_shortcut line 432 and
   the plan_zip length-shortcut line 3362) — emitted `int(length(...))` for
   count. length already returns int, so the cast is dead weight. Split into
   `length(...)` for count and `int64(length(...))` for long_count.

No semantic change. Closes the CI lint failure on PR #2760 (5 new + 2 of 4
pre-existing PERF020 warnings in changed files).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… compat)

mapfile is bash 4+ only. macOS ships bash 3.2.57 as /bin/bash (last GPLv2
version), so the hook fails on every Mac developer's first push with:

  .githooks/pre-push: line 71: mapfile: command not found

Replace with a portable `while read; CHANGED+=("$line"); done < <(...)` —
same semantics, works on bash 3.2 and bash 4+.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…se8a-alignmask

longarr phase 8a: alignMask uint32 truncation + huge-array probes
…ench-order-first

linq_fold: m4_decs_fold bench lane + anti-DCE accept sweep + order_by+first splice arm
…ok-bash32

.githooks/pre-push: portable read loop (bash 3.2 compat for macOS)
Extends plan_zip's terminator dispatch with sum/min/max/average +
first/first_or_default/any/all/contains, mirroring plan_loop_or_count's
lane emission via generalized helpers.

emit_accumulator_lane / emit_early_exit_lane now take multi-source via
parallel arrays (srcNames + topExprs). For-loop emission branches on
length: 1-source uses $i(itName); 2-source uses literal `itA, itB`
because qmacro for-loop iter-var position doesn't accept $i() splice
in the multi-iter form. New finalize_lane_emission helper handles the
1- vs 2-arg invoke wrap. finalize_invoke loops over all block args
to set can_shadow (was hardcoded to args[0]).

plan_zip threads `let it = (itA, itB)` via preCondStmts so itName
resolves to the tuple inside the loop body for where/projection/
predicate eval. Accumulator without projection bails to tier-2
(tuple has no += so sum/min/max/average wouldn't typecheck anyway).

Tests: 16 new (14 behavioral parity covering sum/min/max/average,
where+sum, where+long_count, first no-proj/proj, where+first,
first_or_default, any no-pred/empty/pred, all true/false,
contains hit/miss + 2 AST shape asserting single multi-iter for-loop
+ no surviving zip/sum/first calls). 220/220 ast + 369/369 fold
interpret tests green; 1169/1169 AOT sweep across tests/linq. Existing
single-source plan_loop_or_count behavior preserved (call sites wrap
params in 1-element arrays at the boundary).

Deferred: last/last_or_default/single/single_or_default/element_at/
element_at_or_default/aggregate on zip (TERMINAL_WALK lane); 3..8-ary
zip splice (Z4/Z5); any-no-pred length shortcut on zip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #2742 review #r3270242609: classify_terminator("long_count") returns
ACCUMULATOR, so my new ACCUMULATOR dispatch fired first and bailed
(projection == null), regressing `zip(...).long_count()` and
`zip(...).where(p).long_count()` to tier-2 cascade instead of the
existing COUNTER length-shortcut / counter loop.

Fix: gate the ACCUMULATOR branch with `!isCounter`. long_count now
flows through the existing COUNTER path uniformly (already handles
both bare via length shortcut and chain via counter loop).

Strengthened test_zip_long_count_uses_length_shortcut with a
count_call("long_count") == 0 assertion — the previous
count_inner_for_loops == 0 check was satisfied trivially by tier-2
passthrough (raw call chain has no for-loops in its immediate AST),
so the regression slipped through. Added
test_zip_where_long_count_emits_counter_loop to guard the chain case
analogously (for-loops == 1, no surviving long_count call).

Note: dropped a candidate `count_call("length") >= 2` assertion since
count_call doesn't recurse into ExprOp3 ternary (where the
length(srcA) < length(srcB) ? ... lives in the shortcut). The two
assertions above (for_loops + long_count) discriminate the three
paths — length shortcut / counter loop / tier-2 cascade — by
elimination.

PR #2742 review #r3270242659: updated the accumulator section
comment to note long_count routes through COUNTER, not the
projection-required ACCUMULATOR path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ix tracking test)

PR #2742 review #r3270337491 — added explicit bounds guard at
finalize_lane_emission entry:

  let nSrcs = length(srcNames)
  if (nSrcs != 1 && nSrcs != 2) panic("... only 1- or 2-source supported (got {nSrcs}) ...")
  if (length(topExprs) != nSrcs) panic("... length mismatch ...")

Defensive: helper is private + called from 2 sites passing 1- or
2-source arrays per protocol, but the guard trips a clear error if
future Z4/Z5 (3..8-ary zip) work routes through this without
extending the branch first.

PR #2742 review #r3270337476 — emit_accumulator_lane.average
semantics divergence from linq.das. PRE-EXISTING in helper:
accumulates in accType (often int → overflow risk) + returns NaN
on empty cnt, while linq.das average accumulates in double +
returns 0lf on empty. Affects single-source plan_loop_or_count
too. Existing fold test "average: empty → NaN" locks in the
current divergent behavior.
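
The two behaviours can be contrasted in a C++ sketch. It assumes IEEE 754 doubles (so 0.0/0.0 is NaN) and uses `uint32_t` accumulation so that the overflow is well-defined wraparound; the function names are hypothetical and this is not the daslang helper code.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Fold-helper style: accumulate in the element type, cast to double only
// for the final division. Empty input gives 0.0 / 0.0 = NaN, and large
// inputs can wrap the accumulator.
inline double average_fold_style(const std::vector<uint32_t>& xs) {
    uint32_t acc = 0;
    uint32_t cnt = 0;
    for (uint32_t x : xs) { acc += x; ++cnt; }
    return double(acc) / double(cnt);
}

// linq.das style: accumulate in double, return 0 on empty input.
inline double average_linq_style(const std::vector<uint32_t>& xs) {
    double total = 0.0;
    uint32_t cnt = 0;
    for (uint32_t x : xs) { total += double(x); ++cnt; }
    return cnt == 0 ? 0.0 : total / double(cnt);
}
```

On an empty vector the first returns NaN and the second returns 0.0; on `{4000000000, 4000000000}` only the second returns 4000000000.0.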

Fix DEFERRED to follow-up PR (uniform fix across both planners +
update of the single-source test). This PR adds a tracking test
(`test_zip_average_empty_returns_zero_when_fixed`) that:

- Has the target function (`target_zip_average_empty_fold`) wired up
- Calls `t->skip(...)` with a clear deferral message at the top
- Documents the desired post-fix assertion as a comment
- Acts as a discoverable to-do for the follow-up PR (un-skip + assert)

Per Boris's reinforcement (PR #2742): deferring is fine, adding
disabled tests is fine, but not adding tests at all for the bugs
we found — NOT fine.

Verification: lint clean, 222/222 interpret (221 passed + 1 skipped),
1171/1171 AOT (1170 passed + 1 skipped).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Spike for the next big chunk of the 64-bit array/table widening project
(PR-D, linq surface). Question: can a function accept `int | int64` as
a single signature and fork inside the body with static_if, so the ~10
linq functions targeted by PR-D widen with one signature each instead
of doubled overloads?

Answer: yes -- `def take_or(x : int | int64)` already parses (the
disjunction-parameter shape was used in tests/language/option_type.das
for ref/auto resolution). What was missing was a clean dispatch
predicate; `stripped_typename(x) == "int"` is a string-compare hack
for something this prominent.

Adds two `typeinfo` traits in src/ast/ast_infer_type.cpp next to
`is_numeric`, following the `is_string` pattern (baseType match +
`dim.size() == 0`):

  typeinfo is_int(x)   -> baseType == tInt   && dim.size() == 0
  typeinfo is_int64(x) -> baseType == tInt64 && dim.size() == 0

tests/long_array_table/test_int_int64_disjunction.das pins both halves
(disjunction-parameter dispatch + the two new traits) with
static_assert type-contract probes so silent reverts on either side
flip the test red.
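
For readers more familiar with C++, the single-signature dispatch resembles a template with `if constexpr`. The sketch below is an analog only: `is_int_v`, `is_int64_v` and `index_width_bits` are hypothetical names, and the body is invented to make the chosen branch observable.

```cpp
#include <cstdint>
#include <type_traits>

// Rough counterparts of the two typeinfo traits.
template <typename T>
inline constexpr bool is_int_v = std::is_same_v<std::remove_cv_t<T>, int32_t>;

template <typename T>
inline constexpr bool is_int64_v = std::is_same_v<std::remove_cv_t<T>, int64_t>;

// One signature, two compile-time branches (the static_if analog).
template <typename T>
constexpr int index_width_bits(T /*x*/) {
    static_assert(is_int_v<T> || is_int64_v<T>, "only int or int64 accepted");
    if constexpr (is_int_v<T>) {
        return 32;  // 32-bit branch
    } else {
        return 64;  // 64-bit branch
    }
}
```

Any other argument type is rejected at compile time by the `static_assert`, similar in spirit to the type-contract probes in the test file.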

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Records the 8.3× win (m3f 58→7 ns/op) after the cherry-picked
plan_zip accumulator + early-exit lane work fires on the
zip(xs,ys)._select(_._0 * _._1).sum() chain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…se8b-int-int64-disjunction

phase 8b: typeinfo is_int / is_int64 + int|int64 disjunction spike
…_array

Extends Approach Z direct-inline splice (PR #2750) to cover the remaining
terminator surface for from_decs* chains.

Slice 3a (accumulator family): min/max/average added to emit_decs_accumulator.
Match non-decs emit_accumulator_lane semantics — min/max keep a `first` flag
hoisted above outer for_each_archetype; average keeps a running sum + count
and divides via double() at end. sum/min/max/average require a scalar _select.

Slice 3b (early-exit): new emit_decs_early_exit for first/first_or_default/
any/all/contains. Outer becomes for_each_archetype_find (returns bool; inner
block returns true to stop the archetype walk). any/all/contains use the
find's return value directly (all negates). first/first_or_default thread a
found flag + result via prelude/tail.

Slice 3c (to_array): new emit_decs_to_array hoists `var buf` above outer
for_each_archetype and per-element push_clones the projection (or named tuple
when no _select). Dispatched via the implicit "no recognized terminator" path
since linqCalls marks to_array as skip=true.

Refactor: build_decs_tup_bind + build_decs_inner_for helpers extracted from
Slice 2's emit_decs_accumulator so the new emitters share the for-body shape.
DecsBridgeShape gains elementType (cloned from resVar._type.firstType) for
to_array / first / first_or_default when no projection is present.

Tests: 14 new functional parity + 3 AST-shape gate tests in
tests/linq/test_linq_from_decs.das. All 29 file-local tests green; 1146 linq
+ 234 decs interp tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
borisbat and others added 7 commits May 20, 2026 12:03
CI's das-lint catches this where the MCP lint doesn't (different rule set).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1. emit_decs_accumulator (average): correct the empty-source comment.
   Both numerator and denominator are cast to double before division —
   empty → 0.0/0.0 → IEEE NaN. Never an int-division panic.

2. plan_decs_unroll (implicit to_array): gate the fallthrough on
   `expr._type.isGoodArrayType`. Without the gate, decs-bridge chains
   that end in iterator output — `_fold(from_decs_template(...))`,
   `_fold(...)._where(...)` — silently materialized into array<T>
   instead of preserving the iterator the user expected. Iterator-typed
   chains now return null and cascade to tier-2 fold_linq_default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rminators

Closes the deferred shape coverage from PR #2751: chained `_select` chains,
`_where` after `_select`, `_count(pred)` / `_long_count(pred)`, and the
`_min_by` / `_max_by` retention terminators on `from_decs_template`.

Architecture shift: emit_decs_accumulator / early_exit / to_array now take a
shared `DecsChainInfo` (built by `compute_decs_chain_info`) and wrap their
per-element action via `wrap_decs_chain` instead of a singular projection +
whereCond pair. Each `_select` introduces a fresh `decs_sel{N}` bind whose
type carries forward; the reverse-walk wrapper emits `let bindN = proj` for
selects and `if (pred) ...` for wheres in chain order, so after-select
predicates see the projection output.

`_count(pred)` / `_long_count(pred)` ride the same path: the accumulator
emitter detects a 2-arg terminator call, peels its predicate against
`finalBind`, and wraps the counter increment with `if (pred) ...`. New
`emit_decs_min_max_by` mirrors the min/max accumulator but stores both key +
element (workhorse key via `<` / non-workhorse via `_::less`).

To keep `long_count(pred)` actually callable, linq.das gains
`long_count(iter; pred)` + `long_count(arr; pred)` overloads matching the
existing `count(iter; pred)` / `count(arr; pred)` shape, and linq_boost.das
gains `_long_count` shorthand alongside `_count`. Broader 64-bit sweep
(take/skip/element_at/top_n N-parameter) is gated on 64-bit arrays + tables
and noted at the tail of benchmarks/sql/LINQ.md.

tests/linq/test_linq_from_decs.das: 11 new tests (chained select sum,
select→where sum, where→select→where sum, _count(pred), _long_count(pred),
chained select to_array, _min_by, _max_by, plus three AST-shape gates).
Full sweep: 1171 linq tests + 239 decs tests green; lint clean (MCP + CI).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lan-zip-accum-early-exit-v2

linq_fold: plan_zip accumulator + early-exit terminators (resurrect orphaned #2742)
…-slice3-and-4

linq_fold: plan_decs_unroll Slices 3 + 4 (resurrect orphaned #2751 + #2752)
fix: add null check for subexpression type in ExprAt handling
@pull pull Bot locked and limited conversation to collaborators May 20, 2026
@pull pull Bot added the ⤵️ pull label May 20, 2026
@pull pull Bot merged commit 50df086 into forksnd:master May 20, 2026
@pull pull Bot had a problem deploying to github-pages May 20, 2026 20:58 Error

3 participants