
[pull] master from GaijinEntertainment:master#1022

Merged
pull[bot] merged 10 commits into forksnd:master from GaijinEntertainment:master
May 22, 2026


@pull pull Bot commented May 22, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


borisbat and others added 10 commits May 21, 2026 12:23
The JIT (wasm) radio on daslang.io/playground/ has been disabled in
production because the per-sample wasm artifacts never made it into the
deployed _site. This wires up the staging step, ships two benchmark
samples (Dictionary, SHA-256) wired through the cross-compile pipeline,
and along the way unblocks the host-LLVM wasm32 JIT path that several
samples actually need.

Playground side:
- .github/workflows/pages.yml: overlay web/output/samples/examples/*.wasm
  onto _site/playground/samples/examples/ so the HEAD probe in main.js
  (updateEngineAvailability) stops 404'ing.
- web/examples/ui/samples/{data.json,examples/}: rename the old
  random_sequence sample to dict.das (closer to dasProfile naming),
  bump n 5000 -> 200000 so it's a real benchmark, and add sha256.das
  adapted from dasProfile (standalone, no config.das/testProfile deps).
- web/CMakeLists.txt: cross-compile dict + sha256 in the foreach.
- site/tests/playground/dropdowns.spec.js: update the e2e selector for
  the renamed sample, add SHA-256 dropdown coverage.

dasLLVM wasm32 cross-compile fixes (host LLVM vs emcc-built runtime
archive ABI alignment, surfaced cross-compiling dict/sha256 to wasm):

* Signature drifts between modules/dasLLVM/daslib/{llvm_jit_common,
  llvm_exe}.das declarations and src/builtin/module_jit.cpp:
  - jit_prologue: add the 6th arg (LineInfoArg* at).
  - jit_array_lock/unlock, jit_free_heap/persistent, jit_iterator_delete,
    jit_simnode_interop, jit_register_standalone_variable: fix return
    type (void in C++, was voidptr/int1 in the IR declaration).
  - get_jit_table_at/find/erase: add the missing 2 args
    (Context*, LineInfoArg*); call sites pass null since the C++
    dispatcher only looks at baseType.
  - llvm_jit.das jit_prologue call site: pass the 6th at arg.
* For-loop array iteration: stop ptrtoint'ing both ends to LLVMIntPtrType
  (statically i64 on the 64-bit host even when the codegen target is
  wasm32) and compare pointers directly. The old form let host pointer
  width leak into the IR, so wasm32 lowering produced a never-true
  termination compare; array<T> for-loops printed elements then ran
  off the end of the buffer forever.
* init_jit now takes an optional target_triple and pins the module's
  data layout BEFORE any IR is built when cross-compiling. Otherwise
  LLVM eagerly constant-folds GEPs using the host's data layout and
  bakes host-size strides into the IR.
* write_wasm passes +simd128,+nontrapping-fptoint to with_target_machine
  so vec4f returns get lowered the same way as the emcc-built runtime
  archive (which compiles with -msimd128 per web/CMakeLists.txt).
  Without the feature flag, host LLVM sret-lowered vec4f returns while
  the runtime returned them as native v128 - unreachable trap on every
  jit_invoke_block_* / jit_call_*.
* g_target_is_wasm flag (set in init_jit, read by process_function_hints,
  build_noalias_list, attach_loop_metadata): bypass user [hint(...)]
  application for wasm32 - alwaysinline over complex inlined bodies
  triggers a host-LLVM wasm32 backend miscompile that native JIT does
  not hit. The unsafe_range_check / unsafe_alias / unsafe_capture
  options also get default-false on wasm, which keeps bounds checks
  in (safer; tiny perf cost).
* LLVM_JIT_CODEGEN_VERSION 0x04 -> 0x08 to invalidate native JIT DLL
  caches across these changes.

Local browser timings on the staged playground (Chromium):
  Dictionary (n=200000, 10 iters): interp 8.9 ms/iter, JIT 5 ms/iter
  SHA-256 (1024 KB, 10 iters):     interp 2.6 MB/sec, JIT 200 MB/sec
hello/loop/func also run cleanly in both engines after the fixes.

Residual: the sha256 JIT build still traps with an out-of-bounds memory
access at main exit. The benchmark output is correct and the trap fires
only after it is printed, so it does not affect the demo; tracked separately.

Lint clean on all 6 changed .das files. 285 jit_tests pass on the
interpreter; native -jit on dict + sha256 also green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ime helper

Fixes the CI wasm_cross sha256 trap surfaced after the previous commit
(memory access OOB at jit_shutdown, inside das::gc_root::~gc_root after
correct timing output is printed).

Root cause
----------
The IR for reading ctx->globals / ctx->shared computes a byte offset
from Context*:

    %gp  = GEP Context*, CONTEXT_OFFSET_OF_GLOBALS   ; e.g. + 152
    %glb = load ptr from %gp

CONTEXT_OFFSET_OF_GLOBALS comes from src/builtin/module_jit.cpp:

    addConstant<uint32_t>(*this, "CONTEXT_OFFSET_OF_GLOBALS",
        uint32_t(offsetof(Context, globals)));

`offsetof` is evaluated by the C++ compiler that builds module_jit.cpp,
i.e. the HOST compiler. On 64-bit MSVC / clang that's 152. The wasm32
runtime archive (libDaScript_runtime.a) is compiled by emcc with
wasm32 layout — 4-byte pointers — and Context::globals lives at a
different (smaller) offset. The JIT'd code reads from Context+152,
which on wasm32 lands past the field, returning a garbage "globals"
pointer. init_globals then writes the program's `let primes = ...`
constants (and any other globals) into the wrong memory, which doesn't
surface until module shutdown reads the corrupted gc_root.
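
To make the root cause concrete, here is a minimal sketch of how the same field ends up at different offsets under 64-bit and 32-bit pointer layouts. `Ctx64` and `Ctx32` are hypothetical stand-ins, not the real `das::Context`; pointer fields are modeled as fixed-width integers so both layouts can be inspected from one host compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of a Context-like struct (NOT the real das::Context).
// Layout as a 64-bit host compiler would see it: 8-byte "pointers".
struct Ctx64 {
    uint64_t stack;
    uint64_t heap;
    uint64_t globals;   // field of interest
};

// The same fields as a wasm32 compiler would see them: 4-byte "pointers".
struct Ctx32 {
    uint32_t stack;
    uint32_t heap;
    uint32_t globals;   // field of interest
};

// What a host-evaluated offsetof() would bake into the generated IR.
constexpr std::size_t kHostOffset = offsetof(Ctx64, globals);

// What the wasm32-built runtime actually uses.
constexpr std::size_t kTargetOffset = offsetof(Ctx32, globals);

// True when the host offset points at or beyond the end of the 32-bit
// struct, i.e. a load at that offset would miss the field entirely.
constexpr bool host_offset_overruns_target() {
    return kHostOffset >= sizeof(Ctx32);
}

static_assert(kHostOffset != kTargetOffset,
              "host and target layouts disagree on the field offset");
```

In this toy layout the host offset is larger than the entire 32-bit struct, which is the same class of mismatch the commit describes for `CONTEXT_OFFSET_OF_GLOBALS`.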

Fix
---
* Add two C++ helpers in src/builtin/module_jit.cpp:

      DAS_API void * jit_get_globals_base(Context * ctx)  { return ctx->globals; }
      DAS_API void * jit_get_shared_base (Context * ctx)  { return ctx->shared;  }

  Registered as JIT externs alongside the existing get_jit_get_*_mnh
  pair. Because they live in the runtime archive, each side compiles
  them with the right Context layout for that target.

* In modules/dasLLVM/daslib/llvm_jit.das, gate the two global-pointer
  resolution sites on g_target_is_wasm:

    if (g_target_is_wasm) {
        // call jit_get_globals_base / jit_get_shared_base
    } else {
        // inline GEP + load using CONTEXT_OFFSET_OF_GLOBALS — unchanged
    }

  Native JIT keeps the inlined GEP+load (no extra call), so there is
  no perf cost for the JIT path that already worked. Only the
  wasm-cross-compile target goes through the helper — and on wasm
  LLVM inlines it through the runtime archive at link time anyway.

* LLVM_JIT_CODEGEN_VERSION 0x08 -> 0x09 (caches that bake the IR
  shape must invalidate).
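
The accessor-versus-baked-offset choice in the Fix list above can be sketched in plain C++. Everything here is illustrative: `MiniContext`, `mini_get_globals_base`, `load_ptr_at`, and `resolve_globals` are hypothetical names, and the real gate lives in the daslang code generator rather than in C++.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical miniature Context; the real das::Context has many more fields.
struct MiniContext {
    void * stack;
    void * globals;
    void * shared;
};

// Accessor style: compiled together with the struct definition, so whichever
// compiler builds the runtime picks the right field offset for its own target.
extern "C" void * mini_get_globals_base(MiniContext * ctx) { return ctx->globals; }
extern "C" void * mini_get_shared_base (MiniContext * ctx) { return ctx->shared;  }

// Baked-offset style: the byte offset is a plain number captured elsewhere.
// This is only correct when that number was computed for the same layout.
inline void * load_ptr_at(const void * base, std::size_t byteOffset) {
    void * out;
    std::memcpy(&out, static_cast<const unsigned char *>(base) + byteOffset, sizeof(out));
    return out;
}

// Mirrors the shape of the `if (g_target_is_wasm)` gate: cross-compiled code
// calls the accessor, same-layout code keeps the inline offset load.
inline void * resolve_globals(MiniContext * ctx, bool crossCompiling, std::size_t bakedOffset) {
    return crossCompiling ? mini_get_globals_base(ctx)
                          : load_ptr_at(ctx, bakedOffset);
}
```

When the baked offset matches the layout, both paths return the same pointer. The accessor path stays correct even when they would not match, at the cost of a call.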

Verification
------------
Local repro with wasmtime 44.0.1 (same version CI uses):
  wasmtime -W exceptions=y sha256.wasm
  "sha256", 0.006423000, 10
  155.69049 mb/sec
  -> exit 0 (was: out of bounds memory access at jit_shutdown)

Browser playground also clean for all five samples (hello / loop /
func / dict / sha256).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* init_jit features string: only set +simd128/+nontrapping-fptoint when
  the cross-compile target is wasm. Previously the flags were applied to
  ANY non-empty triple, which would break a future non-wasm cross-compile
  (e.g. aarch64, riscv) at LLVMCreateTargetMachine time. Caught by Copilot.

* sha256 sample header: clarify up front that this is the SHA-256
  COMPRESSION FUNCTION (no padding finalization), kept identical to the
  dasProfile cross-language bench shape so the numbers compare apples
  to apples. Input is always 1024 bytes (16 full blocks), so the
  no-trailing-bytes case doesn't arise. Caught by Copilot.

* pages.yml playground samples staging: cp from
  site/playground/samples (the directory both stage_site_playground
  AND stage_site_playground_wasm populate as their canonical output)
  instead of the previous two-step from web/examples/ui/samples plus
  the .wasm overlay from web/output/samples/examples. Drops one cp
  step and one source path. Caught by aleksisch.
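
As an illustration of the first bullet above, the feature-string gating might look like the following in C++. This is a sketch under assumptions: the real check lives in `init_jit` on the daslang side and is not shown in this PR text, so `is_wasm_triple` and `features_for_triple` are hypothetical helpers, and the prefix test is only one plausible way to detect a wasm triple.

```cpp
#include <string>

// Hypothetical helper: treat any triple whose architecture component starts
// with "wasm" (wasm32-..., wasm64-...) as a WebAssembly target.
inline bool is_wasm_triple(const std::string & triple) {
    return triple.rfind("wasm", 0) == 0;   // prefix match at position 0
}

// Return the target feature string to pass when creating the target machine.
// Only wasm targets get the wasm-specific features; an empty triple (native
// JIT) and any other cross-compile target get no extra features.
inline std::string features_for_triple(const std::string & triple) {
    return is_wasm_triple(triple) ? "+simd128,+nontrapping-fptoint" : "";
}
```

Keying on the triple itself rather than on "triple is non-empty" keeps other cross-compile targets from receiving features they do not recognise.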

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inalize_decs_emission extract, qn sweep

- A1: collapse `finalize_emission` into `finalize_emission_stmts`. Single
  caller (`plan_reverse`) now extracts its qmacro_block body via
  `push_block_list` before delegating. -8 LOC.

- A2: extract `finalize_decs_emission(emission, at, wrapToIter)` helper.
  Three callers (`plan_decs_order_family`, `plan_decs_reverse`,
  `plan_decs_distinct`) consolidate the `force_at + force_generated +
  conditional iter wrap` tail into a single call. The two wrap conditions
  (`needIterWrap && returnsArray` for order_family, `needIterWrap &&
  needBuffer` for distinct) merge into a pre-multiplied `wrapToIter`
  boolean at the call site. -9 LOC net.

- qname sweep: 124 of 132 `"`{prefix}`{at.line}`{at.column}"` sites
  collapse to `qn("prefix", at)`. The 8 remaining sites carry an extra
  backtick-segment after `{at.column}` (e.g. `{length(preCondStmts)}`,
  `{spec.slot}`) that doesn't fit `qn`'s signature; left as-is.

- Category A inline-collapse / Category B push_block_list adoption:
  SKIPPED after fresh audit. The plan's targeted sites have shifted to
  conditional-push shapes (after prior slices) that no longer fit clean
  inline-collapse. Defer to a future cleanup if it surfaces again.

Codegen invariant: `qn()` returns the byte-identical string as the inline
form, so all AOT baselines pass without refresh. Bench smoke 7 reps × 7
benches: m3f figures byte-identical to PR #2796 baselines (count 4 / sum 2
/ average 5 / aggregate_match 6 / groupby_count 36 / zip_dot_product 7 /
distinct_count 15 ns/op).
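
The byte-identical invariant can be modelled outside daslang. The sketch below is a C++ stand-in, assuming `qn` simply joins the prefix, line, and column with backticks as the inline form does; `LineInfoLite` is a hypothetical reduction of the real line-info type, and the actual daslang signature is not shown in this PR.

```cpp
#include <cstdint>
#include <string>

// Hypothetical reduction of a line-info record to the two fields used here.
struct LineInfoLite {
    uint32_t line;
    uint32_t column;
};

// The old inline form, written out at each call site:
//   "`{prefix}`{at.line}`{at.column}"
inline std::string qname_inline(const std::string & prefix, const LineInfoLite & at) {
    return "`" + prefix + "`" + std::to_string(at.line) + "`" + std::to_string(at.column);
}

// The helper form. It must produce exactly the same bytes as the inline
// form, so generated names (and therefore AOT baselines) do not change.
inline std::string qn(const std::string & prefix, const LineInfoLite & at) {
    return qname_inline(prefix, at);
}
```

Sites that append an extra segment after the column would need a different signature, which matches the 8 sites the commit leaves untouched.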

Validation matrix (9 lanes):
- Interp: linq 1332/1332, decs 245/245, ast_match 371/371, dasSQLITE 782/782
- AOT:    same
- JIT:    same modulo test_capture_cfb.das pre-existing failure on master

Net: -16 LOC (139 ins / 155 del). MCP lint + CI lint + format all clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fixes

Playground JIT toggle: ship benchmark samples + fix wasm32 cross-compile
…echanical-cleanup

linq_fold: mechanical cleanups (PR 3) — finalize_emission collapse + qn sweep
…_from + qmacro_block_to_array

Replace runs of two-or-more `stmts |> push <| qmacro_expr() { ... }` calls into
the same array with a single `stmts |> push_from <| qmacro_block_to_array() { ... }`
emission. Each cluster body reads as one multi-statement emission unit instead of
N separate per-statement pushes.

Pattern is composed entirely from existing pieces: `push_from` from builtin.das
plus the `qmacro_block_to_array` macro already used in 15 init sites in this
file (the `var stmts <- qmacro_block_to_array() { ... }` form), so consumers
familiar with that idiom recognize the shape immediately.

Net: -50 LOC (21 ins / 71 del), single file. Audit covered decs_boost
(0 raw sites), sqlite_linq (6 isolated sites, no clusters), and ast_match
(97 sites but all in the single-line `push <| qmacro_expr(\${...})` paren form,
already as compact as collapsing achieves) — only linq_fold has the multi-line
cluster pattern.

Validation: 9-lane matrix (interp+AOT+JIT × linq 1332 / decs 245 / ast_match 371 /
dasSQLITE 782) all green; bench smoke 7×7 m3f byte-identical to master across
count/sum/average/aggregate_match/groupby_count/zip_dot_product/distinct_count.
MCP lint + CI lint + format_file all clean.
…ush-cluster-collapse

linq_fold: collapse 21 consecutive qmacro_expr push clusters
…ush cluster consolidation

Codifies the patterns established by PRs #2793-#2799 so future macro work
follows the documented forms instead of rediscovering them.

Three new sections:
- "Shared AST-match helpers" — table of 11 public helpers in
  daslib/ast_match.das + daslib/templates_boost.das (match_call_in_module,
  match_call_in_linq, peel_lambda_*, peel_tuple_field_read,
  extract_const_string, qn, qm_peel_ref2value, push_block_list) with
  signatures + when-to-reach-for-each. Includes a "when patterns apply
  vs don't" note: introspection-heavy files (linq_fold, sqlite_linq,
  ast_match) benefit; emit-only files (decs_boost, the emitter half of
  templates_boost) don't.
- "qmatch — predicate-style pattern matching" — anti-pattern (hand-rolled
  is X / as X cascades) vs preferred predicate form with $e/$f/$v/$i tags
  bound to PRE-DECLARED outer variables (not result-struct fields).
  Documents the QMatchResult shape, points to sqlite_linq for 37+ adoption
  sites + tests/ast_match for grammar exercises.
- "Push cluster consolidation" (new subsection under "qmacro vs quote") —
  consecutive `arr |> push <| qmacro_expr() { ... }` runs into the same
  array collapse into a single emission via either Form A (push_from +
  qmacro_block_to_array, preferred, no clone) or Form B (push_block_list
  + qmacro_block, clones, use when the source block stays alive).
  Includes "when NOT to collapse" guard.
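
As a loose analogy only (this is C++, not daslang, and not the macro API itself), the difference between Form A and Form B in the "Push cluster consolidation" item resembles moving versus copying a batch into a vector. `Stmt`, `push_from_moved`, and `push_list_cloned` are hypothetical names chosen for the illustration.

```cpp
#include <iterator>
#include <string>
#include <vector>

using Stmt = std::string;   // stand-in for an AST statement node

// Form A analogue: the batch is a temporary built just for this emission,
// so its elements can be moved into the destination without cloning.
inline void push_from_moved(std::vector<Stmt> & dst, std::vector<Stmt> && batch) {
    dst.insert(dst.end(),
               std::make_move_iterator(batch.begin()),
               std::make_move_iterator(batch.end()));
}

// Form B analogue: the source block stays alive afterwards, so its
// elements are copied (cloned) and the source is left untouched.
inline void push_list_cloned(std::vector<Stmt> & dst, const std::vector<Stmt> & block) {
    dst.insert(dst.end(), block.begin(), block.end());
}
```

Either call replaces a run of single-element pushes with one multi-statement emission; the choice between them depends on whether the source is still needed.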

One section updated:
- "Peel ExprRef2Value before qmatch" now routes through qm_peel_ref2value
  (single source of truth) instead of showing the manual if-peel snippet.
  Adds note on why the helper still uses while-peel (conservative until
  ast_block_folding.cpp synthesis paths are audited).

PR 6 (decs_boost migration from the original ladder plan) intentionally
skipped: audit confirmed decs_boost has zero hand-rolled is_*_call
helpers, zero qname construction, zero ExprRef2Value while-loops, zero
push qmacro_expr clusters, and zero peel_lambda candidates. The file is
already lean — the migration would manufacture work.

+100 / -15 LOC. Doc-only; no code changes.
…skill-helpers-doc

skills/das_macros: document AST-match helpers, qmatch idiom, push cluster consolidation
@pull pull Bot locked and limited conversation to collaborators May 22, 2026
@pull pull Bot added the ⤵️ pull label May 22, 2026
@pull pull Bot merged commit 766f15f into forksnd:master May 22, 2026
@pull pull Bot had a problem deploying to github-pages May 22, 2026 02:58 Error
@pull pull Bot added the ⤵️ pull label May 22, 2026