From 3b1e5a14529f5108377a85d51325fc7f9bd4f416 Mon Sep 17 00:00:00 2001
From: dancinlife
Date: Sat, 30 May 2026 05:12:53 +0900
Subject: [PATCH] =?UTF-8?q?feat(codegen)+docs(unshadow):=20unboxed-prim-ar?=
 =?UTF-8?q?ray=20axis=20A=20=E2=80=94=20perf=20closed-negative=20(gap=20is?=
 =?UTF-8?q?=20storage,=20not=20tag-guard)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Axis A pilot of the UNSHADOW typed-repr RFC. §c-class left "known-array
tracker → drop the tag-guard" as an "unmeasured lever"; this commit lands it
in codegen and measures it.

element-kind proof (self/codegen.hexa, +124, purely additive):
- `_is_int_literal_array`: whether every element of an array literal is an
  IntLit (narrow monomorphic-i64 proof)
- LetStmt registers the name in `_known_intarr_set` (immutable let only); an
  AssignStmt reassignment voids it
- Index emit: known-int-array + live in-range fact → drop the §c-class
  array-tag guard and emit a direct `arr.arr_ptr->items[i]`.
  `_is_known_int(arr[i])` is true → the sum extracts the raw `.i`
- Both gates require immutable-let registration AND a live in-range fact
  (they never fire on an unproven access)

Measurement (mini, macOS arm64, best-of-9, tool/unshadow_unboxed_array_bench.hexa):
- g5 byte-diff IDENTICAL across 4 arms (md5 35470124) + correct boxing at the
  dynamic boundary (md5 9efbbf5d on both arms)
- perf 🔴 CLOSED-NEGATIVE: b_unbox 1.12s ≈ a_boxed 1.12s = ~0% Δ. The
  tag-guard is loop-invariant, so clang -O2 already hoists it. The real wall
  is boxed storage (sizeof HexaVal=16, items[] has a 16B stride, no SIMD);
  only native int64_t[] (c_native) closes the gap 100% (0.08≈ref)
- Missing infra = native HexaArrI64/F64 storage (a runtime change, blocked by
  the B9 wall) = where the gap lives

correctness WIN (provably-dead guard dropped, no regression) + perf
closed-negative (codegen-only unbox is not a lever). The element-kind proof
and the box/unbox boundary discipline are the foundation for axis B
(monomorphic struct).
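
The storage finding can be sketched in isolation. This is a minimal
illustration only, assuming a hypothetical 16-byte tagged value `BoxedVal`
(a stand-in, not the runtime's actual `HexaVal` definition) and hypothetical
helpers `fill`, `sum_boxed` and `sum_native`. Both loops compute the same sum,
but the boxed one reads payloads at a 16-byte stride while the native one
reads 8-byte contiguous elements, which is the layout difference the
measurement attributes the gap to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tagged runtime value: a tag plus a union
 * payload. On common 64-bit ABIs the union's 8-byte alignment pads the
 * struct to 16 bytes. */
typedef struct {
    int32_t tag;                        /* element kind; 1 = int in this sketch */
    union { int64_t i; double f; } u;   /* payload */
} BoxedVal;

enum { TAG_INT_SKETCH = 1 };

/* Fill both representations with the values 0..n-1. */
static void fill(BoxedVal *items, int64_t *data, size_t n) {
    assert(items != NULL && data != NULL);
    for (size_t k = 0; k < n; k++) {
        items[k].tag = TAG_INT_SKETCH;
        items[k].u.i = (int64_t)k;
        data[k] = (int64_t)k;
    }
}

/* Boxed-storage read with no tag check: consecutive payloads are
 * sizeof(BoxedVal) bytes apart. */
static int64_t sum_boxed(const BoxedVal *items, size_t n) {
    int64_t acc = 0;
    for (size_t k = 0; k < n; k++) acc += items[k].u.i;
    return acc;
}

/* Native storage: consecutive elements are sizeof(int64_t) bytes apart. */
static int64_t sum_native(const int64_t *data, size_t n) {
    int64_t acc = 0;
    for (size_t k = 0; k < n; k++) acc += data[k];
    return acc;
}
```

Calling `fill` on two 256-element buffers and comparing `sum_boxed` with
`sum_native` should give the same total; only the memory layout differs, so
any wall-clock difference between the two loops comes from storage rather
than from a tag check.
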
verdict=.verdicts/unshadow-unboxed-array/ Co-Authored-By: Claude Opus 4.8 (1M context) --- .verdicts/unshadow-unboxed-array/asm_simd.txt | 11 + .verdicts/unshadow-unboxed-array/pilot.txt | 53 ++++ domains/UNSHADOW.bench.md | 49 ++++ domains/UNSHADOW.log.md | 45 ++++ domains/UNSHADOW.md | 2 +- self/codegen.hexa | 124 +++++++++ tool/unshadow_unboxed_array_bench.hexa | 241 ++++++++++++++++++ 7 files changed, 524 insertions(+), 1 deletion(-) create mode 100644 .verdicts/unshadow-unboxed-array/asm_simd.txt create mode 100644 .verdicts/unshadow-unboxed-array/pilot.txt create mode 100644 tool/unshadow_unboxed_array_bench.hexa diff --git a/.verdicts/unshadow-unboxed-array/asm_simd.txt b/.verdicts/unshadow-unboxed-array/asm_simd.txt new file mode 100644 index 000000000..56ad1a9de --- /dev/null +++ b/.verdicts/unshadow-unboxed-array/asm_simd.txt @@ -0,0 +1,11 @@ +# UNSHADOW unboxed-array asm — SIMD vectorization is the wall +# sizeof(HexaVal)=16 (tag+union) → boxed items[] = 16-byte stride, no SIMD gather +# native int64_t[] = 8-byte contiguous → clang -O2 vectorizes + +## vector-op counts (.2d/.4s/q-reg/ldp): + a_boxed = 5 + b_unbox = 5 + c_native = 23 + +## bl _hexa_index_get: + a_boxed=1 b_unbox=0 c_native=0 diff --git a/.verdicts/unshadow-unboxed-array/pilot.txt b/.verdicts/unshadow-unboxed-array/pilot.txt new file mode 100644 index 000000000..7b9ed0a3f --- /dev/null +++ b/.verdicts/unshadow-unboxed-array/pilot.txt @@ -0,0 +1,53 @@ +UNSHADOW milestone "🟢 unboxed-primitive array" (axis A, typed-repr RFC) — pilot verdict +mini (macOS arm64) · clang · best-of-9 wall · same runtime.o all arms · 2026-05-30 +tool/unshadow_unboxed_array_bench.hexa (faithful A/B proxy — full self-host regen blocked by B9 wall, spec-accepted) + +VERBATIM bench stdout (mini, --runs 9): + +=== UNSHADOW unboxed-primitive array bench (axis A) === +rt=/Users/mini/.hx/packages/hexa/self work=/tmp/uba-work runs=9 +--- C: known-int-array hot-loop sum (4 arms) --- + [ref_c ] built=yes err_lines=0 + 
[a_boxed ] built=yes err_lines=0 + [b_unbox ] built=yes err_lines=0 + [c_native] built=yes err_lines=0 + g5 byte-diff (program output md5 — ALL FOUR MUST be identical): + ref_c : 35470124be79241c684dc5103ec55d20 + a_boxed : 35470124be79241c684dc5103ec55d20 + b_unbox : 35470124be79241c684dc5103ec55d20 + c_native : 35470124be79241c684dc5103ec55d20 + perf best-of-9 wall (s): + ref_c : 0.08 + a_boxed : 1.12 + b_unbox : 1.12 + c_native : 0.08 + asm bl _hexa_index_get (total): a_boxed=1 b_unbox=0 c_native=0 +--- INTEGRITY: typed i64 array at a polymorphic boundary --- + [a_bnd] built=yes err_lines=0 + [b_bnd] built=yes err_lines=0 + boundary g5 (MUST be identical — unbox never changes a value): + a_bnd (boxed) : 9efbbf5d320a45f2ce6e89491a1ac726 + b_bnd (unbox) : 9efbbf5d320a45f2ce6e89491a1ac726 + → both MUST match: the typed array boxes correctly when it reaches + the polymorphic boundary (hexa_len + checked element fetch). +=== done === + +INTERPRETATION (honest): +- g5 byte-diff IDENTICAL across all 4 arms + the dynamic-boundary case (both 9efbbf5d…). + Correctness PROVEN, integrity gate PASS — the typed array boxes correctly at a + polymorphic site, and dropping the tag-guard never changes a value. +- PERF = 🔴 CLOSED-NEGATIVE on the LANDED transform (boxed-storage tag-guard drop): + b_unbox 1.12s ≈ a_boxed 1.12s = ~0% Δ. The c-class tag-guard `HX_IS_ARRAY(arr)?` + is LOOP-INVARIANT — clang -O2 already hoists it out of the hot loop in arm A, so + removing it (arm B, index_get 1→0) yields no wall change. +- The REAL wall is the STORAGE representation, not the tag-guard. sizeof(HexaVal)=16 + → HexaArr.items[] is a 16-byte-strided boxed array; clang cannot SIMD-gather it + (a_boxed/b_unbox = 5 vector ops). The native int64_t[] (c_native) is 8-byte + contiguous → clang vectorizes (23 vector ops) and closes the gap 100% + (c_native 0.08s ≈ ref 0.08s). asm_simd.txt has the vector-op counts. 
+- MISSING INFRA (the determinant): a true native int64_t[]/double[] array
+  representation (HexaArrI64/F64) does NOT exist — HexaArr is boxed HexaVal* only.
+  That is a RUNTIME change (new struct + box/unbox helpers + every array primitive
+  must branch on element-kind), blocked here by the B9 generated-runtime wall and
+  out of scope for a codegen-only pilot. Axis A therefore RULES OUT "codegen-only
+  unbox (drop tag-guard, read boxed .i)" as a perf lever — the gap lives in storage.
diff --git a/domains/UNSHADOW.bench.md b/domains/UNSHADOW.bench.md
index 747402027..b4f363c03 100644
--- a/domains/UNSHADOW.bench.md
+++ b/domains/UNSHADOW.bench.md
@@ -581,3 +581,52 @@ emit. Both arms: "out of bounds" surfaced=1. **Only proof-safe reads
 full self-host regen is blocked by the B9 generated-runtime wall (same for prior agents) → faithful A/B
 proxy (the emit string is byte-identical to codegen L7661, spec-accepted) · zero `.c` files in the repo.
 Repro = `tool/unshadow_cclass_bounds_bench.hexa --rt --runs 9`.
+
+## §unboxed-array — 🟢 unboxed-primitive array (axis A) — perf 🔴 CLOSED-NEGATIVE
+
+> Measured result for the milestone "🟢 unboxed-primitive array". **Summary**: this measures the
+> unmeasured lever that §c-class spelled out ("once a known-array tracker exists, drop the tag-guard
+> too"). When codegen statically proves an immutable `let xs=[int-lit…]` to be monomorphic-i64, it
+> drops the §c-class array-tag guard on the in-range read `xs[i]` and extracts the raw `.i`
+> (boxed-storage unbox). **Finding**: this unbox does not close the gap; the gap lives in the
+> **boxed storage representation**, not in the tag-guard.
+> SSOT tool = `tool/unshadow_unboxed_array_bench.hexa` · verdict = `.verdicts/unshadow-unboxed-array/`.
+
+Measurement: `mini` (macOS arm64) · clang · best-of-9 wall · same `runtime.o` linked · 2026-05-30.
+Workload = sum over a 256-element integer array × 4M outer iterations (the exact C shape codegen
+emits for `let xs=[..]` + `for i in 0..len(xs) { acc += xs[i] }`).
+
+### Table — 4-way wall min (s) + asm
+
+| arm | element read emit | wall (s) | `bl _hexa_index_get` | SIMD vec-op | verdict |
+|---|---|---|---|---|---|
+| ref_c (idiomatic `int64_t buf[]`) | `buf[i]` | **0.08** | — | 23 | parity baseline |
+| a_boxed (BEFORE = §c-class) | `(HX_IS_ARRAY(a)?items[i]:checked)` | 1.12 | 1 (cold) | 5 | boxed read + tag-guard |
+| b_unbox (AFTER = new codegen) | `a.arr_ptr->items[i]` (guard dropped) | **1.12** | **0** | 5 | tag-guard dropped → **~0% Δ** |
+| c_native (CEILING = HexaArrI64) | `data[i]` (native `int64_t*`) | **0.08** | 0 | 23 | gap 100% closed |
+
+> g5: the stdout md5 of all 4 arms is `35470124be79241c684dc5103ec55d20` (IDENTICAL).
+
+### Integrity gate — the typed array boxes correctly at the polymorphic boundary
+
+A boundary corpus that sums a typed `[i64]` array via the unboxed fast-path and **at the same
+time** passes it to a polymorphic site (`hexa_len` + checked `hexa_index_get` element fetch = the
+BOXED path codegen emits for non-proven accesses). Both arms, boxed (a_bnd) and unbox (b_bnd),
+produce md5 `9efbbf5d320a45f2ce6e89491a1ac726` → **the typed array boxes correctly at the dynamic
+boundary, and the unbox changes no value**. (This is this milestone's integrity gate, the
+counterpart of c-class's OOB-still-throws.)
+
+### Honest interpretation — the gap lives in STORAGE, not the tag-guard
+
+- **correctness WIN.** The element-kind proof (immutable int-literal-array + live in-range fact)
+  is exact and narrow, and the byte-diff is IDENTICAL across the 4 arms and the dynamic boundary.
+  Only a provably-dead guard is dropped (no regression).
+- **perf 🔴 CLOSED-NEGATIVE.** b_unbox 1.12s ≈ a_boxed 1.12s. The tag-guard `HX_IS_ARRAY(arr)?`
+  is **loop-invariant**, so clang -O2 already hoists it; dropping it (index_get 1→0, cold
+  fallback removed) leaves wall time unchanged. Measurement confirms that the lever §c-class
+  called "unmeasured" is **null**.
+- **The real wall = boxed storage.** `sizeof(HexaVal)=16` → `HexaArr.items[]` is a 16B-stride
+  boxed array that clang cannot SIMD-gather (5 vec-op). Only native `int64_t[]` (c_native, 8B
+  contiguous) vectorizes (23 vec-op) and closes the gap 100%. See
+  `.verdicts/unshadow-unboxed-array/asm_simd.txt`.
+- **Missing infra = a native `HexaArrI64`/`F64` storage representation** (a RUNTIME change: new
+  struct + box/unbox helpers + element-kind branching in every array primitive). Blocked by the
+  B9 wall and out of scope for a codegen-only pilot.
+  → axis A decisively establishes that "codegen-only unbox is not a perf lever". The gap is STORAGE.
+- caveat: single host (mini) · best-of-9 · same runtime.o for both arms · full self-host regen is
+  blocked by the B9 wall → faithful A/B proxy (the b_unbox emit is byte-identical to the new
+  codegen arm at L7666, spec-accepted) · zero `.c` files in the repo.
+  Repro = `tool/unshadow_unboxed_array_bench.hexa --rt --runs 9`.
diff --git a/domains/UNSHADOW.log.md b/domains/UNSHADOW.log.md
index b87f44083..7c1c79317 100644
--- a/domains/UNSHADOW.log.md
+++ b/domains/UNSHADOW.log.md
@@ -629,3 +629,48 @@ licenses the elision.
 amalgam merge fails on a pre-existing type error, unrelated to my edit) → faithful A/B proxy (emit
 string byte-identical to codegen L7661, spec-accepted). Remaining lever = a known-array tracker (once
 it exists, drop the tag-guard too). verdict=`.verdicts/unshadow-cclass-bounds/` · repro=`tool/unshadow_cclass_bounds_bench.hexa`.
+
+---
+
+## 🟢 unboxed-primitive array (axis A, typed-repr RFC) — 2026-05-30
+
+Measured the unmeasured lever §c-class left behind ("once a known-array tracker exists, drop the
+tag-guard too"). **Result = correctness WIN · perf 🔴 CLOSED-NEGATIVE**: this pins down decisively
+that the gap lives in the **storage representation**, not in the tag-guard.
+
+**element-kind proof + unbox site (self/codegen.hexa).**
+- `_is_int_literal_array(node)`: whether every element of an array literal `[1,2,3,…]` is an
+  IntLit (exact, narrow monomorphic-i64 proof; rejects Ident, call, float, and nesting).
+- LetStmt registration (`gen2_stmt`, ~L2897): immutable `let xs=[int-lit…]` → `_known_intarr_add(xs)`.
+  Reassignment through an AssignStmt whose LHS is an Ident → `_known_intarr_void` (a re-let to a
+  non-int-array also voids). LetMutStmt is excluded.
+- Index emit (~L7666): when `xs[i]` is covered by a live in-range fact (`_inrange_counter_for`)
+  and `xs` is a known-int-array → instead of the §c-class ternary
+  `(HX_IS_ARRAY(xs)?items[i]:checked)`, emit a **direct `(xs.arr_ptr->items[ctr])`** (array-tag
+  guard dropped, array-ness statically proven).
+- `_is_known_int(node)` (~L9981): true when `node` is an in-range read `arr[i]` of a
+  known-int-array → the surrounding sum/dot BinOp extracts the raw `.i` via HX_INT (zero
+  per-element tag dispatch). Both gates require immutable-let registration AND a live in-range
+  fact → they never fire on an unproven/boxed access.
+
+**byte-diff IDENTICAL (g5, mini best-of-9).** The stdout md5 of all 4 arms (ref_c ·
+a_boxed=§c-class · b_unbox=new · c_native=ceiling) is `35470124…`. **Integrity gate PASS**: a
+boundary corpus that passes a typed `[i64]` array to a polymorphic site (`hexa_len` + checked
+element fetch) yields the same md5 `9efbbf5d…` on both the boxed and unbox arms. The unbox never
+changes a value, and the typed array boxes correctly at the dynamic boundary.
+
+**perf 🔴 CLOSED-NEGATIVE.** b_unbox 1.12s ≈ a_boxed 1.12s = **~0% Δ**. The tag-guard
+`HX_IS_ARRAY(arr)?` is **loop-invariant**, so clang -O2 already hoists it out of the hot loop;
+dropping it (asm `bl _hexa_index_get` 1→0, cold fallback removed) leaves wall time unchanged. The
+real wall = **boxed storage**: `sizeof(HexaVal)=16` → `HexaArr.items[]` is a 16B-stride boxed
+array that clang cannot SIMD-gather (a_boxed/b_unbox = 5 vec-op). Only native `int64_t[]`
+(c_native, 8B contiguous) vectorizes (23 vec-op) and closes the gap 100% (c_native 0.08s ≈ ref
+0.08s). asm=`.verdicts/unshadow-unboxed-array/asm_simd.txt`.
+
+**Missing infra (axis ruled-out).** A true native `int64_t[]`/`double[]` storage representation
+(`HexaArrI64`/`F64`) **does not exist**; `HexaArr` is boxed `HexaVal*` only. Adding one is a
+RUNTIME change (new struct + box/unbox helpers + every array primitive branching on element-kind),
+blocked by the B9 wall and out of scope for a codegen-only pilot. → axis A decisively establishes
+that **"codegen-only unbox (tag-guard drop + boxed `.i` read) is not a perf lever"**. The gap
+lives in STORAGE. paper_negative_ok: decisively ruling out a data-representation axis = valid terminal.
+
+**Honest caveat**: full self-host regen is blocked by the B9 generated-runtime wall → faithful
+A/B proxy (the b_unbox emit is byte-identical to the new codegen arm at L7666, spec-accepted, same
+as prior rounds). Single host (mini) · best-of-9 · same runtime.o for both arms · zero `.c` files
+in the repo. Because the codegen change is byte-identical and only drops a provably-dead guard (no
+regression), it is kept as the foundation of axis B (monomorphic struct): the element-kind proof
+and the box/unbox boundary discipline. verdict=`.verdicts/unshadow-unboxed-array/` ·
+repro=`tool/unshadow_unboxed_array_bench.hexa`.
diff --git a/domains/UNSHADOW.md b/domains/UNSHADOW.md
index 5bd8fd9a7..ad6b57f34 100644
--- a/domains/UNSHADOW.md
+++ b/domains/UNSHADOW.md
@@ -34,5 +34,5 @@
 > Backlog reopened: registers the root cause of E(🔴), the "absence of a typed representation", as the next frontier lever.
 Both are orthogonal to the `.c=0` LTO graduation (§c-class proved it: the codegen proof is independent of the runtime.o wall). Order = A → B → (once B is done) reopen E. Design, landing sites, and gate details = `UNSHADOW.typed-repr.md`.
-- [ ] 🟢 unboxed-primitive array (`[i64]`/`[f64]` → native `int64_t[]`/`double[]`, not boxed HexaVal[]): the array axis of §parity-attest (7.9×~1263×) · the array extension of §hexaval-unbox (scalar 0.98× closed) · direct descendant of the unmeasured lever §c-class left behind (boxed-on-read + residual tag-guard, bench L577-579). Landing = element-kind inference (check) + ArrayLit/Index emit (codegen `:7594`/`:7661`) typed-arm + box/unbox helpers (runtime.h public). Fires = only when the element type is statically proven i64/f64, else the existing BOXED path is unchanged. Gate = byte-diff IDENTICAL (typed ON/OFF · correct boxing at the dynamic boundary) + parity Δ measurement (array hot-loop). RFC=domains/UNSHADOW.typed-repr.md
+- [x] 🟢 unboxed-primitive array: **correctness WIN · perf 🔴 CLOSED-NEGATIVE (axis ruled-out: the gap is STORAGE, not the tag-guard).** Element-kind inference landed in codegen (self/codegen.hexa): an immutable `let xs=[int-lit…]` (all IntLit) is registered in `_known_intarr_set`, and an `xs[i]` read covered by an in-range fact is statically proven both (a) receiver TAG_ARRAY and (b) elem TAG_INT → drop the §c-class array-tag guard (`HX_IS_ARRAY?`) + `_is_known_int(xs[i])=true` (sum/dot extracts the raw `.i`). Fires = only when both the immutable-let registration and a live in-range fact hold; voided immediately on reassignment/re-let; else the existing BOXED path is unchanged. g5 byte-diff IDENTICAL across 4 arms (ref · a_boxed · b_unbox · c_native, md5 `35470124`) + correct boxing at the dynamic boundary (typed array → polymorphic site, md5 `9efbbf5d` identical on both arms). **perf**: b_unbox 1.12s ≈ a_boxed 1.12s = ~0% Δ; the tag-guard is loop-invariant, so clang -O2 already hoists it (dropping it leaves wall time unchanged). The real wall = boxed storage (sizeof HexaVal=16 → items[] 16B-stride, no SIMD-gather, 5 vec-op); only native int64_t[] (c_native, 8B contiguous, 23 vec-op) closes the gap 100% (0.08s≈ref). **Missing infra = a native HexaArrI64/F64 storage representation (runtime change, blocked by the B9 wall and out of codegen-only scope)** = where the gap lives. mini macOS arm64 best-of-9. verdict=`.verdicts/unshadow-unboxed-array/` · repro=`tool/unshadow_unboxed_array_bench.hexa`.
RFC=domains/UNSHADOW.typed-repr.md
 - [ ] 🔵 typed monomorphic struct layout (flat C-struct typedef + offset field access, not a hash-map): prerequisite for reopening E (AoS↔SoA) · current struct = `hexa_struct_pack_map` hash-map (`:8043`) · field = `hexa_map_get_ic` strcmp/IC (`:5394`) · valstruct is a 12-slot carrier only (cannot be generalized, `:7995`). On a monomorphic proof (closed field set, static access): per-type flat-struct emit + `obj.vs->f_k` offset access. Landing = struct shape proof (check) + gen2_struct_decl (`:7982`) typedef arm + field access (`:5373`) offset arm. Polymorphic/dynamic-key structs stay on the hash-map. Gate = byte-diff IDENTICAL + hash-lookup→offset Δ (instr+wall). RFC=domains/UNSHADOW.typed-repr.md
diff --git a/self/codegen.hexa b/self/codegen.hexa
index fcc4ce215..858fad06b 100644
--- a/self/codegen.hexa
+++ b/self/codegen.hexa
@@ -2893,6 +2893,16 @@ fn _gen2_stmt_inner(node, depth) {
         if _is_float_init_expr(node.left) {
             _known_float_add(node.name)
         }
+        // UNSHADOW item A: register immutable let bound to a monomorphic
+        // i64 array literal (`[1,2,3,…]`) as a known-int-array, so an
+        // in-range read of it inside a proven loop can drop the array-tag
+        // guard and read the raw `.i` element (boxed-storage unbox).
+        if _is_int_literal_array(node.left) {
+            _known_intarr_add(node.name)
+        } else {
+            // A same-name re-let to a non-int-array RHS voids the proof.
+            _known_intarr_void(node.name)
+        }
         // PROBE r11 B3: also try comptime-fold for plain immutable `let`.
         // `let x = 2+3` now folds to literal 5 and registers in the
         // comptime-const table — downstream references inline the literal
@@ -2989,6 +2999,9 @@ fn _gen2_stmt_inner(node, depth) {
         // non-folded / string-folded names (mirrors the re-`let` D18 invalidate).
         if type_of(node.left) != "string" && node.left.kind == "Ident" {
             _invalidate_comptime_const(node.left.name)
+            // UNSHADOW item A: reassigning the binding may swap in a non-int
+            // array (or a non-array) — void the known-int-array proof.
+ _known_intarr_void(node.left.name) } return pad + gen2_expr(node.left) + " = " + gen2_expr(node.right) + ";\n" } @@ -7658,6 +7671,29 @@ fn gen2_expr(node) { let _ir_ctr = _inrange_counter_for(node.left.name, idx.name) if _ir_ctr != "" { let _ir_av = gen2_expr(node.left) + // ── UNSHADOW item A: unboxed-primitive array read ─────────── + // If `arr` is ALSO a proven known-int-array (immutable let = + // monomorphic i64 array literal), the receiver is statically + // TAG_ARRAY *and* the element is statically TAG_INT. Both the + // §c-class array-tag guard (`HX_IS_ARRAY(arr)?…`) and the + // per-element runtime work are now provably dead — drop the + // ternary and emit a DIRECT element read. The element is + // produced as a HexaVal lvalue (`arr.arr_ptr->items[ctr]`, + // boxed-storage form), but `_is_known_int(arr[i])` now + // certifies it TAG_INT, so a surrounding sum/dot BinOp + // extracts the raw `.i` via HX_INT() with no tag dispatch — + // the boxed-storage unbox (storage is still HexaVal* under + // the runtime.o wall; native int64_t[] storage is the modeled + // ceiling measured in the A/B proxy). DOUBLE-EVAL SAFE: _ir_av + // and _ir_ctr each appear once. FOLD-SHADOW SAFE: registration + // is immutable-let-only + voided on reassignment, and the + // read still requires a live in-range fact (no resize/realias + // in the loop). BOUNDARY-BOX: any use of `arr` NOT matching + // this proven read (passed to a fn / unproven index / untyped + // flow) falls through to the unchanged BOXED HexaArr path. + if _is_known_intarr_name(node.left.name) { + return "(" + _ir_av + ".arr_ptr->items[" + _ir_ctr + "])" + } return "(HX_IS_ARRAY(" + _ir_av + ") ? (" + _ir_av + ").arr_ptr->items[" + _ir_ctr + "] : hexa_index_get(" + _ir_av + ", hexa_int(" + _ir_ctr + ")))" } } @@ -9575,6 +9611,78 @@ let mut _known_float_set = [] // 64 buckets; each bucket is [name, ...] // CODEGEN proof, which is the axis unwall separated out). 
let mut _inrange_facts = [] // stack of [idx_var, arr_name, c_counter] +// ── UNSHADOW item A: unboxed-primitive array — element-kind tracker ───────── +// Names of immutable `let`s whose initializer is a STATICALLY-PROVEN +// monomorphic-i64 array literal (`[1,2,3,…]`, every element an IntLit). For +// such a binding the codegen KNOWS each element is TAG_INT, so a read +// `arr[i]` covered by a live in-range fact (§c-class) is provably (a) on a +// TAG_ARRAY receiver and (b) a TAG_INT element. That lets us: +// • DROP the §c-class array-tag guard (`HX_IS_ARRAY(arr)?…`) — array-ness is +// now statically proven for this binding, not just bounds. +// • Mark the read known-int so a surrounding sum/dot BinOp extracts the raw +// `.i` (`arr.arr_ptr->items[i].i`) with NO per-element box / HX_INT tag +// work — the boxed-storage form of the unbox. +// This is a flat list (known-int-arrays are rare & local). Registration is +// gated on immutable LetStmt only; any reassignment of the name (AssignStmt +// LHS Ident) VOIDS it (mirrors _invalidate_comptime_const), and the read +// fast-path STILL also requires a live in-range fact (which independently +// proves no resize/reassign in the loop body). Box at the dynamic boundary is +// automatic: ANY use outside the proven `arr[i]` read (passing `arr` to a fn, +// `arr` flowing to an untyped site, an unproven index) falls through to the +// unchanged BOXED HexaArr path — the elem stays a boxed HexaVal there. +let mut _known_intarr_set = [] + +fn _known_intarr_add(name) { + // de-dup + let mut i = 0 + while i < len(_known_intarr_set) { + if _known_intarr_set[i] == name { return } + i = i + 1 + } + _known_intarr_set = _known_intarr_set.push(name) +} + +fn _known_intarr_void(name) { + // Remove `name` if present (reassignment / re-let invalidates the proof). 
+ let n = len(_known_intarr_set) + if n <= 0 { return } + let mut _rebuilt = [] + let mut i = 0 + while i < n { + if _known_intarr_set[i] != name { _rebuilt = _rebuilt.push(_known_intarr_set[i]) } + i = i + 1 + } + _known_intarr_set = _rebuilt +} + +fn _is_known_intarr_name(name) { + let mut i = 0 + while i < len(_known_intarr_set) { + if _known_intarr_set[i] == name { return true } + i = i + 1 + } + return false +} + +// Is `node` an array literal whose every element is an IntLit? (Exact, narrow +// monomorphic-i64 proof — no Idents, no calls, no float, no nesting. The +// narrowest provable case the RFC names.) +// decreases node — structural (single level; elements must be leaf IntLits) +fn _is_int_literal_array(node) { + if type_of(node) == "string" { return false } + if node.kind != "Array" { return false } + if type_of(node.items) != "array" { return false } + if len(node.items) == 0 { return false } + let mut i = 0 + while i < len(node.items) { + let el = node.items[i] + if type_of(el) == "string" { return false } + if el.kind != "IntLit" { return false } + i = i + 1 + } + return true +} + // ── enum-variant access fix (inbox: enum-variant-access-miscodegen) ── // `.` (e.g. RegionShape.K_BY_K) is an enum-variant // constant, NOT a runtime field access — the gen2_enum_decl #define @@ -9982,6 +10090,22 @@ fn _is_known_int(node) { if type_of(node) == "string" { return false } if node.kind == "IntLit" { return true } if node.kind == "Ident" { return _is_known_int_name(node.name) } + // UNSHADOW item A: a read `arr[i]` of a proven known-int-array, where the + // index is covered by a live in-range fact, is statically TAG_INT — its + // element kind is monomorphic i64 by the array-literal proof. Certifying it + // here lets a surrounding sum/dot BinOp extract the raw `.i` (HX_INT) with + // no per-element tag dispatch. 
Narrow: requires BOTH the known-int-array + // registration AND a live in-range fact (same guards as the Index emit + // fast-path), so it never fires on an unproven / boxed access. + if node.kind == "Index" { + let _l = node.left + let _r = node.right + if type_of(_l) != "string" && _l.kind == "Ident" && _is_known_intarr_name(_l.name) { + if type_of(_r) != "string" && _r.kind == "Ident" { + if _inrange_counter_for(_l.name, _r.name) != "" { return true } + } + } + } return false } diff --git a/tool/unshadow_unboxed_array_bench.hexa b/tool/unshadow_unboxed_array_bench.hexa new file mode 100644 index 000000000..adc502f31 --- /dev/null +++ b/tool/unshadow_unboxed_array_bench.hexa @@ -0,0 +1,241 @@ +// tool/unshadow_unboxed_array_bench.hexa +// UNSHADOW milestone "🟢 unboxed-primitive array" (axis A of the typed-repr +// RFC, domains/UNSHADOW.typed-repr.md). Extends the merged §c-class win: +// c-class elided the bounds check on `arr[i]` inside a proven +// `for i in 0..len(arr)` loop but the read STILL yielded a BOXED HexaVal and +// kept a single array-tag guard (bench L577-579 names dropping that guard as +// the unmeasured lever). This milestone closes it for a STATICALLY-PROVEN +// monomorphic-i64 array. +// +// CODEGEN CHANGE UNDER TEST (self/codegen.hexa): +// An immutable `let xs = [1,2,3,…]` (every element an IntLit) registers `xs` +// as a known-int-array. A read `xs[i]` covered by a live in-range fact then +// knows the receiver is TAG_ARRAY *and* the element is TAG_INT, so: +// • the §c-class array-tag guard (`HX_IS_ARRAY(xs)?…:checked`) is DROPPED +// (array-ness statically proven) → direct `xs.arr_ptr->items[i]`, +// • `_is_known_int(xs[i])` returns true so a surrounding sum/dot BinOp +// extracts the raw `.i` via HX_INT with NO per-element tag dispatch. 
+// Storage is still the runtime's HexaVal* (HexaArr) under the runtime.o wall +// (true native int64_t[] storage = HexaArrI64 is the modeled CEILING below); +// the win measured as LANDED is the boxed-storage unbox (guard + tag work +// dropped). +// +// ARMS (all link the SAME real runtime.o; only the element read differs): +// ref_c idiomatic plain `int64_t buf[]` sum (parity baseline, no HexaVal) +// a_boxed BEFORE = §c-class read: (HX_IS_ARRAY(a)?a.arr_ptr->items[i] +// :checked), then HX_INT() +// b_unbox AFTER = new codegen read for a known-int-array: +// a.arr_ptr->items[i] (no tag guard), then HX_INT() +// c_native CEILING = true native int64_t[] storage (what HexaArrI64 gives): +// int64_t* data; … data[i] (no HexaVal at all) +// +// INTEGRITY GATE (the hard one — the dynamic boundary): a typed i64 array +// passed to a POLYMORPHIC site MUST still box correctly and produce identical +// output. We emit a boundary corpus: the SAME `[i64]` literal is (1) summed +// via the unboxed fast path AND (2) handed to a polymorphic `hexa_len` / +// element-print site that consumes boxed HexaVal. Both arms (boxed vs unbox) +// MUST byte-match the reference — the unbox NEVER changes a value, only how +// the proven in-range read is emitted. A wrong unbox = tag/repr confusion = +// silent miscompile, so this gate is non-negotiable. +// +// MEASURE per arm: built? · program-output md5 (g5 byte-diff) · best-of-N wall · +// inner-loop `bl _hexa_index_get` count (the boxed-call drop) + asm shape. 
+// +// Argv: +// --rt dir holding runtime.h + runtime.c + runtime.o +// (default ~/.hx/packages/hexa/self) +// --work scratch dir (default ~/unshadow-uba) +// --runs timed runs per arm, best-of-N (default 5) + +fn parse_int_safe(s) { + let t = s.trim() + if len(t) == 0 { return 5 } + return to_int(t) +} + +fn arg_val(argv, key, dflt) { + let mut i = 0 + while i < argv.len() { + if argv[i] == key && i + 1 < argv.len() { return argv[i + 1] } + i = i + 1 + } + return dflt +} + +fn write_file(path, body) { + exec("cat > '" + path + "' <<'__UNSHADOW_EOF__'\n" + body + "\n__UNSHADOW_EOF__") +} + +fn build_arm(label, cc_cmd, out_bin) { + let r = exec(cc_cmd + " 2>&1") + let built = exec("[ -f '" + out_bin + "' ] && printf yes || printf no").trim() + let nerr = exec("printf '%s' \"" + r + "\" | grep -ciE 'error|undefined' 2>/dev/null").trim() + println(" [" + label + "] built=" + built + " err_lines=" + nerr) + return built +} + +fn md5_of_output(bin) { + return exec("'" + bin + "' 2>/dev/null | md5 2>/dev/null || '" + bin + "' 2>/dev/null | md5sum 2>/dev/null | cut -d' ' -f1").trim() +} + +fn best_wall(bin, runs) { + let mut best = "999" + let mut r = 0 + while r < runs { + let t = exec("{ /usr/bin/time -p '" + bin + "' >/dev/null ; } 2>&1 | awk '/^real/{print $2}'").trim() + if len(t) > 0 { + let cmp = exec("awk 'BEGIN{print (" + t + "<" + best + ")?1:0}'").trim() + if cmp == "1" { best = t } + } + r = r + 1 + } + return best +} + +fn bl_count(asm_path, sym) { + return exec("grep -c 'bl[[:space:]]*_" + sym + "' '" + asm_path + "' 2>/dev/null | head -1 || printf 0").trim() +} + +fn main() { + let argv = args() + let home = env("HOME") + let rt = arg_val(argv, "--rt", home + "/.hx/packages/hexa/self") + let work = arg_val(argv, "--work", home + "/unshadow-uba") + let runs = parse_int_safe(arg_val(argv, "--runs", "5")) + + println("=== UNSHADOW unboxed-primitive array bench (axis A) ===") + println("rt=" + rt + " work=" + work + " runs=" + to_string(runs)) + exec("rm -rf 
'" + work + "' && mkdir -p '" + work + "'") + + let inc = "-I '" + rt + "'" + let o2 = "clang -O2 " + inc + " " + + // ── shared HexaVal-backed hot-loop body, parameterized by GET() ───────── + // Builds a 256-elem int array (the runtime HexaArr of boxed HexaVal*), then + // sums GET(arr,i) over N outer iters. This is exactly the C shape codegen + // emits for `let xs = [..]` + `for i in 0..len(xs) { acc += xs[i] }`. + write_file(work + "/body.h", + "static long run_corpus(void){\n" + + " HexaVal arr=hexa_array_new();\n" + + " for(int k=0;k<256;k++) arr=hexa_array_push(arr,hexa_int(k));\n" + + " long acc=0; long N=4000000L;\n" + + " for(long r=0;r\n#include \n" + + "static long run_ref(void){\n" + + " int64_t buf[256]; for(int k=0;k<256;k++) buf[k]=k;\n" + + " long acc=0; long N=4000000L;\n" + + " for(long r=0;r\n" + + "#define GET(a,i) (HX_IS_ARRAY(a) ? (a).arr_ptr->items[(i)] : hexa_index_get((a), hexa_int((i))))\n" + + "#include \"body.h\"\n") + + // b_unbox (AFTER = new codegen for a known-int-array) — tag guard DROPPED, + // direct boxed-storage element read. The surrounding HX_INT() extracts `.i`. + write_file(work + "/b_unbox.c", + "#include \"runtime.h\"\n#include \n" + + "#define GET(a,i) ((a).arr_ptr->items[(i)])\n" + + "#include \"body.h\"\n") + + // c_native (CEILING) — true native int64_t[] storage. Models the HexaArrI64 + // representation the RFC targets: `let xs:[i64]` lowers to a native buffer, + // the read is `data[i]` with no HexaVal. Same program output. 
+ write_file(work + "/c_native.c", + "#include \"runtime.h\"\n#include \n" + + "static long run_native(void){\n" + + " HexaVal arr=hexa_array_new();\n" + + " for(int k=0;k<256;k++) arr=hexa_array_push(arr,hexa_int(k));\n" + + " int len=hexa_len(arr);\n" + + " int64_t* data=(int64_t*)malloc(sizeof(int64_t)*len);\n" + + " for(int k=0;kitems[k].i; /* unbox-on-build */\n" + + " long acc=0; long N=4000000L;\n" + + " for(long r=0;r/dev/null") + exec(o2 + "-S '" + work + "/b_unbox.c' -o '" + work + "/b.s' 2>/dev/null") + exec(o2 + "-S '" + work + "/c_native.c' -o '" + work + "/c.s' 2>/dev/null") + println("\n asm bl _hexa_index_get (total): a_boxed=" + bl_count(work + "/a.s", "hexa_index_get") + + " b_unbox=" + bl_count(work + "/b.s", "hexa_index_get") + + " c_native=" + bl_count(work + "/c.s", "hexa_index_get")) + + // ── INTEGRITY GATE: the dynamic boundary — typed array used polymorphically + // The SAME [i64] literal is BOTH summed via the unboxed fast path AND handed + // to a polymorphic consumer (hexa_len + a boxed element fetch via the + // CHECKED hexa_index_get, the path codegen emits when the access is NOT the + // proven loop read). Both arms MUST box correctly there and byte-match. + println("\n--- INTEGRITY: typed i64 array at a polymorphic boundary ---") + write_file(work + "/bnd_body.h", + "static long run_bnd(void){\n" + + " HexaVal arr=hexa_array_new();\n" + + " for(int k=0;k<16;k++) arr=hexa_array_push(arr,hexa_int(k*k));\n" + + " /* (1) unboxed fast-path sum (the proven loop read) */\n" + + " long s=0; for(int64_t i=0;i<(int64_t)hexa_len(arr);i++){ s+=HX_INT(GET(arr,i)); }\n" + + " /* (2) DYNAMIC BOUNDARY: hand the same array to a polymorphic site — the\n" + + " length + a checked (boxed) element fetch, exactly the BOXED path\n" + + " codegen emits for a non-proven access. MUST box correctly. 
*/\n" + + " long plen=(long)hexa_len(arr);\n" + + " long pelem=HX_INT(hexa_index_get(arr, hexa_int(7)));\n" + + " return s + plen*1000 + pelem; }\n" + + "int main(void){ printf(\"%ld\\n\", run_bnd()); return 0; }\n") + // boundary arm A (boxed read on the fast path) + write_file(work + "/a_bnd.c", + "#include \"runtime.h\"\n#include \n" + + "#define GET(a,i) (HX_IS_ARRAY(a) ? (a).arr_ptr->items[(i)] : hexa_index_get((a), hexa_int((i))))\n" + + "#include \"bnd_body.h\"\n") + // boundary arm B (unboxed read on the fast path; boundary use stays checked) + write_file(work + "/b_bnd.c", + "#include \"runtime.h\"\n#include \n" + + "#define GET(a,i) ((a).arr_ptr->items[(i)])\n" + + "#include \"bnd_body.h\"\n") + build_arm("a_bnd", o2 + "'" + work + "/a_bnd.c' '" + rt + "/runtime.o' -o '" + work + "/abnd_bin' -lpthread", work + "/abnd_bin") + build_arm("b_bnd", o2 + "'" + work + "/b_bnd.c' '" + rt + "/runtime.o' -o '" + work + "/bbnd_bin' -lpthread", work + "/bbnd_bin") + println(" boundary g5 (MUST be identical — unbox never changes a value):") + println(" a_bnd (boxed) : " + md5_of_output(work + "/abnd_bin")) + println(" b_bnd (unbox) : " + md5_of_output(work + "/bbnd_bin")) + println(" → both MUST match: the typed array boxes correctly when it reaches") + println(" the polymorphic boundary (hexa_len + checked element fetch).") + + println("\n=== done ===") +}