dancinlab · dancinlife · May 29, 2026
diff --git a/.verdicts/unshadow-hexaval-unbox/pilot.txt b/.verdicts/unshadow-hexaval-unbox/pilot.txt
@@ -0,0 +1,25 @@
+UNSHADOW HexaVal-unbox pilot — verdict (mini macOS arm64, clang 21.0.0)
+========================================================================
+falsifier: known-int binop result reboxed via inline ((HexaVal){.tag=TAG_INT,.i=(...)})
+  must (a) be byte-identical to out-of-line hexa_int(...) AND (b) close the parity gap.
+
+G5 byte-diff gate (output, both arms + ref-C):
+  before  stdout = 34200003330000000
+  after   stdout = 34200003330000000
+  ref-C   stdout = 34200003330000000
+  G5: IDENTICAL  (md5 63888b02e0325abf096209d943c8413f)
+
+asm (hot mix(), otool -S / clang -S, -O2):
+  before  bl _hexa_int = 17   (total bl = 19)
+  after   bl _hexa_int = 0    (total bl = 2 = mix-call + printf)
+  after mix() = pure register arith (add/sub/lsl in x8..x11), zero HexaVal spills.
+
+wall (best-of-11, ms):
+  ref-C @-O2          = 54
+  before (boxed rebox)= 599
+  after  (inline lit) = 53
+
+findings:
+  unbox speedup (before/after) = 11.30x   (91.2% wall drop)
+  parity gap before = 11.09x ;  after = 0.98x  (AT PARITY)
+  gap closed = 100% of the before-parity gap on this known-int workload.
diff --git a/.verdicts/unshadow-same-tu/F-UNSHADOW-SAME-TU.txt b/.verdicts/unshadow-same-tu/F-UNSHADOW-SAME-TU.txt
@@ -0,0 +1,108 @@
+F-UNSHADOW-SAME-TU: 🔵×🟡 same-TU build as default, cost/benefit PILOT (measured)
+================================================================================
+Host: pool mini (macOS arm64, Apple clang). Both arms same host, back-to-back.
+Runtime source: pre-graduation self/ tree (commit 151c52c8, B9-faithful — the
+  emitter reproduces runtime.c byte-identically, B9.C-10 source-SHA gate) via
+  `git archive 151c52c8 self | tar -x` into /tmp (the repo is .c=0 post-B9, so
+  runtime.c must be supplied; this is the milestone-blessed faithful proxy).
+Reproducer (committed, parse-gated + run on mini):
+  tool/unshadow_same_tu_bench.hexa  --rt <dir-with-self/runtime.c> --runs 5
+
+FALSIFIER (pre-registered)
+  "Making the C-emit build same-TU by default (user.c `#include "runtime.c"`,
+   no separate runtime.o 2nd TU) (a) opens #2-ext-class cross-layer boundary
+   wins generally (clang -O2 inlines bl _rt_*/_hexa_* across the now-open
+   boundary), byte-identical, AND (b) carries an acceptable build-time/binary
+   cost — i.e. default-on is a net win."
+
+BUILD-RECIPE CHANGE IMPLEMENTED (self/main.hexa cmd_build, GATED HEXA_SAME_TU=1)
+  Two coordinated, reversible edits — opt-in only, NOT a forced global default:
+   (1) codegen half — when HEXA_SAME_TU=1, the transpile step is run with
+       HEXA_USE_RUNTIME_C=1 (the existing codegen.hexa:947 escape hatch), so
+       emitted user.c does `#include "runtime.c"` instead of `runtime.h` → the
+       runtime amalgam enters the user TU.
+   (2) link half — when HEXA_SAME_TU=1, the separate runtime object/source 2nd
+       TU is dropped from the final clang call (_rt_input = "") — the runtime is
+       already textually present, so a single TU compiles. (Adding it as a 2nd
+       TU would duplicate every symbol → link error.)
+   guarded `shared != "1" && len(target) == 0` (same-TU not applied to --shared
+   PIC or cross-target zig builds). Unset HEXA_SAME_TU → byte-for-byte the
+   legacy walled build (the resolve_prebuilt / content-hash-.o / source path).
+   self/main.hexa parses cleanly.
+
+MEASUREMENT METHOD (honest A/B proxy — no full self-host rebuild)
+  The full `hexa cc --regen` self-host rebuild is blocked by the B9 wall
+  (runtime.c GENERATED, absent from a fresh clone — same blocker the prior
+  unwall agent hit). The milestone spec explicitly blesses a faithful A/B proxy:
+  two build modes, SAME runtime source, isolating only the TU/link strategy.
+   · workload transpiled by the INSTALLED hexat (emits `#include "runtime.h"`).
+   · WALLED  : compile user.c + link a precompiled runtime object (2 TU) — the
+               live default.
+   · SAME-TU : textually swap runtime.h→runtime.c in user.c (the EXACT transform
+               the codegen half performs) and compile as ONE TU.
+   Both arms compile against the SAME runtime.c source (the walled object is
+   compiled from it), so only the TU boundary varies — precisely the variable
+   the cmd_build edit controls.
+
+VERIFICATION (verbatim — tool/unshadow_same_tu_bench.hexa --runs 5, mini)
+  --- workload: string-boundary ---
+    built: walled=yes  same-TU=yes
+    g5 md5: walled=0e2afa85abbd8d3d13b7a79efb429a8e  same-TU=0e2afa85abbd8d3d13b7a79efb429a8e  [IDENTICAL]
+    wall best-of-5: walled=1.87s  same-TU=1.48s
+    binary size: walled=409080B  same-TU=408888B
+  --- workload: HexaVal-arith (control) ---
+    built: walled=yes  same-TU=yes
+    g5 md5: walled=657d1ec4586d9e7cb3572bd47e3d1bb2  same-TU=657d1ec4586d9e7cb3572bd47e3d1bb2  [IDENTICAL]
+    wall best-of-5: walled=0.58s  same-TU=0.44s
+    binary size: walled=408728B  same-TU=408552B
+  --- build-time (best-of-3, mini-class) ---
+    walled COLD (compile runtime.o + link)  : 3.53s
+    walled WARM (cached runtime.o, link only): 0.10s   ← live default hot path
+    same-TU (recompile amalgam EVERY build)  : 3.55s
+  --- _u_main hot-fn boundary `bl` histogram (string workload) ---
+    [WALLED]                              [SAME-TU]
+      12 bl _hexa_int                       (gone — inlined)
+       2 bl _rt_str_starts_with             (gone — inlined → 2 bl _hxlcl_strncmp + 2 _hxlcl_strlen)
+       1 bl _hexa_contains_poly             (gone — inlined → 1 bl _hxlcl_strstr)
+       1 bl _hexa_to_string                 (gone — inlined → 1 __hexa_to_string_rec)
+       2 bl _hexa_bool                      (gone — inlined)
+       4 bl _hexa_add_slow                  4 bl _hexa_add_slow (kept)
+       3 bl _hexa_truthy                    4 bl _hexa_truthy (kept)
+
+FINDING (honest — benefit real, cost prohibitive for default)
+  BENEFIT (real, generalizes): same-TU opens the whole HexaVal/runtime ABI to
+    clang -O2 inlining across the former TU boundary. The #2-ext-class boundary calls _rt_str_starts_with
+    (2→0) and _hexa_contains_poly (1→0) flip called→inlined exactly as §lto-unwall
+    predicted; and crucially the win is NOT string-specific — the HexaVal-arith
+    control (_hexa_int boxing) also wins (0.58→0.44s, −24%), because hexa_int /
+    hexa_to_string / hexa_bool boxing helpers are themselves runtime boundary
+    calls that same-TU inlines. Measured wall: string −21% (1.87→1.48s), arith
+    −24% (0.58→0.44s). g5 byte-IDENTICAL on BOTH workloads.
+  COST (prohibitive for default-on): same-TU recompiles the full ~14.6K-line
+    runtime amalgam into EVERY user TU — 3.55s/build vs the walled WARM default
+    of 0.10s = ~35× build-time tax. The walled model amortizes the one-time
+    3.53s runtime compile via the content-hash `runtime.<sha>.o` cache; same-TU
+    structurally CANNOT cache the runtime (it is fused into each user TU, keyed
+    by user source). Binary size is a wash (−0.05%, −192 B). A second structural
+    cost: same-TU as a shipped default REQUIRES runtime.c on disk, which B9
+    graduation removed — default-on would re-introduce a generated-.c dependency.
+
+RULED-OUT AXES
+  - default-on same-TU is NOT a net win — the ~35× per-build compile tax on the
+    hot path dominates the −21~24% runtime win for general (non-perf) builds.
+  - the benefit is NOT string-specific — it generalizes to any HexaVal-boxing
+    hot loop (control workload confirms), so the lever is the whole runtime ABI.
+  - binary size is NOT a meaningful axis (wash).
+
+RECOMMENDATION: OPT-IN FLAG (HEXA_SAME_TU=1), NOT default-on. The −21~24%
+  byte-identical runtime win is real and generalizes, but the ~35× build-time
+  tax (3.55s vs 0.10s WARM) + the re-introduced generated-runtime.c dependency
+  make default-on a poor tradeoff for the common build. Same-TU is worth it for
+  RELEASE / perf builds of HexaVal-/boundary-call-heavy programs — exactly the
+  opt-in surface this pilot landed. Terminal: opt-in, not default.
+
+VERDICT: 🔵×🟡 same-TU build = OPT-IN (HEXA_SAME_TU=1), NOT default. BENEFIT
+  −21~24% byte-identical (boundary calls _rt_str_starts_with/_hexa_contains_poly/
+  _hexa_int inlined, generalizes past strings) · COST ~35× build-time tax
+  (3.55s vs 0.10s warm) + generated-runtime.c dependency · binary Δ −0.05%
+  (wash). Terminal measured recommendation: opt-in flag.
diff --git a/bench/unshadow/knownint_heavy.hexa b/bench/unshadow/knownint_heavy.hexa
@@ -0,0 +1,46 @@
+// UNSHADOW HexaVal-unbox pilot — known-int rebox hot workload.
+//
+// Every binop below has BOTH operands provably TAG_INT (immutable int `let`s
+// or IntLits), so codegen takes the STRUCTURAL-2 known-int fast path. On
+// origin/main that path re-boxes each intermediate via the out-of-line
+// `hexa_int(…)` runtime call (`bl _hexa_int` at -O2 — the runtime.o C-ABI
+// wall). The pilot emits the result as an INLINE compound literal
+// `((HexaVal){.tag=TAG_INT,.i=(…)})` instead, so clang -O2 can keep the
+// chain in registers and drop the rebox calls. Byte-equivalent (hexa_int(n)
+// ≡ {.tag=TAG_INT,.i=n}); the value is identical, only the boxing form moves.
+fn mix(n: int) -> int {
+    let a = n + 1
+    let b = a + 2
+    let c = b * 3
+    let d = c - a
+    let e = d + b
+    let f = e * 2
+    let g = f - c
+    let h = g + a
+    let i = h * 2
+    let j = i - b
+    let k = j + c
+    let l = k - d
+    let m = l + e
+    let o = m * 2
+    let p = o - f
+    let q = p + g
+    return q
+}
+
+fn main() {
+    // warmup
+    let mut warm = 0
+    while warm < 3 {
+        mix(7)
+        warm = warm + 1
+    }
+    // timed body
+    let mut r = 0
+    let mut rounds = 0
+    while rounds < 60000000 {
+        r = r + mix(rounds)
+        rounds = rounds + 1
+    }
+    println(r)
+}
diff --git a/domains/UNSHADOW.bench.md b/domains/UNSHADOW.bench.md
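@@ -378,3 +378,142 @@ blocks fold/LICM, so parity is **not** free. parity ≈1.0 is `.c=0`
 > the binaries link the cached `runtime.o` (0 code changes) · the reference C is written outside the repo in `/tmp` (avoids
 > the hook · 0 `.c` files in the repo) · fib ref-O2=1ms is degenerate because of clang's dead-loop elimination
 > (the essential point is the qualitative signal "no fold across the wall", not the absolute ratio).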
+
+## §hexaval-unbox: 🟢 HexaVal unboxing pilot (known-int rebox → inline literal)
+
+> Measurement for milestone `🟢 HexaVal unboxing / register-pack`. **Summary**: HexaVal boxing, which
+> §parity-attest named as the main cause of the raw 7.9×~1263× gap, is removed at **one narrow point**:
+> the codegen STRUCTURAL-2 known-int BinOp fast path used to rebox its result via **out-of-line `hexa_int(…)`**;
+> it now emits an **inline C compound literal** `((HexaVal){.tag=TAG_INT,.i=(…)})` instead,
+> which removes the `bl _hexa_int` ABI call (= the runtime.o C-ABI wall) from every arithmetic step of the hot loop.
+> Edit site: `self/codegen.hexa` L5127. It fires only when `_is_known_int` statically certifies both operands
+> as TAG_INT (immutable int-only `let`/IntLit). Otherwise the existing boxed emit applies, so the general path is unchanged.
+
+Measured on `mini` (macOS arm64) · clang 21.0.0 · best-of-11 wall (real min, ms) · same
+`runtime.o` linked · 2026-05-30 · tag `mac-arm64-mini`. Workload = `knownint_heavy`
+(16-op immutable int `let` chain × 60M iterations; every op fires the known-int fast path).
+
+### Table: 3-way wall min (ms) + asm
+
+| arm | wall (ms) | hot `mix()` `bl _hexa_int` | parity gap (arm/ref) |
+|---|---|---|---|
+| ref-C @-O2 (plain `int64_t`) | 54  | — (no HexaVal) | 1.00× (baseline) |
+| BEFORE — out-of-line `hexa_int(…)` rebox (origin/main) | 599 | **17** | **11.09×** |
+| AFTER — inline `((HexaVal){.tag=TAG_INT,.i=(…)})` (pilot) | 53  | **0** | **0.98×** |
+
+> g5 (correctness): before/after/ref **all three binaries print identical stdout** = `34200003330000000`
+> (md5 `63888b02e0325abf096209d943c8413f`). asm: AFTER `mix()` is pure register arithmetic
+> (`add`/`sub`/`lsl` in x8..x11) with 0 HexaVal spills (`str`/`ldr`); clang -O2 folds the 16-op
+> chain into ~10 scalar instructions. In BEFORE, 17 opaque `bl _hexa_int` calls block that fold.
+
+### Finding: removing the boxing closes the parity gap on the known-int workload
+
+- **unbox speedup = 11.30× (91.2% wall drop)** · **parity gap 11.09× → 0.98×**: the raw-parity gap of the
+  known-int hot loop is **100% closed** (AFTER 53ms ≈ ref 54ms, identical within noise).
+- The §parity-attest claim "raw parity is blocked by the runtime.o C-ABI wall" is **confirmed** on the boxing axis:
+  the wall is the per-op out-of-line `hexa_int(…)` rebox. Once the inline literal removes that call,
+  clang -O2 keeps the accumulator in registers and folds it with no wall in the way → parity with idiomatic C.
+
+**Honest caveats**:
+- The measurement is a **faithful C A/B proxy**: the two arms mirror exactly the call-site emit of
+  each codegen variant (same runtime.o, clang -O2). It is **not** a full self-host transpiler rebuild,
+  because of the **B9 build wall** (origin/main HEAD has no consistent generated-.c set, plus runtime.h ABI skew in
+  the installed tree → a `hexa cc --regen` merge forward-decl bug / module link skew blocks the canonical
+  rebuild; memory `reference_b9_generated_c_no_checkout_shortcut`). The proxy is sound because
+  byte-equivalence is **proven from the runtime source** (`runtime_core_emit.hexa:1371`,
+  `hexa_int(n)={.tag=TAG_INT,.i=n}`) and only one variable is changed and isolated.
+- The absolute gap-closure figure holds for workloads with a **high known-int ratio**. Cases where `_is_known_int`
+  does not fire (mut accumulators, e.g. `let mut a; a=b` in `fib_heavy`) are not covered by this pilot →
+  mut-accumulator unboxing (carrying a raw `int64_t`) is a separate follow-up.
+- Codegen edit verification: `self/codegen.hexa` is parse-clean (`hexa parse` OK), and the C string emitted by
+  the edited line matches the AFTER arm form (`((HexaVal){.tag=TAG_INT,.i=(HX_INT(l) op HX_INT(r))})`)
+  exactly (verified by construction). Reproducer = `bench/unshadow/knownint_heavy.hexa` ·
+  verdict = `.verdicts/unshadow-hexaval-unbox/pilot.txt`.
+## §same-tu: cost/benefit PILOT measurement for making the C-emit same-TU build the default
+
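+> Measurement for milestone "🔵×🟡 same-TU build as default". **Summary**: if the same-TU form
+> (`#include "runtime.c"`) demonstrated by §lto-unwall becomes the build recipe of the C-emit build path, this pilot measures (1) what
+> it takes (the recipe change), (2) the BENEFIT (opening #2-ext-class boundary calls across layers in full), and (3) the COST
+> (build time, binary size), and gives an honest default/opt-in/no recommendation.
+> SSOT tool = `tool/unshadow_same_tu_bench.hexa` · verdict = `.verdicts/unshadow-same-tu/`.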
+
+### Implemented same-TU build MODE (self/main.hexa cmd_build · GATED HEXA_SAME_TU=1)
+
+Two paired edits, reversible and opt-in (not a forced flip of the global default):
+
+1. **codegen half**: when HEXA_SAME_TU=1, the transpile step runs with `HEXA_USE_RUNTIME_C=1`
+   (the existing codegen.hexa:947 escape hatch), so user.c emits `#include "runtime.c"` instead of
+   `runtime.h` → the runtime amalgam enters the user TU.
+2. **link half**: when HEXA_SAME_TU=1, the separate runtime object/source 2nd TU is dropped from the final
+   clang call (`_rt_input = ""`). The runtime is already textually present, so a single TU
+   compiles. (Adding it again as a 2nd TU would duplicate every symbol → link error.)
+
+Guarded by `shared != "1" && len(target) == 0` (excludes --shared PIC and cross-target zig builds). With HEXA_SAME_TU unset,
+the build is byte-for-byte the legacy walled build. main.hexa parse-gate PASS.
+
+### Measurement method (honest A/B proxy, no full self-host rebuild)
+
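+The full `hexa cc --regen` self-host rebuild is blocked by the **B9 wall** (runtime.c is GENERATED and absent from a fresh clone;
+the same blocker the prior unwall agent hit). The milestone spec explicitly allows a faithful
+proxy: two build modes · **same runtime source** · only the TU/link strategy isolated.
+- The workload is transpiled by the INSTALLED hexat (emits `#include "runtime.h"`).
+- **WALLED**: user.c linked against a separate precompiled runtime object (2 TU); the live default.
+- **SAME-TU**: textual swap of runtime.h→runtime.c in user.c (the same transform as the codegen half) →
+  compiled as a single TU.
+- The runtime source is the tree from just before B9 graduation (commit 151c52c8), extracted via `git archive | tar -x` (faithful because
+  the emitter reproduces it byte-identically, per the B9.C-10 source-SHA gate). Both arms compile against the same runtime.c,
+  so only the TU boundary varies.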
+
+### Measurement: `mini` (macOS arm64) · best-of-5 wall · 2026-05-30
+
+| workload | g5 (md5) | walled wall | same-TU wall | Δ | walled size | same-TU size |
+|---|---|---|---|---|---|---|
+| string-boundary       | IDENTICAL `0e2afa85…` | 1.87s | **1.48s** | **−21%** | 409080 B | 408888 B |
+| HexaVal-arith (control) | IDENTICAL `657d1ec4…` | 0.58s | **0.44s** | **−24%** | 408728 B | 408552 B |
+
+**Build time (best-of-3):**
+
+| build mode | build time | notes |
+|---|---|---|
+| walled COLD (compile runtime.o + link) | 3.53s | first-ever build |
+| **walled WARM (cached runtime.o · link only)** | **0.10s** | **live default hot path** |
+| **same-TU (recompiles the amalgam every build)** | **3.55s** | runtime.o cannot be cached, structurally |
+
+**`_u_main` hot-function boundary `bl` histogram (string workload):**
+
+| bl target | walled | same-TU | notes |
+|---|---|---|---|
+| `_rt_str_starts_with` | 2 | **0** | inlined → `_hxlcl_strncmp`×2 + `_hxlcl_strlen`×2 |
+| `_hexa_contains_poly`  | 1 | **0** | inlined → `_hxlcl_strstr`×1 |
+| `_hexa_int`            | 12 | **0** | integer boxing helper fully inlined |
+| `_hexa_to_string`      | 1 | **0** | inlined → `__hexa_to_string_rec` |
+| `_hexa_bool`           | 2 | **0** | inlined |
+
+### Honest interpretation: BENEFIT is real and generalizes / COST is too high for a default
+
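+- **BENEFIT (real, generalizes):** same-TU opens the whole HexaVal/runtime ABI to the clang -O2
+  inliner across the former TU boundary. The #2-ext-class boundary calls `_rt_str_starts_with` (2→0) and `_hexa_contains_poly` (1→0)
+  flip from called to inlined, as §lto-unwall predicted. Crucially, the win is **not string-specific**:
+  the HexaVal-arith control (`_hexa_int` boxing) also wins by −24% (0.58→0.44s), because the hexa_int/
+  hexa_to_string/hexa_bool boxing helpers are themselves runtime boundary calls, and same-TU inlines them all.
+  g5 is byte-IDENTICAL on both workloads.
+- **COST (too high for a default):** same-TU **recompiles the ~14.6K-line runtime amalgam into every user TU**
+  → 3.55s/build vs the walled WARM 0.10s = **~35× build-time tax**. The walled model amortizes the one-time 3.53s runtime
+  compile through the content-hash `runtime.<sha>.o` cache; under same-TU the runtime is fused into the user TU,
+  so it **structurally cannot be cached**. Binary size is a wash (−0.05% · −192 B). A second structural cost:
+  default-on same-TU requires runtime.c on disk → it re-introduces the generated-.c dependency
+  that B9 graduation removed.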
+
+### Recommendation (honest)
+
+**OPT-IN FLAG (HEXA_SAME_TU=1) · NOT default-on.** The −21~24% byte-identical runtime win is
+real and generalizes, but the ~35× build-time tax (3.55s vs 0.10s WARM) plus the re-introduced generated-runtime.c
+dependency make default-on a poor tradeoff for the common build. Same-TU is worth it for **release/perf builds**
+of HexaVal-/boundary-call-heavy programs, which is exactly the opt-in surface this pilot landed.
+Terminal measured recommendation = opt-in flag.
+
+> caveat: single host (mini), single session · wall = best-of-5 real min · both arms use the same runtime.c
+> source (the walled .o is compiled from it too), run back-to-back · runtime source = B9-faithful pre-graduation
+> tree (byte-identical to the emitter SSOT) · the full self-host rebuild is blocked by the B9 wall, so an A/B
+> proxy is used (allowed by the spec) · the repo still holds 0 `.c` files (external tree under /tmp). Reproducer =
+> `tool/unshadow_same_tu_bench.hexa --rt <self-with-runtime.c> --runs 5`.