@@ -0,0 +1,69 @@
F-UNSHADOW-2EXT-RT-STR-INLINE - 🔴 CLOSED-NEGATIVE
========================================================
Host: pool mini (macOS arm64). Base: origin/main 002cedc58 (codegen.hexa
context identical at the buildable measurement base af645e419).

FALSIFIER (pre-registered)
"A simple pure rt_str_* accessor (rt_str_starts_with) can be inlined at its
codegen call site (replacing the out-of-line precompiled-runtime.o C-ABI
call with its body) so clang -O2 optimizes across the boundary, while
staying byte-identical and single-eval (let-bound expr arg)."

METHOD
Prim chosen: rt_str_starts_with (the str_len free-fn path is a dead/latent
branch: cg_rt_target()=="c" so the rt_* family is unreachable, and the
str_len builtin emits an uncompilable `hexa_str_len(HexaVal)`; the working
string-length path is `.len()` -> hexa_int(hexa_len(...)) which is already a
direct exported call). rt_str_starts_with IS live (emitted by `.starts_with()`)
and PURE (runtime_core.c):
    int rt_str_starts_with(HexaVal s, HexaVal prefix) {
        if (!HX_IS_STR(s) || !HX_IS_STR(prefix)) return 0;
        size_t plen = HX_STRLEN(prefix);
        return hxlcl_strncmp(HX_STR(s), HX_STR(prefix), plen) == 0;
    }
Inline emitted at all 3 call sites with a DOUBLE-EVAL GUARD (both args
let-bound once into __sw_s / __sw_p in a GCC statement-expression).

VERIFICATION (verbatim)
[isolation diff - base vs inline emitted C, ONLY the call site differs]
23c23
< hexa_println(hexa_to_string(hexa_bool(rt_str_starts_with(__hexa_sl_0, __hexa_sl_1))));
---
> hexa_println(hexa_to_string(hexa_bool(({ HexaVal __sw_s = (__hexa_sl_0), __sw_p = (__hexa_sl_1); (!HX_IS_STR(__sw_s) || !HX_IS_STR(__sw_p)) ? 0 : (hxlcl_strncmp(HX_STR(__sw_s), HX_STR(__sw_p), HX_STRLEN(__sw_p)) == 0); }))));

[double-eval guard - design CORRECT: each arg let-bound exactly once,
visible above as `HexaVal __sw_s = (...), __sw_p = (...)`. BASE single-eval
reference: result=true / calls=1]

[BASE corpus g5 reference output]
true / false / true / true / false / true / false / true / hits=3

[THE WALL - inline arm fails to LINK]
Undefined symbols for architecture arm64:
"_HX_STRLEN", referenced from:
_u_main in sw_corpus_inl.o (x8)
clang: error: linker command failed with exit code 1

[root cause - runtime.h export boundary]
grep -c "HX_STRLEN\|hxlcl_strncmp" self/runtime.h -> 0
runtime.h does NOT include <string.h> and exports ONLY HexaVal-level ABI
functions (rt_str_starts_with itself is exported at runtime.h:257). HX_IS_STR
and HX_STR ARE exported; HX_STRLEN and hxlcl_strncmp are NOT: they exist only
inside the runtime.c amalgam. Emitted programs (`#include "runtime.h"` + link
runtime.o) therefore cannot reference the prim's internal helpers.

FINDING (ruled-out axis)
rt_str_* body-inlining at the codegen call site is BLOCKED by the runtime.h
ABI boundary. Every meaningful rt_str_* accessor body touches runtime-internal
scalar helpers (HX_STRLEN, hxlcl_strncmp, libc strlen/strncmp) that are
deliberately NOT exported to emitted programs. This is the structural reason
the hexa_int pilot (#2) succeeded where strings cannot: hexa_int's box-literal
`((HexaVal){.tag=TAG_INT,.i=N})` uses ONLY the runtime.h-visible HexaVal struct
layout + a literal, with no internal helper. String prims have no such purely-layout
inline form. The double-eval guard design is independently correct (proven in
the emitted C); the falsification is purely at the link boundary, NOT the guard.
Inlining would require either exporting the internal helpers in runtime.h (a
runtime-ABI change that re-widens the very boundary UNSHADOW aims to remove) or
adding `#include <string.h>` to the global emit prelude (not a surgical call-site
inline; reintroduces the strncmp/strlen macro-shadow collisions runtime.c fights).
Both are out of scope for a call-site inline. CLOSED-NEGATIVE.
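
The contrast in the FINDING can be sketched in C. This is a minimal illustration, not the runtime's code: the `HexaVal` struct below is a stand-in (only `.tag`, `.i`, and `TAG_INT` come from the verdict text), and `hexa_int_call`, `HEXA_INT_INLINE`, `box_forms_agree`, and `check_box_forms` are hypothetical names. It shows why an integer box can be inlined from the visible struct layout alone, with no helper symbol to resolve at link time.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the runtime.h-visible value layout (assumption: the real
 * HexaVal has more tags and fields than shown here). */
typedef enum { TAG_INT, TAG_STR } Tag;
typedef struct {
    Tag tag;
    union {
        int64_t i;
        const char *s;
    };
} HexaVal;

/* Out-of-line form: in a real build this body would sit in runtime.o. */
HexaVal hexa_int_call(int64_t n) {
    HexaVal v;
    v.tag = TAG_INT;
    v.i = n;
    return v;
}

/* Layout-only inline form: a compound literal built from the visible
 * struct, so the emitted program references no runtime-internal symbol. */
#define HEXA_INT_INLINE(N) ((HexaVal){ .tag = TAG_INT, .i = (N) })

/* Returns 1 when both forms build the same boxed integer. */
int box_forms_agree(int64_t n) {
    HexaVal a = hexa_int_call(n);
    HexaVal b = HEXA_INT_INLINE(n);
    return a.tag == b.tag && a.i == b.i;
}

/* Small self-check covering a few values. */
void check_box_forms(void) {
    assert(box_forms_agree(0));
    assert(box_forms_agree(-1));
    assert(box_forms_agree(42));
}
```

A string prim has no equivalent of `HEXA_INT_INLINE`: its body must measure and compare bytes, which in this codebase goes through helpers that stay inside the runtime.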
24 changes: 24 additions & 0 deletions bench/unshadow/strstarts_corpus.hexa
@@ -0,0 +1,24 @@
// bench/unshadow/strstarts_corpus.hexa - string-heavy correctness corpus
// for the `.starts_with()` -> rt_str_starts_with prim. Exercises empty
// strings, exact match, over-length prefix, and UTF-8. BASE output (g5
// reference, verified on mini macOS arm64):
// true / false / true / true / false / true / false / true / hits=3
fn main() {
    let s = "hello, world"
    println(to_string(s.starts_with("hello")))
    println(to_string(s.starts_with("world")))
    println(to_string(s.starts_with("")))
    println(to_string(s.starts_with("hello, world")))
    println(to_string(s.starts_with("hello, world!")))
    println(to_string("".starts_with("")))
    println(to_string("".starts_with("x")))
    println(to_string("café ☕".starts_with("café")))
    let mut hits = 0
    let words = ["apple", "apricot", "banana", "avocado", "cherry"]
    let mut i = 0
    while i < 5 {
        if words[i].starts_with("a") { hits = hits + 1 }
        i = i + 1
    }
    println("hits=" + to_string(hits))
}
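
As a cross-check on the expected corpus output, the prefix semantics can be modelled in plain C. This is a hedged sketch, not the runtime implementation: `starts_with`, `count_a_words`, and `check_corpus_model` are hypothetical names, libc `strncmp`/`strlen` stand in for the runtime-internal helpers, and the `HexaVal` type checks are omitted. The comparison is byte-wise, which is why the UTF-8 case behaves like the ASCII ones.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise prefix test on NUL-terminated strings, following the
 * strncmp shape of the prim body (no HexaVal tag checks here). */
int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Counts the words beginning with "a", as the corpus loop does. */
int count_a_words(void) {
    const char *words[] = {"apple", "apricot", "banana", "avocado", "cherry"};
    int hits = 0;
    for (int i = 0; i < 5; i++) {
        if (starts_with(words[i], "a")) hits++;
    }
    return hits;
}

/* Asserts the eight corpus cases and the hit count in corpus order. */
void check_corpus_model(void) {
    const char *s = "hello, world";
    assert(starts_with(s, "hello"));           /* true  */
    assert(!starts_with(s, "world"));          /* false */
    assert(starts_with(s, ""));                /* true  */
    assert(starts_with(s, "hello, world"));    /* true  */
    assert(!starts_with(s, "hello, world!"));  /* false: prefix is longer */
    assert(starts_with("", ""));               /* true  */
    assert(!starts_with("", "x"));             /* false */
    assert(starts_with("café ☕", "café"));     /* true: byte prefix match */
    assert(count_a_words() == 3);              /* hits=3 */
}
```

The over-length prefix case fails because `strncmp` reaches the terminating NUL of `s` before the prefix ends, so no separate length check is needed in this model.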
24 changes: 24 additions & 0 deletions bench/unshadow/strstarts_heavy.hexa
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
// bench/unshadow/strstarts_heavy.hexa - UNSHADOW #2 EXTENSION micro-bench.
//
// Hot loop hammering the `rt_str_starts_with` string prim via the
// `.starts_with()` method. Intended A/B: base (out-of-line
// `rt_str_starts_with(s, p)` call across the precompiled-runtime.o C-ABI
// boundary) vs an inlined-at-call-site arm.
//
// FINDING (closed-negative): the inline arm CANNOT be linked. The prim body
// (`HX_STRLEN(prefix)` + `hxlcl_strncmp(HX_STR(s), HX_STR(p), plen)`) needs
// runtime-internal symbols that runtime.h deliberately does NOT export to
// emitted programs (only the HexaVal-level ABI fn `rt_str_starts_with` is
// exported). So this bench runs the BASE arm only; the inline arm is the
// ruled-out axis. See domains/UNSHADOW.log.md (2026-05-29 extension entry).
fn main() {
    let s = "the quick brown fox jumps over the lazy dog 0123456789"
    let p = "the quick"
    let mut acc = 0
    let mut i = 0
    while i < 20000000 {
        if s.starts_with(p) { acc = acc + 1 }
        i = i + 1
    }
    println("acc=" + to_string(acc))
}
22 changes: 22 additions & 0 deletions bench/unshadow/strstarts_singleeval.hexa
@@ -0,0 +1,22 @@
// bench/unshadow/strstarts_singleeval.hexa - double-eval guard probe.
//
// The receiver of `.starts_with()` is a side-effecting call (`make_str()`
// increments a global counter). A correct single-evaluation lowering prints
// `calls=1`; a double-eval (the to_int / comptime-fold miscompile class)
// would print `calls=2`.
//
// The proposed inline DID let-bind the args exactly once in the emitted C:
//   `({ HexaVal __sw_s = (make_str()), __sw_p = ("hello"); ... })`
// so the single-eval guard design is correct (verified in the emitted C).
// The inline is blocked at LINK (unexported runtime internals), not by the
// guard. The BASE arm here is the live reference: it prints `calls=1`.
let mut _calls = 0
fn make_str() -> string {
    _calls = _calls + 1
    return "hello_world"
}
fn main() {
    let r = make_str().starts_with("hello")
    println("result=" + to_string(r))
    println("calls=" + to_string(_calls))
}
29 changes: 29 additions & 0 deletions domains/UNSHADOW.log.md
@@ -2,6 +2,35 @@

Append-only history sister of `UNSHADOW.md`. Each entry starts with `## <ISO timestamp> - <header>` (newest on top); body = `- [x]` (done) / `- [ ]` (pending) checkbox tasks.

## 2026-05-29 - #2 EXTENSION (rt_str_* string prim inline) - 🔴 CLOSED-NEGATIVE (blocked by the ABI boundary)

Attempted to extend the #2 hexa_int inline to STRING runtime prims. Conclusion: **inlining the body of an rt_str_\* prim
at the codegen call site is structurally impossible because of the `runtime.h` export boundary**.
This is a publishable closed-negative. No milestone flip (extension log only). Verdict text:
`.verdicts/unshadow-2ext-rt-str-inline/F-UNSHADOW-2EXT-RT-STR-INLINE.txt`.

- [x] Prim selection: `rt_str_starts_with` (HexaVal 2-arg · PURE · non-allocating · LIVE). Trap found during the candidate survey:
  the `str_len` free-fn path is dead. Because `cg_rt_target()=="c"`, the `rt_str_len_v`/`rt_str_*` branch is unreachable,
  and the `str_len` builtin is a latent-bug path that emits `hexa_str_len(HexaVal)` (uncompilable). The string-length path
  that actually works is `.len()` → `hexa_int(hexa_len(...))`, which is **already** a direct exported call (no room for further inlining).
- [x] Codegen implementation: replaced the 3 `.starts_with()` call sites (self/codegen.hexa) with a GCC statement-expression inline.
  DOUBLE-EVAL guard applied: both arguments are let-bound **exactly once** into `__sw_s`/`__sw_p` (required because the args are expressions).
- [x] g5 isolation byte-diff: base vs inline emitted C differ on **the call-site line only** (the rest is byte-identical). verbatim:
  `< ...rt_str_starts_with(__hexa_sl_0, __hexa_sl_1)...`
  `> ...({ HexaVal __sw_s = (__hexa_sl_0), __sw_p = (__hexa_sl_1); (!HX_IS_STR(__sw_s)||!HX_IS_STR(__sw_p))?0:(hxlcl_strncmp(HX_STR(__sw_s),HX_STR(__sw_p),HX_STRLEN(__sw_p))==0); })...`
- [x] Single-eval guard verification: confirmed by inspection of the emitted C that each argument is let-bound exactly once. BASE single-eval reference
  `make_str().starts_with("hello")` → `calls=1`. The guard **design is sound**; the falsification is at the link boundary, not in the guard.
- [x] **THE WALL** (falsification): the inline arm fails to link: `Undefined symbols: "_HX_STRLEN", referenced from _u_main` (x8).
  Root cause: `grep -c "HX_STRLEN\|hxlcl_strncmp" self/runtime.h` = **0**. runtime.h does not include `<string.h>` either,
  and exports only HexaVal-level ABI functions (`rt_str_starts_with` itself is exported at runtime.h:257). `HX_IS_STR`/`HX_STR` are
  exported, but `HX_STRLEN`/`hxlcl_strncmp` exist only inside the runtime.c amalgam → emitted programs cannot reference them.
- [x] **Honest finding (ruled-out axis)**: hexa_int succeeded because the box-literal `((HexaVal){.tag=TAG_INT,.i=N})`
  uses only the runtime.h-visible struct layout plus a literal (zero internal helpers). Every meaningful rt_str_\* body touches internal scalar
  helpers (HX_STRLEN/strncmp), so no purely-layout inline form exists. Inlining would require either (a) exporting the internal
  helpers in runtime.h (which re-widens the very boundary UNSHADOW aims to remove) or (b) adding `#include <string.h>` to the global emit prelude
  (not a surgical call-site inline, and it reintroduces the strncmp/strlen macro-shadow collisions). Both are out of scope → CLOSED.
- [x] Correctness first: **reverted** the codegen change to avoid any silent-miscompile risk (the inline fails to build at all because `_HX_STRLEN` is undefined).
  Committed bench/log/verdict only. UNSHADOW.md cross-layer milestone **unchanged** (extension).
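
The double-eval guard from the entry above can be sketched in standalone C. This is a minimal illustration under stated assumptions, not the emitted code: plain `const char *` strings and libc `strncmp`/`strlen` stand in for `HexaVal` and the runtime-internal helpers, and `STARTS_WITH_NAIVE`, `STARTS_WITH_GUARDED`, `calls_after_naive`, and `calls_after_guarded` are hypothetical names. Statement expressions are a GCC/Clang extension, not ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int calls = 0;

/* Side-effecting receiver, mirroring make_str() in the single-eval probe. */
const char *make_str(void) {
    calls++;
    return "hello_world";
}

/* Naive textual inline: `s` appears twice in the expansion, so a
 * side-effecting receiver expression is evaluated twice. */
#define STARTS_WITH_NAIVE(s, p) \
    (((s) == NULL) ? 0 : (strncmp((s), (p), strlen(p)) == 0))

/* Guarded inline: the statement expression binds each argument to a
 * local exactly once before the body uses it. */
#define STARTS_WITH_GUARDED(s, p)                              \
    ({ const char *sw_s = (s), *sw_p = (p);                    \
       (sw_s == NULL || sw_p == NULL)                          \
           ? 0                                                 \
           : (strncmp(sw_s, sw_p, strlen(sw_p)) == 0); })

/* Each helper stores the match result and returns how many times
 * make_str() ran during the single starts-with test. */
int calls_after_naive(int *result) {
    calls = 0;
    *result = STARTS_WITH_NAIVE(make_str(), "hello");
    return calls;
}

int calls_after_guarded(int *result) {
    calls = 0;
    *result = STARTS_WITH_GUARDED(make_str(), "hello");
    return calls;
}
```

Both forms report a match for `"hello"`. Only the guarded form keeps the call count at 1, which is the property the `calls=1` reference output checks.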
## 2026-05-29T12:59Z - #4 atlas-guided const-fold pilot - 🔵🟢 g5 IDENTICAL · ~65% faster

UNSHADOW.easy.md §A ("don't run the loop/call"): codegen, PURE · atlas-verified