Goal
Optimize the transpilation pipeline (Rust → RISC-V → PVM) to produce PVM blobs that execute faster on the capability-javm-v2 kernel. The transpiler itself can be arbitrarily slow — only the resulting blob's runtime performance matters.
Pipeline (capability-javm-v2)
Rust source → rustc (riscv64em-javm.json) → ELF → grey-transpiler → JAR v2 capability manifest blob
JAR v2 blob → kernel → CODE cap → recompiler (JIT) → execute
JAR v2 blob format
- Magic: `JAR\x02`
- Header: memory_pages, cap_count, invoke_cap
- Capability manifest: entries for CODE, DATA caps with slot indices
- Data section: code sub-blob (jump table + PVM bytecode + packed bitmask) + initial data
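As a rough illustration of the header described above, the sketch below checks the magic and reads the three named header fields. Only the magic bytes and the field names come from this issue; the field widths, byte order, and offsets (`u32` little-endian `memory_pages`, then one byte each for `cap_count` and `invoke_cap`) are assumptions made for the example, and `JarHeader`, `ParseError`, and `parse_header` are hypothetical names, not the transpiler's API.

```rust
/// Magic bytes from the format description.
const JAR_V2_MAGIC: [u8; 4] = *b"JAR\x02";

/// Header fields named in the format description.
/// ASSUMPTION: the widths chosen here are illustrative only.
#[derive(Debug, PartialEq, Eq)]
struct JarHeader {
    memory_pages: u32,
    cap_count: u8,
    invoke_cap: u8,
}

#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    BadMagic,
    TooShort,
}

/// Hypothetical layout (an assumption, not the real encoding):
/// magic (4 bytes) | memory_pages (u32 LE) | cap_count (u8) | invoke_cap (u8)
fn parse_header(blob: &[u8]) -> Result<JarHeader, ParseError> {
    if !blob.starts_with(&JAR_V2_MAGIC) {
        return Err(ParseError::BadMagic);
    }
    if blob.len() < 10 {
        return Err(ParseError::TooShort);
    }
    let memory_pages = u32::from_le_bytes([blob[4], blob[5], blob[6], blob[7]]);
    Ok(JarHeader {
        memory_pages,
        cap_count: blob[8],
        invoke_cap: blob[9],
    })
}
```

Rejecting a wrong magic before reading anything else keeps old-format blobs from being misread as JAR v2.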
Capability layout (standard service)
- Cap 64: CODE (instruction cache, Harvard — not in address space)
- Cap 65: STACK (zero-filled DATA, RW)
- Cap 66: RO data (constants, string literals)
- Cap 67: RW data (initialized globals)
- Cap 68: HEAP (zero-filled DATA, RW)
- Cap 254: UNTYPED (bump allocator, omitted when memory_pages=0)
- Cap 255: IPC/args
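The slot assignments above can be written down as a small table. This is a sketch only: the slot numbers and the rule that UNTYPED is omitted when `memory_pages` is 0 come from the list, while `StandardCap`, `standard_layout`, and the `u32` type for `memory_pages` are hypothetical names and choices made for illustration.

```rust
/// Capability kinds of a standard service, as listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StandardCap {
    Code,
    Stack,
    RoData,
    RwData,
    Heap,
    Untyped,
    IpcArgs,
}

/// Returns (slot, kind) pairs for a standard service.
/// UNTYPED (slot 254) is left out when `memory_pages` is 0.
fn standard_layout(memory_pages: u32) -> Vec<(u8, StandardCap)> {
    use StandardCap::*;
    let mut caps = vec![
        (64, Code),
        (65, Stack),
        (66, RoData),
        (67, RwData),
        (68, Heap),
    ];
    if memory_pages > 0 {
        caps.push((254, Untyped));
    }
    caps.push((255, IpcArgs));
    caps
}
```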
Key differences from old flat-memory model
- Harvard architecture: code is NOT in the address space (CODE cap vs DATA caps)
- 4GB virtual window per CODE cap (mmap-based, shared across VMs)
- Memory accessed via DATA caps mapped at specific base pages
- `ecalli(N)` dispatches to `cap[N]`: protocol caps, CODE (CREATE), HANDLE (CALL VM), management ops
- Programs terminate via `ecalli(0xFF)` = REPLY (no halt address)
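The `ecalli` dispatch rule can be pictured as a lookup into a 256-entry capability table. The sketch below is a simplified model, not the kernel's code: the `Cap` and `Action` enums and `dispatch_ecalli` are hypothetical, management ops are left out, and returning `Unhandled` for an empty slot or a DATA cap is a placeholder, since this issue does not say what the kernel does in those cases.

```rust
/// Illustrative capability kinds (hypothetical; the kernel's types are not shown here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cap {
    Protocol,
    Code,
    Handle,
    Data,
}

/// What an `ecalli(N)` turns into in this simplified model.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Reply,
    CreateVm,
    CallVm,
    ProtocolCall(u8),
    /// Placeholder: behaviour for these cases is not specified in the issue.
    Unhandled(u8),
}

/// `ecalli(0xFF)` is REPLY and terminates the program.
const REPLY: u8 = 0xFF;

fn dispatch_ecalli(caps: &[Option<Cap>; 256], n: u8) -> Action {
    if n == REPLY {
        return Action::Reply;
    }
    match caps[n as usize] {
        Some(Cap::Code) => Action::CreateVm,       // CODE (CREATE)
        Some(Cap::Handle) => Action::CallVm,       // HANDLE (CALL VM)
        Some(Cap::Protocol) => Action::ProtocolCall(n),
        Some(Cap::Data) | None => Action::Unhandled(n),
    }
}
```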
Transpiler components
- `grey-transpiler/src/linker.rs`: ELF → PVM transpilation (section parsing, relocation, RISC-V → PVM translation)
- `grey-transpiler/src/emitter.rs`: blob generation (`build_service_program`), bitmask packing
- `grey-transpiler/src/assembler.rs`: assembler for hand-crafting PVM bytecode (used by benchmarks)
- `build-javm/src/lib.rs`: build pipeline (cargo → ELF → link_elf → blob)
What to optimize
Target JSON (`riscv64em-javm.json`):
- Profile different nightly versions for best RISC-V codegen
- Experiment with additional RISC-V extensions
- Tune inline thresholds (`--inline-threshold=275` currently)
Transpiler (`grey-transpiler`):
- Inter-block liveness analysis (eliminate `load_imm` instructions whose result is dead across blocks)
- Superblock formation / trace-based optimization
- Peephole passes: dead store elimination, load forwarding
- Stack frame optimization: reduce spills for register-heavy code
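To make the inter-block liveness item concrete, here is a minimal sketch on a toy IR. It runs a standard backward dataflow pass to a fixed point, then drops every `LoadImm` whose destination is not live afterwards. The `Inst`, `Block`, `live_out`, and `eliminate_dead_load_imm` names are hypothetical and do not reflect the data structures in `linker.rs`; real PVM instructions and control flow are richer than this.

```rust
use std::collections::HashSet;

type Reg = u8;

/// Toy instruction set (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Inst {
    LoadImm { dst: Reg, imm: i64 },
    Add { dst: Reg, a: Reg, b: Reg },
    /// Reads `src` with an observable effect (e.g. a store or a return value).
    Use { src: Reg },
}

struct Block {
    insts: Vec<Inst>,
    succs: Vec<usize>, // indices of successor blocks
}

/// Returns (register defined, registers read) for one instruction.
fn defs_uses(inst: &Inst) -> (Option<Reg>, Vec<Reg>) {
    match *inst {
        Inst::LoadImm { dst, .. } => (Some(dst), vec![]),
        Inst::Add { dst, a, b } => (Some(dst), vec![a, b]),
        Inst::Use { src } => (None, vec![src]),
    }
}

/// Backward dataflow to a fixed point: live-out sets per block.
fn live_out(blocks: &[Block]) -> Vec<HashSet<Reg>> {
    let n = blocks.len();
    let mut live_in: Vec<HashSet<Reg>> = vec![HashSet::new(); n];
    let mut live_out: Vec<HashSet<Reg>> = vec![HashSet::new(); n];
    loop {
        let mut changed = false;
        for i in (0..n).rev() {
            // live_out = union of the successors' live_in sets
            let mut out: HashSet<Reg> = HashSet::new();
            for &s in &blocks[i].succs {
                out.extend(live_in[s].iter().copied());
            }
            // live_in = uses, plus whatever flows through without being redefined
            let mut inn = out.clone();
            for inst in blocks[i].insts.iter().rev() {
                let (def, uses) = defs_uses(inst);
                if let Some(d) = def {
                    inn.remove(&d);
                }
                inn.extend(uses);
            }
            if out != live_out[i] || inn != live_in[i] {
                changed = true;
                live_out[i] = out;
                live_in[i] = inn;
            }
        }
        if !changed {
            return live_out;
        }
    }
}

/// Removes `LoadImm`s whose destination is dead; returns how many were removed.
/// One pass suffices here because `LoadImm` reads no registers, so removing
/// one cannot change any liveness set.
fn eliminate_dead_load_imm(blocks: &mut [Block]) -> usize {
    let outs = live_out(blocks);
    let mut removed = 0;
    for (block, mut live) in blocks.iter_mut().zip(outs) {
        let mut kept = Vec::with_capacity(block.insts.len());
        for inst in block.insts.drain(..).rev() {
            let (def, uses) = defs_uses(&inst);
            if let (Inst::LoadImm { .. }, Some(d)) = (&inst, def) {
                if !live.contains(&d) {
                    removed += 1;
                    continue;
                }
            }
            if let Some(d) = def {
                live.remove(&d);
            }
            live.extend(uses);
            kept.push(inst);
        }
        kept.reverse();
        block.insts = kept;
    }
    removed
}
```

A purely block-local pass has to treat every register as live at block exits, so it would keep a `load_imm` whose value is overwritten or never read in a successor block; the cross-block live-out sets are what allow those to be removed.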
Blob layout:
- DATA cap layout optimization (minimize page faults during execution)
- Code sub-blob ordering (hot paths first for I-cache)
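A hot-paths-first ordering could start from something as simple as the sketch below, which keeps the entry block first and sorts the remaining blocks by descending execution count. This assumes per-block counts are available from some profiling run, which this issue does not describe, and `hot_first_order` is a hypothetical helper; it only computes an order and leaves out the jump-table and branch-offset fixups that actually moving code would require.

```rust
use std::cmp::Reverse;

/// Given per-block execution counts (index = original block index), returns
/// the block indices in emission order: entry block (index 0) first, then the
/// rest from hottest to coldest. Ties keep their original relative order,
/// because `sort_by_key` is a stable sort.
fn hot_first_order(counts: &[u64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..counts.len()).collect();
    // `get_mut(1..)` is `None` for an empty input, so nothing is sorted then.
    if let Some(rest) = order.get_mut(1..) {
        rest.sort_by_key(|&i| Reverse(counts[i]));
    }
    order
}
```

For example, counts of `[10, 5, 100, 5, 50]` yield the order `[0, 2, 4, 1, 3]`.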
How to benchmark
# Full suite (grey + polkavm comparison)
POLKAVM_ALLOW_EXPERIMENTAL=1 POLKAVM_DEFAULT_COST_MODEL=full-l1-hit cargo bench -p grey-bench
# Grey-only (faster iteration)
cargo bench -p grey-bench -- 'grey-'
# Multi-VM benchmark
cargo bench -p grey-bench --bench subvm_bench
# Verify correctness after any change
cargo test -p grey-bench
Rules
- Always benchmark before AND after. Use criterion's built-in comparison against the previous run (or against a named baseline via `--save-baseline` / `--baseline`).
- If a change shows no measurable improvement or regresses, revert it.
- The transpiler can be slow — only runtime performance of the resulting blob matters.
- Verify correctness: `cargo test -p grey-bench` checks that results and gas usage match exactly.
Replaces #84.