Optimize grey-transpiler for capability-javm-v2 #399

## Goal

Optimize the transpilation pipeline (Rust → RISC-V → PVM) to produce PVM blobs that execute faster on the capability-javm-v2 kernel. The transpiler itself can be arbitrarily slow — only the resulting blob's runtime performance matters.

## Pipeline (capability-javm-v2)

```
Rust source → rustc (riscv64em-javm.json) → ELF → grey-transpiler → JAR v2 capability manifest blob
                                                                                    ↓
                                              kernel → CODE cap → recompiler (JIT) → execute
```

### JAR v2 blob format
- Magic: `JAR\x02`
- Header: memory_pages, cap_count, invoke_cap
- Capability manifest: entries for CODE, DATA caps with slot indices
- Data section: code sub-blob (jump table + PVM bytecode + packed bitmask) + initial data
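The list above names the header fields but not their encoding. As an illustration, here is a minimal Rust sketch of writing and reading just the magic and header. It assumes three things: `memory_pages` is a little-endian `u32`, `cap_count` and `invoke_cap` are single bytes, and the manifest follows immediately. `JarHeader`, `encode`, and `decode` are hypothetical names, not the emitter's API.

```rust
// Hypothetical encoding: field widths (u32 LE, u8, u8) are assumptions,
// not the confirmed JAR v2 layout.
const MAGIC: [u8; 4] = [b'J', b'A', b'R', 0x02];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct JarHeader {
    pub memory_pages: u32,
    pub cap_count: u8,
    pub invoke_cap: u8,
}

impl JarHeader {
    /// Appends magic + header; the capability manifest would follow.
    pub fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&MAGIC);
        out.extend_from_slice(&self.memory_pages.to_le_bytes());
        out.push(self.cap_count);
        out.push(self.invoke_cap);
    }

    /// Parses magic + header and returns the remaining bytes (manifest onward).
    pub fn decode(blob: &[u8]) -> Option<(JarHeader, &[u8])> {
        let rest = blob.strip_prefix(&MAGIC[..])?; // wrong magic -> None
        if rest.len() < 6 {
            return None; // truncated header
        }
        let header = JarHeader {
            memory_pages: u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]),
            cap_count: rest[4],
            invoke_cap: rest[5],
        };
        Some((header, &rest[6..]))
    }
}
```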

### Capability layout (standard service)
- Cap 64: CODE (instruction cache, Harvard — not in address space)
- Cap 65: STACK (zero-filled DATA, RW)
- Cap 66: RO data (constants, string literals)
- Cap 67: RW data (initialized globals)
- Cap 68: HEAP (zero-filled DATA, RW)
- Cap 254: UNTYPED (bump allocator, omitted when memory_pages=0)
- Cap 255: IPC/args
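One way to keep the emitter and tests in agreement on slot indices is to encode the table above directly, as in the Rust sketch below. `CapKind`, `standard_caps`, and the constant names are hypothetical. The issue does not say whether slot 255 appears in the manifest or is installed by the kernel, so including it here is an assumption.

```rust
// Slot indices from the standard-service layout.
pub const CAP_CODE: u8 = 64;
pub const CAP_STACK: u8 = 65;
pub const CAP_RO_DATA: u8 = 66;
pub const CAP_RW_DATA: u8 = 67;
pub const CAP_HEAP: u8 = 68;
pub const CAP_UNTYPED: u8 = 254;
pub const CAP_IPC: u8 = 255;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapKind {
    Code,    // instruction cache, not in the address space
    Stack,   // zero-filled DATA, RW
    RoData,  // constants, string literals
    RwData,  // initialized globals
    Heap,    // zero-filled DATA, RW
    Untyped, // bump allocator
    Ipc,     // IPC / args (presence in the manifest is assumed)
}

/// Slots a standard service exposes, in ascending slot order.
/// UNTYPED is omitted when memory_pages == 0, as the layout specifies.
pub fn standard_caps(memory_pages: u32) -> Vec<(u8, CapKind)> {
    let mut caps = vec![
        (CAP_CODE, CapKind::Code),
        (CAP_STACK, CapKind::Stack),
        (CAP_RO_DATA, CapKind::RoData),
        (CAP_RW_DATA, CapKind::RwData),
        (CAP_HEAP, CapKind::Heap),
    ];
    if memory_pages > 0 {
        caps.push((CAP_UNTYPED, CapKind::Untyped));
    }
    caps.push((CAP_IPC, CapKind::Ipc));
    caps
}
```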

### Key differences from old flat-memory model
- Harvard architecture: code is NOT in the address space (CODE cap vs DATA caps)
- 4GB virtual window per CODE cap (mmap-based, shared across VMs)
- Memory accessed via DATA caps mapped at specific base pages
- `ecalli(N)` dispatches to `cap[N]`: protocol caps, CODE (CREATE), HANDLE (CALL VM), management ops
- Programs terminate via `ecalli(0xFF)` = REPLY (no halt address)
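To make the Harvard split concrete, the Rust sketch below models how a load or store address might resolve against DATA caps mapped at base pages. It is hypothetical throughout: it assumes 4 KiB pages and a flat list of mappings, and `DataMapping`, `Access`, and `resolve` are illustrative names, not kernel APIs. No CODE cap appears in the mapping list, so instruction bytes are never reachable through a data address.

```rust
// Assumption: 4 KiB pages; the kernel's real page size and structures may differ.
pub const PAGE_SIZE: u32 = 4096;

/// A DATA cap mapped into the VM's address space (hypothetical representation).
#[derive(Debug, Clone, Copy)]
pub struct DataMapping {
    pub cap: u8,
    pub base_page: u32,
    pub pages: u32,
    pub writable: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Access {
    /// Address falls inside `cap`, `offset` bytes from the cap's base.
    Mapped { cap: u8, offset: u32 },
    /// Write to a read-only DATA cap.
    ReadOnly,
    /// No DATA cap covers the address; code lives in the CODE cap, not here.
    Fault,
}

pub fn resolve(maps: &[DataMapping], addr: u32, write: bool) -> Access {
    let page = addr / PAGE_SIZE;
    for m in maps {
        // Subtraction form avoids overflow in base_page + pages.
        if page >= m.base_page && page - m.base_page < m.pages {
            if write && !m.writable {
                return Access::ReadOnly;
            }
            // base_page <= page, so base_page * PAGE_SIZE <= addr (no overflow).
            return Access::Mapped { cap: m.cap, offset: addr - m.base_page * PAGE_SIZE };
        }
    }
    Access::Fault
}
```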

## Transpiler components

- **`grey-transpiler/src/linker.rs`**: ELF → PVM transpilation (section parsing, relocation, RISC-V → PVM translation)
- **`grey-transpiler/src/emitter.rs`**: blob generation (`build_service_program`), bitmask packing
- **`grey-transpiler/src/assembler.rs`**: hand-craft PVM bytecode (used by benchmarks)
- **`build-javm/src/lib.rs`**: build pipeline (cargo → ELF → link_elf → blob)

## What to optimize

**Target JSON (`riscv64em-javm.json`):**
- Profile different nightly versions for best RISC-V codegen
- Experiment with additional RISC-V extensions
- Tune inline thresholds (`--inline-threshold=275` currently)

**Transpiler (\`grey-transpiler\`):**
- Inter-block liveness analysis (eliminate dead load_imm across blocks)
- Superblock formation / trace-based optimization
- Peephole passes: dead store elimination, load forwarding
- Stack frame optimization: reduce spills for register-heavy code
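The inter-block liveness item can be sketched in two parts. The first is a standard backward dataflow fixpoint over the CFG. The second is a per-block backward walk that drops `load_imm` instructions whose destination is not live. The toy IR (`Inst`, `Block`) is hypothetical, as is `exit_live`, which stands for the registers assumed live in blocks with no successors (for example, result registers at REPLY). grey-transpiler's real IR will differ. PVM's 13 registers fit in a `u16` bitmask.

```rust
/// Toy IR for illustration only: bit r of a u16 = register r is live.
#[derive(Debug, Clone, PartialEq)]
pub enum Inst {
    LoadImm { dst: u8, imm: u64 },
    Add { dst: u8, a: u8, b: u8 },
    Store { src: u8 }, // side effect: reads src
}

pub struct Block {
    pub insts: Vec<Inst>,
    pub succs: Vec<usize>, // indices of successor blocks
}

/// (registers defined, registers used) by one instruction.
fn defs_uses(inst: &Inst) -> (u16, u16) {
    match *inst {
        Inst::LoadImm { dst, .. } => (1u16 << dst, 0),
        Inst::Add { dst, a, b } => (1u16 << dst, (1u16 << a) | (1u16 << b)),
        Inst::Store { src } => (0, 1u16 << src),
    }
}

/// Backward dataflow to a fixpoint: live_out[b] = union of live_in[s] over successors.
pub fn compute_live_out(blocks: &[Block], exit_live: u16) -> Vec<u16> {
    let n = blocks.len();
    let (mut live_in, mut live_out) = (vec![0u16; n], vec![0u16; n]);
    let mut changed = true;
    while changed {
        changed = false;
        for i in (0..n).rev() {
            let out = if blocks[i].succs.is_empty() {
                exit_live
            } else {
                blocks[i].succs.iter().fold(0u16, |acc, &s| acc | live_in[s])
            };
            let mut live = out; // transfer function: walk the block backwards
            for inst in blocks[i].insts.iter().rev() {
                let (d, u) = defs_uses(inst);
                live = (live & !d) | u;
            }
            if out != live_out[i] || live != live_in[i] {
                live_out[i] = out;
                live_in[i] = live;
                changed = true;
            }
        }
    }
    live_out
}

/// Removes load_imm whose destination is dead; returns how many were removed.
pub fn eliminate_dead_load_imm(blocks: &mut [Block], exit_live: u16) -> usize {
    let outs = compute_live_out(blocks, exit_live);
    let mut removed = 0;
    for (block, out) in blocks.iter_mut().zip(outs) {
        let mut live = out;
        let mut keep = vec![true; block.insts.len()];
        for (idx, inst) in block.insts.iter().enumerate().rev() {
            let (d, u) = defs_uses(inst);
            if matches!(inst, Inst::LoadImm { .. }) && (live & d) == 0 {
                keep[idx] = false; // value is overwritten or never read
                removed += 1;
                continue;
            }
            live = (live & !d) | u;
        }
        let old = std::mem::take(&mut block.insts);
        block.insts = old.into_iter().zip(keep).filter_map(|(i, k)| if k { Some(i) } else { None }).collect();
    }
    removed
}
```

In this sketch, block 0's `load_imm r1` is dead only because its successor overwrites `r1` before reading it. A purely local pass could not see that.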

**Blob layout:**
- DATA cap layout optimization (minimize page faults during execution)
- Code sub-blob ordering (hot paths first for I-cache)
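For hot-paths-first ordering, the Rust sketch below derives an emission order from per-block execution counts. How those counts are gathered (for example, from an instrumented run of the benchmarks) is an assumption, and so is treating block 0 as the entry that must stay first. `hot_first_order` is a hypothetical name. It returns the old-to-new position map needed to rewrite branch offsets and jump-table entries after reordering.

```rust
/// Emission order for the code sub-blob: block 0 (assumed entry) stays first,
/// the rest are stable-sorted by descending execution count.
/// Returns (order, new_pos): order[k] = old index emitted k-th, new_pos[old] = k.
pub fn hot_first_order(exec_counts: &[u64]) -> (Vec<usize>, Vec<usize>) {
    let mut order: Vec<usize> = (0..exec_counts.len()).collect();
    if order.len() > 1 {
        // sort_by is stable: equally hot blocks keep their original relative order.
        order[1..].sort_by(|&a, &b| exec_counts[b].cmp(&exec_counts[a]));
    }
    let mut new_pos = vec![0; order.len()];
    for (k, &old) in order.iter().enumerate() {
        new_pos[old] = k;
    }
    (order, new_pos)
}
```

The issue does not say whether the recompiler preserves sub-blob order in the native code it emits. Any gain from this reordering therefore has to show up in the benchmark suite.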

## How to benchmark

```bash
# Full suite (grey + polkavm comparison)
POLKAVM_ALLOW_EXPERIMENTAL=1 POLKAVM_DEFAULT_COST_MODEL=full-l1-hit cargo bench -p grey-bench

# Grey-only (faster iteration)
cargo bench -p grey-bench -- 'grey-'

# Multi-VM benchmark
cargo bench -p grey-bench --bench subvm_bench

# Verify correctness after any change
cargo test -p grey-bench
```

## Rules

- Always benchmark before AND after. Use criterion's built-in comparison.
- If a change shows no measurable improvement or regresses, revert it.
- The transpiler can be slow — only runtime performance of the resulting blob matters.
- Verify correctness: `cargo test -p grey-bench` checks exact result and gas match.

Replaces #84.