Optimize javm recompiler performance

## Goal

Improve the performance of the javm recompiler (x86-64 JIT). This is an open-ended issue for tracking incremental optimizations.

## Architecture (capability-javm-v2)

The PVM uses a capability-based kernel with Harvard architecture:

- **Kernel** (`crates/javm/src/kernel.rs`): dispatches ecalli, manages VM lifecycle (CREATE/CALL/REPLY), capability operations (MAP/UNMAP/SPLIT/GRANT/REVOKE)
- **Recompiler** (`crates/javm/src/recompiler/`): x86-64 JIT, one compilation per CODE cap, shared across all VMs using that cap
- **Interpreter** (`crates/javm/src/interpreter/`): pre-decoded bytecode interpreter, reference backend
- **Signal handler** (`recompiler/signal.rs`): SIGSEGV-based memory bounds checking via guard pages + trap table
- **Gas metering** (`gas_sim.rs`): per-basic-block pipeline gas simulation
- **Backing store** (`backing.rs`): memfd-backed physical memory pool, MAP_SHARED into 4GB CODE windows
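
As an illustration of the trap-table idea in the signal handler bullet, here is a minimal sketch of how a faulting native code offset might be mapped back to a PVM program counter. `TrapEntry`, `TrapTable`, and `lookup` are hypothetical names chosen for this example; the actual layout in `recompiler/signal.rs` may differ.

```rust
/// Hypothetical trap-table entry: the native code offset of a guarded memory
/// access and the PVM program counter it was compiled from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TrapEntry {
    pub native_offset: u32,
    pub pvm_pc: u32,
}

/// Hypothetical trap table, kept sorted by `native_offset` so that the
/// signal handler can use a binary search.
pub struct TrapTable {
    entries: Vec<TrapEntry>,
}

impl TrapTable {
    /// Builds a table from entries in any order.
    pub fn new(mut entries: Vec<TrapEntry>) -> Self {
        entries.sort_by_key(|e| e.native_offset);
        Self { entries }
    }

    /// Returns the PVM PC for a faulting native offset, or `None` if the
    /// offset is not a registered memory access (assumption: a real handler
    /// would then treat the SIGSEGV as a genuine crash).
    pub fn lookup(&self, fault_offset: u32) -> Option<u32> {
        self.entries
            .binary_search_by_key(&fault_offset, |e| e.native_offset)
            .ok()
            .map(|i| self.entries[i].pvm_pc)
    }
}
```

A sorted table keeps the lookup at O(log n) and allocation-free, which matters inside a signal handler.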

### Key types
- `CodeCap`: compiled PVM code + 4GB virtual window (shared across VMs)
- `VmInstance`: register state + cap table + lifecycle state (u16 VM ID, max 65535)
- `JitContext`: `#[repr(C)]` struct with fixed field offsets, accessed directly by JIT native code (regs, gas, memory ptr, PC)
- `live_ctx`: optimization keeping JitContext alive across ecalli dispatch (avoids register copies)
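
To make the "fixed offsets" point concrete, the sketch below shows one way a `#[repr(C)]` context could pin its field offsets and verify them at startup or in a test. The field list, the register count of 13, and the offset constants are assumptions for illustration only, not the real `JitContext` definition.

```rust
/// Hypothetical layout; the real `JitContext` fields and offsets may differ.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct JitContext {
    pub regs: [u64; 13], // assumed: 13 general-purpose PVM registers
    pub gas: i64,        // remaining gas
    pub mem_base: u64,   // assumed: base address of the memory window
    pub pc: u32,         // PVM program counter
}

// Offsets that emitted native code would hard-code (assumed values).
pub const OFFSET_REGS: usize = 0;
pub const OFFSET_GAS: usize = 104;
pub const OFFSET_MEM_BASE: usize = 112;
pub const OFFSET_PC: usize = 120;

/// Byte offset of `field` within `ctx`, computed from addresses.
pub fn field_offset<T>(ctx: &JitContext, field: &T) -> usize {
    (field as *const T as usize) - (ctx as *const JitContext as usize)
}

/// Returns true if the struct layout matches the hard-coded offsets.
pub fn check_layout() -> bool {
    let c = JitContext::default();
    field_offset(&c, &c.regs) == OFFSET_REGS
        && field_offset(&c, &c.gas) == OFFSET_GAS
        && field_offset(&c, &c.mem_base) == OFFSET_MEM_BASE
        && field_offset(&c, &c.pc) == OFFSET_PC
}
```

A check like this catches accidental field reordering before the emitted code reads the wrong slot.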

### Multi-VM execution
- `CALL(CODE)` → CREATE child VM (cap bitmask propagation)
- `CALL(HANDLE/CALLABLE)` → suspend caller, run target VM
- `REPLY` → return to caller, restore gas
- Context switch = register swap (all VMs sharing a CODE cap use the same 4GB window)
- `recompiler_resume_cap`: fast JitContext re-entry when same VM + same code cap continues
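
The re-entry rules above can be summarized as a small decision function. This is a sketch under stated assumptions: `LiveCtx`, `Resume`, and `resume_kind` are hypothetical names, and the three-way split is inferred from the bullets rather than taken from `kernel.rs`.

```rust
/// Hypothetical record of which VM and CODE cap the live JitContext belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LiveCtx {
    pub vm_id: u16,
    pub code_cap_id: u32,
}

/// How the kernel would re-enter native code (illustrative categories).
#[derive(Debug, PartialEq, Eq)]
pub enum Resume {
    /// Same VM and same CODE cap: re-enter with the JitContext as it stands.
    Fast,
    /// Different VM, same CODE cap: the 4GB window is shared, so only the
    /// registers need to be swapped.
    SwapRegisters,
    /// Different CODE cap, or no live context: rebuild the context fully.
    Full,
}

/// Chooses a resume strategy for the VM that is about to run.
pub fn resume_kind(live: Option<LiveCtx>, next_vm: u16, next_cap: u32) -> Resume {
    match live {
        Some(l) if l.code_cap_id == next_cap && l.vm_id == next_vm => Resume::Fast,
        Some(l) if l.code_cap_id == next_cap => Resume::SwapRegisters,
        _ => Resume::Full,
    }
}
```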

## Benchmark suite

```bash
# Single-VM workloads (8 benchmarks: fib, hostcall, sort, sieve, blake2b, keccak, ed25519, ecrecover)
cargo bench -p grey-bench --bench pvm_bench

# Multi-VM workload (fib_recur: recursive fibonacci via CREATE + CALL, 21K VMs)
cargo bench -p grey-bench --bench subvm_bench

# Grey-only (fast iteration)
cargo bench -p grey-bench -- 'grey-'

# PolkaVM comparison (pipeline gas metering)
POLKAVM_ALLOW_EXPERIMENTAL=1 POLKAVM_DEFAULT_COST_MODEL=full-l1-hit cargo bench -p grey-bench
```

## Optimization areas

**Code generation (`recompiler/codegen.rs`):**
- Fix Mul64+MulUpper fusion (#190) — disabled due to RAX aliasing bug, impacts crypto workloads
- Reduce native code size for better I-cache utilization
- Profile with `valgrind --tool=callgrind` to find hotspots
- Batch capacity checks in the compile loop
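
For the last item, batching could look roughly like the sketch below: reserve space once per basic block using a worst-case bound per instruction, instead of checking capacity on every emit. `Emitter`, `MAX_BYTES_PER_INSN`, and `compile_block` are hypothetical and do not mirror the actual `codegen.rs` API; the 32-byte bound is an assumed placeholder.

```rust
/// Assumed upper bound on native bytes emitted for one PVM instruction.
pub const MAX_BYTES_PER_INSN: usize = 32;

/// Hypothetical code emitter that counts how often capacity is checked.
pub struct Emitter {
    buf: Vec<u8>,
    pub reserve_calls: usize,
}

impl Emitter {
    pub fn new() -> Self {
        Self { buf: Vec::new(), reserve_calls: 0 }
    }

    /// One capacity check covering `insn_count` instructions.
    pub fn reserve_block(&mut self, insn_count: usize) {
        self.buf.reserve(insn_count * MAX_BYTES_PER_INSN);
        self.reserve_calls += 1;
    }

    /// Appends bytes, relying on a prior `reserve_block` call. `Vec` still
    /// checks capacity internally here; a real emitter would presumably write
    /// through a raw pointer to remove that check from the hot path.
    pub fn emit(&mut self, bytes: &[u8]) {
        debug_assert!(self.buf.capacity() - self.buf.len() >= bytes.len());
        self.buf.extend_from_slice(bytes);
    }

    /// Number of bytes emitted so far.
    pub fn len(&self) -> usize {
        self.buf.len()
    }
}

/// Compiles one basic block, given the encoded bytes of each instruction.
pub fn compile_block(em: &mut Emitter, insns: &[&[u8]]) {
    em.reserve_block(insns.len());
    for bytes in insns {
        em.emit(bytes);
    }
}
```

Whether this helps in practice depends on how tight the worst-case bound is, so it should be measured like any other change here.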

**Multi-VM overhead (`kernel.rs`):**
- Context switch cost: register save/restore when switching between VMs
- `live_ctx` optimization: currently disabled after VM switches — explore keeping it alive across CALL/REPLY when code_cap_id matches
- VM allocation: `Vec::push` for new VMs — consider arena allocation
- Cap table: 256 × `Option<Cap>` per VM (~8KB); consider a sparse representation for VMs with few caps
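
One possible shape for the sparse cap table is a sorted vector of `(slot, cap)` pairs. This is only a sketch: `Cap` here is a stand-in type, and `SparseCapTable` is a hypothetical name, not an existing type in the crate.

```rust
/// Stand-in for the real capability type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cap(pub u32);

/// Sparse cap table: sorted `(slot, cap)` pairs instead of 256 fixed slots.
#[derive(Default)]
pub struct SparseCapTable {
    slots: Vec<(u8, Cap)>,
}

impl SparseCapTable {
    /// Looks up the cap in `slot`, if any.
    pub fn get(&self, slot: u8) -> Option<&Cap> {
        self.slots
            .binary_search_by_key(&slot, |(s, _)| *s)
            .ok()
            .map(|i| &self.slots[i].1)
    }

    /// Stores `cap` in `slot`, returning the previous occupant if there was one.
    pub fn insert(&mut self, slot: u8, cap: Cap) -> Option<Cap> {
        match self.slots.binary_search_by_key(&slot, |(s, _)| *s) {
            Ok(i) => Some(std::mem::replace(&mut self.slots[i].1, cap)),
            Err(i) => {
                self.slots.insert(i, (slot, cap));
                None
            }
        }
    }

    /// Removes and returns the cap in `slot`, if any.
    pub fn remove(&mut self, slot: u8) -> Option<Cap> {
        self.slots
            .binary_search_by_key(&slot, |(s, _)| *s)
            .ok()
            .map(|i| self.slots.remove(i).1)
    }

    /// Number of occupied slots.
    pub fn len(&self) -> usize {
        self.slots.len()
    }
}
```

The trade-off is O(log n) lookups instead of direct indexing, in exchange for a much smaller footprint per VM when only a few slots are occupied.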

**Kernel dispatch (`kernel.rs`):**
- `dispatch_ecalli` is `#[inline(always)]`; re-benchmark to confirm that forced inlining is still a win
- ProtocolCall resume path vs full segment rebuild

**Compilation cost:**
- JIT compilation happens once per CODE cap, not per VM, so it is already amortized for multi-VM workloads.
- For short-lived programs (blake2b, keccak), compilation overhead dominates — tracked separately

## Rules

- Always benchmark before AND after. Use criterion's built-in comparison (for example, `--save-baseline <name>` before the change and `--baseline <name>` after it).
- If a change shows no measurable improvement or regresses, revert it.
- Do not use `polkavm` or `polkavm-common` crates — implement from first principles.
- Verify correctness: `cargo test -p grey-bench` checks a0 values and exact gas match between interpreter and recompiler.

Replaces #56.
