Goal
Improve the performance of the javm recompiler (x86-64 JIT). Open-ended issue for incremental optimizations.
Architecture (capability-javm-v2)
The PVM uses a capability-based kernel with Harvard architecture:
- Kernel (crates/javm/src/kernel.rs): dispatches ecalli, manages VM lifecycle (CREATE/CALL/REPLY), capability operations (MAP/UNMAP/SPLIT/GRANT/REVOKE)
- Recompiler (crates/javm/src/recompiler/): x86-64 JIT, one compilation per CODE cap, shared across all VMs using that cap
- Interpreter (crates/javm/src/interpreter/): pre-decoded bytecode interpreter, reference backend
- Signal handler (recompiler/signal.rs): SIGSEGV-based memory bounds checking via guard pages + trap table
- Gas metering (gas_sim.rs): per-basic-block pipeline gas simulation
- Backing store (backing.rs): memfd-backed physical memory pool, MAP_SHARED into 4GB CODE windows
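For the gas metering entry in the list above, here is a rough sketch of charging gas once per basic block instead of once per instruction. The flat per-instruction costs only stand in for the pipeline simulation in gas_sim.rs, and `BasicBlock`, `block_cost`, `charge_block`, and `OutOfGas` are hypothetical names for illustration, not the javm API.

```rust
/// Hypothetical basic block: a start PC plus one assumed cost per instruction.
pub struct BasicBlock {
    pub start_pc: u32,
    pub instr_costs: Vec<u32>,
}

/// Error returned when the remaining gas cannot cover a block.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfGas;

/// Computed once at compile time; a flat sum stands in for the pipeline model.
pub fn block_cost(block: &BasicBlock) -> u64 {
    block.instr_costs.iter().map(|&c| u64::from(c)).sum()
}

/// Executed at block entry: deduct the whole block cost, or fail
/// without touching the remaining gas.
pub fn charge_block(gas: &mut u64, cost: u64) -> Result<(), OutOfGas> {
    match gas.checked_sub(cost) {
        Some(rest) => {
            *gas = rest;
            Ok(())
        }
        None => Err(OutOfGas),
    }
}
```

The point of the shape is that the JIT emits one subtract-and-branch per block, so the per-instruction cost model only runs at compile time.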
Key types
- CodeCap: compiled PVM code + 4GB virtual window (shared across VMs)
- VmInstance: register state + cap table + lifecycle state (u16 VM ID, max 65535)
- JitContext: repr(C) struct at fixed offsets for JIT native code (regs, gas, memory ptr, PC)
- live_ctx: optimization keeping JitContext alive across ecalli dispatch (avoids register copies)
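To illustrate what "repr(C) struct at fixed offsets" means in practice, here is a minimal sketch. The field order, the register count of 13, and the field types are assumptions for illustration; the real JitContext layout may differ. The offsets in the comments assume a 64-bit target.

```rust
use std::mem::offset_of;

/// Hypothetical layout, not the actual javm definition.
/// #[repr(C)] keeps fields in declaration order, so generated code can
/// address them as [ctx_ptr + constant].
#[repr(C)]
pub struct JitContext {
    pub regs: [u64; 13],      // guest registers (count assumed)
    pub gas: i64,             // remaining gas
    pub memory_base: *mut u8, // base of the guest memory window
    pub pc: u32,              // guest program counter
}

// Offsets the code generator would bake into emitted instructions.
pub const REGS_OFFSET: usize = offset_of!(JitContext, regs);
pub const GAS_OFFSET: usize = offset_of!(JitContext, gas);
pub const MEMORY_OFFSET: usize = offset_of!(JitContext, memory_base);
pub const PC_OFFSET: usize = offset_of!(JitContext, pc);

// Compile-time guards: reordering fields breaks the build instead of
// silently breaking the emitted code.
const _: () = assert!(REGS_OFFSET == 0);
const _: () = assert!(GAS_OFFSET == 13 * 8);
```

Pinning offsets with compile-time assertions is cheap insurance when touching this struct during optimization work.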
Multi-VM execution
- CALL(CODE) → CREATE child VM (cap bitmask propagation)
- CALL(HANDLE/CALLABLE) → suspend caller, run target VM
- REPLY → return to caller, restore gas
- Context switch = register swap (all VMs sharing a CODE cap use the same 4GB window)
- recompiler_resume_cap: fast JitContext re-entry when same VM + same code cap continues
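The CALL/REPLY flow above can be modelled as a small state machine with a stack of suspended callers. This is a simplified sketch, not the kernel.rs implementation: `Kernel`, `Vm`, `VmState`, and the method names are hypothetical, cap propagation is omitted, and "restore gas" is assumed to mean that the callee's unused gas goes back to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmState {
    Idle,
    Running,
    Suspended,
}

pub struct Vm {
    pub state: VmState,
    pub gas: u64,
    pub code_cap_id: u32,
}

pub struct Kernel {
    pub vms: Vec<Vm>,
    pub call_stack: Vec<u16>, // suspended callers, innermost last
    pub current: u16,
}

impl Kernel {
    pub fn new(root_gas: u64, code_cap_id: u32) -> Self {
        Kernel {
            vms: vec![Vm { state: VmState::Running, gas: root_gas, code_cap_id }],
            call_stack: Vec::new(),
            current: 0,
        }
    }

    /// CALL(CODE): create an idle child VM; fails once u16 IDs run out.
    pub fn create(&mut self, code_cap_id: u32) -> Option<u16> {
        let id = u16::try_from(self.vms.len()).ok()?;
        self.vms.push(Vm { state: VmState::Idle, gas: 0, code_cap_id });
        Some(id)
    }

    /// CALL(HANDLE/CALLABLE): suspend the caller and run the target.
    pub fn call(&mut self, target: u16, gas: u64) -> bool {
        let (caller, t) = (self.current as usize, target as usize);
        if t >= self.vms.len() || self.vms[t].state != VmState::Idle {
            return false;
        }
        if self.vms[caller].gas < gas {
            return false;
        }
        self.vms[caller].gas -= gas;
        self.vms[caller].state = VmState::Suspended;
        self.vms[t].gas = gas;
        self.vms[t].state = VmState::Running;
        self.call_stack.push(self.current);
        self.current = target;
        true
    }

    /// REPLY: return to the caller and hand back unused gas (assumed semantics).
    pub fn reply(&mut self) -> Option<u16> {
        let caller = self.call_stack.pop()?;
        let callee = self.current as usize;
        let unused = std::mem::take(&mut self.vms[callee].gas);
        self.vms[callee].state = VmState::Idle;
        self.vms[caller as usize].gas += unused;
        self.vms[caller as usize].state = VmState::Running;
        self.current = caller;
        Some(caller)
    }
}
```

In this model a context switch only touches per-VM state and the call stack, which matches the "register swap" description when caller and callee share a CODE cap.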
Benchmark suite
# Single-VM workloads (8 benchmarks: fib, hostcall, sort, sieve, blake2b, keccak, ed25519, ecrecover)
cargo bench -p grey-bench --bench pvm_bench
# Multi-VM workload (fib_recur: recursive fibonacci via CREATE + CALL, 21K VMs)
cargo bench -p grey-bench --bench subvm_bench
# Grey-only (fast iteration)
cargo bench -p grey-bench -- 'grey-'
# PolkaVM comparison (pipeline gas metering)
POLKAVM_ALLOW_EXPERIMENTAL=1 POLKAVM_DEFAULT_COST_MODEL=full-l1-hit cargo bench -p grey-bench
Optimization areas
Code generation (recompiler/codegen.rs):
- valgrind --tool=callgrind to find hotspots
Multi-VM overhead (kernel.rs):
- Context switch cost: register save/restore when switching between VMs
- live_ctx optimization: currently disabled after VM switches; explore keeping it alive across CALL/REPLY when code_cap_id matches
- VM allocation: Vec::push for new VMs; consider arena allocation
- Cap table: 256 × Option per VM (~8KB); consider sparse representation for VMs with few caps
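One possible shape for the sparse cap table idea is a small vector sorted by slot index. This is a sketch only: `Cap` is a placeholder type, `SparseCapTable` is a hypothetical name, and whether the O(log n) lookup beats the current O(1) array index has to be settled by the benchmarks.

```rust
/// Placeholder capability; the real cap type in javm is not shown here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cap {
    pub kind: u8,
    pub id: u32,
}

/// Sparse table: only occupied slots are stored, sorted by slot index.
#[derive(Default)]
pub struct SparseCapTable {
    entries: Vec<(u8, Cap)>,
}

impl SparseCapTable {
    /// Binary search over occupied slots; Ok(i) if present, Err(i) = insert point.
    fn find(&self, slot: u8) -> Result<usize, usize> {
        self.entries.binary_search_by_key(&slot, |(s, _)| *s)
    }

    pub fn get(&self, slot: u8) -> Option<&Cap> {
        self.find(slot).ok().map(|i| &self.entries[i].1)
    }

    /// Inserts or overwrites; returns the previous cap in that slot, if any.
    pub fn insert(&mut self, slot: u8, cap: Cap) -> Option<Cap> {
        match self.find(slot) {
            Ok(i) => Some(std::mem::replace(&mut self.entries[i].1, cap)),
            Err(i) => {
                self.entries.insert(i, (slot, cap));
                None
            }
        }
    }

    pub fn remove(&mut self, slot: u8) -> Option<Cap> {
        let i = self.find(slot).ok()?;
        Some(self.entries.remove(i).1)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A VM with a handful of caps then costs a few dozen bytes instead of a full 256-entry array, at the price of a search on every cap lookup.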
Kernel dispatch (kernel.rs):
- dispatch_ecalli is #[inline(always)]; verify this stays optimal
- ProtocolCall resume path vs full segment rebuild
Compilation cost:
- JIT compilation happens once per CODE cap, not per VM. Already amortized for multi-VM workloads.
- For short-lived programs (blake2b, keccak), compilation overhead dominates — tracked separately
Rules
- Always benchmark before AND after. Use criterion's built-in comparison.
- If a change shows no measurable improvement or regresses, revert it.
- Do not use polkavm or polkavm-common crates; implement from first principles.
- Verify correctness: cargo test -p grey-bench checks a0 values and exact gas match between interpreter and recompiler.
Replaces #56.