Goal
Improve the performance of the javm interpreter. The interpreter is the reference backend, so it must produce the same results and the same gas consumption as the recompiler. This is an open-ended issue for incremental optimizations.
Architecture (capability-javm-v2)
The interpreter lives in crates/javm/src/interpreter/mod.rs. It uses a pre-decoded instruction array for fast dispatch:
- Pre-decode (predecode_instructions): raw PVM bytecode → flat Vec<DecodedInst> with resolved branch targets, pre-computed gas block costs, and flattened operands (ra, rb, rd, imm1, imm2)
- Execution (run_segment): sequential instruction dispatch via match on DecodedInst.opcode, with pre-resolved next_idx / target_idx for branches
- Gas metering: per-gas-block charge at block entry (same pipeline model as recompiler)
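The pre-decode and dispatch pipeline above can be sketched roughly as follows. This is a minimal illustration, not the actual code in crates/javm: the four-opcode Op enum, the field widths, the 13-register file, and the counting_loop helper are all assumptions made for the example. Only the names DecodedInst, run_segment, next_idx, and target_idx come from the description above.

```rust
// Hypothetical four-opcode subset; the real PVM instruction set is far larger.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    LoadImm,  // regs[rd] = imm1
    Add,      // regs[rd] = regs[ra] + regs[rb] (wrapping)
    BranchNe, // continue at target_idx if regs[ra] != regs[rb]
    Halt,
}

// Illustrative layout only: flat operands plus pre-resolved successor indices.
#[derive(Clone, Copy, Debug)]
struct DecodedInst {
    opcode: Op,
    ra: u8,
    rb: u8,
    rd: u8,
    imm1: u64,
    next_idx: u32,
    target_idx: u32,
}

// Runs until Halt and returns the number of instructions executed.
fn run_segment(prog: &[DecodedInst], regs: &mut [u64; 13]) -> u64 {
    let mut idx = 0usize;
    let mut executed = 0u64;
    loop {
        let inst = prog[idx];
        executed += 1;
        let next = match inst.opcode {
            Op::LoadImm => {
                regs[inst.rd as usize] = inst.imm1;
                inst.next_idx
            }
            Op::Add => {
                regs[inst.rd as usize] =
                    regs[inst.ra as usize].wrapping_add(regs[inst.rb as usize]);
                inst.next_idx
            }
            Op::BranchNe => {
                if regs[inst.ra as usize] != regs[inst.rb as usize] {
                    inst.target_idx
                } else {
                    inst.next_idx
                }
            }
            Op::Halt => return executed,
        };
        idx = next as usize;
    }
}

// Hypothetical helper that builds one already-decoded instruction.
fn inst(opcode: Op, ra: u8, rb: u8, rd: u8, imm1: u64, next_idx: u32, target_idx: u32) -> DecodedInst {
    DecodedInst { opcode, ra, rb, rd, imm1, next_idx, target_idx }
}

// r0 counts up by r1 (= 1) until it equals r2 (= limit); limit must be >= 1.
fn counting_loop(limit: u64) -> Vec<DecodedInst> {
    vec![
        inst(Op::LoadImm, 0, 0, 0, 0, 1, 0),
        inst(Op::LoadImm, 0, 0, 1, 1, 2, 0),
        inst(Op::LoadImm, 0, 0, 2, limit, 3, 0),
        inst(Op::Add, 0, 1, 0, 0, 4, 0),
        inst(Op::BranchNe, 0, 2, 0, 0, 5, 3),
        inst(Op::Halt, 0, 0, 0, 0, 0, 0),
    ]
}
```

Because branch targets are already instruction indices, the loop never has to translate a byte-level PC while executing. Gas charging is left out here to keep the dispatch shape visible.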
Key types
- DecodedInst: pre-decoded instruction with opcode, flat operands, gas cost, pre-resolved next/target indices
- Interpreter: PVM state (registers, PC, gas, code, memory pointer, basic_block_starts)
- InterpreterProgram: pre-decoded program (instructions, pc_to_idx mapping)
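One plausible shape for the pc_to_idx mapping is a dense table indexed by byte offset, with a sentinel for offsets that do not start an instruction. The sketch below is an assumption about representation, not the actual InterpreterProgram layout; ProgramIndex, NO_INST, and idx_for_pc are hypothetical names.

```rust
// Sentinel meaning "no instruction starts at this byte offset" (hypothetical).
const NO_INST: u32 = u32::MAX;

// Hypothetical stand-in for the pc_to_idx part of InterpreterProgram.
struct ProgramIndex {
    pc_to_idx: Vec<u32>,
}

impl ProgramIndex {
    // inst_starts holds the byte offset of each instruction, in program order.
    // Every offset is assumed to be smaller than code_len.
    fn new(inst_starts: &[u32], code_len: usize) -> Self {
        let mut pc_to_idx = vec![NO_INST; code_len];
        for (idx, &pc) in inst_starts.iter().enumerate() {
            pc_to_idx[pc as usize] = idx as u32;
        }
        ProgramIndex { pc_to_idx }
    }

    // Maps a byte-level PC to an instruction index, or None if the PC is
    // out of range or points into the middle of an instruction.
    fn idx_for_pc(&self, pc: u32) -> Option<u32> {
        match self.pc_to_idx.get(pc as usize) {
            Some(&idx) if idx != NO_INST => Some(idx),
            _ => None,
        }
    }
}
```

A dense table trades memory (four bytes per code byte in this sketch) for a constant-time lookup, which matters mainly for jumps whose target is only known at run time.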
Multi-VM context
The interpreter runs within the same kernel as the recompiler. VM switching (CALL/REPLY) is handled by the kernel — the interpreter just executes segments and returns exit reasons. The kernel selects interpreter vs recompiler via PvmBackend / GREY_PVM env var.
Benchmark suite
# Single-VM workloads (interpreter columns)
cargo bench -p grey-bench --bench pvm_bench -- 'interpreter'
# Multi-VM workload
GREY_PVM=interpreter cargo bench -p grey-bench --bench subvm_bench
# Compare interpreter vs recompiler
cargo bench -p grey-bench --bench pvm_bench
# Full comparison including polkavm
POLKAVM_ALLOW_EXPERIMENTAL=1 POLKAVM_DEFAULT_COST_MODEL=full-l1-hit cargo bench -p grey-bench
Optimization areas
Dispatch overhead:
- Current: match-based dispatch on DecodedInst.opcode
- Threaded/computed-goto dispatch (requires unsafe + function pointer table)
- Profile-guided optimization of opcode ordering in the match
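As a point of comparison for the dispatch options above, here is a minimal sketch of table-driven dispatch through function pointers. It uses only safe Rust, so it illustrates the table lookup and not true threaded or computed-goto dispatch, where each handler jumps directly to the next one. Vm, Inst, Handler, HANDLERS, and the three opcodes are hypothetical names for this example.

```rust
// Hypothetical minimal VM state for the example.
struct Vm {
    regs: [u64; 4],
    idx: usize,
    halted: bool,
}

// Hypothetical decoded instruction; opcode is an index into HANDLERS.
#[derive(Clone, Copy)]
struct Inst {
    opcode: u8,
    rd: u8,
    ra: u8,
    imm: u64,
}

type Handler = fn(&mut Vm, &Inst);

// Opcode 0: regs[rd] = imm
fn op_load_imm(vm: &mut Vm, i: &Inst) {
    vm.regs[i.rd as usize] = i.imm;
    vm.idx += 1;
}

// Opcode 1: regs[rd] = regs[ra] + imm (wrapping)
fn op_add_imm(vm: &mut Vm, i: &Inst) {
    vm.regs[i.rd as usize] = vm.regs[i.ra as usize].wrapping_add(i.imm);
    vm.idx += 1;
}

// Opcode 2: stop execution
fn op_halt(vm: &mut Vm, _i: &Inst) {
    vm.halted = true;
}

// One handler per opcode; the array index is the opcode value.
const HANDLERS: [Handler; 3] = [op_load_imm, op_add_imm, op_halt];

// Executes prog from index 0 until a halt and returns the final registers.
fn run(prog: &[Inst]) -> [u64; 4] {
    let mut vm = Vm { regs: [0; 4], idx: 0, halted: false };
    while !vm.halted {
        let inst = prog[vm.idx];
        HANDLERS[inst.opcode as usize](&mut vm, &inst);
    }
    vm.regs
}
```

Whether an indirect call per instruction beats the compiler's jump table for a match is something only the benchmarks can answer; this shape mostly makes it easy to swap handlers and measure.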
Pre-decode improvements:
- Instruction fusion during pre-decode (e.g., load_imm + add → add_imm)
- Specialized fast paths for common instruction sequences
- Pack DecodedInst tighter (currently ~56 bytes per instruction, which adds cache pressure)
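The load_imm + add → add_imm fusion mentioned above could look roughly like the peephole pass below. It is a deliberately conservative sketch under stated assumptions: the three-variant Inst enum and the fuse function are hypothetical, and the pass only fuses when the add overwrites the temporary register, so no other instruction can observe the loaded immediate.

```rust
// Hypothetical instruction forms for the example.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    LoadImm { rd: u8, imm: u64 },        // regs[rd] = imm
    Add { rd: u8, ra: u8, rb: u8 },      // regs[rd] = regs[ra] + regs[rb]
    AddImm { rd: u8, ra: u8, imm: u64 }, // regs[rd] = regs[ra] + imm
}

// Fuses `load_imm t, imm; add t, ra, t` into `add_imm t, ra, imm`.
// Conditions checked before fusing:
//   - the add is not a basic block start (nothing may jump between the pair)
//   - the add overwrites t, so the loaded value is dead afterwards
//   - ra != t, otherwise the add would read the freshly loaded value twice
fn fuse(insts: &[Inst], is_block_start: &[bool]) -> Vec<Inst> {
    let mut out = Vec::with_capacity(insts.len());
    let mut i = 0;
    while i < insts.len() {
        if i + 1 < insts.len() && !is_block_start[i + 1] {
            if let (Inst::LoadImm { rd: t, imm }, Inst::Add { rd, ra, rb }) =
                (insts[i], insts[i + 1])
            {
                if rd == t && rb == t && ra != t {
                    out.push(Inst::AddImm { rd, ra, imm });
                    i += 2;
                    continue;
                }
            }
        }
        out.push(insts[i]);
        i += 1;
    }
    out
}
```

The sketch leaves out two things a real pass would need: remapping next/target indices after instructions are removed, and keeping each gas block's pre-computed cost unchanged so that gas consumption still matches the recompiler.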
Memory access:
- Current: bounds-checked via read_u8/write_u8 etc. with % (1u64 << 32) masking
- Consider batch bounds checking or page-table-based dispatch
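To make the memory-access options concrete, the sketch below shows a masked, bounds-checked byte access next to a multi-byte read that does one range check for the whole access. It assumes a flat Vec<u8> backing store, little-endian loads, and a 64-bit host; the Memory struct and read_u32 are hypothetical, and only the read_u8/write_u8 names and the % (1u64 << 32) mask come from the notes above.

```rust
// Hypothetical flat guest memory; the real interpreter holds a memory pointer.
struct Memory {
    data: Vec<u8>,
}

impl Memory {
    // Wraps the address into the 32-bit guest address space.
    fn mask(addr: u64) -> usize {
        (addr % (1u64 << 32)) as usize
    }

    // Per-byte access: one mask and one bounds check per call.
    fn read_u8(&self, addr: u64) -> Option<u8> {
        self.data.get(Self::mask(addr)).copied()
    }

    // Returns false instead of writing when the address is out of bounds.
    fn write_u8(&mut self, addr: u64, value: u8) -> bool {
        match self.data.get_mut(Self::mask(addr)) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }

    // Batched variant: a single range check covers all four bytes.
    fn read_u32(&self, addr: u64) -> Option<u32> {
        let start = Self::mask(addr);
        let end = start.checked_add(4)?;
        let bytes = self.data.get(start..end)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(bytes);
        Some(u32::from_le_bytes(buf))
    }
}
```

Note that read_u32 in this sketch rejects an access that would wrap past the end of the 32-bit address space, which may or may not match what four separate read_u8 calls would do; any batching change has to preserve the existing fault behavior exactly.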
Gas metering:
- Gas block costs are pre-computed — charge is a single subtract + sign check per block
- Investigate whether the branch on negative gas is a significant branch misprediction source
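The per-block charge described above comes down to one subtraction and one sign check at block entry. A minimal sketch, assuming a signed 64-bit gas counter that is left negative on exhaustion; the Exit enum and run_blocks are hypothetical and model each gas block only by its pre-computed cost:

```rust
// Hypothetical exit reasons for the example.
#[derive(Debug, PartialEq)]
enum Exit {
    Halt,
    OutOfGas,
}

// Charges each block's pre-computed cost on entry, in order.
// Returns the exit reason and the remaining gas counter.
fn run_blocks(block_costs: &[u32], mut gas: i64) -> (Exit, i64) {
    for &cost in block_costs {
        gas -= i64::from(cost); // single subtract per block
        if gas < 0 {
            // Expected to be rare, so this branch should predict well.
            return (Exit::OutOfGas, gas);
        }
        // ... the block's instructions would execute here ...
    }
    (Exit::Halt, gas)
}
```

If profiling does show mispredictions here, one thing to try is moving the out-of-gas handling into a separate function marked #[cold] so the hot loop stays compact.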
Pre-decode cost:
- Pre-decode runs once per InterpreterProgram::predecode() call
- For short-lived programs, this is a significant fraction of total time
- Consider lazy pre-decode (decode on first execution of each block)
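Lazy pre-decode could be prototyped as a per-block cache that is filled the first time a block is entered. In the sketch below, LazyProgram, decode_block, and the decode_calls counter are hypothetical, and the decoder is a stand-in that just copies the block's bytes so the caching logic stays visible.

```rust
// Stand-in for real decoding: copies the raw bytes of one block.
fn decode_block(bytes: &[u8]) -> Vec<u8> {
    bytes.to_vec()
}

// Hypothetical lazily decoded program.
struct LazyProgram {
    code: Vec<u8>,
    block_starts: Vec<usize>,      // sorted byte offsets of basic block starts
    decoded: Vec<Option<Vec<u8>>>, // one cache slot per block, None until first use
    decode_calls: usize,           // instrumentation for the example only
}

impl LazyProgram {
    fn new(code: Vec<u8>, block_starts: Vec<usize>) -> Self {
        let decoded = vec![None; block_starts.len()];
        LazyProgram { code, block_starts, decoded, decode_calls: 0 }
    }

    // Returns the decoded form of block b, decoding it on first access.
    fn block(&mut self, b: usize) -> &[u8] {
        if self.decoded[b].is_none() {
            let start = self.block_starts[b];
            let end = self
                .block_starts
                .get(b + 1)
                .copied()
                .unwrap_or(self.code.len());
            self.decode_calls += 1;
            self.decoded[b] = Some(decode_block(&self.code[start..end]));
        }
        self.decoded[b].as_deref().unwrap()
    }
}
```

The cost is an extra check on every block entry, so this only pays off when a meaningful share of blocks is never executed; that is worth confirming on the short-lived workloads before committing to it.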
Rules
- Always benchmark before AND after. Use criterion's built-in comparison.
- If a change shows no measurable improvement or regresses, revert it.
- The interpreter must produce the same results and gas consumption as the recompiler; cargo test -p grey-bench verifies this.
- Do not use the polkavm or polkavm-common crates; implement from first principles.