Optimize javm interpreter performance #400

Description

@sorpaas

Goal

Improve the javm interpreter performance. The interpreter is the reference backend, so it must produce the same results and gas consumption as the recompiler. This is an open-ended issue for incremental optimizations.

Architecture (capability-javm-v2)

The interpreter lives in crates/javm/src/interpreter/mod.rs. It uses a pre-decoded instruction array for fast dispatch:

  • Pre-decode (predecode_instructions): raw PVM bytecode → flat Vec<DecodedInst> with resolved branch targets, pre-computed gas block costs, and flattened operands (ra, rb, rd, imm1, imm2)
  • Execution (run_segment): sequential instruction dispatch via match on DecodedInst.opcode, with pre-resolved next_idx / target_idx for branches
  • Gas metering: per-gas-block charge at block entry (same pipeline model as recompiler)
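
To make the shape of this design concrete, here is a minimal sketch of a pre-decoded instruction array driven by a match-based loop. It is not the code in crates/javm: the `Op` enum, the `Inst` field layout, the 13-register file, the gas costs, and `countdown_program` are all assumptions made up for illustration. Only the ideas (flat operands, pre-resolved next_idx / target_idx, a gas charge at block entry) come from the description above.

```rust
/// Hypothetical toy opcode set; the real PVM has many more opcodes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op { LoadImm, Add, BranchNe, Halt }

/// Simplified stand-in for `DecodedInst` (field layout is an assumption).
#[derive(Clone, Copy, Debug)]
struct Inst {
    op: Op,
    ra: usize,
    rb: usize,
    rd: usize,
    imm: u64,
    gas: u32,          // block cost; non-zero only on the first inst of a gas block
    next_idx: usize,   // pre-resolved fall-through successor
    target_idx: usize, // pre-resolved branch target (unused by non-branches)
}

#[derive(Debug, PartialEq)]
enum Exit { Halt, OutOfGas }

/// Match-based dispatch over the pre-decoded array.
fn run(code: &[Inst], regs: &mut [u64; 13], gas: &mut i64) -> Exit {
    let mut idx = 0;
    loop {
        let inst = &code[idx];
        if inst.gas != 0 {
            // Block entry: charge the whole block up front.
            *gas -= i64::from(inst.gas);
            if *gas < 0 {
                return Exit::OutOfGas;
            }
        }
        idx = match inst.op {
            Op::LoadImm => { regs[inst.rd] = inst.imm; inst.next_idx }
            Op::Add => {
                regs[inst.rd] = regs[inst.ra].wrapping_add(regs[inst.rb]);
                inst.next_idx
            }
            Op::BranchNe => {
                if regs[inst.ra] != regs[inst.rb] { inst.target_idx } else { inst.next_idx }
            }
            Op::Halt => return Exit::Halt,
        };
    }
}

/// Hypothetical program: count r0 down from 3 to 0 by adding -1 (in r1).
fn countdown_program() -> Vec<Inst> {
    let base = Inst { op: Op::Halt, ra: 0, rb: 0, rd: 0, imm: 0, gas: 0, next_idx: 0, target_idx: 0 };
    vec![
        Inst { op: Op::LoadImm, rd: 0, imm: 3, gas: 2, next_idx: 1, ..base },
        Inst { op: Op::LoadImm, rd: 1, imm: u64::MAX, next_idx: 2, ..base },
        Inst { op: Op::Add, rd: 0, ra: 0, rb: 1, gas: 2, next_idx: 3, ..base },
        Inst { op: Op::BranchNe, ra: 0, rb: 2, next_idx: 4, target_idx: 2, ..base },
        Inst { op: Op::Halt, gas: 1, ..base },
    ]
}
```

In this sketch the loop body never re-parses bytecode or recomputes a branch offset at run time; everything the hot loop needs is already sitting in the `Inst` it just loaded.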

Key types

  • DecodedInst: pre-decoded instruction with opcode, flat operands, gas cost, pre-resolved next/target indices
  • Interpreter: PVM state (registers, PC, gas, code, memory pointer, basic_block_starts)
  • InterpreterProgram: pre-decoded program (instructions, pc_to_idx mapping)

Multi-VM context

The interpreter runs within the same kernel as the recompiler. VM switching (CALL/REPLY) is handled by the kernel — the interpreter just executes segments and returns exit reasons. The kernel selects interpreter vs recompiler via PvmBackend / GREY_PVM env var.

Benchmark suite

# Single-VM workloads (interpreter columns)
cargo bench -p grey-bench --bench pvm_bench -- 'interpreter'

# Multi-VM workload
GREY_PVM=interpreter cargo bench -p grey-bench --bench subvm_bench

# Compare interpreter vs recompiler
cargo bench -p grey-bench --bench pvm_bench

# Full comparison including polkavm
POLKAVM_ALLOW_EXPERIMENTAL=1 POLKAVM_DEFAULT_COST_MODEL=full-l1-hit cargo bench -p grey-bench

Optimization areas

Dispatch overhead:

  • Current: match-based dispatch on DecodedInst.opcode
  • Threaded/computed-goto dispatch (requires unsafe + function pointer table)
  • Profile-guided optimization of opcode ordering in the match
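
As a rough illustration of the function-pointer-table idea, the sketch below replaces the match with an indexed table of handlers. Everything in it (`Vm`, `Inst`, the three handlers, `HANDLERS`, `run_table`) is hypothetical and not taken from the javm crate. It stays in safe Rust, so each dispatch still pays a bounds check on the table; a real attempt would probably use unchecked indexing or store the pointer directly in the decoded instruction, and should be benchmarked against the existing match.

```rust
/// Hypothetical VM state for the sketch.
struct Vm {
    regs: [u64; 13],
    idx: usize,
    halted: bool,
}

/// Hypothetical decoded instruction: `handler` indexes into `HANDLERS`.
#[derive(Clone, Copy)]
struct Inst {
    handler: usize,
    rd: usize,
    ra: usize,
    imm: u64,
}

type Handler = fn(&mut Vm, &Inst);

fn op_load_imm(vm: &mut Vm, i: &Inst) {
    vm.regs[i.rd] = i.imm;
    vm.idx += 1;
}

fn op_add_imm(vm: &mut Vm, i: &Inst) {
    vm.regs[i.rd] = vm.regs[i.ra].wrapping_add(i.imm);
    vm.idx += 1;
}

fn op_halt(vm: &mut Vm, _i: &Inst) {
    vm.halted = true;
}

/// Handler table, resolved once; pre-decode would store the index per instruction.
const HANDLERS: [Handler; 3] = [op_load_imm, op_add_imm, op_halt];

const LOAD_IMM: usize = 0;
const ADD_IMM: usize = 1;
const HALT: usize = 2;

/// Dispatch loop: one indirect call per instruction instead of a match.
fn run_table(code: &[Inst]) -> [u64; 13] {
    let mut vm = Vm { regs: [0; 13], idx: 0, halted: false };
    while !vm.halted {
        let inst = code[vm.idx];
        HANDLERS[inst.handler](&mut vm, &inst);
    }
    vm.regs
}
```

Note that this is call-and-return dispatch, not true threaded code: stable Rust does not guarantee tail calls, so each handler returns to the loop instead of jumping straight to the next handler. Whether the indirect call beats the compiler's jump table for the match is exactly what the benchmarks would have to show.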

Pre-decode improvements:

  • Instruction fusion during pre-decode (e.g., load_imm + add → add_imm)
  • Specialized fast paths for common instruction sequences
  • Pack DecodedInst tighter (currently ~56 bytes per instruction — cache pressure)
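
A minimal sketch of the load_imm + add → add_imm fusion, written as a peephole pass over an already decoded list. The `Inst` enum, the `block_start` slice, and the `fuse` function are assumptions for illustration, not the crate's types. The sketch fuses only when the add overwrites the temporary register and the add is not a jump target, since otherwise the loaded value or the add itself could still be observed separately.

```rust
/// Hypothetical decoded instructions (register numbers as u8).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    LoadImm { rd: u8, imm: u64 },
    Add { rd: u8, ra: u8, rb: u8 },
    AddImm { rd: u8, ra: u8, imm: u64 },
}

/// Fuse `load_imm tmp, imm; add tmp, ra, tmp` into `add_imm tmp, ra, imm`.
/// `block_start[i]` is true when instruction `i` may be a jump target.
fn fuse(code: &[Inst], block_start: &[bool]) -> Vec<Inst> {
    let mut out = Vec::with_capacity(code.len());
    let mut i = 0;
    while i < code.len() {
        // Never fuse across a block boundary: a jump could land on the add.
        if i + 1 < code.len() && !block_start[i + 1] {
            if let (Inst::LoadImm { rd: tmp, imm }, Inst::Add { rd, ra, rb }) =
                (code[i], code[i + 1])
            {
                // The add must consume the temporary as `rb` and overwrite it,
                // and `ra` must be a different register.
                if rd == tmp && rb == tmp && ra != tmp {
                    out.push(Inst::AddImm { rd, ra, imm });
                    i += 2;
                    continue;
                }
            }
        }
        out.push(code[i]);
        i += 1;
    }
    out
}
```

Because gas is charged per block from pre-computed costs, a pass like this would have to keep the block cost derived from the original instruction sequence so that gas consumption stays identical to the recompiler. The sketch also ignores the mirrored operand order (`ra == tmp`) and any PC-to-index bookkeeping.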

Memory access:

  • Current: bounds-checked via read_u8/write_u8 etc. with % (1u64 << 32) masking
  • Consider batch bounds checking or page-table-based dispatch
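
One way to read "batch bounds checking" is sketched below: a 4-byte load done as four independent byte reads versus one range check over the whole access. The `Memory` struct and both `read_u32_*` methods are hypothetical; only `read_u8` and the `% (1u64 << 32)` wrap are named in the issue, and their real signatures may differ. The sketch assumes a flat byte buffer and little-endian loads.

```rust
/// Hypothetical flat guest memory.
struct Memory {
    data: Vec<u8>,
}

impl Memory {
    /// Byte read with 32-bit address wrap; `None` stands in for a fault.
    fn read_u8(&self, addr: u64) -> Option<u8> {
        let a = (addr % (1u64 << 32)) as usize;
        self.data.get(a).copied()
    }

    /// Per-byte path: four wraps and four bounds checks per load.
    fn read_u32_bytewise(&self, addr: u64) -> Option<u32> {
        let mut v = 0u32;
        for i in 0..4u64 {
            let byte = self.read_u8(addr.wrapping_add(i))?;
            v |= u32::from(byte) << (8 * i);
        }
        Some(v)
    }

    /// Batched path: one wrap and one range check for the whole load.
    fn read_u32_batched(&self, addr: u64) -> Option<u32> {
        let a = (addr % (1u64 << 32)) as usize;
        let end = a.checked_add(4)?;
        let b = self.data.get(a..end)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

The two paths are not equivalent in every corner case: the bytewise version wraps each byte address separately, so an access straddling the 2^32 boundary behaves differently if the top and bottom of the address space are both mapped. Any batched variant would need to preserve whatever the recompiler does there.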

Gas metering:

  • Gas block costs are pre-computed — charge is a single subtract + sign check per block
  • Investigate whether the branch on negative gas is a significant branch misprediction source
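
For reference while profiling, the subtract-and-sign-check charge can be sketched as below, with the out-of-gas path moved into a `#[cold]` function as a hint to the compiler about which side of the branch is unlikely. `charge_block`, `out_of_gas`, and the `OutOfGas` type are hypothetical names, and the sketch assumes gas is held as a signed 64-bit counter that is allowed to go negative on exhaustion.

```rust
/// Hypothetical exit reason for gas exhaustion.
#[derive(Debug, PartialEq)]
struct OutOfGas;

/// Cold path: kept out of line so the hot loop stays compact.
#[cold]
#[inline(never)]
fn out_of_gas() -> OutOfGas {
    OutOfGas
}

/// Charge one gas block at block entry: one subtract, one sign check.
#[inline(always)]
fn charge_block(gas: &mut i64, block_cost: u32) -> Result<(), OutOfGas> {
    *gas -= i64::from(block_cost);
    if *gas < 0 {
        return Err(out_of_gas());
    }
    Ok(())
}
```

Since the branch is taken at most once per run, a hardware predictor should normally handle it well; the hint mostly affects code layout. Measuring with and without the check (on a workload that never runs out of gas) would show whether it matters at all.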

Pre-decode cost:

  • Pre-decode runs once per InterpreterProgram::predecode() call
  • For short-lived programs, this is a significant fraction of total time
  • Consider lazy pre-decode (decode on first execution of each block)
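
A possible shape for lazy pre-decode is sketched below: blocks are decoded the first time execution reaches them and cached by their starting offset. The `LazyProgram` type, its fields, and the fixed 2-byte toy encoding are all assumptions; real PVM bytecode is variable-length, and the actual `InterpreterProgram` keeps a pc_to_idx mapping that this sketch does not model.

```rust
use std::collections::HashMap;

/// Hypothetical decoded instruction for a toy 2-byte encoding: [opcode, imm].
#[derive(Clone, Copy, Debug, PartialEq)]
struct Inst {
    opcode: u8,
    imm: u8,
}

/// Program whose blocks are decoded on first use.
struct LazyProgram<'a> {
    code: &'a [u8],
    block_starts: Vec<usize>, // sorted byte offsets of basic block starts
    decoded: HashMap<usize, Vec<Inst>>,
    decode_calls: usize, // instrumentation for the example only
}

impl<'a> LazyProgram<'a> {
    fn new(code: &'a [u8], block_starts: Vec<usize>) -> Self {
        LazyProgram { code, block_starts, decoded: HashMap::new(), decode_calls: 0 }
    }

    /// Return the decoded block starting at `pc`, decoding it if needed.
    fn block(&mut self, pc: usize) -> &[Inst] {
        if !self.decoded.contains_key(&pc) {
            // The block ends at the next block start, or at the end of the code.
            let end = self
                .block_starts
                .iter()
                .copied()
                .find(|&s| s > pc)
                .unwrap_or(self.code.len());
            let insts: Vec<Inst> = self.code[pc..end]
                .chunks_exact(2)
                .map(|c| Inst { opcode: c[0], imm: c[1] })
                .collect();
            self.decode_calls += 1;
            self.decoded.insert(pc, insts);
        }
        &self.decoded[&pc]
    }
}
```

The trade-off is an extra lookup on every block entry in exchange for skipping the decode of blocks that never run, so this would only pay off for short-lived programs and would need a benchmark on both short and long workloads.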

Rules

  • Always benchmark before AND after. Use criterion's built-in comparison.
  • If a change shows no measurable improvement or regresses, revert it.
  • The interpreter must produce the same results and gas consumption as the recompiler; cargo test -p grey-bench verifies this.
  • Do not use polkavm or polkavm-common crates — implement from first principles.
