perf: Store instruction as compact integer, extract fields on demand
Motivation
Currently, decode_instruction eagerly extracts all 12 fields from the encoded
instruction into a struct with separate fields. This has costs:
- Decode time: All fields are extracted upfront, even if not all are used in a given code path
- Memory: Instruction struct is ~48+ bytes vs 16 bytes for a u128 wrapper
An alternative approach is to store the raw encoded instruction and extract
fields via bit operations only when accessed. This trades slightly higher field
access cost for lower decode cost and smaller memory footprint.
Current implementation
    pub struct Instruction {
        pub off0: isize,
        pub off1: isize,
        pub off2: isize,
        pub dst_register: Register,
        pub op0_register: Register,
        pub op1_addr: Op1Addr,
        pub res: Res,
        pub pc_update: PcUpdate,
        pub ap_update: ApUpdate,
        pub fp_update: FpUpdate,
        pub opcode: Opcode,
        pub opcode_extension: OpcodeExtension,
    }
Proposed implementation
    // Copy lets the by-value accessors below be called repeatedly.
    #[derive(Clone, Copy)]
    pub struct Instruction(u128);

    impl Instruction {
        pub fn offset0(self) -> isize { /* bit extraction */ }
        pub fn dst_register(self) -> Register { /* bit extraction */ }
        pub fn opcode_extension(self) -> OpcodeExtension { /* bit extraction */ }
        // ... etc
    }
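As a rough illustration of the accessor style, the sketch below fills in the offset accessors only. It assumes the three offsets sit in bits 0-47 as listed under "Current instruction encoding", and that each 16-bit offset is stored biased by 2^15. The helper name offset_at and the constants are hypothetical and not taken from the codebase.

```rust
/// Hypothetical compact wrapper: only the raw encoded instruction is stored.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Instruction(u128);

impl Instruction {
    /// Each offset occupies 16 bits.
    const OFFSET_MASK: u128 = 0xFFFF;
    /// Assumption: offsets are stored as `value + 2^15` (biased encoding).
    const OFFSET_BIAS: isize = 1 << 15;

    /// Reads the 16-bit offset that starts at bit `shift` and removes the bias.
    #[inline]
    fn offset_at(self, shift: u32) -> isize {
        let raw = ((self.0 >> shift) & Self::OFFSET_MASK) as isize;
        raw - Self::OFFSET_BIAS
    }

    /// off0 lives in bits 0-15.
    #[inline]
    pub fn offset0(self) -> isize {
        self.offset_at(0)
    }

    /// off1 lives in bits 16-31.
    #[inline]
    pub fn offset1(self) -> isize {
        self.offset_at(16)
    }

    /// off2 lives in bits 32-47.
    #[inline]
    pub fn offset2(self) -> isize {
        self.offset_at(32)
    }
}
```

Nothing is computed at construction time in this shape. Each accessor pays for one shift, one mask, and one subtraction when it is called, and the wrapper stays at the size of a u128.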
Prior art
PR #1762 attempted this optimization but is now outdated and has bugs:
- Used u64 encoding, but main now uses u128 to support Stwo opcodes
- Missing OpcodeExtension field (Blake, BlakeFinalize, QM31Operation)
- Incorrect bit masks for ap_update (used 3 bits instead of 2)
- Wrong handling of ApUpdate::Add2 (it's derived from opcode == Call, not encoded separately)
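The last two points can be sketched together. The example below is a minimal, hypothetical decoder for ap_update only: it reads a 2-bit field and derives Add2 from the opcode instead of from a separate encoding. The enum is a local stand-in, the flag positions follow the decoder.rs comment quoted under "Current instruction encoding", and the numeric value used for Call (1) as well as the rejection of a Call with a nonzero ap_update are assumptions to check against the current decode_instruction.

```rust
/// Local stand-in for the real ApUpdate type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApUpdate {
    Regular,
    Add,
    Add1,
    Add2,
}

/// Assumption: the Call opcode is encoded as 1 in the 3-bit opcode field.
const OPCODE_CALL: u16 = 1;

/// Decodes ap_update from the 15-bit flags word (flag bit 0 = dst_reg).
/// Returns None for combinations this sketch treats as invalid.
pub fn decode_ap_update(flags: u16) -> Option<ApUpdate> {
    // ap_update is two bits wide (flag bits 10-11), not three.
    let ap_update_num = (flags >> 10) & 0b11;
    // opcode is three bits wide (flag bits 12-14).
    let is_call = ((flags >> 12) & 0b111) == OPCODE_CALL;

    match (ap_update_num, is_call) {
        // Add2 has no encoding of its own: it is implied by Call.
        (0, true) => Some(ApUpdate::Add2),
        (0, false) => Some(ApUpdate::Regular),
        (1, false) => Some(ApUpdate::Add),
        (2, false) => Some(ApUpdate::Add1),
        // Assumption: everything else (3, or Call with nonzero ap_update) is an error.
        _ => None,
    }
}
```

A 3-bit mask would pull the lowest opcode bit into ap_update_num, so a Call instruction would decode as an invalid ap_update instead of Add2.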
Current instruction encoding
From decoder.rs:
    // opcode_extension|  opcode|ap_update|pc_update|res_logic|op1_src|op0_reg|dst_reg
    //  79 ... 17 16 15|14 13 12|    11 10|    9 8 7|      6 5|  4 3 2|      1|      0

The bit numbers in this comment are relative to the start of the flags, which is bit 48 of the encoded instruction. In absolute terms:
- Bits 0-47: Three 16-bit offsets (off0, off1, off2)
- Bits 48-62: Flags (dst_reg, op0_reg, op1_src, res_logic, pc_update, ap_update, opcode)
- Bits 63+: Opcode extension (0=Stone, 1=Blake, 2=BlakeFinalize, 3=QM31Operation)
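To make the layout concrete, here is a minimal sketch that pulls every flag out of a u128 as a raw number, using only the positions and widths given above. RawFlags, bits, and raw_flags are hypothetical names for illustration. Validation and conversion to the real enum types are deliberately left out.

```rust
/// Flags start at bit 48 of the encoded instruction.
const FLAGS_SHIFT: u32 = 48;
/// The opcode extension starts at bit 63.
const OPCODE_EXTENSION_SHIFT: u32 = 63;

/// Raw, unvalidated field values (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct RawFlags {
    pub dst_reg: u8,
    pub op0_reg: u8,
    pub op1_src: u8,
    pub res_logic: u8,
    pub pc_update: u8,
    pub ap_update: u8,
    pub opcode: u8,
    pub opcode_extension: u128,
}

/// Extracts `width` bits of `word`, starting at bit `lo`.
#[inline]
fn bits(word: u128, lo: u32, width: u32) -> u8 {
    ((word >> lo) & ((1u128 << width) - 1)) as u8
}

/// Splits the flag section of an encoded instruction into raw numbers.
pub fn raw_flags(encoded: u128) -> RawFlags {
    let flags = encoded >> FLAGS_SHIFT;
    RawFlags {
        dst_reg: bits(flags, 0, 1),
        op0_reg: bits(flags, 1, 1),
        op1_src: bits(flags, 2, 3),
        res_logic: bits(flags, 5, 2),
        pc_update: bits(flags, 7, 3),
        ap_update: bits(flags, 10, 2),
        opcode: bits(flags, 12, 3),
        // Everything from bit 63 upward belongs to the extension.
        opcode_extension: encoded >> OPCODE_EXTENSION_SHIFT,
    }
}
```

In the proposed wrapper each of these fields would be its own accessor, so a code path that only needs the opcode would run a single shift and mask.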
Tasks
- Instruction as a u128 wrapper with accessor methods
- TryFrom<u128> with proper validation (matching current decode_instruction error cases)
- ApUpdate::Add2 when opcode == Call && ap_update_num == 0
- Res::Unconstrained when res_logic == 0 && pc_update == Jnz
- FpUpdate derived from Opcode
- #[inline] hints on accessors
References
- vm/src/vm/decoding/decoder.rs
- vm/src/types/instruction.rs