Cache compiled JIT code across nub-kernel invocations #844

Description

@sorpaas

Background

Each `Nub::invoke_cached` call against the Hyperlight backend goes through:

```
host: Nub::invoke_cached
→ guest: FN_ID_NUB_INVOKE_CACHED → nub_invoke_cached
→ jit_run::run_pvm_with_mem
→ Compiler::new(...).compile(code, bitmask) ← always
→ execute the freshly-compiled native code
```

`Compiler::compile` runs on every invocation; there is no cache for JIT-compiled code. Confirmed by grep: `rust/nub-arch-x86/src/jit_run.rs:277` is the single `compile()` call site, and it sits inside the per-call hot path. The host-side state cache (`nub-host-kvm::cache`) already publishes immutable `PublishSpec` slabs into the shared cache region under `instance_hash`; the guest then re-runs the recompiler over those slabs on every call.

For workloads with a short execution body (e.g. `goldilocks_mul` 100k chained muls finishing in ~500 µs), the recompile pass is a meaningful fraction of total time. For long-running Cap invocations it's noise.

Goal

Cache the native code (or the full `CompileResult`: `native_code`, `dispatch_table`, `trap_table`, `exit_label_offset`) keyed on `instance_hash`, so a hot `invoke_cached` is just "locate cached compile + jump into it."
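As a rough illustration of the "locate cached compile + jump into it" shape, here is a minimal sketch of a lookup keyed on `instance_hash`. It is not the proposed implementation: the `JitCache` type, the `InstanceHash` alias (and its 32-byte width), the field types of `CompileResult`, and the `compiles` counter are all assumptions made for the example, and a plain `HashMap` stands in for wherever the cache would actually live.

```rust
use std::collections::HashMap;

/// Stand-in for the real `CompileResult`. The field names come from this
/// issue; the field types are guesses made for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct CompileResult {
    pub native_code: Vec<u8>,
    pub dispatch_table: Vec<u32>,
    pub trap_table: Vec<(u32, u32)>,
    pub exit_label_offset: u32,
}

/// Hypothetical key type; the real width of `instance_hash` is assumed.
pub type InstanceHash = [u8; 32];

/// Hypothetical cache: one compiled artifact per `instance_hash`.
#[derive(Default)]
pub struct JitCache {
    entries: HashMap<InstanceHash, CompileResult>,
    /// Number of times the compile closure actually ran (for illustration).
    pub compiles: u64,
}

impl JitCache {
    /// Return the cached compile for `hash`, running `compile` only on a miss.
    pub fn get_or_compile<F>(&mut self, hash: InstanceHash, compile: F) -> &CompileResult
    where
        F: FnOnce() -> CompileResult,
    {
        if !self.entries.contains_key(&hash) {
            self.compiles += 1;
            let result = compile();
            self.entries.insert(hash, result);
        }
        &self.entries[&hash]
    }
}

/// Placeholder for `Compiler::new(...).compile(code, bitmask)`; it just
/// copies the input so the example has something to cache.
pub fn fake_compile(code: &[u8]) -> CompileResult {
    CompileResult {
        native_code: code.to_vec(),
        dispatch_table: vec![0],
        trap_table: Vec::new(),
        exit_label_offset: code.len() as u32,
    }
}
```

With this shape, two calls to `get_or_compile` under the same hash run the compile closure once; the second call only pays for the map lookup.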

Sketch

  • Compile output is deterministic given `(code, bitmask, jump_table)` plus the `HelperFns` sentinel layout (which is fixed at the call site). So the cache key can just be `instance_hash` — the same hash the state cache already uses.
  • The most natural place is inside the cache region itself — alongside the immutable code/bitmask/jump_table slabs that `cache::publish` already lays down. Add a per-slot `native_code_off / native_code_len` (plus its sidecar tables), compiled lazily on first invoke and reused thereafter.
  • Memory cost: `native_code` is ~5× the size of the PVM bytecode for typical workloads. The cache region's `Talc` allocator will need headroom, and we may want eviction (LRU?) once we have multi-tenant chains.
  • Eviction needs to coordinate with `pin`/`unpin` — a slot being invoked must not have its native code freed.
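The interaction between lazy compilation, a memory budget, and `pin`/`unpin` can be sketched as follows. This is a toy model under stated assumptions, not the cache-region layout: `NativeCodeCache`, `Entry`, the byte `budget`, and the logical `clock` used for LRU ordering are all hypothetical names, and a `HashMap` replaces the per-slot `native_code_off / native_code_len` fields and the `Talc` allocator.

```rust
use std::collections::HashMap;

/// Hypothetical key type; the real width of `instance_hash` is assumed.
pub type InstanceHash = [u8; 32];

/// Hypothetical per-slot record for the compiled artifact.
struct Entry {
    code: Vec<u8>,
    pins: u32,      // > 0 while an invocation is running this code
    last_used: u64, // logical timestamp for LRU ordering
}

/// Toy model of a budgeted native-code cache with pin-aware LRU eviction.
pub struct NativeCodeCache {
    entries: HashMap<InstanceHash, Entry>,
    budget: usize, // maximum total bytes of native code
    used: usize,
    clock: u64,
}

impl NativeCodeCache {
    pub fn new(budget: usize) -> Self {
        Self { entries: HashMap::new(), budget, used: 0, clock: 0 }
    }

    /// Pin `hash`, compiling lazily on first use. Returns `false` if the code
    /// cannot fit even after evicting every unpinned entry; evictions made
    /// during a failed attempt are not rolled back in this sketch.
    pub fn pin<F: FnOnce() -> Vec<u8>>(&mut self, hash: InstanceHash, compile: F) -> bool {
        self.clock += 1;
        let now = self.clock;
        if let Some(e) = self.entries.get_mut(&hash) {
            e.pins += 1;
            e.last_used = now;
            return true;
        }
        let code = compile();
        while self.used + code.len() > self.budget {
            if !self.evict_one() {
                return false;
            }
        }
        self.used += code.len();
        self.entries.insert(hash, Entry { code, pins: 1, last_used: now });
        true
    }

    /// Release one pin; the entry becomes evictable once its count hits zero.
    pub fn unpin(&mut self, hash: &InstanceHash) {
        if let Some(e) = self.entries.get_mut(hash) {
            e.pins = e.pins.saturating_sub(1);
        }
    }

    pub fn contains(&self, hash: &InstanceHash) -> bool {
        self.entries.contains_key(hash)
    }

    /// Evict the least recently used entry that is not pinned, if any.
    fn evict_one(&mut self) -> bool {
        let victim = self
            .entries
            .iter()
            .filter(|(_, e)| e.pins == 0)
            .min_by_key(|(_, e)| e.last_used)
            .map(|(h, _)| *h);
        match victim {
            Some(h) => {
                let e = self.entries.remove(&h).expect("victim was just found");
                self.used -= e.code.len();
                true
            }
            None => false,
        }
    }
}
```

In this model a failed `pin` would correspond to falling back to today's behaviour (compile and execute without caching), which keeps eviction pressure from ever freeing code that a running invocation depends on.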

Constraint on javm-bench

`javm-bench` must keep measuring recompile + exec combined, because that's the cost a fresh Cap invocation pays in production today and the cost a cold cache pays after eviction. Even after this issue lands, the bench should publish + invoke as it does now, with the cache servicing the warm path implicitly. If we want a separate "warm exec only" number for the issue table in #842, we can add an opt-in mode alongside the existing harness, but the headline `recompiler` column stays recompile + exec.

Out of scope

  • Disk-persistent cache: V0 lives in the cache region across the sandbox lifetime; persisting across sandbox boots is a separate problem.
  • Self-modifying / re-publish flow: today `cache::publish` rejects duplicate hashes; eviction + republish is a future API.

Related