feat(save): LazyStore dual-mode WASM backend#9898

Draft
dmadisetti wants to merge 1 commit into main from dm/lazy-store-merge

@dmadisetti dmadisetti commented Jun 15, 2026

Member

Summary

This PR creates a WASM-compatible LazyStore and introduces the DualMode store concept, in which a given store can behave differently on different platforms.

The WASM counterpart of the LazyLoader loads values from an expected cache location using Pyodide's native fetch mechanism, and falls back to caching in memory (since there are no disk writes in WASM).
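
As a rough illustration of the read path described above, here is a minimal sketch of a store that serves from memory first and otherwise falls through to a remote fetch. The class name `MemoryThenFetchStore`, the injected `fetch` callable (a stand-in for a Pyodide-backed HTTP fetch), and the choice to memoize fetched blobs are all assumptions made for the example, not code from this PR.

```python
from typing import Callable, Optional


class MemoryThenFetchStore:
    """Hypothetical read-through store: in-memory first, remote fetch second."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]) -> None:
        self._fetch = fetch  # stand-in for a Pyodide-backed HTTP fetch
        self._memory: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # No durable disk in WASM, so writes only live for the session.
        self._memory[key] = value

    def get(self, key: str) -> Optional[bytes]:
        if key in self._memory:
            return self._memory[key]
        blob = self._fetch(key)
        if blob is not None:
            # Assumption: remember fetched blobs so the next read is local.
            self._memory[key] = blob
        return blob


# Usage with a fake fetcher standing in for the network.
remote = {"abc.pickle": b"cached"}
calls: list[str] = []


def fake_fetch(key: str) -> Optional[bytes]:
    calls.append(key)
    return remote.get(key)


store = MemoryThenFetchStore(fake_fetch)
first = store.get("abc.pickle")   # goes to the fake network
second = store.get("abc.pickle")  # served from memory
missing = store.get("nope")       # not cached anywhere
```

Injecting the fetcher keeps the sketch runnable outside a browser; in a real Pyodide session the callable would wrap whatever fetch mechanism the store actually uses.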

vercel Bot commented Jun 15, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| marimo-docs | Ready | Preview, Comment | Jun 24, 2026 11:21pm |


@dmadisetti dmadisetti added the enhancement (New feature or request) label Jun 15, 2026
Base automatically changed from dm/save-stubs to main June 24, 2026 17:09
@dmadisetti dmadisetti force-pushed the dm/lazy-store-merge branch from b4e80dc to 6cb26c5 Compare June 24, 2026 22:50
dmadisetti added a commit that referenced this pull request Jun 24, 2026
Add CachedLifecycle: an executor lifecycle that skips a cell's body on a
cache hit and backfills its defs on a miss. Detection of upstream
unserializable defs is done at use-site via a duck-typed
`__marimo_unhashable__` marker check (no import coupling to the
serialization toolkit), and a pre-flight ref scan requeues the producing
cells via a soft MarimoCancelCellError + Scheduler.requeue_for_rerun
rather than hard-failing.

Stacked on the LazyStore dual-mode backend (#9898): relies on that PR's
mark-don't-write mechanism so a cell's own unserializable def restores as
an UnhashableStub tripwire instead of raising a PicklingError.
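
The duck-typed marker check mentioned in this commit message can be sketched as follows. Only the attribute name `__marimo_unhashable__` comes from the text; `FakeStub`, `is_unhashable_marker`, and `stale_refs` are hypothetical names for illustration, and the real pre-flight scan may work differently.

```python
from typing import Any


class FakeStub:
    """Hypothetical stand-in for a value that could not be serialized."""

    __marimo_unhashable__ = True


def is_unhashable_marker(value: Any) -> bool:
    # Duck-typed: look for the attribute instead of importing a stub class,
    # so the caller has no import dependency on the serialization code.
    return bool(getattr(value, "__marimo_unhashable__", False))


def stale_refs(refs: dict[str, Any]) -> list[str]:
    """Return names whose current value carries the marker."""
    return sorted(
        name for name, value in refs.items() if is_unhashable_marker(value)
    )


# Names returned here are the ones whose producing cells would need a rerun.
needs_rerun = stale_refs({"model": FakeStub(), "x": 1, "df": [1, 2]})
```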
dmadisetti added a commit that referenced this pull request Jun 24, 2026
When `marimo export html-wasm --execute` runs, the executed session's
LazyLoader writes an export manifest at kernel teardown listing the cache
keys it produced. The export step copies exactly those files into
`<out_dir>/public/cache/`, where the WASM store's HTTP fallback fetches
them — so a cached notebook ships its caches and skips recomputation in
the browser.

Stacked on the LazyStore dual-mode backend (#9898), which provides
`LazyLoader.flush_all()`, the `_ACTIVE_LAZY_LOADERS` registry, and
`WasmExportableStore.export_manifest()`.
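
The copy step described in this commit message might look roughly like the sketch below. The function name `copy_cached_blobs`, the flat on-disk cache layout, and the decision to skip keys with no file on disk are assumptions; only the `<out_dir>/public/cache/` destination comes from the text.

```python
import shutil
import tempfile
from pathlib import Path


def copy_cached_blobs(
    keys: list[str], cache_dir: Path, out_dir: Path
) -> list[Path]:
    """Copy only the listed cache files into <out_dir>/public/cache/."""
    dest_root = out_dir / "public" / "cache"
    copied = []
    for key in keys:
        src = cache_dir / key
        if not src.is_file():
            continue  # assumption: silently skip keys with no blob on disk
        dest = dest_root / key
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


# Usage: only the key named in the (fake) manifest is shipped.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    out = Path(tmp) / "out"
    cache.mkdir()
    (cache / "used.pickle").write_bytes(b"a")
    (cache / "unused.pickle").write_bytes(b"b")
    copied = copy_cached_blobs(["used.pickle", "gone.pickle"], cache, out)
    shipped = sorted(p.name for p in (out / "public" / "cache").iterdir())
```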
Flesh out the LazyStore placeholder into a dual-mode store:

- LazyStore (native): wraps an inner FileStore, tracks written/touched
  keys for export.
- WasmLazyStore (Pyodide): writes to a shared in-session DictStore;
  reads fall through to concurrent HTTP fetch from
  notebook_location()/public/cache/, with path-traversal-safe keys and
  poisoned-key eviction on corrupt restore.
- The single native/WASM decision is made once via a DualLoader registry
  entry (resolve_loader), so nothing downstream re-checks the platform.

Adds WasmExportableStore (export_manifest tracking) + DictStore, the
_ACTIVE_LAZY_LOADERS registry, and LazyLoader.flush_all() for export.

Also folds in the unserializable-def robustness mechanism: rather than
writing a placeholder blob, the loader marks the manifest Item with
unserializable_type and reconstructs the UnhashableStub tripwire on load
(from_item).
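
To make the mark-don't-write idea concrete, here is a minimal sketch under stated assumptions: `ManifestEntry` stands in for the manifest Item, `Tripwire` stands in for UnhashableStub, and plain `pickle` stands in for whatever serializers the loader actually uses. Only the `unserializable_type` field name is taken from the commit message.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ManifestEntry:
    """Hypothetical manifest record; the PR's Item has its own fields."""

    blob_key: Optional[str] = None
    unserializable_type: Optional[str] = None


class Tripwire:
    """Hypothetical stand-in for UnhashableStub."""

    def __init__(self, var_name: str, type_name: str) -> None:
        self.var_name = var_name
        self.type_name = type_name


def save(name: str, value: Any, blobs: dict[str, bytes]) -> ManifestEntry:
    try:
        data = pickle.dumps(value)
    except Exception:
        # Mark, don't write: record the type name and store no blob at all.
        return ManifestEntry(unserializable_type=type(value).__name__)
    blobs[name] = data
    return ManifestEntry(blob_key=name)


def restore(name: str, entry: ManifestEntry, blobs: dict[str, bytes]) -> Any:
    if entry.unserializable_type is not None:
        # Rebuild the tripwire from the manifest alone.
        return Tripwire(name, entry.unserializable_type)
    return pickle.loads(blobs[entry.blob_key])


blobs: dict[str, bytes] = {}
gen_entry = save("gen", (i for i in range(3)), blobs)  # generators don't pickle
list_entry = save("xs", [1, 2, 3], blobs)
restored_gen = restore("gen", gen_entry, blobs)
restored_list = restore("xs", list_entry, blobs)
```

Because nothing is written for the failing value, a later restore cannot trip over a placeholder blob; it only sees the marker in the manifest.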
@dmadisetti dmadisetti force-pushed the dm/lazy-store-merge branch from 6cb26c5 to 45fbfde Compare June 24, 2026 23:20
@dmadisetti dmadisetti changed the title from "feat(save): LazyStore dual-mode backend + torch .pt codec" to "feat(save): LazyStore dual-mode WASM backend" Jun 25, 2026
@dmadisetti dmadisetti requested a review from Copilot June 25, 2026 18:06

Copilot AI left a comment

Contributor

Pull request overview

Adds a WASM-compatible backend for the lazy persistence layer by introducing a “dual-mode” loader concept (native vs Pyodide/WASM), plus new manifest markers to safely represent unserializable defs without writing placeholder blobs. This fits into marimo._save by extending the existing lazy cache loader/store to support Pyodide constraints (no threads, no durable filesystem) and safer restore semantics.

Changes:

  • Introduces DualLoader/resolve_loader so PERSISTENT_LOADERS["lazy"] can resolve to LazyLoader (native) or WasmLazyLoader (WASM) via a single is_pyodide() check.
  • Adds LazyStore/WasmLazyStore and a minimal DictStore, plus a WasmExportableStore interface for batch blob retrieval and export manifest tracking.
  • Adds Item.unserializable_type + from_item(..., var_name=...) to reconstruct UnhashableStub tripwires from the manifest when serialization fails; expands tests around these behaviors (including torch .pt codec coverage).
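
A minimal sketch of the dual-mode registry idea from the first bullet, assuming a simple frozen dataclass for `DualLoader` and a `sys.modules` probe for `is_pyodide()`. The loader classes, the `wasm` override parameter, and the registry contents are hypothetical and exist only so the example runs on any platform.

```python
import sys
from dataclasses import dataclass
from typing import Optional, Union


class NativeLazy: ...   # hypothetical native loader
class WasmLazy: ...     # hypothetical WASM loader
class PickleLoader: ... # hypothetical single-mode loader


@dataclass(frozen=True)
class DualLoader:
    native: type
    wasm: type


def is_pyodide() -> bool:
    # Assumption: treat a loaded "pyodide" module as the platform signal.
    return "pyodide" in sys.modules


def resolve_loader(
    entry: Union[type, DualLoader], wasm: Optional[bool] = None
) -> type:
    """Collapse a registry entry to one concrete loader class."""
    if isinstance(entry, DualLoader):
        use_wasm = is_pyodide() if wasm is None else wasm
        return entry.wasm if use_wasm else entry.native
    return entry  # single-mode entries pass through unchanged


REGISTRY: dict[str, Union[type, DualLoader]] = {
    "pickle": PickleLoader,
    "lazy": DualLoader(native=NativeLazy, wasm=WasmLazy),
}

native_choice = resolve_loader(REGISTRY["lazy"], wasm=False)
wasm_choice = resolve_loader(REGISTRY["lazy"], wasm=True)
plain_choice = resolve_loader(REGISTRY["pickle"], wasm=True)
```

Resolving once at the registry boundary means callers receive an ordinary loader class and never need to ask which platform they are on.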

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tests/_save/stubs/test_unhashable_stub.py` | Adds tests for explicit type_name handling and reconstructing UnhashableStub from manifest Item. |
| `tests/_save/stubs/test_lazy_codecs.py` | New torch-based tests covering lazy codec selection and .pt round-trips. |
| `tests/_save/loaders/test_loader.py` | Adds regression tests ensuring unserializable values don’t write blobs and restore as tripwires; includes UI stale-blob cleanup behavior. |
| `tests/_save/loaders/test_lazy_wasm.py` | New tests for DictStore, native vs WASM LazyStore behaviors, key sanitization, batch-path loading, and synchronous WASM writes. |
| `marimo/_save/stubs/lazy_stub.py` | Extends Item with unserializable_type; updates UnhashableStub to support explicit type_name for manifest reconstruction. |
| `marimo/_save/stores/store.py` | Adds WasmExportableStore interface with get_batch() and export_manifest(). |
| `marimo/_save/stores/dict_store.py` | New in-memory DictStore for WASM sessions. |
| `marimo/_save/stores/__init__.py` | Exports WasmExportableStore. |
| `marimo/_save/save.py` | Resolves loader registry entries via resolve_loader(...) before calling .partial(...). |
| `marimo/_save/loaders/lazy.py` | Major update: adds LazyStore/WasmLazyStore, loader instance registry + flush_all, batch blob read path, unserializable markers, and WASM variants for read/write/eviction behavior. |
| `marimo/_save/loaders/__init__.py` | Adds DualLoader + resolve_loader, registers lazy loader as native/WASM pair. |

Comment on lines +300 to +303
clean = PurePosixPath(key)
if ".." in clean.parts or clean.is_absolute():
    raise ValueError(f"Invalid cache key: {key}")
return str(clean)
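
To see what the check in the excerpt above accepts and rejects, the sketch below lifts the same four lines into a free function and runs a few keys through it. The function name `sanitize_key`, the helper `is_accepted`, and the sample keys are made up for illustration.

```python
from pathlib import PurePosixPath


def sanitize_key(key: str) -> str:
    # Same check as the excerpt, lifted into a standalone function.
    clean = PurePosixPath(key)
    if ".." in clean.parts or clean.is_absolute():
        raise ValueError(f"Invalid cache key: {key}")
    return str(clean)


def is_accepted(key: str) -> bool:
    try:
        sanitize_key(key)
    except ValueError:
        return False
    return True


results = {
    key: is_accepted(key)
    for key in ["abc/def.pickle", "../secret", "a/../../b", "/etc/passwd"]
}
# PurePosixPath also collapses "." components while parsing.
normalized = sanitize_key("a/./b")
```

Relative keys pass through, while any key containing a `..` component or starting at the root is rejected before it can be joined onto the base URL.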
Comment on lines +309 to +315
key = self._sanitize_key(key)
url = f"{self._base_url()}/{key}"
try:
    with urllib.request.urlopen(url) as resp:
        return resp.read() if resp.status == 200 else None
except Exception:
    return None
Comment on lines +339 to +347
try:
    results = loop.run_until_complete(_fetch_all())
except Exception:
    # run_until_complete on the live pyodide loop requires JSPI
    # (WebAssembly stack switching), which e.g. Firefox lacks. Fall
    # back to sequential synchronous XHR via the pyodide_http-patched
    # urllib — legal in a worker.
    results = [(k, self._http_get(k)) for k in keys_list]
yield from results
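
The try-concurrent-then-fall-back pattern in the excerpt above can be exercised outside the browser with a sketch like the following. The free function `fetch_batch`, the injected `fetch_async` and `fetch_sync` callables, and the `NoJspiLoop` stub (which simulates a loop that cannot block on a coroutine) are all assumptions for illustration; the real method lives on the store and talks to Pyodide.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, Iterator, Optional

Blob = Optional[bytes]


def fetch_batch(
    keys: Iterable[str],
    fetch_async: Callable[[str], Awaitable[Blob]],
    fetch_sync: Callable[[str], Blob],
    loop: Any,
) -> Iterator[tuple[str, Blob]]:
    keys_list = list(keys)

    async def _fetch_all() -> list[tuple[str, Blob]]:
        blobs = await asyncio.gather(*(fetch_async(k) for k in keys_list))
        return list(zip(keys_list, blobs))

    coro = _fetch_all()
    try:
        # Preferred path: fetch every key concurrently.
        results = loop.run_until_complete(coro)
    except Exception:
        coro.close()  # avoid a "never awaited" warning if it never started
        # Fallback path: one blocking request per key, in order.
        results = [(k, fetch_sync(k)) for k in keys_list]
    yield from results


class NoJspiLoop:
    """Hypothetical stand-in for a loop that cannot block on a coroutine."""

    def run_until_complete(self, coro: Any) -> Any:
        raise RuntimeError("blocking on the event loop is unsupported here")


remote = {"a": b"1", "b": b"2"}


async def fake_async(key: str) -> Blob:
    return remote.get(key)


def fake_sync(key: str) -> Blob:
    return remote.get(key)


real_loop = asyncio.new_event_loop()
try:
    concurrent = list(fetch_batch(["a", "b", "c"], fake_async, fake_sync, real_loop))
finally:
    real_loop.close()
sequential = list(fetch_batch(["a", "b", "c"], fake_async, fake_sync, NoJspiLoop()))
```

Both paths yield the same `(key, blob)` pairs in the same order, so callers do not need to know which one ran.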
Comment on lines +47 to +49
if TYPE_CHECKING:
    from collections.abc import Callable

@dmadisetti dmadisetti requested a review from mscolnick June 25, 2026 18:27
return False


class WasmExportableStore(Store):

Contributor

Maybe consider just adding these to Store? They seem like useful/common functions, but idk if they're always used.

Contributor

get_batch could have a default impl. export_manifest could maybe be export_keys() or just list_keys(). It could default to empty if it causes churn for other stores.
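
One way the suggestion above could look, as a sketch only: a base class whose `get_batch` defaults to per-key `get` calls and whose `list_keys` defaults to an empty list. The `Store` shown here is a hypothetical simplification, not marimo's actual interface, and `MinimalStore` and `DictStore` are illustrative subclasses.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Optional


class Store(ABC):
    """Hypothetical base class; marimo's real Store has its own interface."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    def get_batch(
        self, keys: Iterable[str]
    ) -> Iterator[tuple[str, Optional[bytes]]]:
        # Default: one get() per key; stores with a bulk path override this.
        for key in keys:
            yield key, self.get(key)

    def list_keys(self) -> list[str]:
        # Default: nothing to export, so existing stores need no changes.
        return []


class MinimalStore(Store):
    """Implements only the abstract methods and inherits both defaults."""

    def get(self, key: str) -> Optional[bytes]:
        return None

    def put(self, key: str, value: bytes) -> None:
        pass


class DictStore(Store):
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def list_keys(self) -> list[str]:
        return sorted(self._data)


store = DictStore()
store.put("b", b"2")
store.put("a", b"1")
batch = list(store.get_batch(["a", "zz"]))
exported = store.list_keys()
inherited = MinimalStore().list_keys()
```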

_POISONED_KEYS: set[str] = set()


class LazyStore(WasmExportableStore):

Contributor

Is it possible to re-use TieredStore for some of these and just mix and match instead of integrating more deeply?

return sorted(self._written_keys | self._touched_keys)


class WasmLazyStore(LazyStore):

Contributor

could this also re-use TieredStore?

