feat(save): LazyStore dual-mode WASM backend #9898
Add CachedLifecycle: an executor lifecycle that skips a cell's body on a cache hit and backfills its defs on a miss. Detection of upstream unserializable defs is done at use-site via a duck-typed `__marimo_unhashable__` marker check (no import coupling to the serialization toolkit), and a pre-flight ref scan requeues the producing cells via a soft MarimoCancelCellError + Scheduler.requeue_for_rerun rather than hard-failing. Stacked on the LazyStore dual-mode backend (#9898): relies on that PR's mark-don't-write mechanism so a cell's own unserializable def restores as an UnhashableStub tripwire instead of raising a PicklingError.
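The use-site check described in this commit message could look roughly like the sketch below. Only the `__marimo_unhashable__` attribute name comes from the message; `FakeStub`, `is_unhashable`, and `find_unhashable_refs` are hypothetical names for illustration and are not marimo's actual API.

```python
from typing import Any


class FakeStub:
    """Hypothetical stand-in for an UnhashableStub-like tripwire."""

    # Marker attribute named in the commit message.
    __marimo_unhashable__ = True

    def __init__(self, var_name: str, type_name: str) -> None:
        self.var_name = var_name
        self.type_name = type_name


def is_unhashable(value: Any) -> bool:
    # Duck-typed: the use site never imports the stub class itself.
    return getattr(value, "__marimo_unhashable__", False) is True


def find_unhashable_refs(refs: set[str], scope: dict[str, Any]) -> list[str]:
    # Pre-flight scan: names whose current value is a tripwire, i.e. defs
    # whose producing cells would need to be requeued before this cell runs.
    return sorted(name for name in refs if is_unhashable(scope.get(name)))


scope = {"model": FakeStub("model", "socket"), "x": 1}
stale = find_unhashable_refs({"model", "x", "missing"}, scope)
```

Because the check relies only on an attribute lookup, any object carrying the marker is treated as a tripwire, which is what keeps the lifecycle free of an import on the serialization toolkit.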
When `marimo export html-wasm --execute` runs, the executed session's LazyLoader writes an export manifest at kernel teardown listing the cache keys it produced. The export step copies exactly those files into `<out_dir>/public/cache/`, where the WASM store's HTTP fallback fetches them — so a cached notebook ships its caches and skips recomputation in the browser. Stacked on the LazyStore dual-mode backend (#9898), which provides `LazyLoader.flush_all()`, the `_ACTIVE_LAZY_LOADERS` registry, and `WasmExportableStore.export_manifest()`.
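A minimal sketch of the copy step, assuming the export manifest is a plain list of cache keys relative to a cache directory. `copy_cache_files` and the file names are hypothetical; only the `<out_dir>/public/cache/` destination comes from the description above.

```python
import shutil
import tempfile
from pathlib import Path


def copy_cache_files(
    manifest_keys: list[str], cache_dir: Path, out_dir: Path
) -> list[Path]:
    """Copy only the listed cache files into <out_dir>/public/cache/."""
    dest_root = out_dir / "public" / "cache"
    copied = []
    for key in manifest_keys:
        src = cache_dir / key
        if not src.is_file():
            continue  # assumption: silently skip keys with no file on disk
        dest = dest_root / key
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        copied.append(dest)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    cache_dir, out_dir = Path(tmp) / "cache", Path(tmp) / "out"
    cache_dir.mkdir()
    (cache_dir / "used.pickle").write_bytes(b"a")
    (cache_dir / "unused.pickle").write_bytes(b"b")
    copied = copy_cache_files(["used.pickle"], cache_dir, out_dir)
    shipped = sorted(p.name for p in (out_dir / "public" / "cache").iterdir())
```

Driving the copy from the manifest rather than from a directory listing is what keeps stale or unrelated cache files out of the exported bundle.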
Flesh out the LazyStore placeholder into a dual-mode store:
- LazyStore (native): wraps an inner FileStore, tracks written/touched keys for export.
- WasmLazyStore (Pyodide): writes to a shared in-session DictStore; reads fall through to concurrent HTTP fetch from notebook_location()/public/cache/, with path-traversal-safe keys and poisoned-key eviction on corrupt restore.
- The single native/WASM decision is made once via a DualLoader registry entry (resolve_loader), so nothing downstream re-checks the platform.

Adds WasmExportableStore (export_manifest tracking) + DictStore, the _ACTIVE_LAZY_LOADERS registry, and LazyLoader.flush_all() for export.

Also folds in the unserializable-def robustness mechanism: rather than writing a placeholder blob, the loader marks the manifest Item with unserializable_type and reconstructs the UnhashableStub tripwire on load (from_item).
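The written/touched key tracking can be sketched as a thin wrapper around an inner store. This is a simplified illustration, not the PR's code: `TrackingStore` and this `DictStore` shape are assumptions, and only the `sorted(written | touched)` manifest expression mirrors a line visible in the diff.

```python
from typing import Optional


class DictStore:
    """Minimal in-memory store (hypothetical shape)."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> bool:
        self._data[key] = value
        return True


class TrackingStore:
    """Wraps an inner store and records which keys were written or read."""

    def __init__(self, inner: DictStore) -> None:
        self._inner = inner
        self._written_keys: set[str] = set()
        self._touched_keys: set[str] = set()

    def put(self, key: str, value: bytes) -> bool:
        ok = self._inner.put(key, value)
        if ok:
            self._written_keys.add(key)
        return ok

    def get(self, key: str) -> Optional[bytes]:
        value = self._inner.get(key)
        if value is not None:
            # A cache hit counts as "touched": the export needs this blob too.
            self._touched_keys.add(key)
        return value

    def export_manifest(self) -> list[str]:
        return sorted(self._written_keys | self._touched_keys)


inner = DictStore()
inner.put("old", b"1")  # blob left over from an earlier session
store = TrackingStore(inner)
store.put("new", b"2")
store.get("old")
store.get("absent")  # a miss is not recorded
manifest = store.export_manifest()
```

Tracking reads as well as writes matters for export: a run that hits an existing cache entry never writes it, yet the exported notebook still needs that blob.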
Pull request overview
Adds a WASM-compatible backend for the lazy persistence layer by introducing a “dual-mode” loader concept (native vs Pyodide/WASM), plus new manifest markers to safely represent unserializable defs without writing placeholder blobs. This fits into marimo._save by extending the existing lazy cache loader/store to support Pyodide constraints (no threads, no durable filesystem) and safer restore semantics.
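The "marker instead of placeholder blob" idea can be illustrated with a small sketch. It assumes pickle as the serializer and a manifest of per-variable items; `save_def`, `load_def`, and `Tripwire` are hypothetical names, and only the `unserializable_type` field name is taken from this PR.

```python
import pickle
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Item:
    key: Optional[str] = None  # blob key, set when a blob was written
    unserializable_type: Optional[str] = None  # set when nothing was written


class Tripwire:
    """Hypothetical stand-in for UnhashableStub."""

    __marimo_unhashable__ = True

    def __init__(self, var_name: str, type_name: str) -> None:
        self.var_name = var_name
        self.type_name = type_name


def save_def(name: str, value: Any, blobs: dict[str, bytes]) -> Item:
    try:
        data = pickle.dumps(value)
    except Exception:
        # Mark, don't write: record the type name and store no blob at all.
        return Item(unserializable_type=type(value).__name__)
    blobs[name] = data
    return Item(key=name)


def load_def(name: str, item: Item, blobs: dict[str, bytes]) -> Any:
    if item.unserializable_type is not None:
        # Rebuild the tripwire purely from manifest metadata.
        return Tripwire(name, item.unserializable_type)
    return pickle.loads(blobs[item.key])


blobs: dict[str, bytes] = {}
manifest = {
    "x": save_def("x", 41, blobs),
    "lock": save_def("lock", threading.Lock(), blobs),
}
restored = {name: load_def(name, item, blobs) for name, item in manifest.items()}
```

With this shape, a failed serialization leaves nothing in the blob store that could later be mistaken for real data, and the restore path never has to unpickle a placeholder.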
Changes:
- Introduces `DualLoader`/`resolve_loader` so `PERSISTENT_LOADERS["lazy"]` can resolve to `LazyLoader` (native) or `WasmLazyLoader` (WASM) via a single `is_pyodide()` check.
- Adds `LazyStore`/`WasmLazyStore` and a minimal `DictStore`, plus a `WasmExportableStore` interface for batch blob retrieval and export manifest tracking.
- Adds `Item.unserializable_type` + `from_item(..., var_name=...)` to reconstruct `UnhashableStub` tripwires from the manifest when serialization fails; expands tests around these behaviors (including torch `.pt` codec coverage).
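The single-decision registry entry in the first bullet might be sketched as follows. The class bodies, the `LOADERS` dict, and the `wasm` override parameter are assumptions added for illustration and testability; the platform check via `sys.modules` is likewise an assumption about how `is_pyodide()` could be implemented.

```python
import sys
from dataclasses import dataclass
from typing import Optional, Union


class NativeLoader:
    """Hypothetical stand-in for the native loader class."""


class WasmLoader:
    """Hypothetical stand-in for the WASM loader class."""


@dataclass(frozen=True)
class DualLoader:
    native: type
    wasm: type


def is_pyodide() -> bool:
    # Assumption: running under Pyodide is detected by the imported module.
    return "pyodide" in sys.modules


def resolve_loader(
    entry: Union[type, DualLoader], *, wasm: Optional[bool] = None
) -> type:
    # Plain entries pass through; dual entries are resolved exactly once here.
    if isinstance(entry, DualLoader):
        use_wasm = is_pyodide() if wasm is None else wasm
        return entry.wasm if use_wasm else entry.native
    return entry


LOADERS = {
    "lazy": DualLoader(native=NativeLoader, wasm=WasmLoader),
    "plain": NativeLoader,
}

on_native = resolve_loader(LOADERS["lazy"], wasm=False)
on_wasm = resolve_loader(LOADERS["lazy"], wasm=True)
```

Callers receive an ordinary loader class either way, so code downstream of the registry lookup does not need its own platform branch.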
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/_save/stubs/test_unhashable_stub.py | Adds tests for explicit type_name handling and reconstructing UnhashableStub from manifest Item. |
| tests/_save/stubs/test_lazy_codecs.py | New torch-based tests covering lazy codec selection and .pt round-trips. |
| tests/_save/loaders/test_loader.py | Adds regression tests ensuring unserializable values don’t write blobs and restore as tripwires; includes UI stale-blob cleanup behavior. |
| tests/_save/loaders/test_lazy_wasm.py | New tests for DictStore, native vs WASM LazyStore behaviors, key sanitization, batch-path loading, and synchronous WASM writes. |
| marimo/_save/stubs/lazy_stub.py | Extends Item with unserializable_type; updates UnhashableStub to support explicit type_name for manifest reconstruction. |
| marimo/_save/stores/store.py | Adds WasmExportableStore interface with get_batch() and export_manifest(). |
| marimo/_save/stores/dict_store.py | New in-memory DictStore for WASM sessions. |
| marimo/_save/stores/__init__.py | Exports WasmExportableStore. |
| marimo/_save/save.py | Resolves loader registry entries via resolve_loader(...) before calling .partial(...). |
| marimo/_save/loaders/lazy.py | Major update: adds LazyStore/WasmLazyStore, loader instance registry + flush_all, batch blob read path, unserializable markers, and WASM variants for read/write/eviction behavior. |
| marimo/_save/loaders/__init__.py | Adds DualLoader + resolve_loader, registers lazy loader as native/WASM pair. |

    clean = PurePosixPath(key)
    if ".." in clean.parts or clean.is_absolute():
        raise ValueError(f"Invalid cache key: {key}")
    return str(clean)
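For reference, the hunk above can be exercised as a standalone function. The body is taken from the diff; wrapping it in a free function named `sanitize_key` and the sample keys are illustrative assumptions.

```python
from pathlib import PurePosixPath


def sanitize_key(key: str) -> str:
    """Reject keys that could escape the cache directory or URL prefix."""
    clean = PurePosixPath(key)
    if ".." in clean.parts or clean.is_absolute():
        raise ValueError(f"Invalid cache key: {key}")
    return str(clean)


def is_rejected(key: str) -> bool:
    try:
        sanitize_key(key)
    except ValueError:
        return True
    return False


ok = sanitize_key("abc123/value.pickle")
rejected = [k for k in ("../secret", "a/../../b", "/etc/passwd") if is_rejected(k)]
```

Note that `PurePosixPath` does not collapse `..` segments on its own, so the explicit `parts` check is what catches traversal in the middle of a key such as `a/../../b`.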

    key = self._sanitize_key(key)
    url = f"{self._base_url()}/{key}"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read() if resp.status == 200 else None
    except Exception:
        return None

    try:
        results = loop.run_until_complete(_fetch_all())
    except Exception:
        # run_until_complete on the live pyodide loop requires JSPI
        # (WebAssembly stack switching), which e.g. Firefox lacks. Fall
        # back to sequential synchronous XHR via the pyodide_http-patched
        # urllib — legal in a worker.
        results = [(k, self._http_get(k)) for k in keys_list]
    yield from results
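The concurrent-then-sequential fallback in the hunk above can be tried outside Pyodide with a stand-in. In this sketch the in-memory `BLOBS` dict replaces HTTP, `asyncio.run` replaces `run_until_complete` on the live Pyodide loop, and `broken_runner` simulates a browser without stack switching; all of these names are assumptions for illustration.

```python
import asyncio
from typing import Callable, Iterable, Iterator, Optional

BLOBS = {"a": b"1", "b": b"2"}  # stand-in for files under public/cache/
Pair = tuple[str, Optional[bytes]]


def http_get(key: str) -> Optional[bytes]:
    # Stand-in for the synchronous per-key fetch.
    return BLOBS.get(key)


def run_concurrent(keys: list[str]) -> list[Pair]:
    async def _fetch_one(key: str) -> Optional[bytes]:
        await asyncio.sleep(0)  # stand-in for an awaitable network fetch
        return http_get(key)

    async def _fetch_all() -> list[Pair]:
        values = await asyncio.gather(*(_fetch_one(k) for k in keys))
        return list(zip(keys, values))

    return asyncio.run(_fetch_all())


def broken_runner(keys: list[str]) -> list[Pair]:
    # Simulates an environment where the concurrent path cannot run.
    raise RuntimeError("stack switching unavailable")


def get_batch(
    keys: Iterable[str], runner: Callable[[list[str]], list[Pair]]
) -> Iterator[Pair]:
    keys_list = list(keys)
    try:
        results = runner(keys_list)
    except Exception:
        # Fall back to one synchronous fetch per key.
        results = [(k, http_get(k)) for k in keys_list]
    yield from results


fast = dict(get_batch(["a", "missing"], run_concurrent))
slow = dict(get_batch(["a", "missing"], broken_runner))
```

Both paths yield the same `(key, value)` pairs, with `None` for a missing blob, so callers cannot tell which path was taken except by latency.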

    if TYPE_CHECKING:
        from collections.abc import Callable

        return False


    class WasmExportableStore(Store):
maybe consider just adding these to Store? they seem like useful/common functions, but idk if they're always used
get_batch could have a default impl. export_manifest could maybe be export_keys() or just list_keys(). it could default to empty if it causes churn for other stores
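One way to read this suggestion is sketched below: the base class carries a default `get_batch` built on `get`, and a key-listing method that defaults to empty. This `Store` is a simplified stand-in rather than marimo's actual base class, and `list_keys` is just one of the names floated in the comment.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Optional


class Store(ABC):
    """Simplified stand-in for the base store interface."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> bool: ...

    def get_batch(
        self, keys: Iterable[str]
    ) -> Iterator[tuple[str, Optional[bytes]]]:
        # Default: sequential gets. Stores with a faster batch path override this.
        for key in keys:
            yield key, self.get(key)

    def list_keys(self) -> list[str]:
        # Default: nothing to export, so existing stores need no changes.
        return []


class MemoryStore(Store):
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> bool:
        self._data[key] = value
        return True


store = MemoryStore()
store.put("a", b"1")
batch = list(store.get_batch(["a", "b"]))
```

With defaults on the base class, only stores that track exports or have a concurrent fetch path would need to override anything.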

    _POISONED_KEYS: set[str] = set()


    class LazyStore(WasmExportableStore):
is it possible to re-use TieredStore for some of these and just mix and match instead of integrate more deeply?

            return sorted(self._written_keys | self._touched_keys)


    class WasmLazyStore(LazyStore):
could this also re-use TieredStore?
Summary
This PR creates a WASM-compatible LazyStore and introduces the `DualMode` store concept, where a targeted store can have different behaviors on different platforms. The LazyLoader wasm `dual` loads values from an expected cache value using pyodide's native `fetch` mechanism, and falls back to caching in memory (since there are no disk writes in WASM).