virtual-zarr · maxrjones · Dec 19, 2025 · Jan 22, 2026 · Jan 22, 2026
diff --git a/docs/api/aiohttp.md b/docs/api/aiohttp.md
@@ -1 +1,5 @@
-::: obspec_utils.aiohttp.AiohttpStore
+::: obspec_utils.aiohttp
+    options:
+      members: true
+      filters:
+        - "!^_"  # Exclude private members
diff --git a/docs/api/obspec.md b/docs/api/obspec.md
@@ -1,2 +1,5 @@
-::: obspec_utils.obspec.StoreReader
-::: obspec_utils.obspec.StoreMemCacheReader
+::: obspec_utils.obspec
+    options:
+      filters:
+        - "!^_"  # Exclude private members
+        - "!ReadableStore*"
diff --git a/docs/api/obstore.md b/docs/api/obstore.md
@@ -1,2 +1,5 @@
-::: obspec_utils.obstore.ObstoreReader
-::: obspec_utils.obstore.ObstoreMemCacheReader
+::: obspec_utils.obstore
+    options:
+      members: true
+      filters:
+        - "!^_"  # Exclude private members
diff --git a/docs/api/registry.md b/docs/api/registry.md
@@ -1,2 +1,5 @@
-::: obspec_utils.registry.ObjectStoreRegistry
-::: obspec_utils.registry.UrlKey
+::: obspec_utils.registry
+    options:
+      members: true
+      filters:
+        - "!^_"  # Exclude private members
diff --git a/docs/api/typing.md b/docs/api/typing.md
@@ -1,4 +1,7 @@
 ::: obspec_utils.obspec.ReadableStore
 
-::: obspec_utils.typing.Url
-::: obspec_utils.typing.Path
+::: obspec_utils.typing
+    options:
+      members: true
+      filters:
+        - "!^_"  # Exclude private members
diff --git a/docs/benchmark.md b/docs/benchmark.md
@@ -0,0 +1,180 @@
+# Benchmarking
+
+`obspec-utils` includes a benchmark script for comparing the performance of different approaches to reading cloud-hosted data.
+
+## Benchmark Script
+
+The benchmark script compares fsspec, obstore readers, and VirtualiZarr + Icechunk approaches for reading NetCDF files from S3.
+
+??? note "View full script"
+
+    ```python
+    --8<-- "scripts/benchmark_readers.py"
+    ```
+
+## Running the Benchmark
+
+```bash
+# Full benchmark with default settings
+uv run scripts/benchmark_readers.py
+
+# Quick test with fewer files
+uv run scripts/benchmark_readers.py --n-files 2
+
+# Skip specific benchmarks
+uv run scripts/benchmark_readers.py --skip fsspec_default obstore_eager
+
+# Label results for a specific environment
+uv run scripts/benchmark_readers.py --environment cloud --description "AWS us-west-2"
+```
+
+## Benchmark Results
+
+```python exec="on"
+import json
+from pathlib import Path
+
+results_file = Path("scripts/benchmark_timings.json")
+
+if results_file.exists():
+    with open(results_file) as f:
+        all_results = json.load(f)
+
+    for env_name, env_data in all_results.items():
+        print(f"### {env_data.get('description', env_name)}")
+        print()
+        print(f"- **Environment**: {env_data.get('environment', 'unknown')}")
+        print(f"- **Files tested**: {env_data.get('n_files', 'N/A')}")
+        print(f"- **Timestamp**: {env_data.get('timestamp', 'N/A')}")
+        print()
+
+        timings = env_data.get("timings", {})
+        if timings:
+            # Sort by total time
+            sorted_methods = sorted(timings.items(), key=lambda x: x[1].get("total", float("inf")))
+            fastest_total = sorted_methods[0][1].get("total", 1) if sorted_methods else 1
+
+            print("| Method | Open | Spatial | Time Slice | Timeseries | **Total** |")
+            print("|--------|-----:|--------:|-----------:|-----------:|----------:|")
+
+            for method, times in sorted_methods:
+                total = times.get("total", 0)
+                slowdown = total / fastest_total if fastest_total > 0 else 1
+                slowdown_str = " ⚡" if slowdown <= 1.01 else f" ({slowdown:.1f}x)"
+
+                print(
+                    f"| {method} | "
+                    f"{times.get('open', 0):.2f}s | "
+                    f"{times.get('spatial_subset_load', 0):.2f}s | "
+                    f"{times.get('time_slice_load', 0):.2f}s | "
+                    f"{times.get('timeseries_load', 0):.2f}s | "
+                    f"**{total:.2f}s**{slowdown_str} |"
+                )
+
+            print()
+            print("*All times in seconds. Lower is better.*")
+            print()
+else:
+    print("*No benchmark results available. Run the benchmark script to generate results.*")
+```
+
+
+## Command Line Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--environment` | `local` | Label for this run (`local` or `cloud`) |
+| `--description` | auto | Description for this run |
+| `--n-files` | 5 | Number of files to test with |
+| `--output` | `benchmark_timings.json` | Output JSON file |
+| `--skip` | `fsspec_default` | Benchmarks to skip |
+
+## Benchmarked Methods
+
+| Method | Description |
+|--------|-------------|
+| `fsspec_default_cache` | fsspec with default caching strategy |
+| `fsspec_block_cache` | fsspec with 8 MB block cache |
+| `obstore_reader` | Basic `ObstoreReader` with buffered reads |
+| `obstore_eager` | `ObstoreEagerReader` - loads entire file into memory |
+| `obstore_prefetch` | `ObstorePrefetchReader` - background prefetching |
+| `obstore_parallel` | `ObstoreParallelReader` - parallel range fetching |
+| `obstore_hybrid` | `ObstoreHybridReader` - exponential readahead + parallel fetching |
+| `virtualzarr_icechunk` | VirtualiZarr + Icechunk for virtual Zarr stores |
+
+
+## File Handlers Comparison
+
+### ObstoreReader
+
+The basic reader with configurable buffer size. Best for simple sequential reads.
+
+```python
+from obspec_utils import ObstoreReader
+
+reader = ObstoreReader(store, path, buffer_size=1024*1024)
+```
+
+### ObstoreEagerReader
+
+Loads the entire file into memory before reading. Best when files will be read multiple times and are small enough to fit in memory.
+
+```python
+from obspec_utils import ObstoreEagerReader
+
+reader = ObstoreEagerReader(store, path)
+```
+
+### ObstorePrefetchReader
+
+Prefetches upcoming byte ranges in background threads. Best for sequential read patterns.
+
+```python
+from obspec_utils import ObstorePrefetchReader
+
+reader = ObstorePrefetchReader(
+    store, path,
+    prefetch_size=4*1024*1024,  # 4 MB ahead
+    chunk_size=1024*1024,        # 1 MB chunks
+    max_workers=2,
+)
+```
+
+### ObstoreParallelReader
+
+Fetches multiple byte ranges in parallel using `get_ranges`. Best for random access patterns.
+
+```python
+from obspec_utils import ObstoreParallelReader
+
+reader = ObstoreParallelReader(
+    store, path,
+    chunk_size=1024*1024,   # 1 MB chunks
+    batch_size=16,          # Up to 16 parallel fetches
+)
+```
+
+### ObstoreHybridReader
+
+Combines exponential readahead (for metadata) with parallel chunk fetching (for data). Best for HDF5/NetCDF files.
+
+```python
+from obspec_utils import ObstoreHybridReader
+
+reader = ObstoreHybridReader(
+    store, path,
+    initial_readahead=32*1024,   # Start with 32 KB
+    readahead_multiplier=2.0,    # Double each time
+    chunk_size=1024*1024,        # 1 MB chunks for data
+)
+```
+
+## Choosing the Right Reader
+
+| Use Case | Recommended Reader |
+|----------|-------------------|
+| Small files, repeated access | `ObstoreEagerReader` |
+| Sequential reads, streaming | `ObstorePrefetchReader` |
+| Random access, array chunks | `ObstoreParallelReader` |
+| HDF5/NetCDF files | `ObstoreHybridReader` |
+| Simple, one-time reads | `ObstoreReader` |
diff --git a/docs/index.md b/docs/index.md
@@ -10,7 +10,12 @@ Utilities for interacting with object storage, based on [obspec](https://github.
 
 2. **ReadableStore Protocol**: A minimal protocol defining the read-only interface required for object storage access. This allows alternative backends (like aiohttp) to be used instead of obstore.
 
-3. **File Handlers**: Wrappers around obstore's file reading capabilities that provide a familiar file-like interface.
+3. **File Handlers**: Wrappers around obstore's file reading capabilities that provide a familiar file-like interface, making it easy to integrate with libraries that expect standard Python file objects:
+   - `ObstoreReader`: Basic reader with buffered reads
+   - `ObstoreEagerReader`: Eagerly loads entire file into memory
+   - `ObstorePrefetchReader`: Background prefetching for sequential reads
+   - `ObstoreParallelReader`: Parallel range fetching for random access
+   - `ObstoreHybridReader`: Combines exponential readahead with parallel fetching
 
 ## Design Philosophy
 
@@ -110,20 +115,27 @@ data = cached_reader.readall()
 For maximum performance with obstore, use the obstore-specific readers which leverage obstore's native `ReadableFile`:
 
 ```python
+import xarray as xr
 from obstore.store import S3Store
-from obspec_utils.obstore import ObstoreReader, ObstoreMemCacheReader
+from obspec_utils.obstore import ObstoreReader, ObstoreEagerReader, ObstoreHybridReader
 
 store = S3Store(bucket="my-bucket")
 
 # Uses obstore's optimized buffered reader
 reader = ObstoreReader(store, "path/to/file.bin", buffer_size=1024*1024)
 data = reader.read(100)
 
-# Uses obstore's MemoryStore for caching
-cached_reader = ObstoreMemCacheReader(store, "path/to/file.bin")
-data = cached_reader.readall()
+# Eager reader: loads the entire file into memory, suited to repeated access
+cached_reader = ObstoreEagerReader(store, "path/to/file.bin")
+data = cached_reader.readall()  # Served from the in-memory copy
+
+# Hybrid reader for HDF5/NetCDF files (recommended for xarray)
+with ObstoreHybridReader(store, "path/to/file.nc") as reader:
+    ds = xr.open_dataset(reader, engine="h5netcdf")
 ```
 
+See the [Benchmarking](benchmark.md) page for performance comparisons between the different readers.
+
 ## Contributing
 
 1. Clone the repository: `git clone https://github.com/virtual-zarr/obspec-utils.git`

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -14,6 +14,7 @@ extra:
 
 nav:
   - "index.md"
+  - "benchmark.md"
   - "API":
       - Typing: "api/typing.md"
       - Aiohttp Adapters: "api/aiohttp.md"