diff --git a/dev-docs/specs/2026-06-16-api-roadmap.md b/dev-docs/specs/2026-06-16-api-roadmap.md
new file mode 100644
index 0000000..2978da3
--- /dev/null
+++ b/dev-docs/specs/2026-06-16-api-roadmap.md
@@ -0,0 +1,113 @@
+# zarrista — API roadmap & next areas
+
+**Date:** 2026-06-16
+**Status:** Draft
+
+## Positioning
+
+zarrista is **not** the same kind of project as the official
+[`zarrs-python`](https://github.com/zarrs/zarrs-python). That binding is small and
+narrow: it injects the `zarrs` codec pipeline *behind* zarr-python, accelerating
+encode/decode while zarr-python owns the API, stores, indexing, and metadata.
+
+zarrista is being explored as a **low-level Zarr API in its own right** — one that
+could *replace* zarr-python, or that zarr-python could *depend on* for its core in
+the medium term. That ambition sets the design constraints below.
+
+### Design mindset vs. shipping order
+
+These are deliberately different:
+
+- **Design mindset: zarr-python replacement.** Every API decision should be made as
+  if this library will eventually need writing, full indexing semantics, a
+  pluggable store abstraction, groups-with-creation, and consolidated metadata.
+  Don't paint ourselves into a read-only or numerics-only corner with the type
+  signatures, class hierarchy, or store traits.
+- **Shipping order: fast standalone cloud reader first.** The immediate goal is to
+  get something real working end-to-end so we can **benchmark** against zarr-python
+  on cloud reads. The reader path (async + obstore) is where zarrs should already
+  beat zarr-python, so it's both the fastest path to a demo and the most
+  differentiated.
+
+The litmus test for any near-term decision: *does it move us toward a benchmarkable
+cloud reader, without foreclosing the replacement-grade API later?*
+
+## Current surface (as of this doc)
+
+Read-only, async-first metadata + raw-chunk reader:
+
+- `Array` / `AsyncArray`: metadata properties (`shape`, `dtype`, `ndim`, `attrs`,
+  `metadata`, `chunk_grid`, `codecs`, `dimension_names`, `path`) plus
+  `retrieve_chunk(chunk_indices)`.
+- `Group` / `AsyncGroup`: `attrs`, `array_keys()`, `group_keys()`, child navigation.
+- `Data`: zero-copy numpy via the buffer protocol.
+- Stores: sync `FilesystemStore` / `MemoryStore`; async = any `obstore.ObjectStore`.
+- Dtypes: fixed-width numerics only (bool, int/uint 8–64, float16/32/64).
+- No writing, no array-coordinate indexing, no fill values, no var-length dtypes.
+
+The key gap: `retrieve_chunk` is a *chunk-coordinate* primitive. Users think in
+*array coordinates*. Closing that gap is what turns this from a chunk inspector into
+an array library.
+
+## Tier 1 — makes it usable (do first)
+
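+1. **Array indexing / `__getitem__`.** Map a Python key of `slice`/`int`/`Ellipsis`/`None`
+   to a `zarrs::array_subset::ArraySubset`, call `retrieve_array_subset_opt`, and
+   return an ndarray. An `ArraySubset` is a contiguous box, so ints and unit-step
+   slices map onto it directly. Ints then drop their axis, `None` only inserts a
+   length-1 axis into the result, and stepped slices need striding after the read.
+   Start with **basic indexing** (slices + ints + ellipsis); defer
+   orthogonal/vectorized/boolean indexing to a later pass (mirroring zarr-python's
+   `.oindex` / `.vindex` split). The `retrieve_array_subset` path is already
+   stubbed/commented in `array/sync.rs`. This is the single highest-impact change.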
+
+2. **Fill values + edge chunks.** Indexing forces this: subsets spanning the array
+   boundary or hitting missing chunks need the fill value. Extraction code already
+   exists commented-out in `dtype.rs`. Expose as `Array.fill_value` *and* wire into
+   the subset read path — without it, partial-edge-chunk reads are wrong.
+
+3. **Complete the dtype story.** Add **variable-length strings** (targeting numpy 2
+   `StringDType`) and **fixed-width bytes**. Structured/complex dtypes can wait.
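+
+A minimal, std-only sketch of the key normalization behind item 1. It assumes a
+hypothetical `Key` enum that the PyO3 layer would build from the Python key,
+handles only unit-step slices, and reports errors as strings where the binding
+would raise `IndexError`. The output is the per-axis ranges an `ArraySubset`
+would be built from, plus the numpy-style result shape (ints drop their axis,
+`None` inserts a length-1 axis).
+
+```rust
+use std::iter::repeat;
+use std::ops::Range;
+
+/// One element of a basic-indexing key (hypothetical; built by the PyO3 layer).
+#[derive(Clone, Copy, Debug)]
+enum Key {
+    Int(i64),
+    /// Unit-step slice only; stepped slices are out of scope here.
+    Slice { start: Option<i64>, stop: Option<i64> },
+    Ellipsis,
+    NewAxis, // Python `None`
+}
+
+#[derive(Debug, PartialEq)]
+struct Selection {
+    /// One range per array axis: the box an `ArraySubset` would describe.
+    ranges: Vec<Range<u64>>,
+    /// Shape of the returned ndarray after dropping/inserting axes.
+    out_shape: Vec<usize>,
+}
+
+fn normalize(key: &[Key], shape: &[u64]) -> Result<Selection, String> {
+    let n_ellipsis = key.iter().filter(|k| matches!(k, Key::Ellipsis)).count();
+    if n_ellipsis > 1 {
+        return Err("an index can only have a single ellipsis".into());
+    }
+    let consumed = key.iter().filter(|k| matches!(k, Key::Int(_) | Key::Slice { .. })).count();
+    if consumed > shape.len() {
+        return Err("too many indices for array".into());
+    }
+    // The ellipsis (or, if absent, the end of the key) expands to full slices.
+    let full = Key::Slice { start: None, stop: None };
+    let missing = shape.len() - consumed;
+    let mut expanded = Vec::with_capacity(key.len() + missing);
+    for &k in key {
+        match k {
+            Key::Ellipsis => expanded.extend(repeat(full).take(missing)),
+            other => expanded.push(other),
+        }
+    }
+    if n_ellipsis == 0 {
+        expanded.extend(repeat(full).take(missing));
+    }
+
+    let (mut ranges, mut out_shape, mut axis) = (Vec::new(), Vec::new(), 0);
+    for k in expanded {
+        match k {
+            Key::NewAxis => out_shape.push(1), // result shape only, no I/O
+            Key::Int(i) => {
+                let n = shape[axis] as i64;
+                let j = if i < 0 { i + n } else { i };
+                if !(0..n).contains(&j) {
+                    return Err(format!("index {i} is out of bounds for axis {axis} with size {n}"));
+                }
+                ranges.push(j as u64..j as u64 + 1); // read one element; axis is dropped
+                axis += 1;
+            }
+            Key::Slice { start, stop } => {
+                let n = shape[axis] as i64;
+                // Python step-1 semantics: wrap a negative bound once, then clamp to [0, n].
+                let norm = |v: i64| (if v < 0 { v + n } else { v }).clamp(0, n);
+                let lo = start.map_or(0, norm);
+                let hi = stop.map_or(n, norm).max(lo); // empty when stop <= start
+                ranges.push(lo as u64..hi as u64);
+                out_shape.push((hi - lo) as usize);
+                axis += 1;
+            }
+            Key::Ellipsis => unreachable!("expanded above"),
+        }
+    }
+    Ok(Selection { ranges, out_shape })
+}
+```
+
+Keeping this step free of store access makes it cheap to test against numpy's
+own results before any I/O is wired in.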
+
+## Tier 2 — where zarrista should beat zarr-python
+
+The differentiation, and the thing to benchmark:
+
+4. **Parallel multi-chunk / subset reads.** Lean on zarrs's concurrent codec+I/O
+   pipeline. Expose `retrieve_chunks(list_of_indices)` and make `__getitem__` over a
+   multi-chunk region fan out internally with a configurable concurrency limit
+   (`zarrs` `CodecOptions`/concurrency knobs). Headline: a single
+   `await arr[big_slice]` pulling hundreds of chunks concurrently from S3.
+
+5. **`retrieve_*_into` / preallocated output.** Decode into a caller-provided
+   buffer to avoid an allocation and integrate cleanly with xarray/dask block
+   fetching. Builds on the existing `Data` buffer-protocol work.
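+
+A std-only 2-D sketch of items 4 and 5 together, with item 2's fill rule for
+missing chunks: enumerate the chunks a region touches on a regular grid, fetch
+them with a bounded pool of worker threads, and scatter each result into a
+caller-provided buffer. `fetch` is a hypothetical stand-in for a zarrs chunk
+decode and `f64` is fixed for brevity; the real path would lean on zarrs's own
+concurrency and `CodecOptions` rather than hand-rolled threads. It assumes edge
+chunks arrive at full chunk shape, which is how Zarr stores them on a regular grid.
+
+```rust
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::mpsc;
+use std::thread;
+
+/// Grid coordinates of every chunk overlapping rows[0]..rows[1] x cols[0]..cols[1].
+fn chunks_for_region(rows: [usize; 2], cols: [usize; 2], chunk: [usize; 2]) -> Vec<[usize; 2]> {
+    let mut out = Vec::new();
+    if rows[0] >= rows[1] || cols[0] >= cols[1] {
+        return out; // an empty selection touches no chunks
+    }
+    for cr in rows[0] / chunk[0]..=(rows[1] - 1) / chunk[0] {
+        for cc in cols[0] / chunk[1]..=(cols[1] - 1) / chunk[1] {
+            out.push([cr, cc]);
+        }
+    }
+    out
+}
+
+/// Fetch the overlapping chunks with at most `max_concurrency` workers and
+/// scatter them into the caller-provided row-major `out` buffer.
+/// `fetch` is a hypothetical decode; `None` marks a missing chunk (use `fill`).
+fn read_region_into<F>(
+    rows: [usize; 2],
+    cols: [usize; 2],
+    chunk: [usize; 2],
+    max_concurrency: usize,
+    fill: f64,
+    fetch: F,
+    out: &mut [f64],
+) where
+    F: Fn([usize; 2]) -> Option<Vec<f64>> + Sync,
+{
+    let width = cols[1].saturating_sub(cols[0]);
+    assert_eq!(out.len(), rows[1].saturating_sub(rows[0]) * width, "output buffer has wrong size");
+    let todo = chunks_for_region(rows, cols, chunk);
+    let next = AtomicUsize::new(0);
+    let (tx, rx) = mpsc::channel();
+    thread::scope(|s| {
+        for _ in 0..max_concurrency.max(1) {
+            let tx = tx.clone();
+            let (todo, next, fetch) = (&todo, &next, &fetch);
+            // Each worker claims the next unfetched chunk until none remain.
+            s.spawn(move || loop {
+                let i = next.fetch_add(1, Ordering::Relaxed);
+                let Some(&idx) = todo.get(i) else { break };
+                tx.send((idx, fetch(idx))).expect("receiver outlives workers");
+            });
+        }
+        drop(tx); // the receive loop ends once every worker has exited
+        for (idx, data) in rx {
+            let (r0, c0) = (idx[0] * chunk[0], idx[1] * chunk[1]); // chunk origin
+            for r in r0.max(rows[0])..(r0 + chunk[0]).min(rows[1]) {
+                for c in c0.max(cols[0])..(c0 + chunk[1]).min(cols[1]) {
+                    // Full-shape chunks, so the row stride inside a chunk is chunk[1].
+                    let v = data.as_ref().map_or(fill, |d| d[(r - r0) * chunk[1] + (c - c0)]);
+                    out[(r - rows[0]) * width + (c - cols[0])] = v;
+                }
+            }
+        }
+    });
+}
+```
+
+Scattering on the receiving thread keeps the output buffer single-writer, so no
+locking is needed around it.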
+
+## Tier 3 — larger projects (replacement-grade, sequence deliberately)
+
+6. **Writing.** New axis: writable stores (obstore PUT), `store_chunk` /
+   `store_array_subset`, `create_array` / `create_group`, resize. Required for the
+   replacement goal; not required for the first benchmark.
+
+7. **Store extensibility.** (a) Let obstore back the **sync** path too via
+   `block_on`, so we don't maintain two store worlds. (b) A Python-implementable
+   `Store` protocol for custom backends, mirroring zarr-python's `Store` ABC.
+
+8. **Consolidated metadata + group creation** — replacement-grade parity items.
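+
+Item 7(a) in outline: one async store trait, and a sync facade that drives it to
+completion. The sketch uses a tiny std-only `block_on` and a hypothetical
+`AsyncStore` trait (native `async fn` in traits needs Rust 1.75+) so it runs
+anywhere. object_store's network-backed futures generally expect a Tokio runtime,
+so the real sync path would likely call the runtime's own `block_on` rather than
+this executor.
+
+```rust
+use std::collections::HashMap;
+use std::future::Future;
+use std::pin::pin;
+use std::sync::Arc;
+use std::task::{Context, Poll, Wake, Waker};
+use std::thread::{self, Thread};
+
+/// Waker that unparks the thread blocked in `block_on`.
+struct Unparker(Thread);
+
+impl Wake for Unparker {
+    fn wake(self: Arc<Self>) {
+        self.0.unpark();
+    }
+}
+
+/// Minimal single-future executor: poll, park until woken, poll again.
+/// Spurious wake-ups are harmless because the loop simply re-polls.
+fn block_on<F: Future>(fut: F) -> F::Output {
+    let mut fut = pin!(fut);
+    let waker = Waker::from(Arc::new(Unparker(thread::current())));
+    let mut cx = Context::from_waker(&waker);
+    loop {
+        match fut.as_mut().poll(&mut cx) {
+            Poll::Ready(out) => return out,
+            Poll::Pending => thread::park(),
+        }
+    }
+}
+
+/// Hypothetical async store interface (stand-in for an obstore-backed store).
+trait AsyncStore {
+    async fn get(&self, key: &str) -> Option<Vec<u8>>;
+}
+
+/// Sync facade: the sync API reuses the async store instead of a second backend.
+struct SyncStore<S>(S);
+
+impl<S: AsyncStore> SyncStore<S> {
+    fn get(&self, key: &str) -> Option<Vec<u8>> {
+        block_on(self.0.get(key))
+    }
+}
+
+/// Toy in-memory backend for the example.
+struct MemStore(HashMap<String, Vec<u8>>);
+
+impl AsyncStore for MemStore {
+    async fn get(&self, key: &str) -> Option<Vec<u8>> {
+        self.0.get(key).cloned()
+    }
+}
+```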
+
+## Cross-cutting: testing
+
+Currently ~one smoke test. Before Tier 1 lands, stand up **round-trip tests against
+zarr-python**: write with zarr-python, read with zarrista, assert equality across
+dtypes/codecs/sharding. Cheapest way to buy correctness confidence and a prerequisite
+for trustworthy benchmarks. Do this in parallel with Tier 1.
+
+## Recommended sequence
+
+1. Round-trip test harness vs. zarr-python (parallel, ongoing).
+2. Tier 1: indexing → fill values/dtypes.
+3. Tier 2: parallel bulk reads → benchmark vs. zarr-python on a real cloud dataset.
+4. Reassess with benchmark numbers in hand before committing to Tier 3 (writing).
+
+## Out of scope (for now)
+
+Writing; full fancy/boolean indexing; consolidated metadata; group creation; custom
+Python stores. All are in-scope for the *design* (don't foreclose them) but not for
+the first benchmarkable milestone.
diff --git a/dev-docs/specs/2026-06-18-string-dtype-design.md b/dev-docs/specs/2026-06-18-string-dtype-design.md
new file mode 100644
index 0000000..30a8660
--- /dev/null
+++ b/dev-docs/specs/2026-06-18-string-dtype-design.md
@@ -0,0 +1,130 @@
+# zarrista — variable-length string dtype
+
+**Date:** 2026-06-18
+**Status:** Approved
+
+## Goal
+
+Let zarrista read Zarr arrays whose dtype is **variable-length UTF-8 string**,
+returning them as a numpy 2 `StringDType` array via `Data.to_numpy()`. This is the
+first slice of Tier 1, item 3 of the [API roadmap](2026-06-16-api-roadmap.md)
+("complete the dtype story"). Fixed-width raw bytes, complex, and fixed UTF-32 are
+explicitly deferred to follow-up tasks.
+
+## Guiding principle (settled in brainstorming)
+
+**Rust produces the decoded payload; Python owns numpy-dtype construction.** The
+buffer protocol carries *bytes*, not type semantics, for anything richer than a
+buffer-native scalar. For variable-length strings this is not just a preference —
+it is forced: rust-numpy has no safe API for numpy 2 `StringDType`
+([PyO3/rust-numpy#505](https://github.com/PyO3/rust-numpy/issues/505)). So the
+numpy array is always built on the Python side, and Rust never touches
+`StringDType`.
+
+## Dtype categories
+
+This frames the broader "dtype story" so the string work slots in cleanly. Only
+**category 3** is implemented in this round.
+
+| Category | Members | Held as | Buffer protocol | `to_numpy` |
+|---|---|---|---|---|
+| 1. Buffer-native | existing 12 numerics | `ArrayD<T>` | typed, strided (unchanged) | `np.asarray(self)` — zero-copy typed view |
+| 2. Raw fixed-width | `r*N`, complex, UTF-32 (future) | bytes + numpy-dtype string + shape | flat `B` buffer | `np.frombuffer(self, dt).reshape(shape)` |
+| 3. Variable-length | **`string` (this round)**, bytes (future) | `ArrayD<String>` | none | build `list[str]` → `np.array(lst, StringDType()).reshape(shape)` |
+
+Category 2 infrastructure is **not** built in this round — string needs none of it.
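+
+The three categories can be sketched as a classifier over Zarr v3 dtype names.
+The category-1 codes are the standard PEP 3118 / `struct` format characters
+(native byte order), and the `r<bits>` to `|V<n>` mapping follows the deferred
+category-2 row; both tables are illustrative assumptions rather than the current
+`buffer_format` code.
+
+```rust
+/// How a Zarr dtype reaches numpy (one variant per table row).
+#[derive(Debug, PartialEq)]
+enum Category {
+    /// 1. Exported via the buffer protocol with this PEP 3118 format code.
+    BufferNative(&'static str),
+    /// 2. Flat bytes plus a numpy dtype string for `np.frombuffer` (future work).
+    RawFixed(String),
+    /// 3. Decoded per element; the numpy array is built in Python.
+    VariableLength,
+}
+
+/// `None` means "not supported yet" (e.g. complex, bfloat16).
+fn classify(zarr_dtype: &str) -> Option<Category> {
+    let code = match zarr_dtype {
+        "bool" => "?",
+        "int8" => "b",
+        "uint8" => "B",
+        "int16" => "h",
+        "uint16" => "H",
+        "int32" => "i",
+        "uint32" => "I",
+        "int64" => "q",
+        "uint64" => "Q",
+        "float16" => "e",
+        "float32" => "f",
+        "float64" => "d",
+        "string" => return Some(Category::VariableLength),
+        other => {
+            // Zarr v3 raw bits: "r<bits>", with bits a positive multiple of 8.
+            let bits: usize = other.strip_prefix('r')?.parse().ok()?;
+            return (bits > 0 && bits % 8 == 0).then(|| Category::RawFixed(format!("|V{}", bits / 8)));
+        }
+    };
+    Some(Category::BufferNative(code))
+}
+```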
+
+## What zarrs gives us
+
+- `DataType::String` is `StringDataType`; `dtype.is::<StringDataType>()` identifies it.
+- `String: ElementOwned` — zarrs does the UTF-8 decode for us.
+- With the `ndarray` feature (already enabled), `retrieve_array_subset_ndarray::<String>()`
+  and `retrieve_chunk_ndarray::<String>()` return `ArrayD<String>`.
+- The `vlen_utf8` codec (along with `vlen` and `vlen_v2`) is implemented in zarrs,
+  so it should decode what zarr-python writes for a string array (see Risks).
+- Edge/missing chunks are filled with the array's string fill value automatically
+  during retrieval — no special handling here.
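+
+For orientation, the interleaved layout written by numcodecs' `VLenUTF8` (and, as
+far as we understand, by zarr-python's `vlen-utf8` codec): a little-endian `u32`
+item count, then for each item a little-endian `u32` byte length followed by that
+many UTF-8 bytes. zarrs already decodes this, so the std-only sketch below is
+illustrative only; it shows how truncation or invalid UTF-8 would surface as an
+error rather than as silent garbage.
+
+```rust
+/// Encode strings in the interleaved vlen-utf8 layout (used to test the decoder).
+fn encode_vlen_utf8(items: &[&str]) -> Vec<u8> {
+    let mut out = (items.len() as u32).to_le_bytes().to_vec(); // header: item count
+    for s in items {
+        out.extend_from_slice(&(s.len() as u32).to_le_bytes()); // byte length, not char count
+        out.extend_from_slice(s.as_bytes());
+    }
+    out
+}
+
+/// Decode one chunk's vlen-utf8 bytes into owned strings, in stored (C) order.
+fn decode_vlen_utf8(buf: &[u8]) -> Result<Vec<String>, String> {
+    let read_u32 = |at: usize| -> Result<usize, String> {
+        buf.get(at..at + 4)
+            .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
+            .ok_or_else(|| format!("truncated length prefix at byte {at}"))
+    };
+    let n = read_u32(0)?;
+    let mut pos = 4;
+    let mut out = Vec::with_capacity(n.min(buf.len())); // don't trust n for allocation
+    for _ in 0..n {
+        let len = read_u32(pos)?;
+        pos += 4;
+        let bytes = buf
+            .get(pos..pos + len)
+            .ok_or_else(|| format!("truncated item at byte {pos}"))?;
+        out.push(String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())?);
+        pos += len;
+    }
+    Ok(out)
+}
+```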
+
+## Changes
+
+### `src/data.rs` — `DataInner` and `PyData`
+
+1. **Add a variant:** `DataInner::String(ArrayD<String>)`.
+2. **`with_array!` gains a `String($a) => $body` arm.** This compiles because the
+   only remaining `with_array!` call-site bodies are `a.shape()`, `a.strides()`, and
+   `a.as_ptr()` — all valid for `ArrayD<String>`. (See deletion below for why no
+   body needs `numpy::Element`.)
+3. **Delete `to_numpy_with_copy`.** It is the `buffer_format == None` fallback and is
+   currently dead (all 12 numerics have a format). String is handled by an explicit
+   branch in `to_numpy` instead, so the fallback — the one body that needed
+   `PyArray::from_array` (and thus `numpy::Element`, which `String` is not) — is
+   removed. This is what lets `with_array!` accept a `String` arm.
+4. **`buffer_format`:** add `String(_) => None`. `__getbuffer__` already rejects the
+   `None` case with `PyBufferError` ("no buffer-protocol representation") — strings
+   are not buffer-exportable, by design.
+5. **`itemsize` / `data_ptr`:** add `String(_)` arms. These are only reached from the
+   buffer-protocol path, which `String` never enters, so the arms exist solely for
+   match exhaustiveness (`itemsize` may return the size of a `String` element;
+   `data_ptr` returns the `ArrayD<String>` pointer). The stored `strides` are unused
+   for strings.
+6. **`to_numpy` becomes a 2-way branch:**
+   - `DataInner::String(arr)` → build the numpy array (below).
+   - everything else → `np.asarray(self)` (zero-copy typed view, unchanged).
+
+   The `StringDType` builder: import `numpy`, construct `np.dtypes.StringDType()`,
+   build a flat `list[str]` from `arr.iter()` (ndarray visits elements in logical
+   row-major order regardless of memory layout, which matches numpy's default
+   C-order `reshape`), then
+   `np.array(flat_list, dtype=string_dtype).reshape(arr.shape())`. Empty arrays
+   (a zero-length axis) reshape correctly from an empty list.
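+
+A std-only mock of the `data.rs` changes, assuming a minimal stand-in for
+ndarray's `ArrayD` and only two numeric variants. It shows why `with_array!`
+accepts a `String` arm once no body needs `numpy::Element` (the bodies only call
+`shape()` / `as_ptr()`), that `buffer_format` returns `None` for strings, and the
+flat row-major `Vec<String>` plus shape that `to_numpy` would pass to
+`np.array(flat, dtype=StringDType()).reshape(shape)`. `string_payload` is a
+hypothetical helper name.
+
+```rust
+/// Minimal stand-in for `ndarray::ArrayD<T>`: row-major storage plus a shape.
+struct ArrayD<T> {
+    data: Vec<T>,
+    shape: Vec<usize>,
+}
+
+impl<T> ArrayD<T> {
+    fn shape(&self) -> &[usize] {
+        &self.shape
+    }
+    fn as_ptr(&self) -> *const T {
+        self.data.as_ptr()
+    }
+    /// Logical row-major order, like ndarray's `iter()`.
+    fn iter(&self) -> std::slice::Iter<'_, T> {
+        self.data.iter()
+    }
+}
+
+enum DataInner {
+    Float64(ArrayD<f64>),
+    Int32(ArrayD<i32>),
+    String(ArrayD<String>),
+}
+
+/// Run one generic body against whichever array variant is held.
+macro_rules! with_array {
+    ($inner:expr, $a:ident => $body:expr) => {
+        match $inner {
+            DataInner::Float64($a) => $body,
+            DataInner::Int32($a) => $body,
+            DataInner::String($a) => $body, // compiles: no body needs numpy::Element
+        }
+    };
+}
+
+impl DataInner {
+    fn shape(&self) -> Vec<usize> {
+        with_array!(self, a => a.shape().to_vec())
+    }
+    /// Buffer-protocol only; the `String` arm exists just for exhaustiveness.
+    fn data_ptr(&self) -> *const u8 {
+        with_array!(self, a => a.as_ptr() as *const u8)
+    }
+    /// PEP 3118 format code, or `None` when there is no buffer representation.
+    fn buffer_format(&self) -> Option<&'static str> {
+        match self {
+            DataInner::Float64(_) => Some("d"),
+            DataInner::Int32(_) => Some("i"),
+            DataInner::String(_) => None, // __getbuffer__ rejects with PyBufferError
+        }
+    }
+    /// Flat row-major strings plus shape: the inputs for the StringDType build.
+    fn string_payload(&self) -> Option<(Vec<String>, Vec<usize>)> {
+        match self {
+            DataInner::String(a) => Some((a.iter().cloned().collect(), a.shape().to_vec())),
+            _ => None,
+        }
+    }
+}
+```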
+
+### `src/array/sync.rs` and `src/array/async.rs` — read dispatch
+
+In both `retrieve_array_subset` and `retrieve_chunk`, after the existing
+`for_each_dtype!` macro loop, add an explicit string arm:
+
+```rust
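+// Variable-length strings: zarrs decodes into `ArrayD<String>`; `to_numpy` later
+// builds the numpy `StringDType` array on the Python side.
+if dtype.is::<StringDataType>() {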
+    let data = self.inner.retrieve_array_subset_ndarray::<String>(&array_subset)?;
+    return Ok(PyData::from(DataInner::String(data)));
+}
+```
+
+(and the `retrieve_chunk_ndarray::<String>` analogue for `retrieve_chunk`). On
+`AsyncArray` the same arm uses the async method names —
+`async_retrieve_array_subset_ndarray::<String>` and
+`async_retrieve_chunk_ndarray::<String>` — matching how the existing numeric arms
+call `async_retrieve_array_subset::<ArrayD<$elem>>`. The trailing
+`NotImplementedError` for unsupported dtypes stays as the fallback for everything
+still unimplemented.
+
+## Testing
+
+- **Python round-trip vs. zarr-python** (extends the harness seeded by the indexing
+  work): write variable-length string arrays with zarr-python across a few
+  shapes / chunkings, including a partial-edge-chunk case, read them back with
+  zarrista's `FilesystemStore`, and assert `Data.to_numpy()` equals the zarr-python
+  array. Cover both `retrieve_array_subset`/`__getitem__` and `retrieve_chunk`.
+  Include a selection with an empty result (a zero-length axis).
+- **Rust unit test:** a small `ArrayD<String>` → `DataInner::String` → `to_numpy`
+  check is hard without a Python interpreter; rely on the Python round-trip for
+  end-to-end coverage and keep any Rust-side test limited to construction.
+- **Tooling:** rebuild with `maturin develop` after Rust changes; run tests with
+  `uv run --no-project pytest` so uv does not rebuild on every invocation.
+
+## Risks
+
+- **zarr-python's string encoding.** The round-trip assumes zarr-python emits a
+  string array zarrs can decode via `vlen_utf8`/`vlen`. If zarr-python uses an
+  encoding zarrs does not recognize, the round-trip test will surface it; resolving
+  any codec mismatch is part of this task.
+
+## Out of scope (deferred)
+
+- **Fixed-width raw bytes** (`r*N` → `|V<n>`) and the category-2 `frombuffer`
+  infrastructure — next task.
+- **complex64/128 and fixed UTF-32** — the "fold in" follow-up task.
+- **bfloat16** (drags in an `ml_dtypes` runtime dependency) and **variable-length
+  bytes** (numpy object array, different semantics).
+- **String fill-value scalar exposure** (`fill_value_to_py` / `Array.fill_value`) —
+  the separate roadmap item 2.
+- **Writing** string arrays.