feat(cache): support single byte ranges on Cache.Open by alecthomas · Pull Request #346 · block/cachew

alecthomas · 2026-06-23T10:08:11Z

Intent

Make the cache range-aware so callers can fetch a slice of an object instead of the whole thing, and add a primitive that uses that capability to pull large objects faster by reading their chunks in parallel.

Features

Byte ranges on `Cache.Open`

Open accepts a Range option and resolves it end to end — through the client SDK, every backend (disk, memory, S3, remote, tiered), and the server's HTTP serving.

Typed, half-open API: cache.Range(start, end) requests [start, end); a negative end means "to the end of the object". cache.Range(0, -1) is the whole object, cache.Range(0, 500) the first 500 bytes.
If-Range (ETag): cache.IfRange(etag) only applies the range when the stored ETag matches, otherwise the full object is served.
Efficient slicing per backend: disk seeks + io.LimitReader, memory slices, S3 issues a single ranged GET, remote forwards the range over the wire; Tiered skips backfill on partial reads so a truncated object is never cached.
HTTP semantics: serving emits 206 Partial Content with Content-Range, 416 Range Not Satisfiable for out-of-bounds ranges, and advertises Accept-Ranges: bytes. Stat/HEAD ignores Range. Incoming client Range headers (including suffix bytes=-N) are forwarded verbatim by the proxy handlers.

Scope: a single byte range only — multi-range and malformed Range headers fall back to a full 200; If-Range supports the entity-tag form.

`ParallelGet` helper

cache.ParallelGet(ctx, c, key, dst io.WriterAt, chunkSize, concurrency) downloads an object from any range-capable Cache by fetching its chunks concurrently — generalising the S3 backend's parallel download to any implementation, so latency-bound backends like the remote cache can saturate bandwidth.

Open-first discovery: the first ranged Open yields chunk zero, the total size (from Content-Range), and the ETag — no separate Stat.
Revision-consistent: every subsequent chunk is pinned with If-Range to the discovery ETag; a chunk whose ETag differs (the key was rewritten mid-download) is reported as an error rather than splicing two revisions. Missing or short chunks are errors too.
Streams to a caller-owned sink: each worker copies its range straight to dst at the right offset via io.NewOffsetWriter + io.Copy (no full-object buffering); concurrency is bounded by errgroup.SetLimit. The caller owns dst's lifecycle and need not pre-size it.

Add Range/If-Range support to Cache.Open mirroring the existing ETag conditional pattern. RequestOptions gains Range/IfRange fields and a ResolveRange resolver for a single byte range, gated on the stored ETag via If-Range. Backends slice the body (disk seek, memory slice, S3 ranged GET) and set Content-Range; out-of-bounds ranges return ErrRangeNotSatisfiable. Tiered skips backfill on partial reads to avoid caching truncated objects. Serving emits 206/416 and advertises Accept-Ranges; Stat ignores Range. Multi-range falls back to a full 200. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address code-review findings on the byte-range support: - Strip Range, If-Range and Content-Range on PUT (httputil.TransportHeaders), alongside the existing If-Match/If-None-Match. A stored Content-Range would otherwise be replayed and make a plain GET spuriously answer 206. - Return the stored headers (not nil) on the disk Seek error path, matching the 416 path. - Add tests: stored-Content-Range regression, full-size and suffix Content-Length assertions, and a zero-length-object 416 case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the raw-string Range(spec) option with a typed Range(start, end) taking a half-open byte interval; a negative end means "to the end of the object" (Content-Length). e.g. Range(0, 500) is the first 500 bytes and Range(0, -1) the whole object. The raw HTTP header form is retained as RangeHeader(spec) for forwarding a client's Range verbatim (used by httputil and for suffix ranges that the typed form can't express). The wire format and server-side parsing are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The client SDK targets the apiv1 endpoint and only needs the typed Range(start, end) form, so remove the raw RangeHeader option (and its cache re-export). The server's proxy handlers still forward an external client's verbatim Range header (e.g. suffix "bytes=-N") by setting the shared RequestOptions.Range field directly in httputil. Suffix coverage stays via the existing apiv1 end-to-end test; the redundant suite subtest is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 047fa1e5ce

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…ases A 416 carries no body, but the backend had already set Content-Length to the full object size, so clients could wait for bytes that never arrive. Drop it on the unsatisfiable-range path. Also route the conditional/range option logic (range.go, conditional.go, tiered.go) through the cache package's own aliases rather than referencing the client package directly; add NewRequestOptions/RequestOptions/RangeOutcome aliases to the cache API for that. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94b87ca6cc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

alecthomas · 2026-06-23T10:42:41Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94b87ca6cc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

ServeCacheHit infers 206 from a Content-Range header, so a Content-Range persisted in an object's stored metadata (pre-existing objects, or direct Cache.Create callers that bypass the APIV1 PUT filter) would turn a plain full GET into a spurious 206. rangeShortCircuit now drops Content-Range on the full-response path, so the 206 signal only ever reflects a real range. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Generalise the S3 backend's parallel chunked download to any Cache: ParallelGet Stats the object, then fetches chunkSize-byte ranges concurrently (up to a given concurrency) via ranged Opens, reassembling them in order through a pipe. Latency-bound backends such as the remote cache can saturate bandwidth with overlapping reads. Small objects or concurrency<2 fall back to a single Open. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b771b61ddb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Rework ParallelGet to write into a caller-supplied io.WriterAt instead of returning a streaming io.ReadCloser, which removes the ordered-channel reassembly, the io.Pipe, and every io.ReadAll the previous version inherited from the S3 code: - Open-first discovery: the first ranged Open yields chunk zero, the total size (from Content-Range) and the ETag, replacing the separate Stat round trip. - Every remaining chunk is pinned with IfRange to that ETag; a chunk whose ETag differs (object rewritten mid-download) returns an error rather than splicing revisions. Missing or short chunks are likewise errors. - Each worker streams its range straight to dst via io.NewOffsetWriter + io.Copy at non-overlapping offsets; concurrency is bounded with errgroup.SetLimit. The caller owns dst's lifecycle and need not pre-size it (WriteAt extends it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

With errgroup SetLimit, the dispatch loop kept queuing every remaining chunk once a failure cancelled the group, blocking on the limiter only to bail immediately. Break out of the loop when egCtx is cancelled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alecthomas · 2026-06-23T11:45:32Z

@codex review

chatgpt-codex-connector · 2026-06-23T11:51:19Z

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 4ac137be00

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

The discovery Open must stay synchronous to learn the total size and ETag, but its body was also being copied to the sink serially before the remaining chunks were scheduled, so a large first chunk blocked all parallelism. Hand the already -open first chunk to a goroutine in the errgroup alongside the other workers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ece7d749b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Stat ignores Range and never runs rangeShortCircuit, so a Content-Range left in an object's stored metadata was echoed on a 200 HEAD, advertising partial metadata for the full object. Move the strip into conditionalShortCircuit, the one step both Stat and Open run on stored headers, so Stat (and the 304 paths) never carry it and Open only does when rangeShortCircuit sets a real range. The now-redundant strip in rangeShortCircuit is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc2176897

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

The TransportHeaders strip list ran only in the apiv1 PUT handler, so direct Cache.Create callers could persist transport headers (incl. Content-Range) as object metadata. Apply httputil.FilterHeaders with that list in the disk, memory and S3 Create paths, so every writer gets identical hygiene. This complements — rather than replaces — the read-time strip in conditionalShortCircuit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0c7cbf5d5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…no ETag Without an ETag the per-chunk IfRange pinning is a no-op (IfRange("") is ignored and an empty ETag matches an empty ETag), so chunks from different revisions could be spliced undetected during a rewrite. Objects stored before ETags were recorded have none, so rather than failing, fall back to a single revision-consistent read instead of parallelising. Single-chunk objects (one read) are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alecthomas · 2026-06-24T00:00:27Z

@codex review

chatgpt-codex-connector · 2026-06-24T00:07:10Z

Codex Review: Didn't find any major issues. Bravo.

Reviewed commit: ca3550cedb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

alecthomas and others added 4 commits June 23, 2026 16:28

alecthomas requested a review from a team as a code owner June 23, 2026 10:08

alecthomas requested review from joshfriend and removed request for a team June 23, 2026 10:08

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/cache/range.go Outdated

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/httputil/conditional.go

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/httputil/conditional.go

alecthomas and others added 2 commits June 23, 2026 21:09

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/cache/parallel_get.go Outdated

Comment thread internal/cache/parallel_get.go Outdated

alecthomas and others added 2 commits June 23, 2026 21:37

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/httputil/conditional.go

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/cache/disk.go

alecthomas force-pushed the aat/range-support branch from ed6532c to d0c7cbf Compare June 23, 2026 23:38

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread internal/cache/parallel_get.go

alecthomas mentioned this pull request Jun 23, 2026

Disk backend ranged reads are not revision-atomic (open fd vs. metadata) #348

Open

alecthomas force-pushed the aat/range-support branch from 251494f to ca3550c Compare June 23, 2026 23:56

alecthomas requested a review from worstell June 24, 2026 00:00

alecthomas merged commit d5e4b5b into main Jun 24, 2026
8 checks passed

alecthomas deleted the aat/range-support branch June 24, 2026 00:31

Uh oh!

Conversation

alecthomas commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Intent

Features

Byte ranges on Cache.Open

ParallelGet helper

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

alecthomas commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

alecthomas commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

alecthomas commented Jun 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

alecthomas commented Jun 23, 2026 •

edited

Loading

Byte ranges on `Cache.Open`

`ParallelGet` helper