feat(cache): support single byte ranges on Cache.Open #346

Merged
alecthomas merged 13 commits into main from aat/range-support
Jun 24, 2026

@alecthomas

@alecthomas alecthomas commented Jun 23, 2026

Collaborator

Intent

Make the cache range-aware so callers can fetch a slice of an object instead of the whole thing, and add a primitive that uses that capability to pull large objects faster by reading their chunks in parallel.

Features

Byte ranges on Cache.Open

Open accepts a Range option and resolves it end to end — through the client SDK, every backend (disk, memory, S3, remote, tiered), and the server's HTTP serving.

  • Typed, half-open API: cache.Range(start, end) requests [start, end); a negative end means "to the end of the object". cache.Range(0, -1) is the whole object, cache.Range(0, 500) the first 500 bytes.
  • If-Range (ETag): cache.IfRange(etag) only applies the range when the stored ETag matches, otherwise the full object is served.
  • Efficient slicing per backend: disk seeks + io.LimitReader, memory slices, S3 issues a single ranged GET, remote forwards the range over the wire; Tiered skips backfill on partial reads so a truncated object is never cached.
  • HTTP semantics: serving emits 206 Partial Content with Content-Range, 416 Range Not Satisfiable for out-of-bounds ranges, and advertises Accept-Ranges: bytes. Stat/HEAD ignores Range. Incoming client Range headers (including suffix bytes=-N) are forwarded verbatim by the proxy handlers.

Scope: a single byte range only. Multi-range and malformed Range headers fall back to a full 200 response, and If-Range supports the entity-tag form only (not the HTTP-date form).

ParallelGet helper

cache.ParallelGet(ctx, c, key, dst io.WriterAt, chunkSize, concurrency) downloads an object from any range-capable Cache by fetching its chunks concurrently — generalising the S3 backend's parallel download to any implementation, so latency-bound backends like the remote cache can saturate bandwidth.

  • Open-first discovery: the first ranged Open yields chunk zero, the total size (from Content-Range), and the ETag — no separate Stat.
  • Revision-consistent: every subsequent chunk is pinned with If-Range to the discovery ETag; a chunk whose ETag differs (the key was rewritten mid-download) is reported as an error rather than splicing two revisions. Missing or short chunks are errors too.
  • Streams to a caller-owned sink: each worker copies its range straight to dst at the right offset via io.NewOffsetWriter + io.Copy (no full-object buffering); concurrency is bounded by errgroup.SetLimit. The caller owns dst's lifecycle and need not pre-size it.

alecthomas and others added 4 commits June 23, 2026 16:28
Add Range/If-Range support to Cache.Open mirroring the existing ETag
conditional pattern. RequestOptions gains Range/IfRange fields and a
ResolveRange resolver for a single byte range, gated on the stored ETag
via If-Range. Backends slice the body (disk seek, memory slice, S3
ranged GET) and set Content-Range; out-of-bounds ranges return
ErrRangeNotSatisfiable. Tiered skips backfill on partial reads to avoid
caching truncated objects. Serving emits 206/416 and advertises
Accept-Ranges; Stat ignores Range. Multi-range falls back to a full 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address code-review findings on the byte-range support:

- Strip Range, If-Range and Content-Range on PUT (httputil.TransportHeaders),
  alongside the existing If-Match/If-None-Match. A stored Content-Range would
  otherwise be replayed and make a plain GET spuriously answer 206.
- Return the stored headers (not nil) on the disk Seek error path, matching the
  416 path.
- Add tests: stored-Content-Range regression, full-size and suffix
  Content-Length assertions, and a zero-length-object 416 case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the raw-string Range(spec) option with a typed Range(start, end)
taking a half-open byte interval; a negative end means "to the end of the
object" (Content-Length). e.g. Range(0, 500) is the first 500 bytes and
Range(0, -1) the whole object.

The raw HTTP header form is retained as RangeHeader(spec) for forwarding a
client's Range verbatim (used by httputil and for suffix ranges that the
typed form can't express). The wire format and server-side parsing are
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The client SDK targets the apiv1 endpoint and only needs the typed
Range(start, end) form, so remove the raw RangeHeader option (and its
cache re-export). The server's proxy handlers still forward an external
client's verbatim Range header (e.g. suffix "bytes=-N") by setting the
shared RequestOptions.Range field directly in httputil.

Suffix coverage stays via the existing apiv1 end-to-end test; the
redundant suite subtest is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@alecthomas alecthomas requested a review from a team as a code owner June 23, 2026 10:08
@alecthomas alecthomas requested review from joshfriend and removed request for a team June 23, 2026 10:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 047fa1e5ce

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/cache/range.go Outdated
…ases

A 416 carries no body, but the backend had already set Content-Length to the
full object size, so clients could wait for bytes that never arrive. Drop it on
the unsatisfiable-range path.

Also route the conditional/range option logic (range.go, conditional.go,
tiered.go) through the cache package's own aliases rather than referencing the
client package directly; add NewRequestOptions/RequestOptions/RangeOutcome
aliases to the cache API for that.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94b87ca6cc

Comment thread internal/httputil/conditional.go
@alecthomas

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94b87ca6cc

Comment thread internal/httputil/conditional.go
alecthomas and others added 2 commits June 23, 2026 21:09
ServeCacheHit infers 206 from a Content-Range header, so a Content-Range
persisted in an object's stored metadata (pre-existing objects, or direct
Cache.Create callers that bypass the APIV1 PUT filter) would turn a plain full
GET into a spurious 206. rangeShortCircuit now drops Content-Range on the
full-response path, so the 206 signal only ever reflects a real range.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Generalise the S3 backend's parallel chunked download to any Cache: ParallelGet
Stats the object, then fetches chunkSize-byte ranges concurrently (up to a given
concurrency) via ranged Opens, reassembling them in order through a pipe.
Latency-bound backends such as the remote cache can saturate bandwidth with
overlapping reads. Small objects or concurrency<2 fall back to a single Open.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b771b61ddb

Comment thread internal/cache/parallel_get.go Outdated
Comment thread internal/cache/parallel_get.go Outdated
alecthomas and others added 2 commits June 23, 2026 21:37
Rework ParallelGet to write into a caller-supplied io.WriterAt instead of
returning a streaming io.ReadCloser, which removes the ordered-channel
reassembly, the io.Pipe, and every io.ReadAll the previous version inherited
from the S3 code:

- Open-first discovery: the first ranged Open yields chunk zero, the total size
  (from Content-Range) and the ETag, replacing the separate Stat round trip.
- Every remaining chunk is pinned with IfRange to that ETag; a chunk whose ETag
  differs (object rewritten mid-download) returns an error rather than splicing
  revisions. Missing or short chunks are likewise errors.
- Each worker streams its range straight to dst via io.NewOffsetWriter + io.Copy
  at non-overlapping offsets; concurrency is bounded with errgroup.SetLimit.

The caller owns dst's lifecycle and need not pre-size it (WriteAt extends it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

With errgroup SetLimit, the dispatch loop kept queuing every remaining chunk
once a failure cancelled the group, blocking on the limiter only to bail
immediately. Break out of the loop when egCtx is cancelled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alecthomas

Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 4ac137be00

The discovery Open must stay synchronous to learn the total size and ETag, but
its body was also being copied to the sink serially before the remaining chunks
were scheduled, so a large first chunk blocked all parallelism. Hand the
already-open first chunk to a goroutine in the errgroup alongside the other workers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ece7d749b

Comment thread internal/httputil/conditional.go
Stat ignores Range and never runs rangeShortCircuit, so a Content-Range left in
an object's stored metadata was echoed on a 200 HEAD, advertising partial
metadata for the full object. Move the strip into conditionalShortCircuit, the
one step both Stat and Open run on stored headers, so Stat (and the 304 paths)
never carry it and Open only does when rangeShortCircuit sets a real range. The
now-redundant strip in rangeShortCircuit is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc2176897

Comment thread internal/cache/disk.go
The TransportHeaders strip list ran only in the apiv1 PUT handler, so direct
Cache.Create callers could persist transport headers (incl. Content-Range) as
object metadata. Apply httputil.FilterHeaders with that list in the disk, memory
and S3 Create paths, so every writer gets identical hygiene. This complements —
rather than replaces — the read-time strip in conditionalShortCircuit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0c7cbf5d5

Comment thread internal/cache/parallel_get.go
…no ETag

Without an ETag the per-chunk IfRange pinning is a no-op (IfRange("") is ignored
and an empty ETag matches an empty ETag), so chunks from different revisions
could be spliced undetected during a rewrite. Objects stored before ETags were
recorded have none, so rather than failing, fall back to a single
revision-consistent read instead of parallelising. Single-chunk objects (one
read) are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@alecthomas alecthomas requested a review from worstell June 24, 2026 00:00
@alecthomas

Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.

Reviewed commit: ca3550cedb

@alecthomas alecthomas merged commit d5e4b5b into main Jun 24, 2026
8 checks passed
@alecthomas alecthomas deleted the aat/range-support branch June 24, 2026 00:31