feat(cache): support single byte ranges on Cache.Open#346
Conversation
Add Range/If-Range support to Cache.Open mirroring the existing ETag conditional pattern. RequestOptions gains Range/IfRange fields and a ResolveRange resolver for a single byte range, gated on the stored ETag via If-Range. Backends slice the body (disk seek, memory slice, S3 ranged GET) and set Content-Range; out-of-bounds ranges return ErrRangeNotSatisfiable. Tiered skips backfill on partial reads to avoid caching truncated objects. Serving emits 206/416 and advertises Accept-Ranges; Stat ignores Range. Multi-range falls back to a full 200. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address code-review findings on the byte-range support: - Strip Range, If-Range and Content-Range on PUT (httputil.TransportHeaders), alongside the existing If-Match/If-None-Match. A stored Content-Range would otherwise be replayed and make a plain GET spuriously answer 206. - Return the stored headers (not nil) on the disk Seek error path, matching the 416 path. - Add tests: stored-Content-Range regression, full-size and suffix Content-Length assertions, and a zero-length-object 416 case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the raw-string Range(spec) option with a typed Range(start, end) taking a half-open byte interval; a negative end means "to the end of the object" (Content-Length). e.g. Range(0, 500) is the first 500 bytes and Range(0, -1) the whole object. The raw HTTP header form is retained as RangeHeader(spec) for forwarding a client's Range verbatim (used by httputil and for suffix ranges that the typed form can't express). The wire format and server-side parsing are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The client SDK targets the apiv1 endpoint and only needs the typed Range(start, end) form, so remove the raw RangeHeader option (and its cache re-export). The server's proxy handlers still forward an external client's verbatim Range header (e.g. suffix "bytes=-N") by setting the shared RequestOptions.Range field directly in httputil. Suffix coverage stays via the existing apiv1 end-to-end test; the redundant suite subtest is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 047fa1e5ce
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ases A 416 carries no body, but the backend had already set Content-Length to the full object size, so clients could wait for bytes that never arrive. Drop it on the unsatisfiable-range path. Also route the conditional/range option logic (range.go, conditional.go, tiered.go) through the cache package's own aliases rather than referencing the client package directly; add NewRequestOptions/RequestOptions/RangeOutcome aliases to the cache API for that. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94b87ca6cc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94b87ca6cc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ServeCacheHit infers 206 from a Content-Range header, so a Content-Range persisted in an object's stored metadata (pre-existing objects, or direct Cache.Create callers that bypass the APIV1 PUT filter) would turn a plain full GET into a spurious 206. rangeShortCircuit now drops Content-Range on the full-response path, so the 206 signal only ever reflects a real range. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generalise the S3 backend's parallel chunked download to any Cache: ParallelGet Stats the object, then fetches chunkSize-byte ranges concurrently (up to a given concurrency) via ranged Opens, reassembling them in order through a pipe. Latency-bound backends such as the remote cache can saturate bandwidth with overlapping reads. Small objects or concurrency<2 fall back to a single Open. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b771b61ddb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Rework ParallelGet to write into a caller-supplied io.WriterAt instead of returning a streaming io.ReadCloser, which removes the ordered-channel reassembly, the io.Pipe, and every io.ReadAll the previous version inherited from the S3 code: - Open-first discovery: the first ranged Open yields chunk zero, the total size (from Content-Range) and the ETag, replacing the separate Stat round trip. - Every remaining chunk is pinned with IfRange to that ETag; a chunk whose ETag differs (object rewritten mid-download) returns an error rather than splicing revisions. Missing or short chunks are likewise errors. - Each worker streams its range straight to dst via io.NewOffsetWriter + io.Copy at non-overlapping offsets; concurrency is bounded with errgroup.SetLimit. The caller owns dst's lifecycle and need not pre-size it (WriteAt extends it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With errgroup SetLimit, the dispatch loop kept queuing every remaining chunk once a failure cancelled the group, blocking on the limiter only to bail immediately. Break out of the loop when egCtx is cancelled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Another round soon, please! Reviewed commit: ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
The discovery Open must stay synchronous to learn the total size and ETag, but its body was also being copied to the sink serially before the remaining chunks were scheduled, so a large first chunk blocked all parallelism. Hand the already -open first chunk to a goroutine in the errgroup alongside the other workers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ece7d749b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Stat ignores Range and never runs rangeShortCircuit, so a Content-Range left in an object's stored metadata was echoed on a 200 HEAD, advertising partial metadata for the full object. Move the strip into conditionalShortCircuit, the one step both Stat and Open run on stored headers, so Stat (and the 304 paths) never carry it and Open only does when rangeShortCircuit sets a real range. The now-redundant strip in rangeShortCircuit is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cc2176897
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The TransportHeaders strip list ran only in the apiv1 PUT handler, so direct Cache.Create callers could persist transport headers (incl. Content-Range) as object metadata. Apply httputil.FilterHeaders with that list in the disk, memory and S3 Create paths, so every writer gets identical hygiene. This complements — rather than replaces — the read-time strip in conditionalShortCircuit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ed6532c to
d0c7cbf
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0c7cbf5d5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…no ETag
Without an ETag the per-chunk IfRange pinning is a no-op (IfRange("") is ignored
and an empty ETag matches an empty ETag), so chunks from different revisions
could be spliced undetected during a rewrite. Objects stored before ETags were
recorded have none, so rather than failing, fall back to a single
revision-consistent read instead of parallelising. Single-chunk objects (one
read) are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
251494f to
ca3550c
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Bravo. Reviewed commit: ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
Intent
Make the cache range-aware so callers can fetch a slice of an object instead of the whole thing, and add a primitive that uses that capability to pull large objects faster by reading their chunks in parallel.
Features
Byte ranges on
Cache.OpenOpenaccepts aRangeoption and resolves it end to end — through the client SDK, every backend (disk, memory, S3, remote, tiered), and the server's HTTP serving.cache.Range(start, end)requests[start, end); a negativeendmeans "to the end of the object".cache.Range(0, -1)is the whole object,cache.Range(0, 500)the first 500 bytes.If-Range(ETag):cache.IfRange(etag)only applies the range when the stored ETag matches, otherwise the full object is served.io.LimitReader, memory slices, S3 issues a single ranged GET, remote forwards the range over the wire;Tieredskips backfill on partial reads so a truncated object is never cached.206 Partial ContentwithContent-Range,416 Range Not Satisfiablefor out-of-bounds ranges, and advertisesAccept-Ranges: bytes.Stat/HEADignoresRange. Incoming clientRangeheaders (including suffixbytes=-N) are forwarded verbatim by the proxy handlers.Scope: a single byte range only — multi-range and malformed
Rangeheaders fall back to a full200;If-Rangesupports the entity-tag form.ParallelGethelpercache.ParallelGet(ctx, c, key, dst io.WriterAt, chunkSize, concurrency)downloads an object from any range-capableCacheby fetching its chunks concurrently — generalising the S3 backend's parallel download to any implementation, so latency-bound backends like the remote cache can saturate bandwidth.Openyields chunk zero, the total size (fromContent-Range), and the ETag — no separateStat.If-Rangeto the discovery ETag; a chunk whose ETag differs (the key was rewritten mid-download) is reported as an error rather than splicing two revisions. Missing or short chunks are errors too.dstat the right offset viaio.NewOffsetWriter+io.Copy(no full-object buffering); concurrency is bounded byerrgroup.SetLimit. The caller ownsdst's lifecycle and need not pre-size it.