feat(git): add parallel range download for snapshot restore #361
Add a `--download-concurrency` flag (default 1) to `cachew git restore` that fetches the snapshot with bounded concurrent range requests via `client.ParallelGet`, downloading into a temp file and extracting from it. A `--download-chunk-size-mb` flag (default 8) tunes the chunk size. A concurrency of 1, an old server, or a missing ETag transparently falls back to today's single streaming download.

`ParallelGet` drives the object-key API, but the snapshot lives behind the `/git` endpoint, so add a `RangeReader` adapter that issues ranged GETs to the snapshot URL and captures the freshen metadata (commit / bundle URL) from the discovery response.

Route the cold-start serve paths through `ServeCacheHit` + `ConditionalOptions` so cold serves advertise an ETag and `Accept-Ranges` and honour `Range`, letting clients parallelise during mirror warm-up too.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee4f7bf116
```go
// WriteAt so it cannot stream into extraction; the temp file is removed on
// return.
func (c *GitRestoreCmd) parallelFetchAndExtract(ctx context.Context, api *client.Client) (string, string, error) {
	tmp, err := os.CreateTemp("", "cachew-snapshot-*.tar.zst")
```
Store parallel snapshots on the target filesystem
When users enable `--download-concurrency` for a multi-GB snapshot on hosts where `/tmp` is a small tmpfs or has a separate quota, this writes the entire compressed snapshot to the default temp directory before extraction. That can fail with `ENOSPC` even when `c.Directory` has enough space for the restore. Creating the temp file on the target directory's filesystem, or making the temp location configurable, would keep parallel restore usable in those environments.