fix: fail fast on insufficient disk space instead of retry loop #1534
ianbmacdonald wants to merge 4 commits into lemonade-sdk:main
Conversation
Force-pushed from 77567fc to 656eccb
The latest changes address the main correctness gaps in the pre-flight disk check: 50a92a1 now subtracts bytes already on disk from the required space calculation, so resumed downloads are checked against remaining bytes rather than total model size.
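The remaining-bytes comparison can be sketched against `std::filesystem::space`. This is a minimal illustration of the idea, not the actual lemonade-sdk code; the function name and error message are assumptions:

```cpp
#include <cstdint>
#include <filesystem>
#include <stdexcept>
#include <string>

// Pre-flight check (illustrative): compare the bytes still needed, after
// crediting completed files and .partial data already on disk, against the
// free space on the filesystem that holds the download directory.
void preflight_disk_check(const std::filesystem::path& download_dir,
                          std::uint64_t remaining_bytes) {
    std::error_code ec;
    auto info = std::filesystem::space(download_dir, ec);
    if (ec) return;  // free-space query failed; let the download proceed
    if (info.available < remaining_bytes) {
        throw std::runtime_error(
            "Insufficient disk space: need " + std::to_string(remaining_bytes) +
            " bytes, only " + std::to_string(info.available) + " available");
    }
}
```

Querying `available` (space usable by an unprivileged process) rather than `free` matters on filesystems with reserved blocks.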
I did not come up with a good way to test this synthetically in CI; it was tested in the real scenario that prompted the fix. Agents suggested wrapping the free-space query in a helper and driving it with synthetic manifests and on-disk byte counts, but that seems like overkill for faking a clearly signalled scenario across a bunch of different CI environments.

One integration note: if this branch is later rebased onto or folded into #1412, the pre-flight accounting in download_from_manifest() will need one small adaptation. #1412 writes per-file download_path values into the manifest for multi-repo downloads, so the disk-space pre-check needs to mirror the real download loop and use file_desc.value("download_path", download_path) when computing completed-file and .partial credit, rather than assuming everything lives under the manifest's top-level download_path. Otherwise it will undercount bytes already present in secondary repo snapshots.
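The per-file fallback described above can be sketched as follows. The real code uses nlohmann::json's `value()`; this sketch substitutes a plain `std::map` stand-in so it is self-contained, and the names are illustrative:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for one manifest file entry. In the real manifest
// this is a JSON object and the lookup is
// file_desc.value("download_path", download_path).
using FileDesc = std::map<std::string, std::string>;

// Mirror the download loop: prefer the per-file download_path written by
// the multi-repo support in #1412, falling back to the manifest-level one.
std::string effective_download_path(const FileDesc& file_desc,
                                    const std::string& manifest_download_path) {
    auto it = file_desc.find("download_path");
    return it != file_desc.end() ? it->second : manifest_download_path;
}
```

Using the same fallback in both the pre-check and the download loop is what keeps the completed-file and .partial credit from undercounting secondary repo snapshots.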
@claude review. Will this work on Windows, macOS, and Linux? Are there any drawbacks to merging this?
I'll analyze this and get back to you. |
Force-pushed from 12b51e5 to baa0e6a
When a download fails with CURLE_WRITE_ERROR (code 23) due to disk full, the retry logic would delete the partial file and restart from zero, repeating indefinitely and wasting bandwidth. This adds:

- Pre-flight disk space check before downloading, comparing available space against total download size
- Detection of CURLE_WRITE_ERROR + low disk as a fatal (non-retryable) condition with a clear error message
- disk_full flag on DownloadResult to short-circuit the retry loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
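The fail-fast flow this commit describes can be sketched roughly as below. The `disk_full` flag comes from the commit message; the other fields and the retry wrapper are illustrative assumptions, not the exact lemonade-sdk code:

```cpp
#include <string>

// Sketch of the result type: disk_full marks a fatal condition that
// retrying cannot fix, so the loop must not delete the partial file
// and start over.
struct DownloadResult {
    bool success = false;
    bool disk_full = false;
    std::string error;
};

// Hypothetical retry wrapper: a disk_full result breaks out immediately
// instead of burning further attempts (and bandwidth).
template <typename DownloadFn>
DownloadResult download_with_retries(DownloadFn attempt, int max_retries) {
    DownloadResult result;
    for (int i = 0; i <= max_retries; ++i) {
        result = attempt();
        if (result.success || result.disk_full) break;  // fail fast
    }
    return result;
}
```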
The pre-flight check compared the full manifest size against available space, ignoring completed files and .partial resume data already on disk. This caused false "Insufficient disk space" errors when resuming after freeing just enough space for the remaining bytes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cap per-file credit to manifest size and clamp the subtraction to zero. Prevents unsigned underflow when manifest contains size=0 entries but partial files exist on disk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
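The clamped accounting from this commit can be sketched as a small pure function (the name is illustrative): credit bytes already on disk against a file's manifest size, never more than the size itself, so the subtraction cannot underflow an unsigned type.

```cpp
#include <algorithm>
#include <cstdint>

// Per-file credit, clamped: if the manifest reports size=0 but a partial
// file exists on disk, the credit is capped at 0 and the subtraction
// stays in range instead of wrapping to a huge unsigned value.
std::uint64_t remaining_bytes(std::uint64_t manifest_size,
                              std::uint64_t bytes_on_disk) {
    std::uint64_t credit = std::min(bytes_on_disk, manifest_size);
    return manifest_size - credit;  // >= 0 by construction
}
```

Without the `std::min` cap, `manifest_size - bytes_on_disk` with `size=0` and a non-empty partial file would wrap around and make the pre-check demand absurd amounts of free space.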
Force-pushed from baa0e6a to a066920
The pre-flight disk space check assumed all files live under the manifest's top-level download_path. With lemonade-sdk#1412's multi-repo support, each file entry can have its own download_path pointing to a repo-specific snapshot directory. Mirror the actual download loop by using file_desc.value("download_path", download_path) so completed and partial files in secondary repo snapshots are correctly credited. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that #1412 has merged, the integration note from the earlier comment has been addressed in 5e1e827. The pre-flight disk space check in download_from_manifest() now mirrors the actual download loop by using file_desc.value("download_path", download_path), so completed and partial files in secondary repo snapshots are correctly credited.

Summary
- Treats CURLE_WRITE_ERROR (code 23) + low available disk as a fatal, non-retryable condition, preventing the retry loop from deleting the partial file and re-downloading from scratch repeatedly.
- Adds a disk_full flag to DownloadResult to short-circuit the retry loop.

Problem
When a large model download (e.g., 16.8 GB Gemma 4) fills the disk, curl returns error 23 (CURLE_WRITE_ERROR). The existing retry logic treated this as a non-resumable error: it deleted the ~13 GB partial file and retried from zero, hitting the same wall each time in an infinite loop, burning bandwidth and I/O. In a session today, one host redownloaded a 16 GB Gemma 4 model at 1 Gbps for an hour: I walked away after the download started and it just looped, roughly 2.5 minutes per 16 GB, a LOT of times. The disk had filled because I had copied a lemonade config with the wrong HF_HOME from another device into lemonade's conf.d.

Test plan
🤖 Looked at the diffs; 100% vibed with Claude, including deploy and testing with the Debian package on Debian 13.