fix: make /load declarative/idempotent to prevent TOCTOU race #1604
Open
ianbmacdonald wants to merge 2 commits into lemonade-sdk:main from
Conversation
The /load endpoint previously did a non-atomic check-unload-reload sequence outside the Router's load_mutex_, causing unnecessary eviction and reload of large models when concurrent requests raced (e.g., an inference-triggered auto-load and an explicit /load for the same model).

Move the "already loaded?" decision into Router::load_model() under the existing load_mutex_. Add an allow_reload_on_option_change parameter so /load callers can opt into reload-if-options-differ behavior while auto-load callers remain conservative.

This redefines /load as declarative: "ensure model is loaded with these options" rather than "always restart." A same-options /load is now a no-op; a different-options /load atomically evicts and reloads.

Tested on Debian 13 (ai4, x86_64) with the patched .deb package:

- Concurrent auto-load + /load: second arrival no-ops (no eviction)
- Sequential idempotent /load: backend_url unchanged (same subprocess)
- Sequential option-change /load: evicts and reloads with new options
- Full server_endpoints.py suite: 35/35 tests pass

Closes lemonade-sdk#1603

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_012a: replace backend_url comparison with wall-clock time (backend_url is not a stable identity; choose_port can pick the same port after a restart)
- test_012c: replace the non-deterministic concurrent test with a sequential scenario that deterministically reproduces the lemonade-sdk#1603 race: load via inference first, then /load. Wall-clock time proves whether a reload occurred (0.002s no-op vs. seconds for evict+reload)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary

- In handle_load, is_model_loaded(), unload_model(), and load_model() each acquired/released load_mutex_ independently, allowing concurrent requests to cause unnecessary eviction and reload of large models (~90s wasted for 68GB models)
- Move the "already loaded?" decision into Router::load_model() under the existing load_mutex_, with an allow_reload_on_option_change parameter for explicit /load callers
- Redefine /load as declarative ("ensure loaded with these options"): same-options is a no-op, different-options atomically evicts and reloads
- Closes #1603
Test plan

- Build: cmake --build --preset default
- Concurrent auto-load + /load for the same model: second arrival no-ops, no eviction in logs
- Sequential /load with same options: backend_url unchanged (same subprocess, no restart)
- Sequential /load with different ctx_size: evicts and reloads with new options
- Full server_endpoints.py suite: 35/35 tests pass on Debian 13 x86_64

🤖 Generated with Claude Code