
fix: ensure recipe_options.json values are applied at model load time #1540

Open
ramkrishna2910 wants to merge 1 commit into main from fix/recipe-options-at-load-time

Conversation

@ramkrishna2910
Contributor

Summary

Per-model recipe options (e.g. ctx_size) saved in ~/.cache/lemonade/recipe_options.json were not reliably reaching the llama-server subprocess. Models would always launch with the default ctx_size=4096 even when recipe_options.json specified a different value.

Root Cause

The existing code in ModelManager::build_cache() merges recipe options by model name (line 890), but due to cache timing the merged values were sometimes not present in model_info.recipe_options when Router::load_model() resolved the effective options. This meant the llama-server subprocess was always launched with the code default of ctx_size=4096.

Reproduction

  1. Set a custom ctx_size in ~/.cache/lemonade/recipe_options.json:
    {"Qwen3.5-35B-A3B-GGUF": {"ctx_size": 16384}}
  2. Start lemond and load the model
  3. Check the llama-server subprocess args — --ctx-size will show 4096 instead of 16384

Fix

Added a direct lookup of recipe_options.json in Router::load_model() as a safety net. This ensures saved per-model options always take effect regardless of cache state.

Changes

  • model_manager.h/cpp: Add get_saved_recipe_options(model_name) to expose the raw recipe_options_ map
  • router.cpp: Look up saved options directly at load time and merge into the options resolution chain

The options resolution order remains: load-time overrides > per-model saved options > global defaults.

Verification

After this fix, the llama-server subprocess correctly receives --ctx-size 16384 when set in recipe_options.json:

[Info] (Router) Loading Qwen3.5-35B-A3B-GGUF with: ctx_size=16384, llamacpp_backend=vulkan, llamacpp_args=(none)

Test plan

  • Set ctx_size in recipe_options.json for a model, verify it appears in llama-server subprocess args
  • Verify default ctx_size=4096 still applies when no override is set
  • Verify llamacpp_args from recipe_options.json also flows through
  • Verify load-time options (e.g. from API request) still take precedence over saved options

🤖 Generated with Claude Code

Per-model recipe options (e.g. ctx_size) saved in recipe_options.json
were not reliably reaching the llama-server subprocess. The existing
code in build_cache() merges recipe_options by model name, but due to
cache timing the merged values were sometimes not present in
model_info.recipe_options when Router::load_model() resolved the
effective options.

This fix adds a direct lookup of recipe_options.json in
Router::load_model() as a safety net, ensuring saved per-model
options always take effect regardless of cache state.

Changes:
- Add ModelManager::get_saved_recipe_options() to expose the raw
  recipe_options_ map for a given model name
- In Router::load_model(), look up saved options directly and merge
  them into the options resolution chain
- Promote the effective settings log from DEBUG to INFO for easier
  debugging of option resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ianbmacdonald
Collaborator

sounds like one of the fixes in #1412

@jeremyfowers
Member

sounds like one of the fixes in #1412

1412 should merge soon so @ramkrishna2910 can you look at whether this is a duplicate?

@ramkrishna2910
Contributor Author

This PR adds a fix for a timing-specific race condition that PR #1412 doesn't explicitly address: the scenario where the router reads model_info.recipe_options before build_cache() has finished merging. I was running into this during some of the vllm testing, so this is likely an addition on top of #1412.

@ianbmacdonald
Collaborator

Thanks for looking into this — recipe_options reliability has definitely been a pain point. (Claude and Codex working on behalf of Ian) traced through the code paths to understand how this interacts with the changes in #1412, and wanted to share the findings in case there's something missing about the reproduction scenario.

The current merge path (with #1412's build_recipe_options()):

In build_cache(), every model's info.recipe_options is populated via build_recipe_options(info, jro, name, recipe_options_), which applies three layers:

  1. image_defaults as base
  2. JSON-level recipe_options from server_models.json
  3. User-saved options from the in-memory recipe_options_ (loaded from recipe_options.json at construction)

The same function is called from add_model_to_cache() for dynamically added models. Both paths complete synchronously before any ModelInfo is returned to the router.

The load path:

handle_load calls get_model_info() → returns the cached ModelInfo with fully-merged recipe_options → passes it to Router::load_model(), which resolves:

options.inherit(model_info.recipe_options.inherit(default_opt))

So by the time the router sees model_info.recipe_options, the saved per-model values should already be baked in.

On the stale-data angle:

(Claude and Codex working on behalf of Ian) also considered whether this could be a staleness fix — e.g., recipe_options.json edited on disk while the server is running. But get_saved_recipe_options() reads from the same in-memory recipe_options_ member that build_cache() already merges from, so it would not pick up external file edits.

Potential regression with the replacement logic:

There's also a concern with the replacement semantics in the new router code:

RecipeOptions saved_opt = model_info.recipe_options;
auto saved_json = model_manager_->get_saved_recipe_options(model_name);
if (!saved_json.empty()) {
    saved_opt = RecipeOptions(model_info.recipe, saved_json);
}

When saved_json is non-empty, saved_opt is replaced with a RecipeOptions built from only the raw saved JSON, rather than merged. This strips the image_defaults and JSON-level recipe_options that #1412's build_recipe_options() intentionally layers in. For example, Qwen-Image-GGUF with a user-saved ctx_size override would lose its sampling_method, flow_shift, and sdcpp_args (--diffusion-fa --offload-to-cpu) at load time.

What (Claude and Codex working on behalf of Ian) might be missing:

Is the vllm testing scenario using a code path that bypasses build_cache() / add_model_to_cache()? Or is there a case where get_model_info() returns a ModelInfo before the recipe_options merge loop has run? Would love to understand the specific reproduction — it's possible there's an ordering edge case not visible from the static analysis alone.

