
fix: ensure recipe_options.json values are applied at model load time #1540

Open
ramkrishna2910 wants to merge 1 commit into main from fix/recipe-options-at-load-time

Conversation

@ramkrishna2910
Contributor

Summary

Per-model recipe options (e.g. ctx_size) saved in ~/.cache/lemonade/recipe_options.json were not reliably reaching the llama-server subprocess. Models would always launch with the default ctx_size=4096 even when recipe_options.json specified a different value.

Root Cause

The existing code in ModelManager::build_cache() merges recipe options by model name (line 890), but due to cache timing the merged values were sometimes not present in model_info.recipe_options when Router::load_model() resolved the effective options. This meant the llama-server subprocess was always launched with the code default of ctx_size=4096.

Reproduction

  1. Set a custom ctx_size in ~/.cache/lemonade/recipe_options.json:
    {"Qwen3.5-35B-A3B-GGUF": {"ctx_size": 16384}}
  2. Start lemond and load the model
  3. Check the llama-server subprocess args — --ctx-size will show 4096 instead of 16384

Fix

Added a direct lookup of recipe_options.json in Router::load_model() as a safety net. This ensures saved per-model options always take effect regardless of cache state.

Changes

  • model_manager.h/cpp: Add get_saved_recipe_options(model_name) to expose the raw recipe_options_ map
  • router.cpp: Look up saved options directly at load time and merge into the options resolution chain

The options resolution order remains: load-time overrides > per-model saved options > global defaults.

Verification

After this fix, the llama-server subprocess correctly receives --ctx-size 16384 when set in recipe_options.json:

[Info] (Router) Loading Qwen3.5-35B-A3B-GGUF with: ctx_size=16384, llamacpp_backend=vulkan, llamacpp_args=(none)

Test plan

  • Set ctx_size in recipe_options.json for a model, verify it appears in llama-server subprocess args
  • Verify default ctx_size=4096 still applies when no override is set
  • Verify llamacpp_args from recipe_options.json also flows through
  • Verify load-time options (e.g. from API request) still take precedence over saved options

🤖 Generated with Claude Code

Per-model recipe options (e.g. ctx_size) saved in recipe_options.json
were not reliably reaching the llama-server subprocess. The existing
code in build_cache() merges recipe_options by model name, but due to
cache timing the merged values were sometimes not present in
model_info.recipe_options when Router::load_model() resolved the
effective options.

This fix adds a direct lookup of recipe_options.json in
Router::load_model() as a safety net, ensuring saved per-model
options always take effect regardless of cache state.

Changes:
- Add ModelManager::get_saved_recipe_options() to expose the raw
  recipe_options_ map for a given model name
- In Router::load_model(), look up saved options directly and merge
  them into the options resolution chain
- Promote the effective settings log from DEBUG to INFO for easier
  debugging of option resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ianbmacdonald
Collaborator

sounds like one of the fixes in #1412

@jeremyfowers
Member

sounds like one of the fixes in #1412

1412 should merge soon so @ramkrishna2910 can you look at whether this is a duplicate?

@ramkrishna2910
Contributor Author

This PR adds a fix for a timing-specific race condition that PR #1412 doesn't explicitly address: the scenario where the router reads model_info.recipe_options before build_cache() has finished merging. I was running into this during some of the vllm testing, so this is likely an addition on top of #1412.

@ianbmacdonald
Collaborator

Thanks for looking into this — recipe_options reliability has definitely been a pain point. (Claude and Codex working on behalf of Ian) traced through the code paths to understand how this interacts with the changes in #1412, and wanted to share the findings in case there's something missing about the reproduction scenario.

The current merge path (with #1412's build_recipe_options()):

In build_cache(), every model's info.recipe_options is populated via build_recipe_options(info, jro, name, recipe_options_), which applies three layers:

  1. image_defaults as base
  2. JSON-level recipe_options from server_models.json
  3. User-saved options from the in-memory recipe_options_ (loaded from recipe_options.json at construction)

The same function is called from add_model_to_cache() for dynamically added models. Both paths complete synchronously before any ModelInfo is returned to the router.

The load path:

handle_load calls get_model_info() → returns the cached ModelInfo with fully-merged recipe_options → passes it to Router::load_model(), which resolves:

options.inherit(model_info.recipe_options.inherit(default_opt))

So by the time the router sees model_info.recipe_options, the saved per-model values should already be baked in.

On the stale-data angle:

(Claude and Codex working on behalf of Ian) also considered whether this could be a staleness fix — e.g., recipe_options.json edited on disk while the server is running. But get_saved_recipe_options() reads from the same in-memory recipe_options_ member that build_cache() already merges from, so it would not pick up external file edits.

Potential regression with the replacement logic:

There's also a concern with the replacement semantics in the new router code:

RecipeOptions saved_opt = model_info.recipe_options;
auto saved_json = model_manager_->get_saved_recipe_options(model_name);
if (!saved_json.empty()) {
    saved_opt = RecipeOptions(model_info.recipe, saved_json);
}

When saved_json is non-empty, saved_opt is replaced with a RecipeOptions built from only the raw saved JSON, rather than merged. This strips the image_defaults and JSON-level recipe_options that #1412's build_recipe_options() intentionally layers in. For example, Qwen-Image-GGUF with a user-saved ctx_size override would lose its sampling_method, flow_shift, and sdcpp_args (--diffusion-fa --offload-to-cpu) at load time.

What (Claude and Codex working on behalf of Ian) might be missing:

Is the vllm testing scenario using a code path that bypasses build_cache() / add_model_to_cache()? Or is there a case where get_model_info() returns a ModelInfo before the recipe_options merge loop has run? Would love to understand the specific reproduction — it's possible there's an ordering edge case not visible from the static analysis alone.

