feat: add per-phase Claude model configuration (claude_model + review_model)#272
Conversation
Add claude_model and review_model config parameters to control which Claude model is used for task execution vs review phases. This allows using a more capable model (e.g., opus) for tasks while using a faster model (e.g., sonnet) for review iterations, reducing cost on review-heavy runs. - Add Model field to ClaudeExecutor, injecting --model flag when set - Create separate review executor when review_model differs - Add --claude-model and --review-model CLI flags - Empty values preserve current behavior (no --model flag) Closes umputun#271
umputun
left a comment
There was a problem hiding this comment.
nice feature, thx. couple things to address:
- naming -
claude_modelis misleading, it only applies to task/planning phases, not to claude globally.task_modelwould be more accurate and symmetric withreview_model. same for the CLI flag ---task-model/--review-modelreads better than--claude-model/--review-model - config comment typo -
pkg/config/defaults/config:33says# claude_model =but should be# review_model = - custom wrappers -
--modelflag is silently ignored by custom providers (claude_commandconfig). not a bug, but worth documenting that model config only applies with the default Claude CLI - missing processor test - the
ReviewClaudeexecutor separation logic inrunner.gohas no test coverage. existing tests only exercise the fallback path (ReviewClaude: nil). a test with distinct task/review mock executors verifying the review phase calls the right one would close this gap - no docs update - README, CLAUDE.md, and llms.txt document all config options but don't mention the new fields or CLI flags
the approach itself is clean - separate executor only when models differ, correct config merge chain, backward compatible. no scope creep, tests pass, linter clean.
There was a problem hiding this comment.
Pull request overview
Adds per-phase Claude model selection so task execution and review phases can run on different Claude models, reducing cost for review-heavy runs while preserving existing behavior when unset.
Changes:
- Introduces
claude_modelandreview_modelconfig keys (plus--claude-model/--review-modelCLI flags) and wires them into runner construction. - Extends
ClaudeExecutorwith aModelfield that injects--model <value>into the Claude CLI invocation when configured. - Routes all review-related phases (first review, review loops, external eval, finalize) through a dedicated review executor when models differ.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/processor/runner.go | Builds/uses separate Claude executor for review phases when ReviewModel differs. |
| pkg/executor/executor.go | Adds Model field and injects --model flag into CLI args. |
| pkg/executor/executor_test.go | Adds test coverage for --model flag injection behavior. |
| pkg/config/values.go | Parses and merges new claude_model / review_model keys from INI configs. |
| pkg/config/values_test.go | Adds tests for parsing and merge behavior of new model fields. |
| pkg/config/defaults/config | Documents the new config keys in the embedded defaults template. |
| pkg/config/config.go | Plumbs new Values fields into the runtime config.Config. |
| cmd/ralphex/main.go | Adds CLI flags and resolves effective model values passed into the processor runner. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # review_model: model to use for review phases (first review, fix loops, finalize) | ||
| # falls back to claude_model if not set, then to Claude Code's default | ||
| # use a cheaper/faster model (e.g., sonnet) for reviews to reduce cost | ||
| # claude_model = |
There was a problem hiding this comment.
The template config comment for the review_model setting ends with # claude_model =, which is misleading and makes it look like the key name is claude_model. This should be # review_model = to match the documented setting.
| # claude_model = | |
| # review_model = |
| // inject --model flag if a model override is configured | ||
| if e.Model != "" { | ||
| args = append(args, "--model", e.Model) | ||
| } |
There was a problem hiding this comment.
ClaudeExecutor.Run unconditionally appends --model <Model> when Model is set. If users already include --model in claude_args (or if wrapper scripts inject it), this can lead to duplicate flags with ambiguous precedence. Consider either stripping any existing --model from the configured args before appending, or explicitly documenting/enforcing that Model overrides any --model present in Args (e.g., by removing earlier occurrences).
| // resolve review model: CLI flag > config file > claude_model > empty | ||
| reviewModel := req.Config.ReviewModel | ||
| if o.ReviewModel != "" { | ||
| reviewModel = o.ReviewModel | ||
| } |
There was a problem hiding this comment.
This comment says review model precedence includes a fallback to claude_model, but the code here only applies CLI/config values and passes them through. The actual fallback happens later in processor.New (when ReviewModel is empty). Updating the comment (or performing the fallback here) would avoid confusion for future maintainers.
|
@dwilberger do you plan to continue work on it? |
Rename claude_model to task_model for clarity, fix config comment typo, add processor test for ReviewClaude executor separation, update docs (README, CLAUDE.md, llms.txt), and normalize --flag=value form in docs.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -848,6 +862,8 @@ func createRunner(req executePlanRequest, o opts, log processor.Logger, holder * | |||
| CodexEnabled: codexEnabled, | |||
| FinalizeEnabled: req.Config.FinalizeEnabled, | |||
| DefaultBranch: req.BaseRef, | |||
| ClaudeModel: claudeModel, | |||
| ReviewModel: reviewModel, | |||
There was a problem hiding this comment.
The new per-phase model settings are only applied via createRunner, but plan mode (--plan) constructs processor.Config{...} separately and doesn't set ClaudeModel/ReviewModel. As a result, claude_model / --claude-model won't affect interactive plan creation even though it uses the Claude executor. Consider reusing the same model-resolution logic for plan mode (or passing ClaudeModel there) so configuration is consistent across modes.
| // resolve claude model: CLI flag > config file > empty (use CLI default) | ||
| claudeModel := req.Config.ClaudeModel | ||
| if o.ClaudeModel != "" { | ||
| claudeModel = o.ClaudeModel | ||
| } | ||
|
|
||
| // resolve review model: CLI flag > config file > claude_model > empty | ||
| reviewModel := req.Config.ReviewModel | ||
| if o.ReviewModel != "" { |
There was a problem hiding this comment.
createRunner uses if o.ClaudeModel != "" / if o.ReviewModel != "" to detect CLI overrides. With go-flags, a user can explicitly pass an empty value via --claude-model= / --review-model=; today that can't override a non-empty config value because it looks the same as “not provided”. To preserve the stated precedence (CLI > config) for empty values too, consider tracking claudeModelSet/reviewModelSet using isFlagSet (similar to the duration *Set flags) and applying the override when the flag is set even if the value is empty.
| // resolve claude model: CLI flag > config file > empty (use CLI default) | |
| claudeModel := req.Config.ClaudeModel | |
| if o.ClaudeModel != "" { | |
| claudeModel = o.ClaudeModel | |
| } | |
| // resolve review model: CLI flag > config file > claude_model > empty | |
| reviewModel := req.Config.ReviewModel | |
| if o.ReviewModel != "" { | |
| // detect whether a string flag was explicitly provided on the CLI, including empty values | |
| hasCLIStringFlag := func(name string) bool { | |
| long := "--" + name | |
| for _, arg := range os.Args[1:] { | |
| if arg == long || strings.HasPrefix(arg, long+"=") { | |
| return true | |
| } | |
| } | |
| return false | |
| } | |
| claudeModelSet := hasCLIStringFlag("claude-model") | |
| reviewModelSet := hasCLIStringFlag("review-model") | |
| // resolve claude model: CLI flag > config file > empty (use CLI default) | |
| claudeModel := req.Config.ClaudeModel | |
| if claudeModelSet { | |
| claudeModel = o.ClaudeModel | |
| } | |
| // resolve review model: CLI flag > config file > claude_model > empty | |
| reviewModel := req.Config.ReviewModel | |
| if reviewModelSet { |
| e.Run(context.Background(), "test") | ||
| assert.Contains(t, capturedArgs, "--model") | ||
| assert.Contains(t, capturedArgs, "sonnet") | ||
| }) | ||
|
|
||
| t.Run("model empty does not inject --model flag", func(t *testing.T) { | ||
| e := &ClaudeExecutor{cmdRunner: mock} | ||
| e.Run(context.Background(), "test") |
There was a problem hiding this comment.
This test ignores the Result returned by Run(). Asserting result.Error (e.g., require.NoError) would prevent false positives if Run starts returning an error while still capturing args, and keeps the test focused on argument injection.
| e.Run(context.Background(), "test") | |
| assert.Contains(t, capturedArgs, "--model") | |
| assert.Contains(t, capturedArgs, "sonnet") | |
| }) | |
| t.Run("model empty does not inject --model flag", func(t *testing.T) { | |
| e := &ClaudeExecutor{cmdRunner: mock} | |
| e.Run(context.Background(), "test") | |
| result := e.Run(context.Background(), "test") | |
| require.NoError(t, result.Error) | |
| assert.Contains(t, capturedArgs, "--model") | |
| assert.Contains(t, capturedArgs, "sonnet") | |
| }) | |
| t.Run("model empty does not inject --model flag", func(t *testing.T) { | |
| e := &ClaudeExecutor{cmdRunner: mock} | |
| result := e.Run(context.Background(), "test") | |
| require.NoError(t, result.Error) |
… findings - rename claude_model -> task_model and --claude-model -> --task-model for symmetry with review_model/--review-model - pass task model to plan mode (was missing) - strip duplicate --model flags from claude_args to prevent CLI errors - fix typo in pkg/config/defaults/config: review_model section had `# claude_model =` example - add processor test for ReviewClaude executor separation (nil fallback + distinct review executor) - add ReviewModel merge test, normalize --flag=value form in docs - update docs (README, CLAUDE.md, llms.txt, custom-providers.md) - update review-claude tests for post-codex-skip behavior added in #282 picks up unmerged follow-up commits from feat/claude-model-config branch
… findings (#284) * feat: rename claude_model to task_model, address unmerged #272 review findings - rename claude_model -> task_model and --claude-model -> --task-model for symmetry with review_model/--review-model - pass task model to plan mode (was missing) - strip duplicate --model flags from claude_args to prevent CLI errors - fix typo in pkg/config/defaults/config: review_model section had `# claude_model =` example - add processor test for ReviewClaude executor separation (nil fallback + distinct review executor) - add ReviewModel merge test, normalize --flag=value form in docs - update docs (README, CLAUDE.md, llms.txt, custom-providers.md) - update review-claude tests for post-codex-skip behavior added in #282 picks up unmerged follow-up commits from feat/claude-model-config branch * fix: address copilot review on PR 284 - stripFlag now handles --flag=value and bare --flag forms (was only handling --flag value space-form), prevents duplicate/orphaned --model in claude_args - add stripFlag tests for equals form, bare flag, repeated occurrences, prefix-only false-match - doc wording: clarify that task_model/review_model are passed as --model to the configured claude_command and may be ignored or implemented by custom wrappers
Summary
claude_modelandreview_modelconfig parameters to control which Claude model is used for task execution vs review phases--claude-modeland--review-modelCLI flags with standard precedence (CLI > local config > global config > default)review_modeldiffers fromclaude_model, used by all review phases (first review, fix loops, codex eval, finalize)--modelflag passed to Claude CLI)Motivation
Review phases run 5-7 parallel Claude agents that can iterate multiple times. Using the same model (e.g., Opus) for both tasks and reviews is unnecessarily expensive. This change allows using a capable model for tasks while using a faster/cheaper model for reviews.
Usage
# or via CLI flags ralphex --claude-model opus --review-model sonnet docs/plans/sprint-1.mdTest plan
claude_modelandreview_modelparsed from INI--modelflag injected when Model is set, absent when emptygo build ./...passesCloses #271