feat(ai): add Anthropic provider with chat parity (1/5) #1983
Introduces Provider::Anthropic alongside Provider::Openai, implementing the LlmConcept chat_response contract over the official anthropic Ruby SDK. Batch ops, PDF, and RAG land in follow-up PRs.
- Provider::Anthropic uses the Messages API for sync and streaming responses
- ChatConfig builds requests with ephemeral prompt-cache markers on the system prompt and the last tool definition
- MessageFormatter reconstructs multi-turn history (text + tool_use + tool_result blocks) from raw Message records, including the paired user-role tool_result turn Anthropic requires after every tool_use
- ChatParser maps Anthropic Message into the shared ChatResponse Data
- Registry, Setting, User, Chat default model wired for ANTHROPIC_* envs and Setting.anthropic_*; LLM_PROVIDER selects between providers
- Responder forwards raw conversation_history (Array<Message>) so providers without hosted conversation state can rebuild context
- OpenAI provider accepts and ignores the new kwarg (no behavior change)
Tests cover provider init, model gating, MessageFormatter for all turn shapes, ChatConfig request building (max_tokens, system cache, tool conversion), ChatParser for text / tool_use / mixed blocks, Registry discovery, and mocked chat_response success / error / function_request paths. Live VCR cassettes recorded in a follow-up with a real key.
Stacked PRs: 2/5 batch ops + cost ledger, 3/5 PDF, 4/5 pgvector RAG, 5/5 settings UI + disclosure.
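As a rough illustration of the prompt-cache markers mentioned in the description, the sketch below builds a Messages API request hash with `cache_control: { type: "ephemeral" }` on the system prompt block and on the last tool definition. The helper name `build_cached_request`, the sample tools, and the `max_tokens` default are hypothetical and are not taken from ChatConfig; only the `cache_control` shape follows the request format the description refers to.

```ruby
# Hypothetical helper: not the PR's ChatConfig, just the request shape it describes.
EPHEMERAL = { type: "ephemeral" }.freeze

def build_cached_request(model:, system_prompt:, tools:, messages:, max_tokens: 1024)
  # Copy each tool so the caller's definitions are not mutated.
  tool_defs = tools.map(&:dup)
  # Only the last tool carries the marker, as the description states.
  tool_defs[-1] = tool_defs[-1].merge(cache_control: EPHEMERAL) unless tool_defs.empty?

  request = {
    model: model,
    max_tokens: max_tokens,
    # The system prompt is sent as a text block so it can carry a cache marker.
    system: [ { type: "text", text: system_prompt, cache_control: EPHEMERAL } ],
    messages: messages
  }
  request[:tools] = tool_defs unless tool_defs.empty?
  request
end

SAMPLE_TOOLS = [
  { name: "get_accounts", description: "List accounts", input_schema: { type: "object", properties: {} } },
  { name: "get_balance", description: "Fetch a balance", input_schema: { type: "object", properties: {} } }
].freeze

SAMPLE_REQUEST = build_cached_request(
  model: "claude-haiku-4-5",
  system_prompt: "You are a helpful finance assistant.",
  tools: SAMPLE_TOOLS,
  messages: [ { role: "user", content: "Hello" } ]
)
```

The helper copies the tool hashes before marking the last one, so a shared tool list can be reused across requests without accumulating markers.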
📝 Walkthrough
Adds Anthropic as an LLM provider: dependency and settings, Anthropic provider implementation (client, model gating, chat orchestration, Langfuse/usage), request/response shaping (ChatConfig/MessageFormatter/ChatParser), registry + app routing updates, responder history handling, and comprehensive tests.
Sequence Diagram(s)
sequenceDiagram
participant Responder as Assistant::Responder
participant Anthropic as Provider::Anthropic
participant Config as Provider::Anthropic::ChatConfig
participant Formatter as Provider::Anthropic::MessageFormatter
participant Client as ::Anthropic::Client
participant Parser as Provider::Anthropic::ChatParser
participant Langfuse as Langfuse
Responder->>Anthropic: chat_response(prompt, messages:, conversation_history:)
Anthropic->>Config: build_request(model)
Config->>Formatter: build messages (from prompt, history, function_results)
Anthropic->>Langfuse: create trace
Anthropic->>Client: messages.create / messages.stream
Client-->>Anthropic: response + usage
Anthropic->>Parser: parse response -> ChatResponse
Anthropic->>Langfuse: end generation (success/error)
Anthropic-->>Responder: return parsed ChatResponse + usage
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1dbb51553
Actionable comments posted: 4
🧹 Nitpick comments (1)
app/models/provider/anthropic/chat_config.rb (1)
78-81: ⚡ Quick win
Strip `strict` for both symbol and string keyed schemas.
Only deleting `:strict` misses `"strict"` keys and can forward OpenAI-only schema flags to Anthropic.
♻️ Proposed fix
 def anthropic_input_schema(schema)
   schema = schema.deep_dup
-  schema.delete(:strict) if schema.is_a?(Hash)
+  if schema.is_a?(Hash)
+    schema.delete(:strict)
+    schema.delete("strict")
+  end
   schema
 end
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/models/provider/anthropic/chat_config.rb` around lines 78 - 81, In anthropic_input_schema, the code only removes the symbol key :strict which leaves string-keyed "strict" present and can leak OpenAI-only flags to Anthropic; after deep_dup and the Hash check, delete both schema.delete(:strict) and schema.delete('strict') (or otherwise normalize keys and remove strict) so both symbol and string keyed schemas are stripped of the strict flag before returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/models/chat.rb`:
- Around line 58-62: The current selection uses Setting.llm_provider to pick a
model even if that provider lacks credentials; update the logic in the
model-selection block so it only returns Provider::Anthropic.effective_model
when Setting.llm_provider == "anthropic" AND the Anthropic provider is actually
configured (e.g. check Provider::Anthropic.credentials_present? or a similar
availability method), otherwise fall back to Provider::Openai.effective_model
(and vice‑versa: only return OpenAI model if Provider::Openai is configured);
modify the conditional around Setting.llm_provider to validate provider
availability before returning its model so downstream chat creation won't pick
an unavailable provider.
In `@app/models/provider/anthropic.rb`:
- Around line 75-87: The chat_response method in Provider::Anthropic (method
chat_response in app/models/provider/anthropic.rb) must accept the messages:
keyword because Assistant::Responder calls llm.chat_response(..., messages: ...,
conversation_history: ...); update the method signature to include messages: nil
(or appropriate default), ensure any internal logic uses or forwards that
messages argument (e.g., to the API call or to the conversation_history
handling), and propagate messages when delegating to helper methods so the call
no longer raises ArgumentError: unknown keyword: :messages.
In `@app/models/setting.rb`:
- Around line 13-16: The new secret field anthropic_access_token is currently
stored unencrypted; add :anthropic_access_token to the
EncryptedSettingFields::ENCRYPTED_FIELDS list so it is persisted encrypted
(update the constant/array where ENCRYPTED_FIELDS is declared), run or add any
necessary schema/migration or spec updates that assert encrypted fields include
anthropic_access_token, and ensure the existing default behavior using
ENV["ANTHROPIC_ACCESS_TOKEN"] / ENV["ANTHROPIC_API_KEY"] remains unchanged in
the Setting model.
In `@test/models/provider/anthropic_test.rb`:
- Around line 43-47: Provider::Anthropic.effective_model currently uses
ENV.fetch("ANTHROPIC_MODEL", Setting.anthropic_model) which eagerly calls
Setting.anthropic_model; change the implementation to avoid evaluating the
default when ENV is present (use ENV.fetch("ANTHROPIC_MODEL") {
Setting.anthropic_model } or explicit conditional with ENV.key?), and update the
test in test/models/provider/anthropic_test.rb so that inside
ClimateControl.modify("ANTHROPIC_MODEL" => "claude-haiku-4-5") you also assert
that Setting.anthropic_model is not invoked (e.g., expect(Setting).not_to
receive(:anthropic_model)) before calling Provider::Anthropic.effective_model to
harden the test.
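To make the eager-default point concrete, here is a minimal standalone sketch. `FakeSetting` is a hypothetical stand-in for `Setting` that counts how often the fallback is consulted; it is not part of the PR.

```ruby
# Hypothetical stand-in for Setting; counts lookups so the difference is observable.
class FakeSetting
  @calls = 0

  class << self
    attr_reader :calls

    def anthropic_model
      @calls += 1
      "claude-sonnet-4-6"
    end
  end
end

ENV["ANTHROPIC_MODEL"] = "claude-haiku-4-5"

# Eager: the default argument is evaluated before fetch runs, even though the key exists.
EAGER_MODEL = ENV.fetch("ANTHROPIC_MODEL", FakeSetting.anthropic_model)
CALLS_AFTER_EAGER = FakeSetting.calls

# Lazy: the block only runs when the key is missing.
LAZY_MODEL = ENV.fetch("ANTHROPIC_MODEL") { FakeSetting.anthropic_model }
CALLS_AFTER_LAZY = FakeSetting.calls
```

Both calls return the env value, but only the eager form touches the fallback. Note that the block form still returns an empty string when the variable is set to `""`, whereas an `ENV[...].presence || fallback` style check (ActiveSupport) treats an empty value as absent.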
---
Nitpick comments:
In `@app/models/provider/anthropic/chat_config.rb`:
- Around line 78-81: In anthropic_input_schema, the code only removes the symbol
key :strict which leaves string-keyed "strict" present and can leak OpenAI-only
flags to Anthropic; after deep_dup and the Hash check, delete both
schema.delete(:strict) and schema.delete('strict') (or otherwise normalize keys
and remove strict) so both symbol and string keyed schemas are stripped of the
strict flag before returning.
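A standalone sketch of the suggested stripping, assuming plain Ruby without ActiveSupport: `Marshal` stands in for `deep_dup`, and the method name `strip_strict` is hypothetical. Like the proposed fix, it only removes the flag at the top level of the schema.

```ruby
# Standalone sketch; the PR uses ActiveSupport's deep_dup, approximated here with Marshal.
def strip_strict(schema)
  return schema unless schema.is_a?(Hash)

  copy = Marshal.load(Marshal.dump(schema)) # deep copy so the caller's schema is untouched
  copy.delete(:strict)   # symbol-keyed schemas built in Ruby
  copy.delete("strict")  # string-keyed schemas decoded from JSON
  copy
end

SYMBOL_SCHEMA = { type: "object", strict: true, properties: {} }
STRING_SCHEMA = { "type" => "object", "strict" => true, "properties" => {} }
```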
⛔ Files ignored due to path filters (1)
- Gemfile.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
- Gemfile
- app/models/assistant/responder.rb
- app/models/chat.rb
- app/models/provider/anthropic.rb
- app/models/provider/anthropic/chat_config.rb
- app/models/provider/anthropic/chat_parser.rb
- app/models/provider/anthropic/message_formatter.rb
- app/models/provider/llm_concept.rb
- app/models/provider/openai.rb
- app/models/provider/registry.rb
- app/models/setting.rb
- app/models/user.rb
- test/models/provider/anthropic/chat_config_test.rb
- test/models/provider/anthropic/chat_parser_test.rb
- test/models/provider/anthropic/message_formatter_test.rb
- test/models/provider/anthropic_test.rb
- test/models/provider/registry_test.rb
Surface fixes raised by Codex + CodeRabbit on PR 1/5:
- Provider::Anthropic#chat_response now accepts (and ignores) a
`messages:` kwarg. Assistant::Responder passes both `messages:`
(OpenAI-shape) and `conversation_history:` (raw Message records) for
cross-provider parity, so the previous signature raised
ArgumentError on the first chat turn through the Anthropic provider.
- Provider::Anthropic#supports_model? bypasses the `claude` prefix
gate when a custom base_url is configured, mirroring the OpenAI
provider. Bedrock-shaped IDs like
`anthropic.claude-sonnet-4-5-20250929-v1:0` and
`claude-opus-4@20250514` are otherwise rejected by
Assistant::Provided#get_model_provider and the chat dies.
- Setting.anthropic_access_token is now in
EncryptedSettingFields::ENCRYPTED_FIELDS so the Anthropic API key
is encrypted at rest like every other provider secret. Previously
plaintext while siblings (openai_access_token, twelve_data_api_key,
external_assistant_token) were ciphertext.
- Chat.default_model falls back to whichever provider is actually
configured. Previously, with LLM_PROVIDER=anthropic but no
Anthropic credentials, the default model resolved to a Claude ID
that no registered provider supported, so chats failed even when
OpenAI was fully configured. Adds Provider::{Anthropic,Openai}.configured?
class methods to keep the callsite readable.
- Provider::Anthropic.effective_model uses
`ENV["ANTHROPIC_MODEL"].presence || Setting.anthropic_model` so the
Setting lookup is only performed when the env var is absent — the
previous `ENV.fetch(KEY, default)` evaluated the default arg
eagerly on every call.
- Provider::Anthropic::ChatConfig#anthropic_input_schema strips both
`:strict` and `"strict"` keys so JSON-decoded schemas with string
keys cannot leak the OpenAI-only flag through to Anthropic.
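The default-model fallback in the list above could look roughly like the following. This is a sketch under stated assumptions: `ProviderStub` and `pick_default_model` are hypothetical names, and returning `nil` when nothing is configured is an assumption rather than the PR's documented behavior.

```ruby
# Hypothetical stand-ins for Provider::Anthropic / Provider::Openai.
ProviderStub = Struct.new(:name, :configured, :effective_model) do
  def configured?
    configured
  end
end

# Prefer the provider named by llm_provider, but only if it is configured;
# otherwise fall back to any other configured provider.
def pick_default_model(llm_provider, providers)
  preferred = providers.find { |p| p.name == llm_provider && p.configured? }
  chosen = preferred || providers.find(&:configured?)
  chosen&.effective_model # nil when no provider is configured (assumed behavior)
end
```

With `llm_provider` set to `"anthropic"` but only OpenAI configured, the sketch returns the OpenAI model, which is the situation the bullet on `Chat.default_model` describes.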
Test coverage added: supports_model? bypass on custom endpoints,
chat_response messages: kwarg compatibility, default_model fallback
in the three credential combinations, configured? against ENV +
Setting, strict-flag stripping for both key types, and a
`Setting.expects(:anthropic_model).never` assertion proving the
ENV-precedence test now exercises the lazy path.
All 4365 tests pass (1 pre-existing libvips env error unrelated).
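For readers unfamiliar with the model gate, a miniature version of the custom-endpoint bypass might look like this. `ModelGate` is a hypothetical class, and the plain `claude` prefix check is an assumption; the real gate in the provider may be stricter.

```ruby
# Hypothetical miniature of the gate; the real provider class is not reproduced here.
class ModelGate
  def initialize(base_url: nil)
    @base_url = base_url
  end

  def custom_endpoint?
    !@base_url.nil? && !@base_url.empty?
  end

  def supports_model?(model)
    # Custom endpoints (proxies, Bedrock-style gateways) may use arbitrary model IDs.
    return true if custom_endpoint?

    model.to_s.start_with?("claude")
  end
end
```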
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.
- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
ProviderMerchantEnhancer each define a single output tool whose
input_schema mirrors the desired output, then force the model to call
it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
The forced call returns the output as structured tool_use.input, so there
is no free-text JSON parsing, no <think> tag stripping, and no
json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
cache_creation_input_tokens / cache_read_input_tokens to dedicated
columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
$3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
ledger sees the same shape regardless of which provider ran a batch.
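The forced-tool-call pattern from the list above can be sketched as follows. The tool name `record_categories`, its schema, and both helper names are hypothetical; only the `tool_choice` hash mirrors the shape quoted in the commit message. Responses are modeled as plain hashes rather than SDK objects.

```ruby
# Hypothetical tool name and schema, for illustration only.
CATEGORIZE_TOOL = {
  name: "record_categories",
  description: "Record one category per transaction",
  input_schema: {
    type: "object",
    properties: {
      categorizations: {
        type: "array",
        items: {
          type: "object",
          properties: {
            transaction_id: { type: "string" },
            category_name: { type: "string" }
          },
          required: %w[transaction_id category_name]
        }
      }
    },
    required: %w[categorizations]
  }
}.freeze

def forced_tool_request(model:, prompt:, tool:, max_tokens: 1024)
  {
    model: model,
    max_tokens: max_tokens,
    tools: [ tool ],
    # Forces a single call to the named tool instead of a free-text answer.
    tool_choice: { type: "tool", name: tool[:name], disable_parallel_tool_use: true },
    messages: [ { role: "user", content: prompt } ]
  }
end

# Pull the structured input out of a response's content blocks (plain hashes here).
def extract_tool_input(content_blocks, tool_name)
  block = content_blocks.find { |b| b[:type] == "tool_use" && b[:name] == tool_name }
  raise ArgumentError, "no tool_use block for #{tool_name}" if block.nil?

  block[:input]
end
```

Raising when the `tool_use` block is absent corresponds to the missing-tool_use error path the tests are said to cover; the exact error class used in the PR is not shown in this thread.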
Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.
Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
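As a worked example of the per-MTok pricing listed in this commit message, the sketch below computes a cost from input and output token counts. The full `claude-...` keys, the helper names, and the longest-prefix rule are assumptions for illustration; the commit gives only short names and says the lookup is exact/prefix. Cache-token pricing is not stated in this thread, so the sketch ignores cache tokens.

```ruby
# Prices in USD per million tokens, copied from the commit message.
# The full "claude-..." keys are assumed; the commit only gives short names.
PRICING_SKETCH = {
  "claude-opus-4-7"   => { input: 15.0, output: 75.0 },
  "claude-sonnet-4-6" => { input: 3.0,  output: 15.0 },
  "claude-haiku-4-5"  => { input: 1.0,  output: 5.0 }
}.freeze

# Exact match first, then the longest matching prefix (e.g. dated model IDs).
def lookup_pricing(model)
  return PRICING_SKETCH[model] if PRICING_SKETCH.key?(model)

  key = PRICING_SKETCH.keys.select { |k| model.start_with?(k) }.max_by(&:length)
  key && PRICING_SKETCH[key]
end

def estimate_cost(model, input_tokens:, output_tokens:)
  price = lookup_pricing(model)
  return nil if price.nil?

  (input_tokens * price[:input] + output_tokens * price[:output]) / 1_000_000.0
end

# Returns "anthropic" for claude-* IDs and nil otherwise (other providers omitted).
def infer_provider_sketch(model)
  "anthropic" if model.start_with?("claude-")
end
```

For instance, one million input tokens and half a million output tokens on the sonnet row come to 3 + 7.5 = 10.5 USD.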
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/models/provider/anthropic.rb (1)
170-194: ⚠️ Potential issue | 🔴 Critical
Fix Anthropic streaming event class constants
`stream_chat_response` matches `::Anthropic::Streaming::TextEvent` / `::Anthropic::Streaming::MessageStopEvent`, but `client.messages.stream` yields `Anthropic::Helpers::Streaming::TextEvent` and `Anthropic::Helpers::Streaming::MessageStopEvent`. Update the constants so the `case` branches fire and streaming doesn't fail to recognize events (app/models/provider/anthropic.rb:170-194).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/models/provider/anthropic.rb` around lines 170 - 194, In stream_chat_response, the case currently checks ::Anthropic::Streaming::TextEvent and ::Anthropic::Streaming::MessageStopEvent but client.messages.stream yields Anthropic::Helpers::Streaming events; update the event class references in the case to Anthropic::Helpers::Streaming::TextEvent and Anthropic::Helpers::Streaming::MessageStopEvent so the when branches in stream_chat_response (which iterates stream.each from client.messages.stream) match and handle events correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/models/chat_test.rb`:
- Around line 65-86: The tests assume Chat.default_model equals
Provider::<X>::DEFAULT_MODEL but Chat.default_model prefers
Provider::<X>::effective_model when set; make the tests deterministic by either
stubbing the provider effective_model to return the DEFAULT_MODEL or clearing
any env/model overrides before asserting. Specifically, in the tests referencing
Chat.default_model (the three cases using Provider::Anthropic::DEFAULT_MODEL and
Provider::Openai::DEFAULT_MODEL), stub Provider::Anthropic.effective_model and
Provider::Openai.effective_model to return nil (or to return the corresponding
DEFAULT_MODEL) so the method falls back predictably, or explicitly clear
OPENAI_MODEL/ANTHROPIC_MODEL values via stubs/mocks prior to the assert. Ensure
you modify the tests that set Setting.stubs(:llm_provider) and
Provider::<...>.stubs(:configured?) to also control
Provider::<...>.stubs(:effective_model) to avoid intermittent failures.
---
Outside diff comments:
In `@app/models/provider/anthropic.rb`:
- Around line 170-194: In stream_chat_response, the case currently checks
::Anthropic::Streaming::TextEvent and ::Anthropic::Streaming::MessageStopEvent
but client.messages.stream yields Anthropic::Helpers::Streaming events; update
the event class references in the case to
Anthropic::Helpers::Streaming::TextEvent and
Anthropic::Helpers::Streaming::MessageStopEvent so the when branches in
stream_chat_response (which iterates stream.each from client.messages.stream)
match and handle events correctly.
📒 Files selected for processing (8)
- app/models/chat.rb
- app/models/provider/anthropic.rb
- app/models/provider/anthropic/chat_config.rb
- app/models/provider/openai.rb
- app/models/setting.rb
- test/models/chat_test.rb
- test/models/provider/anthropic/chat_config_test.rb
- test/models/provider/anthropic_test.rb
CodeRabbit flagged on PR review: the new default_model tests asserted against Provider::*::DEFAULT_MODEL, but Chat.default_model actually returns Provider::*.effective_model.presence (which reads OPENAI_MODEL / ANTHROPIC_MODEL from the environment). With either env var set, the tests would fail intermittently even though routing was correct.
- New default_model tests now assert against the provider's effective_model directly, so they verify the routing decision (which provider's value wins) without coupling to the constant.
- Pre-existing "creates with default model" assertions had the same brittleness; switch them to compare against Chat.default_model so the chosen model is whatever the env / Setting cascade resolves to.
Verified by running `ANTHROPIC_MODEL=claude-haiku-4-5 OPENAI_MODEL=gpt-4o bin/rails test test/models/chat_test.rb`: 16 runs, 0 failures (previously 2 pre-existing failures + 0 from the new tests).
- Provider::Anthropic#supports_pdf_processing? bypasses prefix gate for custom endpoints, mirroring supports_model?
- Provider::Anthropic#initialize raises Error when custom_endpoint? AND model.blank?, parity with Provider::Openai
- stream_chat_response captures partial usage on mid-stream errors and records it via the new on_partial callback so chat_response can skip the duplicate error row in the outer rescue
- safe_accumulated_message swallows the secondary failure when the SDK cannot reconstruct a snapshot
- langfuse_client memoizes properly (||= instead of =) so repeated calls don't churn Langfuse instances
- MessageFormatter sorts tool_calls by created_at then id so the message array is deterministic across replays; skips tool_calls missing both provider_call_id and provider_id rather than sending `id: nil` and getting rejected by Anthropic
- Setting.anthropic_access_token default falls back through ENV["ANTHROPIC_API_KEY"].presence (was missing .presence, so an empty-string env value bled through)
- User#openai_configured? / #anthropic_configured? delegate to the Provider::* class methods, a single source of truth
- Assistant::Responder renames the OpenAI-shape history builder conversation_history → openai_messages_payload so the kwarg name matches the local method name (messages: openai_messages_payload, conversation_history: chat_message_records)
- Assistant::Builtin stale-history comment updated to reference both builders
Adds a streaming chat_response test using ad-hoc subclasses of the SDK event types so the case/when dispatch matches via is_a? without stubbing class-level === behavior.
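The MessageFormatter ordering rule in this commit message can be sketched in isolation. `ToolCallStub` and `sendable_tool_calls` are hypothetical names, and treating an empty string like a missing identifier is an assumption.

```ruby
# Hypothetical record shape; the real ToolCall model has more fields.
ToolCallStub = Struct.new(:id, :created_at, :provider_call_id, :provider_id, keyword_init: true)

def sendable_tool_calls(tool_calls)
  tool_calls
    # Drop calls with neither identifier: a tool_use block needs a non-nil id.
    .reject { |tc| tc.provider_call_id.to_s.empty? && tc.provider_id.to_s.empty? }
    # created_at first, id as tie-breaker, so replays produce the same order.
    .sort_by { |tc| [ tc.created_at, tc.id ] }
end
```

Sorting on a `[created_at, id]` pair keeps the order stable even when two tool calls share a timestamp.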
Docs PR opened: we-promise/docs#82
Added documentation for the new Anthropic LLM provider, including configuration, environment variables, supported models, and provider selection via LLM_PROVIDER.
Solid foundation PR. A few observations: Message format handling: The Prompt caching: Attaching
VCR cassettes: Noted as a follow-up requiring a real Stacked PRs (#1984–#1987): All currently in draft — appropriate while #1983 awaits review. The series is well-structured with each PR independently green. Generated by Claude Code |
Addresses @jjmata's "worth confirming" note on PR #1983: tool-use turns from prior assistant messages must round-trip correctly when retrieved from the database.
- New `ChatParser → ToolCall::Function → MessageFormatter` test walks the full path: Anthropic response with a tool_use block → ChatFunctionRequest → ToolCall::Function.from_function_request → persisted on the AssistantMessage → MessageFormatter rebuild on the next turn. Asserts the original `tool_use.id` is preserved end-to-end as both `tool_use.id` and the paired `tool_result.tool_use_id`, and that the original `input` hash and serialized result content survive.
- New multi-tool assistant turn test confirms two tool_use blocks on a single assistant message render as two tool_use blocks followed by two paired tool_result blocks in a single user-role follow-up, matching Anthropic's required alternation.
Both tests exercise the existing PR1 code without behavior changes.
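The pairing these tests check can be illustrated with a small standalone sketch. `FnCall` and `tool_turn_messages` are hypothetical names and do not reproduce MessageFormatter; serializing the result with `JSON.generate` is also an assumption about how result content is stored.

```ruby
require "json"

# Hypothetical minimal record; the real ToolCall::Function carries more fields.
FnCall = Struct.new(:call_id, :name, :arguments, :result, keyword_init: true)

# Builds the two messages for one assistant turn that used tools:
# an assistant message holding text + tool_use blocks, then a single user
# message holding one tool_result per tool_use, linked by id.
def tool_turn_messages(text, calls)
  assistant_blocks = []
  assistant_blocks << { type: "text", text: text } unless text.to_s.empty?
  calls.each do |c|
    assistant_blocks << { type: "tool_use", id: c.call_id, name: c.name, input: c.arguments }
  end

  results = calls.map do |c|
    { type: "tool_result", tool_use_id: c.call_id, content: JSON.generate(c.result) }
  end

  [
    { role: "assistant", content: assistant_blocks },
    { role: "user", content: results }
  ]
end
```

Because each `tool_result` reuses the `call_id` of its `tool_use`, the identifier survives the round trip as long as it was persisted unchanged.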
|
Thanks for the review, @jjmata. Quick responses on each point: Round-trip confirmation — addressed in
VCR cassettes — yes, slated as a follow-up bundled with #1985 (PDF tests need a real PDF + key too). Mocked client coverage is the bar for the foundation PR. Stacked PRs all rebased and pushed earlier today; each independently green. |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/models/provider/anthropic/message_formatter_test.rb`:
- Around line 94-100: The test uses OpenStruct (see anthropic_response in
test/models/provider/anthropic/message_formatter_test.rb) but doesn’t require
it; add require "ostruct" at the top of that test file so OpenStruct is always
defined (place the require above the test class or before the anthropic_response
fixture/usage).
📒 Files selected for processing (1)
test/models/provider/anthropic/message_formatter_test.rb
|
Three functional blockers still open after the latest commits — flagging for resolution before merge: 1. Streaming class namespace mismatch (silent failure) 2. 3. None of these are blocked by missing VCR cassettes — they're wiring bugs that should be reproducible with existing stubs. Generated by Claude Code |
OpenStruct is moving out of Ruby's default load path (warning in 3.4+, removed in 3.5+). Tests work today because ActiveSupport transitively loads it, but that's incidental. Match the existing convention in test/controllers/settings/hostings_controller_test.rb which explicitly requires ostruct for the same reason.
|
@sure-design Thanks for the second pass. Walking through each: all three are already resolved on the current tip (
1. Streaming class namespace mismatch: this isn't actually a mismatch. The
module Anthropic
  Streaming = Anthropic::Helpers::Streaming
end
Empirical confirmation:
2.
def chat_response(
  prompt,
  model:,
  instructions: nil,
  functions: [],
  function_results: [],
  messages: nil, # ← accepted (ignored, Anthropic uses conversation_history)
  conversation_history: [],
  ...
)
Regression test at
3. Worth re-checking against
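The aliasing argument in this reply, together with the subclass-based test doubles mentioned in an earlier commit, can be demonstrated without the SDK. `FakeSdk`, `StubTextEvent`, and `classify` are hypothetical and only reproduce the pattern: a constant alias names the same module object, and `case/when` matches subclass instances through `Module#===`.

```ruby
# Stand-in modules; these are NOT the real SDK, only the same aliasing pattern.
module FakeSdk
  module Helpers
    module Streaming
      class TextEvent
        attr_reader :text

        def initialize(text)
          @text = text
        end
      end

      class MessageStopEvent; end
    end
  end

  # A constant alias: both paths now name the same module object.
  Streaming = Helpers::Streaming
end

# Test double: `when` uses Module#===, which is true for subclass instances.
class StubTextEvent < FakeSdk::Streaming::TextEvent; end

def classify(event)
  case event
  when FakeSdk::Streaming::TextEvent then :text
  when FakeSdk::Streaming::MessageStopEvent then :stop
  else :unknown
  end
end
```

Whether the installed SDK version actually defines such an alias is something to confirm against the gem itself; the sketch only shows that, if it does, both constant paths dispatch identically.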
|
@coderabbitai review again plz |
|
✅ Actions performed
Full review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/models/assistant/responder.rb (1)
123-132: ⚡ Quick win
Avoid double-querying chat history for both payload builders.
Line 126 and Line 141 run nearly the same query; this adds avoidable overhead every request. Reuse one fetched record set and derive both shapes from it.
♻️ Proposed refactor
+  def complete_chat_messages
+    return [] unless chat&.messages
+    @complete_chat_messages ||= chat.messages
+      .where(type: [ "UserMessage", "AssistantMessage" ], status: "complete")
+      .includes(:tool_calls)
+      .ordered
+      .to_a
+  end
+
   def chat_message_records
-    return [] unless chat&.messages
-
-    chat.messages
-      .where(type: [ "UserMessage", "AssistantMessage" ], status: "complete")
-      .where.not(id: message.id)
-      .includes(:tool_calls)
-      .ordered
-      .to_a
+    complete_chat_messages.reject { |m| m.id == message.id }
   end

   def openai_messages_payload
     messages = []
-    return messages unless chat&.messages
-
-    chat.messages
-      .where(type: [ "UserMessage", "AssistantMessage" ], status: "complete")
-      .includes(:tool_calls)
-      .ordered
-      .each do |chat_message|
+    complete_chat_messages.each do |chat_message|
       if chat_message.tool_calls.any?
         messages << {
Also applies to: 137-145
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/models/assistant/responder.rb` around lines 123 - 132, The chat_message_records method currently causes two nearly identical DB queries by re-fetching chat.messages for both payload builders; instead, fetch the records once into a local variable (using chat.messages.where(type: ["UserMessage","AssistantMessage"], status: "complete").where.not(id: message.id).includes(:tool_calls).ordered.to_a) inside chat_message_records (or the caller), then pass or derive the two required shapes from that single in-memory array rather than re-running chat.messages for each payload builder; update any callers that expect separate queries to accept the cached array and compute their specific shapes from it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/models/provider/anthropic.rb`:
- Around line 265-266: The current logs call Rails.logger.warn with
e.full_message which can expose sensitive request/output payloads; update the
logging in the Langfuse trace creation/error handlers (the Rails.logger.warn
calls that reference e.full_message and e.message) to only log the exception
class, sanitized message, and a trace identifier (if available) without the full
backtrace or payload, e.g., replace usage of e.full_message with a short
sanitized string (exception class + e.message + trace_id) and remove any raw
payload/backtrace printing; apply the same change for the other occurrences
around the Rails.logger.warn calls at the later error handlers mentioned (the
similar calls near lines 289-290 and 300-301).
In `@app/models/provider/anthropic/message_formatter.rb`:
- Around line 114-123: parse_arguments can return non-Hash values
(scalars/arrays) which makes Anthropic tool_use.input invalid; modify
parse_arguments so it always returns a Hash: compute the parsed result (handle
nil, Hash, String with JSON.parse/rescue, and other types), then if the result
is not a Hash wrap it as a Hash like { "input" => result } (or symbolized :input
if codebase prefers), and return that Hash; update parse_arguments to perform
this normalization so tool_use.input is always a JSON object.
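
The normalization described for `parse_arguments` can be sketched as below. This is a minimal, standalone illustration and not the code in `message_formatter.rb`: the method signature, the `parse_json` helper, and the choice to map `nil` and unparseable strings to an empty Hash are assumptions made for the example.

```ruby
require "json"

# Hypothetical helper (not from the PR): parse JSON, returning nil on failure.
def parse_json(text)
  JSON.parse(text)
rescue JSON::ParserError
  nil
end

# Hypothetical stand-in for MessageFormatter#parse_arguments.
# Always returns a Hash so that tool_use.input is a JSON object.
def parse_arguments(raw)
  parsed =
    case raw
    when Hash then raw
    when String then parse_json(raw)
    else raw
    end

  # Assumption: nil and unparseable input become an empty object.
  return {} if parsed.nil?

  # Scalars and arrays are wrapped, as the review comment suggests.
  parsed.is_a?(Hash) ? parsed : { "input" => parsed }
end
```

Wrapping keeps the original value available for debugging, at the cost of sending a key the tool schema may not declare; coercing to an empty Hash instead is the stricter alternative.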
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eac52cd6-9fb5-4fa9-bffe-84b46d04b0ae
⛔ Files ignored due to path filters (1)
- `Gemfile.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- `Gemfile`
- `app/models/assistant/builtin.rb`
- `app/models/assistant/responder.rb`
- `app/models/chat.rb`
- `app/models/provider/anthropic.rb`
- `app/models/provider/anthropic/chat_config.rb`
- `app/models/provider/anthropic/chat_parser.rb`
- `app/models/provider/anthropic/message_formatter.rb`
- `app/models/provider/llm_concept.rb`
- `app/models/provider/openai.rb`
- `app/models/provider/registry.rb`
- `app/models/setting.rb`
- `app/models/user.rb`
- `test/models/chat_test.rb`
- `test/models/provider/anthropic/chat_config_test.rb`
- `test/models/provider/anthropic/chat_parser_test.rb`
- `test/models/provider/anthropic/message_formatter_test.rb`
- `test/models/provider/anthropic_test.rb`
- `test/models/provider/registry_test.rb`
… history fetch

Addresses three open CodeRabbit findings on PR #1983.

- Provider::Anthropic Langfuse rescue branches no longer include `e.full_message` in `Rails.logger.warn`. `full_message` bundles the backtrace + cause chain and on some SDK error types includes the serialized request/response payload (prompt, model output). Logs now report `#{e.class}: #{e.message}` only. Three sites: create_langfuse_trace, log_langfuse_generation, upsert_langfuse_trace. Note: Provider::Openai has the same pattern (copy-pasted source); harmonization is deferred to a follow-up cleanup PR, and this commit fixes only the Anthropic provider to keep PR scope tight.
- MessageFormatter#parse_arguments now coerces any non-Hash parsed result to `{}`. Anthropic's Messages API requires `tool_use.input` to be a JSON object (map); a stored ToolCall::Function record whose arguments parse to a scalar, bool, or array (corrupt row, legacy data, cross-provider bleed) would otherwise produce a payload the API rejects. Normal flow stores Hash arguments end-to-end, so the fix is defensive. Adds 2 tests covering scalar/array JSON strings and non-String non-Hash inputs.
- Assistant::Responder dedups the chat-history fetch. The previous layout fired two near-identical `chat.messages.where(...).includes(:tool_calls).ordered` queries per LLM turn (one for the OpenAI-shape payload, one for the raw-records kwarg). A new memoized `complete_chat_messages` fetches once; `chat_message_records` filters out the current message via `Array#reject`, and `openai_messages_payload` iterates the cached array unchanged. One SQL query per turn instead of two. Memoization scope = single Responder instance (per LLM call), so cache invalidation is not a concern.

All 4370 tests pass (1 pre-existing libvips env error unrelated). Rubocop + brakeman clean.
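
To illustrate the logging change described in the commit above, here is a small standalone sketch contrasting the short `class: message` form with `Exception#full_message`. The helper name `safe_error_summary` is hypothetical and does not appear in the PR; the real rescue branches call `Rails.logger.warn` directly.

```ruby
# Hypothetical helper: build a log line from the exception class and
# message only, leaving out the backtrace and cause chain.
def safe_error_summary(error)
  "#{error.class}: #{error.message}"
end

begin
  raise ArgumentError, "trace upload failed"
rescue => e
  # Short form, as the commit describes for the Langfuse rescue branches.
  puts safe_error_summary(e)

  # For comparison: full_message also carries the backtrace lines.
  puts e.full_message(highlight: false)
end
```

The short form still depends on `e.message` being free of payload data, so this reduces exposure rather than eliminating it.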
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.
- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
ProviderMerchantEnhancer each define a single output tool whose
input_schema mirrors the desired output, then force the model to call
it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
Anthropic guarantees the tool_use.input matches the schema, so there
is no JSON parsing fragility, no <think> tag stripping, and no
json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
cache_creation_input_tokens / cache_read_input_tokens to dedicated
columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
$3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
ledger sees the same shape regardless of which provider ran a batch.
Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.
Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
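
The forced-tool-call pattern described in this commit message can be sketched with plain hashes, without the SDK. Everything named here is an assumption for illustration: the tool name `record_categories`, its schema, and the helper methods are hypothetical and are not the PR's `AutoCategorizer` code. Only the request fields (`tools`, `tool_choice`, `input_schema`) and the `tool_use` content-block shape follow the Messages API.

```ruby
# Hypothetical output tool whose input_schema mirrors the desired result.
def categorize_tool
  {
    name: "record_categories",
    description: "Record one category per transaction.",
    input_schema: {
      type: "object",
      properties: {
        categorizations: {
          type: "array",
          items: {
            type: "object",
            properties: {
              transaction_id: { type: "string" },
              category_name: { type: "string" }
            },
            required: %w[transaction_id category_name]
          }
        }
      },
      required: %w[categorizations]
    }
  }
end

# Build a Messages API request body that forces the model to call the tool.
def forced_tool_request(model:, prompt:, tool:, max_tokens: 1024)
  {
    model: model,
    max_tokens: max_tokens,
    tools: [ tool ],
    tool_choice: { type: "tool", name: tool[:name], disable_parallel_tool_use: true },
    messages: [ { role: "user", content: prompt } ]
  }
end

# Pull the structured input out of a hash-shaped response; raise when the
# expected tool_use block is missing.
def extract_tool_input(response, tool_name)
  block = response.fetch("content", []).find do |b|
    b["type"] == "tool_use" && b["name"] == tool_name
  end
  raise ArgumentError, "no tool_use block for #{tool_name}" unless block

  block.fetch("input")
end
```

A caller would send the hash from `forced_tool_request` through whatever client it uses and pass the decoded response to `extract_tool_input`; validating the returned input against the schema before persisting it remains a reasonable extra check.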
Latest review round addressed in
Stack rebased onto the latest |
Summary
Introduces `Provider::Anthropic` as a first-class LLM provider alongside `Provider::Openai`, implementing the `LlmConcept#chat_response` contract over the official `anthropic` Ruby SDK. Chat (text + streaming + tool use) is at parity. Batch ops, PDF, and RAG land in stacked follow-up PRs.

This is PR 1 of 5 in the Anthropic integration series. Each PR ships independently green.
Why Anthropic
The Sure assistant currently only speaks to OpenAI. Anthropic offers (a) native PDF understanding without rasterization, (b) explicit prompt-cache control, (c) competitive Sonnet pricing for chat and Haiku for batch ops, and (d) an alternative for users who already standardize on Claude. Adding a parallel provider gives operators a choice without locking anyone in.
What's in this PR
New code
- `Provider::Anthropic`: uses Messages API for sync + streaming responses, wraps SDK errors as `Provider::Anthropic::Error`, records LLM usage with provider `"anthropic"`, threads Langfuse traces with the same span name (`chat_response`) so dashboards stay comparable across providers.
- `Provider::Anthropic::ChatConfig`: builds Messages API requests; tags the system prompt and the final tool definition with `cache_control: { type: "ephemeral" }` (Anthropic's prompt caching).
- `Provider::Anthropic::MessageFormatter`: translates `Chat#conversation_messages` into Anthropic's content-block schema. Each historical assistant turn that used tools is rebuilt as the required `tool_use` block + paired `user`-role `tool_result` block before the next assistant turn. In-flight `function_results` (the responder's second-leg call) get the same treatment.
- `Provider::Anthropic::ChatParser`: converts the SDK's `Message` (or a hash-shaped equivalent) into the shared `Provider::LlmConcept::ChatResponse` Data struct, surfacing both text blocks and `tool_use` blocks as function requests.

Wiring
- `Provider::Registry`: new `anthropic` factory; reads `ANTHROPIC_ACCESS_TOKEN` / `ANTHROPIC_API_KEY` / `Setting.anthropic_access_token`, plus optional `ANTHROPIC_BASE_URL` (for Bedrock / Vertex proxies) and `ANTHROPIC_MODEL`. Listed under both `:llm` and the default available_providers set so existing `supports_model?` routing picks the right provider for `claude-*` models.
- `Setting`: adds `anthropic_access_token`, `anthropic_model`, `anthropic_base_url`, and `llm_provider` (default `"openai"`) fields.
- `User#ai_available?`: now returns true when either OpenAI or Anthropic is configured; exposes `openai_configured?` / `anthropic_configured?` for consumers.
- `Chat.default_model`: resolves from `Setting.llm_provider`, so installs that flip the provider don't have to update every chat default.
- `Assistant::Responder`: passes a new `conversation_history:` kwarg (raw `Array<Message>`) alongside the existing `messages:` (OpenAI-shaped) so providers without hosted conversation state can rebuild context. The OpenAI provider accepts and ignores the new kwarg, so there is no behavior change.
- `Provider::LlmConcept`: signature extended to document the new kwarg.

What's not in this PR (intentional)
- `auto_categorize`, `auto_detect_merchants`, `enhance_provider_merchants`: raise a clear `not yet implemented` error on this provider. PR 2/5 lands these via forced tool calls (`tool_choice: {type: "tool", name: ...}`) plus the cost-ledger migration.
- `process_pdf`, `extract_bank_statement`: same story. PR 3/5 uses native `document` content blocks.
- `Vector::Store::*` / RAG: PR 4/5 ships a pgvector-backed store so RAG works without Anthropic needing a hosted vector store.
- … `_openai_settings.html.erb` and adds the data-retention disclosure (parity with OpenAI's panel).
- … `chat_response`. Cassettes need to be recorded with a real `ANTHROPIC_API_KEY`; mocked tests cover the wiring in the meantime.

Configuration (new env vars)
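| Env var | Setting | Default |
|---|---|---|
| `ANTHROPIC_ACCESS_TOKEN` (or) `ANTHROPIC_API_KEY` | `Setting.anthropic_access_token` | |
| `ANTHROPIC_MODEL` | `Setting.anthropic_model` | `claude-sonnet-4-6` |
| `ANTHROPIC_BASE_URL` | `Setting.anthropic_base_url` | |
| `ANTHROPIC_REQUEST_TIMEOUT` | | |
| `ANTHROPIC_MAX_TOKENS` | | |
| `LLM_PROVIDER` | `Setting.llm_provider` | `openai` |

Test plan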
- `bin/rails test test/models/provider/anthropic_test.rb`: 8 tests, init / supports_model / supports_pdf_processing / effective_model / chat_response success + error + tool_use paths
- `bin/rails test test/models/provider/anthropic/`: 14 tests across ChatConfig, ChatParser, MessageFormatter
- `bin/rails test test/models/provider/registry_test.rb`: 8 tests, includes new `:anthropic` factory + env-empty fallback to Setting
- `bin/rails test test/models/provider/openai_test.rb`: 27 tests, unchanged
- `bin/rubocop` clean on all touched files
- `bin/brakeman --no-pager`: 0 security warnings
- … `ANTHROPIC_API_KEY`)
- … `Provider::Anthropic` (follow-up, bundled with cassettes)

Compliance notes
- `anthropic` gem is MIT-licensed, compatible with Sure's AGPLv3.

Summary by CodeRabbit
New Features
Behavior Changes
Chores
Tests