Skip to content

feat(ai): add Anthropic provider with chat parity (1/5)#1983

Open
gariasf wants to merge 8 commits into
mainfrom
feature/anthropic-foundation
Open

feat(ai): add Anthropic provider with chat parity (1/5)#1983
gariasf wants to merge 8 commits into
mainfrom
feature/anthropic-foundation

Conversation

@gariasf
Copy link
Copy Markdown
Collaborator

@gariasf gariasf commented May 25, 2026

Summary

Introduces Provider::Anthropic as a first-class LLM provider alongside Provider::Openai, implementing the LlmConcept#chat_response contract over the official anthropic Ruby SDK. Chat (text + streaming + tool use) is at parity. Batch ops, PDF, and RAG land in stacked follow-up PRs.

This is PR 1 of 5 in the Anthropic integration series. Each PR ships independently green.

Why Anthropic

The Sure assistant currently only speaks to OpenAI. Anthropic offers (a) native PDF understanding without rasterization, (b) explicit prompt-cache control, (c) competitive Sonnet pricing for chat and Haiku for batch ops, and (d) an alternative for users who already standardize on Claude. Adding a parallel provider gives operators a choice without locking anyone in.

What's in this PR

New code

  • Provider::Anthropic — uses Messages API for sync + streaming responses, wraps SDK errors as Provider::Anthropic::Error, records LLM usage with provider "anthropic", threads Langfuse traces with the same span name (chat_response) so dashboards stay comparable across providers.
  • Provider::Anthropic::ChatConfig — builds Messages API requests; tags the system prompt and the final tool definition with cache_control: { type: "ephemeral" } (Anthropic's prompt caching).
  • Provider::Anthropic::MessageFormatter — translates Chat#conversation_messages into Anthropic's content-block schema. Each historical assistant turn that used tools is rebuilt as the required tool_use block + paired user-role tool_result block before the next assistant turn. In-flight function_results (the responder's second-leg call) get the same treatment.
  • Provider::Anthropic::ChatParser — converts the SDK's Message (or a hash-shaped equivalent) into the shared Provider::LlmConcept::ChatResponse Data struct, surfacing both text blocks and tool_use blocks as function requests.

Wiring

  • Provider::Registry — new anthropic factory; reads ANTHROPIC_ACCESS_TOKEN / ANTHROPIC_API_KEY / Setting.anthropic_access_token, plus optional ANTHROPIC_BASE_URL (for Bedrock / Vertex proxies) and ANTHROPIC_MODEL. Listed under both :llm and the default available_providers set so existing supports_model? routing picks the right provider for claude-* models.
  • Setting — adds anthropic_access_token, anthropic_model, anthropic_base_url, and llm_provider (default "openai") fields.
  • User#ai_available? — now returns true when either OpenAI or Anthropic is configured; exposes openai_configured? / anthropic_configured? for consumers.
  • Chat.default_model — resolves from Setting.llm_provider, so installs that flip the provider don't have to update every chat default.
  • Assistant::Responder — passes a new conversation_history: kwarg (raw Array<Message>) alongside the existing messages: (OpenAI-shaped) so providers without hosted conversation state can rebuild context. The OpenAI provider accepts and ignores the new kwarg — no behavior change.
  • Provider::LlmConcept — signature extended to document the new kwarg.

What's not in this PR (intentional)

  • auto_categorize, auto_detect_merchants, enhance_provider_merchants — raise a clear not yet implemented error on this provider. PR 2/5 lands these via forced tool calls (tool_choice: {type: "tool", name: ...}) plus the cost-ledger migration.
  • process_pdf, extract_bank_statement — same story. PR 3/5 uses native document content blocks.
  • Vector::Store::* / RAG — PR 4/5 ships a pgvector-backed store so RAG works without Anthropic needing a hosted vector store.
  • ❌ Settings UI panel — PR 5/5 mirrors _openai_settings.html.erb and adds the data-retention disclosure (parity with OpenAI's panel).
  • ❌ Live VCR cassettes for chat_response. Cassettes need to be recorded with a real ANTHROPIC_API_KEY; mocked tests cover the wiring in the meantime.

Configuration (new env vars)

Env Setting field Default
ANTHROPIC_ACCESS_TOKEN (or) ANTHROPIC_API_KEY Setting.anthropic_access_token nil
ANTHROPIC_MODEL Setting.anthropic_model claude-sonnet-4-6
ANTHROPIC_BASE_URL Setting.anthropic_base_url nil (uses api.anthropic.com)
ANTHROPIC_REQUEST_TIMEOUT 600
ANTHROPIC_MAX_TOKENS 4096
LLM_PROVIDER Setting.llm_provider openai

Test plan

  • bin/rails test test/models/provider/anthropic_test.rb — 8 tests, init / supports_model / supports_pdf_processing / effective_model / chat_response success + error + tool_use paths
  • bin/rails test test/models/provider/anthropic/ — 14 tests across ChatConfig, ChatParser, MessageFormatter
  • bin/rails test test/models/provider/registry_test.rb — 8 tests, includes new :anthropic factory + env-empty fallback to Setting
  • bin/rails test test/models/provider/openai_test.rb — 27 tests, unchanged
  • Full suite: 4358 runs, 18008 assertions, 0 failures, 26 pre-existing skips, 1 pre-existing env error (libvips unavailable on the dev host)
  • bin/rubocop clean on all touched files
  • bin/brakeman --no-pager — 0 security warnings
  • Record live VCR cassettes (follow-up — needs ANTHROPIC_API_KEY)
  • System test exercising a chat turn through Provider::Anthropic (follow-up — bundled with cassettes)

Compliance notes

  • Anthropic's standard tier does not train on API inputs; 30-day Trust & Safety retention. The settings UI in PR 5/5 surfaces this alongside OpenAI's equivalent disclosure.
  • The financial-advice disclaimer already shown in the chat applies equally to either provider.
  • anthropic gem is MIT-licensed, compatible with Sure's AGPLv3.
  • OpenAI integration is untouched.

Summary by CodeRabbit

  • New Features

    • Anthropic added as an alternative LLM provider with configurable credentials and optional custom endpoint.
  • Behavior Changes

    • Default LLM selection now respects the deploy-time provider setting and prefers the configured provider.
    • Chat handling now includes prior conversation history to improve responses.
    • User AI availability now reflects either OpenAI or Anthropic being configured.
  • Chores

    • Added Anthropic client dependency to the project.
  • Tests

    • Expanded Anthropic-related test coverage (configuration, parsing, formatting, streaming).

Review Change Stack

Introduces Provider::Anthropic alongside Provider::Openai, implementing
the LlmConcept chat_response contract over the official anthropic Ruby
SDK. Batch ops, PDF, and RAG land in follow-up PRs.

- Provider::Anthropic uses Messages API for sync and streaming responses
- ChatConfig builds requests with ephemeral prompt-cache markers on the
  system prompt and the last tool definition
- MessageFormatter reconstructs multi-turn history (text + tool_use +
  tool_result blocks) from raw Message records, including the paired
  user-role tool_result turn Anthropic requires after every tool_use
- ChatParser maps Anthropic Message into the shared ChatResponse Data
- Registry, Setting, User, Chat default model wired for ANTHROPIC_*
  envs and Setting.anthropic_*; LLM_PROVIDER selects between providers
- Responder forwards raw conversation_history (Array<Message>) so
  providers without hosted conversation state can rebuild context
- OpenAI provider accepts and ignores the new kwarg (no behavior change)

Tests cover provider init, model gating, MessageFormatter for all turn
shapes, ChatConfig request building (max_tokens, system cache, tool
conversion), ChatParser for text / tool_use / mixed blocks, Registry
discovery, and mocked chat_response success / error / function_request
paths. Live VCR cassettes recorded in a follow-up with a real key.

Stacked PRs: 2/5 batch ops + cost ledger, 3/5 PDF, 4/5 pgvector RAG,
5/5 settings UI + disclosure.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Anthropic as an LLM provider: dependency and settings, Anthropic provider implementation (client, model gating, chat orchestration, Langfuse/usage), request/response shaping (ChatConfig/MessageFormatter/ChatParser), registry + app routing updates, responder history handling, and comprehensive tests.

Changes

Anthropic Provider Integration

Layer / File(s) Summary
Dependency and settings setup
Gemfile, app/models/setting.rb
Adds anthropic gem (~> 1.0) and new anthropic_access_token, anthropic_model, anthropic_base_url settings; makes llm_provider default configurable via ENV["LLM_PROVIDER"] and adds anthropic_access_token to encrypted fields.
Provider contract: conversation_history param
app/models/provider/llm_concept.rb, app/models/provider/openai.rb
Adds conversation_history: [] keyword to Provider::LlmConcept#chat_response and adds Provider::Openai.configured?/signature update for compatibility.
Provider registry and discovery
app/models/provider/registry.rb, test/models/provider/registry_test.rb
Registers Anthropic in the :llm providers list and constructs Provider::Anthropic from ANTHROPIC_* env vars or Setting fallbacks; extends registry tests to cover Anthropic credential resolution.
Anthropic provider implementation
app/models/provider/anthropic.rb
Adds Provider::Anthropic with client initialization, effective_model/configured?, model-support gating, chat_response orchestration (sync/stream), Langfuse logging, usage normalization/persistence, and error handling.
ChatConfig / MessageFormatter / ChatParser
app/models/provider/anthropic/chat_config.rb, app/models/provider/anthropic/message_formatter.rb, app/models/provider/anthropic/chat_parser.rb
Builds Anthropic request (system_, tools, messages), formats conversation history and function results into Anthropic blocks, strips OpenAI-only strict from schemas, and parses Anthropic responses into a universal ChatResponse including function_requests.
Application-level routing and history
app/models/chat.rb, app/models/assistant/responder.rb, app/models/assistant/builtin.rb
Chat.default_model now selects provider defaults based on Setting.llm_provider and each provider’s configured?; Assistant::Responder forwards raw chat message records as conversation_history and uses OpenAI-shaped messages payloads; minor builtin comment update.
User availability & integration tests
app/models/user.rb, test/models/chat_test.rb
User#ai_available? now checks either provider configured; chat tests updated to use Chat.default_model and verify provider-selection routing.
Anthropic provider tests
test/models/provider/anthropic/*
Adds tests for ChatConfig request building, MessageFormatter history/tool/result behavior, ChatParser parsing of text/tool_use blocks, Provider::Anthropic config/model support, error wrapping, streaming behavior, usage recording, and tool-use → function_request mapping.

Sequence Diagram(s)

sequenceDiagram
  participant Responder as Assistant::Responder
  participant Anthropic as Provider::Anthropic
  participant Config as Provider::Anthropic::ChatConfig
  participant Formatter as Provider::Anthropic::MessageFormatter
  participant Client as ::Anthropic::Client
  participant Parser as Provider::Anthropic::ChatParser
  participant Langfuse as Langfuse

  Responder->>Anthropic: chat_response(prompt, messages:, conversation_history:)
  Anthropic->>Config: build_request(model)
  Config->>Formatter: build messages (from prompt, history, function_results)
  Anthropic->>Langfuse: create trace
  Anthropic->>Client: messages.create / messages.stream
  Client-->>Anthropic: response + usage
  Anthropic->>Parser: parse response -> ChatResponse
  Anthropic->>Langfuse: end generation (success/error)
  Anthropic-->>Responder: return parsed ChatResponse + usage
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • jjmata
  • sokie

"🐰 In Claude's new glade I hop and sing,
A second path for chat this change will bring.
History in paws, messages neat and clear,
Two providers now, both ready to hear. 🥕"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.53% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main change as adding an Anthropic provider with chat feature parity and correctly marks this as part 1 of a multi-part series.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/anthropic-foundation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gariasf gariasf marked this pull request as ready for review May 25, 2026 15:44
@superagent-security superagent-security Bot added contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis. labels May 25, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1dbb51553

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread app/models/provider/anthropic.rb
Comment thread app/models/setting.rb Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
app/models/provider/anthropic/chat_config.rb (1)

78-81: ⚡ Quick win

Strip strict for both symbol and string keyed schemas.

Only deleting :strict misses "strict" keys and can forward OpenAI-only schema flags to Anthropic.

♻️ Proposed fix
     def anthropic_input_schema(schema)
       schema = schema.deep_dup
-      schema.delete(:strict) if schema.is_a?(Hash)
+      if schema.is_a?(Hash)
+        schema.delete(:strict)
+        schema.delete("strict")
+      end
       schema
     end
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/models/provider/anthropic/chat_config.rb` around lines 78 - 81, In
anthropic_input_schema, the code only removes the symbol key :strict which
leaves string-keyed "strict" present and can leak OpenAI-only flags to
Anthropic; after deep_dup and the Hash check, delete both schema.delete(:strict)
and schema.delete('strict') (or otherwise normalize keys and remove strict) so
both symbol and string keyed schemas are stripped of the strict flag before
returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/models/chat.rb`:
- Around line 58-62: The current selection uses Setting.llm_provider to pick a
model even if that provider lacks credentials; update the logic in the
model-selection block so it only returns Provider::Anthropic.effective_model
when Setting.llm_provider == "anthropic" AND the Anthropic provider is actually
configured (e.g. check Provider::Anthropic.credentials_present? or a similar
availability method), otherwise fall back to Provider::Openai.effective_model
(and vice‑versa: only return OpenAI model if Provider::Openai is configured);
modify the conditional around Setting.llm_provider to validate provider
availability before returning its model so downstream chat creation won't pick
an unavailable provider.

In `@app/models/provider/anthropic.rb`:
- Around line 75-87: The chat_response method in Provider::Anthropic (method
chat_response in app/models/provider/anthropic.rb) must accept the messages:
keyword because Assistant::Responder calls llm.chat_response(..., messages: ...,
conversation_history: ...); update the method signature to include messages: nil
(or appropriate default), ensure any internal logic uses or forwards that
messages argument (e.g., to the API call or to the conversation_history
handling), and propagate messages when delegating to helper methods so the call
no longer raises ArgumentError: unknown keyword: :messages.

In `@app/models/setting.rb`:
- Around line 13-16: The new secret field anthropic_access_token is currently
stored unencrypted; add :anthropic_access_token to the
EncryptedSettingFields::ENCRYPTED_FIELDS list so it is persisted encrypted
(update the constant/array where ENCRYPTED_FIELDS is declared), run or add any
necessary schema/migration or spec updates that assert encrypted fields include
anthropic_access_token, and ensure the existing default behavior using
ENV["ANTHROPIC_ACCESS_TOKEN"] / ENV["ANTHROPIC_API_KEY"] remains unchanged in
the Setting model.

In `@test/models/provider/anthropic_test.rb`:
- Around line 43-47: Provider::Anthropic.effective_model currently uses
ENV.fetch("ANTHROPIC_MODEL", Setting.anthropic_model) which eagerly calls
Setting.anthropic_model; change the implementation to avoid evaluating the
default when ENV is present (use ENV.fetch("ANTHROPIC_MODEL") {
Setting.anthropic_model } or explicit conditional with ENV.key?), and update the
test in test/models/provider/anthropic_test.rb so that inside
ClimateControl.modify("ANTHROPIC_MODEL" => "claude-haiku-4-5") you also assert
that Setting.anthropic_model is not invoked (e.g., expect(Setting).not_to
receive(:anthropic_model)) before calling Provider::Anthropic.effective_model to
harden the test.

---

Nitpick comments:
In `@app/models/provider/anthropic/chat_config.rb`:
- Around line 78-81: In anthropic_input_schema, the code only removes the symbol
key :strict which leaves string-keyed "strict" present and can leak OpenAI-only
flags to Anthropic; after deep_dup and the Hash check, delete both
schema.delete(:strict) and schema.delete('strict') (or otherwise normalize keys
and remove strict) so both symbol and string keyed schemas are stripped of the
strict flag before returning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af0254e9-6d68-4426-b4aa-991f674874c0

📥 Commits

Reviewing files that changed from the base of the PR and between d8a12ad and c1dbb51.

⛔ Files ignored due to path filters (1)
  • Gemfile.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • Gemfile
  • app/models/assistant/responder.rb
  • app/models/chat.rb
  • app/models/provider/anthropic.rb
  • app/models/provider/anthropic/chat_config.rb
  • app/models/provider/anthropic/chat_parser.rb
  • app/models/provider/anthropic/message_formatter.rb
  • app/models/provider/llm_concept.rb
  • app/models/provider/openai.rb
  • app/models/provider/registry.rb
  • app/models/setting.rb
  • app/models/user.rb
  • test/models/provider/anthropic/chat_config_test.rb
  • test/models/provider/anthropic/chat_parser_test.rb
  • test/models/provider/anthropic/message_formatter_test.rb
  • test/models/provider/anthropic_test.rb
  • test/models/provider/registry_test.rb

Comment thread app/models/chat.rb Outdated
Comment thread app/models/provider/anthropic.rb
Comment thread app/models/setting.rb Outdated
Comment thread test/models/provider/anthropic_test.rb Outdated
Surface fixes raised by Codex + CodeRabbit on PR 1/5:

- Provider::Anthropic#chat_response now accepts (and ignores) a
  `messages:` kwarg. Assistant::Responder passes both `messages:`
  (OpenAI-shape) and `conversation_history:` (raw Message records) for
  cross-provider parity, so the previous signature raised
  ArgumentError on the first chat turn through the Anthropic provider.
- Provider::Anthropic#supports_model? bypasses the `claude` prefix
  gate when a custom base_url is configured, mirroring the OpenAI
  provider. Bedrock-shaped IDs like
  `anthropic.claude-sonnet-4-5-20250929-v1:0` and
  `claude-opus-4@20250514` are otherwise rejected by
  Assistant::Provided#get_model_provider and the chat dies.
- Setting.anthropic_access_token is now in
  EncryptedSettingFields::ENCRYPTED_FIELDS so the Anthropic API key
  is encrypted at rest like every other provider secret. Previously
  plaintext while siblings (openai_access_token, twelve_data_api_key,
  external_assistant_token) were ciphertext.
- Chat.default_model falls back to whichever provider is actually
  configured. Previously, with LLM_PROVIDER=anthropic but no
  Anthropic credentials, the default model resolved to a Claude ID
  that no registered provider supported, so chats failed even when
  OpenAI was fully configured. Adds Provider::{Anthropic,Openai}#configured?
  class methods for the readable callsite.
- Provider::Anthropic.effective_model uses
  `ENV["ANTHROPIC_MODEL"].presence || Setting.anthropic_model` so the
  Setting lookup is only performed when the env var is absent — the
  previous `ENV.fetch(KEY, default)` evaluated the default arg
  eagerly on every call.
- Provider::Anthropic::ChatConfig#anthropic_input_schema strips both
  `:strict` and `"strict"` keys so JSON-decoded schemas with string
  keys cannot leak the OpenAI-only flag through to Anthropic.

Test coverage added: supports_model? bypass on custom endpoints,
chat_response messages: kwarg compatibility, default_model fallback
in the three credential combinations, configured? against ENV +
Setting, strict-flag stripping for both key types, and a
`Setting.expects(:anthropic_model).never` assertion proving the
ENV-precedence test now exercises the lazy path.

All 4365 tests pass (1 pre-existing libvips env error unrelated).
gariasf added a commit that referenced this pull request May 25, 2026
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/models/provider/anthropic.rb (1)

170-194: ⚠️ Potential issue | 🔴 Critical

Fix Anthropic streaming event class constants

stream_chat_response matches ::Anthropic::Streaming::TextEvent / ::Anthropic::Streaming::MessageStopEvent, but client.messages.stream yields Anthropic::Helpers::Streaming::TextEvent and Anthropic::Helpers::Streaming::MessageStopEvent. Update the constants so the case branches fire and streaming doesn’t fail to recognize events (app/models/provider/anthropic.rb:170-194).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/models/provider/anthropic.rb` around lines 170 - 194, In
stream_chat_response, the case currently checks
::Anthropic::Streaming::TextEvent and ::Anthropic::Streaming::MessageStopEvent
but client.messages.stream yields Anthropic::Helpers::Streaming events; update
the event class references in the case to
Anthropic::Helpers::Streaming::TextEvent and
Anthropic::Helpers::Streaming::MessageStopEvent so the when branches in
stream_chat_response (which iterates stream.each from client.messages.stream)
match and handle events correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/models/chat_test.rb`:
- Around line 65-86: The tests assume Chat.default_model equals
Provider::<X>::DEFAULT_MODEL but Chat.default_model prefers
Provider::<X>::effective_model when set; make the tests deterministic by either
stubbing the provider effective_model to return the DEFAULT_MODEL or clearing
any env/model overrides before asserting. Specifically, in the tests referencing
Chat.default_model (the three cases using Provider::Anthropic::DEFAULT_MODEL and
Provider::Openai::DEFAULT_MODEL), stub Provider::Anthropic.effective_model and
Provider::Openai.effective_model to return nil (or to return the corresponding
DEFAULT_MODEL) so the method falls back predictably, or explicitly clear
OPENAI_MODEL/ANTHROPIC_MODEL values via stubs/mocks prior to the assert. Ensure
you modify the tests that set Setting.stubs(:llm_provider) and
Provider::<...>.stubs(:configured?) to also control
Provider::<...>.stubs(:effective_model) to avoid intermittent failures.

---

Outside diff comments:
In `@app/models/provider/anthropic.rb`:
- Around line 170-194: In stream_chat_response, the case currently checks
::Anthropic::Streaming::TextEvent and ::Anthropic::Streaming::MessageStopEvent
but client.messages.stream yields Anthropic::Helpers::Streaming events; update
the event class references in the case to
Anthropic::Helpers::Streaming::TextEvent and
Anthropic::Helpers::Streaming::MessageStopEvent so the when branches in
stream_chat_response (which iterates stream.each from client.messages.stream)
match and handle events correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 33a04125-557d-4510-9928-75a3d949397c

📥 Commits

Reviewing files that changed from the base of the PR and between c1dbb51 and 714cf0b.

📒 Files selected for processing (8)
  • app/models/chat.rb
  • app/models/provider/anthropic.rb
  • app/models/provider/anthropic/chat_config.rb
  • app/models/provider/openai.rb
  • app/models/setting.rb
  • test/models/chat_test.rb
  • test/models/provider/anthropic/chat_config_test.rb
  • test/models/provider/anthropic_test.rb

Comment thread test/models/chat_test.rb Outdated
CodeRabbit flagged on PR review: the new default_model tests asserted
against Provider::*::DEFAULT_MODEL, but Chat.default_model actually
returns Provider::*.effective_model.presence (which reads
OPENAI_MODEL / ANTHROPIC_MODEL from the environment). With either env
var set, the tests would fail intermittently even though routing was
correct.

- New default_model tests now assert against the provider's
  effective_model directly, so they verify the routing decision
  (which provider's value wins) without coupling to the constant.
- Pre-existing "creates with default model" assertions had the same
  brittleness; switch them to compare against Chat.default_model so
  the chosen model is whatever the env / Setting cascade resolves to.

Verified by running `ANTHROPIC_MODEL=claude-haiku-4-5 OPENAI_MODEL=gpt-4o
bin/rails test test/models/chat_test.rb` — 16 runs, 0 failures
(previously 2 pre-existing failures + 0 from the new tests).
gariasf added a commit that referenced this pull request May 25, 2026
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
- Provider::Anthropic#supports_pdf_processing? bypasses prefix gate for
  custom endpoints, mirroring supports_model?
- Provider::Anthropic#initialize raises Error when custom_endpoint? AND
  model.blank?, parity with Provider::Openai
- stream_chat_response captures partial usage on mid-stream errors and
  records it via the new on_partial callback so chat_response can skip
  the duplicate error row in the outer rescue
- safe_accumulated_message swallows the secondary failure when the SDK
  cannot reconstruct a snapshot
- langfuse_client memoizes properly (||= instead of =) so repeated calls
  don't churn Langfuse instances
- MessageFormatter sorts tool_calls by created_at then id so the
  message array is deterministic across replays; skips tool_calls
  missing both provider_call_id and provider_id rather than sending
  `id: nil` and getting rejected by Anthropic
- Setting.anthropic_access_token default falls back through
  ENV["ANTHROPIC_API_KEY"].presence (was missing .presence, so an
  empty-string env value bled through)
- User#openai_configured? / #anthropic_configured? delegate to the
  Provider::* class methods — single source of truth
- Assistant::Responder renames the OpenAI-shape history builder
  conversation_history → openai_messages_payload so the kwarg name
  matches the local method name (messages: openai_messages_payload,
  conversation_history: chat_message_records)
- Assistant::Builtin stale-history comment updated to reference both
  builders

Adds a streaming chat_response test using ad-hoc subclasses of the
SDK event types so the case/when dispatch matches via is_a? without
stubbing class-level === behavior.
gariasf added a commit that referenced this pull request May 25, 2026
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
@mintlify
Copy link
Copy Markdown

mintlify Bot commented May 25, 2026

Docs PR opened: we-promise/docs#82

Added documentation for the new Anthropic LLM provider, including configuration, environment variables, supported models, and provider selection via LLM_PROVIDER.

Copy link
Copy Markdown
Collaborator

jjmata commented May 26, 2026

Solid foundation PR. A few observations:

Message format handling: The MessageFormatter rebuilding tool-use turns into the required tool_use block + paired user-role tool_result block is the trickiest part of Anthropic's Messages API (Anthropic requires tool results to be in the user role, not assistant). Glad to see it's explicitly handled rather than bolted on later.

Prompt caching: Attaching cache_control: { type: "ephemeral" } to the system prompt and final tool definition in ChatConfig is the correct Anthropic caching pattern. System prompts and tool definitions are the highest-leverage targets since they're large and repeated on every turn.

conversation_history: kwarg on Responder: Backward-compatible (OpenAI path ignores it). Worth confirming that chat_message_records is correctly serialized into Anthropic's content-block format by MessageFormatter — specifically that tool-use turns from prior assistant messages round-trip correctly when retrieved from the database.

Chat.default_model via Setting.llm_provider: Clean abstraction. One edge case to confirm: if llm_provider is "openai" but no OpenAI key is configured (while an Anthropic key is), User#ai_available? would return true (Anthropic configured), but the chat would try to use an OpenAI model. The PR description says Chat.default_model resolves from Setting.llm_provider, so if the operator set llm_provider = "anthropic", the default model would be claude-sonnet-4-6. That's correct. The edge case only applies if an operator sets llm_provider = "openai" with no key, which is a misconfiguration.

VCR cassettes: Noted as a follow-up requiring a real ANTHROPIC_API_KEY. The mocked tests cover the wiring adequately for a foundation PR.

Stacked PRs (#1984#1987): All currently in draft — appropriate while #1983 awaits review. The series is well-structured with each PR independently green.


Generated by Claude Code

Addresses @jjmata's "worth confirming" note on PR #1983: tool-use turns
from prior assistant messages must round-trip correctly when retrieved
from the database.

- New `ChatParser → ToolCall::Function → MessageFormatter` test walks
  the full path: Anthropic response with a tool_use block →
  ChatFunctionRequest → ToolCall::Function.from_function_request →
  persisted on the AssistantMessage → MessageFormatter rebuild on the
  next turn. Asserts the original `tool_use.id` is preserved end-to-end
  as both `tool_use.id` and the paired `tool_result.tool_use_id`, and
  that the original `input` hash and serialized result content survive.
- New multi-tool assistant turn test confirms two tool_use blocks on a
  single assistant message render as two tool_use blocks followed by
  two paired tool_result blocks in a single user-role follow-up,
  matching Anthropic's required alternation.

Both tests exercise the existing PR1 code without behavior changes.
@gariasf
Copy link
Copy Markdown
Collaborator Author

gariasf commented May 26, 2026

Thanks for the review, @jjmata. Quick responses on each point:

Round-trip confirmation — addressed in b26306d8. Added a ChatParser → ToolCall::Function → MessageFormatter round-trip test that walks the full path: Anthropic response w/ a tool_use block → ChatFunctionRequestToolCall::Function.from_function_request → persisted on the AssistantMessage → MessageFormatter rebuild on the next turn. Asserts the original tool_use.id survives end-to-end as both the assistant turn's tool_use.id and the paired user turn's tool_result.tool_use_id. Also added a multi-tool turn test confirming two tool_use blocks render as two tool_use blocks followed by two paired tool_result blocks (Anthropic's required alternation).

Chat.default_model edge caseprefers_anthropic && Provider::Anthropic.configured? short-circuits the first branch when Anthropic isn't actually configured. The next branch (elsif Provider::Openai.configured?) wins. If neither is configured the final else returns OpenAI's effective_model for backward compat. The reverse case you described — llm_provider="openai" + only Anthropic key set — falls through elsif Provider::Anthropic.configured? and returns Claude, which is the right call.

VCR cassettes — yes, slated as a follow-up bundled with #1985 (PDF tests need a real PDF + key too). Mocked client coverage is the bar for the foundation PR.

Stacked PRs all rebased and pushed earlier today; each independently green.

gariasf added a commit that referenced this pull request May 26, 2026
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/models/provider/anthropic/message_formatter_test.rb`:
- Around line 94-100: The test uses OpenStruct (see anthropic_response in
test/models/provider/anthropic/message_formatter_test.rb) but doesn’t require
it; add require "ostruct" at the top of that test file so OpenStruct is always
defined (place the require above the test class or before the anthropic_response
fixture/usage).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8eda0b17-1836-4ff3-b8c6-2ae23ea43747

📥 Commits

Reviewing files that changed from the base of the PR and between 6675331 and b26306d.

📒 Files selected for processing (1)
  • test/models/provider/anthropic/message_formatter_test.rb

Comment thread test/models/provider/anthropic/message_formatter_test.rb
Copy link
Copy Markdown

Three functional blockers still open after the latest commits — flagging for resolution before merge:

1. Streaming class namespace mismatch (silent failure)
stream_chat_response matches ::Anthropic::Streaming::TextEvent / ::Anthropic::Streaming::MessageStopEvent, but client.messages.stream yields Anthropic::Helpers::Streaming::TextEvent / Anthropic::Helpers::Streaming::MessageStopEvent. The case branches never fire, so streaming returns empty output silently.

2. Provider::Anthropic#chat_response missing messages: kwarg
Assistant::Responder calls llm.chat_response(..., messages: ..., conversation_history: ...). If the Anthropic method signature doesn't accept messages:, every Anthropic-backed chat turn raises ArgumentError: unknown keyword: :messages at runtime — not caught by the current unit tests because they don't go through Responder.

3. anthropic_access_token stored unencrypted in Setting
The field needs to be added to ENCRYPTED_FIELDS alongside the other secret tokens. Storing it in plaintext means DB dumps expose the key.

None of these are blocked by missing VCR cassettes — they're wiring bugs that should be reproducible with existing stubs.


Generated by Claude Code

OpenStruct is moving out of Ruby's default load path (warning in 3.4+,
removed in 3.5+). Tests work today because ActiveSupport transitively
loads it, but that's incidental. Match the existing convention in
test/controllers/settings/hostings_controller_test.rb which explicitly
requires ostruct for the same reason.
gariasf added a commit that referenced this pull request May 27, 2026
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
@gariasf
Copy link
Copy Markdown
Collaborator Author

gariasf commented May 27, 2026

@sure-design Thanks for the second pass. Walking through each — all three are already resolved on the current tip (17e9e0d9):

1. Streaming class namespace mismatch — this isn't actually a mismatch. The anthropic gem ships an explicit alias in lib/anthropic/streaming.rb:

module Anthropic
  Streaming = Anthropic::Helpers::Streaming
end

::Anthropic::Streaming::TextEvent and ::Anthropic::Helpers::Streaming::TextEvent resolve to the same class object — Ruby's constant resolution follows the assignment. The case/when branches in stream_chat_response match correctly via is_a? (which is what Class#=== delegates to).

Empirical confirmation: anthropic_test.rb:147 creates ad-hoc subclasses of ::Anthropic::Streaming::TextEvent and ::Anthropic::Streaming::MessageStopEvent, fires them through the stream proc, and asserts that two text deltas and one final-response chunk are emitted. The test passes — if the namespaces were distinct classes the case/when wouldn't dispatch and the assertions would fail.

2. Provider::Anthropic#chat_response missing messages: kwarg — fixed in 714cf0bb. Signature on the current tip (anthropic.rb:96):

def chat_response(
  prompt,
  model:,
  instructions: nil,
  functions: [],
  function_results: [],
  messages: nil,           # ← accepted (ignored, Anthropic uses conversation_history)
  conversation_history: [],
  ...
)

Regression test at anthropic_test.rb:100 exercises a Responder-shaped call with both messages: and conversation_history: kwargs.

3. anthropic_access_token stored unencrypted — fixed in 714cf0bb. :anthropic_access_token is in EncryptedSettingFields::ENCRYPTED_FIELDS in setting.rb:77 alongside :openai_access_token, :twelve_data_api_key, :external_assistant_token, etc. The MessageEncryptor wrapper at setting.rb:80-89 intercepts reads (decrypt) and writes (encrypt) for every listed field; existing plaintext values fall back via the InvalidSignature/InvalidMessage rescue and re-encrypt on next write.

Worth re-checking against 17e9e0d9 if the bot was running against an older snapshot.

@gariasf
Copy link
Copy Markdown
Collaborator Author

gariasf commented May 27, 2026

@coderabbitai review again plz

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

@gariasf Sure! I'll kick off a fresh review of the PR against the latest commits.

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
app/models/assistant/responder.rb (1)

123-132: ⚡ Quick win

Avoid double-querying chat history for both payload builders.

Line 126 and Line 141 run nearly the same query; this adds avoidable overhead every request. Reuse one fetched record set and derive both shapes from it.

♻️ Proposed refactor
+    def complete_chat_messages
+      return [] unless chat&.messages
+      `@complete_chat_messages` ||= chat.messages
+                                      .where(type: [ "UserMessage", "AssistantMessage" ], status: "complete")
+                                      .includes(:tool_calls)
+                                      .ordered
+                                      .to_a
+    end
+
     def chat_message_records
-      return [] unless chat&.messages
-
-      chat.messages
-          .where(type: [ "UserMessage", "AssistantMessage" ], status: "complete")
-          .where.not(id: message.id)
-          .includes(:tool_calls)
-          .ordered
-          .to_a
+      complete_chat_messages.reject { |m| m.id == message.id }
     end

     def openai_messages_payload
       messages = []
-      return messages unless chat&.messages
-
-      chat.messages
-          .where(type: [ "UserMessage", "AssistantMessage" ], status: "complete")
-          .includes(:tool_calls)
-          .ordered
-          .each do |chat_message|
+      complete_chat_messages.each do |chat_message|
         if chat_message.tool_calls.any?
           messages << {

Also applies to: 137-145

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/models/assistant/responder.rb` around lines 123 - 132, The
chat_message_records method currently causes two nearly identical DB queries by
re-fetching chat.messages for both payload builders; instead, fetch the records
once into a local variable (using chat.messages.where(type:
["UserMessage","AssistantMessage"], status: "complete").where.not(id:
message.id).includes(:tool_calls).ordered.to_a) inside chat_message_records (or
the caller), then pass or derive the two required shapes from that single
in-memory array rather than re-running chat.messages for each payload builder;
update any callers that expect separate queries to accept the cached array and
compute their specific shapes from it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/models/provider/anthropic.rb`:
- Around line 265-266: The current logs call Rails.logger.warn with
e.full_message which can expose sensitive request/output payloads; update the
logging in the Langfuse trace creation/error handlers (the Rails.logger.warn
calls that reference e.full_message and e.message) to only log the exception
class, sanitized message, and a trace identifier (if available) without the full
backtrace or payload, e.g., replace usage of e.full_message with a short
sanitized string (exception class + e.message + trace_id) and remove any raw
payload/backtrace printing; apply the same change for the other occurrences
around the Rails.logger.warn calls at the later error handlers mentioned (the
similar calls near lines 289-290 and 300-301).

In `@app/models/provider/anthropic/message_formatter.rb`:
- Around line 114-123: parse_arguments can return non-Hash values
(scalars/arrays) which makes Anthropic tool_use.input invalid; modify
parse_arguments so it always returns a Hash: compute the parsed result (handle
nil, Hash, String with JSON.parse/rescue, and other types), then if the result
is not a Hash wrap it as a Hash like { "input" => result } (or symbolized :input
if codebase prefers), and return that Hash; update parse_arguments to perform
this normalization so tool_use.input is always a JSON object.

---

Nitpick comments:
In `@app/models/assistant/responder.rb`:
- Around line 123-132: The chat_message_records method currently causes two
nearly identical DB queries by re-fetching chat.messages for both payload
builders; instead, fetch the records once into a local variable (using
chat.messages.where(type: ["UserMessage","AssistantMessage"], status:
"complete").where.not(id: message.id).includes(:tool_calls).ordered.to_a) inside
chat_message_records (or the caller), then pass or derive the two required
shapes from that single in-memory array rather than re-running chat.messages for
each payload builder; update any callers that expect separate queries to accept
the cached array and compute their specific shapes from it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eac52cd6-9fb5-4fa9-bffe-84b46d04b0ae

📥 Commits

Reviewing files that changed from the base of the PR and between 0342958 and c120a05.

⛔ Files ignored due to path filters (1)
  • Gemfile.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • Gemfile
  • app/models/assistant/builtin.rb
  • app/models/assistant/responder.rb
  • app/models/chat.rb
  • app/models/provider/anthropic.rb
  • app/models/provider/anthropic/chat_config.rb
  • app/models/provider/anthropic/chat_parser.rb
  • app/models/provider/anthropic/message_formatter.rb
  • app/models/provider/llm_concept.rb
  • app/models/provider/openai.rb
  • app/models/provider/registry.rb
  • app/models/setting.rb
  • app/models/user.rb
  • test/models/chat_test.rb
  • test/models/provider/anthropic/chat_config_test.rb
  • test/models/provider/anthropic/chat_parser_test.rb
  • test/models/provider/anthropic/message_formatter_test.rb
  • test/models/provider/anthropic_test.rb
  • test/models/provider/registry_test.rb

Comment thread app/models/provider/anthropic.rb Outdated
Comment thread app/models/provider/anthropic/message_formatter.rb
… history fetch

Addresses three open CodeRabbit findings on PR #1983.

- Provider::Anthropic Langfuse rescue branches no longer include
  `e.full_message` in `Rails.logger.warn`. `full_message` bundles the
  backtrace + cause chain and on some SDK error types includes the
  serialized request/response payload (prompt, model output). Logs
  now report `#{e.class}: #{e.message}` only. Three sites:
  create_langfuse_trace, log_langfuse_generation, upsert_langfuse_trace.
  Note: Provider::Openai has the same pattern (copy-pasted source) —
  harmonization deferred to a follow-up cleanup PR; this commit fixes
  only the Anthropic provider to keep PR scope tight.

- MessageFormatter#parse_arguments now coerces any non-Hash parsed
  result to `{}`. Anthropic's Messages API requires `tool_use.input`
  to be a JSON object (map); a stored ToolCall::Function record whose
  arguments parse to a scalar, bool, or array (corrupt row, legacy
  data, cross-provider bleed) would otherwise produce a payload the
  API rejects. Normal flow stores Hash arguments end-to-end so the
  fix is defensive — adds 2 tests covering scalar/array JSON strings
  and non-String non-Hash inputs.

- Assistant::Responder dedups the chat-history fetch. The previous
  layout fired two near-identical `chat.messages.where(...).includes(
  :tool_calls).ordered` queries per LLM turn (one for the OpenAI-shape
  payload, one for the raw-records kwarg). A new memoized
  `complete_chat_messages` fetches once; `chat_message_records` filters
  out the current message via `Array#reject`, `openai_messages_payload`
  iterates the cached array unchanged. One SQL query per turn instead
  of two. Memoization scope = single Responder instance (per LLM call),
  so cache invalidation is not a concern.

All 4370 tests pass (1 pre-existing libvips env error unrelated).
Rubocop + brakeman clean.
gariasf added a commit that referenced this pull request May 27, 2026
Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.
@gariasf
Copy link
Copy Markdown
Collaborator Author

gariasf commented May 27, 2026

Latest review round addressed in 5ab9bb33:

  • 🟠 Sanitized Langfuse warn logs (sensitive payload leak) — inline reply on the affected thread.
  • 🟠 Normalized parse_arguments so tool_use.input is always a Hash — inline reply on the affected thread.
  • 🟡 Deduped the chat-history fetch in Assistant::Responder. New memoized complete_chat_messages runs once per LLM turn; chat_message_records filters out the current message via Array#reject, openai_messages_payload iterates the cached array. One SQL query per turn instead of two. (This was in the review-body Nitpick section without a dedicated inline thread, hence the summary here.)

Stack rebased onto the latest c120a050 (main merge), all 4409 tests green, rubocop + brakeman clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants