feat(ai): Anthropic batch ops + LLM cost ledger (2/5) by gariasf · Pull Request #1984 · we-promise/sure

gariasf · 2026-05-25T14:38:44Z

Summary

Implements auto_categorize, auto_detect_merchants, and enhance_provider_merchants on Provider::Anthropic via forced tool calls, plus the cost-ledger plumbing they need. This is PR 2 of 5, stacked on #1983.

Why forced tool calls

Anthropic has no first-class JSON-mode flag. The idiomatic replacement is to define a single output tool whose input_schema mirrors the desired output, then force the model to invoke it with tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }. The model returns exactly one tool_use block whose input arrives as structured JSON shaped by the schema, so no text parsing is needed.
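A minimal sketch of the forced-tool-call pattern described above, with no network call. Only the tool_choice shape and the tool name report_categorizations come from this PR; the schema fields, the helper names build_forced_tool_request and extract_tool_input, and the MissingToolUseError class are illustrative assumptions, not the PR's actual code.

```ruby
# Hypothetical output tool; the real schema in the PR may differ.
REPORT_TOOL = {
  name: "report_categorizations",
  description: "Report one category per transaction.",
  input_schema: {
    type: "object",
    properties: {
      categorizations: {
        type: "array",
        items: {
          type: "object",
          properties: {
            transaction_id: { type: "string" },
            category_name: { type: ["string", "null"] }
          },
          required: ["transaction_id", "category_name"]
        }
      }
    },
    required: ["categorizations"]
  }
}.freeze

# Stand-in for Provider::Anthropic::Error in this sketch.
class MissingToolUseError < StandardError; end

# Builds a Messages API payload that forces the model to call `tool`.
def build_forced_tool_request(model:, prompt:, tool: REPORT_TOOL, max_tokens: 1024)
  {
    model: model,
    max_tokens: max_tokens,
    tools: [tool],
    tool_choice: { type: "tool", name: tool[:name], disable_parallel_tool_use: true },
    messages: [{ role: "user", content: prompt }]
  }
end

# Pulls the forced tool's input out of a parsed response hash (string keys),
# raising when the model did not invoke the tool.
def extract_tool_input(response, tool_name)
  block = Array(response["content"]).find do |b|
    b["type"] == "tool_use" && b["name"] == tool_name
  end
  raise MissingToolUseError, "model did not invoke the tool #{tool_name}" unless block

  block["input"]
end
```

The single raise in extract_tool_input corresponds to the one failure mode listed below; everything else is plain hash access.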

Net effect:

No `<think>` tag stripping
No `json_schema` ↔ `json_object` ↔ `none` fallback ladder
No `parse_json_flexibly` cascade with markdown-code-block heuristics
One clear failure mode ("model did not invoke the tool"), raised as Provider::Anthropic::Error

The three Anthropic task classes end up ~30% smaller than their OpenAI siblings while covering the same surface.

What's in this PR

Task classes (each is small: forced-tool-call request + schema + normalize)

Provider::Anthropic::AutoCategorizer → tool: report_categorizations
Provider::Anthropic::AutoMerchantDetector → tool: report_merchants
Provider::Anthropic::ProviderMerchantEnhancer → tool: report_enhancements

Each:

Caps batch size at 25 (parity with OpenAI) and raises a clear error above it.
Tags Langfuse spans with the operation name so dashboards aggregate across providers.
Records usage via Concerns::UsageRecorder (mirror of the OpenAI sibling).
Normalizes "null" strings and case-insensitive matches against the user's category / merchant list.
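The batch cap and the normalization step listed above could look roughly like the following. The 25-item limit is from this PR; the constant and method names, the error class, the exact set of null-like strings, and the choice to return nil for names that match nothing in the user's list are assumptions made for illustration.

```ruby
MAX_BATCH_SIZE = 25 # parity with the OpenAI provider, per the PR description

# Hypothetical error class for this sketch.
class BatchTooLargeError < StandardError; end

# Raises before any API call when the batch exceeds the cap.
def ensure_batch_size!(transactions)
  return if transactions.size <= MAX_BATCH_SIZE

  raise BatchTooLargeError,
        "Too many transactions: #{transactions.size} (max #{MAX_BATCH_SIZE})"
end

# Strings the model may emit instead of a real JSON null (assumed set).
NULL_STRINGS = %w[null none].freeze

# Maps a model-reported name onto the user's canonical spelling.
# Returns nil for null-like values and for names not in `allowed_names`.
def normalize_name(raw, allowed_names)
  return nil if raw.nil?

  value = raw.to_s.strip
  return nil if value.empty? || NULL_STRINGS.include?(value.downcase)

  allowed_names.find { |name| name.casecmp?(value) }
end
```

Returning the entry from allowed_names rather than the model's own spelling keeps stored names identical to what the user defined.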

Cost ledger

Migration 20260525120000_add_anthropic_cache_tokens_to_llm_usages.rb adds nullable cache_creation_tokens and cache_read_tokens integer columns. OpenAI rows leave them null; Anthropic rows populate them.
LlmUsage::PRICING gains claude-opus-4-7 ($15/$75), claude-opus-4-6 ($15/$75), claude-sonnet-4-6 ($3/$15), claude-sonnet-4-5 ($3/$15), claude-haiku-4-5 ($1/$5) per MTok (Anthropic public pricing).
LlmUsage.infer_provider returns "anthropic" for claude-* via the existing exact/prefix lookup — no code change needed beyond the PRICING rows.
Provider::Anthropic#chat_response (introduced in PR 1) now persists cache columns directly instead of stashing them in metadata.
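As a worked illustration of the pricing rows above, here is a small sketch of an exact-then-prefix lookup and the resulting cost arithmetic. The rates are the per-MTok figures quoted in this PR; the hash layout, the method names, the nil return for unknown models, and the use of Rational (to keep the arithmetic exact) are assumptions and may not match LlmUsage.

```ruby
# USD per million tokens, as listed in the PR description.
PRICING = {
  "claude-opus-4-7"   => { input: 15, output: 75 },
  "claude-opus-4-6"   => { input: 15, output: 75 },
  "claude-sonnet-4-6" => { input: 3,  output: 15 },
  "claude-sonnet-4-5" => { input: 3,  output: 15 },
  "claude-haiku-4-5"  => { input: 1,  output: 5 }
}.freeze

# Exact match first, then the first key that is a prefix of the model ID
# (so a dated ID such as "claude-haiku-4-5-<date>" resolves to its family).
def pricing_for(model)
  PRICING[model] || PRICING.find { |key, _rates| model.start_with?(key) }&.last
end

# Returns the cost in USD as a Rational, or nil when no rate is known.
def calculate_cost(model:, prompt_tokens:, completion_tokens:)
  rates = pricing_for(model)
  return nil unless rates

  Rational(prompt_tokens * rates[:input] + completion_tokens * rates[:output], 1_000_000)
end
```

For example, 1,000,000 prompt tokens and 100,000 completion tokens on claude-sonnet-4-6 come to 3 + 1.5 = 4.5 USD.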

Not changed (intentional)

Anthropic bills cache-creation tokens at ~1.25x the input rate and cache reads at 0.1x. LlmUsage.calculate_cost continues to price both at the regular input rate for now. This is a deliberate approximation: it under-counts cache writes and over-counts cache reads, with the net error depending on the read/write mix and cache lifetime. It can be refined in a follow-up if real-world bills warrant it.
OpenAI batch ops are untouched.
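To make the size of that approximation concrete, the sketch below compares the flat-rate treatment with the cache-specific multipliers quoted above. The multipliers are the ones stated in this PR; the method names and the Rational arithmetic are illustrative assumptions, not LlmUsage code.

```ruby
CACHE_WRITE_MULTIPLIER = Rational(5, 4)  # ~1.25x input rate, per the PR description
CACHE_READ_MULTIPLIER  = Rational(1, 10) # 0.1x input rate, per the PR description

# What the ledger records today: cache tokens priced like ordinary input.
def flat_cache_cost(creation_tokens:, read_tokens:, input_rate_per_mtok:)
  Rational((creation_tokens + read_tokens) * input_rate_per_mtok, 1_000_000)
end

# What a cache-aware calculation would record instead.
def cache_aware_cost(creation_tokens:, read_tokens:, input_rate_per_mtok:)
  weighted = creation_tokens * CACHE_WRITE_MULTIPLIER +
             read_tokens * CACHE_READ_MULTIPLIER
  weighted * input_rate_per_mtok / 1_000_000
end
```

With 10,000 cache-creation tokens and 90,000 cache-read tokens at $3/MTok, the flat treatment ledgers $0.30 while the cache-aware figure is $0.0645, so read-heavy workloads are over-counted. A write-only request of 100,000 creation tokens goes the other way: $0.30 flat versus $0.375.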

Test plan

test/models/provider/anthropic/auto_categorizer_test.rb — 3 tests: forced-tool-call request shape, null/None normalization, missing-tool_use error path
test/models/provider/anthropic/auto_merchant_detector_test.rb — 3 tests: same shape + case-insensitive user_merchants matching
test/models/provider/anthropic/provider_merchant_enhancer_test.rb — 2 tests: forced-tool-call mapping + error path
test/models/llm_usage_test.rb (new file) — Claude pricing math, provider inference
All PR 1 tests still green
Full suite: 4371 runs, 18048 assertions, 0 failures, 26 pre-existing skips, 1 pre-existing libvips env error
bin/rubocop clean
bin/brakeman --no-pager clean

Migration

bin/rails db:migrate

Backwards compatible: new columns are nullable and existing OpenAI write paths are untouched.

coderabbitai · 2026-05-25T14:38:51Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Implements process_pdf and extract_bank_statement on Provider::Anthropic using the native `document` content block: no rasterization, no text pre-extraction.

- Provider::Anthropic::PdfProcessor classifies the document, summarizes it, and extracts statement metadata via a forced report_document_analysis tool whose input_schema mirrors the existing Provider::Openai output (document_type from Import::DOCUMENT_TYPES, summary, extracted_data).
- Provider::Anthropic::BankStatementExtractor returns the same { transactions, period, account_holder, account_number, bank_name, opening_balance, closing_balance } shape via report_bank_statement so downstream pdf_import code is provider-agnostic.
- Both attach the PDF as { type: "document", source: { type: "base64", media_type: "application/pdf", data: <b64> } }. Claude 3.5+ / 4.x accept this natively (up to 32MB / 100 pages). No pdf-reader, no pdftoppm, no chunking for typical statements.
- supports_pdf_processing? (introduced in PR 1) already returns true for claude-* models, gating process_pdf with a clear error otherwise.
- Cost ledger rows are persisted via the shared UsageRecorder concern, including cache_creation/cache_read tokens.

Tests verify the document block shape, tool_choice forcing, normalized document_type for unknown classifications, transaction normalization (date / amount / reference → notes), and the missing-tool_use error path. Blank pdf_content raises before any client call.

Stacked on #1984 (PR 2/5). 4/5 pgvector RAG next.
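The document block described above can be sketched as follows. The block shape is the one quoted in the commit message; the helper name pdf_document_block and the use of ArgumentError for blank input are assumptions. Array#pack("m0") is used for strict Base64 encoding so the example needs no require.

```ruby
# Builds the `document` content block for a raw PDF byte string.
# Raises before any client call when the content is blank.
def pdf_document_block(pdf_content)
  if pdf_content.nil? || pdf_content.empty?
    raise ArgumentError, "pdf_content is blank"
  end

  {
    type: "document",
    source: {
      type: "base64",
      media_type: "application/pdf",
      # "m0" produces Base64 without line breaks.
      data: [pdf_content].pack("m0")
    }
  }
end

# The block goes into the user message next to the text instruction.
def pdf_message(pdf_content, instruction)
  {
    role: "user",
    content: [
      pdf_document_block(pdf_content),
      { type: "text", text: instruction }
    ]
  }
end
```

Because the PDF travels as a content block, the same forced tool_choice request used for the batch operations can carry it unchanged.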

Implements auto_categorize, auto_detect_merchants, and enhance_provider_merchants on Provider::Anthropic via forced tool calls, plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector, ProviderMerchantEnhancer each define a single output tool whose input_schema mirrors the desired output, then force the model to call it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }. The tool_use.input arrives as structured JSON shaped by the schema, so there is no JSON parsing fragility, no <think> tag stripping, and no json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists cache_creation_input_tokens / cache_read_input_tokens to dedicated columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6 $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic" for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization, case-insensitive merchant matching, the missing-tool_use error path, and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.

- LlmUsage.infer_provider now returns "anthropic" for Bedrock / Vertex shaped IDs (anthropic.* and anthropic/*), so cost-ledger filtering by provider stays correct even when no per-MTok rate is stored. Previously these IDs fell through to the "openai" default.
- AutoCategorizer drops the redundant nil sentinel from the category_name enum: the union type [string, null] already permits null, and some JSON Schema validators reject nil literals inside enum arrays.
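A minimal sketch of the provider inference described in the first bullet, assuming a simple prefix check. The three prefixes and the "openai" default are from the commit message; the method body is an illustration and may not match LlmUsage.infer_provider.

```ruby
# Prefixes that identify an Anthropic model regardless of pricing data:
# direct API IDs (claude-*) plus the anthropic.* and anthropic/* forms.
ANTHROPIC_PREFIXES = ["claude-", "anthropic.", "anthropic/"].freeze

# Returns the provider name used for cost-ledger filtering.
def infer_provider(model)
  id = model.to_s.downcase
  return "anthropic" if id.start_with?(*ANTHROPIC_PREFIXES)

  "openai" # existing default for everything else
end
```

Keeping this check independent of the pricing table means an ID with no stored rate still lands under the right provider.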


github-actions Bot added the not-gittensor label May 25, 2026

gariasf mentioned this pull request May 25, 2026

feat(ai): Anthropic native PDF processing (3/5) #1985

Draft

6 tasks

gariasf force-pushed the feature/anthropic-batch-ops branch from f7b0ff8 to 487b714 May 25, 2026 17:50

gariasf force-pushed the feature/anthropic-batch-ops branch from 487b714 to 1d35650 May 25, 2026 17:58

gariasf force-pushed the feature/anthropic-batch-ops branch from 1d35650 to a35b5ae May 25, 2026 18:30

jjmata mentioned this pull request May 26, 2026

feat(ai): add Anthropic provider with chat parity (1/5) #1983

Open

9 tasks

gariasf force-pushed the feature/anthropic-batch-ops branch from a35b5ae to 7cd947e May 26, 2026 08:39

gariasf force-pushed the feature/anthropic-batch-ops branch from 7cd947e to 8913007 May 27, 2026 08:09

gariasf added 3 commits May 27, 2026 10:42

test(ai): require "ostruct" in Anthropic batch op tests

764424e

Same rationale as the PR1 ostruct fix — explicit require so the tests don't depend on ActiveSupport's transitive load when Ruby 3.5+ removes OpenStruct from the default load path.

gariasf force-pushed the feature/anthropic-batch-ops branch from 8913007 to 764424e May 27, 2026 08:42

feat(ai): Anthropic batch ops + LLM cost ledger (2/5) #1984
gariasf wants to merge 3 commits into feature/anthropic-foundation from feature/anthropic-batch-ops
