fix: prefer provided grading provider in llm-rubric and redteam grader #5739

MrFlounder · 2025-09-25T22:59:26Z

Ensure grading respects the provided provider (from caller) instead of defaulting to OpenAI or redteam defaults.
Applies to both llm-rubric assertion handling and redteam grader.

use-tusk · 2025-09-25T22:59:40Z

⏩ No test execution environment matched (b9a6834) View output ↗

Tip

New to Tusk? Learn more here.

View check history

Commit	Status	Output	Created (UTC)
`113a823`	⏩ No test execution environment matched	Output	Sep 25, 2025 10:59PM
`efc520d`	⏩ No test execution environment matched	Output	Sep 25, 2025 11:39PM
`b9a6834`	⏩ No test execution environment matched	Output	Sep 26, 2025 12:02AM

View output in GitHub ↗

coderabbitai · 2025-09-25T23:02:17Z

📝 Walkthrough

Walkthrough

The change adds an optional provider parameter to handleLlmRubric and threads provider selection through the assertion and redteam grading paths. If a provider is passed in and test.options lacks one, it is merged into the options sent to matchesLlmRubric; otherwise existing test.options are used. RedteamGraderBase and RedteamPluginBase now compute a single providerToUse from either the caller-supplied provider or the resolved default and pass it to matchesLlmRubric. Tests are updated to reflect the new options shape and to verify provider precedence and propagation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly summarizes the core change by stating that the pull request makes the code prefer a provided grading provider in both the llm-rubric assertion handler and the redteam grader. It is concise, specific, and follows conventional commit style without unnecessary detail or vagueness. A reader scanning the history can immediately understand the main intent of the changeset.
Description Check	✅ Passed	The description succinctly explains that grading will now respect a caller-provided provider instead of defaulting to OpenAI or redteam defaults and notes that this applies to both the llm-rubric and redteam grader. It is directly related to the changes in the code and provides appropriate context for reviewers. The level of detail is sufficient and correctly aligns with the changeset.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch chore/support-provider-passed-in

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/redteam/plugins/base.ts (1)
456-476: Provider object propagated into grading config

Same issue as in handleLlmRubric: the ternary drops an ApiProvider instance into test.options.provider, but matchesLlmRubric expects a provider config/identifier, not the instantiated object. When getAndCheckProvider receives this, it will mis-handle the type (e.g., treat the instance like a config map). We need to keep test.options.provider untouched and instead pass the instantiated provider through a dedicated channel so matchesLlmRubric can short-circuit provider resolution.

Update the call to preserve test.options and pass the ready-to-use provider through the options argument:
-    const grade = await matchesLlmRubric(finalRubric, llmOutput, {
-      ...test.options,
-      provider: providerToUse,
-    });
+    const grade = await matchesLlmRubric(
+      finalRubric,
+      llmOutput,
+      test.options,
+      test.vars,
+      undefined,
+      { gradingProvider: providerToUse },
+    );
Then teach matchesLlmRubric to honor options.gradingProvider (before falling back to grading.provider) so the caller-supplied instance is actually used.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a260271 and 113a823.

📒 Files selected for processing (4)

src/assertions/llmRubric.ts (2 hunks)
src/redteam/plugins/base.ts (4 hunks)
test/assertions/llmRubric.test.ts (4 hunks)
test/redteam/plugins/base.test.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; use existing interfaces whenever possible

**/*.{ts,tsx}: Follow consistent import order (Biome will handle sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

Files:

src/assertions/llmRubric.ts
test/redteam/plugins/base.test.ts
src/redteam/plugins/base.ts
test/assertions/llmRubric.test.ts

**/*.{test,spec}.{js,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Avoid disabling or skipping tests unless absolutely necessary and documented

Files:

test/redteam/plugins/base.test.ts
test/assertions/llmRubric.test.ts

test/**/*.{test,spec}.ts

📄 CodeRabbit inference engine (.cursor/rules/jest.mdc)

test/**/*.{test,spec}.ts: Mock as few functions as possible to keep tests realistic
Never increase the function timeout - fix the test instead
Organize tests in descriptive describe and it blocks
Prefer assertions on entire objects rather than individual keys when writing expectations
Clean up after tests to prevent side effects (e.g., use afterEach(() => { jest.resetAllMocks(); }))
Run tests with --randomize flag to ensure your mocks setup and teardown don't affect other tests
Use Jest's mocking utilities rather than complex custom mocks
Prefer shallow mocking over deep mocking
Mock external dependencies but not the code being tested
Reset mocks between tests to prevent test pollution
For database tests, use in-memory instances or proper test fixtures
Test both success and error cases for each provider
Mock API responses to avoid external dependencies in tests
Validate that provider options are properly passed to the underlying service
Test error handling and edge cases (rate limits, timeouts, etc.)
Ensure provider caching behaves as expected
Always include both --coverage and --randomize flags when running tests
Run tests in a single pass (no watch mode for CI)
Ensure all tests are independent and can run in any order
Clean up any test data or mocks after each test

Files:

test/redteam/plugins/base.test.ts
test/assertions/llmRubric.test.ts

test/**/*.{test.ts,test.tsx,spec.ts,spec.tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Follow Jest best practices with describe/it blocks in tests

Files:

test/redteam/plugins/base.test.ts
test/assertions/llmRubric.test.ts

🧠 Learnings (5)