Skip to content

Conversation

MrFlounder
Copy link
Contributor

Ensure grading respects the provided provider (from caller) instead of defaulting to OpenAI or redteam defaults.
Applies to both llm-rubric assertion handling and redteam grader.

@MrFlounder MrFlounder requested a review from a team September 25, 2025 22:59
Copy link
Contributor

use-tusk bot commented Sep 25, 2025

⏩ No test execution environment matched (b9a6834) View output ↗

Tip

New to Tusk? Learn more here.


View check history

Commit Status Output Created (UTC)
113a823 ⏩ No test execution environment matched Output Sep 25, 2025 10:59PM
efc520d ⏩ No test execution environment matched Output Sep 25, 2025 11:39PM
b9a6834 ⏩ No test execution environment matched Output Sep 26, 2025 12:02AM

View output in GitHub ↗

Copy link
Contributor

coderabbitai bot commented Sep 25, 2025

📝 Walkthrough

Walkthrough

The change adds an optional provider parameter to handleLlmRubric and threads provider selection through the assertion and redteam grading paths. If a provider is passed in and test.options lacks one, it is merged into the options sent to matchesLlmRubric; otherwise existing test.options are used. RedteamGraderBase and RedteamPluginBase now compute a single providerToUse from either the caller-supplied provider or the resolved default and pass it to matchesLlmRubric. Tests are updated to reflect the new options shape and to verify provider precedence and propagation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly summarizes the core change by stating that the pull request makes the code prefer a provided grading provider in both the llm-rubric assertion handler and the redteam grader. It is concise, specific, and follows conventional commit style without unnecessary detail or vagueness. A reader scanning the history can immediately understand the main intent of the changeset.
Description Check ✅ Passed The description succinctly explains that grading will now respect a caller-provided provider instead of defaulting to OpenAI or redteam defaults and notes that this applies to both the llm-rubric and redteam grader. It is directly related to the changes in the code and provides appropriate context for reviewers. The level of detail is sufficient and correctly aligns with the changeset.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/support-provider-passed-in

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/redteam/plugins/base.ts (1)

456-476: Provider object propagated into grading config

Same issue as in handleLlmRubric: the ternary drops an ApiProvider instance into test.options.provider, but matchesLlmRubric expects a provider config/identifier, not the instantiated object. When getAndCheckProvider receives this, it will mis-handle the type (e.g., treat the instance like a config map). We need to keep test.options.provider untouched and instead pass the instantiated provider through a dedicated channel so matchesLlmRubric can short-circuit provider resolution.

Update the call to preserve test.options and pass the ready-to-use provider through the options argument:

-    const grade = await matchesLlmRubric(finalRubric, llmOutput, {
-      ...test.options,
-      provider: providerToUse,
-    });
+    const grade = await matchesLlmRubric(
+      finalRubric,
+      llmOutput,
+      test.options,
+      test.vars,
+      undefined,
+      { gradingProvider: providerToUse },
+    );

Then teach matchesLlmRubric to honor options.gradingProvider (before falling back to grading.provider) so the caller-supplied instance is actually used.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a260271 and 113a823.

📒 Files selected for processing (4)
  • src/assertions/llmRubric.ts (2 hunks)
  • src/redteam/plugins/base.ts (4 hunks)
  • test/assertions/llmRubric.test.ts (4 hunks)
  • test/redteam/plugins/base.test.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; use existing interfaces whenever possible

**/*.{ts,tsx}: Follow consistent import order (Biome will handle sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

Files:

  • src/assertions/llmRubric.ts
  • test/redteam/plugins/base.test.ts
  • src/redteam/plugins/base.ts
  • test/assertions/llmRubric.test.ts
**/*.{test,spec}.{js,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Avoid disabling or skipping tests unless absolutely necessary and documented

Files:

  • test/redteam/plugins/base.test.ts
  • test/assertions/llmRubric.test.ts
test/**/*.{test,spec}.ts

📄 CodeRabbit inference engine (.cursor/rules/jest.mdc)

test/**/*.{test,spec}.ts: Mock as few functions as possible to keep tests realistic
Never increase the function timeout - fix the test instead
Organize tests in descriptive describe and it blocks
Prefer assertions on entire objects rather than individual keys when writing expectations
Clean up after tests to prevent side effects (e.g., use afterEach(() => { jest.resetAllMocks(); }))
Run tests with --randomize flag to ensure your mocks setup and teardown don't affect other tests
Use Jest's mocking utilities rather than complex custom mocks
Prefer shallow mocking over deep mocking
Mock external dependencies but not the code being tested
Reset mocks between tests to prevent test pollution
For database tests, use in-memory instances or proper test fixtures
Test both success and error cases for each provider
Mock API responses to avoid external dependencies in tests
Validate that provider options are properly passed to the underlying service
Test error handling and edge cases (rate limits, timeouts, etc.)
Ensure provider caching behaves as expected
Always include both --coverage and --randomize flags when running tests
Run tests in a single pass (no watch mode for CI)
Ensure all tests are independent and can run in any order
Clean up any test data or mocks after each test

Files:

  • test/redteam/plugins/base.test.ts
  • test/assertions/llmRubric.test.ts
test/**/*.{test.ts,test.tsx,spec.ts,spec.tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Follow Jest best practices with describe/it blocks in tests

Files:

  • test/redteam/plugins/base.test.ts
  • test/assertions/llmRubric.test.ts
🧠 Learnings (5)
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Validate that provider options are properly passed to the underlying service

Applied to files:

  • src/assertions/llmRubric.ts
  • test/redteam/plugins/base.test.ts
  • test/assertions/llmRubric.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Test both success and error cases for each provider

Applied to files:

  • test/redteam/plugins/base.test.ts
  • test/assertions/llmRubric.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Ensure provider caching behaves as expected

Applied to files:

  • test/redteam/plugins/base.test.ts
  • test/assertions/llmRubric.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Prefer assertions on entire objects rather than individual keys when writing expectations

Applied to files:

  • test/assertions/llmRubric.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Mock as few functions as possible to keep tests realistic

Applied to files:

  • test/assertions/llmRubric.test.ts
🧬 Code graph analysis (4)
src/assertions/llmRubric.ts (1)
src/matchers.ts (1)
  • matchesLlmRubric (460-598)
test/redteam/plugins/base.test.ts (2)
src/types/index.ts (1)
  • GradingResult (367-401)
src/matchers.ts (1)
  • matchesLlmRubric (460-598)
src/redteam/plugins/base.ts (2)
src/redteam/providers/shared.ts (1)
  • redteamProviderManager (117-117)
src/matchers.ts (1)
  • matchesLlmRubric (460-598)
test/assertions/llmRubric.test.ts (1)
src/assertions/llmRubric.ts (1)
  • handleLlmRubric (5-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Run Integration Tests
  • GitHub Check: webui tests
  • GitHub Check: Style Check
  • GitHub Check: Share Test
  • GitHub Check: Generate Assets
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Build Docs
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Analyze (javascript-typescript)

@MrFlounder MrFlounder marked this pull request as draft September 26, 2025 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant