tests/eval/DESIGN.md (3 additions & 2 deletions)
@@ -155,7 +155,8 @@ These are copied into the temp clone so that any local modifications to the revi
 
 ## Phase 2
 
-### Cost Tracking
+### Cost Tracking ✅ IMPLEMENTED
+
 
 Use `--output-format json` to capture `total_cost_usd` from each Claude invocation. Accumulate across all calls (review + judge) and print the total in `AfterSuite`.
 
@@ -254,4 +255,4 @@ The API review step is the slowest part of the eval suite. Options to improve:
 
 3. **Parallel test execution** - Run golden tests in parallel (requires separate repo clones per test).
 
-4. **Smaller/faster model for development** - Use Haiku for rapid iteration, Sonnet/Opus for CI validation.
+4. **Smaller/faster model for development** - Use Haiku for rapid iteration, Sonnet/Opus for CI validation.
Expect(result.Pass).To(BeTrue(), "API review did not match expected issues.\nJudge reason: %s\nReview output:\n%s\nExpected issues:\n%s", result.Reason, reviewOutput, expectedIssues)