SWE-bench: fix SWE-agent hanging, adjust expected scores #1202
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
📝 Walkthrough
Updates numeric example values in the evaluation documentation, adds an explicit rich package downgrade (v14.2.0) to the SWE-agent setup in swebench.py, and updates the metric ranges and reference date in the test validation thresholds to reflect new measurement data.
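For readers skimming the walkthrough, here is a minimal sketch of the two ideas it describes: pinning one exact package version during environment setup, and checking a measured score against an expected range. It is not the code from swebench.py or the test suite. The helper names `pinned_install_command` and `in_expected_range` are hypothetical, and the score and range in the usage lines are made-up placeholders, not values from this PR.

```python
import shlex
from typing import Tuple


def pinned_install_command(package: str, version: str) -> str:
    """Build a pip command that installs exactly one version of a package.

    Hypothetical helper: the real setup step in swebench.py is not shown here.
    """
    requirement = f"{package}=={version}"
    # shlex.quote keeps the requirement safe if it is spliced into a shell string.
    return f"python -m pip install {shlex.quote(requirement)}"


def in_expected_range(value: float, expected: Tuple[float, float]) -> bool:
    """Return True when a measured metric lies inside an inclusive (low, high) range."""
    low, high = expected
    return low <= value <= high


# The version below comes from the walkthrough; the score and range are placeholders.
print(pinned_install_command("rich", "14.2.0"))
print(in_expected_range(50.0, (40.0, 60.0)))
```

An inclusive range rather than a single expected number leaves room for run-to-run variance in agent benchmarks, which is presumably why the thresholds are expressed as ranges.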
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
rich==14.2.0 in the SWE-agent environment.
Summary by CodeRabbit
Release Notes
Documentation
Chores
Tests