Fixes to support SWE-bench Multilingual #1064
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
[Update] I found this (Kipok/SWE-bench#2) for 3. Good to keep a reference here for the record.
📝 Walkthrough
The changes add jq binary support to the OpenHands/SWE-agent environment setup procedure and introduce a new multilingual configuration template for SWE-agent prompts with language-agnostic instruction guidance and tool configuration.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- nemo_skills/inference/eval/swebench.py (2 hunks)
- nemo_skills/prompt/config/eval/swe-bench/swe-agent/multilingual.yaml (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (5)
nemo_skills/inference/eval/swebench.py (1)
538-547: LGTM! jq integration follows the established pattern.
The implementation for copying jq from /root_mount and creating symbolic links mirrors the approach used for tmux and poetry, ensuring consistency across the codebase.
nemo_skills/prompt/config/eval/swe-bench/swe-agent/multilingual.yaml (4)
1-6: Clear documentation of config origin and purpose.
The header comments appropriately document the base configuration source and the language-agnostic modifications, making the intent clear for future maintainers.
7-35: LGTM! Templates are language-agnostic as intended.
The system and instance templates successfully avoid Python-specific language while providing clear instructions. The step-by-step guidance in the instance template is well-structured and appropriate for multilingual code repositories.
67-78: LGTM! Model configuration with clear override documentation.
The model configuration appropriately sets placeholder values with clear documentation that these are overridden by Nemo-Skills, preventing confusion about which parameters take precedence.
44-47: The tool bundle paths are legitimate, documented SWE-agent tools.
They are standard upstream bundles (tools/registry, tools/edit_anthropic, tools/review_on_submit_m) already used consistently across multiple configuration files in this repository. No verification is needed.
Likely an incorrect or invalid review comment.
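As a rough illustration of the placeholder-and-override behavior noted in the comment on lines 67-78, the sketch below merges runtime values over placeholder defaults. The key names (`name`, `temperature`, `top_p`) and the `apply_overrides` helper are hypothetical and chosen only for illustration; they are not taken from multilingual.yaml or from the Nemo-Skills code.

```python
from copy import deepcopy

# Hypothetical placeholder values, standing in for the model section of a config file.
PLACEHOLDER_MODEL = {"name": "placeholder", "temperature": 0.0, "top_p": 1.0}


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with every non-None override applied on top."""
    merged = deepcopy(base)  # leave the placeholder config untouched
    merged.update({key: value for key, value in overrides.items() if value is not None})
    return merged


if __name__ == "__main__":
    # Overrides supplied at run time take precedence; None means "keep the placeholder".
    print(apply_overrides(PLACEHOLDER_MODEL, {"name": "my-model", "temperature": None}))
```

Documenting in the config file that the placeholders are always replaced, as the reviewed file does, avoids confusion about which value wins.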
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
This PR adds two changes to support evaluation on SWE-bench Multilingual:
- jq as a dependency for OpenHands. This is required for some SWE-bench Multilingual containers where it is not installed by default.
These changes should not affect standard evaluation on SWE-bench Verified.
The Slurm tests for SWE-bench pass.
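The jq dependency step described above (copy the binary out of /root_mount, then symlink it, as is already done for tmux and poetry) might look roughly like the sketch below. This is not the code from swebench.py: the `binary_setup_cmd` helper, the `/opt/tools` destination, and the `/usr/local/bin` link directory are assumptions made for illustration; only /root_mount and the copy-then-symlink pattern come from the review comment.

```python
import shlex


def binary_setup_cmd(name: str, mount_dir: str = "/root_mount",
                     dest_dir: str = "/opt/tools",
                     link_dir: str = "/usr/local/bin") -> str:
    """Build a shell command that copies a binary from the mount and symlinks it onto PATH."""
    src = shlex.quote(f"{mount_dir}/{name}")   # binary provided through the mounted directory
    dst = shlex.quote(f"{dest_dir}/{name}")    # hypothetical writable location in the container
    link = shlex.quote(f"{link_dir}/{name}")   # hypothetical directory that is on PATH
    return " && ".join([
        f"mkdir -p {shlex.quote(dest_dir)}",
        f"cp -r {src} {dst}",
        f"ln -sf {dst} {link}",
    ])


if __name__ == "__main__":
    # The same helper would cover jq, tmux, or any other prebuilt tool.
    print(binary_setup_cmd("jq"))
```

Keeping one helper for all such tools is what makes a new dependency like jq a small change.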