Fixes to support SWE-bench Multilingual by ludwig-n · Pull Request #1064 · NVIDIA-NeMo/Skills

ludwig-n · 2025-12-02T18:10:01Z

This PR adds two changes to support evaluation on SWE-bench Multilingual:

- Adds jq as a dependency for OpenHands. This is required for some SWE-bench Multilingual containers where it is not installed by default.
- Adds a new config variant for SWE-agent where mentions of Python are removed from the prompt, which is more suitable for multilingual datasets.

These changes should not affect standard evaluation on SWE-bench Verified.
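As a rough illustration of what "mentions of Python are removed from the prompt" means in practice, the sketch below scans a prompt template for language-specific terms. The term list, the function name `find_language_mentions`, and both example prompts are hypothetical and are not taken from the PR or from SWE-agent.

```python
import re

# Hypothetical list of language-specific terms to flag; not taken from the PR.
LANGUAGE_TERMS = ("python", "pytest", "pip")


def find_language_mentions(template: str, terms=LANGUAGE_TERMS) -> list[str]:
    """Return the terms that appear as whole words in the template (case-insensitive)."""
    found = []
    for term in terms:
        if re.search(rf"\b{re.escape(term)}\b", template, flags=re.IGNORECASE):
            found.append(term)
    return found


# Made-up prompt fragments, used only to exercise the check.
python_prompt = "Create a script to reproduce the error and execute it with `python <filename.py>`."
agnostic_prompt = "Create a script to reproduce the error and execute it."

print(find_language_mentions(python_prompt))
print(find_language_mentions(agnostic_prompt))
```

A check like this could be run over a prompt config before using it on a non-Python dataset, though the PR itself does not describe any such validation.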

The Slurm tests for SWE-bench pass.

Summary by CodeRabbit

Release Notes

New Features
- Added multilingual configuration for SWE-agent enabling language-agnostic prompt handling with repository context and problem-solving guidance.
Infrastructure
- Integrated jq utility into the OpenHands runtime environment to support SWE-bench evaluation pipeline.


Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

wasiahmad · 2025-12-03T00:03:38Z

@ludwig-n

1. Can you add benchmark numbers for Qwen3 coder models? This will be for the record.
2. Just for my understanding, is jq installed in each instance container (the child ones)?
3. Did we make any changes to the eval harness for multilingual support?
4. The prompt config is added only for SWE-agent. Don't we need one for OH? Or is it already language-agnostic?

[Update] I found this (Kipok/SWE-bench#2) for question 3. Good to keep a reference here for the record.

coderabbitai · 2025-12-03T00:07:38Z

Walkthrough

The changes add jq binary support to the OpenHands/SWE-agent environment setup procedure and introduce a new multilingual configuration template for SWE-agent prompts with language-agnostic instruction guidance and tool configuration.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| OpenHands/SWE-agent jq tooling setup `nemo_skills/inference/eval/swebench.py` | Adds jq binary installation, downloads the binary, sets permissions, creates a symlink in `/usr/local/bin`, and extends container setup to copy and link the jq directory, mirroring existing tmux/poetry handling. |
| Multilingual SWE-agent configuration `nemo_skills/prompt/config/eval/swe-bench/swe-agent/multilingual.yaml` | New configuration file defining language-agnostic SWE-agent prompt templates, instruction flow for minimal file changes, tool environment variables, function calling setup, and model configuration (cost/token limits set to zero for Nemo-Skills override). |
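The install steps listed in the summary (download the binary, set permissions, symlink into `/usr/local/bin`) can be sketched as a list of shell commands built in Python. This is only an illustration under stated assumptions: the download URL and the `/opt/jq` install directory are placeholders, and the function name `build_jq_install_commands` is hypothetical. The actual commands in `swebench.py` are not shown in this thread.

```python
import shlex

# Placeholder values for illustration only; the PR's real URL and paths are not shown here.
JQ_URL = "https://example.com/jq-linux-amd64"
JQ_DIR = "/opt/jq"


def build_jq_install_commands(url=JQ_URL, install_dir=JQ_DIR, link_dir="/usr/local/bin"):
    """Return shell commands that download jq, make it executable, and symlink it."""
    binary = f"{install_dir}/jq"
    return [
        f"mkdir -p {shlex.quote(install_dir)}",
        # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
        f"curl -fsSL {shlex.quote(url)} -o {shlex.quote(binary)}",
        f"chmod +x {shlex.quote(binary)}",
        # -s: symbolic link, -f: replace an existing link
        f"ln -sf {shlex.quote(binary)} {shlex.quote(link_dir + '/jq')}",
    ]


commands = build_jq_install_commands()
print(" && ".join(commands))
```

Joining the commands with `&&` stops the sequence at the first failing step, so a failed download would not leave a dangling symlink.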

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

- The jq integration in swebench.py follows established patterns (similar to the tmux/poetry setup), but verify proper binary handling and permissions.
- Verify that multilingual.yaml is syntactically correct and aligns with SWE-agent requirements.
- Ensure the jq additions don't interfere with the existing OpenHands environment setup logic.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fixes to support SWE-bench Multilingual' accurately captures the main changes: adding jq dependency support and introducing a multilingual config variant for SWE-agent. |


coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48cbf77 and 779a655.

📒 Files selected for processing (2)

nemo_skills/inference/eval/swebench.py (2 hunks)
nemo_skills/prompt/config/eval/swe-bench/swe-agent/multilingual.yaml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: unit-tests
GitHub Check: pre-commit

🔇 Additional comments (5)

nemo_skills/inference/eval/swebench.py (1)

538-547: LGTM! jq integration follows the established pattern.

The implementation for copying jq from /root_mount and creating symbolic links mirrors the approach used for tmux and poetry, ensuring consistency across the codebase.
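The copy-and-link pattern described here can be sketched with the standard library. The example runs entirely inside a temporary directory with a stub file standing in for the jq binary, so the directory names (`root_mount`, `opt`, `usr_local_bin`) and the helper `copy_and_link_tool` are illustrative assumptions, not the code from `swebench.py`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_and_link_tool(mount_root: Path, tool: str, dest_root: Path, bin_dir: Path) -> Path:
    """Copy <mount_root>/<tool> to <dest_root>/<tool> and symlink its binary into bin_dir."""
    dest = dest_root / tool
    shutil.copytree(mount_root / tool, dest, dirs_exist_ok=True)  # keeps file permissions
    bin_dir.mkdir(parents=True, exist_ok=True)
    link = bin_dir / tool
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link, similar to `ln -sf`
    link.symlink_to(dest / tool)
    return link


# Demo against a throwaway directory tree instead of a real container filesystem.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "root_mount" / "jq").mkdir(parents=True)
    stub = root / "root_mount" / "jq" / "jq"
    stub.write_text("#!/bin/sh\necho jq-stub\n")
    stub.chmod(0o755)

    link = copy_and_link_tool(root / "root_mount", "jq", root / "opt", root / "usr_local_bin")
    resolved_ok = link.resolve() == (root / "opt" / "jq" / "jq").resolve()
    executable_ok = os.access(link, os.X_OK)
    print(resolved_ok, executable_ok)
```

Copying the directory first and linking to the copy means the tool stays available even if the mount is later removed, which may be the reason the same approach is used for tmux and poetry.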

nemo_skills/prompt/config/eval/swe-bench/swe-agent/multilingual.yaml (4)

1-6: Clear documentation of config origin and purpose.

The header comments appropriately document the base configuration source and the language-agnostic modifications, making the intent clear for future maintainers.

7-35: LGTM! Templates are language-agnostic as intended.

The system and instance templates successfully avoid Python-specific language while providing clear instructions. The step-by-step guidance in the instance template is well-structured and appropriate for multilingual code repositories.

67-78: LGTM! Model configuration with clear override documentation.

The model configuration appropriately sets placeholder values with clear documentation that these are overridden by Nemo-Skills, preventing confusion about which parameters take precedence.
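One way to picture placeholder values being overridden at run time is a recursive dictionary merge, sketched below. The key names (`agent`, `model`, `cost_limit`, `max_tokens`) and the function `apply_overrides` are hypothetical; how Nemo-Skills actually applies its overrides is not shown in this thread.

```python
from copy import deepcopy


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides merged in; nested dicts are merged key by key."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical config shapes, for illustration only.
yaml_defaults = {"agent": {"model": {"name": "placeholder", "cost_limit": 0, "max_tokens": 0}}}
runtime_overrides = {"agent": {"model": {"name": "my-model"}}}

merged = apply_overrides(yaml_defaults, runtime_overrides)
print(merged)
```

Only the keys present in the override are replaced, so placeholder limits left at zero in the YAML survive unless the caller sets them explicitly.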

44-47: The tool bundle paths are legitimate, documented SWE-agent tools. They are standard upstream bundles (tools/registry, tools/edit_anthropic, tools/review_on_submit_m) already used consistently across multiple configuration files in this repository. No verification is needed.

Likely an incorrect or invalid review comment.

nemo_skills/inference/eval/swebench.py


ludwig-n and others added 4 commits November 25, 2025 19:22

Install jq for OpenHands

ad45c70

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

Add language-agnostic prompt config for SWE-agent

cfed7b0

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

Merge branch 'main' into ludwig-n/openhands-download-jq

e033329

Merge branch 'main' into ludwig-n/swe-bench-multilingual

779a655

coderabbitai bot reviewed Dec 3, 2025

nemo_skills/inference/eval/swebench.py (review comment resolved)

gwarmstrong approved these changes Dec 3, 2025


Merge branch 'main' into ludwig-n/swe-bench-multilingual

359ae8f

ludwig-n merged commit dbfad3d into main Dec 4, 2025
5 checks passed

ludwig-n deleted the ludwig-n/swe-bench-multilingual branch December 4, 2025 12:55

melllinia pushed a commit that referenced this pull request Dec 5, 2025

Fixes to support SWE-bench Multilingual (#1064)

fe5bad5

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>

Jorjeous pushed a commit that referenced this pull request Dec 11, 2025

Fixes to support SWE-bench Multilingual (#1064)

3eaab3e

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

wasiahmad pushed a commit that referenced this pull request Dec 12, 2025

Fixes to support SWE-bench Multilingual (#1064)

5ef3f05

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

coderabbitai bot mentioned this pull request Jan 30, 2026

SWE-bench: fix SWE-agent hanging, adjust expected scores #1202

Merged

wasiahmad pushed a commit that referenced this pull request Feb 4, 2026

Fixes to support SWE-bench Multilingual (#1064)

c376270

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

coderabbitai bot mentioned this pull request Feb 10, 2026

Support mini-swe-agent as agent harness #1212

Merged

ludwig-n merged 5 commits into main from ludwig-n/swe-bench-multilingual


3 participants
