Commit dbfad3d

Fixes to support SWE-bench Multilingual (#1064)
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
1 parent df9fbd9 commit dbfad3d

2 files changed

Lines changed: 88 additions & 3 deletions

nemo_skills/inference/eval/swebench.py

Lines changed: 10 additions & 3 deletions
@@ -246,6 +246,10 @@ def __init__(self, cfg: SweBenchGenerationConfig):
 "mkdir -p /root/tmux && "
 "curl -Lf https://github.com/nelsonenzo/tmux-appimage/releases/download/3.5a/tmux.appimage -o /root/tmux/tmux && "
 "chmod 777 /root/tmux/tmux && "
+# download jq
+"mkdir -p /root/jq && "
+"curl -Lf https://github.com/jqlang/jq/releases/download/jq-1.8.1/jq-linux-amd64 -o /root/jq/jq && "
+"chmod 777 /root/jq/jq && "
 # clone the openhands repo
 "rm -rf /root/OpenHands && "
 f"git clone {self.cfg.agent_framework_repo} /root/OpenHands && "
@@ -531,13 +535,16 @@ async def _run_openhands(self, data_point, api_base):
 " echo 'This is because OpenHands DELETES EVERYTHING in the /workspace folder if it exists.' && "
 " exit 1; "
 "fi && "
-# copy installed repo, uv & tmux dirs from /root_mount
+# copy installed repo, uv, tmux & jq dirs from /root_mount
 "cp -r /root_mount/OpenHands /root && "
 "cp -r /root_mount/uv /root && "
 "cp -r /root_mount/tmux /root && "
+"cp -r /root_mount/jq /root && "
 "cd /root/OpenHands && "
-# add poetry & tmux to PATH
-"export PATH=/root/uv/tool-bin:/root/tmux:$PATH && "
+# make soft links to poetry, tmux & jq in /usr/local/bin, so OpenHands can run them from the command line
+"ln -sf /root/uv/tool-bin/poetry /usr/local/bin/poetry && "
+"ln -sf /root/tmux/tmux /usr/local/bin/tmux && "
+"ln -sf /root/jq/jq /usr/local/bin/jq && "
 # enable tmux appimage to run without fusermount
 # https://docs.appimage.org/user-guide/troubleshooting/fuse.html#extract-and-run-type-2-appimages
 "export APPIMAGE_EXTRACT_AND_RUN=1 && "
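Judging by the trailing `&& ` on each line, the adjacent string literals in these hunks are concatenated into one `&&`-chained shell command, so a failing `curl -Lf` (which exits non-zero on an HTTP error) stops the rest of the setup. The new `ln -sf` lines expose poetry, tmux and jq through `/usr/local/bin` instead of the removed `export PATH=/root/uv/tool-bin:/root/tmux:$PATH` line. The sketch below rebuilds the jq steps with a hypothetical helper, `build_tool_setup_cmd`, which is not part of nemo_skills. For brevity it merges the download step (done in `__init__`) and the symlink step (done in `_run_openhands` after copying from `/root_mount`) into a single chain, which the real code does not do.

```python
import shlex


def build_tool_setup_cmd(tools: dict[str, tuple[str, str]]) -> str:
    """Build one "&&"-chained shell command that downloads each tool binary
    and exposes it on the command line through a symlink in /usr/local/bin.

    `tools` maps a tool name to (download_url, install_dir). This layout is a
    hypothetical illustration; swebench.py spells the strings out by hand.
    """
    steps: list[str] = []
    for name, (url, install_dir) in tools.items():
        binary = f"{install_dir}/{name}"
        steps += [
            f"mkdir -p {shlex.quote(install_dir)}",
            # -L follows the GitHub release redirect; -f turns HTTP errors into a
            # non-zero exit status, which aborts the rest of the "&&" chain
            f"curl -Lf {shlex.quote(url)} -o {shlex.quote(binary)}",
            f"chmod 777 {shlex.quote(binary)}",
            # /usr/local/bin is on the default PATH of typical Linux images, so the
            # tool is found without exporting a modified PATH
            f"ln -sf {shlex.quote(binary)} /usr/local/bin/{name}",
        ]
    return " && ".join(steps)


JQ_URL = "https://github.com/jqlang/jq/releases/download/jq-1.8.1/jq-linux-amd64"
cmd = build_tool_setup_cmd({"jq": (JQ_URL, "/root/jq")})
print(cmd)
```

Since none of these paths or URLs contain shell metacharacters, `shlex.quote` leaves them unchanged and the printed command matches the four jq commands in the diff, joined by ` && `.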
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+# Based on the default config from the SWE-agent repo:
+# https://github.com/SWE-agent/SWE-agent/blob/1375ec4fa69d300b432b9ca61d6b0e5d7259131c/config/default.yaml
+# but mentions of Python are removed to make the prompt language-agnostic.
+
+# note that this doesn't use nemo-skills prompt logic and instead is passed directly to swe-agent
+
+agent:
+  templates:
+    system_template: |-
+      You are a helpful assistant that can interact with a computer to solve tasks.
+    instance_template: |-
+      <uploaded_files>
+      {{working_dir}}
+      </uploaded_files>
+      I've uploaded a code repository in the directory {{working_dir}}. Consider the following PR description:
+
+      <pr_description>
+      {{problem_statement}}
+      </pr_description>
+
+      Can you help me implement the necessary changes to the repository so that the requirements specified in the <pr_description> are met?
+      I've already taken care of all changes to any of the test files described in the <pr_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
+      Your task is to make the minimal changes to non-tests files in the {{working_dir}} directory to ensure the <pr_description> is satisfied.
+      Follow these steps to resolve the issue:
+      1. As a first step, it might be a good idea to find and read code relevant to the <pr_description>
+      2. Create a script to reproduce the error and execute it using the bash tool, to confirm the error
+      3. Edit the sourcecode of the repo to resolve the issue
+      4. Rerun your reproduce script and confirm that the error is fixed!
+      5. Think about edgecases and make sure your fix handles them as well
+      Your thinking should be thorough and so it's fine if it's very long.
+    next_step_template: |-
+      OBSERVATION:
+      {{observation}}
+    next_step_no_output_template: |-
+      Your command ran successfully and did not produce any output.
+  tools:
+    env_variables:
+      PAGER: cat
+      MANPAGER: cat
+      LESS: -R
+      PIP_PROGRESS_BAR: 'off'
+      TQDM_DISABLE: '1'
+      GIT_PAGER: cat
+    bundles:
+      - path: tools/registry
+      - path: tools/edit_anthropic
+      - path: tools/review_on_submit_m
+    registry_variables:
+      USE_FILEMAP: 'true'
+      SUBMIT_REVIEW_MESSAGES:
+        - |
+          Thank you for your work on this issue. Please carefully follow the steps below to help review your changes.
+
+          1. If you made any changes to your code after running the reproduction script, please run the reproduction script again.
+             If the reproduction script is failing, please revisit your changes and make sure they are correct.
+             If you have already removed your reproduction script, please ignore this step.
+          2. Remove your reproduction script (if you haven't done so already).
+          3. If you have modified any TEST files, please revert them to the state they had before you started fixing the issue.
+             You can do this with `git checkout -- /path/to/test/file`. Use below <diff> to find the files you need to revert.
+          4. Run the submit command again to confirm.
+
+          Here is a list of all of your changes:
+
+          <diff>
+          {{diff}}
+          </diff>
+    enable_bash_tool: true
+    parse_function:
+      type: function_calling
+  history_processors: []
+  model:
+    # The following parameters are overridden by Nemo-Skills:
+    # name, api_base, temperature, top_p, completion_kwargs, per_instance_call_limit.
+    # Specifying them here will have no effect! Use Nemo-Skills options instead.
+    per_instance_cost_limit: 0
+    total_cost_limit: 0
+    max_input_tokens: 0
+    max_output_tokens: 0
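The templates in this config contain `{{working_dir}}`, `{{problem_statement}}`, `{{observation}}` and `{{diff}}` placeholders that SWE-agent fills in at run time; none of the remaining prompt text names a programming language. The sketch below shows what a shortened `instance_template` looks like once filled in. It uses a small regex substitution as a stand-in for SWE-agent's actual template engine, and the `render` function and the sample values (`/testbed` and the one-line problem statement) are hypothetical.

```python
import re

# Shortened excerpt of instance_template from the config above.
INSTANCE_TEMPLATE = """<uploaded_files>
{{working_dir}}
</uploaded_files>
I've uploaded a code repository in the directory {{working_dir}}. Consider the following PR description:

<pr_description>
{{problem_statement}}
</pr_description>"""

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **values: str) -> str:
    """Replace every {{name}} with values[name].

    Hypothetical stand-in for a real template engine: a placeholder without a
    matching keyword argument raises KeyError instead of rendering silently.
    """
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


prompt = render(
    INSTANCE_TEMPLATE,
    working_dir="/testbed",  # hypothetical repository location
    problem_statement="Fix the off-by-one error in the parser.",
)
print(prompt)
```

Raising on a missing value is a deliberate choice in this sketch, so that a template variable that is never supplied shows up immediately rather than as a silently empty prompt section.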
