This directory contains small utilities for running OpenHands and managing datasets/images. Each section starts with what the script is for, then shows usage and parameters.
start_server.py: Launches the OpenHands async server (FastAPI) that processes evaluation requests. It manages the server lifecycle, worker pools, LLM endpoints, and timeouts.
Usage:
export LOG_LEVEL=ERROR
export DEBUG=False
python start_server.py [OPTIONS]

Parameters:
- --max-init-workers: Maximum number of initialization workers (default: 64)
- --max-run-workers: Maximum number of run workers (default: 64)
- --timeout: Global job timeout in seconds (default: 300)
- --host: Bind host (default: 0.0.0.0)
- --port: Bind port (default: 8006)
- --allow-skip-eval: Skip evaluation if git_patch is None/empty (default: True)
- --reward-server-ip: Reward server IP for math/code/reasoning gym (default: [])
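For scripted setups, the usage above can be reproduced from Python. This sketch only assembles the documented flags and environment variables and starts the server as a child process. It assumes it runs from the directory containing start_server.py, and the helper names (build_server_command, server_env, launch_server) are hypothetical. --allow-skip-eval and --reward-server-ip are left out because their value formats are not documented here.

```python
import os
import subprocess
import sys


def build_server_command(host="0.0.0.0", port=8006, timeout=300,
                         max_init_workers=64, max_run_workers=64):
    """Assemble the start_server.py command line using the documented flags and defaults."""
    return [
        sys.executable, "start_server.py",  # assumes the current directory holds the script
        "--host", host,
        "--port", str(port),
        "--timeout", str(timeout),
        "--max-init-workers", str(max_init_workers),
        "--max-run-workers", str(max_run_workers),
    ]


def server_env():
    """Copy the current environment and apply the settings from the usage example."""
    env = dict(os.environ)
    env.update({"LOG_LEVEL": "ERROR", "DEBUG": "False"})
    return env


def launch_server(**kwargs):
    """Start the server in the background; the caller should terminate() it when done."""
    return subprocess.Popen(build_server_command(**kwargs), env=server_env())
```

Calling launch_server(port=8006) returns a subprocess.Popen handle, so the same process can go on to send requests and shut the server down afterwards.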
Notes:
- Endpoints include /start, /stop, /status, /add_llm_server, /clear_llm_server, and /process.
- Add LLM addresses before sending /process requests.
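The ordering in the second note can be sketched with only the standard library. This is a hedged illustration, not the server's documented client API: it assumes both endpoints accept JSON via POST, and the payload fields ("address" for /add_llm_server, the instance dict for /process) are hypothetical placeholders. Check the server code for the real methods and schemas.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8006"  # matches the documented --port default


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request; the method and payload schema are assumptions."""
    return urllib.request.Request(
        base_url + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def plan_requests(llm_addresses, instance, base_url=BASE_URL):
    """Register every LLM server first, then submit the instance to /process."""
    reqs = [build_request("/add_llm_server", {"address": a}, base_url)
            for a in llm_addresses]
    reqs.append(build_request("/process", instance, base_url))
    return reqs


def send(req, timeout=300):
    """Send one request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Sending the planned requests in order with send() keeps the LLM registrations ahead of the first /process call.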
pull_swe_images.py: Builds and manages the Singularity images required by SWE‑Bench and SWE‑Bench multimodal, converting the Docker images referenced in a parquet file into .sif files.
Usage:
# Optional: set where images are stored
export OH_RUNTIME_SINGULARITY_IMAGE_REPO=/path/to/singularity_images
python pull_swe_images.py [OPTIONS]

Required:
- --parquet-file: Path to a SWE‑Bench parquet file

Optional:
- --prefix: Override the Docker image namespace prefix
- --dest-dir: Directory in which to store .sif images (default: <workspace>/singularity_images)
- --temp-base: Base directory for temporary build folders (default: <dest>/temp_dif)
- --start-index: 1-based index of the first image to process (default: 1)
- --end-index: 1-based index of the last image to process (inclusive)
- --log-name: Write combined logs under images_process/log/
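Two details above are easy to get wrong: the index range is 1-based and inclusive, and each image becomes a .sif file built from a Docker reference. The sketch below shows both under stated assumptions. select_images and build_command are hypothetical helpers, and the .sif file naming scheme is invented for illustration. `singularity build <file>.sif docker://<image>` is the standard Singularity/Apptainer CLI form, but pull_swe_images.py may invoke it differently.

```python
from pathlib import Path


def select_images(images, start_index=1, end_index=None):
    """Return images start_index..end_index, both 1-based and inclusive."""
    if start_index < 1:
        raise ValueError("start_index is 1-based and must be >= 1")
    # 1-based start -> 0-based slice start; an inclusive 1-based end equals the
    # exclusive 0-based slice stop, and None means "through the last image".
    return images[start_index - 1:end_index]


def build_command(image, dest_dir):
    """Hypothetical: command that converts one Docker image into a .sif file."""
    # Naming scheme is an assumption: '/' and ':' become '_' to form a flat filename.
    sif_name = image.replace("/", "_").replace(":", "_") + ".sif"
    return ["singularity", "build", str(Path(dest_dir) / sif_name), f"docker://{image}"]
```

For example, --start-index 2 --end-index 3 corresponds to select_images(images, 2, 3), which yields the second and third images.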
Tip:
- If the shared image repo already has images, copy them locally for faster access.
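A minimal way to follow the tip, assuming the shared repository is a flat directory of .sif files, is to copy only the images that are missing locally. sync_images is a hypothetical helper. After copying, you would presumably point OH_RUNTIME_SINGULARITY_IMAGE_REPO at the local directory.

```python
import shutil
from pathlib import Path


def sync_images(shared_repo, local_dir):
    """Copy .sif files from shared_repo into local_dir, skipping files that already exist."""
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(shared_repo).glob("*.sif")):
        dst = local / src.name
        if not dst.exists():
            shutil.copy2(src, dst)  # copy2 also preserves file timestamps
            copied.append(src.name)
    return copied
```

Re-running it is cheap: only images added to the shared repository since the last run are copied.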
run_swe.py: Runs bulk evaluation on SWE‑Bench datasets. It starts the async server, sends evaluation requests concurrently, balances them across multiple LLM endpoints, and writes the results to a .jsonl file.
Usage:
python run_swe.py [OPTIONS]

Common options:
- --dataset-path: Path to a SWE‑Bench parquet file (train/val)
- --output: Output results file (.jsonl) (default: eval_results.jsonl)
- --llm-addresses: One or more LLM base URLs (e.g., http://10.0.0.2:8000/v1 ...)
- --host: Host of the OpenHands server (default: localhost)
- --port: Port of the server (default: 8006)
- --concurrency: Maximum number of concurrent evaluation requests (default: 32)
- --num-instances: Limit the number of instances (for quick testing)
- --sampling-params: JSON string merged into the default sampling params
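--sampling-params is merged into the default sampling parameters. Assuming a shallow dictionary merge in which the JSON keys win, the behavior looks like this. The defaults shown are placeholders, not run_swe.py's actual values, and merge_sampling_params is a hypothetical helper.

```python
import json

# Placeholder defaults (assumption): run_swe.py's real defaults may differ.
DEFAULT_SAMPLING_PARAMS = {"temperature": 1.0, "top_p": 1.0, "max_tokens": 4096}


def merge_sampling_params(json_str=None, defaults=DEFAULT_SAMPLING_PARAMS):
    """Shallow-merge a JSON object over the defaults; keys from json_str take precedence."""
    overrides = json.loads(json_str) if json_str else {}
    if not isinstance(overrides, dict):
        raise ValueError("--sampling-params must be a JSON object")
    return {**defaults, **overrides}
```

Keys not mentioned in the JSON string keep their default values, so a short override such as '{"temperature": 0.3}' is enough.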
Notes:
- Requests are distributed round‑robin across the provided --llm-addresses.
- Adjust --concurrency to the capacity of your LLM servers to avoid overloading them.
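The notes above describe round-robin distribution with a concurrency cap. One common way to express that in Python is itertools.cycle for the rotation plus asyncio.Semaphore for the cap. This is a sketch of the described behavior, not run_swe.py's code; evaluate is a hypothetical coroutine standing in for a /process request.

```python
import asyncio
import itertools


def assign_round_robin(instances, llm_addresses):
    """Pair each instance with the next LLM address in rotation."""
    if not llm_addresses:
        raise ValueError("at least one LLM address is required")
    return list(zip(instances, itertools.cycle(llm_addresses)))


async def run_all(instances, llm_addresses, evaluate, concurrency=32):
    """Await evaluate(instance, address) for every pair, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(instance, address):
        async with semaphore:  # waits here once `concurrency` calls are in flight
            return await evaluate(instance, address)

    pairs = assign_round_robin(instances, llm_addresses)
    # gather returns results in the same order as `instances`
    return await asyncio.gather(*(bounded(i, a) for i, a in pairs))
```

The cap applies to all endpoints combined, so with N addresses each endpoint sees roughly concurrency / N requests at a time.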
Example:
python run_swe.py \
--dataset-path /path/to/train.parquet \
--output swe_results.jsonl \
--llm-addresses http://10.0.0.2:8000/v1 http://10.0.0.3:8000/v1 \
--concurrency 64 \
--sampling-params '{"temperature": 0.3, "top_p": 0.95}'

prepare_data.py: Merges and standardizes parquet datasets (SWE‑Bench, SWE‑Bench multimodal, R2E‑Gym) into a single merged.parquet. It automatically detects the dataset type and normalizes fields.
Usage:
python prepare_data.py --data-dir /path/to/parquet_directory

Parameters:
- --data-dir: Directory containing one or more parquet files (searched recursively)
Output:
- Writes merged.parquet under the same --data-dir.
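The file discovery and concatenation steps can be sketched as follows. This sketch omits the dataset-type detection and field normalization that prepare_data.py performs, so a plain concatenation like this only makes sense once the schemas already match. Skipping files named merged.parquet is an assumption added to avoid re-merging earlier output. Reading and writing parquet requires pandas plus an engine such as pyarrow.

```python
from pathlib import Path


def find_parquet_files(data_dir, output_name="merged.parquet"):
    """Recursively list .parquet files under data_dir, skipping any file named output_name."""
    return sorted(p for p in Path(data_dir).rglob("*.parquet") if p.name != output_name)


def merge_parquet(data_dir, output_name="merged.parquet"):
    """Concatenate all discovered files and write data_dir/output_name (no normalization)."""
    import pandas as pd  # lazy import: needs pandas and a parquet engine such as pyarrow

    files = find_parquet_files(data_dir, output_name)
    if not files:
        raise FileNotFoundError(f"no parquet files found under {data_dir}")
    frames = [pd.read_parquet(f) for f in files]
    merged = pd.concat(frames, ignore_index=True)
    out = Path(data_dir) / output_name
    merged.to_parquet(out, index=False)
    return out
```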