We utilize a client-server architecture for evaluation. This repository serves as the client, responsible for managing the server, sending requests, and executing evaluations upon receiving actions. Before proceeding, please ensure that the server-side environment and code are properly set up.
Please refer to READMEEnv.md.
This module provides a unified evaluation script based on torch.distributed (DDP), designed to simplify the multi-GPU/multi-task evaluation workflow in Client-Server mode.
- Unified Architecture: Uses
torchrunto launch a single Python script. - Automatic Task Distribution: Automatically splits task lists leveraging DDP's
rankandworld_size, removing the need for manual assignment. - Robust Process Management: Utilizes Python Context Manager to manage the Server lifecycle, ensuring that the Server and its subprocesses are cleanly terminated regardless of normal completion or abnormal exit.
- Decoupled Design: All paths (Server script, model, task descriptions) are passed via arguments rather than being hardcoded.
- Resumable Execution: Automatically skips tasks that already have existing logs.
- PyTorch (for
torch.distributed) - Existing
vidarServer script - Existing
script/eval_policy.pyClient script
conda activate RoboTwin-hb
# eval with vidarc
bash run_eval_ddp_causal.sh
# eval with vidar
bash run_eval_ddp.sh | Argument | Description | Default |
|---|---|---|
--server_script |
Path to the Server startup script (Required) | - |
--model |
Path to the model (Required) | - |
--idm |
Path to the Inverse Dynamics Model | - |
--prefix |
Prefix for the output directory (Required) | "debug" |
--task_dir |
Directory containing task description files | "./description/task_instruction" |
--server_cwd |
Working directory for the Server script | "../cosmos-predict2" |
--base_port |
Starting port number (Rank 0 uses base, Rank 1 uses base+1...) | 25400 |