Description
Every time NeMo Gym starts a server subprocess, it runs uv venv + uv pip install into a per-server .venv directory.
RunHelper.start() launches each server subprocess via Popen, shelling out a command constructed by _setup_env_command.
This creates a .venv inside each server's source directory (e.g. resources_servers/math_with_judge/.venv).
The Gym root __init__.py hardcodes UV_CACHE_DIR to a relative path at Gym/cache/uv.
Each server subprocess inherits this environment and independently runs uv venv + uv pip install before starting its FastAPI entrypoint.
This incurs the following problems:
Startup Latency: Each server subprocess runs the uv setup as a blocking call before launching its FastAPI server. Subprocesses are launched in parallel, so the startup wall time is set by the slowest subprocess.
Race Conditions: Multiple subprocesses with the same server type share the same .venv directory. Concurrent uv pip install operations can write to the same site-packages/ directory, which can corrupt the venv.
Inode exhaustion and portability: Each server type lists its own dependencies, and because UV_LINK_MODE is not set, packages shared between servers are duplicated across venvs.
The .venv directories live inside the source tree. When code is mounted into a container, the .venv is created on the filesystem mount, not inside the container’s writable layer. In this scenario, venv creation goes through NFS I/O, and the created inodes count against the user’s quota.
If users need to reproduce an experiment, copying the source tree also copies every created .venv directory.
If the source tree is later mounted at a different path, the absolute paths baked into the .venv/bin/python wrapper scripts break.
Proposal:
Follow what NeMo-RL does: prefetch the venvs, and run the prefetch during the container build.
- Centralize venv location via an environment variable NEMO_GYM_VENV_DIR:
Check for this environment variable when setting up the environment command. When set, venvs are created at $NEMO_GYM_VENV_DIR/{server_name}. When unset, we use the original per-directory .venv behavior.
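The lookup described above could be sketched roughly as follows. This is only an illustration: `resolve_venv_dir` is a hypothetical helper name, not an existing function in Gym, and treating an empty value as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_venv_dir(
    server_dir: Path,
    server_name: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the venv location for one server (hypothetical helper)."""
    env = os.environ if environ is None else environ
    central = env.get("NEMO_GYM_VENV_DIR")
    if central:
        # Centralized layout: $NEMO_GYM_VENV_DIR/{server_name}
        return Path(central) / server_name
    # Unset (or empty, by assumption): keep the original per-directory .venv
    return server_dir / ".venv"
```

With `NEMO_GYM_VENV_DIR=/opt/gym_venvs`, the `math_with_judge` server would resolve to `/opt/gym_venvs/math_with_judge`; without it, to `resources_servers/math_with_judge/.venv`.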
- Set UV_LINK_MODE to share packages via the uv cache
Each venv’s site-packages/ entries become symlinks pointing to the cache directory
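One way to apply this is to set the variable in the environment handed to each server subprocess. The sketch below assumes the `symlink` mode implied above; `env_with_link_mode` is a hypothetical helper, and keeping a value the user already exported is a design assumption.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_link_mode(
    base: Optional[Mapping[str, str]] = None,
    link_mode: str = "symlink",
) -> Dict[str, str]:
    """Copy an environment and default UV_LINK_MODE (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    # setdefault leaves an explicit user choice (e.g. "copy") untouched.
    env.setdefault("UV_LINK_MODE", link_mode)
    return env
```

The result can be passed as `env=` to `subprocess.Popen`. Note that symlinked venvs depend on the cache staying in place, so cleaning the uv cache would likely break them.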
- Pre-build venvs via prefetch_gym_envs.py
Add a script to discover all server directories with dependencies listed, and create a venv for each. This script should accept the venv directory, a filter to exclude servers, and an option to delete and recreate stale venvs.
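A minimal sketch of what such a script might look like is shown below. Everything here is an assumption for illustration: the flag names (`--root`, `--venv-dir`, `--exclude`, `--force`, `--dry-run`), the rule that a server "lists dependencies" when it has a `requirements.txt` or `pyproject.toml`, and the POSIX `bin/python` venv layout. "Stale" is simplified to "already exists".

```python
import argparse
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional, Sequence

# Assumption: these files mark a directory as a server with dependencies.
DEPENDENCY_FILES = ("requirements.txt", "pyproject.toml")


def discover_servers(root: Path, exclude: Iterable[str] = ()) -> List[Path]:
    """Find directories under root that list dependencies, minus excluded names."""
    excluded = set(exclude)
    found = set()
    for name in DEPENDENCY_FILES:
        for dep_file in root.rglob(name):
            server_dir = dep_file.parent
            # Skip the repository root itself and anything inside an old .venv.
            if server_dir == root or ".venv" in server_dir.parts:
                continue
            if server_dir.name not in excluded:
                found.add(server_dir)
    return sorted(found)


def plan_commands(server_dir: Path, venv_root: Path) -> List[List[str]]:
    """Build the uv commands that create and populate one server's venv."""
    venv = venv_root / server_dir.name
    python = venv / "bin" / "python"  # POSIX layout assumed
    requirements = server_dir / "requirements.txt"
    source = ["-r", str(requirements)] if requirements.is_file() else [str(server_dir)]
    return [
        ["uv", "venv", str(venv)],
        ["uv", "pip", "install", "--python", str(python), *source],
    ]


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Pre-build per-server venvs.")
    parser.add_argument("--root", type=Path, default=Path("."))
    parser.add_argument("--venv-dir", type=Path, required=True)
    parser.add_argument("--exclude", nargs="*", default=[])
    parser.add_argument("--force", action="store_true", help="delete and recreate existing venvs")
    parser.add_argument("--dry-run", action="store_true", help="print commands without running them")
    args = parser.parse_args(argv)

    for server_dir in discover_servers(args.root, args.exclude):
        venv = args.venv_dir / server_dir.name
        if venv.exists():
            if not args.force:
                print(f"skip {server_dir.name}: venv exists")
                continue
            if not args.dry_run:
                shutil.rmtree(venv)
        for cmd in plan_commands(server_dir, args.venv_dir):
            print(" ".join(cmd))
            if not args.dry_run:
                subprocess.run(cmd, check=True)
    return 0
```

Calling `main(["--venv-dir", "/opt/gym_venvs", "--dry-run"])` from the Gym root would print the planned uv commands without creating anything, which may be a convenient check before wiring the script into a container build.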