Description
Describe the bug
This setup tutorial contains steps that are brittle or nearly impossible to reproduce on an arbitrary cluster.
- This Tip section can very easily fail depending on the cluster setup:
mkdir -p "$(dirname "$CONTAINER_IMAGE_PATH")"
enroot import -o "$CONTAINER_IMAGE_PATH" "docker://${CONTAINER_IMAGE_PATH}"
# Swap to local container path
CONTAINER_IMAGE_PATH=./$CONTAINER_IMAGE_PATH
In my case, enroot used $HOME/.cache as the cache directory for container layers. The container image is 28G, while the NFS share mounted as the user's home directory is only 10G on my cluster, so the import fails with an out-of-disk-space error.
Even with a different cache directory, the import can still fail at the extraction stage because of per-user limits on the number of threads on a given cluster. The solution is to run the import on a compute node with suitable storage mounted.
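A possible workaround for the cache problem, as a minimal sketch: enroot reads the ENROOT_CACHE_PATH and ENROOT_TEMP_PATH environment variables, so both can be pointed at a filesystem with enough free space before the import runs. SCRATCH_DIR and its default below are my assumptions for illustration, not something the tutorial defines.

```shell
# SCRATCH_DIR is a hypothetical location; set it to a filesystem with
# well over 28G free on your cluster (the default here is only a fallback).
SCRATCH_DIR="${SCRATCH_DIR:-${TMPDIR:-/tmp}/enroot-$(id -u)}"

# Redirect enroot's layer cache and temporary files away from $HOME/.cache.
export ENROOT_CACHE_PATH="$SCRATCH_DIR/cache"
export ENROOT_TEMP_PATH="$SCRATCH_DIR/tmp"
mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_TEMP_PATH"

# Show free space so an undersized filesystem is caught before the import starts.
df -h "$SCRATCH_DIR"

# With these variables exported, rerun the tutorial's enroot import command,
# preferably from a compute node to avoid login-node thread limits.
```

This only addresses the disk-space failure. The thread-limit failure during extraction still requires running the import on a compute node.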
- Using MOUNTS="$PWD:$PWD" is in opposition to the containerisation philosophy. The code should already be present inside the container. In addition, mounting home as in the setup tutorial led to a runtime error:
ERROR unit/environments/test_nemo_gym.py::test_nemo_gym_sanity - ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::NemoGym.__init__() (pid=3905017, ip=10.65.18.29, actor_id=9d4200400d583b7144594fd801000000, repr=<nemo_rl.environments.nemo_gym.NemoGym object at 0x155555151850>)
- This line: echo "hf_token: {your HF token}" >> env.yaml is dangerous! If you follow the instructions from the setup tutorial, this file is saved on the external filesystem, which is a security issue. Depending on the access permissions on the cluster's filesystem, this token can be visible to anyone.
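One way the tutorial could reduce the token exposure, as a rough sketch: create the file with owner-only permissions before anything is written to it, and take the token from an environment variable instead of typing it into the command. ENV_FILE and the placeholder token value are assumptions for illustration. Mode bits may not be sufficient on filesystems with permissive ACLs.

```shell
# ENV_FILE is a hypothetical variable; the tutorial writes to env.yaml.
ENV_FILE="${ENV_FILE:-env.yaml}"

# Placeholder value for illustration only; export the real HF_TOKEN beforehand.
HF_TOKEN="${HF_TOKEN:-hf_placeholder}"

# Create the file with owner-only permissions before writing the secret.
# The subshell keeps the umask change from affecting the rest of the session.
( umask 077; touch "$ENV_FILE" )

# Tighten permissions in case the file already existed with a wider mode.
chmod 600 "$ENV_FILE"

# Append the token; printf keeps the value out of any echo escape handling.
printf 'hf_token: %s\n' "$HF_TOKEN" >> "$ENV_FILE"
```

Keeping the file outside any directory that is mounted into shared jobs would limit exposure further.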
Steps/Code to reproduce bug
Follow this setup tutorial
Expected behavior
The tutorial should be general, work on a Slurm cluster regardless of the setup, and should not pose security issues.
Additional context
This was discovered during an effort to run an example script from the docs.
@bxyu-nvidia for vis