
Container-based Slurm cluster with support for running on multiple ssh-accessible computers. Currently it is based on podman, systemd, norouter and sshocker (sshfs).


Status: It seems to work, but a systemctl --user restart might be needed at startup time (see Troubleshooting). Running Nextflow on top of slurm-container-cluster also seems to work (see the Nextflow pipeline example, where slurm-container-cluster runs a two-node Slurm cluster on a laptop and a desktop).

slurm-container-cluster

Run a Slurm cluster in containers as a non-root user on multiple hosts, by making use of

  • podman for running containers. (Replacing podman with docker might also work but it is untested)
  • norouter for communication
  • sshocker for sharing a local folder to the remote computers (reverse sshfs)
  • slurm-docker-cluster. The slurm-container-cluster project reuses the basic architecture of the slurm-docker-cluster project but introduces multi-host functionality with the help of norouter and sshocker. Another difference is that slurm-container-cluster uses Systemd instead of Docker Compose.

Each of the Slurm software components slurmd, slurmdbd, slurmctld and mysql runs in a separate container. Multiple slurmd containers may be used. The slurmd containers act as "compute nodes" in Slurm, so it makes sense to have a number of them. If you have ssh access to remote computers, you may run the slurmd compute node containers there too. See also the section Boot Fedora CoreOS in live mode from a USB stick for how to boot up a computer in live mode and let it become a remote ssh-accessible computer.

Requirements local computer

  • podman version >= 2.1.0

(Installing podman might require root permissions, otherwise no root permissions are needed)

Requirements remote computers

Using remote computers is optional as everything can be run locally. If you want some remote computers to act as extra compute nodes they need to be accessible via ssh and need to have

  • podman version >= 2.1.0
  • sshfs

installed.

(Installing sshfs and podman might require root permissions, otherwise no root permissions are needed)

A tip: The Linux distribution Fedora CoreOS comes with both podman and sshfs pre-installed.
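Since the instructions below assume these versions, it can be worth checking them up front; for example (remoteuser@remotehost is a placeholder for your own remote login):

ssh remoteuser@remotehost podman --version
ssh remoteuser@remotehost sshfs --version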

Introduction

The project consists of the following systemd user services:

  • slurm-computenode@.service: template unit file for Slurm compute nodes running slurmd in the container localhost/slurm-with-norouter
  • slurm-create-datadir.service: creates some empty directories under ~/.config/slurm-container-cluster/ that will be used by the other services
  • slurm-install-norouter.service: installs the norouter executable to ~/.config/slurm-container-cluster/install-norouter/norouter
  • slurm-install-sshocker.service: installs the sshocker executable to ~/.config/slurm-container-cluster/install-sshocker/sshocker
  • slurm-mysql.service: runs mysqld in the container localhost/mysql-with-norouter
  • slurm-slurmctld.service: runs slurmctld in the container localhost/slurm-with-norouter
  • slurm-slurmdbd.service: runs slurmdbd in the container localhost/slurm-with-norouter
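After running the installation scripts described below, you can list these units with, for example:

systemctl --user list-unit-files 'slurm-*'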

Installation (no root permission required)

Prepare the installation files

  1. Clone this Git repo
$ git clone https://github.com/eriksjolund/slurm-container-cluster.git
  2. cd into the Git repo directory
$ cd slurm-container-cluster
  3. Build or pull the container images

Build the container images:

podman build -t slurm-container-cluster .
podman build -t mysql-with-norouter container/mysql-with-norouter/
podman image tag localhost/slurm-container-cluster localhost/slurm-with-norouter

or pull the container images:

podman pull docker.io/eriksjolund/slurm-container-cluster:podman-v2.1.1-slurm-slurm-20-11-2-1-norouter-v0.6.1
podman pull docker.io/eriksjolund/mysql-with-norouter:mysql-5.7-norouter-v0.6.1
podman image tag docker.io/eriksjolund/slurm-container-cluster:podman-v2.1.1-slurm-slurm-20-11-2-1-norouter-v0.6.1 localhost/slurm-container-cluster
podman image tag docker.io/eriksjolund/mysql-with-norouter:mysql-5.7-norouter-v0.6.1 localhost/mysql-with-norouter
podman image tag localhost/slurm-container-cluster localhost/slurm-with-norouter

(the identifiers localhost/slurm-with-norouter and localhost/mysql-with-norouter are used in the systemd service files)
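Whether you built or pulled the images, you can check that the expected local tags exist; for example:

podman image exists localhost/slurm-with-norouter && echo "localhost/slurm-with-norouter found"
podman image exists localhost/mysql-with-norouter && echo "localhost/mysql-with-norouter found"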

  4. Create an empty directory
mkdir ~/installation_files
installation_files_dir=~/installation_files

(The variable is just used to simplify the instructions in this README.md)

  5. Prepare the installation files by running
bash prepare-installation-files.sh $installation_files_dir

Add extra container images to the installation files. These container images can be run by podman in your sbatch scripts.

podman pull docker.io/library/alpine:3.12.1
bash add-extra-containerimage.sh $installation_files_dir docker.io/library/alpine:3.12.1

Adjust the Slurm configuration

Before running the scripts local-install.sh and remote-install.sh you might want to modify the configuration file $installation_files_dir/slurm/slurm.conf. (The default $installation_files_dir/slurm/slurm.conf defines the cluster as having the compute nodes c1 and c2)
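To get an overview of the compute nodes and partitions that the configuration defines, you could for example run:

grep -E 'NodeName|PartitionName' $installation_files_dir/slurm/slurm.conf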

Install on local computer

If you want to run any of the Slurm-related containers on the local computer, then

  1. In the Git repo directory, run
bash ./local-install.sh $installation_files_dir

The script local-install.sh should only modify files and directories under these locations:

  • ~/.config/slurm-container-cluster (e.g. mysql datadir, Slurm shared jobdir, log files, sshocker executable and norouter executable)
  • ~/.local/share/containers/ (the default directory where Podman stores its images and containers)
  • ~/.config/systemd/user (installing all the services slurm-*.service)
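To confirm what was installed, you can for example list the installed unit files and the created data directory:

ls ~/.config/systemd/user/slurm-*
ls ~/.config/slurm-container-cluster/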

Install on remote computers

  1. For each remote computer, run the following command on the local computer. SSH keys are expected to be set up so that ssh remoteuser@remotehost succeeds without having to type any password.
bash ./remote-install.sh $installation_files_dir remoteuser@remotehost
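If you have several remote computers, a small shell loop avoids repeating the command by hand (host1 and host2 are placeholders for your own hosts):

for remote in remoteuser@host1 remoteuser@host2; do
    bash ./remote-install.sh "$installation_files_dir" "$remote"
done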

Start mysqld, slurmdbd and slurmctld

On the computer where you would like mysqld, slurmdbd and slurmctld to run (most probably the local computer), run

systemctl --user enable --now slurm-mysql.service slurm-slurmdbd.service slurm-slurmctld.service

(Advanced tip: If your local computer is not running Linux, you might be able to use one of the remote computers instead and only use the local computer for running sshocker and norouter. This is currently untested.)
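To check that the three services came up, you can for example run:

systemctl --user is-active slurm-mysql.service slurm-slurmdbd.service slurm-slurmctld.service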

Start the compute node containers

The default $installation_files_dir/slurm/slurm.conf defines the cluster as having the compute nodes c1 and c2.

To start the compute node c1 on localhost, run

systemctl --user enable --now slurm-computenode@c1.service

To start the compute node c2, run

systemctl --user enable --now slurm-computenode@c2.service

Both compute nodes can run on the same computer or on different computers. Run the command on the computer where you would like that Slurm compute node to run.
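To see which containers are up on a given computer, you can for example run:

podman ps --format '{{.Names}} {{.Status}}'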

Configure and start norouter

In case you have

  • mysqld, slurmdbd, slurmctld and c1 running on localhost
  • and c2 running on a remote computer accessible with remoteuser@remotehost

you can use the provided ./norouter.yaml as it is:

  1. Copy it to your home directory
cp ./norouter.yaml ~

  2. Start norouter
~/.config/slurm-container-cluster/install-norouter/norouter ~/norouter.yaml

Otherwise you need to modify the file ~/norouter.yaml to match your setup.

Start sshocker to share ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared with remote computers

sshocker is used to make a local directory accessible on the remote computers.

Assume the remote computer has the IP address 192.0.2.10 and the user is remoteuser (using a hostname instead of an IP address is also possible). To make it easier to copy and paste from this documentation, set two shell variables

user=remoteuser
host=192.0.2.10

Share the local directory ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared

~/.config/slurm-container-cluster/install-sshocker/sshocker -v ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared:/home/$user/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared $user@$host

(The command does not return; leave it running.)

Now both the local ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared and the remote ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared should contain the same files.

If you have other remote computers, you need to run sshocker commands for them as well.
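For example, for a second remote computer (otheruser and otherhost are placeholders), run another sshocker instance, typically in its own terminal since the command keeps running:

~/.config/slurm-container-cluster/install-sshocker/sshocker -v ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared:/home/otheruser/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared otheruser@otherhost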

Register the cluster

Register the cluster in the Slurm accounting database

podman exec -it slurmctld bash -c "sacctmgr --immediate add cluster name=linux"

Show cluster status

podman exec -it slurmctld bash -c "sinfo"
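To double-check that the registration succeeded, you can for example run:

podman exec -it slurmctld bash -c "sacctmgr show cluster"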

Run compute jobs

Create a shell script in the directory ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared

vim ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/test.sh

with this content

#!/bin/sh

echo -n "hostname : "
hostname
sleep 10

and make it executable

chmod 755 ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/test.sh

Submit a compute job

podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"

Example session:

user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && ls -l test.sh"
-rwxr-xr-x 1 root root 53 Nov 14 13:42 test.sh
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && cat test.sh"
#!/bin/sh

echo -n "hostname : "
hostname
sleep 10

user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 24
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 25
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 26
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 27
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 28
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 29
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 30
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 31
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && squeue"
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                26    normal  test.sh     root PD       0:00      1 (Resources)
                27    normal  test.sh     root PD       0:00      1 (Priority)
                28    normal  test.sh     root PD       0:00      1 (Priority)
                29    normal  test.sh     root PD       0:00      1 (Priority)
                30    normal  test.sh     root PD       0:00      1 (Priority)
                31    normal  test.sh     root PD       0:00      1 (Priority)
                24    normal  test.sh     root  R       0:08      1 c1
                25    normal  test.sh     root  R       0:08      1 c2
user@laptop:~$ 

When the jobs have finished, run

user@laptop:~$ ls -l ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/slurm-*.out 
slurm-24.out
slurm-25.out
slurm-26.out
slurm-27.out
slurm-28.out
slurm-29.out
slurm-30.out
slurm-31.out
user@laptop:~$ cat ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/slurm-*.out
hostname : c1
hostname : c2
hostname : c1
hostname : c2
hostname : c1
hostname : c1
hostname : c2
hostname : c1
user@laptop:~$
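As a quicker sanity check than sbatch, you can also run a command interactively on the compute nodes with srun; for example:

podman exec -it slurmctld bash -c "srun --nodes=2 hostname"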

Here is an example of how to run a container with podman inside a compute job. (The container image docker.io/library/alpine:3.12.1 was previously added to the installation files with the script add-extra-containerimage.sh.)

user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && cat podman-example.sh"
#!/bin/sh
podman run --user 0 --cgroups disabled --runtime crun --volume /data:/data:rw --events-backend=file --rm docker.io/library/alpine:3.12.1 cat /etc/os-release

user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./podman-example.sh"
Submitted batch job 32

When the job has finished, run

user@laptop:~$ cat ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/slurm-32.out
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.12.1
PRETTY_NAME="Alpine Linux v3.12"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

Inspecting log files to debug errors

Interesting logs can be seen by running

podman logs c1
podman logs slurmdbd
podman logs slurmctld
podman logs mysql

(The container must still be running in order for the podman logs command to succeed).
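The systemd units log to the user journal as well, so you can for example inspect a single service with:

journalctl --user -u slurm-slurmctld.service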

Troubleshooting

Norouter warnings

At startup time there might be a few warnings for just a short while:

me@laptop:~$ ~/.config/slurm-container-cluster/install-norouter/norouter ~/norouter.yaml
laptop: INFO[0000] Ready: 127.0.29.100
laptop: INFO[0000] Ready: 127.0.29.3
laptop: INFO[0000] Ready: 127.0.30.1
laptop: INFO[0000] Ready: 127.0.29.2
laptop: INFO[0000] Ready: 127.0.30.2
laptop: WARN[0002] stderr[slurmctld-norouter(127.0.29.3)]: slurmctld: time="2020-12-05T09:48:29Z" level=error msg="failed to dial to \"127.0.0.1:7817\" (\"tcp\")" error="dial tcp 127.0.0.1:7817: connect: connection refused"
laptop: WARN[0002] stderr[slurmctld-norouter(127.0.29.3)]: slurmctld: time="2020-12-05T09:48:29Z" level=error msg="failed to dial to \"127.0.0.1:7817\" (\"tcp\")" error="dial tcp 127.0.0.1:7817: connect: connection refused"
laptop: WARN[0003] stderr[slurmctld-norouter(127.0.29.3)]: slurmctld: time="2020-12-05T09:48:30Z" level=error msg="failed to dial to \"127.0.0.1:7817\" (\"tcp\")" error="dial tcp 127.0.0.1:7817: connect: connection refused"

slurm-container-cluster seems to work though, so they can probably be ignored.

But the warning laptop: WARN[0004] error while handling L3 packet error="write |1: broken pipe" seems to be more severe.

laptop: WARN[0003] stderr[slurmdbd(127.0.29.2)]: d6ade94bd628: time="2020-12-05T08:50:33Z" level=error msg="failed to dial to \"127.0.0.1:7819\" (\"tcp\")" error="dial tcp 127.0.0.1:7819: connect: connection refused"
laptop: WARN[0003] stderr[slurmdbd(127.0.29.2)]: d6ade94bd628: time="2020-12-05T08:50:33Z" level=error msg="failed to dial to \"127.0.0.1:7819\" (\"tcp\")" error="dial tcp 127.0.0.1:7819: connect: connection refused"
laptop: WARN[0004] error while handling L3 packet                error="write |1: broken pipe"
laptop: WARN[0004] error while handling L3 packet                error="write |1: broken pipe"
laptop: WARN[0004] error while handling L3 packet                error="write |1: broken pipe"

For those warnings, it seems that a restart of all the slurm-* services is needed.

Restarting the services

If you experience problems, try this

  1. Stop norouter (by pressing Ctrl-c)

  2. Restart all services

systemctl --user restart slurm-mysql slurm-slurmdbd slurm-slurmctld slurm-create-datadir
systemctl --user restart slurm-computenode@c1.service
systemctl --user restart slurm-computenode@c2.service

(Note: the restart command should be run on the computer where the service was enabled.)

  3. Run podman logs

For the different containers

  • mysql
  • slurmdbd
  • slurmctld
  • c1
  • c2

run podman logs containername, for instance

$ podman logs c1
---> Starting the MUNGE Authentication service (munged) ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
-- Waiting for norouter to start. Sleeping 2 seconds ...

Except for mysql, the containers should all be waiting for norouter to start.

  4. Start norouter again

Using Fedora CoreOS to run compute node containers

The Linux distribution Fedora CoreOS comes with both podman and sshfs pre-installed. If you have some extra computers that are not in use, you could boot them up with a Fedora CoreOS USB stick to get extra Slurm compute nodes.

Boot Fedora CoreOS in live mode from a USB stick

Create a customized Fedora CoreOS iso containing your public SSH key

Assuming that

  • your public ssh key is located in the file ~/.ssh/id_rsa.pub
  • the command podman is installed
  • the architecture for the iso is x86_64
  • your preferred choice of username is myuser

then run this command

bash create-fcos-iso-with-ssh-key.sh podman x86_64 stable ~/.ssh/id_rsa.pub myuser

to create the customized iso file. The path of the iso file is written to stdout. The bash script and more documentation are available at

https://github.com/eriksjolund/create-fcos-iso-with-ssh-key

If you would like to have sudo permissions you need to choose the username core.
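Once you have the customized iso file, you can write it to a USB stick with, for example, dd (/path/to/customized.iso and /dev/sdX are placeholders; double-check the device name with lsblk first, since the device will be overwritten):

sudo dd if=/path/to/customized.iso of=/dev/sdX bs=4M status=progress conv=fsync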
