Yascheduler is a simple job scheduler designed for submitting scientific calculations and copying back the results from the computing clouds.
Currently it supports several scientific simulation codes in chemistry
and solid state physics.
Any other scientific simulation code can be supported via the declarative
control template system (see the yascheduler.conf settings file).
An example dummy C++ code with its configuration template is provided.
Use pip and PyPI:

```shell
pip install yascheduler
```

By default, no cloud connectors are installed. To install the appropriate connector, use one of the following commands:

- for Microsoft Azure: `pip install yascheduler[azure]`
- for Hetzner Cloud: `pip install yascheduler[hetzner]`
- for UpCloud: `pip install yascheduler[upcloud]`

The latest updates and bugfixes can be obtained by cloning the repository:

```shell
git clone https://github.com/tilde-lab/yascheduler.git
pip install yascheduler/
```

The installation procedure creates the configuration file located at `/etc/yascheduler/yascheduler.conf`.
The file contains credentials for PostgreSQL database access, the directories used,
the cloud providers, and the scientific simulation codes (called engines).
Please check and amend this file with the correct credentials. The database
and the system service should then be initialized with the `yainit` script.
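For reference, a minimal configuration skeleton might look as follows. This is a sketch only: the `[db]`, `[local]`, and `[remote]` group names mirror the default file created by the installation (an assumption here), and all values are placeholders.

```ini
; /etc/yascheduler/yascheduler.conf (sketch)
[db]
user = yascheduler
password = CHANGE_ME
host = localhost
port = 5432
database = yascheduler

[local]
data_dir = /srv/yadata

[remote]
data_dir = ./data
user = root

[cloud]
; per-provider settings, see below

[engine.dummyengine]
; engine definition, see below
```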
```python
from yascheduler import Yascheduler

yac = Yascheduler()
label = "test assignment"
engine = "pcrystal"
struct_input = str(...)  # simulation control file: crystal structure
setup_input = str(...)  # simulation control file: main setup, can include struct_input
result = yac.queue_submit_task(
    label, {"fort.34": struct_input, "INPUT": setup_input}, engine
)
print(result)
```

Alternatively, run yascheduler directly in the console (use the `-l DEBUG` key to change the log level).
A supervisor config reads e.g.:

```ini
[program:scheduler]
command=/usr/local/bin/yascheduler
user=root
autostart=true
autorestart=true
stderr_logfile=/data/yascheduler.log
stdout_logfile=/data/yascheduler.log
```
File paths can be set using the following environment variables:

- `YASCHEDULER_CONF_PATH`: Configuration file. Default: `/etc/yascheduler/yascheduler.conf`
- `YASCHEDULER_LOG_PATH`: Log file path. Default: `/var/log/yascheduler.log`
- `YASCHEDULER_PID_PATH`: PID file. Default: `/var/run/yascheduler.pid`
The connection to a PostgreSQL database is configured with the following settings (see the sketch after this list):

- `user`: The username to connect to the PostgreSQL server with.
- `password`: The user password to connect to the server with. This parameter is optional.
- `host`: The hostname of the PostgreSQL server to connect to.
- `port`: The TCP/IP port of the PostgreSQL server instance. Default: `5432`
- `database`: The name of the database instance to connect to. Default: same as `user`
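A sketch of these settings, assuming the `[db]` group name of the default config file; all values are placeholders:

```ini
[db]
user = yascheduler
password = CHANGE_ME
host = db.example.com
port = 5432
; if omitted, defaults to the user name
database = yascheduler
```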
The local settings describe paths and limits on the scheduler host (a sample block follows the list):

- `data_dir`: Path to the root directory of local data files. Can be relative to the current working directory. Default: `./data` (but it is always a good idea to set it explicitly!). Example: `/srv/yadata`
- `tasks_dir`: Path to the directory with task results. Default: `tasks` under `data_dir`. Example: `%(data_dir)s/tasks`
- `keys_dir`: Path to the directory with SSH keys. Make sure it contains only the private keys. Default: `keys` under `data_dir`. Example: `%(data_dir)s/keys`
- `engines_dir`: Path to the directory with the engines repository. Default: `engines` under `data_dir`. Example: `%(data_dir)s/engines`
- `webhook_reqs_limit`: Maximum number of in-flight webhook HTTP requests. Default: `5`
- `conn_machine_limit`: Maximum number of concurrent SSH connect requests. Default: `10`
- `conn_machine_pending`: Maximum number of pending SSH connect requests. Default: `10`
- `allocate_limit`: Maximum number of concurrent task or node allocation requests. Default: `20`
- `allocate_pending`: Maximum number of pending task or node allocation requests. Default: `1`
- `consume_limit`: Maximum number of concurrent task result downloads. Default: `20`
- `consume_pending`: Maximum number of pending task result downloads. Default: `1`
- `deallocate_limit`: Maximum number of concurrent node deallocation requests. Default: `5`
- `deallocate_pending`: Maximum number of pending node deallocation requests. Default: `1`
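A sketch, assuming the `[local]` group name; the values are illustrative:

```ini
[local]
data_dir = /srv/yadata
tasks_dir = %(data_dir)s/tasks
keys_dir = %(data_dir)s/keys
engines_dir = %(data_dir)s/engines
webhook_reqs_limit = 5
conn_machine_limit = 10
allocate_limit = 20
consume_limit = 20
deallocate_limit = 5
```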
The remote settings describe paths and defaults on the computing nodes (a sample block follows the list):

- `data_dir`: Path to the root directory of data files on the remote node. Can be relative to the remote current working directory (usually `$HOME`). Default: `./data`. Example: `/srv/yadata`
- `tasks_dir`: Path to the directory with task results on the remote node. Default: `tasks` under `data_dir`. Example: `%(data_dir)s/tasks`
- `engines_dir`: Path to the directory with engines on the remote node. Default: `engines` under `data_dir`. Example: `%(data_dir)s/engines`
- `user`: Default SSH username. Default: `root`
- `jump_user`: Username of the default SSH jump host (if used).
- `jump_host`: Hostname of the default SSH jump host (if used).
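A sketch, assuming the `[remote]` group name; the values are illustrative:

```ini
[remote]
data_dir = ./data
tasks_dir = %(data_dir)s/tasks
engines_dir = %(data_dir)s/engines
user = yascheduler
; optional SSH jump host
jump_user = jumper
jump_host = bastion.example.com
```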
All cloud provider settings are set in the [cloud] group.
Each provider has its own settings prefix.
The following settings are common to all providers (the sketch after this list illustrates the prefix naming):

- `*_max_nodes`: The maximum number of nodes for the given provider. The provider is not used if the value is less than 1.
- `*_user`: Per-provider override of `remote.user`.
- `*_priority`: Per-provider priority of node allocation. Providers are sorted in descending order, so the cloud with the highest value is used first.
- `*_idle_tolerance`: Per-provider idle tolerance (in seconds) before deallocation of nodes. Default: differs by provider, starting from 120 seconds.
- `*_jump_user`: Username of this cloud's SSH jump host (if used).
- `*_jump_host`: Hostname of this cloud's SSH jump host (if used).
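For instance, with the `hetzner` prefix introduced below, the common settings take the following form (values are illustrative):

```ini
[cloud]
hetzner_max_nodes = 3
hetzner_user = debian
hetzner_priority = 100
hetzner_idle_tolerance = 300
```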
The settings prefix is `hetzner` (a sample block follows the list):

- `hetzner_token`: API token with Read & Write permissions for the project.
- `hetzner_server_type`: Server type (size). Default: `cx52`
- `hetzner_location`: Location name.
- `hetzner_image_name`: Image name for new nodes. Default: `debian-11`
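A sketch of a Hetzner block in the `[cloud]` group; the token is a placeholder and the location name is only an example:

```ini
[cloud]
hetzner_max_nodes = 3
hetzner_token = YOUR_API_TOKEN
hetzner_server_type = cx52
hetzner_location = fsn1
hetzner_image_name = debian-11
```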
Azure Cloud should be pre-configured for yascheduler (see Cloud Providers).
The settings prefix is `az` (a sample block follows the list):

- `az_tenant_id`: Tenant ID of Azure Active Directory.
- `az_client_id`: Application ID.
- `az_client_secret`: Client Secret value from the Application Registration.
- `az_subscription_id`: Subscription ID.
- `az_resource_group`: Resource Group name. Default: `yascheduler-rg`
- `az_user`: SSH username (`root` is not supported).
- `az_location`: Default location for resources. Default: `westeurope`
- `az_vnet`: Virtual network name. Default: `yascheduler-vnet`
- `az_subnet`: Subnet name. Default: `yascheduler-subnet`
- `az_nsg`: Network security group name. Default: `yascheduler-nsg`
- `az_vm_image`: OS image name. Default: `Debian`
- `az_vm_size`: Machine size. Default: `Standard_B1s`
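A sketch of the minimum Azure credentials block; all IDs and secrets are placeholders:

```ini
[cloud]
az_max_nodes = 2
az_tenant_id = 00000000-0000-0000-0000-000000000000
az_client_id = 00000000-0000-0000-0000-000000000000
az_client_secret = CHANGE_ME
az_subscription_id = 00000000-0000-0000-0000-000000000000
az_user = yascheduler
```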
The settings prefix is `upcloud` (a sample block follows the list):

- `upcloud_login`: Username.
- `upcloud_password`: Password.
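A sketch; the credentials are placeholders:

```ini
[cloud]
upcloud_max_nodes = 1
upcloud_login = YOUR_USERNAME
upcloud_password = YOUR_PASSWORD
```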
Supported engines should be defined in the section(s) `[engine.name]`.
The name is an alphanumeric string representing the real engine name.
Once set, it cannot be changed later. A complete sample definition follows the list.

- `platforms`: List of supported platforms, separated by space or newline. Default: `debian-10`. Example: `mY-cOoL-OS another-cool-os`
- `platform_packages`: A list of required packages, separated by space or newline, which will be installed by the system package manager. Default: `[]`. Example: `openmpi-bin wget`
- `deploy_local_files`: A list of filenames, separated by space or newline, which will be copied from the local `%(engines_dir)s/%(engine_name)s` to the remote `%(engines_dir)s/%(engine_name)s`. Conflicts with `deploy_local_archive` and `deploy_remote_archive`. Example: `dummyengine`
- `deploy_local_archive`: The name of a local archive (`.tar.gz`) which will be copied from the local `%(engines_dir)s/%(engine_name)s` to the remote machine and then unpacked into `%(engines_dir)s/%(engine_name)s`. Conflicts with `deploy_local_files` and `deploy_remote_archive`. Example: `dummyengine.tar.gz`
- `deploy_remote_archive`: The URL of an engine archive (`.tar.gz`) which will be downloaded to the remote machine and then unpacked into `%(engines_dir)s/%(engine_name)s`. Conflicts with `deploy_local_files` and `deploy_local_archive`. Example: `https://example.org/dummyengine.tar.gz`
- `spawn`: The command used by the scheduler to initiate calculations. Examples: `cp {task_path}/INPUT OUTPUT && mpirun -np {ncpus} --allow-run-as-root -wd {task_path} {engine_path}/Pcrystal >> OUTPUT 2>&1` or `{engine_path}/gulp < INPUT > OUTPUT`
- `check_pname`: Process name used to check that the task is still running. Conflicts with `check_cmd`. Example: `dummyengine`
- `check_cmd`: Command used to check that the task is still running. Conflicts with `check_pname`; see also `check_cmd_code`. Example: `ps ax -ocomm= | grep -q dummyengine`
- `check_cmd_code`: Expected exit code of the command from `check_cmd`. If the code matches, the task is considered still running. Default: `0`
- `sleep_interval`: Interval in seconds between task checks. Set it to a higher value if you expect long-running jobs. Default: `10`
- `input_files`: A list of task input file names, separated by space or newline, which will be copied to the remote task directory before the task starts. The first input is considered the main one. Example: `INPUT sibling.file`
- `output_files`: A list of task output file names, separated by space or newline, which will be copied back from the remote task directory after the task finishes. Example: `INPUT OUTPUT`
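Putting it together, a sketch of a definition for the example dummy engine mentioned above; the names and values are illustrative, not the exact template shipped with the code:

```ini
[engine.dummyengine]
platforms = debian-11
deploy_local_files = dummyengine
spawn = {engine_path}/dummyengine {task_path}/INPUT > OUTPUT 2>&1
check_pname = dummyengine
sleep_interval = 10
input_files = INPUT
output_files = INPUT OUTPUT
```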
See the detailed instructions for the MPDS-AiiDA-CRYSTAL workflows, as well as the ansible-mpds repository. In essence:

```shell
ssh aiidauser@localhost # important
reentry scan
verdi computer setup
verdi computer test $COMPUTER
verdi code setup
```