Parallel Execution
Example - Batch script
#!/bin/bash
#SBATCH --job-name parallel ## name that will show up in the queue
#SBATCH --output slurm-%j.out ## filename of the output; the %j is equal to jobID; default is slurm-[jobID].out
#SBATCH --ntasks=3 ## number of tasks (analyses) to run
#SBATCH --cpus-per-task=2 ## the number of threads allocated to each task
#SBATCH --mem-per-cpu=1G # memory per CPU core
#SBATCH --partition=normal ## the partitions to run in (comma separated)
#SBATCH --time=0-00:10:00 ## time for analysis (day-hour:min:sec)
# Execute job steps
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 10; echo 'hello 1'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 20; echo 'hello 2'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 30; echo 'hello 3'" &
wait
The above script starts three job steps. The first step runs the Linux sleep command for 10 seconds and then prints hello 1. The second sleeps for 20 seconds and then prints hello 2. The third sleeps for 30 seconds and then prints hello 3. The script is expected to print:
hello 1
hello 2
hello 3
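To run it yourself, save the script (parallel.sh is just an illustrative name) and submit it with sbatch:
sbatch parallel.sh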
To check the statistics of the job, run the sacct command.
sacct -j 7217 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
7217 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 6 billing=6,cpu=6,mem=6G,node=1
7217.batch 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 6 cpu=6,mem=6G,node=1
7217.extern 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 6 billing=6,cpu=6,mem=6G,node=1
7217.0 2022-09-27T23:07:40 2022-09-27T23:07:51 00:00:11 2 cpu=2,mem=2G,node=1
7217.1 2022-09-27T23:07:40 2022-09-27T23:08:01 00:00:21 2 cpu=2,mem=2G,node=1
7217.2 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 2 cpu=2,mem=2G,node=1
Explanation
In the above example, there are three job steps, and the statistics show that all of them (7217.0, 7217.1, 7217.2) started executing at the same time, 23:07:40, but finished at different times. This means the job steps were executed simultaneously.
The ampersand (&) at the end of each srun command runs that job step in the background, removing srun's normal blocking behavior so the script can immediately launch the next step. The wait command is vital when running steps in the background this way: it keeps the batch script alive until all background steps have finished. Without it, the script would reach its end as soon as the last srun was launched, and Slurm would terminate the job, killing any steps still running.
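The same background-and-wait behavior can be seen with plain shell commands; the following toy snippet (no Slurm involved) is a minimal illustration:
sleep 5 &       # runs in the background; the shell moves on immediately
sleep 3 &       # second background job starts right away
wait            # block here until both background jobs have finished
echo "all done" # printed only after the slowest job completes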
Note that the total number of tasks in the above job script is 3 and that each job step runs a single task (srun --ntasks=1). The script requested 2 CPUs for each task (#SBATCH --cpus-per-task=2).
--cpus-per-task must be set again at the srun level for each job step to get the correct value; srun does not inherit it from the #SBATCH directives. The environment variable SLURM_CPUS_PER_TASK holds the number of CPUs allocated to the batch step, which is why it is passed to each srun above.
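As an alternative to repeating the flag on every srun line, newer Slurm releases read the srun input environment variable SRUN_CPUS_PER_TASK; this sketch assumes your site's Slurm version supports it (check the srun man page):
# Set once near the top of the batch script instead of per-step flags
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --ntasks=1 --nodes=1 bash -c "sleep 10; echo 'hello 1'" &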
Note that --cpus-per-task is set to 2. If you change it to an odd number, the three job steps in the above script won't all run in parallel: two steps run in parallel, and then you will see a message that looks like:
srun: Job 18701 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for job 18701
If you check the statistics of the job, you will find that two steps ran in parallel, and once those steps finished, the third one started. This happens because multithreading is enabled by default and individual srun commands don't share CPU cores. With hyper-threading there are 2 threads per core, so when you request 2 CPUs, Slurm gives you the 2 threads of a single core. If you instead request 3 CPUs, Slurm allocates 2 cores, and the remaining thread on the second core can't be used by another step.
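You can check whether hyper-threading is enabled on a node, and therefore how many hardware threads each core provides, with scontrol; <nodename> below is a placeholder for a real node name on your cluster:
scontrol show node <nodename> | grep -o "ThreadsPerCore=[0-9]*"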
To overcome this issue, either make --cpus-per-task even or disable multithreading. The following script shows how to disable multithreading when --cpus-per-task is odd.
#!/bin/bash
#SBATCH --job-name parallel ## name that will show up in the queue
#SBATCH --output slurm-%j.out ## filename of the output; the %j is equal to jobID; default is slurm-[jobID].out
#SBATCH --ntasks=3 ## number of tasks (analyses) to run
#SBATCH --cpus-per-task=1 ## the number of threads allocated to each task
#SBATCH --mem-per-cpu=1G # memory per CPU core
#SBATCH --partition=normal ## the partitions to run in (comma separated)
#SBATCH --time=0-00:10:00 ## time for analysis (day-hour:min:sec)
#SBATCH --hint=nomultithread
# Execute job steps
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 10; echo 'hello 1'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 20; echo 'hello 2'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 30; echo 'hello 3'" &
wait
After the submitted job finishes, check its statistics:
sacct -j 18704 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
18704 2022-10-14T01:50:11 2022-10-14T01:50:42 00:00:31 3 billing=3,cpu=3,mem=3G,node=1
18704.batch 2022-10-14T01:50:11 2022-10-14T01:50:42 00:00:31 3 cpu=3,mem=3G,node=1
18704.extern 2022-10-14T01:50:11 2022-10-14T01:50:42 00:00:31 3 billing=3,cpu=3,mem=3G,node=1
18704.0 2022-10-14T01:50:12 2022-10-14T01:50:42 00:00:30 2 cpu=2,mem=1G,node=1
18704.1 2022-10-14T01:50:12 2022-10-14T01:50:42 00:00:30 2 cpu=2,mem=1G,node=1
18704.2 2022-10-14T01:50:12 2022-10-14T01:50:42 00:00:30 2 cpu=2,mem=1G,node=1
The above statistics show that the three steps started at the same time (in parallel).
Summary
srun in a submission script is used to create job steps and to launch the processes. If you have a parallel MPI program, srun takes care of creating all the MPI processes. Prefixing a job step with srun causes that step to be executed on the compute nodes. The --ntasks flag in the srun command is similar to the --ntasks in the #SBATCH directives.
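For an MPI program, a single srun launches all the ranks at once; a minimal sketch, assuming a hypothetical executable my_mpi_app built with the cluster's MPI stack:
# In a batch script that requested, e.g., #SBATCH --ntasks=4
srun ./my_mpi_app   # srun starts 4 MPI ranks, one per task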
Passing Multiple Arguments
The example below shows how to pass different arguments to your program and execute multiple instances of it in parallel.
Example
Create a Python script named pyScript.py with the following content:
#!/usr/bin/env python3
import sys
import platform
from datetime import datetime
from time import sleep
# sleep for 20 seconds
sleep(20)
current_time = datetime.now()
dt_format = current_time.strftime("%H:%M:%S")
print('Hello From "{}" on host "{}" at {}.'.format(sys.argv[1],platform.node(), dt_format))
The above Python program gets the current time, the hostname, and one command-line argument, then prints Hello From "argument" on host "hostname" at the current time.
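You can test the script before submitting it (load a Python module first if python3 is not on your PATH); after the 20-second sleep it prints the greeting:
python3 pyScript.py "test"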
To run three instances of the Python program, create a job script containing three job steps (srun commands), each with its required argument, as shown in the following script:
#!/bin/bash
#SBATCH --job-name pytest
#SBATCH --output pytest.out
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=500M
#SBATCH --partition=interactive
#SBATCH --time=0-00:8:00
# load modules
module load spack/2022a gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
# job steps
srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "1" &
srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "2" &
srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "3" &
wait
In the resource request section of the batch script, 3 tasks with 2 CPUs per task and 500 MB of RAM per CPU were requested and allocated for 8 minutes.
In the job steps section, the job steps were compartmentalized by specifying how each step should be treated by Slurm (number of processes per step). srun won't inherit the --cpus-per-task value requested by sbatch or salloc; it must be requested again in the call to srun, which is done here by passing the SLURM_CPUS_PER_TASK environment variable explicitly. The python command was called against the Python script with different arguments ("1", "2", and "3").
After the job is completed, you can check the output file.
cat pytest.out
Hello From "3" on host "discovery-c34.cluster.local" at 22:34:14.
Hello From "1" on host "discovery-c34.cluster.local" at 22:34:14.
Hello From "2" on host "discovery-c34.cluster.local" at 22:34:14.
The output file shows that the three steps started at the same time (in parallel).
To check the job statistics, run the sacct command.
sacct -j 7588 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
7588 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 6 billing=6,cpu=6,mem=3000M,nod+
7588.batch 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 6 cpu=6,mem=3000M,node=1
7588.extern 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 6 billing=6,cpu=6,mem=3000M,nod+
7588.0 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 2 cpu=2,mem=1000M,node=1
7588.1 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 2 cpu=2,mem=1000M,node=1
7588.2 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 2 cpu=2,mem=1000M,node=1
Explanation
Taking a closer look at the start and end times of each job step, one can infer that all tasks ran independently, in parallel: they started at the same time. Notice also that the order in which the job steps were specified (1, 2, 3) differs from the order of the output (3, 1, 2); background steps write their output as they finish, so the output order is not guaranteed.
Advanced Srun Parallelism
Sometimes you may need to run the same program n times, which means the job script will contain n job steps. For example, to run the previous Python example 20 times with different arguments, you would need 20 job steps (20 srun commands), the same srun repeated with a different argument each time. Instead of typing 20 srun statements, you can use a for loop, as shown below.
#!/bin/bash
#SBATCH --job-name pytest
#SBATCH --output pytest.out
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=500M
#SBATCH --partition=interactive
#SBATCH --time=0-00:8:00
# load modules
module load spack/2022a gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
for i in {1..20}; do
    while [ "$(jobs -p | wc -l)" -ge "$SLURM_NTASKS" ]; do
        sleep 30
    done
    srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "$i" &
done
wait
The above bash script uses a for loop to run the Python program 20 times with different arguments. The while loop checks whether enough resources are free to launch another step: jobs -p lists the background steps still running, and while their count is at least $SLURM_NTASKS the script busy-waits, sleeping 30 seconds between checks. The first three job steps start at the same time, because the requested resources are only enough to run three steps in parallel; each subsequent step waits for a running step to finish and release its resources. In short, at most three steps run in parallel at any time.
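If your shell is bash 4.3 or newer, wait -n offers a tidier throttle than polling with sleep: it blocks until any one background job finishes. A sketch of the same loop under that assumption:
for i in {1..20}; do
    # When all task slots are busy, block until any one step finishes
    if [ "$(jobs -p | wc -l)" -ge "$SLURM_NTASKS" ]; then
        wait -n
    fi
    srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "$i" &
done
wait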
Submit the job and check its statistics once it’s finished.
sacct -j 7577 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
7577 2022-09-29T21:33:34 2022-09-29T21:36:55 00:03:21 6 billing=6,cpu=6,mem=3000M,nod+
7577.batch 2022-09-29T21:33:34 2022-09-29T21:36:55 00:03:21 6 cpu=6,mem=3000M,node=1
7577.extern 2022-09-29T21:33:34 2022-09-29T21:36:55 00:03:21 6 billing=6,cpu=6,mem=3000M,nod+
7577.0 2022-09-29T21:33:35 2022-09-29T21:33:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.1 2022-09-29T21:33:35 2022-09-29T21:33:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.2 2022-09-29T21:33:35 2022-09-29T21:33:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.3 2022-09-29T21:34:05 2022-09-29T21:34:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.4 2022-09-29T21:34:05 2022-09-29T21:34:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.5 2022-09-29T21:34:05 2022-09-29T21:34:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.6 2022-09-29T21:34:35 2022-09-29T21:34:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.7 2022-09-29T21:34:35 2022-09-29T21:34:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.8 2022-09-29T21:34:35 2022-09-29T21:34:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.9 2022-09-29T21:35:05 2022-09-29T21:35:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.10 2022-09-29T21:35:05 2022-09-29T21:35:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.11 2022-09-29T21:35:05 2022-09-29T21:35:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.12 2022-09-29T21:35:35 2022-09-29T21:35:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.13 2022-09-29T21:35:35 2022-09-29T21:35:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.14 2022-09-29T21:35:35 2022-09-29T21:35:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.15 2022-09-29T21:36:05 2022-09-29T21:36:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.16 2022-09-29T21:36:05 2022-09-29T21:36:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.17 2022-09-29T21:36:05 2022-09-29T21:36:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.18 2022-09-29T21:36:35 2022-09-29T21:36:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.19 2022-09-29T21:36:35 2022-09-29T21:36:55 00:00:20 2 cpu=2,mem=1000M,node=1
The above output shows that the first three steps (7577.0, 7577.1, 7577.2) started running at the same time, 21:33:35. The next three steps started in parallel after the previous three finished execution and the resources became available.
Using --multi-prog
--multi-prog runs a job with different programs and different arguments for each task. In this case, the executable program specified is actually a configuration file specifying the executable and arguments for each task.
The example in the Passing Multiple Arguments section above can be done differently by passing the --multi-prog flag to the srun command and specifying an external file that lists the command to execute for each task. This approach can also be used to run different executables at the same time. Consider the example below.
#!/bin/bash
#SBATCH --job-name pytest
#SBATCH --output pytest.out
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=500M
#SBATCH --partition=interactive
#SBATCH --time=0-00:8:00
srun --ntasks=3 -l --multi-prog ./file.conf
The file.conf configuration file contains:
0 python pyScript.py 1
1 python pyScript.py 2
2 python pyScript.py 3
This file lists the steps (tasks) to be run. Each line begins with a task number, starting from zero, followed by the executable and its arguments. Note that the executables could vary from task to task.
Output
1: Hello From "2" on host "discovery-c35.cluster.local" at 22:37:11.
0: Hello From "1" on host "discovery-c34.cluster.local" at 22:37:11.
2: Hello From "3" on host "discovery-g14.cluster.local" at 22:37:11.
Explanation
In the resource request section of the batch script, 3 tasks with 2 CPUs per task and 500 MB of RAM per CPU were requested and allocated for 8 minutes.
The srun command informs Slurm to run the multiple programs specified in the external file and to treat each line of the file as an individual task. Make sure that the total number of tasks specified with the --ntasks=3 flag equals the total number of steps in file.conf; it should also equal the total number of tasks specified in the resource request section (--ntasks=3).
IMPORTANT: Make sure that the number of tasks (--ntasks) is greater than or equal to the total number of steps in the configuration file.
The -l flag passed to the srun command prepends the task number to each line of the output, as shown above.
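For reference, the srun man page also documents richer configuration-file syntax: comma-separated task ranks and ranges, a '*' wildcard matching all remaining tasks, and a %t placeholder that expands to the task number. A hypothetical mixed.conf running different executables per task:
0    hostname
1-2  python pyScript.py %t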