Parallel Execution
Example - Batch script
#!/bin/bash
#SBATCH --job-name parallel ## name that will show up in the queue
#SBATCH --output slurm-%j.out ## filename of the output; the %j is equal to jobID; default is slurm-[jobID].out
#SBATCH --ntasks=3 ## number of tasks (analyses) to run
#SBATCH --cpus-per-task=2 ## the number of threads allocated to each task
#SBATCH --mem-per-cpu=1G # memory per CPU core
#SBATCH --partition=normal ## the partitions to run in (comma separated)
#SBATCH --time=0-00:10:00 ## time for analysis (day-hour:min:sec)
# Execute job steps
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 10; echo 'hello 1'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 20; echo 'hello 2'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 30; echo 'hello 3'" &
wait
The above script starts three job steps. The first step runs the Linux sleep command for 10 seconds and then prints hello 1. The second sleeps for 20 seconds and then prints hello 2. The third sleeps for 30 seconds and then prints hello 3. The script is expected to print:
hello 1
hello 2
hello 3
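To run it yourself, save the script (parallel.sh is just an illustrative name) and submit it with sbatch:
sbatch parallel.sh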
To check the statistics of the job, run the sacct command.
sacct -j 7217 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
7217 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 6 billing=6,cpu=6,mem=6G,node=1
7217.batch 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 6 cpu=6,mem=6G,node=1
7217.extern 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 6 billing=6,cpu=6,mem=6G,node=1
7217.0 2022-09-27T23:07:40 2022-09-27T23:07:51 00:00:11 2 cpu=2,mem=2G,node=1
7217.1 2022-09-27T23:07:40 2022-09-27T23:08:01 00:00:21 2 cpu=2,mem=2G,node=1
7217.2 2022-09-27T23:07:40 2022-09-27T23:08:11 00:00:31 2 cpu=2,mem=2G,node=1
Explanation
In the above example, there are three job steps, and the statistics show that all of them (7217.0, 7217.1, 7217.2) started executing at the same time, 23:07:40, but finished at different times. This means the job steps were executed simultaneously.
The ampersand (&) at the end of each srun command runs that job step in the background, removing srun's normal blocking behavior so the script can immediately launch the next step. The wait command is vital when running steps in the background this way: it keeps the batch script alive until all background steps have finished. Without it, the script would reach its end as soon as the last srun was launched, and Slurm would terminate the job, killing any steps still running.
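The same background-and-wait behavior can be seen with plain shell commands; the following toy snippet (no Slurm involved) is a minimal illustration:
sleep 5 &       # runs in the background; the shell moves on immediately
sleep 3 &       # second background job starts right away
wait            # block here until both background jobs have finished
echo "all done" # printed only after the slowest job completes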
Note that the total number of tasks in the above job script is 3 and that each job step runs a single task (srun --ntasks=1). The script requested 2 CPUs for each task (#SBATCH --cpus-per-task=2).
--cpus-per-task must be set again at the srun level for each job step to get the correct value; srun does not inherit it from the #SBATCH directives. The environment variable SLURM_CPUS_PER_TASK holds the number of CPUs allocated to the batch step, which is why it is passed to each srun above.
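As an alternative to repeating the flag on every srun line, newer Slurm releases read the srun input environment variable SRUN_CPUS_PER_TASK; this sketch assumes your site's Slurm version supports it (check the srun man page):
# Set once near the top of the batch script instead of per-step flags
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --ntasks=1 --nodes=1 bash -c "sleep 10; echo 'hello 1'" &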
Note that --cpus-per-task is set to 2. If you change it to an odd number, the three job steps in the above script won't all run in parallel: two steps run in parallel, and then you will see a message that looks like:
srun: Job 18701 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for job 18701
If you check the statistics of the job, you will find that two steps ran in parallel, and once those steps finished, the third one started. This happens because multithreading is enabled by default and individual srun commands don't share CPU cores. With hyper-threading there are 2 threads per core, so when you request 2 CPUs, Slurm gives you the 2 threads of a single core. If you instead request 3 CPUs, Slurm allocates 2 cores, and the remaining thread on the second core can't be used by another step.
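You can check whether hyper-threading is enabled on a node, and therefore how many hardware threads each core provides, with scontrol; <nodename> below is a placeholder for a real node name on your cluster:
scontrol show node <nodename> | grep -o "ThreadsPerCore=[0-9]*"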
To overcome this issue, either make --cpus-per-task even or disable multithreading. The following script shows how to disable multithreading when --cpus-per-task is odd.
#!/bin/bash
#SBATCH --job-name parallel ## name that will show up in the queue
#SBATCH --output slurm-%j.out ## filename of the output; the %j is equal to jobID; default is slurm-[jobID].out
#SBATCH --ntasks=3 ## number of tasks (analyses) to run
#SBATCH --cpus-per-task=1 ## the number of threads allocated to each task
#SBATCH --mem-per-cpu=1G # memory per CPU core
#SBATCH --partition=normal ## the partitions to run in (comma separated)
#SBATCH --time=0-00:10:00 ## time for analysis (day-hour:min:sec)
#SBATCH --hint=nomultithread
# Execute job steps
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 10; echo 'hello 1'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 20; echo 'hello 2'" &
srun --ntasks=1 --nodes=1 --cpus-per-task=$SLURM_CPUS_PER_TASK bash -c "sleep 30; echo 'hello 3'" &
wait
After the submitted job finishes, check its statistics:
sacct -j 18704 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
18704 2022-10-14T01:50:11 2022-10-14T01:50:42 00:00:31 3 billing=3,cpu=3,mem=3G,node=1
18704.batch 2022-10-14T01:50:11 2022-10-14T01:50:42 00:00:31 3 cpu=3,mem=3G,node=1
18704.extern 2022-10-14T01:50:11 2022-10-14T01:50:42 00:00:31 3 billing=3,cpu=3,mem=3G,node=1
18704.0 2022-10-14T01:50:12 2022-10-14T01:50:42 00:00:30 2 cpu=2,mem=1G,node=1
18704.1 2022-10-14T01:50:12 2022-10-14T01:50:42 00:00:30 2 cpu=2,mem=1G,node=1
18704.2 2022-10-14T01:50:12 2022-10-14T01:50:42 00:00:30 2 cpu=2,mem=1G,node=1
The above statistics show that the three steps started at the same time (in parallel).
Summary
srun in a submission script is used to create job steps and to launch the processes. If you have a parallel MPI program, srun takes care of creating all the MPI processes. Prefixing a job step with srun causes that step to be executed on the compute nodes. The --ntasks flag in the srun command is similar to the --ntasks in the #SBATCH directives.
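For an MPI program, a single srun launches all the ranks at once; a minimal sketch, assuming a hypothetical executable my_mpi_app built with the cluster's MPI stack:
# In a batch script that requested, e.g., #SBATCH --ntasks=4
srun ./my_mpi_app   # srun starts 4 MPI ranks, one per task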
Passing Multiple Arguments
The example below shows how to pass different arguments to your program and execute multiple instances of it in parallel.
Example
Create a Python script named pyScript.py with the following content:
#!/usr/bin/env python3
import sys
import platform
from datetime import datetime
from time import sleep
# sleep for 20 seconds
sleep(20)
current_time = datetime.now()
dt_format = current_time.strftime("%H:%M:%S")
print('Hello From "{}" on host "{}" at {}.'.format(sys.argv[1],platform.node(), dt_format))
The above Python program gets the current time, the hostname, and one command-line argument, then prints Hello From "argument" on host "hostname" at the current time.
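You can test the script before submitting it (load a Python module first if python3 is not on your PATH); after the 20-second sleep it prints the greeting:
python3 pyScript.py "test"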
To run three instances of the Python program, create a job script containing three job steps (srun commands), each with its required argument, as shown in the following script:
#!/bin/bash
#SBATCH --job-name pytest
#SBATCH --output pytest.out
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=500M
#SBATCH --partition=interactive
#SBATCH --time=0-00:8:00
# load modules
module load spack/2022a gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
# job steps
srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "1" &
srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "2" &
srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "3" &
wait
In the resource request section of the batch script, 3 tasks with 2 CPUs per task and 500 MB of RAM per CPU were requested and allocated for 8 minutes.
In the job steps section, the job steps were compartmentalized by specifying how each step should be treated by Slurm (number of processes per step). srun won't inherit the --cpus-per-task value requested by sbatch or salloc; it must be requested again in the call to srun, which is done here by passing the SLURM_CPUS_PER_TASK environment variable explicitly. The python command was called against the Python script with different arguments ("1", "2", and "3").
After the job is completed, you can check the output file.
cat pytest.out
Hello From "3" on host "discovery-c34.cluster.local" at 22:34:14.
Hello From "1" on host "discovery-c34.cluster.local" at 22:34:14.
Hello From "2" on host "discovery-c34.cluster.local" at 22:34:14.
The output file shows that the three steps started at the same time (in parallel).
To check the job statistics, run the sacct command.
sacct -j 7588 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
7588 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 6 billing=6,cpu=6,mem=3000M,nod+
7588.batch 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 6 cpu=6,mem=3000M,node=1
7588.extern 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 6 billing=6,cpu=6,mem=3000M,nod+
7588.0 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 2 cpu=2,mem=1000M,node=1
7588.1 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 2 cpu=2,mem=1000M,node=1
7588.2 2022-09-29T22:34:14 2022-09-29T22:34:35 00:00:21 2 cpu=2,mem=1000M,node=1
Explanation
Taking a closer look at the start and end times of each job step, one can infer that all tasks ran independently, in parallel: they started at the same time. Notice also that the order in which the job steps were specified (1, 2, 3) differs from the order of the output (3, 1, 2); background steps write their output as they finish, so the output order is not guaranteed.
Advanced Srun Parallelism
Sometimes you may need to run the same program n times, which means the job script will contain n job steps. For example, to run the previous Python example 20 times with different arguments, you would need 20 job steps (20 srun commands), the same srun repeated with a different argument each time. Instead of typing 20 srun statements, you can use a for loop, as shown below.
#!/bin/bash
#SBATCH --job-name pytest
#SBATCH --output pytest.out
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=500M
#SBATCH --partition=interactive
#SBATCH --time=0-00:8:00
# load modules
module load spack/2022a gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
for i in {1..20}; do
    while [ "$(jobs -p | wc -l)" -ge "$SLURM_NTASKS" ]; do
        sleep 30
    done
    srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "$i" &
done
wait
The above bash script uses a for loop to run the Python program 20 times with different arguments. The while loop checks whether enough resources are free to launch another step: jobs -p lists the background steps still running, and while their count is at least $SLURM_NTASKS the script busy-waits, sleeping 30 seconds between checks. The first three job steps start at the same time, because the requested resources are only enough to run three steps in parallel; each subsequent step waits for a running step to finish and release its resources. In short, at most three steps run in parallel at any time.
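If your shell is bash 4.3 or newer, wait -n offers a tidier throttle than polling with sleep: it blocks until any one background job finishes. A sketch of the same loop under that assumption:
for i in {1..20}; do
    # When all task slots are busy, block until any one step finishes
    if [ "$(jobs -p | wc -l)" -ge "$SLURM_NTASKS" ]; then
        wait -n
    fi
    srun --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK python pyScript.py "$i" &
done
wait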
Submit the job and check its statistics once it’s finished.
sacct -j 7577 --format=JobID,Start,End,Elapsed,REQCPUS,ALLOCTRES%30
Output
JobID Start End Elapsed ReqCPUS AllocTRES
------------ ------------------- ------------------- ---------- -------- ------------------------------
7577 2022-09-29T21:33:34 2022-09-29T21:36:55 00:03:21 6 billing=6,cpu=6,mem=3000M,nod+
7577.batch 2022-09-29T21:33:34 2022-09-29T21:36:55 00:03:21 6 cpu=6,mem=3000M,node=1
7577.extern 2022-09-29T21:33:34 2022-09-29T21:36:55 00:03:21 6 billing=6,cpu=6,mem=3000M,nod+
7577.0 2022-09-29T21:33:35 2022-09-29T21:33:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.1 2022-09-29T21:33:35 2022-09-29T21:33:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.2 2022-09-29T21:33:35 2022-09-29T21:33:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.3 2022-09-29T21:34:05 2022-09-29T21:34:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.4 2022-09-29T21:34:05 2022-09-29T21:34:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.5 2022-09-29T21:34:05 2022-09-29T21:34:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.6 2022-09-29T21:34:35 2022-09-29T21:34:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.7 2022-09-29T21:34:35 2022-09-29T21:34:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.8 2022-09-29T21:34:35 2022-09-29T21:34:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.9 2022-09-29T21:35:05 2022-09-29T21:35:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.10 2022-09-29T21:35:05 2022-09-29T21:35:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.11 2022-09-29T21:35:05 2022-09-29T21:35:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.12 2022-09-29T21:35:35 2022-09-29T21:35:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.13 2022-09-29T21:35:35 2022-09-29T21:35:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.14 2022-09-29T21:35:35 2022-09-29T21:35:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.15 2022-09-29T21:36:05 2022-09-29T21:36:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.16 2022-09-29T21:36:05 2022-09-29T21:36:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.17 2022-09-29T21:36:05 2022-09-29T21:36:25 00:00:20 2 cpu=2,mem=1000M,node=1
7577.18 2022-09-29T21:36:35 2022-09-29T21:36:55 00:00:20 2 cpu=2,mem=1000M,node=1
7577.19 2022-09-29T21:36:35 2022-09-29T21:36:55 00:00:20 2 cpu=2,mem=1000M,node=1
The above output shows that the first three steps (7577.0, 7577.1, 7577.2) started running at the same time, 21:33:35. The next three steps started in parallel after the previous three finished execution and the resources became available.
Using --multi-prog
--multi-prog runs a job with different programs and different arguments for each task. In this case, the executable program specified is actually a configuration file specifying the executable and arguments for each task.
The example in the Passing Multiple Arguments section above can be done differently by passing the --multi-prog flag to the srun command and specifying an external file that lists the command to execute for each task. This approach can also be used to run different executables at the same time. Consider the example below.
#!/bin/bash
#SBATCH --job-name pytest
#SBATCH --output pytest.out
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=500M
#SBATCH --partition=interactive
#SBATCH --time=0-00:8:00
srun --ntasks=3 -l --multi-prog ./file.conf
The file.conf configuration file contains:
0 python pyScript.py 1
1 python pyScript.py 2
2 python pyScript.py 3
This file lists the steps (tasks) to be run. Each line begins with a task number, starting from zero, followed by the executable and its arguments. Note that the executables could vary from task to task.
Output
1: Hello From "2" on host "discovery-c35.cluster.local" at 22:37:11.
0: Hello From "1" on host "discovery-c34.cluster.local" at 22:37:11.
2: Hello From "3" on host "discovery-g14.cluster.local" at 22:37:11.
Explanation
In the resource request section of the batch script, 3 tasks with 2 CPUs per task and 500 MB of RAM per CPU were requested and allocated for 8 minutes.
The srun command informs Slurm to run the multiple programs specified in the external file and to treat each line of the file as an individual task. Make sure that the total number of tasks specified with the --ntasks=3 flag equals the total number of steps in file.conf; it should also equal the total number of tasks specified in the resource request section (--ntasks=3).
IMPORTANT: Make sure that the number of tasks (--ntasks) is greater than or equal to the total number of steps in the configuration file.
The -l flag passed to the srun command prepends the task number to each line of the output, as shown above.
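For reference, the srun man page also documents richer configuration-file syntax: comma-separated task ranks and ranges, a '*' wildcard matching all remaining tasks, and a %t placeholder that expands to the task number. A hypothetical mixed.conf running different executables per task:
0    hostname
1-2  python pyScript.py %t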