Slurm Job Tasks
Slurm encapsulates resources using the concepts of jobs and tasks. A job can span multiple compute nodes and is the sum of all its task resources. A task is a subset of the resources in a job and can only exist on a single compute node. In practice, you build a job around the number of tasks you want to run in parallel and how many resources you want for each task.
The --ntasks flag sets the number of tasks in a job or job step. When used in an SBATCH script, it specifies the maximum number of tasks that can execute in parallel (at the same time). It also dictates how many resources are assigned to the job.
When --ntasks is greater than 1, a job may be assigned multiple nodes, but the size of any one task can't exceed a single node. When --ntasks is exactly 1 (the default), the submission script will use all resources assigned to the job without needing to use the srun command.
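As a sketch of how these pieces fit together (the job name, CPU counts, and time limit below are illustrative, not site defaults), a job that runs four copies of a program in parallel might look like:

```shell
#!/bin/bash
#SBATCH --job-name=MultiTask      # hypothetical job name
#SBATCH --ntasks=4                # up to 4 tasks may run in parallel
#SBATCH --cpus-per-task=2        # each task gets 2 CPU cores
#SBATCH --time=00:05:00

# srun launches one copy of the program per task; Slurm may spread
# the 4 tasks across multiple nodes, but no single task spans nodes.
srun hostname
```

Because each task prints the name of the node it landed on, the output makes the task-to-node mapping visible.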
By default, a submission script uses all the assigned resources of the first node in the job. The script itself is non-blocking and doesn't count against your maximum number of concurrent tasks.
The srun Command
srun is the Slurm command used to launch tasks within a job. These tasks may run on the same node as the submission script or on a different node assigned to the current job.
Slurm records metrics about jobs, including individualized metrics for each task. This is useful for profiling things such as the memory usage of different parts of a job. Once a job has completed, these metrics can be accessed with the sacct command and the job ID number.
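For example (the job ID and the field list here are illustrative), per-task metrics can be pulled with sacct after the job finishes:

```shell
# 12345 is a placeholder job ID; replace it with your own from sbatch/squeue.
# Each srun launch appears as its own job step (12345.0, 12345.1, ...),
# so memory high-water marks and elapsed time are reported per step.
sacct -j 12345 --format=JobID,JobName,MaxRSS,Elapsed,State
```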
Another way to use srun is directly from the login node, without a job submission script. This interactively runs a program on a compute node and returns its output directly to your terminal. In the background, Slurm takes the srun call and automatically creates a job. This is great for interactive work but falls flat when you need to wait in the queue or want to be able to close your terminal.
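A common instance of this pattern (the flag values are examples, not site defaults) is opening an interactive shell on a compute node:

```shell
# Run a single task interactively; --pty connects the task's
# terminal to yours, so you get a shell prompt on the compute node.
srun --ntasks=1 --time=00:30:00 --pty /bin/bash
```

When the time limit expires or you exit the shell, the job ends.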
Program Execution with srun
Example 1: Execute the same job step twice
#!/bin/bash
#SBATCH --job-name=TestJob
#SBATCH --ntasks=2
#SBATCH --time=00:01:00
srun echo "Hello!"
Output
Hello!
Hello!
Explanation
--ntasks=2 requests two tasks, i.e. two processes to carry out. Because two tasks are specified, the job step srun echo "Hello!" is executed twice.
Note: The two tasks don't run in parallel.
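To tell the two launches apart, a small variation on the script above (a sketch; SLURM_PROCID is the rank Slurm sets in each task's environment) prints each task's rank:

```shell
#!/bin/bash
#SBATCH --job-name=TestJob
#SBATCH --ntasks=2
#SBATCH --time=00:01:00

# Each task expands $SLURM_PROCID to its own rank (0 or 1),
# so the two output lines are distinguishable, unlike the
# identical "Hello!" lines in Example 1.
srun bash -c 'echo "Hello from task $SLURM_PROCID"'
```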
Example 2: Running jobs interactively with the srun command
- Log in to Discovery, create the file program.py within your home directory, paste the code below, then save.
Python script (program.py)
txt = "Running jobs interactively with srun command"
print(txt)
- Load the Python module. You can run the module spider python command to choose from the list of all Python versions on Discovery.
module load spack/2022a gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
- After the Python module is loaded, the script can be executed on the compute nodes using the srun command.
srun -n 1 --time=00:10:00 --partition=normal python program.py
The -n flag specifies the number of tasks (--ntasks) to run, followed by the --time flag for the duration, and the --partition flag for which partition (normal, backfill, interactive, and so on) to run your job in.
Output
Running jobs interactively with srun command
You should see the output above printed to the console after successfully executing the srun command.
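If you'd rather not wait in the queue at your terminal, the same run can be wrapped in a batch script (a sketch reusing the module line from above; the job name is hypothetical) and submitted with sbatch:

```shell
#!/bin/bash
#SBATCH --job-name=PyJob          # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --partition=normal

module load spack/2022a gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
srun python program.py
```

Submitted with sbatch, the script runs unattended even if you close your terminal, and by default the output is written to a slurm-&lt;jobid&gt;.out file in the submission directory rather than to your console.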