Slurm Job Tasks

Slurm encapsulates resources using the idea of jobs and tasks. A job can span multiple compute nodes and is the sum of all of its task resources. A task is a subset of a job's resources and can only exist on a single compute node. In practice, you build a job based on the number of tasks you want to run in parallel and how many resources you want for each task.

The --ntasks flag is the number of tasks in a job or job step.

When used in an sbatch script, it specifies the maximum number of tasks that can execute in parallel (at the same time). It also dictates how many resources are assigned to the job.

When --ntasks is greater than 1, a job may be assigned multiple nodes, but a single task can never exceed the resources of one node. When --ntasks is exactly 1 (the default), the submission script will use all resources assigned to the job without needing the srun command.

The default behavior of a submission script is to use all the assigned resources of the first node in the job. Commands run directly in the script (without srun) are non-blocking and don't count against your maximum number of concurrent tasks.

The srun Command

srun is the Slurm command used to launch tasks within a job. These tasks may run on the same node as the submission script or on a different node assigned to the current job.
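As a minimal sketch of how --ntasks bounds concurrency (the job name and commands here are placeholders, not part of the original examples), the script below requests two tasks and uses srun to launch two one-task job steps at the same time. On recent Slurm versions, srun's --exact flag can also help keep concurrent steps from waiting on each other's resources.

#!/bin/bash

#SBATCH --job-name=TwoSteps    # placeholder job name
#SBATCH --ntasks=2             # up to 2 tasks may run at once
#SBATCH --time=00:05:00

# Each srun call launches a one-task job step. The trailing "&" backgrounds
# the step so the next one can start immediately; "wait" blocks until both finish.
srun --ntasks=1 hostname &
srun --ntasks=1 hostname &
wait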

Slurm records metrics about jobs, including individualized metrics for each task. This is useful for profiling things such as the memory usage of different parts of a job. Once a job completes, these metrics can be accessed with the sacct command and the job ID number.
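For example, a call like the following (the job ID 12345 is a placeholder) reports each job step's peak memory use and elapsed time:

sacct -j 12345 --format=JobID,JobName,MaxRSS,Elapsed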

Another way to use srun is directly from the login node, without a job submission script. This interactively runs a program on a compute node and returns its output directly to your terminal. Behind the scenes, Slurm takes the srun call and automatically creates a job. This is great for interactive work but falls short when you need to wait in the queue or want to be able to close your terminal.
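As a quick illustration (assuming default resource settings), running a single command this way blocks until a node is free, then prints the result straight to your terminal:

srun --ntasks=1 hostname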

Program Execution with srun

Example 1: Execute the same program twice

#!/bin/bash

#SBATCH --job-name=TestJob
#SBATCH --ntasks=2
#SBATCH --time=00:01:00

srun echo "Hello!"

Output

Hello!
Hello!

Explanation

--ntasks=2 requests two tasks, i.e., two processes to carry out. Because the job is allocated two tasks, srun echo "Hello!" launches the echo command once per task, so it executes twice.

The two copies are launched together as a single job step and run in parallel, one per task.
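To see how the task count drives the number of copies, a minimal variant of the job step (in the same script as above) limits srun to a single task, so the message prints only once:

srun --ntasks=1 echo "Hello!"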

Example 2: Running jobs interactively with the srun command

  1. Log in to Discovery, create the file program.py in your home directory, paste the code below, and save it.

Python script (program.py)

txt = "Running jobs interactively with srun command"
print(txt)
  2. Load the Python module. You can run the module spider python command to choose from the list of all Python versions on Discovery.

module load spack/2022a  gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed
  3. After the Python module is loaded, the Python script can be executed on the compute nodes using the srun command.

srun -n 1 --time=00:10:00 --partition=normal python program.py

The -n flag specifies the number of tasks (--ntasks) to run, the --time flag sets the duration, and the --partition flag selects which partition (normal, backfill, interactive, and so on) to run your job in.

Output

Running jobs interactively with srun command

You should see the output above printed to the console after the srun command executes successfully.
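If you would rather queue the same run and be free to close your terminal, the interactive command translates directly into a submission script. This sketch reuses the module line and flags from above; the job name is a placeholder:

#!/bin/bash

#SBATCH --job-name=PyJob       # placeholder job name
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --partition=normal

# Load the same Python module used in the interactive example.
module load spack/2022a  gcc/12.1.0-2022a-gcc_8.5.0-ivitefn python/3.9.12-2022a-gcc_12.1.0-ys2veed

srun python program.py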