How Slurm Works
When creating a Slurm script, there are four main parts that are mandatory in order for your job to be processed successfully.
Breakdown of Bash Script
- Shebang
The shebang line tells the shell (which interprets UNIX commands) to interpret and run the Slurm script using bash (the Bourne-again shell). This line should always be at the very top of your SBATCH/Slurm script.
#!/bin/bash
- Resource Request
In this section, the amount of resources required for the job to run on the compute nodes is specified. This informs Slurm of the job name, the output filename, the amount of RAM, the number of CPUs, nodes, tasks, the wall time, and other parameters to be used for processing the job.
These SBATCH commands are also known as SBATCH directives. Each one must be preceded by a pound sign (#), with SBATCH written in uppercase, as shown below.
#SBATCH --job-name=TestJob
#SBATCH --output=TestJob.out
#SBATCH --time=1-00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=500M
- Dependencies
In this section, load all the software that your project needs to run its scripts. For example, if you're working on a Python project, you'd need the Python module to interpret and run your code. Please visit the Module Environments and Commands page for more details about using modules on Discovery. A short sketch pairing a module load with a job step follows this list.
module load python
- Job Steps
Specify the list of tasks to be carried out.
srun echo "Start process"
srun hostname
srun sleep 30
srun echo "End process"
Putting it all together
Please note that lines beginning with a double pound sign (##) are treated as comments in batch scripts.
## Shebang
#!/bin/bash
## Resource Request
#SBATCH --job-name=TestJob
#SBATCH --output=TestJob.out
#SBATCH --time=1-00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=500M
## Job Steps
srun echo "Start process"
srun hostname
srun sleep 30
srun echo "End process"
In the script above, 1 node, 1 CPU, 500 MB of memory per CPU, and a wall-time limit of one day and ten minutes (--time=1-00:10:00) were requested for the tasks (job steps). Note that all the job steps that begin with the srun command execute sequentially, as one task by one CPU only.
The first job step runs the Linux echo command and outputs "Start process". The second job step echoes the hostname of the compute node that executed the job. The third job step runs the Linux sleep command for 30 seconds. The final job step echoes "End process". Note that these job steps execute sequentially, not in parallel.
It's important to set a limit on the total run time of the job allocation; this helps the Slurm manager handle prioritization and queuing efficiently. The script above is very simple and takes under a minute to run, so specifying a realistic run-time limit keeps Slurm from treating the job as one that requires a long time to execute.
It's important to keep all #SBATCH lines together and at the top of the script; no bash code or variable settings should appear before the #SBATCH lines.
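Once the script is saved (for example as TestJob.sh, an assumed filename), it can be submitted and monitored with the standard Slurm commands:
sbatch TestJob.sh      ## submit the batch script; Slurm prints the assigned job ID
squeue -u $USER        ## list your pending and running jobs
cat TestJob.out        ## inspect the output file once the job has finished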
#SBATCH Directives/Flags
These are the commands that will be useful for your job submission.
Your script should begin with the shebang command.
The sbatch command is used to submit a job script for later execution. The script defines the queue, time, notifications, name, code, and set-up for the job. SBATCH scripts are unique in how they're read: in shell scripts, any line that starts with # is considered a comment, but any comment that starts with the word SBATCH (that is, a line beginning with #SBATCH) is read by Slurm as a directive.
#!/bin/sh
#SBATCH --job-name myJobName ## The name that will show up in the queue
#SBATCH --output myJobName-%j.out ## Filename of the output; default is slurm-[jobID].out
#SBATCH --partition normal ## The partition to run in; default = normal
#SBATCH --nodes 1 ## Number of nodes to use; default = 1
#SBATCH --ntasks 3 ## Number of tasks (analyses) to run; default = 1
#SBATCH --cpus-per-task 16 ## The number of threads the code will use; default = 1
#SBATCH --mem-per-cpu 700M ## Memory per allocated CPU
#SBATCH --time 0-00:10:00 ## Time for analysis (day-hour:min:sec)
#SBATCH --mail-user yourID@nmsu.edu ## Your email address
#SBATCH --mail-type BEGIN ## Slurm will email you when your job starts
#SBATCH --mail-type END ## Slurm will email you when your job ends
#SBATCH --mail-type FAIL ## Slurm will email you when your job fails
#SBATCH --get-user-env ## Passes along environmental settings
The "cpus-per-task" value times the "ntasks" value needs to be in the range of the "nodes" thread value. For example, 1 node is a max of 48 threads, so "cpus-per-task" value times the "ntasks" must be less than or equal to 48, otherwise you will get back an error. |
--job-name
Specifies a name for the job allocation. The specified name will appear along with the job ID number when querying running jobs on the system. The default is the name of the batch script, or "sbatch" if the script is read from standard input.
--output
Instructs Slurm to connect the batch script's standard output directly to the specified filename. If not specified, the default filename is slurm-[jobID].out.
--partition
Requests a specific partition for the resource allocation (for example, normal). If this option is not specified, the job is submitted to the default partition.
--nodes
Requests the number of nodes assigned to the job. If this parameter isn't specified, the default behavior is to assign enough nodes to satisfy the requirements of the --ntasks and --cpus-per-task options. However, assume that you specified one node (--nodes 1) and 32 tasks (--ntasks 32) in your job script; this means your job requires 32 CPUs to run. The problem is that if no single node has that many CPUs, the job will fail with a resource error because it has been restricted to one node. It's therefore advisable to leave the --nodes directive out of your job submission scripts, as illustrated below.
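For instance, assuming no single node offers 32 CPUs, the first request below would fail, while the second lets Slurm spread the 32 tasks across as many nodes as needed:
## Over-constrained: forces 32 single-CPU tasks onto one node
#SBATCH --nodes 1
#SBATCH --ntasks 32
## More flexible: omit --nodes and let Slurm choose the nodes
#SBATCH --ntasks 32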
--ntasks
Advises the Slurm controller that the job steps run within the allocation will launch at most this number of tasks, and to allocate enough resources accordingly. The default is 1 task per node, but note that the --cpus-per-task option will change this default.
--cpus-per-task
Advises the Slurm controller that ensuing job steps will require this many processors per task. Without this option, the controller will try to assign one processor per task. For instance, consider an application that has 4 tasks, each requiring 3 processors. If the HPC cluster is made up of quad-processor nodes and you simply ask for 12 processors, the controller might give you only 3 nodes. However, by using --cpus-per-task=3, the controller knows that each task requires 3 processors on the same node, and it will grant an allocation of 4 nodes, one for each of the 4 tasks.
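The directives for the example just described (4 tasks, each needing 3 processors on the same node) would look like this:
#SBATCH --ntasks 4           ## 4 tasks (analyses)
#SBATCH --cpus-per-task 3    ## each task needs 3 processors on the same node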
--mem-per-cpu
The minimum memory required per allocated CPU. Note: it's highly recommended that users specify the memory their job actually needs with this option.
--time
Sets a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition's default time limit. A time limit of zero requests that no time limit be imposed. The acceptable time format is day-hour:min:sec; a few examples are shown below.
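A few illustrative values in the day-hour:min:sec format used on this page:
#SBATCH --time 0-00:30:00    ## 30 minutes
#SBATCH --time 1-00:10:00    ## 1 day and 10 minutes
#SBATCH --time 0-12:00:00    ## 12 hours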
--mail-user
Defines the user who will receive email notification of state changes, as defined by --mail-type.
--mail-type
Notifies the user by email when certain event types occur. Valid type values are BEGIN, END, and FAIL. The user to be notified is indicated with --mail-user.
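A typical notification set-up combines the two options; sbatch also accepts a comma-separated list of event types on a single line (this combined form is standard sbatch syntax, though it does not appear in the example above):
#SBATCH --mail-user yourID@nmsu.edu    ## address to notify
#SBATCH --mail-type BEGIN,END,FAIL     ## email at job start, completion, and failure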