Basic Slurm Commands
The table below lists and describes the most commonly used Slurm commands.
Commands | Syntax | Description |
---|---|---|
sbatch | sbatch <job script> | Submit a batch script to Slurm for processing. |
squeue | squeue -u <username> | Show information about your job(s) in the queue. When run without the -u flag, the command shows all jobs in the queue. |
srun | srun <slurm options> <command> | Run jobs interactively on the cluster. |
scancel | scancel <job id> | End or cancel a queued job. |
sacct | sacct -j <job id> | Show information about current and previous jobs. |
sinfo | sinfo | Get information about the resources on available nodes that make up the HPC cluster. |
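As a quick illustration of how these commands fit together, here is a minimal sketch of a typical workflow (the script name, username, and job ID are placeholders):
$ sbatch myscript.sh      ## submit a batch script; Slurm replies with a job ID
$ squeue -u <username>    ## check your job(s) in the queue
$ scancel <job id>        ## cancel the job if something went wrong
$ sacct -j <job id>       ## review accounting data after the job finishes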
Slurm Script Main Parts
In creating a Slurm script, there are four main parts that are mandatory in order for your job to be successfully processed.
- Shebang: The shebang line tells the shell (which interprets UNIX commands) to interpret and run the Slurm script using bash (the Bourne-again shell). This line should always be the very first line of your SBATCH/Slurm script.
#!/bin/bash
- Resource Request: This section specifies the resources required for the job to run on the compute nodes. It tells Slurm the name of the job, the output filename, the amount of RAM, the number of CPUs, nodes, tasks, the walltime, and other parameters to be used for processing the job.
These SBATCH commands are also known as SBATCH directives. They must be preceded by a pound sign (#) and written in uppercase, as shown below.
#SBATCH --job-name=TestJob
#SBATCH --output=TestJob.out
#SBATCH --time=1-00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=500M
- Dependencies: Load all the software that the project depends on to execute. For example, if you are working on a Python project, you need the python module to interpret and run your code. Please visit the link → Module Environments and Commands page for more details about using modules on Discovery.
module load python
- Job Steps: Specify the list of tasks to be carried out.
srun echo "Start process"
srun hostname
srun sleep 30
srun echo "End process"
Putting it all together
Please note that the lines with the double pound signs (##) are comments when used in batch scripts.
## Shebang
#!/bin/bash
## Resource Request
#SBATCH --job-name=TestJob
#SBATCH --output=TestJob.out
#SBATCH --time=1-00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=500M
## Job Steps
srun echo "`Start process`"
srun hostname
srun sleep 30
srun echo "`End process`"
In the script above, 1 node with 1 CPU, 500 MB of memory per CPU, and a walltime of one day and ten minutes (--time=1-00:10:00) were requested for the tasks (job steps). Note that each job step beginning with the srun command executes sequentially as one task on one CPU.
The first job step runs the Linux echo command and outputs "Start process". The second job step echoes the hostname of the compute node that executed it. The third job step runs the Linux sleep command for 30 seconds. The final job step echoes "End process". Note that these job steps execute sequentially, not in parallel.
It's important to set a limit on the total run time of the job allocation; this helps the Slurm manager handle prioritization and queuing efficiently. The example above is a very simple script that finishes in well under a minute, so specifying a realistic run time limit keeps Slurm from treating the job as one that requires a lot of time to execute.
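For example, a minimal sketch of a tighter request (pick a value that suits your own job) would cap the script above at ten minutes of walltime instead of a full day:
#SBATCH --time=0-00:10:00   ## days-hours:minutes:seconds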
SBATCH Directives Explained
sbatch is used to submit a job script for later execution.
Lines that begin with #SBATCH in all caps are treated as commands by Slurm. This means that to comment out a Slurm directive, you need to add a second pound sign (#) to the SBATCH command: #SBATCH is read as a Slurm directive, while ##SBATCH is treated as a comment.
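For instance, in the minimal sketch below Slurm applies the first directive and ignores the second:
#SBATCH --mem-per-cpu=500M    ## active directive
##SBATCH --mem-per-cpu=500M   ## commented out, ignored by Slurm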
The SBATCH lines in the script below contain directives that are recommended as defaults for all your job submissions. However, the directives with the --mail parameters are optional.
#SBATCH --job-name myJobName
#SBATCH --output myJobName.o%j
#SBATCH --partition normal
#SBATCH --ntasks 3
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu 700M
#SBATCH --time 0-00:10:00
#SBATCH --mail-user yourID@nmsu.edu
#SBATCH --mail-type BEGIN
#SBATCH --mail-type END
#SBATCH --mail-type FAIL
#SBATCH --get-user-env
Parameters Explained
Directives | Description |
---|---|
--job-name | Specifies a name for the job allocation. The specified name will appear along with the job ID number when querying running jobs on the system. The default is the name of the batch script. |
--output | Instructs Slurm to connect the batch script's standard output directly to the filename specified. If not specified, the default filename is slurm-%j.out, where %j is the job ID. |
--partition | Requests a specific partition for the resource allocation (e.g., normal, gpu, interactive, backfill on Discovery). If not specified, the job runs on the cluster's default partition (normal, marked with * in the sinfo output further below). |
--ntasks | Advises the Slurm controller that job steps run within the allocation will launch a maximum of this number of tasks, and to provide sufficient resources accordingly. The default is 1 task per node, but note that the --cpus-per-task option will change this default. |
--cpus-per-task | Advises the Slurm controller that ensuing job steps will require this number of processors per task. Without this option, the controller will just try to allocate one processor per task. For instance, consider an application that has 4 tasks, each requiring 3 processors. If the HPC cluster is made up of quad-processor nodes and you simply ask for 12 processors, the controller might give you only 3 nodes. However, by using --cpus-per-task=3, the controller knows that each task requires 3 processors on the same node, so it will grant an allocation of 4 nodes, one for each of the 4 tasks (see the sketch after this table). |
--mem-per-cpu | This is the minimum memory required per allocated CPU. Note: It's highly recommended to specify --mem-per-cpu explicitly in your script. |
--time | Sets a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition's default time limit. A time limit of zero requests that no time limit be imposed. The acceptable time format is days-hours:minutes:seconds. Note: It's mandatory to specify a time in your script. Jobs that don't specify a time are given a default time of 1 minute, after which the job is killed. This modification has been made to implement the new backfill scheduling algorithm and it won't affect partition walltime. |
--mail-user | Defines the user who will receive email notification of state changes, as defined by --mail-type. |
--mail-type | Notifies the user by email when certain event types occur. Valid type values are BEGIN, END, and FAIL. The user to be notified is indicated with --mail-user. The values of the --mail-type directive can also be declared on one line, like so: #SBATCH --mail-type BEGIN,END,FAIL |
--get-user-env | Tells sbatch to retrieve the login environment variables of the user who submitted the job. |
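As a minimal sketch of the --cpus-per-task example described in the table above (the job name and application are hypothetical placeholders), the following requests 4 tasks with 3 CPUs each, so Slurm knows every task needs its 3 processors on the same node:
#!/bin/bash
#SBATCH --job-name=fourTasks      ## hypothetical job name
#SBATCH --output=fourTasks.out
#SBATCH --time=0-00:10:00
#SBATCH --ntasks=4                ## 4 tasks (processes launched by srun)
#SBATCH --cpus-per-task=3         ## each task needs 3 CPUs on the same node
#SBATCH --mem-per-cpu=500M

srun ./my_app                     ## placeholder application; srun starts one copy per task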
Creating and Submitting Jobs
Suppose you have a script in one of the programming languages, such as Python, MATLAB, C, or Java. How would you execute it using Slurm?
The section below explains the step-by-step process of creating and submitting a simple job. An SBATCH script is created and used to execute a Python script.
- Log in to Discovery.
- Create a new folder in your home directory called myproject and switch into it.
$ mkdir myproject && cd myproject
- Create two new files called script.sh and script.py, then copy and paste the code from the script.sh and script.py listings below into them, respectively.
$ vi script.sh && chmod +x script.sh
The second command above, after the double ampersand (chmod +x script.sh), makes the file executable once you save and exit the text editor.
$ vi script.py
#!/bin/bash
#SBATCH --job-name=maxFib ## Name of the job
#SBATCH --output=maxFib.out ## Output file
#SBATCH --time=10:00 ## Job duration
#SBATCH --ntasks=1 ## Number of tasks (analyses) to run
#SBATCH --cpus-per-task=1 ## The number of threads the code will use
#SBATCH --mem-per-cpu=100M ## Real memory (MB) per CPU required by the job

## Load the python interpreter
module load python

## Execute the python script and pass the argument/input '90'
srun python script.py 90
Here, 1 CPU with 100 MB of memory per CPU and 10 minutes of walltime were requested for the task (job steps). If --ntasks is set to two, the Python program will be executed twice (see the sketch after this step).
Note that the number of tasks requested of Slurm is the number of processes that will be started by srun. After your script has been submitted and resources allocated, srun immediately executes the script on the remote host; it's what actually launches the processes. If your program is a parallel MPI program, srun takes care of creating all the MPI processes. If not, srun will run your program as many times as specified by the --ntasks option.
import sys
import os

if len(sys.argv) != 2:
    print('Usage: %s MAXIMUM' % (os.path.basename(sys.argv[0])))
    sys.exit(1)

maximum = int(sys.argv[1])

n1 = 1
n2 = 1
while n2 <= maximum:
    n1, n2 = n2, n1 + n2

print('The greatest Fibonacci number up to %d is %d' % (maximum, n1))
The Python program accepts an integer value as an argument and then finds the greatest Fibonacci number that does not exceed that value.
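As a minimal sketch (a hypothetical variant of script.sh, not part of the walkthrough), setting --ntasks=2 makes srun launch the program twice:
#!/bin/bash
#SBATCH --job-name=maxFibTwice   ## hypothetical job name
#SBATCH --output=maxFibTwice.out
#SBATCH --time=10:00
#SBATCH --ntasks=2               ## two tasks: srun starts two processes
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100M

module load python
srun python script.py 90         ## runs once per task, so the result prints twice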
- Now, submit the batch script with the following command.
$ sbatch script.sh
After the job has been submitted, you should get an output similar to the one below but with a different job ID.
Submitted batch job 215578
You can use the command below to check the progress of your submitted job in the queue.
Syntax: squeue -u <your username>
$ squeue -u vaduaka
Output
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
215578 normal maxFib vaduaka R 0:01 1 discovery-c3
- Once your job has completed and is no longer in the queue, you can run the ls command to show the list of files in your working directory.
$ ls
maxFib.out script.py script.sh
A new file called maxFib.out has been generated; if you view its contents with the cat command, you should see something similar to the output below.
$ cat maxFib.out
Output
The greatest Fibonacci number up to 90 is 89
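If you need to stop a job before it finishes, the scancel command from the table at the top of this page takes the job ID reported by sbatch (the ID below is the one from this walkthrough):
$ scancel 215578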
Showing Information on Jobs
The sacct command
To view the statistics of a completed job, use the sacct command.
Syntax: sacct -j <job id> or sacct -j <job id> --format=<params>
$ sacct -j 215578
Output
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
215578 maxFib normal nmsu 1 COMPLETED 0:0
215578.batch batch nmsu 1 COMPLETED 0:0
215578.0 python nmsu 1 COMPLETED 0:0
You can get statistics (accounting data) on completed jobs by passing either the job ID or the username flag. Here, the command sacct -j 215578 shows statistics about the completed job: the partition the job executed on, the account, and the number of allocated CPUs per job step, as well as the exit code and state (COMPLETED, PENDING, FAILED, and so on) for the job and each of its job steps.
The first column lists the job IDs of the individual job steps. The first two rows are default job steps: the first is the job allocation as a whole and the second (215578.batch) is the batch script itself. The third row, 215578.0, contains the information about the first process launched with srun. If there were more srun commands, the sub-job IDs would increment accordingly: 215578.1, 215578.2, and so on.
- You can also pass other parameters to the sacct command to retrieve extra details about the job.
$ sacct -j 215578 --format=JobID,Start,End,Elapsed,NCPUS
Output
JobID Start End Elapsed NCPUS
------------ ------------------- ------------------- ---------- ----------
215578 2020-09-04T09:53:11 2020-09-04T09:53:11 00:00:00 1
215578.batch 2020-09-04T09:53:11 2020-09-04T09:53:11 00:00:00 1
215578.0 2020-09-04T09:53:11 2020-09-04T09:53:11 00:00:00 1
In the output above, you can see the start and end timestamps, the number of CPUs, and the elapsed time of the job.
- You can also retrieve information about jobs that ran during a given period of time by passing start or end time flags, like so:
$ sacct --starttime=2020-09-01 --format=jobid,jobname,exit,group,maxrss,comment,partition,nnodes,ncpus
Output
JobID JobName ExitCode Group MaxRSS Partition NNodes AllocCPUS State
------------ ---------- -------- --------- ---------- ---------- -------- ---------- ----------
213974 test 0:0 vaduaka normal 1 3 COMPLETED
213974.batch batch 0:0 0 1 3 COMPLETED
213974.exte+ extern 0:0 0 1 3 COMPLETED
213974.0 python 0:0 0 1 1 COMPLETED
213974.1 python 0:0 0 1 1 COMPLETED
213974.2 python 0:0 0 1 1 COMPLETED
215576 maxFib 0:0 vaduaka normal 1 1 COMPLETED
215576.batch batch 0:0 0 1 1 COMPLETED
215576.exte+ extern 0:0 88K 1 1 COMPLETED
215576.0 python 0:0 0 1 1 COMPLETED
215577 maxFib 0:0 vaduaka normal 1 1 COMPLETED
215577.batch batch 0:0 0 1 1 COMPLETED
215577.exte+ extern 0:0 84K 1 1 COMPLETED
215577.0 python 0:0 0 1 1 COMPLETED
215578 maxFib 0:0 vaduaka normal 1 1 COMPLETED
215578.batch batch 0:0 0 1 1 COMPLETED
215578.exte+ extern 0:0 0 1 1 COMPLETED
215578.0 python 0:0 0 1 1 COMPLETED
215665 maxFib 0:0 vaduaka normal 1 1 COMPLETED
215665.batch batch 0:0 0 1 1 COMPLETED
215665.exte+ extern 0:0 92K 1 1 COMPLETED
215665.0 python 0:0 0 1 1 COMPLETED
In the output above, you can see information about the job steps carried out over the course of each job. The name of the job, exit code, user group, and the maximum resident set size of all tasks in the job (the amount of RAM used by each task) are displayed, along with the partition, number of nodes used, number of allocated CPUs, and state of each job.
For more details about using sacct, please use the man sacct command.
$ man sacct
To view a list of possible parameters you could pass to retrieve specific job details, use the sacct -e command.
$ sacct -e
Output
Account AdminComment AllocCPUS AllocGRES AllocNodes AllocTRES AssocID AveCPU AveCPUFreq AveDiskRead AveDiskWrite AvePages AveRSS AveVMSize BlockID Cluster Comment Constraints ConsumedEnergy ConsumedEnergyRaw CPUTime CPUTimeRAW DBIndex DerivedExitCode Elapsed ElapsedRaw Eligible End ExitCode Flags GID Group JobID JobIDRaw JobName Layout MaxDiskRead MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask MaxPages MaxPagesNode MaxPagesTask MaxRSS MaxRSSNode MaxRSSTask MaxVMSize MaxVMSizeNode MaxVMSizeTask McsLabel MinCPU MinCPUNode MinCPUTask NCPUS NNodes NodeList NTasks Priority Partition QOS QOSRAW Reason ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ReqCPUS ReqGRES ReqMem ReqNodes ReqTRES Reservation ReservationId Reserved ResvCPU ResvCPURAW Start State Submit Suspended SystemCPU SystemComment Timelimit TimelimitRaw TotalCPU TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot UID User UserCPU WCKey WCKeyID WorkDir
The scontrol command
For detailed information about a running or pending job, use the scontrol command.
Syntax: scontrol show jobid=<job id> or scontrol show jobid <job id>
$ scontrol show jobid 215578
Output
JobId=215578 JobName=maxFib
UserId=vaduaka(681432) GroupId=vaduaka(681432) MCS_label=N/A
Priority=191783 Nice=0 Account=nmsu QOS=normal
JobState=RUNNING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
SubmitTime=2020-09-04T09:53:11 EligibleTime=2020-09-04T09:53:11
AccrueTime=2020-09-04T09:53:11
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-09-04T09:53:11
Partition=normal AllocNode:Sid=10.88.37.12:12411
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=100M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=100M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/vaduaka/python/fibonacci/script.sh
WorkDir=/home/vaduaka/python/fibonacci
StdErr=/home/vaduaka/python/fibonacci/maxFib.out
StdIn=/dev/null
StdOut=/home/vaduaka/python/fibonacci/maxFib.out
Power=
MailUser=(null) MailType=NONE
The information shown by the scontrol command is available only while the job is active (pending or running). To see all other options, run man scontrol in your terminal.
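scontrol can also report on objects other than jobs. As a minimal sketch (using the normal partition that appears in the sinfo output later on this page), the following displays a partition's configuration:
$ scontrol show partition normal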
The squeue command
squeue is useful for viewing the status of jobs in the queue and how resources are being allocated. It answers questions like: Have resources been allocated to my job yet? How long has my job been running?
Syntax: squeue -u <username>
$ squeue -u vaduaka
Output
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
219373 normal camelCas vaduaka PD 0:00 1 (Resources)
219370 normal maxFib vaduaka R 0:01 1 discovery-c14
219371 normal camelCas vaduaka R 0:01 1 discovery-c14
219372 normal maxFib vaduaka R 0:01 1 discovery-c14
Run this way, the squeue command shows only your own jobs in the queue; that's why the -u flag and your username were passed as arguments.
If you want to see a list of all jobs in the queue, run the squeue command on its own. This reveals all the jobs running on the partitions you are authorized to access. You won't be able to see jobs running on other partitions unless you use the --all flag (see the example after the output below).
- Example with squeue
$ squeue
Output summary
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
218983 normal run.sh viviliu PD 0:00 1 (Resources)
219407 normal run.sh viviliu PD 0:00 1 (Priority)
217794 normal JackNema cvelasco R 1-05:28:58 1 discovery-c14
218985 normal HWE gsmithvi R 1:03:57 1 discovery-c12
215745 normal S809f bryanbar R 5-03:25:57 3 discovery-c[9,11,13]
217799 normal LPT pcg1996 R 1-05:15:04 6 discovery-c[2-4,6-8]
214915 normal run.sh viviliu R 4-19:25:13 2 discovery-c[1,6]
216157 backfill BatComp jmwils R 2-05:48:53 1 discovery-g10
218982 normal run.sh viviliu R 4:52:15 4 discovery-c[4,8,10,12]
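To list jobs on every partition, including the ones you are not authorized to use, add the --all flag (a minimal sketch; the output follows the same format as above):
$ squeue --all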
Job queue headers explained
Header | Description |
---|---|
JOBID | A unique identifier that's used by many Slurm commands when actions must be taken about one particular job. |
PARTITION | The partition where the job is being executed. |
NAME | The name of your job, set by the --job-name parameter in your SBATCH script. |
USER | The user who submitted the job (the job owner). |
STATE | The current state of the job in abbreviated form (see the state codes below). |
TIME | The job duration so far. This starts counting only once the job has entered the running state. |
NODES | The number of nodes used by the job. |
NODELIST(REASON) | The list of nodes allocated to the job, or the reason why the job is in a state other than running. |
State codes:
Code | Status | Description |
---|---|---|
PD | PENDING | Job is awaiting resource allocation. |
CG | COMPLETING | Job is done executing and has some ongoing processes that are being finalized. |
CD | COMPLETED | Job has completed successfully. |
R | RUNNING | Job has been allocated resources and is being processed by the compute node(s). |
F | FAILED | Job terminated with a non-zero exit code and stopped executing. |
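If you only care about jobs in a particular state, squeue can filter on the state codes above. As a minimal sketch (assuming you have pending jobs in the queue), the following lists only your pending jobs:
$ squeue -u vaduaka -t PENDING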
Requesting Resources
The sinfo command
The sinfo command is used to view partition and node information for a system running Slurm. It can answer questions such as: How many nodes are idle or in use in each partition? What are my chances of getting a node soon?
Syntax: sinfo or sinfo --[optional flags]
$ sinfo
Output
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 7-01:00:00 11 mix discovery-c[1-5,8-13]
normal* up 7-01:00:00 4 idle discovery-c[6-7,14-15]
gpu up 7-01:00:00 1 mix discovery-g[1,16]
interactive up 1-01:00:00 4 idle discovery-c[34-35],discovery-g[14-15]
backfill up 14-02:00:0 13 mix discovery-c[1-5,8-13,16],discovery-g[1,16]
backfill up 14-02:00:0 39 idle discovery-c[6-7,14-15,17-35],discovery-g[2-15],discovery-c[37-38]
The output above shows the list of partitions on the Discovery cluster that you are authorized to use.
Example with the --all flag
$ sinfo --all
Output
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 7-01:00:00 8 mix discovery-c[1-2,8-13]
normal* up 7-01:00:00 7 idle discovery-c[3-7,14-15]
gpu up 7-01:00:00 1 mix discovery-g[1,16]
interactive up 1-01:00:00 2 idle* discovery-c[34-35]
interactive up 1-01:00:00 2 idle discovery-g[14-15]
backfill up 14-02:00:0 2 idle* discovery-c[34-35]
backfill up 14-02:00:0 10 mix discovery-c[1-2,8-13,16],discovery-g[1,16]
backfill up 14-02:00:0 40 idle discovery-c[3-7,14-15,17-33],discovery-g[2-15],discovery-c[37-38]
iiplab up 7-01:00:00 1 idle discovery-g7
cfdlab up 7-01:00:00 1 mix discovery-c16
cfdlab up 7-01:00:00 14 idle discovery-c[17-25],discovery-g[2-6]
cfdlab-debug up 1:00:00 1 mix discovery-c16
cfdlab-debug up 1:00:00 14 idle discovery-c[17-25],discovery-g[2-6]
osg up 1-01:00:00 10 mix discovery-c[1-2,8-13,16],discovery-g1
osg up 1-01:00:00 38 idle discovery-c[3-7,14-15,17-33],discovery-g[2-13],discovery-c[37-38]
covid19 up 1-01:00:00 10 mix discovery-c[1-2,8-13,16],discovery-g1
covid19 up 1-01:00:00 38 idle discovery-c[3-7,14-15,17-33],discovery-g[2-13],discovery-c[37-38]
The output above shows all the partitions on the Discovery cluster.
Header | Description |
---|---|
PARTITION | The list of the cluster's partitions: sets of compute nodes grouped logically. |
AVAIL | The availability state of the partition (up or down). |
TIMELIMIT | The maximum walltime a job may run on the partition. |
NODES | The total number of nodes per partition, grouped by state. |
STATE | The state of those nodes (e.g., idle, mix). |
NODELIST | The list of nodes per partition. |
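sinfo also accepts flags that narrow the output. As a minimal sketch (using the normal partition shown in the output above), you can restrict the report to a single partition:
$ sinfo -p normal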
For more details about partitions on Discovery, please visit the link → Partitions in Discovery.