Basic Slurm Commands

The most commonly used Slurm commands are listed below, along with their syntax and a short description of each.

sbatch
    Syntax: sbatch <job-script>
    Submit a batch script to Slurm for processing.

squeue
    Syntax: squeue -u <username>
    Show information about your job(s) in the queue. When run without the -u flag, the command lists all jobs in the queue, including those of other users.

srun
    Syntax: srun <resource-parameters>
    Run jobs interactively on the cluster.

skill/scancel
    Syntax: scancel <job-id>
    End or cancel a queued job.

sacct
    Syntax: sacct
    Show information about current and previous jobs.

sinfo
    Syntax: sinfo
    Get information about the resources on the available nodes that make up the HPC cluster.
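As a quick orientation, the sketch below strings these commands together into a typical workflow: submit a script, watch it in the queue, inspect it after it finishes, and cancel it if needed. The placeholders in angle brackets stand for your own job script, username, and job ID.

$ sbatch <job-script>       ## submit the batch script; Slurm replies with a job ID
$ squeue -u <username>      ## check whether the job is pending (PD) or running (R)
$ sacct -j <job-id>         ## after the job finishes, view its accounting data
$ scancel <job-id>          ## cancel the job if you no longer need it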

Slurm Script Main Parts

A Slurm script has four main parts, all of which are required for your job to be processed successfully.

  1. Shebang The shebang line tells the system which interpreter to use to run the script; here, that interpreter is bash (the Bourne-again shell).

    This line should always be added at the very top of your SBATCH/Slurm script.

    #!/bin/bash
  2. Resource Request In this section, the resources required for the job to run on the compute nodes are specified. This informs Slurm of the job name, output file name, amount of RAM, number of CPUs, nodes, tasks, time limit, and any other parameters to be used for processing the job.

    These SBATCH commands, also known as SBATCH directives, must be preceded by a pound sign and written in uppercase, as shown below.

    #SBATCH --job-name=TestJob
    #SBATCH --output=TestJob.out
    #SBATCH --time=1-00:10:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem-per-cpu=500M
  3. Dependencies Load all the software that the project depends on. For example, if you are working on a Python project, you need the python module to interpret and run your code. Please see the Module Environments and Commands page for more details about using modules on Discovery.

    module load python
  4. Job Steps Specify the list of tasks to be carried out.

    srun echo "Start process"
    srun hostname
    srun sleep 30
    srun echo "End process"

Putting it all together

Please note that the lines with the double pound signs (##) are comments when used in batch scripts.

#!/bin/bash

## Resource Request
#SBATCH --job-name=TestJob
#SBATCH --output=TestJob.out
#SBATCH --time=1-00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=500M

## Job Steps
srun echo "Start process"
srun hostname
srun sleep 30
srun echo "End process"

In the script above, one node with one CPU, 500 MB of memory per CPU, and a walltime limit of one day and ten minutes (--time=1-00:10:00, in the days-hours:minutes:seconds format) were requested for the job steps. Note that because only one task and one CPU were requested, the srun job steps execute sequentially on that single CPU.

The first job step runs the Linux echo command and outputs Start process. The second job step echoes the hostname of the compute node that executed the job. The third job step runs the Linux sleep command for 30 seconds. The final job step echoes End process. Again, these job steps execute sequentially, not in parallel.

It’s important to set a limit on the total run time of the job allocation; this helps the Slurm manager handle prioritization and queuing efficiently. The example above is a very simple script that takes less than a second to run, so specifying a realistic run time limit keeps Slurm from treating it as a job that needs a lot of time to execute.
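The --time value uses the days-hours:minutes:seconds format described in the directive table below. A few hypothetical alternative values for the same directive, to make the format concrete:

#SBATCH --time=0-00:10:00    ## 10 minutes
#SBATCH --time=0-12:00:00    ## 12 hours
#SBATCH --time=1-00:10:00    ## 1 day and 10 minutes (the value used in the script above)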

SBATCH Directives Explained

sbatch is used to submit a job script for later execution.

Lines that begin with #SBATCH (in all caps) are treated as directives by Slurm. This means that to comment out a Slurm directive, you need to add a second pound sign in front of it (#SBATCH is read as a directive, ##SBATCH is treated as a comment).
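For example, in the hypothetical fragment below the first directive is read by Slurm while the second is ignored:

#SBATCH --output=TestJob.out     ## active directive
##SBATCH --mail-type=END         ## commented out, ignored by Slurm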

The SBATCH lines in the script below contain directives that are recommended as defaults for all your job submissions. However, the directives with the --mail parameters are optional.

#SBATCH --job-name myJobName
#SBATCH --output myJobName.o%j
#SBATCH --partition normal
#SBATCH --ntasks 3
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu 700M
#SBATCH --time 0-00:10:00
#SBATCH --mail-user yourID@nmsu.edu
#SBATCH --mail-type BEGIN
#SBATCH --mail-type END
#SBATCH --mail-type FAIL
#SBATCH --get-user-env

Parameters Explained

--job-name
    Specifies a name for the job allocation. The specified name appears along with the job ID number when you query running jobs on the system. The default is the name of the batch script, or just sbatch if the script is read on sbatch’s standard input.

--output
    Instructs Slurm to connect the batch script’s standard output directly to the specified file name. If not specified, the default file name is slurm-<jobID>.out.

--partition
    Requests a specific partition for the resource allocation (gpu, interactive, normal). If not specified, the default partition is normal.

--ntasks
    Advises the Slurm controller that the job steps run within the allocation will launch at most this number of tasks, and requests that sufficient resources be allocated for them. The default is one task per node, but note that the --cpus-per-task option will change this default.

--cpus-per-task
    Advises the Slurm controller that ensuing job steps will require this number of processors per task. Without this option, the controller will simply try to assign one processor per task. For instance, consider an application with 4 tasks, each requiring 3 processors. If the HPC cluster is made up of quad-processor nodes and you simply ask for 12 processors, the controller might give you only 3 nodes. By using --cpus-per-task=3, however, the controller knows that each task requires 3 processors on the same node, so it grants an allocation of 4 nodes, one for each of the 4 tasks (a sketch of this request appears after these directive descriptions).

--mem-per-cpu
    The minimum memory required per allocated CPU. Note: it’s highly recommended to specify --mem-per-cpu. If you don’t, a default of 500MB per CPU will be assigned.

--time
    Sets a limit on the total run time of the job allocation. If the requested time limit exceeds the partition’s time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition’s default time limit. A time limit of zero requests that no time limit be imposed. The accepted time format is days-hours:minutes:seconds. Note: it’s mandatory to specify a time in your script. Jobs that don’t specify a time will be given a default limit of 1 minute, after which they are killed. This change was made to support the new backfill scheduling algorithm and doesn’t affect partition walltime.

--mail-user
    Defines the user who will receive email notification of state changes, as defined by --mail-type.

--mail-type
    Notifies the user by email when certain event types occur. Valid type values are BEGIN, END, and FAIL. The user to be notified is indicated with --mail-user. The values of the --mail-type directive can be declared on one line like so: --mail-type BEGIN,END,FAIL

--get-user-env
    Tells sbatch to retrieve the login environment variables. Be aware that any environment variables already set in sbatch’s environment will take precedence over any environment variables in the user’s login environment. Before calling sbatch, clear any environment variables that you don’t want propagated to the spawned program.
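To make the --cpus-per-task scenario above concrete, the sketch below shows a minimal, hypothetical resource request for 4 tasks with 3 processors each; the job name, output file, and application name (my_app) are placeholders.

#!/bin/bash
#SBATCH --job-name=fourTasks     ## hypothetical job name
#SBATCH --output=fourTasks.out
#SBATCH --ntasks=4               ## 4 tasks (processes)
#SBATCH --cpus-per-task=3        ## 3 processors per task, kept together on one node
#SBATCH --mem-per-cpu=500M
#SBATCH --time=0-00:10:00

## my_app stands in for your own multi-threaded program
srun ./my_app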

Creating and Submitting Jobs

Suppose you have a script in a programming language such as Python, MATLAB, C, or Java. How would you execute it using Slurm?

The section below explains, step by step, how to create and submit a simple job; an SBATCH script is created and used to execute a Python script.

  1. Login to Discovery

  2. Create a new folder in your home directory called myproject and switch into the directory

    $ mkdir myproject && cd myproject
  3. Create two new files called script.sh and script.py, then copy and paste the code from the script.sh and script.py listings below into them, respectively.

    $ vi script.sh && chmod +x script.sh

    The second command above (after the double ampersand), chmod +x script.sh, makes the file executable once you save and exit the text editor.

    $ vi script.py
    script.sh:

    #!/bin/bash
    
    #SBATCH --job-name=maxFib      ## Name of the job
    #SBATCH --output=maxFib.out    ## Output file
    #SBATCH --time=10:00           ## Job Duration
    #SBATCH --ntasks=1             ## Number of tasks (analyses) to run
    #SBATCH --cpus-per-task=1      ## The number of threads the code will use
    #SBATCH --mem-per-cpu=100M     ## Real memory(MB) per CPU required by the job.
    
    ## Load the python interpreter
    module load python
    
    ## Execute the python script and pass the argument/input '90'
    srun python script.py 90

    Here, 1 CPU with 100 MB of memory per CPU and 10 minutes of walltime were requested for the task (job step). If --ntasks were set to two, the python program would be executed twice (a sketch of this appears at the end of this step).

    Note that the number of tasks requested of Slurm is the number of processes that will be started by srun. After your script has been submitted and resources allocated, srun immediately executes the script on the remote host. It’s actually used to launch the processes. If your program is a parallel MPI program, srun takes care of creating all the MPI processes. If not, srun will run your program as many times as specified by the --ntasks option.

    script.py:

    import sys
    import os
    
    if len(sys.argv) != 2:
      print('Usage: %s MAXIMUM' % (os.path.basename(sys.argv[0])))
      sys.exit(1)
    
    maximum = int(sys.argv[1])
    
    n1 = 1
    n2 = 1
    
    while n2 <= maximum:
      n1, n2 = n2, n1 + n2
    
    print('The greatest Fibonacci number up to %d is %d' % (maximum, n1))

    The python program accepts an integer value as an argument and then finds the greatest Fibonacci number that does not exceed that value.
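    As noted above, setting --ntasks to two would make srun launch the program twice. A hypothetical variation of script.sh, showing only the changed directive and the srun line:

    #SBATCH --ntasks=2             ## srun now launches two copies of the program
    ## (all other directives and the module load line stay the same)
    srun python script.py 90

    The output file would then contain the result once per task:

    The greatest Fibonacci number up to 90 is 89
    The greatest Fibonacci number up to 90 is 89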

  4. Now, submit the batch script with the following command.

    $ sbatch script.sh

    After the job has been submitted, you should get an output similar to the one below but with a different jobid.

    Submitted batch job 215578

    You can use the command below to check the progress of your submitted job in the queue.

    syntax: squeue -u <your username>

    $ squeue -u vaduaka

    Output

    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    215578    normal   maxFib  vaduaka  R       0:01      1 discovery-c3
  5. Once your job has completed and is no longer in the queue, you can run the ls command to list the files in your working directory.

    $ ls
    maxFib.out  script.py  script.sh

    A new file called maxFib.out has been generated; if you view it with the cat command, you should see something similar to the output below.

    $ cat maxFib.out

    Output

    The greatest Fibonacci number up to 90 is 89

Showing Information on Jobs

The sacct command

To view the statistics of a completed job use the sacct command.

syntax: sacct -j <job id> or sacct -j <job id> --format=<params>

$ sacct -j 215578

Output

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
215578           maxFib     normal       nmsu          1  COMPLETED      0:0
215578.batch      batch                  nmsu          1  COMPLETED      0:0
215578.0         python                  nmsu          1  COMPLETED      0:0

You can get statistics (accounting data) on completed jobs by passing either the job ID or the username flag. Here, the command sacct -j 215578 shows statistics about the completed job: the partition the job executed on, the account, and the number of allocated CPUs per job step, as well as the exit code and state (COMPLETED, PENDING, FAILED, and so on) for the job and each of its steps.

The first column lists the job IDs of the individual job steps. The first two rows are default steps: the first is the job allocation as a whole and the second (215578.batch) is the batch script itself. The third row, 215578.0, contains information about the first process launched with srun. If there were more srun commands, the sub-job IDs would increment as 215578.1, 215578.2, and so on.
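For instance, a hypothetical batch script with two srun lines would map onto the numbered steps like this:

srun hostname              ## recorded by sacct as step <jobid>.0
srun python script.py 90   ## recorded by sacct as step <jobid>.1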

  • You can also pass other parameters to the sacct command to retrieve extra details about the job.

    $ sacct -j 215578 --format=JobID,Start,End,Elapsed,NCPUS

    Output

           JobID               Start                 End    Elapsed      NCPUS
    ------------ ------------------- ------------------- ---------- ----------
    215578       2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
    215578.batch 2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
    215578.0     2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1

    In the output above, you can see the start and end timestamps, the number of CPUs, and the elapsed time of the job.

  • You can also retrieve information about jobs that ran during a given period by passing start or end time flags, like so:

    sacct --starttime=2020-09-01 --format=jobid,jobname,exit,group,maxrss,comment,partition,nnodes,ncpus

    Output

           JobID    JobName ExitCode     Group     MaxRSS  Partition   NNodes  AllocCPUS      State
    ------------ ---------- -------- --------- ---------- ---------- -------- ---------- ----------
    213974             test      0:0   vaduaka                normal        1          3  COMPLETED
    213974.batch      batch      0:0                    0                   1          3  COMPLETED
    213974.exte+     extern      0:0                    0                   1          3  COMPLETED
    213974.0         python      0:0                    0                   1          1  COMPLETED
    213974.1         python      0:0                    0                   1          1  COMPLETED
    213974.2         python      0:0                    0                   1          1  COMPLETED
    215576           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
    215576.batch      batch      0:0                    0                   1          1  COMPLETED
    215576.exte+     extern      0:0                  88K                   1          1  COMPLETED
    215576.0         python      0:0                    0                   1          1  COMPLETED
    215577           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
    215577.batch      batch      0:0                    0                   1          1  COMPLETED
    215577.exte+     extern      0:0                  84K                   1          1  COMPLETED
    215577.0         python      0:0                    0                   1          1  COMPLETED
    215578           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
    215578.batch      batch      0:0                    0                   1          1  COMPLETED
    215578.exte+     extern      0:0                    0                   1          1  COMPLETED
    215578.0         python      0:0                    0                   1          1  COMPLETED
    215665           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
    215665.batch      batch      0:0                    0                   1          1  COMPLETED
    215665.exte+     extern      0:0                  92K                   1          1  COMPLETED
    215665.0         python      0:0                    0                   1          1  COMPLETED

    In the output above, you can see information about all the job steps carried out over the course of each job: the job name, exit code, user group, and the maximum resident set size of all tasks in the job (the amount of RAM used by each task), as well as the partition, number of nodes used, number of allocated CPUs, and the state of the job.

    For more details about using sacct, refer to the man sacct command.

    $ man sacct

    To view a list of possible parameters you could pass to retrieve specific job details, use the sacct -e command.

    $ sacct -e

    Output

        Account             AdminComment        AllocCPUS           AllocGRES
        AllocNodes          AllocTRES           AssocID             AveCPU
        AveCPUFreq          AveDiskRead         AveDiskWrite        AvePages
        AveRSS              AveVMSize           BlockID             Cluster
        Comment             Constraints         ConsumedEnergy      ConsumedEnergyRaw
        CPUTime             CPUTimeRAW          DBIndex             DerivedExitCode
        Elapsed             ElapsedRaw          Eligible            End
        ExitCode            Flags               GID                 Group
        JobID               JobIDRaw            JobName             Layout
        MaxDiskRead         MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite
        MaxDiskWriteNode    MaxDiskWriteTask    MaxPages            MaxPagesNode
        MaxPagesTask        MaxRSS              MaxRSSNode          MaxRSSTask
        MaxVMSize           MaxVMSizeNode       MaxVMSizeTask       McsLabel
        MinCPU              MinCPUNode          MinCPUTask          NCPUS
        NNodes              NodeList            NTasks              Priority
        Partition           QOS                 QOSRAW              Reason
        ReqCPUFreq          ReqCPUFreqMin       ReqCPUFreqMax       ReqCPUFreqGov
        ReqCPUS             ReqGRES             ReqMem              ReqNodes
        ReqTRES             Reservation         ReservationId       Reserved
        ResvCPU             ResvCPURAW          Start               State
        Submit              Suspended           SystemCPU           SystemComment
        Timelimit           TimelimitRaw        TotalCPU            TRESUsageInAve
        TRESUsageInMax      TRESUsageInMaxNode  TRESUsageInMaxTask  TRESUsageInMin
        TRESUsageInMinNode  TRESUsageInMinTask  TRESUsageInTot      TRESUsageOutAve
        TRESUsageOutMax     TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin
        TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot     UID
        User                UserCPU             WCKey               WCKeyID
        WorkDir
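    As a hypothetical example of combining fields from this list, the command below would report memory use and timing for a single job (the job ID is a placeholder):

    $ sacct -j <job-id> --format=JobID,JobName,MaxRSS,Elapsed,NCPUS,State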

The scontrol command

For detailed information about a running/pending job, use the scontrol command.

syntax: scontrol show jobid=<job id> or scontrol show jobid <job id>

$ scontrol show jobid 215578

Output

JobId=215578 JobName=maxFib
UserId=vaduaka(681432) GroupId=vaduaka(681432) MCS_label=N/A
Priority=191783 Nice=0 Account=nmsu QOS=normal
JobState=RUNNING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
SubmitTime=2020-09-04T09:53:11 EligibleTime=2020-09-04T09:53:11
AccrueTime=2020-09-04T09:53:11
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-09-04T09:53:11
Partition=normal AllocNode:Sid=10.88.37.12:12411
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=100M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=100M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/vaduaka/python/fibonacci/script.sh
WorkDir=/home/vaduaka/python/fibonacci
StdErr=/home/vaduaka/python/fibonacci/maxFib.out
StdIn=/dev/null
StdOut=/home/vaduaka/python/fibonacci/maxFib.out
Power=
MailUser=(null) MailType=NONE

Note that scontrol show jobid only reports on jobs that are still in the queue (pending or running); once a job has finished, use sacct to look it up instead. To see all other scontrol options, run man scontrol in your terminal.
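scontrol can also show details about other Slurm objects. For example, the queries below show details for a partition and for a single compute node; the partition and node names are taken from the outputs elsewhere on this page.

$ scontrol show partition normal     ## details of the normal partition
$ scontrol show node discovery-c3    ## details of one compute node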

The squeue command

squeue is useful for viewing the status of jobs in the queue and how resources are being allocated. It answers questions like: Have resources been allocated to my job yet? How long has my job been running?

Syntax: squeue -u <username>

$ squeue -u vaduaka

Output

 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
219373    normal camelCas  vaduaka PD       0:00      1 (Resources)
219370    normal   maxFib  vaduaka  R       0:01      1 discovery-c14
219371    normal camelCas  vaduaka  R       0:01      1 discovery-c14
219372    normal   maxFib  vaduaka  R       0:01      1 discovery-c14

Here, only your own jobs are shown because the -u flag and your username were passed as arguments.

If you want to see a list of all jobs in the queue, run squeue with no arguments. This reveals all the jobs on the partitions you are authorized to access. To see jobs running on other partitions as well, use the --all flag.

  • Example with squeue

    $ squeue

    Output summary

     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    218983    normal   run.sh  viviliu PD       0:00      1 (Resources)
    219407    normal   run.sh  viviliu PD       0:00      1 (Priority)
    217794    normal JackNema cvelasco  R 1-05:28:58      1 discovery-c14
    218985    normal      HWE gsmithvi  R    1:03:57      1 discovery-c12
    215745    normal    S809f bryanbar  R 5-03:25:57      3 discovery-c[9,11,13]
    217799    normal      LPT  pcg1996  R 1-05:15:04      6 discovery-c[2-4,6-8]
    214915    normal   run.sh  viviliu  R 4-19:25:13      2 discovery-c[1,6]
    216157  backfill  BatComp   jmwils  R 2-05:48:53      1 discovery-g10
    218982    normal   run.sh  viviliu  R    4:52:15      4 discovery-c[4,8,10,12]

    Job queue headers explained

    JOBID
        A unique identifier used by many Slurm commands when actions must be taken about one particular job.

    PARTITION
        The partition where the job is being executed.

    NAME
        The name of your job, as set with the --job-name parameter in your SBATCH script.

    USER
        The user who submitted the job (the job owner).

    ST (STATE)
        The current state of the job in abbreviated form:

        PD  PENDING     Job is awaiting resource allocation.
        CG  COMPLETING  Job has finished executing and some ongoing processes are being finalized.
        CD  COMPLETED   Job has completed successfully.
        R   RUNNING     Job has been allocated resources and is being processed by the compute node(s).
        F   FAILED      Job terminated with a non-zero exit code and stopped executing.

    TIME
        The job's run time so far. This starts counting only once the job has entered the running state.

    NODES
        The number of nodes used by the job.

    NODELIST(REASON)
        The list of nodes allocated to the job, or the reason the job is in a state other than running.

Requesting Resources

The sinfo command

The sinfo command is used to view partition and node information for a system running Slurm. It can answer questions such as: How many nodes are available, and in what state are they? What are my chances of getting a node soon?

Syntax: sinfo or sinfo [options]

sinfo

Output

PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*        up 7-01:00:00     11    mix discovery-c[1-5,8-13]
normal*        up 7-01:00:00      4   idle discovery-c[6-7,14-15]
gpu            up 7-01:00:00      1    mix discovery-g[1,16]
interactive    up 1-01:00:00      4   idle discovery-c[34-35],discovery-g[14-15]
backfill       up 14-02:00:0     13    mix discovery-c[1-5,8-13,16],discovery-g[1,16]
backfill       up 14-02:00:0     39   idle discovery-c[6-7,14-15,17-35],discovery-g[2-15],discovery-c[37-38]

The output above lists only the partitions on the Discovery cluster that you are authorized to use.

Example with --all flag

sinfo --all

Output

PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*         up 7-01:00:00      8    mix discovery-c[1-2,8-13]
normal*         up 7-01:00:00      7   idle discovery-c[3-7,14-15]
gpu             up 7-01:00:00      1    mix discovery-g[1,16]
interactive     up 1-01:00:00      2  idle* discovery-c[34-35]
interactive     up 1-01:00:00      2   idle discovery-g[14-15]
backfill        up 14-02:00:0      2  idle* discovery-c[34-35]
backfill        up 14-02:00:0     10    mix discovery-c[1-2,8-13,16],discovery-g[1,16]
backfill        up 14-02:00:0     40   idle discovery-c[3-7,14-15,17-33],discovery-g[2-15],discovery-c[37-38]
iiplab          up 7-01:00:00      1   idle discovery-g7
cfdlab          up 7-01:00:00      1    mix discovery-c16
cfdlab          up 7-01:00:00     14   idle discovery-c[17-25],discovery-g[2-6]
cfdlab-debug    up    1:00:00      1    mix discovery-c16
cfdlab-debug    up    1:00:00     14   idle discovery-c[17-25],discovery-g[2-6]
osg             up 1-01:00:00     10    mix discovery-c[1-2,8-13,16],discovery-g1
osg             up 1-01:00:00     38   idle discovery-c[3-7,14-15,17-33],discovery-g[2-13],discovery-c[37-38]
covid19         up 1-01:00:00     10    mix discovery-c[1-2,8-13,16],discovery-g1
covid19         up 1-01:00:00     38   idle discovery-c[3-7,14-15,17-33],discovery-g[2-13],discovery-c[37-38]

The output above lists all the partitions on the Discovery cluster.

The columns in the sinfo output are described below.

PARTITION
    The list of the cluster's partitions; a partition is a set of compute nodes grouped logically.

AVAIL
    The availability of the partition (up or down).

TIMELIMIT
    The maximum job execution walltime per partition.

NODES
    The number of nodes per partition in the given state.

STATE
    The state of the nodes:

    mix    Only part of the node is allocated to one or more jobs; the rest of its resources are idle.
    alloc  All of the node's resources are being utilized.
    idle   The node is idle and none of its resources are in use.

NODELIST
    The list of nodes per partition.

For more details about partitions on Discovery, please see the Partitions in Discovery page.
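If you are only interested in one partition, sinfo can be restricted to it. For example (partition names taken from the outputs above):

$ sinfo -p normal            ## show only the normal partition
$ sinfo -p backfill --long   ## long format with additional columns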

Terminating Jobs

The scancel or skill command

The scancel command is used to cancel your job in the queue, whether it is pending or running.

Syntax: scancel <jobid> or skill <jobid>

scancel 219373

Or

skill 219373

Please note that a user can't cancel another user's jobs.
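If you need to clear several of your own jobs at once, scancel also accepts a username or a job name. For example:

$ scancel -u <username>    ## cancel all of your pending and running jobs
$ scancel --name=maxFib    ## cancel your jobs that have the job name maxFib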