Slurm Commands

Commands List

Slurm offers many commands you can use to interact with the system and retrieve helpful information about your jobs. Below is a list of common Slurm commands:

Command         Syntax                       Description
sbatch          sbatch <job-script>          Submit a batch script to Slurm for processing.
squeue          squeue -u <username>         Show information about your job(s) in the queue. When run without the -u flag, it lists your job(s) along with all other jobs in the queue.
srun            srun <resource-parameters>   Run jobs interactively on the cluster.
skill/scancel   scancel <job-id>             End or cancel a queued or running job.
sacct           sacct                        Show information about current and previous jobs.
sinfo           sinfo                        Get information about the resources on the available nodes that make up the HPC cluster.

The screenshot below depicts an example use of all the commands.

(Screenshot: Slurm commands)
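Since the screenshot is not reproduced here, the following is a sketch of the kind of command sequence it illustrates; the script name, username, and job ID are placeholders:

sbatch example.sh          # submit a batch script
squeue -u <username>       # list your jobs in the queue
srun hostname              # run a command interactively on a compute node
scancel 219373             # cancel a job by its job ID
sacct                      # show information about current and previous jobs
sinfo                      # list the partitions and nodes available to you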

The sinfo Command

The sinfo command gives an overview of the resources offered by the cluster. It lists the available partitions; a partition is a set of compute nodes grouped logically. The command can answer questions such as: How many nodes are fully allocated? What are my chances of getting a node soon?

Syntax: sinfo or sinfo --[optional flags]

sinfo

Output

PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*        up 7-01:00:00      8    mix discovery-c[1-2,8-13]
normal*        up 7-01:00:00      7   idle discovery-c[3-7,14-15]
gpu            up 7-01:00:00      1    mix discovery-g[1,16]
interactive    up 1-01:00:00      4   idle discovery-c[34-35],discovery-g[14-15]
backfill       up 14-02:00:0     10    mix discovery-c[1-2,8-13,16],discovery-g[1,16]
backfill       up 14-02:00:0     42   idle discovery-c[3-7,14-15,17-35],discovery-g[2-15],discovery-c[37-38]

The output above lists only the partitions on the Discovery cluster that you are authorized to use.

Example with --all flag

sinfo --all

Output

PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*         up 7-01:00:00     11    mix discovery-c[1-5,8-13]
normal*         up 7-01:00:00      4   idle discovery-c[6-7,14-15]
gpu             up 7-01:00:00      1    mix discovery-g[1,16]
interactive     up 1-01:00:00      4   idle discovery-c[34-35],discovery-g[14-15]
backfill        up 14-02:00:0     13    mix discovery-c[1-5,8-13,16],discovery-g1
backfill        up 14-02:00:0     39   idle discovery-c[6-7,14-15,17-35],discovery-g[2-15],discovery-c[37-38]
iiplab          up 7-01:00:00      1   idle discovery-g7
cfdlab          up 7-01:00:00      1    mix discovery-c16
cfdlab          up 7-01:00:00     14   idle discovery-c[17-25],discovery-g[2-6]
cfdlab-debug    up    1:00:00      1    mix discovery-c16
cfdlab-debug    up    1:00:00     14   idle discovery-c[17-25],discovery-g[2-6]
osg             up 1-01:00:00     13    mix discovery-c[1-5,8-13,16],discovery-g[1,16]
osg             up 1-01:00:00     35   idle discovery-c[6-7,14-15,17-33],discovery-g[2-13],discovery-c[37-38]
covid19         up 1-01:00:00     13    mix discovery-c[1-5,8-13,16],discovery-g[1,16]
covid19         up 1-01:00:00     35   idle discovery-c[6-7,14-15,17-33],discovery-g[2-13],discovery-c[37-38]

The output above lists all of the partitions on the Discovery cluster, including those you are not authorized to use.

Header Explained

Header              Description
PARTITION           The list of the cluster's partitions; a partition is a set of compute nodes grouped logically.
AVAIL               The availability of the partition (for example, up or down).
TIMELIMIT           The maximum job execution walltime on the partition.
NODES               The number of nodes per partition in the given state.
STATE               The current state of the nodes (see the table below).
NODELIST            The list of nodes per partition.

State    Description
mix      Only part of the node is allocated to one or more jobs; the rest is idle.
alloc    All of the resources on the node(s) are being used.
idle     The node is idle; none of its resources are currently in use.
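If you want to narrow the output down, sinfo accepts additional flags; for example (the partition name here is taken from the output above, and both flags are standard sinfo options):

sinfo --partition=normal     # show only the normal partition
sinfo -N -l                  # node-oriented, long format: one row per node with its state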

The squeue Command

The squeue command shows how resources are currently allocated (for example, which node is used by which job). It lists the jobs that are currently running (R) or waiting for resources (pending, PD). It answers questions like: Where is my job in the queue? How long has my job been running?


Syntax: squeue -u <username>

$ squeue -u vaduaka

Output

 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
219373    normal camelCas  vaduaka PD       0:00      1 (Resources)
219370    normal   maxFib  vaduaka  R       0:01      1 discovery-c14
219371    normal camelCas  vaduaka  R       0:01      1 discovery-c14
219372    normal   maxFib  vaduaka  R       0:01      1 discovery-c14

The output above shows only your own jobs in the queue because the -u flag and your username were passed as arguments.

If you want to see a list of all jobs in the queue, run squeue without the -u flag. This reveals all the jobs running on the partitions you are authorized to access. To also see jobs running on other partitions, use the --all flag (see the examples below).

  • Example with squeue

    $ squeue

    Output summary

     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    218983    normal   run.sh  viviliu PD       0:00      1 (Resources)
    219407    normal   run.sh  viviliu PD       0:00      1 (Priority)
    217794    normal JackNema cvelasco  R 1-05:28:58      1 discovery-c14
    218985    normal      HWE gsmithvi  R    1:03:57      1 discovery-c12
    215745    normal    S809f bryanbar  R 5-03:25:57      3 discovery-c[9,11,13]
    217799    normal      LPT  pcg1996  R 1-05:15:04      6 discovery-c[2-4,6-8]
    214915    normal   run.sh  viviliu  R 4-19:25:13      2 discovery-c[1,6]
    216157  backfill  BatComp   jmwils  R 2-05:48:53      1 discovery-g10
    218982    normal   run.sh  viviliu  R    4:52:15      4 discovery-c[4,8,10,12]
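
  • Example with --all flag

    To also see jobs running on partitions you are not authorized to use, pass the --all flag (the output resembles the listing above but covers every partition, so it is omitted here):

    $ squeue --all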

Job queue header explained

Header              Description
JOBID               A unique identifier used by many Slurm commands when actions must be taken on a particular job.
PARTITION           The partition your job is running on.
NAME                The name of your job.
USER                The owner of the job.
ST                  The status of the job: (R) running, (PD) pending, (CG) completing.
TIME                How long the job has been running.
NODES               The number of nodes allocated to the job.
NODELIST(REASON)    The nodes allocated to running jobs; for pending jobs, this column gives the reason the job is pending.

In the example above, job 218983 is pending because the requested resources (CPUs or others) are not yet available in sufficient amounts, while job 219407 is pending because another job with higher priority must run first.
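As a follow-up, you can filter the queue by job state; for example, to list only your pending jobs together with the reason they are waiting (replace <username> with your username; --states is a standard squeue flag):

squeue -u <username> --states=PENDING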

The scancel Command

Use the scancel command to delete a job, for example, scancel 8603 deletes the job with ID 8603.  A user can delete his/her own jobs at any time, whether the job is pending (waiting in the queue) or running.  The command skill will do the same.

Syntax: scancel <jobid> or skill <jobid>

scancel 219373

Or

skill 219373

A user can't delete the jobs of another user.
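scancel can also operate on several of your jobs at once. A minimal sketch (replace <username> with your own username; both flags are standard scancel options):

scancel -u <username>                     # cancel all of your jobs, pending or running
scancel -u <username> --state=PENDING     # cancel only your pending jobs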

The srun Command

srun is a multipurpose command that can be used in a submission script to create job steps, or interactively from the terminal to launch processes on the cluster. For instance, if you have a parallel MPI program, srun takes care of creating all the MPI processes. Prefixing your commands with srun causes them to be executed as job steps on the compute nodes.

The srun command can be used in two ways:

  1. In job scripts

  2. Interactively

Srun used in a Job Script

#!/bin/bash

#SBATCH --job-name testJob
#SBATCH --output testJob.out
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 10:00
#SBATCH --mem-per-cpu 100M

srun echo "Start process"
srun hostname
srun sleep 30
srun echo "End process"

The srun command in this context runs your program as many times as specified by --ntasks. For example, if --ntasks=2, every srun job step is executed twice, as sketched below.
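A minimal sketch illustrating that behavior with --ntasks 2 (the job name and output file are placeholders); the single srun job step prints the hostname twice:

#!/bin/bash

#SBATCH --job-name testTasks
#SBATCH --output testTasks.out
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1
#SBATCH --time 05:00
#SBATCH --mem-per-cpu 100M

# Executed twice because --ntasks is 2
srun hostname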

Srun used Interactively

$ srun -n 1 --time=1:00:00 --partition=normal ./test.sh

The -n flag specifies the number of tasks (--ntasks) to run, the --time flag sets the maximum duration, and the --partition flag selects which partition (normal, backfill, and so on) to run your job in.

Srun Bash (Login Shell)

srun can also be used to start a login shell on a compute node and run commands interactively. Consider the example below.

srun -N 1 -n 2 -p backfill -t 00:02:00 --pty /bin/bash

A breakdown of the parts of the command:

Flag     Description
-N       The number of nodes to use. Same as --nodes.
-n       The number of tasks to run. Same as --ntasks.
-p       The partition to use. Same as --partition.
-t       The execution walltime. Same as --time.
--pty    Run the task in pseudoterminal mode (here, /bin/bash).

The command above means that you want to run a login shell (/bin/bash) on a compute node. When the command is executed, you automatically get an interactive session on one of the compute nodes, after which you can run your commands interactively. You are also placed in the working directory from which you launched the session. It's crucial to specify the --pty flag for this to work as intended.

Once you are logged into one of the compute nodes, any command you prefix with srun (for example, srun hostname) will be executed twice (-n 2), based on the srun flags declared above. You can keep running as many commands as you want, but bear in mind that your interactive session on the compute node will be killed when the specified walltime elapses. Please see the example below.

(Screenshot: srun-command.png, an example interactive srun session)

From the image, one can see that the session switches from the login node to one of the compute nodes after the srun command is executed. You can also see the output of the second command executed on the compute node, "Hello from discovery-c3.cluster.local". Note that the last command sends its output to a file named myhostnameFile.out, which appears in the working directory from which you launched the session.
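As a rough text version of what the screenshot shows (prompts are omitted; the node name and output file follow the narrative above and will differ on your system):

# On the login node: request an interactive shell on a compute node
srun -N 1 -n 2 -p backfill -t 00:02:00 --pty /bin/bash

# Now on the compute node (discovery-c3 in the screenshot); because of -n 2,
# each srun job step below runs twice
srun echo "Hello from $(hostname)"

# The last command writes its output to myhostnameFile.out in the working directory
srun hostname > myhostnameFile.out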

For more details on other useful srun flags, please see Slurm's documentation on srun: https://slurm.schedmd.com/srun.html

Srun for Parallel Execution

srun is mostly used to execute a parallel job on a cluster managed by Slurm. If run outside an existing allocation, it first creates a resource allocation in which to run the parallel job; inside a batch job, it launches job steps within the job's allocation.

#!/bin/bash

#SBATCH --job-name testJob
#SBATCH --output testJob.out
#SBATCH --ntasks 4
#SBATCH --cpus-per-task 1
#SBATCH --mem-per-cpu 500M
#SBATCH --time 00:10:00


srun -n 1 sleep 10 &
srun -n 1 sleep 20 &
srun -n 2 hostname &
wait

In the example above, there are 3 job steps and 4 tasks in total, which equals the --ntasks value. Each task gets 1 CPU, and each CPU gets 500 MB of memory. The last srun command is executed twice because it requests two tasks (-n 2).

The ampersand at the end of each srun command runs that job step in the background, so the steps execute simultaneously instead of each one blocking the next. When launching job steps this way, it is vital to end the script with the wait command: it makes the script wait for all job steps to finish, so that the completion of one step does not cause the remaining, still-running steps to be cancelled when the script reaches its end.

The total number of tasks across the job steps (the srun -n flags) must equal the --ntasks value. The -n flag specifies how many copies of a given program to run; for example, srun -n 2 hostname prints the hostname twice.

Srun for MPI Applications

The code below depicts an example of an MPI job resource request.

#!/bin/bash

#SBATCH --job-name=testJob
#SBATCH --output=testJob.out
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu 600M
#SBATCH --time 00:10:00

## Load MPI module
module load mpich/3.3.1-gcc-9.2.0-upe2lbc

## Number of threads to use
export OMP_NUM_THREADS=8

## Run MPI program
srun --mpi=pmi2 ./path/to/yourApp

This requests 8 tasks, which corresponds to 8 MPI ranks. Each task gets one core, and --ntasks-per-node=8 constrains all eight tasks to run on a single node. Note that on the last line the application is executed with srun --mpi=pmi2 rather than mpiexec or mpirun; this ensures that the process mapping is consistent between MPI and Slurm. Lines that begin with a double pound sign (##) are comments and are ignored during processing. OMP_NUM_THREADS regulates the number of threads to be used.

The sbatch Command

The main way to run jobs on Discovery is by submitting a script with the sbatch command. The job described by the Slurm directives (parameters) in the script runs as soon as the required resources are available. For example, assume the batch file is named example.sh. To submit it, type the following command:

sbatch example.sh

Output

Submitted batch job 9932
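For reference, a minimal example.sh might look like the sketch below (the job name, output file, and resource values are placeholders; adjust them to your own job):

#!/bin/bash

#SBATCH --job-name exampleJob
#SBATCH --output exampleJob.out
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 10:00
#SBATCH --mem-per-cpu 100M

# Job steps executed on the compute node
srun hostname
srun echo "End of example job"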