Job Array

Job arrays allow you to leverage SLURM’s ability to create multiple jobs from one script. For example, instead of having 5 submission scripts to run the same job step with different arguments, you can have one script that runs all 5 job steps at once. Situations where this is useful include:

  1. Running the same analysis program multiple times against different files or data sets.

  2. Running the same program multiple times with different arguments.

  3. Running a single program multiple times to analyze a single data file.

For more details, visit the Slurm Job Array Support documentation.

How to Use Job Arrays

To use job arrays in a submission script, you need to use the SLURM directive --array. Add #SBATCH --array=x-y to the script, where x is the index of the first job and y is the index of the last one. The task ID values specified in the option argument may be given in several ways:

  1. Submit a job array with comma-separated index values: #SBATCH --array=2,4,6 # (2, 4, 6) 3 jobs

  2. Submit a job array with index values from x to y: #SBATCH --array=1-15 # (1, 2, 3, …, 15) 15 jobs

  3. Submit a job array with index values between 10 and 20 with a step size of 2: #SBATCH --array=10-20:2 # (10, 12, 14, …, 20) 6 jobs

Job array size is limited by the partition’s maximum number of submitted jobs. For example, you cannot submit a job array with the range 1-22 to the normal partition, because its maximum number of submitted jobs is 20.

The maximum number of concurrent running jobs from the job array can be limited using the % separator. For example, #SBATCH --array=1-20%4 limits the number of simultaneously running tasks from this job array to 4.
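Putting these directives together, a minimal submission script might look like the following sketch (the job name and the resource values shown are placeholders, not requirements):

#!/bin/bash

#SBATCH --job-name=array_sketch
#SBATCH --partition=normal
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --array=1-20%4    # 20 tasks, at most 4 running at the same time

# Each task runs this same script; SLURM_ARRAY_TASK_ID holds its index
# (see the Environment Variables section below).
echo "This is array task $SLURM_ARRAY_TASK_ID"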

Environment Variables

Slurm sets the following environment variables for each task in a job array:

  • SLURM_ARRAY_JOB_ID: the first job ID of the array.

  • SLURM_ARRAY_TASK_ID: the job array index value of the current task.

  • SLURM_ARRAY_TASK_COUNT: the number of tasks in the job array.

  • SLURM_ARRAY_TASK_MAX: the highest job array index value.

  • SLURM_ARRAY_TASK_MIN: the lowest job array index value.

For example, #SBATCH --array=1-3 will create a job array with three jobs. The resulting variables set for each job may look something like this:

# 1
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

# 2
SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

# 3
SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
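A common use of these variables is to map each array task to a different input. The following sketch assumes a plain-text file named inputs.txt with one input file name per line; inputs.txt and the analysis command my_analysis are placeholders for illustration:

#!/bin/bash

#SBATCH --job-name=array_inputs
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --array=1-3

# Pick line number SLURM_ARRAY_TASK_ID from the (assumed) inputs.txt,
# then hand that file to a placeholder analysis program.
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)
srun my_analysis "$INPUT_FILE"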

Output Files

When a job array is submitted with #SBATCH --array=x-y, the filename patterns %A and %a may be used to name the files that capture stdout and stderr. %A will be replaced by the value of SLURM_ARRAY_JOB_ID and %a will be replaced by the value of SLURM_ARRAY_TASK_ID. For example,

#SBATCH --array=1-3
#SBATCH --output=array_example_%A_%a.out
#SBATCH --error=array_example_%A_%a.err

will create 3 output files and 3 error files. If the array’s job ID is 101, files named array_example_101_<taskID>.out and array_example_101_<taskID>.err will be written for the three tasks of the job array (array_example_101_1.out, array_example_101_2.out, etc.).

Squeue Command

By default, the squeue command groups the jobs of a single job array by job state: pending tasks are reported on one line, with an expression indicating their array_task_id values ([11-20] in the example below), while running tasks are listed individually.

$squeue -u user
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     24628_[11-20]    normal    array    user PD       0:00      1 (QOSMaxJobsPerUserLimit)
           24628_1    normal    array    user  R       0:03      1 discovery-c2
           24628_2    normal    array    user  R       0:03      1 discovery-c3
           24628_3    normal    array    user  R       0:03      1 discovery-c15
           24628_4    normal    array    user  R       0:03      1 discovery-c15
           24628_5    normal    array    user  R       0:03      1 discovery-c15
           24628_6    normal    array    user  R       0:03      1 discovery-c15
           24628_7    normal    array    user  R       0:03      1 discovery-c15
           24628_8    normal    array    user  R       0:03      1 discovery-c15
           24628_9    normal    array    user  R       0:03      1 discovery-c15
          24628_10    normal    array    user  R       0:03      1 discovery-c15

Use the --array or -r option with the squeue command to display one job/task per line.

$squeue -u user -r
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          24668_11    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_12    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_13    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_14    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_15    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_16    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_17    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_18    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_19    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
          24668_20    normal    array    user  PD       0:00      1 (QOSMaxJobsPerUserLimit)
           24668_1    normal    array    user  R        0:07      1 discovery-c2
           24668_2    normal    array    user  R        0:07      1 discovery-c3
           24668_3    normal    array    user  R        0:07      1 discovery-c15
           24668_4    normal    array    user  R        0:07      1 discovery-c15
           24668_5    normal    array    user  R        0:07      1 discovery-c15
           24668_6    normal    array    user  R        0:07      1 discovery-c15
           24668_7    normal    array    user  R        0:07      1 discovery-c15
           24668_8    normal    array    user  R        0:07      1 discovery-c15
           24668_9    normal    array    user  R        0:07      1 discovery-c15
          24668_10    normal    array    user  R        0:07      1 discovery-c15

scancel Command

You can use the scancel command to cancel all or specific elements of a job array.

  • Cancel an individual job in a job array:

    scancel 24668_7
  • Cancel a subset of the jobs in a job array:

    scancel 24668_[7-10]
  • Cancel the complete job array:

    scancel 24668

Examples

Example 1

With the following job submission script, SLURM will create 20 array tasks, each requesting one CPU and 1G of memory, because each task inherits the resource directives specified at the top of the script. Each task sleeps for 10 seconds and then prints its task ID to its own output file. In other words, the script will create 20 output files, one for each task.

#!/bin/bash

#SBATCH --job-name=array
#SBATCH --array=1-20
#SBATCH --time=01:00:00
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --output=array_%A-%a.out

# Print the task id.
srun bash -c "sleep 10; echo 'My SLURM_ARRAY_TASK_ID:' $SLURM_ARRAY_TASK_ID"

Listing all files in your current working directory shows one output file per task:

array_24809-10.out  array_24809-16.out  array_24809-2.out  array_24809-8.out
array_24809-11.out  array_24809-17.out  array_24809-3.out  array_24809-9.out
array_24809-12.out  array_24809-18.out  array_24809-4.out  subJobArray.sh
array_24809-13.out  array_24809-19.out  array_24809-5.out
array_24809-14.out  array_24809-1.out   array_24809-6.out
array_24809-15.out  array_24809-20.out  array_24809-7.out

To check the output of task 24809_14, print the contents of the file array_24809-14.out:

$cat array_24809-14.out
My SLURM_ARRAY_TASK_ID: 14

To cancel task 8 while it is running, simply run:

scancel 24809_8

To cancel the complete job array, run:

scancel 24809