Job Array
Job arrays let you use Slurm's ability to create multiple jobs from one script. For example, instead of maintaining 5 submission scripts that run the same job step with different arguments, you can submit one script that creates 5 array tasks. Situations where this is useful include:
- Running the same analysis program multiple times against different files or data sets.
- Running the same program multiple times with different arguments.
- Running a single program multiple times to analyze a single data file.
For more details, see the Slurm Job Array Support documentation.
How to Use Job Arrays
To use job arrays in a submission script, add the Slurm directive --array to the script as #SBATCH --array=x-y, where x is the index of the first task and y is the index of the last one.
The index range specified in the option argument can take the following forms:
- Submit a job array with comma-separated index values: #SBATCH --array=2,4,6 # (2, 4, 6) 3 jobs
- Submit a job array with index values from x to y: #SBATCH --array=1-15 # (1, 2, 3, …, 15) 15 jobs
- Submit a job array with index values between 10 and 20 with a step size of 2: #SBATCH --array=10-20:2 # (10, 12, 14, …, 20) 6 jobs
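To make the three range forms concrete, the following bash sketch expands an --array value into the list of task indices it describes. The function expand_array_spec is a hypothetical helper written for illustration, not part of Slurm; it ignores any % limit and performs no validation.

```shell
#!/bin/bash
# Hypothetical helper: print the task indices an --array value expands to.
# Handles "a,b,c", "x-y" and "x-y:step" (and comma-separated mixes of them).
# Illustration only: this is not Slurm's parser and does no validation.
expand_array_spec() {
  local spec="${1%%\%*}"   # drop a "%N" limit, if any
  local part first last step i
  local -a parts=() out=()
  IFS=',' read -ra parts <<< "$spec"
  for part in "${parts[@]}"; do
    step=1
    if [[ $part == *:* ]]; then      # "x-y:step" form
      step="${part##*:}"
      part="${part%%:*}"
    fi
    if [[ $part == *-* ]]; then      # "x-y" range
      first="${part%%-*}"
      last="${part##*-}"
    else                             # single index
      first="$part"
      last="$part"
    fi
    for (( i = first; i <= last; i += step )); do
      out+=("$i")
    done
  done
  echo "${out[*]}"
}

expand_array_spec "2,4,6"     # prints: 2 4 6
expand_array_spec "1-15"      # prints the 15 indices 1 through 15
expand_array_spec "10-20:2"   # prints: 10 12 14 16 18 20
```

Counting the printed indices reproduces the job counts in the list above (3, 15 and 6).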
The size of a job array is limited by the partition's maximum number of submitted jobs. For example, you cannot submit a job array with the range 1-22 to the normal partition, because its maximum number of submitted jobs is 20.
The maximum number of simultaneously running tasks from a job array can be limited with the % separator. For example, #SBATCH --array=1-20%4 limits the number of simultaneously running tasks from this job array to 4.
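The % separator splits the option value into two parts: the index range before it and the concurrency limit after it. The short bash sketch below, using a hypothetical split_array_spec function that is not part of Slurm, shows that split with plain parameter expansion.

```shell
#!/bin/bash
# Hypothetical helper: split an --array value into its index range and its
# optional "%N" limit on simultaneously running tasks. Illustration only.
split_array_spec() {
  local spec="$1"
  local range="${spec%%\%*}"     # text before the first "%"
  local throttle="none"          # no "%" means no array-specific limit
  if [[ $spec == *%* ]]; then
    throttle="${spec##*\%}"      # text after the last "%"
  fi
  echo "range=${range} throttle=${throttle}"
}

split_array_spec "1-20%4"   # prints: range=1-20 throttle=4
split_array_spec "1-20"     # prints: range=1-20 throttle=none
```

Without a % limit, other partition or QOS limits still decide how many tasks run at once.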
Environment Variables
| Slurm Environment Variable | Description |
|---|---|
| SLURM_ARRAY_JOB_ID | The first job ID of the array. |
| SLURM_ARRAY_TASK_ID | The job array index value. |
| SLURM_ARRAY_TASK_COUNT | The number of tasks in the job array. |
| SLURM_ARRAY_TASK_MAX | The highest job array index value. |
| SLURM_ARRAY_TASK_MIN | The lowest job array index value. |
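As a sketch of how a script might use these variables, the bash example below prints a one-line summary for the current task and runs an extra step only in the task with the highest index. The helper names describe_task and is_highest_index and the fallback values are assumptions for illustration; inside a real job Slurm sets the variables itself.

```shell
#!/bin/bash
#SBATCH --array=1-3

# Hypothetical helper: one-line summary of an array task's position.
describe_task() {
  local array_job_id="$1" task_id="$2" task_count="$3"
  echo "Task index ${task_id} (one of ${task_count} tasks) in array ${array_job_id}"
}

# Hypothetical helper: succeeds only when the index equals the highest index.
is_highest_index() {
  [ "$1" -eq "$2" ]
}

# Inside a job Slurm sets these variables; the ":-" fallbacks are made-up
# values (task 1 of a 1-3 array) so the script can also be tried locally.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
describe_task "${SLURM_ARRAY_JOB_ID:-36}" "$task_id" "${SLURM_ARRAY_TASK_COUNT:-3}"
if is_highest_index "$task_id" "${SLURM_ARRAY_TASK_MAX:-3}"; then
  echo "This task has the highest index in the array"
fi
```

Note that the task with the highest index is not necessarily the last one to finish.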
For example, #SBATCH --array=1-3 will create a job array with three tasks. The variables set for each task may look something like this:
# 1
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
# 2
SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
# 3
SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
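A common use of SLURM_ARRAY_TASK_ID is to give each task a different input, as in the first use case listed at the start of this page. The bash sketch below assumes three hypothetical input files (sample_a.csv, sample_b.csv, sample_c.csv) and a hypothetical my_analysis program, and assumes a step size of 1 so that subtracting SLURM_ARRAY_TASK_MIN yields a zero-based array index.

```shell
#!/bin/bash
#SBATCH --job-name=array-files
#SBATCH --array=1-3
#SBATCH --output=array_files_%A_%a.out

# Hypothetical input files, one per task; replace with your own data sets.
inputs=(sample_a.csv sample_b.csv sample_c.csv)

# Map a task index to an input file. Bash arrays are zero-based, so the
# lowest index (SLURM_ARRAY_TASK_MIN) maps to inputs[0]. Assumes a step of 1.
pick_input() {
  local task_id="$1" task_min="$2"
  echo "${inputs[task_id - task_min]}"
}

# Inside a job Slurm sets these variables; the ":-1" fallbacks only make
# the script runnable outside Slurm for a quick test.
input="$(pick_input "${SLURM_ARRAY_TASK_ID:-1}" "${SLURM_ARRAY_TASK_MIN:-1}")"
echo "Task ${SLURM_ARRAY_TASK_ID:-1} processes ${input}"
# ./my_analysis "$input"   # hypothetical analysis program
```

With --array=1-3, task 1 gets sample_a.csv, task 2 gets sample_b.csv and task 3 gets sample_c.csv.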
Output Files
With #SBATCH --array=x-y, two filename patterns, %A and %a, can be used to name the files that capture stdin, stdout and stderr (the --input, --output and --error options). %A is replaced by the value of SLURM_ARRAY_JOB_ID and %a is replaced by the value of SLURM_ARRAY_TASK_ID.
For example,
#SBATCH --array=1-3
#SBATCH --output=array_example_%A_%a.out
#SBATCH --error=array_example_%A_%a.err
will create 3 output files and 3 error files. If the array's job ID is 101, the three tasks of the job array write array_example_101_1.out, array_example_101_2.out and array_example_101_3.out, plus the matching .err files.
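The substitution described above can be mimicked in a few lines of bash. The helper expand_output_pattern is hypothetical and handles only %A and %a; Slurm performs the real substitution itself and supports further patterns not covered here.

```shell
#!/bin/bash
# Hypothetical helper: replace %A with the array job ID and %a with the
# task index in an --output/--error file name pattern. Illustration only.
expand_output_pattern() {
  local pattern="$1" array_job_id="$2" task_id="$3"
  pattern="${pattern//\%A/$array_job_id}"   # every %A -> array job ID
  pattern="${pattern//\%a/$task_id}"        # every %a -> task index
  echo "$pattern"
}

for task in 1 2 3; do
  expand_output_pattern "array_example_%A_%a.out" 101 "$task"
done
# prints:
# array_example_101_1.out
# array_example_101_2.out
# array_example_101_3.out
```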
Squeue Command
By default, the squeue command groups the pending tasks of a job array on a single line and uses a range expression to indicate their array_task_id values (for example 24628_[11-20] below), while running tasks are listed individually.
$ squeue -u user
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
24628_[11-20] normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24628_1 normal array user R 0:03 1 discovery-c2
24628_2 normal array user R 0:03 1 discovery-c3
24628_3 normal array user R 0:03 1 discovery-c15
24628_4 normal array user R 0:03 1 discovery-c15
24628_5 normal array user R 0:03 1 discovery-c15
24628_6 normal array user R 0:03 1 discovery-c15
24628_7 normal array user R 0:03 1 discovery-c15
24628_8 normal array user R 0:03 1 discovery-c15
24628_9 normal array user R 0:03 1 discovery-c15
24628_10 normal array user R 0:03 1 discovery-c15
Use the --array (or -r) option with the squeue command to display one job array task per line.
$ squeue -u user -r
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
24668_11 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_12 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_13 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_14 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_15 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_16 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_17 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_18 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_19 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_20 normal array user PD 0:00 1 (QOSMaxJobsPerUserLimit)
24668_1 normal array user R 0:07 1 discovery-c2
24668_2 normal array user R 0:07 1 discovery-c3
24668_3 normal array user R 0:07 1 discovery-c15
24668_4 normal array user R 0:07 1 discovery-c15
24668_5 normal array user R 0:07 1 discovery-c15
24668_6 normal array user R 0:07 1 discovery-c15
24668_7 normal array user R 0:07 1 discovery-c15
24668_8 normal array user R 0:07 1 discovery-c15
24668_9 normal array user R 0:07 1 discovery-c15
24668_10 normal array user R 0:07 1 discovery-c15
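With one task per line, the output is easy to summarize. The bash sketch below counts tasks per state from squeue -r output; the helper count_task_states is hypothetical and assumes the default column layout shown above, with ST as the fifth whitespace-separated column and job names without spaces.

```shell
#!/bin/bash
# Hypothetical helper: read "squeue -r" output on stdin, skip the header
# line, and print "STATE COUNT" for each value of the ST (5th) column.
count_task_states() {
  awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a system with Slurm:
#   squeue -u "$USER" -r | count_task_states
# For the listing above this would print "PD 10" and "R 10".
```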
Scancel Command
You can use the scancel command to cancel all or specific tasks of a job array.
- Cancel an individual task in a job array:
scancel 24668_7
- Cancel a subset of the tasks in a job array:
scancel 24668_[7-10]
- Cancel the complete job array:
scancel 24668
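When cancelling a range of tasks from a script, it helps to quote the target so the shell cannot interpret the square brackets as a filename pattern. The hypothetical wrapper cancel_array_range below builds the JOBID_[first-last] form used above; the DRY_RUN=1 switch is also an assumption of this sketch and prints the command instead of running scancel.

```shell
#!/bin/bash
# Hypothetical wrapper: cancel tasks first..last of array job job_id.
# The target is quoted so the shell never glob-expands "[first-last]".
cancel_array_range() {
  local job_id="$1" first="$2" last="$3"
  local target="${job_id}_[${first}-${last}]"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "scancel '${target}'"   # show what would run
  else
    scancel "$target"            # requires Slurm
  fi
}

DRY_RUN=1 cancel_array_range 24668 7 10   # prints: scancel '24668_[7-10]'
```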
Examples
Example 1
With the following job submission script, Slurm creates 20 array tasks. Each array task requests --ntasks=1 and 1G of memory, because every task inherits the resource directives specified at the top of the script. Each task sleeps for 10 seconds and then writes its task ID to its own output file, so the script creates 20 output files, one per task.
#!/bin/bash
#SBATCH --job-name=array
#SBATCH --array=1-20
#SBATCH --time=01:00:00
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --output=array_%A-%a.out
# Sleep for 10 seconds, then print this task's array index.
srun bash -c "sleep 10; echo 'My SLURM_ARRAY_TASK_ID:' $SLURM_ARRAY_TASK_ID"
After the job finishes, list the files in your current working directory:
$ ls
array_24809-10.out array_24809-16.out array_24809-2.out array_24809-8.out
array_24809-11.out array_24809-17.out array_24809-3.out array_24809-9.out
array_24809-12.out array_24809-18.out array_24809-4.out subJobArray.sh
array_24809-13.out array_24809-19.out array_24809-5.out
array_24809-14.out array_24809-1.out array_24809-6.out
array_24809-15.out array_24809-20.out array_24809-7.out
To check the output of task 24809_14, print the content of the file array_24809-14.out:
$ cat array_24809-14.out
My SLURM_ARRAY_TASK_ID: 14
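Rather than opening each file by hand, a short bash loop can confirm that every output file from this example contains the expected line. The helper check_outputs is hypothetical; it assumes the array_%A-%a.out naming and the "My SLURM_ARRAY_TASK_ID: N" line printed by the script above.

```shell
#!/bin/bash
# Hypothetical check: for tasks first..last of array job_id, verify that
# array_<job_id>-<task>.out exists and contains the line the script prints.
# Returns 0 only if every file is present and correct.
check_outputs() {
  local job_id="$1" first="$2" last="$3" task file
  local problems=0
  for (( task = first; task <= last; task++ )); do
    file="array_${job_id}-${task}.out"
    if ! grep -qxF "My SLURM_ARRAY_TASK_ID: ${task}" "$file" 2>/dev/null; then
      echo "problem with ${file}"
      problems=$((problems + 1))
    fi
  done
  echo "${problems} file(s) with problems"
  [ "$problems" -eq 0 ]
}

# Usage in the submission directory of this example:
#   check_outputs 24809 1 20
```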
To cancel task 8 while it is running, run:
scancel 24809_8
To cancel the complete job array, run:
scancel 24809