How to use Slurm on Joker

Slurm is the job scheduler used on Joker. All users must submit their jobs through Slurm to run programs on the compute nodes. A submitted job may wait in the queue (pending) rather than start right away. The scheduler is configured to give all users fair access to the available resources, and each job starts as soon as it reaches its turn in the queue and the resources it requested are free.

Users can manage their jobs with several basic commands:

  1. sacct reports job or job step accounting information about active or completed jobs; use sacct --help for more information.
  2. sinfo reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options; use sinfo --help for more information.
  3. squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order; use squeue --help for more information.
  4. scancel cancels a pending or running job or job step. It can also send an arbitrary signal to all processes associated with a running job or job step; use scancel --help for more information.
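These commands are often combined into a quick status check from a login node. The script below is a minimal sketch, assuming it runs on a Joker login node where the Slurm client commands are on PATH. The sacct field list is one reasonable choice, not a site requirement, and the job ID 12345 is a placeholder.

```bash
#!/bin/bash
# Quick status check using sinfo, squeue, and sacct.
# Assumes the Slurm client commands are on PATH (as on a Joker login node).
user="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
  # Partitions and the state of their nodes
  sinfo
  # Your own running and pending jobs
  squeue -u "$user"
  # Accounting for your jobs from today (sacct's default window starts at midnight)
  sacct -u "$user" --format=JobID,JobName,State,Elapsed,ExitCode
else
  echo "Slurm commands not found; run this on a Joker login node." >&2
fi

# scancel is destructive, so these examples are left commented out.
# 12345 is a placeholder job ID; take real IDs from squeue or sacct.
#   scancel 12345          # cancel a single job
#   scancel -u "$user"     # cancel all of your own jobs
```

The scancel lines are kept as comments so the script can be run repeatedly without side effects.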

 

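To submit a job to the queue, pass a job script to sbatch. The #SBATCH lines at the top of the script specify the resources the job requests; use sbatch --help for more information.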
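A batch job is an ordinary shell script whose #SBATCH comment lines carry the resource requests. The script below is a minimal sketch. The job name, resource amounts, time limit, and output file name are illustrative assumptions. Joker may also require options such as --partition, whose values are site-specific; sinfo lists the partitions that exist.

```bash
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --output=hello_%j.out

# sbatch reads the #SBATCH lines above and bash ignores them as comments:
# one task with one CPU core, a 10-minute wall-clock limit, and output
# written to hello_<jobid>.out (%j expands to the job ID).

# Slurm sets SLURM_JOB_ID inside a job; "none" means the script was
# run directly with bash instead of through sbatch.
job_id="${SLURM_JOB_ID:-none}"
echo "Job ${job_id} running on $(hostname)"
```

Save it under any name, for example the hypothetical hello.sbatch, and submit it with `sbatch hello.sbatch`. sbatch prints the ID of the new job, which you can then follow with squeue or sacct.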

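To run a job interactively, use srun. Be aware that srun can also be called inside an sbatch script, but it behaves differently there. Instead of requesting a new allocation, it launches a job step that runs within the resources already allocated to the batch job.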
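The two uses of srun can be contrasted in a short sketch. The resource values are assumptions. The interactive command appears as a comment because you type it at a login-node prompt rather than placing it in a script. The fallback branch, which runs the command once without srun outside a Slurm job, is an illustrative addition so the script can be tried off the cluster.

```bash
#!/bin/bash
#SBATCH --job-name=steps
#SBATCH --ntasks=2
#SBATCH --time=00:05:00

# Interactive use, typed at a login-node prompt (not part of this script):
#   srun --ntasks=1 --time=00:30:00 --pty bash -i
# This requests a new allocation and, once it is granted, opens a shell
# on a compute node.

# Inside a batch script, srun does not request a new allocation. It starts
# a job step in the allocation sbatch already obtained and, by default,
# runs one copy of the command per task (two here).
ntasks="${SLURM_NTASKS:-1}"
echo "Launching ${ntasks} task(s)"
if [ -n "${SLURM_JOB_ID:-}" ]; then
  srun hostname
else
  # Not inside a Slurm job: run the command once directly.
  hostname
fi
```

When submitted with sbatch, the job's output file should contain one hostname line per task.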

Answers to some frequently asked questions can be found here. The University of Utah provides a more detailed tutorial.