How to use slurm on Discovery — sbatch

To submit a job to the queue, use the sbatch script generator or follow the instructions below.

sbatch is used to submit a job script for later execution. The script will typically contain one or more commands to launch parallel tasks; use sbatch -h for more information.

NOTE: slurm is sloppy with its word usage.  For the computer literate, each node consists of 2 CPUs, each with Y cores, and each core can run 2 threads, giving 2xYx2 threads per node.  For slurm, however, each node has 2xYx2 CPUs (also referred to as cores).  This can cause a lot of confusion for those who understand the differences between the definitions of CPU, core, and thread.  Please understand that what slurm calls a CPU is really a hardware thread. (This may cause you a headache, and for that we blame the slurm developers.)

Also note: Some programs don’t recognize threads.  In this case, if you want to occupy the whole node, you will need to reserve the maximum number of threads, but your program will only see the number of physical cores available (the number of threads/slurm CPUs divided by 2).  For example, Matlab doesn’t recognize threads, so if you reserve 48 threads (--cpus-per-task 48) and then check the number of workers, Matlab will report 24.
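If you want to see how slurm’s “CPUs” map onto physical cores on a node, a minimal sketch along these lines can help; it assumes a 48-thread node, and the job name is just a placeholder. $SLURM_CPUS_PER_TASK reports what slurm allocated, while lscpu shows how the hardware counts sockets, cores, and threads.

  #!/bin/sh
  #SBATCH --job-name threadCheck   ## placeholder job name
  #SBATCH --nodes 1
  #SBATCH --ntasks 1
  #SBATCH --cpus-per-task 48       ## reserve every slurm "CPU" (hardware thread) on the node

  ## The number of threads slurm allocated to this task (what slurm calls CPUs)
  echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"

  ## How the hardware counts them: sockets x cores per socket x threads per core
  lscpu | grep -E 'Socket|Core|Thread'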
Below are examples of how to write scripts and submit them to slurm using sbatch:

  • Example 1 — Simple submission; “boiler plate” example
    • The first step in creating a batch job is to write a batch file. This is a simple shell script that tells Slurm how and what to do for your job. In the example below, let’s assume the batch file is named example1.sh.
      #!/bin/sh
      #SBATCH --job-name myJobName ##name that will show up in the queue
      #SBATCH --output myJobName.o%j ##filename of the output; the "%j" will append the jobID to the end of the name making the output files unique despite the same job name; default is slurm-[jobID].out
      #SBATCH --partition normal ##the partition to run in [options: normal, gpu, debug]; default = normal
      #SBATCH	--nodes 1 ##number of nodes to use; default = 1
      #SBATCH --ntasks 3 ##number of tasks (analyses) to run; default = 1
      #SBATCH --cpus-per-task 16 ##the number of threads the code will use; default = 1
      #SBATCH	--time 0-00:05:00 ##time for analysis (day-hour:min:sec) -- Max walltime will vary by partition; time formats are: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"
      #SBATCH --mail-user yourID@nmsu.edu   ##your email address
      #SBATCH --mail-type BEGIN ##slurm will email you when your job starts
      #SBATCH --mail-type END ##slurm will email you when your job ends
      #SBATCH --mail-type FAIL ##slurm will email you when your job fails
      #SBATCH --get-user-env ##passes along environmental settings 
      
      module load myprogram
      
      ##-- After all the modules/programs needed are called, put in your code
      ## myprogram input4myprogram 
      ## Ex:
      /bin/hostname

      sbatch scripts are unique in how they are read. In shell scripts, any line that starts with # is considered a comment. However, any line that starts with #SBATCH (SBATCH in all caps) is treated as a command by slurm. This means that to comment out a slurm command, put a second # at the beginning of the line (ex: #SBATCH means slurm command, ##SBATCH means skip). Note: the “cpus-per-task” value times the “ntasks” value needs to fit within the thread count of the requested “nodes”.  Ex: 1 node has a max of 48 threads, so the “cpus-per-task” value times the “ntasks” value must not exceed 48, otherwise you will get back an error.
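      A quick sketch of that arithmetic, assuming the 48-thread nodes described above:

      ## Fits on one 48-thread node: 3 tasks x 16 threads per task = 48
      #SBATCH --nodes 1
      #SBATCH --ntasks 3
      #SBATCH --cpus-per-task 16

      ## Would NOT fit on one node: 4 tasks x 16 threads per task = 64 > 48,
      ## so a request like this would come back with an error
      ##SBATCH --ntasks 4
      ##SBATCH --cpus-per-task 16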

    • To submit an sbatch script to slurm, simply type “sbatch [inputSbatchScript].sh”.
      [user@Discovery ~]$ sbatch example1.sh
      Submitted batch job 253296
      [user@Discovery ~]$ ls
      example1.sh  myJobName.o253296

      From the example, a file named myJobName.o253296 has been created. This is the output from our job; slurm always creates an output file for batch jobs when they start to execute. By default the output file will be named slurm-<job #>.out unless otherwise specified.

    • Looking into the output file:
      [user@Discovery ~]$ cat myJobName.o253296
      Discovery-c12

      The /bin/hostname command on a Unix system prints the name of the system it is being run on, so we can see that Slurm ran our job on the compute node named Discovery-c12.

  • Example 2 – Picking a queue

    Discovery has three different queues, normal, gpu, and debug.

    The normal queue has a limit of 12 nodes (496 threads), but an allocation of that size may take a while to obtain. The normal queue is the default queue; if you don’t specify a queue, your job will be queued in the normal queue. The gpu queue gives you access to the GPU and is the least used at the moment.  The debug queue has access to 2 nodes (Discovery-g1 and Discovery-c1), but a much shorter walltime.
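    To pick a queue, set the --partition option in your batch file. A minimal sketch for a short test run in the debug queue (the job name and time limit are just placeholders):

    #!/bin/sh
    #SBATCH --job-name debugTest     ## placeholder name
    #SBATCH --partition debug        ## run in the debug queue instead of the default normal queue
    #SBATCH --nodes 1
    #SBATCH --ntasks 1
    #SBATCH --time 0-00:05:00        ## keep it short; the debug queue has a much shorter max walltime

    /bin/hostname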

  • Example 3 – How to use a program that you installed and run programs in series
    #!/bin/sh
    #SBATCH --ntasks 1
    #SBATCH --nodes 1
    
    module load matlab/r2015a
    
    cd somePath/nextPartOfPath
    
    matlab <input for matlab>
    ./path2myProgram/myFavProgram <input for my program>
    
    

    How to read this script: In this example, we request 1 node and 1 task.  The default is 1 “cpu-per-task” (read: 1 thread per task), so we have requested 1 thread on 1 node. As these are the only parameters we specified, the others are defaults, including the job name, job output, etc.  Also note that we have not asked slurm to notify us about the state of our submission, so we won’t know when the run starts or whether it errored out until we log back onto Discovery.

    What this script does: 1, Loads the matlab module. 2, Changes directories.  The sbatch command remembers what directory you were in when submitting the job; to maintain good notes, it’s useful to write directory changes/starting locations into the sbatch script. 3, Runs a script using matlab. 4, Runs myFavProgram. To run a program installed in your own environment (i.e. somewhere within your home directory), you need to preface it with “./” and give the path to the executable relative to your current working directory; you don’t have to include “/home/userID/”. Note: This script runs the matlab process to completion before myFavProgram starts (in series).  By default the shell will still move on to myFavProgram even if matlab errors out; if the second program depends on the one before it (ex: the output of program 1 is the input for program 2), make the script stop on failure, as in the sketch below.
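    A minimal sketch of that stop-on-failure pattern, using the same placeholder program names and paths as above:

    #!/bin/sh
    #SBATCH --ntasks 1
    #SBATCH --nodes 1

    set -e    ## stop the script at the first command that exits with an error

    module load matlab/r2015a

    cd somePath/nextPartOfPath

    ## With set -e, myFavProgram only runs if matlab finished successfully
    matlab <input for matlab>
    ./path2myProgram/myFavProgram <input for my program>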

  • Example 4 – How to run programs in parallel (very useful if you need lots of small-resource, independent jobs run)
    #!/bin/sh
    #SBATCH --nodes 1
    #SBATCH --ntasks 2 
    #SBATCH --cpus-per-task 24 
    
    module load matlab/r2015a
    
    srun --preserve-env --multi-prog ./myfile.conf
    

    How to read this script: In this example, we request 1 node, 2 tasks, and 24 cpus-per-task (threads).  This means we have requested 48 threads on 1 node and plan on running 2 tasks. As these are the only parameters we specified, the others are defaults.  Note: We are running 2 programs/analyses for the price of 1 job.

    What this script does: 1, Loads the matlab module. 2, Calls srun on our conf file.  srun allows us to use the “--preserve-env” flag, which means whatever is in our environment, including parameters we might have changed, is preserved.  The “--multi-prog” flag tells srun that we will be calling multiple programs.  Calling srun like this means that we will be running our two programs in parallel: the two programs will be started at the same time, but will finish (or error out) independently of each other.  The resources of the programs must match; in this case, both programs will get 24 threads to work with.  If one needed 24 and the other 16, we would need to submit two independent sbatch jobs.

    myfile.conf:

    0 matlab <path2inputfile/inputFile>
    1 ./path2myProgram/myFavProgram <input for my program>
    

    The above is the myfile.conf. The first column is the number of the task (in Linux/Unix, counting always starts with 0, not 1!), the second column is the program to be run, and the rest of the line is the input for that program. If you have a way of monitoring the output of your programs (ex: things are being continuously written to an output file), you can watch both files increase in size. If the programs were run in series (see: Example 3), you would see the first file increase in size until complete, and only then would the second one appear and grow.
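    One simple way to keep an eye on both tasks, assuming each program writes to its own log file (the file names here are hypothetical):

    ## From a login session, refresh the sizes of both log files every 60 seconds
    watch -n 60 ls -lh output_task0.log output_task1.log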

  • Example 5
    #!/bin/sh
    #SBATCH -n 1
    #SBATCH -N 1
    srun tar zxf julia-0.3.11.tar.gz
    echo "prefix=/software/julia-0.3.11" > julia/Make.user
    cd julia
    srun make

    The example is a batch file used to compile the Julia programming language on Discovery. The -n 1 tells Slurm we’re going to have one task per job step; each time we invoke srun, that is a job step. If we had set -n 2, then srun would start the tar command twice because we asked for two tasks per step. In this case the tar command will be run on one compute node. The next two commands don’t have the srun prefix. This is an important point: anything they do will affect the environment for any later commands run via srun. For example, the step “cd julia” changes the current working directory to “julia”, a directory created by the tar command. By default, batch jobs start in whatever directory you were in when you issued the sbatch command. Because we changed the current working directory, when srun starts the make command, it will be run from the julia directory.
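    A minimal sketch that shows this working-directory behavior on its own (the julia directory is the one from the example above):

    #!/bin/sh
    #SBATCH -n 1
    #SBATCH -N 1
    srun pwd        ## job step runs in the directory you submitted from
    cd julia        ## plain shell command: changes the batch script's working directory
    srun pwd        ## later job steps inherit the change and run from the julia directory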

  • Example 6 – MPI jobs

    In this example we’ll run an MPI program on 20 cores.

    #!/bin/sh
    #SBATCH -n 20
    #SBATCH --mpi=pmi2
    #SBATCH -o myoutputfile.txt
    module load mpi/mpich-x86_64
    mpirun -np 20 mpiprogram < inputfile.txt

    First note that we’ve asked for 20 tasks per job step. For MPI programs it is one CPU per process, so -n 20 will create 20 processes using 20 cores. Next we ask slurm to use the “pmi2” MPI type; this is the appropriate type for MPICH programs. We also tell Slurm that the output should be written to “myoutputfile.txt” instead of the default “slurm-<job #>.out”. Next we use the module command (more details about the module command are provided later in this document) to load the MPICH environment. This will adjust your PATH and LD_LIBRARY_PATH to include MPICH; those settings will be passed on to anything run via srun. Finally we launch our program with the mpirun command.

  • Example 7 – OpenMP and other multithreaded jobs

    OpenMP jobs differ from MPI jobs in that we have a single process using multiple threads. So we want to tell Slurm to give us as many cores as necessary, up to the maximum number of cores in a single node, but only launch a single process.

    #!/bin/sh
    #SBATCH -n 1
    #SBATCH -N 1
    #SBATCH -c 16
    ./multithreaded_program

    In this batch file, we ask Slurm for one node (-N 1), one process/task (-n 1), and 16 cores assigned to that task (-c 16).
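    Many OpenMP programs size their thread pool from the OMP_NUM_THREADS environment variable, so it is usually worth setting it from the slurm allocation rather than hard-coding a number. A minimal sketch of the same job with that addition (the program name is the same placeholder as above):

    #!/bin/sh
    #SBATCH -n 1
    #SBATCH -N 1
    #SBATCH -c 16
    ## Use exactly the number of threads (slurm "CPUs") assigned to this task
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    ./multithreaded_program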

    #!/bin/sh
    #SBATCH -n 2
    #SBATCH -N 1
    #SBATCH -c 8
    srun ./multithreaded_program

    In this case, we’ve asked slurm to run two instances of our program (-n 2), each getting 8 cores (-c 8). Since each compute node has 16 cores and we specified -N 1, they will run on the same node. If we didn’t specify -N 1, Slurm would be free to run the two processes on different nodes.