Showing Job Statistics
Showing Information on Jobs
The sacct command
To view the statistics of a completed job, use the sacct command.
Syntax: sacct -j <job id> or sacct -j <job id> --format=<params>
$ sacct -j 215578
Output
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
215578           maxFib     normal       nmsu          1  COMPLETED      0:0
215578.batch      batch                  nmsu          1  COMPLETED      0:0
215578.0         python                  nmsu          1  COMPLETED      0:0
You can get statistics (accounting data) on completed jobs by passing either the job ID (-j) or the username (-u) flag. Here, the command sacct -j 215578 shows statistics about the completed job: the partition the job executed on, the account, and the number of allocated CPUs per job step. The exit code and state (COMPLETED, PENDING, FAILED, and so on) are also displayed for the job and each of its job steps.
The first column lists the ID of the job and of each of its job steps. Rows 1 and 2 are created by default: the first row is the job as a whole, and the second (215578.batch) is the batch step, which runs the job script containing the SBATCH directives. The third row, 215578.0, contains the information about the first process launched with srun. If the script contained more srun commands, the step IDs would increment as 215578.1, 215578.2, and so on.
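To make the row naming concrete, here is a minimal shell sketch that labels a JobID value by the kind of row it belongs to and counts the srun steps of a job. The function names classify_jobid and count_srun_steps are hypothetical helpers, not part of Slurm, and the sketch assumes plain numeric job IDs (no job arrays). The extern row that appears in some sacct listings is handled as its own case.

```shell
# Hypothetical helper: label a sacct JobID value by row type.
classify_jobid() {
    case "$1" in
        *.batch) echo "batch script step" ;;
        *.exte*) echo "extern step" ;;          # may be shown truncated as "exte+"
        *.*)     echo "srun step ${1##*.}" ;;   # text after the last dot is the step number
        *)       echo "whole job" ;;
    esac
}

# Hypothetical helper: count numbered srun steps among JobID values
# read from stdin, one value per line (assumes plain numeric job IDs).
count_srun_steps() {
    grep -c -E '^[0-9]+\.[0-9]+$'
}
```

On a cluster, the step count for the job above could be obtained with sacct -j 215578 --noheader --parsable2 --format=JobID | count_srun_steps, where --noheader drops the header line and --parsable2 prints unpadded, pipe-delimited values.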
You can also pass other parameters to the sacct command to retrieve extra details about the job.
$ sacct -j 215578 --format=JobID,Start,End,Elapsed,NCPUS
Output
       JobID               Start                 End    Elapsed      NCPUS
------------ ------------------- ------------------- ---------- ----------
215578       2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
215578.batch 2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
215578.0     2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
In the output above, you can see the Start and End timestamps, the number of CPUs, and the Elapsed time of the job and of each job step.
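When the Elapsed value is needed in a script, it may help to turn it into seconds. The sketch below assumes Elapsed is printed as HH:MM:SS, optionally prefixed with a day count (D-HH:MM:SS); elapsed_to_seconds is a hypothetical helper name, not a Slurm command. The ElapsedRaw field, which appears in the sacct -e field list, reports seconds directly and may be the simpler choice where it is available.

```shell
# Hypothetical helper: convert a Slurm duration to seconds.
# Assumes the form [D-]HH:MM:SS (MM:SS also works).
elapsed_to_seconds() {
    awk -v t="$1" 'BEGIN {
        days = 0
        # Split off an optional leading day count, e.g. "1-02:03:04".
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        # Fold the colon-separated parts into seconds, left to right.
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}
```

For example, elapsed_to_seconds 01:30:00 prints 5400, and the 00:00:00 values in the output above convert to 0.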
You can also retrieve information about jobs that ran during a given period by passing a start time or end time flag, like so:
$ sacct --starttime=2020-09-01 --format=jobid,jobname,exit,group,maxrss,comment,partition,nnodes,ncpus
Output
       JobID    JobName ExitCode     Group     MaxRSS  Partition   NNodes  AllocCPUS      State
------------ ---------- -------- --------- ---------- ---------- -------- ---------- ----------
213974             test      0:0   vaduaka                normal        1          3  COMPLETED
213974.batch      batch      0:0                    0                   1          3  COMPLETED
213974.exte+     extern      0:0                    0                   1          3  COMPLETED
213974.0         python      0:0                    0                   1          1  COMPLETED
213974.1         python      0:0                    0                   1          1  COMPLETED
213974.2         python      0:0                    0                   1          1  COMPLETED
215576           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215576.batch      batch      0:0                    0                   1          1  COMPLETED
215576.exte+     extern      0:0                  88K                   1          1  COMPLETED
215576.0         python      0:0                    0                   1          1  COMPLETED
215577           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215577.batch      batch      0:0                    0                   1          1  COMPLETED
215577.exte+     extern      0:0                  84K                   1          1  COMPLETED
215577.0         python      0:0                    0                   1          1  COMPLETED
215578           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215578.batch      batch      0:0                    0                   1          1  COMPLETED
215578.exte+     extern      0:0                    0                   1          1  COMPLETED
215578.0         python      0:0                    0                   1          1  COMPLETED
215665           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215665.batch      batch      0:0                    0                   1          1  COMPLETED
215665.exte+     extern      0:0                  92K                   1          1  COMPLETED
215665.0         python      0:0                    0                   1          1  COMPLETED
The output above lists every job that ran in the requested period, together with the job steps carried out during each job. For each row it shows the job name, exit code, user group, and maximum resident set size (MaxRSS, the peak amount of RAM used by any task in that step). It also shows the partition, the number of nodes used, the number of allocated CPUs, and the state of the job.
For more details about using sacct, use the man sacct command.
$ man sacct
To view a list of possible parameters you could pass to retrieve specific job details, use the sacct -e command.
$ sacct -e
Output
Account             AdminComment        AllocCPUS           AllocGRES
AllocNodes          AllocTRES           AssocID             AveCPU
AveCPUFreq          AveDiskRead         AveDiskWrite        AvePages
AveRSS              AveVMSize           BlockID             Cluster
Comment             Constraints         ConsumedEnergy      ConsumedEnergyRaw
CPUTime             CPUTimeRAW          DBIndex             DerivedExitCode
Elapsed             ElapsedRaw          Eligible            End
ExitCode            Flags               GID                 Group
JobID               JobIDRaw            JobName             Layout
MaxDiskRead         MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite
MaxDiskWriteNode    MaxDiskWriteTask    MaxPages            MaxPagesNode
MaxPagesTask        MaxRSS              MaxRSSNode          MaxRSSTask
MaxVMSize           MaxVMSizeNode       MaxVMSizeTask       McsLabel
MinCPU              MinCPUNode          MinCPUTask          NCPUS
NNodes              NodeList            NTasks              Priority
Partition           QOS                 QOSRAW              Reason
ReqCPUFreq          ReqCPUFreqMin       ReqCPUFreqMax       ReqCPUFreqGov
ReqCPUS             ReqGRES             ReqMem              ReqNodes
ReqTRES             Reservation         ReservationId       Reserved
ResvCPU             ResvCPURAW          Start               State
Submit              Suspended           SystemCPU           SystemComment
Timelimit           TimelimitRaw        TotalCPU            TRESUsageInAve
TRESUsageInMax      TRESUsageInMaxNode  TRESUsageInMaxTask  TRESUsageInMin
TRESUsageInMinNode  TRESUsageInMinTask  TRESUsageInTot      TRESUsageOutAve
TRESUsageOutMax     TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin
TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot     UID
User                UserCPU             WCKey               WCKeyID
WorkDir
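Fields from this list can be combined with the --noheader and --parsable2 flags to produce output that is easier to process in a script. As a sketch, the hypothetical helper peak_rss_per_job below reads JobID|MaxRSS lines from standard input and prints the largest MaxRSS found across the steps of each job. It assumes MaxRSS values carry a K, M, or G suffix and treats unsuffixed values as bytes; that last point is an assumption of this sketch, so check it against your site's sacct output.

```shell
# Hypothetical helper: read "JobID|MaxRSS" lines from stdin and print,
# for each job, the largest MaxRSS among its steps, expressed in KiB.
peak_rss_per_job() {
    awk -F'|' '
        # Convert a value such as "88K" or "2M" to KiB.
        function to_kib(v,   n, u) {
            if (v == "") return 0
            u = substr(v, length(v))     # last character is the unit suffix
            n = v + 0                    # leading numeric part
            if (u == "K") return n
            if (u == "M") return n * 1024
            if (u == "G") return n * 1024 * 1024
            return n / 1024              # assumption: no suffix means bytes
        }
        {
            job = $1
            sub(/\..*$/, "", job)        # strip the step suffix (.batch, .0, ...)
            kib = to_kib($2)
            if (!(job in peak)) { peak[job] = 0; order[++count] = job }
            if (kib > peak[job]) peak[job] = kib
        }
        END { for (i = 1; i <= count; i++) print order[i], peak[order[i]] "K" }
    '
}
```

A possible invocation for the time window used earlier is sacct --starttime=2020-09-01 --noheader --parsable2 --format=JobID,MaxRSS | peak_rss_per_job, which prints one line per job in the order the jobs first appear.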