Showing Job Statistics
Showing Information on Jobs
The sacct command
To view the statistics of a completed job, use the sacct command.
Syntax: sacct -j <job id>
or sacct -j <job id> --format=<params>
$ sacct -j 215578
Output
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
215578           maxFib     normal       nmsu          1  COMPLETED      0:0
215578.batch      batch                  nmsu          1  COMPLETED      0:0
215578.0         python                  nmsu          1  COMPLETED      0:0
You can get statistics (accounting data) on completed jobs by passing either a job ID or a username to sacct. Here, the command sacct -j 215578 shows statistics about one completed job: the partition the job ran on, the account, and the number of CPUs allocated to each job step. It also shows the exit code and state (COMPLETED, PENDING, FAILED, and so on) of the job and of each job step.
The first column lists the job ID and the IDs of its job steps. The first two rows appear by default: the first row is the job as a whole, and the second, 215578.batch, is the batch step that runs the job script. The third row, 215578.0, describes the first process launched with srun. If the script contained more srun commands, the step IDs would continue as 215578.1, 215578.2, and so on.
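The step naming described above can also be handled in a script. The sketch below is an illustration, not part of Slurm: it parses pipe-delimited text in the layout that sacct prints when given the --parsable2 flag, then labels each row as the job itself, the batch step, or a numbered srun step. The helper names parse_sacct and step_kind are hypothetical, and the sample text is retyped from the output above rather than captured from a live cluster.

```python
import csv
import io

# Sample text in the pipe-delimited layout of `sacct --parsable2`,
# retyped from the table above (not captured from a live cluster).
SAMPLE = """\
JobID|JobName|Partition|Account|AllocCPUS|State|ExitCode
215578|maxFib|normal|nmsu|1|COMPLETED|0:0
215578.batch|batch||nmsu|1|COMPLETED|0:0
215578.0|python||nmsu|1|COMPLETED|0:0
"""


def parse_sacct(text):
    """Return one dict per row, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def step_kind(job_id):
    """Classify a JobID value by its suffix (hypothetical helper)."""
    _, dot, step = job_id.partition(".")
    if not dot:
        return "job"                  # no suffix: the job as a whole
    if step.isdigit():
        return "srun step " + step    # numeric suffix: launched with srun
    return step + " step"             # named suffix such as "batch"


rows = parse_sacct(SAMPLE)
for row in rows:
    print(row["JobID"], "->", step_kind(row["JobID"]), "-", row["State"])
```

On a cluster, the same parser could be fed the text produced by sacct -j <job id> --parsable2 instead of the sample string.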
You can also pass other parameters to the sacct command to retrieve extra details about the job.
$ sacct -j 215578 --format=JobID,Start,End,Elapsed,NCPUS
Output
       JobID               Start                 End    Elapsed      NCPUS
------------ ------------------- ------------------- ---------- ----------
215578       2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
215578.batch 2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
215578.0     2020-09-04T09:53:11 2020-09-04T09:53:11   00:00:00          1
The output above shows the Start and End timestamps, the number of CPUs, and the Elapsed time of the job and of each of its job steps.
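To compare run times across jobs, it can help to turn the Elapsed column into a number. The following is a minimal sketch, assuming Elapsed values look like HH:MM:SS (as in the output above) with an optional leading day count such as 1-02:03:04. The function name elapsed_to_seconds is a hypothetical helper, not something sacct provides.

```python
def elapsed_to_seconds(elapsed):
    """Convert an Elapsed string such as '1-02:03:04' to whole seconds.

    Assumes the layout [DD-][HH:]MM:SS with integer fields.
    """
    days, dash, clock = elapsed.rpartition("-")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)            # pad a missing hours field with zero
    hours, minutes, seconds = parts
    total = hours * 3600 + minutes * 60 + seconds
    if dash:
        total += int(days) * 86400    # a day count was present
    return total


print(elapsed_to_seconds("00:00:00"))    # value shown in the output above
print(elapsed_to_seconds("1-02:03:04"))  # made-up value with a day count
```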
You can also retrieve information about jobs that ran during a given period by passing a start time or end time flag, like so:
$ sacct --starttime=2020-09-01 --format=jobid,jobname,exit,group,maxrss,comment,partition,nnodes,ncpus
Output
       JobID    JobName ExitCode     Group     MaxRSS  Partition   NNodes  AllocCPUS      State
------------ ---------- -------- --------- ---------- ---------- -------- ---------- ----------
213974             test      0:0   vaduaka                normal        1          3  COMPLETED
213974.batch      batch      0:0                    0                   1          3  COMPLETED
213974.exte+     extern      0:0                    0                   1          3  COMPLETED
213974.0         python      0:0                    0                   1          1  COMPLETED
213974.1         python      0:0                    0                   1          1  COMPLETED
213974.2         python      0:0                    0                   1          1  COMPLETED
215576           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215576.batch      batch      0:0                    0                   1          1  COMPLETED
215576.exte+     extern      0:0                  88K                   1          1  COMPLETED
215576.0         python      0:0                    0                   1          1  COMPLETED
215577           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215577.batch      batch      0:0                    0                   1          1  COMPLETED
215577.exte+     extern      0:0                  84K                   1          1  COMPLETED
215577.0         python      0:0                    0                   1          1  COMPLETED
215578           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215578.batch      batch      0:0                    0                   1          1  COMPLETED
215578.exte+     extern      0:0                    0                   1          1  COMPLETED
215578.0         python      0:0                    0                   1          1  COMPLETED
215665           maxFib      0:0   vaduaka                normal        1          1  COMPLETED
215665.batch      batch      0:0                    0                   1          1  COMPLETED
215665.exte+     extern      0:0                  92K                   1          1  COMPLETED
215665.0         python      0:0                    0                   1          1  COMPLETED
The output above lists every job started since the given date, together with its job steps. Each row shows the job name, exit code, user group, and MaxRSS, the maximum resident set size of all tasks in the job (the peak amount of RAM used by a task), followed by the partition, the number of nodes used, the number of allocated CPUs, and the state.
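MaxRSS values such as 88K carry a unit suffix, so they need converting before they can be compared or summed. The sketch below shows one way to do that. It assumes the suffixes K, M, G, and T stand for powers of 1024 and that an empty cell means no value was recorded; both are assumptions of this example, so check man sacct for the behaviour on your cluster. The name maxrss_to_bytes is a hypothetical helper.

```python
# Assumption: suffixes are binary multiples (1K = 1024 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def maxrss_to_bytes(value):
    """Convert a MaxRSS cell such as '88K' to bytes; return None if empty."""
    value = value.strip()
    if not value:
        return None                   # job-level rows leave MaxRSS blank
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(float(value[:-1]) * UNITS[suffix])
    return int(float(value))          # plain number, taken as bytes


# MaxRSS cells of job 215576 and its steps, copied from the output above.
cells = ["", "0", "88K", "0"]
sizes = [b for b in map(maxrss_to_bytes, cells) if b is not None]
print(max(sizes))
```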
For more details about using sacct, see its manual page with the man sacct command.
$ man sacct
To view a list of the fields you can pass to --format to retrieve specific job details, use the sacct -e command.
$ sacct -e
Output
Account AdminComment AllocCPUS AllocGRES AllocNodes AllocTRES AssocID AveCPU AveCPUFreq AveDiskRead AveDiskWrite AvePages AveRSS AveVMSize BlockID Cluster Comment Constraints ConsumedEnergy ConsumedEnergyRaw CPUTime CPUTimeRAW DBIndex DerivedExitCode Elapsed ElapsedRaw Eligible End ExitCode Flags GID Group JobID JobIDRaw JobName Layout MaxDiskRead MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask MaxPages MaxPagesNode MaxPagesTask MaxRSS MaxRSSNode MaxRSSTask MaxVMSize MaxVMSizeNode MaxVMSizeTask McsLabel MinCPU MinCPUNode MinCPUTask NCPUS NNodes NodeList NTasks Priority Partition QOS QOSRAW Reason ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ReqCPUS ReqGRES ReqMem ReqNodes ReqTRES Reservation ReservationId Reserved ResvCPU ResvCPURAW Start State Submit Suspended SystemCPU SystemComment Timelimit TimelimitRaw TotalCPU TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot UID User UserCPU WCKey WCKeyID WorkDir