Partition QoS vs User QoS
Partition QoS
Every partition has an associated Quality of Service (QoS) that defines limits such as MaxJobs and MaxSubmitJobs for that partition. These limits apply to the jobs that users submit to the partition. Run the following scontrol command to see the settings for the normal partition.
scontrol show partition normal
Output:
PartitionName=normal
AllowGroups=discovery-users_normal,pkgmgr AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=p-normal
DefaultTime=00:01:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=7-01:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=discovery-c[1-15]
PriorityJobFactor=1 PriorityTier=25 RootOnly=NO ReqResv=NO OverSubscribe=FORCE:1
OverTimeLimit=NONE PreemptMode=SUSPEND
State=UP TotalCPUs=624 TotalNodes=15 SelectTypeParameters=NONE
JobDefaults=DefCpuPerGPU=4
DefMemPerCPU=512 MaxMemPerNode=UNLIMITED
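When the QoS name is needed in a script, it can be extracted from the scontrol output instead of being read by eye. The following is a minimal sketch, assuming the output format shown above; the helper name `partition_qos` is hypothetical and not part of Slurm. It matches only the token that starts with `QoS=`, so `AllowQos=ALL` is ignored.

```shell
# Hypothetical helper: read `scontrol show partition` output on stdin
# and print the value of the QoS= field.
partition_qos() {
    # Walk every whitespace-separated KEY=VALUE token and keep only the one
    # that starts with "QoS=" (case-sensitive, so AllowQos= is skipped).
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^QoS=/) print substr($i, 5) }'
}

# Example on a cluster with Slurm installed:
#   scontrol show partition normal | partition_qos
```

For the output above, this prints `p-normal`.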
The scontrol output shows that the QoS for the normal partition is `QoS=p-normal`. Run the following command to find information about the p-normal QoS.
Syntax: sacctmgr show qos where name=<qos-name> format=<field1,field2,...>
sacctmgr show qos where name=p-normal format=name,maxJobs,maxSubmit
Output:
      Name MaxJobs MaxSubmit
---------- ------- ---------
  p-normal      10        20
The output shows the limits defined by the p-normal QoS for the normal partition. The MaxJobs limit is 10, which means you can have at most 10 jobs running at the same time. The MaxSubmit limit is 20, which means you can have at most 20 jobs submitted to the normal partition at once, counting both running and pending jobs. With 20 jobs submitted, up to 10 can be running while the remaining 10 wait in the queue.
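The relationship between the two limits comes down to simple arithmetic. The sketch below assumes the limits are supplied in the pipe-delimited form that `sacctmgr -nP` prints (for example `p-normal|10|20`) and that the caller supplies the number of jobs they currently have running or pending. The function name `submit_headroom` is hypothetical.

```shell
# Hypothetical helper: given one "name|MaxJobs|MaxSubmit" line on stdin and
# the number of jobs currently running or pending as $1, print how many more
# jobs can still be submitted under the MaxSubmit limit.
submit_headroom() {
    current=$1
    IFS='|' read -r name max_jobs max_submit

    # An empty field means no limit is set for this QoS.
    if [ -z "$max_submit" ]; then
        echo "$name: no MaxSubmit limit set"
        return 0
    fi

    remaining=$((max_submit - current))
    # Never report a negative number of free slots.
    [ "$remaining" -lt 0 ] && remaining=0
    echo "$name: $remaining more job(s) can be submitted (at most $max_jobs running at once)"
}

# Example on a cluster with Slurm installed (the count of 12 is illustrative):
#   sacctmgr -nP show qos where name=p-normal format=name,maxjobs,maxsubmit \
#       | submit_headroom 12
```

One way to obtain the current job count, assuming the default squeue listing of pending and running jobs is what should be counted, is `squeue -h -u "$USER" -p normal | wc -l`.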
Similarly, a different QoS is defined for every partition on the cluster. The table below shows the QoS information for the partitions.
QoS | MaxJobs | MaxSubmit | Flags |
---|---|---|---|
p-normal | 10 | 20 | DenyOnLimit |
p-gpu | 2 | 4 | DenyOnLimit |
p-interactive | 3 | 3 | DenyOnLimit |
p-backfill | 10 | 20 | DenyOnLimit |
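A table like the one above can be regenerated from the accounting database. The following sketch assumes that the partition QoS names all start with `p-`, as they do in the table, and reformats the pipe-delimited output of `sacctmgr -nP show qos format=name,maxjobs,maxsubmit,flags`. The helper name `partition_qos_table` is hypothetical.

```shell
# Hypothetical helper: read "name|MaxJobs|MaxSubmit|Flags" lines on stdin and
# print a header plus one aligned row for every QoS whose name starts with "p-".
partition_qos_table() {
    printf '%-15s %8s %10s  %s\n' QoS MaxJobs MaxSubmit Flags
    awk -F'|' '$1 ~ /^p-/ { printf "%-15s %8s %10s  %s\n", $1, $2, $3, $4 }'
}

# Example on a cluster with Slurm installed:
#   sacctmgr -nP show qos format=name,maxjobs,maxsubmit,flags | partition_qos_table
```

Filtering on the `p-` prefix keeps any other QoS entries out of the listing; adjust the pattern if the site uses a different naming scheme.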
User QoS
Similar to a partition QoS, Slurm also supports a user QoS, which is attached to a user rather than to a partition. On Discovery, however, no user QoS is defined. For more information about QoS, refer to https://slurm.schedmd.com/qos.html