Partition Details
What are Partitions?
Partitions are work queues, each with its own set of rules/policies and its own set of compute nodes on which jobs run. The available partitions include normal, interactive, and backfill, among others. Run the command below to list the partitions available in Discovery.
Syntax: sinfo
$ sinfo
Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 7-01:00:00 30 mix discovery-c[2-14,16-25,28,31],discovery-g[2,5,8-10]
normal* up 7-01:00:00 13 alloc discovery-c[15,26-27,29-30,32-33,37-38],discovery-g[1,6,11,16]
normal* up 7-01:00:00 4 idle discovery-g[3-4,12-13]
interactive up 1-01:00:00 1 mix discovery-c34
interactive up 1-01:00:00 3 idle discovery-c35,discovery-g[14-15]
backfill up 14-02:00:0 30 mix discovery-c[2-14,16-25,28,31],discovery-g[2,5,8-10]
backfill up 14-02:00:0 15 alloc discovery-c[15,26-27,29-30,32-33,36-38],discovery-g[1,6-7,11,16]
backfill up 14-02:00:0 4 idle discovery-g[3-4,12-13]
The output lists all the partitions available in Discovery as of February 2024.
The state alloc denotes that the nodes are allocated to jobs.
The state mix implies that some CPUs in the nodes are allocated while others remain idle.
The state idle means that the nodes are not running any jobs.
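To see how many nodes each partition spans, the NODES column of the listing can be summed per partition. The following is a minimal sketch, assuming the default sinfo column order shown above; `count_nodes_by_partition` is a hypothetical helper name, and the sample input is copied from the listing so that the sketch runs without access to the cluster.

```shell
#!/usr/bin/env bash
# Sum the NODES column (4th field) per partition from sinfo-style output on stdin.
# Assumes the column order: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
count_nodes_by_partition() {
    awk 'NR > 1 {
             name = $1
             sub(/\*$/, "", name)            # drop the "*" that marks the default partition
             if (!(name in total)) order[++n] = name
             total[name] += $4
         }
         END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }'
}

# On the cluster this would be: sinfo | count_nodes_by_partition
count_nodes_by_partition <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 7-01:00:00 30 mix discovery-c[2-14,16-25,28,31],discovery-g[2,5,8-10]
normal* up 7-01:00:00 13 alloc discovery-c[15,26-27,29-30,32-33,37-38],discovery-g[1,6,11,16]
normal* up 7-01:00:00 4 idle discovery-g[3-4,12-13]
interactive up 1-01:00:00 1 mix discovery-c34
interactive up 1-01:00:00 3 idle discovery-c35,discovery-g[14-15]
backfill up 14-02:00:0 30 mix discovery-c[2-14,16-25,28,31],discovery-g[2,5,8-10]
backfill up 14-02:00:0 15 alloc discovery-c[15,26-27,29-30,32-33,36-38],discovery-g[1,6-7,11,16]
backfill up 14-02:00:0 4 idle discovery-g[3-4,12-13]
EOF
```

With the listing above, this prints `normal 47`, `interactive 4`, and `backfill 49`.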
Some partitions in Discovery are condo partitions, which are restricted to certain researchers and lab groups.
normal
This is the default partition (marked with * in the sinfo output). Its key parameters are listed below.
Parameter | Value |
---|---|
Maximum Wall Time | 7-01:00:00 (7 days and 1 hour) |
Nodes | discovery-c[1-33, 37-38], discovery-g[1-6, 8-13, 16] |
Total Nodes | 33 |
Maximum Jobs (Running) | 10 |
Maximum Submitted Jobs | 20 |
Maximum Jobs (Running) is the highest number of jobs that you can have running at the same time in a partition. Maximum Submitted Jobs is the maximum number of jobs you can have submitted to a partition at once. In the normal partition, you can submit 20 jobs, but at most 10 will run at a time; the remaining 10 wait in the queue.
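The arithmetic behind these two limits can be sketched as follows. This is an illustration only: `split_jobs` is a hypothetical helper, the "running" figure is an upper bound (how many jobs actually start also depends on free nodes and job priority), and jobs beyond the submission limit are assumed to be refused at submission time.

```shell
#!/usr/bin/env bash
# split_jobs SUBMITTED MAX_RUNNING MAX_SUBMITTED
# Prints how many jobs are accepted, how many may run at once (upper bound),
# how many wait in the queue, and how many exceed the submission limit.
split_jobs() {
    local submitted=$1 max_running=$2 max_submitted=$3
    local accepted=$(( submitted < max_submitted ? submitted : max_submitted ))
    local running=$(( accepted < max_running ? accepted : max_running ))
    echo "accepted=$accepted running=$running pending=$(( accepted - running )) over_limit=$(( submitted - accepted ))"
}

split_jobs 20 10 20   # normal partition: 10 running, 10 pending
split_jobs 5 3 3      # interactive partition: 3 running, 2 over the limit
```

On the cluster, `squeue -u "$USER" -p normal` lists your running and pending jobs in the normal partition, which can be compared against these limits.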
interactive
This partition is ideal for running interactive jobs.
Parameter | Value |
---|---|
Maximum Wall Time | 1-01:00:00 (1 day and 1 hour) |
Nodes | discovery-c[34-35], discovery-g[14-15] |
Total Nodes | 4 |
Maximum Jobs (Running) | 3 |
Maximum Submitted Jobs | 3 |
Maximum CPU Per Job | 16 |
Maximum Memory per Job | 64G |
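The wall-time limits in these tables are written in the D-HH:MM:SS form. To check a requested time against a limit before submitting, both values can be converted to seconds. The sketch below assumes both arguments use exactly that form (Slurm also accepts shorter forms, which this does not handle); `slurm_time_to_seconds` and `fits_within` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Convert a time of the form D-HH:MM:SS to seconds.
# awk treats fields such as "08" as the decimal number 8.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '{ print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }'
}

# fits_within REQUESTED LIMIT: succeeds if REQUESTED does not exceed LIMIT.
fits_within() {
    [ "$(slurm_time_to_seconds "$1")" -le "$(slurm_time_to_seconds "$2")" ]
}

slurm_time_to_seconds 1-01:00:00      # interactive limit: 90000 seconds

if fits_within 0-12:00:00 1-01:00:00; then
    echo "a 12-hour request fits within the interactive limit"
fi
```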
backfill
This partition scavenges nodes from all partitions. It has the lowest priority of all the partitions, so jobs submitted to the backfill partition may be stopped and requeued multiple times, depending on the demand from higher-priority jobs.
Parameter | Value |
---|---|
Maximum Wall Time | 14-02:00:00 (14 days and 2 hours) |
Nodes | discovery-c[1-33, 36-38], discovery-g[1-13, 16] |
Total Nodes | 54 |
Maximum Jobs (Running) | 10 |
Maximum Submitted Jobs | 20 |
The HPC team is exploring ways to get the best out of the backfill queue, and your suggestions are always welcome.
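Because a backfill job may be stopped and requeued, it helps if the job can pick up where it left off instead of starting over. The following is a minimal sketch of a batch script that records its progress in a checkpoint file. The `#SBATCH` options shown are standard sbatch options, but whether a stopped job is requeued automatically depends on the cluster configuration; the `run_steps` function, the five-step loop, and the `progress.txt` file name are placeholders for a real workload.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=backfill-demo
#SBATCH --partition=backfill
#SBATCH --time=0-02:00:00
#SBATCH --requeue                 # allow the job to be put back in the queue

# run_steps CHECKPOINT TOTAL
# Runs steps 1..TOTAL, skipping the steps already recorded in CHECKPOINT.
run_steps() {
    local checkpoint=$1 total=$2 start=0 step
    # Resume after the last completed step if a checkpoint exists.
    [ -f "$checkpoint" ] && start=$(cat "$checkpoint")
    step=$(( start + 1 ))
    while [ "$step" -le "$total" ]; do
        echo "step $step"             # placeholder for real work
        echo "$step" > "$checkpoint"  # record progress after each step
        step=$(( step + 1 ))
    done
}

run_steps "${CHECKPOINT_FILE:-progress.txt}" 5
```

If the job is stopped after step 3 and later restarted, the script continues with step 4.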
Condo Partitions
Some partitions in Discovery are condo partitions, restricted to certain teams or research groups. New partitions are still being added; the table below lists the condo partitions.
| Partition | Owned By |
|---|---|
| | Dr. Brungard’s Lab |
| | Dr. Gross’s Lab |
| | Dr. Boucheron’s Lab |
As of the 2019 update of the cluster, new partitions have been added. Most of these are restricted-use, and users will only see the partitions they have access to.
Wall time vs. CPU time
CPU time is not wall time (as in a clock on the wall). CPU time is the total time for which the CPU was dedicated to executing your process. The CPU must service many processes every second, not just yours, so your process only gets small time slices between the processing of other requests. Each of those time slices counts toward your total CPU time. However, the time between your slices, while the CPU is processing someone else’s request, is NOT counted toward your CPU time.
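For example, for a single-threaded process, wall time ("real" time) is roughly the sum of the CPU time spent in user mode ("user" time), the CPU time spent in kernel mode ("system" time), and the time the process spent waiting while the CPU served other processes.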
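The difference can be observed with the `time` keyword of bash, which reports real (wall), user, and system time for a command. The sketch below is only an illustration on a single machine, not a measurement of a Slurm job; the loop count is arbitrary.

```shell
#!/usr/bin/env bash
# Report wall ("real"), user, and system time in seconds.
TIMEFORMAT='real=%R user=%U sys=%S'

# sleep waits without computing: wall time is about 1 s, CPU time is close to 0.
time sleep 1

# A busy loop computes the whole time: user CPU time is close to wall time.
time {
    i=0
    while [ "$i" -lt 200000 ]; do
        i=$(( i + 1 ))
    done
}
```

For a finished Slurm job, `sacct -j <jobid> --format=JobID,Elapsed,TotalCPU` should report the corresponding wall time (Elapsed) and CPU time (TotalCPU).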
Partition | Wall Time | # of Nodes | Memory (GB) |
---|---|---|---|
normal | 7d 1h | 15 | 64/128 |
GPU | 7d 1h | 1 | 64 |
backfill | 14d 2h | 38 | 64/128/192/256 |
interactive | 24h | 4 | 384/512 |